Releasing the Forecast Data Stream
Editor’s note: Forecast is a social prediction market: an exchange where participants use points to trade on the likelihood of future events. Community members pose questions about the future, make predictions, and contribute to discussions leveraging their collective knowledge. Download the app to join the community.
Today, we’re announcing the release of the public Forecast data stream. Visit the Forecast Data Homepage to download the anonymized log of all transactions, questions, and reason metadata from the Forecast community. If you’re already a Forecaster, you can also download your personal log by signing in (and if you’re not a Forecaster, might we suggest that you download the app?).
Why are we doing this?
We believe that Forecast can improve our collective understanding of current events. We’re already seeing glimmers of this effect at a tiny scale. See, for example, my colleague Miles’s analysis of Forecasters’ real-time response to breaking political news.
Collective understanding is only possible if there is trust in the system that produced it. And trust, we believe, is a corollary of transparency. We hope you will find interesting insights in this already-public data set about the world of today and tomorrow. Please get in touch if you do! We also hope that this log provides more clarity into how Forecast works and gives you, our community, one more lever to hold us accountable for its continued function.
What is included in the data stream?
The data stream includes three data sets, each of which updates every 6 hours to include the most recent data. Data sets are available in two versions: All (which contains anonymized activity logs for all users and is available to any visitor to Forecast’s website) and Yours (which includes only your own activity and is available only after logging in). Each download is a zip file that contains both a CSV and a JSON version of the same data.
Transactions includes an entry for every BUY/SELL action (the core forecasting action), along with entries for every REWARD (points given to the reason writer when other users support their reason), SETTLEMENT (points given when a question closes) and REFUND (points returned when a question is removed from the market without settlement).
Answers includes an entry for each question-answer pair. For example, the question ‘When will the Associated Press publish a tweet calling the 2020 US presidential election?’ would have one entry for ‘Before 3 AM on November 4’, another for ‘November 4, at or after 3 AM’, and so on.
Reasons includes the anonymized user ID of the reason writer (which can be associated back to transactions), the question it is associated with, the time of creation, and any sources cited.
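To make the shape of these data sets concrete, here is a minimal Python sketch of how one might read the CSV out of a downloaded zip, tally transactions by type, and link reasons back to transactions through the anonymized user ID. The column names ("type" and "user_id") and the file names are assumptions made for illustration, not the documented schema, so check the header row of the real files before relying on them.

```python
import csv
import io
import zipfile
from collections import Counter, defaultdict

# Hypothetical column names: the real headers may differ.
TYPE_COLUMN = "type"        # assumed to hold BUY, SELL, REWARD, SETTLEMENT, REFUND
USER_COLUMN = "user_id"     # assumed to hold the anonymized user ID


def read_csv_from_zip(zip_source, member_suffix=".csv"):
    """Return the rows of the first CSV member of a zip as a list of dicts.

    zip_source may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        # Each download holds a CSV and a JSON copy; pick the CSV one.
        name = next(n for n in archive.namelist() if n.endswith(member_suffix))
        with archive.open(name) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text))


def count_by_type(transactions):
    """Count how many transaction rows there are of each type."""
    return Counter(row[TYPE_COLUMN] for row in transactions)


def transactions_by_reason_writer(transactions, reasons):
    """Group transactions by user, keeping only users who wrote a reason."""
    writers = {row[USER_COLUMN] for row in reasons}
    grouped = defaultdict(list)
    for row in transactions:
        if row[USER_COLUMN] in writers:
            grouped[row[USER_COLUMN]].append(row)
    return dict(grouped)
```

With a downloaded archive saved locally (the name transactions.zip is hypothetical), count_by_type(read_csv_from_zip("transactions.zip")) would give a per-type tally. The sketch uses only the standard library, so it runs without extra installs.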
Get the data
Click here to visit the Forecast Data Homepage. For more detail about how to use this data, check out Miles’s post, and his explainer about how he put it together.
All of the data provided in this data stream is already publicly available on our website. We hope making it available in this format makes analysis simpler and more fun. As always, please get in touch with any feedback.
Thank you Forecast for opening up your data stream! I had fun putting together a few visualizations of the community activity: https://coda.io/@david/forecast-data-explorer.