A More Detailed Look at the Model Challenge for Atrocity Prevention
As noted here, USAID and Humanity United announced the successful conclusion of the Model Challenge, run on the TopCoder platform in November 2013. This Challenge was the fifth and final competition of the Atrocity Prevention Tech Challenge, a joint effort of USAID and Humanity United to source innovative technology applications for preventing mass violence against civilians. The Model Challenge asked problem solvers to create algorithmic models that can help forecast when and where mass atrocities are likely to occur.
Out of the 618 submissions from over 1,000 contestants, five algorithms received prizes, with one solution in particular cleanly taking first prize. You can find the code for those algorithms, and a link to the data for the competition, here.
One of the main goals of the contest was to assess whether forecasting atrocities is even possible. This contest showed that it is, though much work remains to be done. This post goes into a bit more detail about the contest, our rationale for running it, and the results.
(Note: A special thanks must be given to the researchers at the Harvard-NASA Tournament Lab for their work on this contest and the NASA Center of Excellence for Collaborative Innovation for their assistance throughout the Tech Challenge. The Tournament Lab is a partnership between NASA and Harvard to solve cutting-edge algorithm problems using the crowd.)
Why run the Model Challenge?
With the Model Challenge, we set out to identify potential sociopolitical indicators of an impending atrocity and to build predictive algorithms using a defined set of data sourced from public datasets. USAID and Humanity United sponsored this challenge as a contribution to the existing community of public and private actors committed to improving our ability to predict atrocities in order to better inform effective prevention efforts. We hope that the forecasts generated by these algorithms and their further iterations will factor into the considerations made by governments and other actors on how to handle a particular situation.
The Model Challenge was truly an experiment, and a successful and informative one, in how crowdsourcing and open innovation could meaningfully contribute to data and atrocity prevention issues. Given the increasing availability of precise and dynamic international data, USAID and Humanity United hoped to leverage the creative potential of coders to predict local-level mass violence. The final “marathon” contest, which closed in September, built on a series of prior contests that asked contestants to source public data, as well as define and design the parameters of the marathon contest itself. While we never expected the models generated by this contest to be silver bullets that perfectly predict when and where violence would occur, our engagement with academic experts and actors in the field of atrocity prevention has made clear that this contest could allow the general public to make significant contributions to the world’s ability to understand when and where mass violence occurs.
The Contest and Results
The marathon contest gave contestants a broad challenge statement: “Given data about various sociopolitical activities around the world and information about past atrocities, develop an algorithm to predict where and when atrocities will happen in the near future (i.e. within the next 30 days).” We provided contestants with a discrete collection of data from the Political Instability Task Force’s public data on violent events and the Global Database of Events, Language and Tone (GDELT) for socio-political indicators. This data was masked with generic labels to prevent competitors from identifying the original datasets and ‘writing to the test,’ or creating models based on pre-existing knowledge of the data. Over the course of three weeks, contestants submitted their best attempts to predict the mass violence events in the test data. Submissions were scored in near-real-time, so contestants were able to see the current top score compared to their own and tweak their submissions as often as they liked before the end of the contest.
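To make the "within the next 30 days" framing concrete, here is a minimal sketch of how a forecasting target could be derived from a list of past events. The event records, region labels, and function names are hypothetical and chosen for illustration; this is not the contest's actual data format or scoring code.

```python
from datetime import date, timedelta

# Hypothetical event records: (date of event, masked region label).
EVENTS = [
    (date(2013, 1, 5), "R1"),
    (date(2013, 2, 20), "R1"),
    (date(2013, 3, 1), "R2"),
]

HORIZON_DAYS = 30  # "within the next 30 days", per the challenge statement


def label(events, region, as_of, horizon_days=HORIZON_DAYS):
    """Return 1 if `region` has an event after `as_of` and no later than
    `as_of` + `horizon_days`, else 0."""
    end = as_of + timedelta(days=horizon_days)
    return int(any(r == region and as_of < d <= end for d, r in events))


def days_since_last_event(events, region, as_of):
    """Simple history feature: days since the most recent event in `region`
    on or before `as_of`, or None if the region has no prior event."""
    past = [d for d, r in events if r == region and d <= as_of]
    return (as_of - max(past)).days if past else None


if __name__ == "__main__":
    for region in ("R1", "R2"):
        as_of = date(2013, 2, 1)
        print(region, label(EVENTS, region, as_of),
              days_since_last_event(EVENTS, region, as_of))
```

A model would then be trained to predict this label for each region and date using only information available on or before the forecast date.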
As mentioned above, five winners from China, France, Germany and Hungary were selected based on their performance, with a final “Ideation” prize awarded to a theoretical submission from the U.S., which proposed viewing the world’s regions as a fragile network in which shifts in tension can predict risk.
The solutions demonstrated a breadth of creative thinking and used a range of different statistical methods in their analysis. The five have different strengths, which argues for further work to build tools that combine their abilities.
The top solution used a machine-learning method known as Random Forests to predict whether a given location will suffer an instance of mass violence. It was relatively strong in predicting new outbreaks of mass violence and atrocities in locations where there had not been much violence in the past. It also made the most use of the socio-political data found in the GDELT dataset. Interestingly, it was not as effective as the other winners in predicting recurring events of mass violence over time, which again suggests that the winning algorithms could complement each other in further development.
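For readers unfamiliar with the method, the sketch below shows the general idea behind Random Forests: many small decision trees, each trained on a bootstrap resample of the data and restricted to a random subset of features at each split, with predictions averaged across trees. It is a simplified toy written for this post, not the winning contestant's code, and the two features (a count of recent violent events and a tension index) and all data values are invented for illustration.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of 0/1 labels."""
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(rows, labels, features):
    """Return the (feature, threshold) that most reduces impurity, or None."""
    best, best_score = None, gini(labels)
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, rng, n_features, depth=0, max_depth=4):
    """Grow a tree; leaves are class labels, inner nodes are tuples."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority
    # Random feature subset at each split: the "random" in Random Forests.
    features = rng.sample(range(len(rows[0])), n_features)
    split = best_split(rows, labels, features)
    if split is None:
        return majority
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (
        f, t,
        build_tree([r for r, _ in left], [y for _, y in left],
                   rng, n_features, depth + 1, max_depth),
        build_tree([r for r, _ in right], [y for _, y in right],
                   rng, n_features, depth + 1, max_depth),
    )


def predict_tree(node, row):
    """Walk from the root to a leaf and return that leaf's class label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    """Train each tree on a bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = rng.choices(range(len(rows)), k=len(rows))
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], rng, n_features))
    return forest


def predict_proba(forest, row):
    """Fraction of trees voting that an event will occur."""
    return sum(predict_tree(tree, row) for tree in forest) / len(forest)


# Invented toy data: (recent violent events, tension index) -> event or not.
ROWS = [(i, i + 0.5) for i in range(10)] + [(20 + i, 30 + i) for i in range(10)]
LABELS = [0] * 10 + [1] * 10

forest = fit_forest(ROWS, LABELS)
print(predict_proba(forest, (25, 35)), predict_proba(forest, (0, 0.5)))
```

Averaging over many decorrelated trees tends to make the ensemble more stable than any single tree, which is one reason the method is popular for noisy data with many candidate indicators.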
The other winning solutions used a mix of different models that also warrant further analysis. Their strengths were in predicting instances of atrocities in regions with recent violence, and they made less use of non-violence data for the prediction.
One implicit goal of the contest was to assess whether forecasting atrocities is possible at all. One might assume that mass violence and atrocities are too random or heterogeneous to predict, or that sophisticated quantitative modeling might add little value. As an attempt to test these assumptions, we feel the contest was clearly a success. At the very least, we learned that the data used in the contest provide rich, potentially promising opportunities for forecasting, and that much more work should be done to explore this subject.
(Note: For a more detailed quantitative analysis, we’ve made public all of the winning models and a summary of comments/analysis from TopCoder and Harvard’s experts here. Please feel free to make use of the models and improve upon them if you’re so inclined! Moving forward, we’re very excited to build on the enthusiasm for humanitarian data challenges this contest generated.)
Maurice Kent is the Senior Prize and Challenge Analyst in USAID’s Office of Science and Technology. You can reach him at firstname.lastname@example.org.