AI Fermi Prize Round 1 winners
Round 1 of the AI Fermi Prize closed February 10. Three prizes have been awarded to commenters who substantially contributed to our mission of tracking and improving the state of the art of forecasts on AI progress.
The top prize ($400) goes to DanielFilan, for his work on building and improving a Guesstimate model of Starcraft timelines.
When will an RL agent match the best human performance on Starcraft II with no domain-specific hardcoded knowledge, trained using no more than $10,000 of compute on public hardware?
DanielFilan factored the question on Jan 6, 2019 at 4:56am
My sketchy estimate: https://www.getguesstimate.com/models/12302
DanielFilan commented on Jan 24, 2019 at 6:27pm
There's a problem with this estimate, shared by a variety of similar models, that I'd like to write about. In this estimate, you estimate how many more samples it takes to learn Starcraft (in the future) than Dota (today), then use Moore's law to determine when you can afford that many samples. In reality, the sample-efficiency of learning Starcraft will increase over time, as we get better and better algorithms. Ideally, an estimate would construct a curve of algorithmic improvement and a Moore's law curve, and see when they intersect.
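The intersection approach suggested above can be sketched in a few lines of Python. This is a minimal illustration under loudly stated assumptions: it supposes that the compute needed for the task halves at a fixed rate (algorithmic progress) and that the compute a fixed budget buys doubles at a fixed rate (a Moore's law curve). The function names, parameters and example numbers are hypothetical and are not taken from the Guesstimate model. Both compute quantities must be in the same units, e.g. TPU-hours.

```python
import math


def required_compute(t, compute_now, algo_halving_years):
    """Compute needed for the task t years from now.

    Hypothetical algorithmic-progress curve: halves every algo_halving_years.
    Pass math.inf to model no algorithmic progress.
    """
    return compute_now * 2.0 ** (-t / algo_halving_years)


def affordable_compute(t, affordable_now, hw_doubling_years):
    """Compute a fixed budget buys t years from now.

    Hypothetical Moore's-law-style curve: doubles every hw_doubling_years.
    """
    return affordable_now * 2.0 ** (t / hw_doubling_years)


def intersection_year(compute_now, affordable_now, algo_halving_years, hw_doubling_years):
    """Years until the two curves cross.

    Setting required_compute(t) == affordable_compute(t) and taking log2 gives
    log2(compute_now / affordable_now) = t * (1/algo_halving_years + 1/hw_doubling_years).
    """
    gap_doublings = math.log2(compute_now / affordable_now)
    if gap_doublings <= 0:
        return 0.0  # already affordable today
    rate = 1.0 / algo_halving_years + 1.0 / hw_doubling_years  # doublings closed per year
    return gap_doublings / rate


# Hypothetical example: the task needs 1024x more compute than the budget buys today.
print(intersection_year(1024.0, 1.0, math.inf, 2.0))  # hardware progress only
print(intersection_year(1024.0, 1.0, 2.0, 2.0))       # hardware plus algorithms
```

With a 1024x gap and a 2-year hardware doubling time, the hardware-only model has the curves cross after 20 years; adding a 2-year algorithmic halving time brings that down to 10. The closed form works because two exponentials in the same base can be compared by adding their rates in the exponent.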
DanielFilan commented on Jan 26, 2019 at 8:43pm
[Note: in the process of making this model, I realised that in the guesstimate model, I underestimated by a factor of 10 how much CPU compute OpenAI Five needed. That error is now corrected. I also got rid of the modifications to estimate the cost of super-human performance, since the question is about human-level performance.]
Seems that there were 16 TPU3s per agent and ~600 agents, and you needed 7 days of training to beat TLO (which seems like the right benchmark?). That gives you 16x600x7x24 = 1.6x10^6 TPU3-hours. A TPU3-hour costs about $8.00 for the public. Maybe DeepMind gets a factor of 3 price discount. That gives you 1.6x10^6x$8/3 = $4.3x10^6 for the TPU3 compute. You're probably also using CPUs though: as Gwern reminds us, OpenAI Five used 128,000 pre-emptible CPUs. Let's say AlphaStar used about that many (maybe the sketchiest assumption). Pre-emptible CPUs cost $7x10^(-3)/hr. In total, that gives you 7x24x1.28x10^5x7x10^(-3) = $1.5x10^5. If you assume a discount, that's just a rounding error on the TPU3 cost.
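The back-of-the-envelope arithmetic above can be checked in Python. The constants are the comment's own rough assumptions (including the factor-of-3 discount and the guess that AlphaStar used as many CPUs as OpenAI Five); the variable names are illustrative.

```python
# Back-of-the-envelope AlphaStar training cost, using the comment's assumptions.

TPUS_PER_AGENT = 16
AGENTS = 600
TRAINING_HOURS = 7 * 24          # 7 days of training to beat TLO
TPU3_PRICE_PER_HOUR = 8.00       # public price, USD
DEEPMIND_DISCOUNT = 3            # assumed price discount factor

CPUS = 128_000                   # assumed equal to OpenAI Five's pre-emptible CPUs
CPU_PRICE_PER_HOUR = 7e-3        # pre-emptible CPU price, USD

tpu_hours = TPUS_PER_AGENT * AGENTS * TRAINING_HOURS
tpu_cost = tpu_hours * TPU3_PRICE_PER_HOUR / DEEPMIND_DISCOUNT
cpu_cost = TRAINING_HOURS * CPUS * CPU_PRICE_PER_HOUR  # no discount applied

print(f"TPU3-hours: {tpu_hours:.2e}")
print(f"TPU3 cost:  ${tpu_cost:.2e}")
print(f"CPU cost:   ${cpu_cost:.2e} (undiscounted)")
```

Running it prints 1.61e+06 TPU3-hours, a TPU3 cost of $4.30e+06 and an undiscounted CPU cost of $1.51e+05, matching the comment's rounded figures. The CPU cost comes to about 3.5% of the TPU3 cost before any CPU discount is applied.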
That gives a pretty big update on the cost to train super-human Starcraft in 2018. Giving those things error bars and plugging them through the Guesstimate model, you now get a point estimate of 42 years from 2018 (i.e. 2060), with a 42% chance of it happening before 2050, and a 25th percentile guess of 2043.
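Guesstimate propagates error bars by Monte Carlo sampling. The sketch below does the same in plain Python for a hardware-only version of the model. Every range, the lognormal choice and the 90%-interval convention are hypothetical placeholders, not the inputs of the linked model, so its percentiles should not be read as reproducing the figures above.

```python
import math
import random
import statistics

BUDGET = 10_000.0          # the question's compute cap, USD
BASE_YEAR = 2018
Z95 = 1.6448536269514722   # standard normal 95th percentile


def lognormal_from_90ci(rng, low, high):
    """Sample a lognormal whose 5th and 95th percentiles are low and high."""
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * Z95)
    return rng.lognormvariate(mu, sigma)


def simulate_years(n=10_000, seed=0):
    """Sampled years after BASE_YEAR until training fits in BUDGET (hardware only)."""
    rng = random.Random(seed)
    years = []
    for _ in range(n):
        # Hypothetical 90% intervals, not the linked model's inputs.
        cost_now = lognormal_from_90ci(rng, 1e6, 2e7)  # training cost in BASE_YEAR, USD
        doubling = lognormal_from_90ci(rng, 1.5, 4.0)  # compute-per-dollar doubling time, years
        halvings = max(0.0, math.log2(cost_now / BUDGET))
        years.append(halvings * doubling)
    return years


years = simulate_years()
p25, p50, p75 = statistics.quantiles(years, n=4)  # quartile cut points
share_before_2050 = sum(BASE_YEAR + y < 2050 for y in years) / len(years)
print(f"25th percentile: {BASE_YEAR + p25:.0f}")
print(f"median:          {BASE_YEAR + p50:.0f}")
print(f"75th percentile: {BASE_YEAR + p75:.0f}")
print(f"P(before 2050):  {share_before_2050:.0%}")
```

To include algorithmic progress, one could also sample an algorithmic halving time and divide `halvings` by the combined rate `1/doubling + 1/algo_halving` instead of multiplying by `doubling`.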
But, of course, the biggest question is how much sample efficiency will improve. I don't currently have a great model for that.
DanielFilan commented on Jan 28, 2019 at 5:18am
Of course, cheaper-than-expected human-level Starcraft means that algorithmic improvement is quicker than you expected, and as such the hole in this model is more important than you might previously have thought.
The second prize is shared. davidmanheim wins $100 for providing comments across a wide range of questions that manage to be both brief and reliably high-signal.
kokotajlod also wins $100 for his work on generating two new questions, as well as several careful comments on the importance of superhuman micro abilities in AlphaStar.
We were excited to see users collaborating to make progress through commenting over the past few weeks, and also want to extend honourable mentions to AlexanderPolta (especially for his comment about growth in labor productivity) and AlexRay (for writing a large number of comments capturing insights during the December workshop in San Francisco).
Round 2 is now on, and will award prizes to comments written from February 10 until February 23, 23:59 GMT. You can find more examples of prize-worthy commenting here.