The AI Fermi Prize

We will be awarding bi-weekly prizes of $400 and $200 to the two best commenters on the server, over the following periods:

Round 1: Until February 9, 2019, 23:59 GMT. Round 1 winners can be found here.

Round 2: From February 10 until February 23, 23:59 GMT. Round 2 winners can be found here.

Round 3: From February 24 until March 10, 23:59 GMT. Round 3 winners can be found here.

Round 4: From March 10 until March 23, 23:59 GMT. Round 4 winners can be found here.

After Round 4, the AI Fermi Prize will take a brief hiatus. We will likely be back with more prizes in the future; we're just thinking through what shape they should take.

A good commenter is someone who substantially contributes to our mission of tracking and improving the state of the art in forecasting AI progress.

This might take the shape of a single beacon of good work, or a longer series of contributions.

Examples of prize-worthy contributions include (note: these have not been updated since the first round, and further examples can be found here):

Public predicting

Will an AI system be able to play new levels of Angry Birds better than the best human players in the 2020 IJCAI Angry Birds AI competition?
20% davidmanheim made a prediction on Jan 16, 2019 at 8:34am
I'm sure it's possible for a medium-sized effort to do this, say 4-5 PhDs full time for 6 months with access to sufficient compute, but current teams are single people working on it as a side-project and undergrad(?) student groups - so it very probably won't happen. See: http://aibirds.org/angry-birds-ai-competition…
Will another Chinese city invest >=$10 billion USD in AI by mid-2019?
10% alexanderpolta updated a prediction 2 days 14 hours ago
Other cities may be interested in something similar, but one has to imagine that Tianjin received either a nudge or explicit encouragement from Central Party authorities to make this push.
The Central Party may eventually establish other AI centers, but I think they'll wait to see how the first goes before rolling out others, and Mid-2019 isn't enough time to evaluate the success of their first initiative.
Will LeelaChessZero beat Stockfish in the Jan/Feb 2019 Top Chess Engine Championship?
15% Jacob made a prediction 2 days 2 hours ago
Here’s LeelaChessZero’s previous performance against Stockfish, in the format: Stockfish_wins-draws-Lc0_wins
[Sep 2018] CCC: 2-8-2
[Oct 2018] 0-0-1 (Lc0 handicapped)
[Oct 2018] TCEC Cup 1: 2-5-0
[Nov 2018] CCC Blitz: 15-81-4
[Jan 2019] TCEC: 1-5-0
Also, using some equivalent of a BayesElo score for chess engines (the details of which I haven't gone into), Stockfish is ranked 1 and LCZero 83, with a rating difference of 931. Plugging these ratings into this calculator gives a win probability of <1% for Lc0 (a quick sanity check of this number is sketched below).
However, Lc0 is improving very quickly. Less than a year ago, it lost almost all its games in Division 4, the lowest TCEC division. This year, it dominated Divisions 3, 2, and 1, even setting records in some of them. Stockfish still dominated the premier division.
So, overall, I wouldn't be that surprised if it beat Stockfish a year from now, yet at this time I can't really find reasons to put more than 15% here (and even then a lot of that mass is due to model uncertainty).
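For readers who want to check the Elo arithmetic behind the <1% figure, here is a minimal Python sketch. The expected-score formula is the standard logistic Elo formula; the 931-point rating gap and the head-to-head records are taken from the comment above, so treat the outputs as illustrative.

```python
# Minimal sketch: check the Elo-implied win probability and the score
# fractions implied by the quoted head-to-head records.

def elo_expected_score(rating_gap):
    """Expected score for the lower-rated player, standard Elo formula."""
    return 1.0 / (1.0 + 10.0 ** (rating_gap / 400.0))

# Stockfish is rated 931 points above Lc0 on the list cited above.
print(f"Lc0 expected score: {elo_expected_score(931):.4f}")  # ~0.005, i.e. <1%

# Multi-game records quoted above (the single handicapped game is omitted),
# in the format Stockfish_wins-draws-Lc0_wins.
records = {
    "Sep 2018 CCC": (2, 8, 2),
    "Oct 2018 TCEC Cup 1": (2, 5, 0),
    "Nov 2018 CCC Blitz": (15, 81, 4),
    "Jan 2019 TCEC": (1, 5, 0),
}
for event, (sf_wins, draws, lc0_wins) in records.items():
    games = sf_wins + draws + lc0_wins
    lc0_score = (lc0_wins + 0.5 * draws) / games
    print(f"{event}: Lc0 scored {lc0_score:.1%} over {games} games")
```

Note that the head-to-head score fractions (roughly 36-50%) sit well above the <1% implied by the listed rating gap, consistent with the comment's point that Lc0 has been improving faster than its rating suggests.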
When will an RL agent match the best human performance on Starcraft II with no domain-specific hardcoded knowledge, trained using no more than $10,000 of compute on public hardware?
davidmanheim provided an information source 4 days 10 hours ago
Updating back towards "match human performance" not happening in the near future, based on clear evidence that the most recent performance wasn't actually AI performance of human behaviors but unreasonable automation of click behaviors:
https://www.reddit.com/r/MachineLearning/comm…

Modelling

When will an RL agent match the best human performance on Starcraft II with no domain-specific hardcoded knowledge, trained using no more than $10,000 of compute on public hardware?
DanielFilan factored the question on Jan 6, 2019 at 4:56am
My sketchy estimate: https://www.getguesstimate.com/models/12302

Distilling

Will negative sentiment in media coverage of AI reach a new peak in 2019?
Jacob commented on Dec 30, 2018 at 11:08pm
Consensus during the workshop seemed to be that when coverage of AI increased overall, both positive and negative sentiment increased (see e.g. arrival of AlphaGo in the AI Index graph).
Serial computation: In 2019, how many minutes will it take to train ResNet-152 on ILSVRC 2012 using a system available on the public market for less than $5000?
AlexRay provided an information source on Dec 15, 2018 at 7:43pm
DawnBench tracks this sort of thing.
Current lowest cost to train ImageNet ResNet-50 to 93% is <$13, at a training time of 164 minutes.
Current lowest time to train ImageNet ResNet-50 to 93% is 10 minutes.
https://dawn.cs.stanford.edu/benchmark/index.…
By Jan 1, 2020, will the top-performing machine on the Graph500 list perform more than 40,000 GTEPS on the BFS benchmark?
Anthony provided an information source on Dec 15, 2018 at 9:41pm
I transferred the top 200 by TEPS, eliminating those without a listed year, into this spreadsheet.
Trends are weak or absent in GTEPS vs. year, GTEPS/core vs. year or GTEPS/core vs. GTEPS.
So, as compared to the top 500 by FLOPS, things seem a bit strange here, and this may indicate that this benchmark is not really a target that high-powered systems are trying to hit.
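For anyone who wants to reproduce this kind of trend check, here is a hedged Python sketch. The file name and column names are hypothetical placeholders for an export of the spreadsheet linked above; the real layout may differ.

```python
# Hedged sketch: check for trends in the top-200 Graph500 entries.
# "graph500_top200.csv" and its column names are assumptions, not the
# actual spreadsheet's schema.
import pandas as pd

df = pd.read_csv("graph500_top200.csv")
df = df.dropna(subset=["year"])  # entries without a listed year were excluded

df["gteps_per_core"] = df["gteps"] / df["cores"]

# The three relationships discussed above; correlations near zero would
# support the "trends are weak or absent" observation.
print(df["gteps"].corr(df["year"]))
print(df["gteps_per_core"].corr(df["year"]))
print(df["gteps_per_core"].corr(df["gteps"]))
```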
How impressive will the January 24 Starcraft II AI reveal be?
datscilly commented on Jan 1, 2019 at 5:13am
I estimated the time it will take Starcraft AI to be superhuman, looking at the milestones for Chess, Go, Atari, and Dota. Some dates depend on personal judgements, and the forecast could be considered handwavey enough to start generating a breeze. But this was my best try at coming up with a reference class to make an outside view prediction.

Improving questions

How impressive will the January 24 Starcraft II AI reveal be?
kokotajlod factored the question on Jan 23, 2019 at 6:52pm
It took OpenAI Five three months to go from beating their dev team to even approaching the level of pros. DeepMind has had a while to work on this, but they haven't released any milestones of the level of "beating their dev team" yet.
I guess 0.7 that they would release such a milestone as soon as they reached it. So I think they are either at that level now, or below, with 0.7 credence. If they are above that level, then I'd guess 0.8 credence that they can beat Masters. So, my credence that they can beat Masters reliably now is 0.3*0.8 = 0.24.
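The decomposition above is just a product of the two stated credences; here is a minimal sketch using the commenter's own subjective numbers, labeled as such.

```python
# Minimal sketch of the factored estimate in the comment above.
# Both input credences are the commenter's subjective estimates.

p_at_or_below_milestone = 0.7   # P(haven't yet passed "beat dev team"), from the 0.7 release credence
p_above_milestone = 1 - p_at_or_below_milestone          # 0.3
p_beat_masters_given_above = 0.8  # P(can beat Masters | above milestone)

p_beat_masters_now = p_above_milestone * p_beat_masters_given_above
print(f"{p_beat_masters_now:.2f}")  # 0.3 * 0.8 = 0.24
```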
By mid-2019, what will be the maximum compute (measured in petaflop/s-days), used in training by a published AI system?
amc commented on Dec 15, 2018 at 9:44pm
The key factors here for me are the 6-month time-frame and that AlphaGo had special effort put into it. It seems that only a project which is already almost finished could beat AlphaGo.
Serial computation: In 2019, how many minutes will it take to train ResNet-152 on ILSVRC 2012 using a system available on the public market for less than $5000?
DanielFilan provided question feedback on Dec 15, 2018 at 10:44pm
This question seems gameable: there's a low probability of anybody spending $4,000 on this benchmark, but a high probability of a forecaster spending $10 on this benchmark to achieve a very low time and mess up others' forecasts. As such, the question is really about who is willing to spend the most. Proposed changes: make the question about ResNet-50, or specifically about DAWNBench, and demand that attempts spend more than $1,000 or train the model in less than 100 minutes.
By end of 2019, will there be an agent at least as good as AlphaStar using non-controversial, human-like APM restrictions?
kokotajlod provided question feedback 4 days 20 hours ago
Here's a thought on how to make the resolution more precise. It's still problematic in various ways, but maybe it's an improvement.
--If there isn't a new AI released by the end of 2019, the question resolves negative.
--If there is, we look at the reactions of the people you quote as skeptical of the current result. Are they mollified? If at least 50% of them (after excluding those for whom we can't find comments) agree that the new system doesn't have unfair APM (and isn't unfair in some new way that AlphaStar wasn't) then the question resolves positive. Otherwise, it resolves negative.

This is not a checklist. We can imagine awarding the prize to someone who has written only a handful of comments, or to someone who has written dozens of one-sentence comments; to someone who has done only one of the above or someone who has done many. The key determinant will be the quality of the contribution.

The winners will be determined at the sole discretion of the Metaculus AI team.


For questions, comments, and more, you can comment below or just reach out to Jacob (jacobklagerros@gmail.com) or Ben (goldhaber.ben@gmail.com).