Forecasting AI

Created by: ghabs
AI Technical Benchmarks



The SuperGLUE benchmark measures progress on language understanding tasks.

The original benchmark, GLUE (General Language Understanding Evaluation), is a collection of language understanding tasks built on established existing datasets, selected to cover a diverse range of dataset sizes, text genres, and degrees of difficulty. The tasks were sourced from a survey of ML researchers, and the benchmark was launched in mid-2018. Several models have since surpassed the GLUE human baseline.

The new SuperGLUE benchmark contains a set of more difficult language understanding tasks. Human-level performance on the SuperGLUE benchmark is 89.8. As of July 19th, 2019, the best-performing ML model is BERT++, with a score of 71.5. Will language model performance have progressed enough that, by next year, a model will achieve superhuman performance on the SuperGLUE benchmark?
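The resolution condition above can be sketched as a simple threshold check. This is a hypothetical illustration using only the figures stated in the question (the human baseline of 89.8 and the BERT++ score of 71.5); it does not query any live leaderboard.

```python
# Human-level performance on the SuperGLUE benchmark, per the question text.
HUMAN_BASELINE = 89.8

def is_superhuman(model_score: float, baseline: float = HUMAN_BASELINE) -> bool:
    """Return True if a SuperGLUE leaderboard score exceeds the human baseline."""
    return model_score > baseline

# BERT++ score as of July 19th, 2019 — well short of the 89.8 baseline.
print(is_superhuman(71.5))
```

Under this reading, the question resolves positively once the top leaderboard score passes 89.8.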