Computation in PFLOP/s-days used by GPT-2, the largest model trained in OpenAI's "Language Models are Unsupervised Multitask Learners" paper (blog post here).
The estimate should not include computation used for hyperparameter tuning and architecture search.
Resolution is by paper or other reliable announcement. (This may already be resolvable, but I haven't dug deeply enough into the paper to find out. In either case, it will be good to gather guesstimates here on Metaculus AI.)
The method of calculation should be as similar as possible to that used in the "AI and compute" article. Note also that this article estimates actual rather than theoretical FLOPs, assuming GPU utilization of 33% and CPU utilization of 17%.
As a hint, OpenAI themselves estimate that the previous GPT model used 0.96 PFLOP/s-days, and they mention that GPT-2 uses more than 10x as many parameters and more than 10x as much data.
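Under the rough (and debatable) assumption that training compute scales linearly with parameter count times data size, that hint yields a simple lower-bound sketch; the 10x factors below are the paper's stated lower bounds, not exact ratios:

```python
# Back-of-the-envelope lower bound for GPT-2 training compute.
# Assumption (not from the paper): compute ∝ parameters × training data.
gpt1_compute = 0.96   # PFLOP/s-days, OpenAI's estimate for the previous GPT
param_factor = 10     # GPT-2 has more than 10x the parameters
data_factor = 10      # GPT-2 trained on more than 10x the data

lower_bound = gpt1_compute * param_factor * data_factor
print(f"Lower-bound estimate: {lower_bound:.0f} PFLOP/s-days")
```

This suggests a floor of roughly 96 PFLOP/s-days; the true figure could be higher since both factors are lower bounds and the scaling assumption is crude.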