This question is related to others concerning GPT-2, including:
- how much computation it used
- whether there will be malicious use of the technology, and
- whether a non-binding agreement on dual-use publishing norms will be in place by end of 2019.
In their release of the GPT-2 language model, OpenAI wrote:
Other disciplines such as biotechnology and cybersecurity have long had active debates about responsible publication in cases with clear misuse potential, and we hope that our experiment will serve as a case study for more nuanced discussions of model and code release decisions in the AI community.
We’re now posting several questions to forecast the impact of this model and the policy surrounding it. Here we ask:
Before Jan 1st 2021, will an AI lab credibly claim to have made an improvement over the state-of-the-art on a non-trivial benchmark, but avoid publishing details of the model (such as data, parameter tunings, or implementation details) explicitly due to concerns surrounding malicious use?
The benchmark need not be a popular or widely used one, but it should not be a fringe or trivial benchmark (e.g. the question would not resolve on the basis of an arXiv pre-print from an unknown lab claiming to have beaten their own image-recognition benchmark, without attracting any discussion from the mainstream AI capabilities community).