The HOList Leaderboard, recently released by researchers at Google, is a benchmark test for automated theorem provers.
Plausibly, if an automated model can prove theorems in a novel way that humans have not, it points to their use in advanced scientific inquiry.
As of May 27th, 2019, the current leader in the RL category has proven 36.55% of the theorems in the HOList dataset.
By end of 2019, what percentage of theorems will be proven by an RL model in the HOList benchmark?
The resolution criteria will be the publicly reported HOList benchmark website.