In this thread I want to invite a discussion on how to think about the problem of forecasting how hardware will contribute to progress in AI. Moreover, as I'm working to operationalise various questions on this topic, I'm especially keen to see specific suggestions for questions that are thought to be diagnostic of the role that hardware will play in driving progress.
Various figures in AI research have acknowledged the importance of compute for AI progress (e.g. LeCun, Bengio and Hinton, 2015). Since training neural networks is a somewhat parallel information-processing problem (Schmidhuber, 2015), training can readily be sped up by adding more compute or by using processors with many thousands of cores (such as GPUs).
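One standard way to make "somewhat parallel" precise is Amdahl's law, which is not taken from the sources cited here: if a fraction p of the work parallelizes perfectly, n processors give an ideal speedup of 1 / ((1 - p) + p / n). A minimal Python sketch, where the parallel fractions are illustrative assumptions rather than measurements of any real training workload:

```python
def amdahl_speedup(parallel_fraction: float, n_processors: int) -> float:
    """Ideal speedup under Amdahl's law for a workload whose parallelizable share is parallel_fraction."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if n_processors < 1:
        raise ValueError("n_processors must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part takes the same time regardless of n; the parallel part is divided by n.
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)


if __name__ == "__main__":
    # Hypothetical parallel fractions, chosen only for illustration.
    for p in (0.90, 0.99, 0.999):
        for n in (1, 100, 10_000):
            print(f"p={p:<6} n={n:<6} speedup={amdahl_speedup(p, n):8.1f}")
```

The serial share caps the gain: with p = 0.99 the speedup can never exceed 100, however many cores are added, which is why the degree of parallelism matters for how far extra hardware translates into faster training.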
Additional compute, as well as specialised compute that exploits the parallelizability of these algorithms, has enabled increases both in the number of parameters of neural networks and in the size of the data sets those networks can be trained on, which has contributed to better models (Brundage, 2016).
Moreover, the amount of money spent on compute has also been increasing. It has been estimated that the cost of the largest AI experiments increases by an order of magnitude approximately every 1.1–1.4 years (AI Impacts, 2018).
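To get a feel for how fast a "10x every 1.1–1.4 years" trend is, it can be converted into an annual growth factor and a doubling time. A minimal sketch; the function names are hypothetical and the only inputs are the two bounds quoted above:

```python
import math


def annual_growth_factor(years_per_10x: float) -> float:
    """Multiplicative growth per year implied by a 10x increase every `years_per_10x` years."""
    return 10 ** (1.0 / years_per_10x)


def doubling_time_months(years_per_10x: float) -> float:
    """Months needed to double, given a 10x increase every `years_per_10x` years."""
    # Doubling takes log10(2) of the time a 10x increase takes.
    return 12.0 * years_per_10x * math.log10(2)


if __name__ == "__main__":
    # The 1.1 and 1.4 year bounds are the AI Impacts (2018) estimate quoted above.
    for t in (1.1, 1.4):
        print(f"10x every {t} years -> x{annual_growth_factor(t):.1f} per year, "
              f"doubling every {doubling_time_months(t):.1f} months")
```

The two bounds work out to roughly a 5x to 8x increase per year, or a doubling every 4 to 5 months.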
Hence, we have the following rough picture:
More spending on compute + cheaper compute driven by Moore's law -> larger models trained on larger data sets -> better models
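If compute purchased is treated as spending multiplied by price-performance (FLOPs per dollar), their annual growth factors multiply, and one can ask what share of compute growth comes from each side. A minimal sketch of that accounting, assuming this simple multiplicative model; the function names and input growth rates are hypothetical placeholders, not estimates from the cited sources:

```python
import math


def compute_growth_factor(spend_growth: float, flops_per_dollar_growth: float) -> float:
    """Annual growth in compute purchased, assuming compute = spend * (FLOPs per dollar)."""
    return spend_growth * flops_per_dollar_growth


def share_from_spending(spend_growth: float, flops_per_dollar_growth: float) -> float:
    """Fraction of the log-growth in compute attributable to increased spending."""
    total = math.log(compute_growth_factor(spend_growth, flops_per_dollar_growth))
    if total == 0:
        raise ValueError("compute is not growing; share is undefined")
    return math.log(spend_growth) / total


if __name__ == "__main__":
    # Hypothetical inputs for illustration only: spending triples each year,
    # hardware price-performance improves 1.4x per year.
    spend, hw = 3.0, 1.4
    print(f"compute grows x{compute_growth_factor(spend, hw):.1f} per year")
    print(f"{share_from_spending(spend, hw):.0%} of that growth (in log terms) comes from spending")
```

Working in log terms makes the two contributions additive, so the split does not depend on the order in which the factors are applied.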
On the supply side, Moore's law sets the pace for improvements in the price-performance of hardware. The processes driving Moore's law were in place long before the recent boom in AI and are likely to be mostly independent of AI labs' demand for compute. We can therefore consider price-performance trends largely in isolation from demand-side considerations.
On the demand side, it's unclear how much larger models and larger data sets will contribute to AI progress. Empirically, it has been observed that for SOTA models in translation, language modelling, image classification and speech recognition, there are diminishing marginal returns to data set size in terms of reducing generalization error (Hestness et al., 2017).
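Hestness et al. (2017) describe these learning curves as power laws: generalization error falls roughly as ε(m) = α·m^β for training set size m, with β < 0. A sketch of why that form implies diminishing marginal returns, using hypothetical α and β rather than any value fitted in the paper:

```python
def power_law_error(m: float, alpha: float, beta: float) -> float:
    """Generalization error epsilon(m) = alpha * m**beta for training set size m (beta < 0)."""
    if m <= 0:
        raise ValueError("training set size must be positive")
    return alpha * m ** beta


def error_reduction_per_doubling(alpha: float, beta: float, m: float) -> float:
    """Absolute drop in error when the data set grows from m to 2m examples."""
    return power_law_error(m, alpha, beta) - power_law_error(2 * m, alpha, beta)


if __name__ == "__main__":
    # Hypothetical constants, for illustration only (not values reported by Hestness et al.).
    alpha, beta = 5.0, -0.3
    for m in (10**4, 10**5, 10**6, 10**7):
        err = power_law_error(m, alpha, beta)
        gain = error_reduction_per_doubling(alpha, beta, m)
        print(f"m={m:>10,}  error={err:.4f}  gain from doubling={gain:.4f}")
```

Each doubling multiplies the error by the same factor 2^β, so the absolute gain from each further doubling shrinks, while the data (and compute) needed for that doubling keeps growing.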
Supply side
- By how much will hardware manufacturers be able to increase compute performance in the future?
- e.g. what will be the mean year-over-year growth rate of the combined teraflops of all 500 supercomputers on the TOP500 list over the three-year period ending in 2027?
- e.g. what will be the average year-over-year decline in price per teraflops for Nvidia GPUs from 2024 to 2027?
- If performance improvements level off (e.g. price-performance improves by only single-digit percentages year-over-year), when will this happen?
- Do application-specific integrated circuits (ASICs) for deep learning (such as TPUs or IPUs) have more potential than GPUs for generating training speedups?
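To make supply-side questions like these resolvable, the arithmetic has to be pinned down. In particular, "the mean of the year-over-year growth rate" (an arithmetic mean of annual rates) is not the same as the compound annual growth rate (CAGR), and the former is always at least as large. A minimal sketch with hypothetical function names and made-up illustrative numbers, not real TOP500 or Nvidia data; the 10% cut-off for "single digits" is likewise an assumption:

```python
from __future__ import annotations

from statistics import mean


def yoy_rates(series: list[float]) -> list[float]:
    """Year-over-year fractional changes: x[t] / x[t-1] - 1."""
    if len(series) < 2:
        raise ValueError("need at least two annual observations")
    return [cur / prev - 1.0 for prev, cur in zip(series, series[1:])]


def mean_yoy_growth(series: list[float]) -> float:
    """Arithmetic mean of the year-over-year growth rates."""
    return mean(yoy_rates(series))


def cagr(series: list[float]) -> float:
    """Compound annual growth rate between the first and last observation."""
    years = len(series) - 1
    return (series[-1] / series[0]) ** (1.0 / years) - 1.0


def avg_yoy_decline(prices: list[float]) -> float:
    """Average year-over-year decline, e.g. in price per teraflops (positive when prices fall)."""
    return -mean_yoy_growth(prices)


def first_year_below(years: list[int], series: list[float], threshold: float) -> int | None:
    """First year whose improvement over the previous year is below `threshold`, else None."""
    for year, rate in zip(years[1:], yoy_rates(series)):
        if rate < threshold:
            return year
    return None


if __name__ == "__main__":
    # Hypothetical numbers for illustration only; 2024-2027 gives three YoY rates.
    years = [2024, 2025, 2026, 2027]
    top500_total_tflops = [1.0e7, 1.6e7, 2.4e7, 3.4e7]
    price_per_tflops = [100.0, 75.0, 60.0, 55.0]
    print(f"mean YoY growth: {mean_yoy_growth(top500_total_tflops):.1%}")
    print(f"CAGR:            {cagr(top500_total_tflops):.1%}")
    print(f"avg YoY price decline: {avg_yoy_decline(price_per_tflops):.1%}")
    # Price-performance is the reciprocal of price per teraflops.
    flops_per_dollar = [1.0 / p for p in price_per_tflops]
    print("first year below 10% improvement:", first_year_below(years, flops_per_dollar, 0.10))
```

Because the two averages diverge when growth is uneven, a question's resolution criteria would ideally state which one is meant.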
Demand side
- To what extent will larger models, and models trained on more data, result in substantial economic advantages for particular applications?
- How much will AI labs spend on training a translation model, language model, image classifier or speech recognition model?
- e.g. what will be the cost of the hardware used in the most expensive AI experiment run by the end of 2024?
- How strongly will the returns to increasing training data set size diminish for different applications in the future?
- Will innovations improve the data efficiency of deep learning methods?
- How strongly will the returns to model size diminish for different applications in the future?
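One way to make "how strongly will the returns diminish" measurable is to fit a power-law exponent to observed (size, error) pairs, for either data set size or model size, and track whether that exponent changes over time. A minimal sketch using ordinary least squares in log-log space; the function name and synthetic data are hypothetical, and a power law is an assumed functional form that may not hold for every application:

```python
from __future__ import annotations

import math


def fit_power_law(sizes: list[float], errors: list[float]) -> tuple[float, float]:
    """Least-squares fit of log(error) = log(alpha) + beta * log(size); returns (alpha, beta)."""
    if len(sizes) != len(errors) or len(sizes) < 2:
        raise ValueError("need at least two (size, error) pairs of equal length")
    xs = [math.log(s) for s in sizes]
    ys = [math.log(e) for e in errors]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("sizes must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta = sxy / sxx  # slope in log-log space
    alpha = math.exp(y_bar - beta * x_bar)  # intercept mapped back from log space
    return alpha, beta


if __name__ == "__main__":
    # Synthetic, noise-free points generated from alpha=5, beta=-0.3 (hypothetical values),
    # so the fit should recover them; real learning-curve data would be noisy.
    sizes = [1e4, 1e5, 1e6, 1e7]
    errors = [5.0 * s ** -0.3 for s in sizes]
    alpha, beta = fit_power_law(sizes, errors)
    print(f"alpha={alpha:.3f}, beta={beta:.3f}")
```

A β closer to zero means more strongly diminishing returns: more data or parameters are needed for the same proportional reduction in error.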