This paper basically says: LLMs require exponentially more data to get linear improvements in zero-shot performance. The marginal gain in performance from adding more data diminishes as the training set grows.

My interpretation: unless the training algorithm improves, and/or we can generate exponentially more (labelled) data (plus exponentially more energy, CO2 emissions, and $), we will not see linear increases in the performance of these LLMs. Performance will plateau, if it has not already.
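To make the scaling claim concrete, here is a toy sketch. The log-linear form and the specific numbers are my own illustrative assumptions, not the paper's fitted curve; the point is just that each fixed gain in zero-shot accuracy costs roughly an order of magnitude more data.

```python
import math

# Illustrative assumption (not the paper's exact fit): zero-shot accuracy
# grows roughly linearly in log10 of the number of relevant training examples.
def toy_accuracy(n_examples, base=0.10, gain_per_decade=0.05):
    """Toy log-linear curve: +5 points of accuracy per 10x more data."""
    return base + gain_per_decade * math.log10(n_examples)

for n in [1e6, 1e7, 1e8, 1e9]:
    print(f"{n:>13,.0f} examples -> ~{toy_accuracy(n):.2f} accuracy")
# Each fixed (linear) accuracy gain requires 10x (exponentially) more data.
```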

arxiv.org/pdf/2404.04125