Scaling Laws

Tag · Last edit: Jun 18, 2023, 11:35 PM by riley

Scaling laws refer to the observed trend that the scaling behavior of deep neural networks follows variants of power laws. Here, scaling behavior means how the evaluation metric of interest changes as one varies the amount of compute used for training or inference, the number of model parameters, the training dataset size, the model input size, or the number of training steps.
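As a rough illustration of the simplest case, a single power law y ≈ a · x^(−b) is a straight line in log-log space, so its parameters can be estimated with an ordinary linear fit. The sketch below assumes NumPy and uses synthetic, noise-free numbers; the function name `fit_power_law` and the values of `a` and `b` are hypothetical and not taken from any paper. It does not cover the broken or otherwise modified variants mentioned above.

```python
import numpy as np


def fit_power_law(x, y):
    """Fit y ≈ a * x**(-b) by least squares in log-log space; return (a, b)."""
    # log y = log a - b * log x, so a degree-1 fit gives slope = -b, intercept = log a.
    slope, intercept = np.polyfit(np.log(x), np.log(y), deg=1)
    return float(np.exp(intercept)), float(-slope)


# Synthetic data for illustration only (not measurements from any model).
compute = np.array([1e18, 1e19, 1e20, 1e21, 1e22])  # hypothetical training FLOP
true_a, true_b = 50.0, 0.05                          # hypothetical parameters
loss = true_a * compute ** (-true_b)

a, b = fit_power_law(compute, loss)

# Extrapolate one order of magnitude beyond the fitted range.
predicted = a * 1e23 ** (-b)
print(f"a = {a:.3f}, b = {b:.4f}, predicted loss at 1e23 = {predicted:.3f}")
```

On real measurements the fit would be noisy, and extrapolating far outside the fitted range is exactly where a single power law may stop holding.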

External links

“Broken Neural Scaling Laws” paper

Scaling laws graph from Scaling Laws for Neural Language Models

“Can AI Scaling Continue Through 2030?”, Epoch AI (yes)

gwern

Aug 24, 2024, 1:40 AM

130 points

4 comments · 3 min read · LW link

(epochai.org)

chinchilla’s wild implications

nostalgebraist

Jul 31, 2022, 1:18 AM

424 points

128 comments · 10 min read · LW link · 1 review

Ethan Caballero on Broken Neural Scaling Laws, Deception, and Recursive Self Improvement

Michaël Trazzi and Ethan Caballero

Nov 4, 2022, 6:09 PM

16 points

11 comments · 10 min read · LW link

(theinsideview.ai)

/r/MLScaling: new subreddit for NN scaling research/discussion

gwern

Oct 30, 2020, 8:50 PM

21 points

0 comments · 1 min read · LW link

(www.reddit.com)

What will GPT-2030 look like?

jsteinhardt

Jun 7, 2023, 11:40 PM

185 points

43 comments · 23 min read · LW link

(bounded-regret.ghost.io)

My ML Scaling bibliography

gwern

Oct 23, 2021, 2:41 PM

35 points

9 comments · 1 min read · LW link

(www.gwern.net)

Google’s new text-to-image model—Parti, a demonstration of scaling benefits

Kayden

Jun 22, 2022, 8:00 PM

32 points

4 comments · 1 min read · LW link

Thoughts on the Alignment Implications of Scaling Language Models

leogao

Jun 2, 2021, 9:32 PM

82 points

11 comments · 17 min read · LW link

Inverse Scaling Prize: Second Round Winners

Ian McKenzie, Sam Bowman and Ethan Perez

Jan 24, 2023, 8:12 PM

58 points

17 comments · 15 min read · LW link

On AI Scaling

harsimony

Feb 5, 2025, 8:24 PM

6 points

3 comments · 8 min read · LW link

(splittinginfinity.substack.com)

Musings on LLM Scale (Jul 2024)

Vladimir_Nesov

Jul 3, 2024, 6:35 PM

34 points

0 comments · 3 min read · LW link

Densing Law of LLMs

Bogdan Ionut Cirstea

Dec 8, 2024, 7:35 PM

9 points

2 comments · 1 min read · LW link

(arxiv.org)

[Question] Nonlinear limitations of ReLUs

magfrump

Oct 26, 2023, 6:51 PM

13 points

1 comment · 1 min read · LW link

Musings on Text Data Wall (Oct 2024)

Vladimir_Nesov

Oct 5, 2024, 7:00 PM

40 points

2 comments · 5 min read · LW link

NVIDIA and Microsoft releases 530B parameter transformer model, Megatron-Turing NLG

Ozyrus

Oct 11, 2021, 3:28 PM

51 points

36 comments · 1 min read · LW link

(developer.nvidia.com)

[Link] Training Compute-Optimal Large Language Models

nostalgebraist

Mar 31, 2022, 6:01 PM

51 points

23 comments · 1 min read · LW link

(arxiv.org)

An Optimistic 2027 Timeline

Yitz

Apr 6, 2025, 4:39 PM

13 points

16 comments · 9 min read · LW link

A closer look at chess scalings (into the past)

hippke

Jul 15, 2021, 8:13 AM

50 points

14 comments · 4 min read · LW link

[Question] Is there a “critical threshold” for LLM scaling laws?

Logan Zoellner

Mar 30, 2024, 12:23 PM

7 points

1 comment · 1 min read · LW link

Ethan Caballero on Private Scaling Progress

Michaël Trazzi

May 5, 2022, 6:32 PM

63 points

2 comments · 2 min read · LW link

(theinsideview.github.io)

[Linkpost] Scaling Laws for Generative Mixed-Modal Language Models

Amal

Jan 12, 2023, 2:24 PM

15 points

2 comments · 1 min read · LW link

(arxiv.org)

Dmitry’s Koan

Dmitry Vaintrob

Jan 10, 2025, 4:27 AM

44 points

8 comments · 22 min read · LW link

Inverse scaling can become U-shaped

Edouard Harris

Nov 8, 2022, 7:04 PM

27 points

15 comments · 1 min read · LW link

(arxiv.org)

The effect of horizon length on scaling laws

Jacob_Hilton

Feb 1, 2023, 3:59 AM

23 points

2 comments · 1 min read · LW link

(arxiv.org)

o1: A Technical Primer

Jesse Hoogland

Dec 9, 2024, 7:09 PM

170 points

19 comments · 9 min read · LW link

(www.youtube.com)

Paper: On measuring situational awareness in LLMs

Owain_Evans, Daniel Kokotajlo, Mikita Balesni, Tomek Korbak, Asa Cooper Stickland, Meg and Maximilian Kaufmann

Sep 4, 2023, 12:54 PM

109 points

16 comments · 5 min read · LW link

(arxiv.org)

Smoke without fire is scary

Adam Jermyn

Oct 4, 2022, 9:08 PM

52 points

22 comments · 4 min read · LW link

Scaling Laws for Reward Model Overoptimization

leogao, John Schulman and Jacob_Hilton

Oct 20, 2022, 12:20 AM

103 points

13 comments · 1 min read · LW link

(arxiv.org)

Transformative AI and Compute [Summary]

lennart

Sep 26, 2021, 11:41 AM

14 points

0 comments · 9 min read · LW link

Scaling Laws and Superposition

Pavan Katta

Apr 10, 2024, 3:36 PM

9 points

4 comments · 5 min read · LW link

(www.pavankatta.com)

Why Recursive Self-Improvement Might Not Be the Existential Risk We Fear

Nassim_A

Nov 24, 2024, 5:17 PM

1 point

0 comments · 9 min read · LW link

Machine Learning Model Sizes and the Parameter Gap [abridged]

Pablo Villalobos

Jul 18, 2022, 4:51 PM

20 points

0 comments · 1 min read · LW link

(epochai.org)

Transfer learning and generalization-qua-capability in Babbage and Davinci (or, why division is better than Spanish)

RP and agg

Feb 9, 2024, 7:00 AM

50 points

6 comments · 3 min read · LW link

Announcing the Inverse Scaling Prize ($250k Prize Pool)

Ethan Perez, Ian McKenzie and Sam Bowman

Jun 27, 2022, 3:58 PM

171 points

14 comments · 7 min read · LW link

Parameter counts in Machine Learning

Jsevillamol and Pablo Villalobos

Jun 19, 2021, 4:04 PM

47 points

18 comments · 7 min read · LW link

Massive Scaling Should be Frowned Upon

harsimony

Nov 17, 2022, 8:43 AM

5 points

6 comments · 5 min read · LW link

Why Job Displacement Predictions are Wrong: Explanations of Cognitive Automation

Moritz Wallawitsch

May 30, 2023, 8:43 PM

−4 points

0 comments · 8 min read · LW link

Scaling Laws Literature Review

Pablo Villalobos

Jan 27, 2023, 7:57 PM

36 points

1 comment · 4 min read · LW link

(epochai.org)

Log-linear Scaling is Worth the Cost due to Gains in Long-Horizon Tasks

shash42

Apr 7, 2025, 9:50 PM

15 points

2 comments · 1 min read · LW link

prÆy

oimrqs

Jan 11, 2025, 7:42 PM

1 point

0 comments · 1 min read · LW link

The Perceptron Controversy

Yuxi_Liu

Jan 10, 2024, 11:07 PM

65 points

18 comments · 1 min read · LW link

(yuxi-liu-wired.github.io)

Parameter Scaling Comes for RL, Maybe

1a3orn

Jan 24, 2023, 1:55 PM

100 points

3 comments · 14 min read · LW link

The Quantization Model of Neural Scaling

nz

Mar 31, 2023, 4:02 PM

17 points

0 comments · 1 min read · LW link

(arxiv.org)

[Linkpost] Applicability of scaling laws to vision encoding models

Bogdan Ionut Cirstea

Aug 5, 2023, 11:10 AM

11 points

2 comments · 1 min read · LW link

How should DeepMind’s Chinchilla revise our AI forecasts?

Cleo Nardo

Sep 15, 2022, 5:54 PM

35 points

12 comments · 13 min read · LW link

Predicting AGI by the Turing Test

Yuxi_Liu

Jan 22, 2024, 4:22 AM

21 points

2 comments · 10 min read · LW link

(yuxi-liu-wired.github.io)

Trends in GPU price-performance

Marius Hobbhahn and Tamay

Jul 1, 2022, 3:51 PM

85 points

13 comments · 1 min read · LW link · 1 review

(epochai.org)

A Quick Note on AI Scaling Asymptotes

alyssavance

May 25, 2022, 2:55 AM

44 points

7 comments · 1 min read · LW link

Estimating training compute of Deep Learning models

lennart, Jsevillamol, Marius Hobbhahn, Tamay Besiroglu and anson.ho

Jan 20, 2022, 4:12 PM

37 points

4 comments · 1 min read · LW link

Compute Trends Across Three eras of Machine Learning

Jsevillamol, Pablo Villalobos, lennart, Marius Hobbhahn, Tamay Besiroglu and anson.ho

Feb 16, 2022, 2:18 PM

94 points

13 comments · 2 min read · LW link

[linkpost] The final AI benchmark: BIG-bench

RomanS

Jun 10, 2022, 8:53 AM

25 points

21 comments · 1 min read · LW link

Compute Governance and Conclusions—Transformative AI and Compute [3/4]

lennart

Oct 14, 2021, 8:23 AM

13 points

0 comments · 5 min read · LW link

Causal confusion as an argument against the scaling hypothesis

RobertKirk and David Scott Krueger (formerly: capybaralet)

Jun 20, 2022, 10:54 AM

86 points

30 comments · 15 min read · LW link

Implications of the inference scaling paradigm for AI safety

Ryan Kidd

Jan 14, 2025, 2:14 AM

93 points

70 comments · 5 min read · LW link

What’s new at FAR AI

AdamGleave and EuanMcLean

Dec 4, 2023, 9:18 PM

41 points

0 comments · 5 min read · LW link

(far.ai)

[Question] Clarifying how misalignment can arise from scaling LLMs

Util

Aug 19, 2023, 2:16 PM

3 points

1 comment · 1 min read · LW link

[Question] Updates on scaling laws for foundation models from ‘Transcending Scaling Laws with 0.1% Extra Compute’

Nick_Greig

Nov 18, 2022, 12:46 PM

15 points

2 comments · 1 min read · LW link

Some Arguments Against Strong Scaling

Joar Skalse

Jan 13, 2023, 12:04 PM

25 points

21 comments · 16 min read · LW link

Tall Tales at Different Scales: Evaluating Scaling Trends For Deception In Language Models

Felix Hofstätter, Francis Rhys Ward, HarrietW, LAThomson, Ollie J, Patrik Bartak and Sam F. Brown

Nov 8, 2023, 11:37 AM

49 points

0 comments · 18 min read · LW link

Proposal: Scaling laws for RL generalization

axioman

Oct 1, 2021, 9:32 PM

14 points

12 comments · 11 min read · LW link

How I’m thinking about GPT-N

delton137

Jan 17, 2022, 5:11 PM

54 points

21 comments · 18 min read · LW link

Analyzing DeepMind’s Probabilistic Methods for Evaluating Agent Capabilities

Axel Højmark, Govind Pimpale, Arjun Panickssery, Marius Hobbhahn and Jérémy Scheurer

Jul 22, 2024, 4:17 PM

69 points

0 comments · 16 min read · LW link

Intelligence Is Jagged

Adam Train

Feb 19, 2025, 7:08 AM

6 points

1 comment · 3 min read · LW link

How much chess engine progress is about adapting to bigger computers?

paulfchristiano

Jul 7, 2021, 10:35 PM

114 points

23 comments · 6 min read · LW link

Skepticism About DeepMind’s “Grandmaster-Level” Chess Without Search

Arjun Panickssery

Feb 12, 2024, 12:56 AM

57 points

13 comments · 3 min read · LW link

Compute Research Questions and Metrics—Transformative AI and Compute [4/4]

lennart

Nov 28, 2021, 10:49 PM

7 points

0 comments · 16 min read · LW link

Forecasting Compute—Transformative AI and Compute [2/4]

lennart

Oct 2, 2021, 3:54 PM

17 points

0 comments · 19 min read · LW link

Neural Scaling Laws Rooted in the Data Distribution

aribrill

Feb 20, 2025, 9:22 PM

7 points

0 comments · 1 min read · LW link

(arxiv.org)

How LLMs Learn: What We Know, What We Don’t (Yet) Know, and What Comes Next

Jonasb

Jul 9, 2024, 9:58 AM

2 points

0 comments · 16 min read · LW link

(www.denominations.io)

Speculative inferences about path dependence in LLM supervised fine-tuning from results on linear mode connectivity and model souping

RobertKirk

Jul 20, 2023, 9:56 AM

39 points

2 comments · 5 min read · LW link

Whisper’s Wild Implications

Ollie J

Jan 3, 2023, 12:17 PM

19 points

6 comments · 5 min read · LW link

How to measure FLOP/s for Neural Networks empirically?

Marius Hobbhahn

Nov 29, 2021, 3:18 PM

16 points

5 comments · 7 min read · LW link

Compute Trends — Comparison to OpenAI’s AI and Compute

lennart, Jsevillamol, Pablo Villalobos, Marius Hobbhahn, Tamay Besiroglu and anson.ho

Mar 12, 2022, 6:09 PM

23 points

3 comments · 3 min read · LW link

Data and “tokens” a 30 year old human “trains” on

Jose Miguel Cruz y Celis

May 23, 2023, 5:34 AM

14 points

15 comments · 1 min read · LW link

What is Compute? - Transformative AI and Compute [1/4]

lennart

Sep 23, 2021, 4:25 PM

27 points

9 comments · 19 min read · LW link

plex Sep 24, 2021, 3:07 PM
3 points
Is it not possible to use images in tags? Or am I just using the wrong syntax?
- plex Oct 22, 2021, 7:44 PM
  1 point
  It is possible, you just paste the image apparently, thanks Yoav Ravid for the tip.
