Scaling Laws

Tag · Last edit: 18 Jun 2023 23:35 UTC by riley

Scaling Laws refer to the observed trend that the scaling behavior of deep neural networks follows variants of power laws. "Scaling behavior" here means how the evaluation metric of interest varies as one varies the amount of compute used for training (or inference), the number of model parameters, the training dataset size, the model input size, or the number of training steps.
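A minimal sketch of the simplest variant, assuming a pure power law L(x) = a · x^(-b), where x is one of the scaled quantities (for example, parameter count) and L is the evaluation metric. Taking logarithms gives log L = log a - b · log x, which is a straight line, so a and b can be estimated with ordinary least squares in log-log space. The function names fit_power_law and predict, and the data points, are hypothetical and chosen for illustration only; they are not measurements from any of the posts listed here. Many published fits use richer forms (for example, an added constant for irreducible loss, or the broken power laws in the "Broken Neural Scaling Laws" paper), which require nonlinear fitting instead.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**(-b) by ordinary least squares on log y = log a - b * log x.

    Returns (a, b). Assumes every x and y is strictly positive.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    slope = sxy / sxx                     # slope of the log-log line, i.e. -b
    intercept = mean_y - slope * mean_x   # intercept of the log-log line, i.e. log a
    return math.exp(intercept), -slope


def predict(a, b, x):
    """Evaluate the fitted power law a * x**(-b) at x."""
    return a * x ** (-b)


if __name__ == "__main__":
    # Hypothetical, noise-free data generated from loss = 50 * params**(-0.1).
    # Illustrative numbers only, not results from any paper.
    params = [1e6, 1e7, 1e8, 1e9]
    losses = [50 * p ** (-0.1) for p in params]
    a, b = fit_power_law(params, losses)
    print(f"a = {a:.3f}, b = {b:.3f}")
    print(f"extrapolated loss at 1e10 params: {predict(a, b, 1e10):.3f}")
```

Extrapolating a fit beyond the measured range, as the last line does, assumes the trend continues unchanged; that is the kind of assumption several of the posts listed below question.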

External links

“Broken Neural Scaling Laws” paper

Scaling laws graph from Scaling Laws for Neural Language Models

/r/MLScaling: new subreddit for NN scaling research/discussion

gwern · 30 Oct 2020 20:50 UTC

21 points

0 comments · 1 min read

(www.reddit.com)

Thoughts on the Alignment Implications of Scaling Language Models

leogao · 2 Jun 2021 21:32 UTC

82 points

11 comments · 17 min read

Parameter counts in Machine Learning

Jsevillamol and Pablo Villalobos

19 Jun 2021 16:04 UTC

47 points

16 comments · 7 min read

How much chess engine progress is about adapting to bigger computers?

paulfchristiano · 7 Jul 2021 22:35 UTC

114 points

23 comments · 6 min read

A closer look at chess scalings (into the past)

hippke · 15 Jul 2021 8:13 UTC

49 points

14 comments · 4 min read

What is Compute? - Transformative AI and Compute [1/4]

lennart · 23 Sep 2021 16:25 UTC

27 points

9 comments · 19 min read

Transformative AI and Compute [Summary]

lennart · 26 Sep 2021 11:41 UTC

13 points

0 comments · 9 min read

Proposal: Scaling laws for RL generalization

axioman · 1 Oct 2021 21:32 UTC

14 points

12 comments · 11 min read

Forecasting Compute—Transformative AI and Compute [2/4]

lennart · 2 Oct 2021 15:54 UTC

17 points

0 comments · 19 min read

NVIDIA and Microsoft releases 530B parameter transformer model, Megatron-Turing NLG

Ozyrus · 11 Oct 2021 15:28 UTC

51 points

36 comments · 1 min read

(developer.nvidia.com)

Compute Governance and Conclusions—Transformative AI and Compute [3/4]

lennart · 14 Oct 2021 8:23 UTC

13 points

0 comments · 5 min read

My ML Scaling bibliography

gwern · 23 Oct 2021 14:41 UTC

35 points

9 comments · 1 min read

(www.gwern.net)

Compute Research Questions and Metrics—Transformative AI and Compute [4/4]

lennart · 28 Nov 2021 22:49 UTC

7 points

0 comments · 16 min read

How to measure FLOP/s for Neural Networks empirically?

Marius Hobbhahn · 29 Nov 2021 15:18 UTC

16 points

5 comments · 7 min read

How I’m thinking about GPT-N

delton137 · 17 Jan 2022 17:11 UTC

54 points

21 comments · 18 min read

Estimating training compute of Deep Learning models

lennart, Jsevillamol, Marius Hobbhahn, Tamay Besiroglu and anson.ho

20 Jan 2022 16:12 UTC

37 points

4 comments · 1 min read

Compute Trends Across Three eras of Machine Learning

Jsevillamol, Pablo Villalobos, lennart, Marius Hobbhahn, Tamay Besiroglu and anson.ho

16 Feb 2022 14:18 UTC

94 points

13 comments · 2 min read

Compute Trends — Comparison to OpenAI’s AI and Compute

lennart, Jsevillamol, Pablo Villalobos, Marius Hobbhahn, Tamay Besiroglu and anson.ho

12 Mar 2022 18:09 UTC

23 points

3 comments · 3 min read

[Link] Training Compute-Optimal Large Language Models

nostalgebraist · 31 Mar 2022 18:01 UTC

51 points

23 comments · 1 min read

(arxiv.org)

Ethan Caballero on Private Scaling Progress

Michaël Trazzi · 5 May 2022 18:32 UTC

63 points

2 comments · 2 min read

(theinsideview.github.io)

A Quick Note on AI Scaling Asymptotes

alyssavance · 25 May 2022 2:55 UTC

44 points

7 comments · 1 min read

[linkpost] The final AI benchmark: BIG-bench

RomanS · 10 Jun 2022 8:53 UTC

25 points

21 comments · 1 min read

Causal confusion as an argument against the scaling hypothesis

RobertKirk and David Scott Krueger (formerly: capybaralet)

20 Jun 2022 10:54 UTC

85 points

30 comments · 18 min read

Google’s new text-to-image model—Parti, a demonstration of scaling benefits

Kayden · 22 Jun 2022 20:00 UTC

32 points

4 comments · 1 min read

Announcing the Inverse Scaling Prize ($250k Prize Pool)

Ethan Perez, Ian McKenzie and Sam Bowman

27 Jun 2022 15:58 UTC

169 points

14 comments · 7 min read

Trends in GPU price-performance

Marius Hobbhahn and Tamay

1 Jul 2022 15:51 UTC

85 points

12 comments · 1 min read · 1 review

(epochai.org)

Machine Learning Model Sizes and the Parameter Gap [abridged]

Pablo Villalobos · 18 Jul 2022 16:51 UTC

20 points

0 comments · 1 min read

(epochai.org)

chinchilla’s wild implications

nostalgebraist · 31 Jul 2022 1:18 UTC

410 points

128 comments · 10 min read · 1 review

How should DeepMind’s Chinchilla revise our AI forecasts?

Cleo Nardo · 15 Sep 2022 17:54 UTC

35 points

12 comments · 13 min read

Smoke without fire is scary

Adam Jermyn · 4 Oct 2022 21:08 UTC

51 points

22 comments · 4 min read

Scaling Laws for Reward Model Overoptimization

leogao, John Schulman and Jacob_Hilton

20 Oct 2022 0:20 UTC

102 points

13 comments · 1 min read

(arxiv.org)

Ethan Caballero on Broken Neural Scaling Laws, Deception, and Recursive Self Improvement

Michaël Trazzi and Ethan Caballero

4 Nov 2022 18:09 UTC

13 points

11 comments · 10 min read

(theinsideview.ai)

Inverse scaling can become U-shaped

Edouard Harris · 8 Nov 2022 19:04 UTC

27 points

15 comments · 1 min read

(arxiv.org)

Massive Scaling Should be Frowned Upon

harsimony · 17 Nov 2022 8:43 UTC

4 points

6 comments · 5 min read

[Question] Updates on scaling laws for foundation models from ‘Transcending Scaling Laws with 0.1% Extra Compute’

Nick_Greig · 18 Nov 2022 12:46 UTC

15 points

2 comments · 1 min read

Whisper’s Wild Implications

Ollie J · 3 Jan 2023 12:17 UTC

19 points

6 comments · 5 min read

[Linkpost] Scaling Laws for Generative Mixed-Modal Language Models

Amal · 12 Jan 2023 14:24 UTC

15 points

2 comments · 1 min read

(arxiv.org)

Some Arguments Against Strong Scaling

Joar Skalse · 13 Jan 2023 12:04 UTC

25 points

21 comments · 16 min read

Parameter Scaling Comes for RL, Maybe

1a3orn · 24 Jan 2023 13:55 UTC

98 points

3 comments · 14 min read

Inverse Scaling Prize: Second Round Winners

Ian McKenzie, Sam Bowman and Ethan Perez

24 Jan 2023 20:12 UTC

58 points

17 comments · 15 min read

Scaling Laws Literature Review

Pablo Villalobos · 27 Jan 2023 19:57 UTC

36 points

1 comment · 4 min read

(epochai.org)

The effect of horizon length on scaling laws

Jacob_Hilton · 1 Feb 2023 3:59 UTC

23 points

2 comments · 1 min read

(arxiv.org)

The Quantization Model of Neural Scaling

nz · 31 Mar 2023 16:02 UTC

17 points

0 comments · 1 min read

(arxiv.org)

Data and “tokens” a 30 year old human “trains” on

Jose Miguel Cruz y Celis · 23 May 2023 5:34 UTC

15 points

15 comments · 1 min read

Why Job Displacement Predictions are Wrong: Explanations of Cognitive Automation

Moritz Wallawitsch · 30 May 2023 20:43 UTC

−4 points

0 comments · 8 min read

What will GPT-2030 look like?

jsteinhardt · 7 Jun 2023 23:40 UTC

182 points

42 comments · 23 min read

(bounded-regret.ghost.io)

Speculative inferences about path dependence in LLM supervised fine-tuning from results on linear mode connectivity and model souping

RobertKirk · 20 Jul 2023 9:56 UTC

38 points

2 comments · 5 min read

[Linkpost] Applicability of scaling laws to vision encoding models

Bogdan Ionut Cirstea · 5 Aug 2023 11:10 UTC

11 points

2 comments · 1 min read

[Question] Clarifying how misalignment can arise from scaling LLMs

Util · 19 Aug 2023 14:16 UTC

3 points

1 comment · 1 min read

Paper: On measuring situational awareness in LLMs

Owain_Evans, Daniel Kokotajlo, Mikita Balesni, Tomek Korbak, lberglund, Asa Cooper Stickland, Meg and Maximilian Kaufmann

4 Sep 2023 12:54 UTC

106 points

16 comments · 5 min read

(arxiv.org)

[Question] Nonlinear limitations of ReLUs

magfrump · 26 Oct 2023 18:51 UTC

13 points

1 comment · 1 min read

Tall Tales at Different Scales: Evaluating Scaling Trends For Deception In Language Models

Felix Hofstätter, Francis Rhys Ward, HarrietW, LAThomson, Ollie J, Patrik Bartak and Sam F. Brown

8 Nov 2023 11:37 UTC

49 points

0 comments · 18 min read

What’s new at FAR AI

AdamGleave and EuanMcLean

4 Dec 2023 21:18 UTC

40 points

0 comments · 5 min read

(far.ai)

The Perceptron Controversy

Yuxi_Liu · 10 Jan 2024 23:07 UTC

64 points

18 comments · 1 min read

(yuxi-liu-wired.github.io)

Predicting AGI by the Turing Test

Yuxi_Liu · 22 Jan 2024 4:22 UTC

21 points

2 comments · 10 min read

(yuxi-liu-wired.github.io)

Transfer learning and generalization-qua-capability in Babbage and Davinci (or, why division is better than Spanish)

RP and agg

9 Feb 2024 7:00 UTC

50 points

6 comments · 3 min read

Skepticism About DeepMind’s “Grandmaster-Level” Chess Without Search

Arjun Panickssery · 12 Feb 2024 0:56 UTC

53 points

13 comments · 3 min read

[Question] Is there a “critical threshold” for LLM scaling laws?

Logan Zoellner · 30 Mar 2024 12:23 UTC

7 points

1 comment · 1 min read

Scaling Laws and Superposition

Pavan Katta · 10 Apr 2024 15:36 UTC

7 points

4 comments · 5 min read

(www.pavankatta.com)

plex 24 Sep 2021 15:07 UTC
3 points
Is it not possible to use images in tags? Or am I just using the wrong syntax?
- plex 22 Oct 2021 19:44 UTC
  1 point
  It is possible, you just paste the image apparently, thanks Yoav Ravid for the tip.
