Scaling Laws

Last edit: 18 Jun 2023 23:35 UTC by riley

Scaling laws refer to the empirically observed trend that the scaling behavior of deep neural networks (i.e., how the evaluation metric of interest varies as one varies the amount of compute used for training or inference, the number of model parameters, the training dataset size, the model input size, or the number of training steps) follows variants of power laws.
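
As a concrete illustration of what such a power law looks like in practice, the sketch below fits the single-variable form L(C) = a · C^(-α) to made-up loss-versus-compute numbers using SciPy's curve_fit. The data points, the initial guess, and the omission of an irreducible-loss term are all simplifying assumptions for illustration; this is not a result or method from any of the linked posts.

```python
# Minimal sketch: fit a power law L(C) = a * C**(-alpha) to hypothetical
# loss-vs-compute measurements. All numbers are made up for illustration;
# real scaling-law fits use many more training runs and careful evaluation.
import numpy as np
from scipy.optimize import curve_fit

def power_law(compute, a, alpha):
    # Single-term power law; an irreducible-loss offset is omitted for simplicity.
    return a * compute ** (-alpha)

compute = np.array([1e0, 1e1, 1e2, 1e3, 1e4])  # training compute (hypothetical units)
loss = np.array([4.1, 3.4, 2.9, 2.5, 2.2])     # evaluation loss (hypothetical)

(a, alpha), _ = curve_fit(power_law, compute, loss, p0=(4.0, 0.1))
print(f"fitted: L(C) ~ {a:.2f} * C^(-{alpha:.3f})")
```

On log-log axes such a fit appears as a straight line, which is how these trends are usually plotted in the scaling-law literature.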

External links

Scaling laws graph from Scaling Laws for Neural Language Models

“Can AI Scaling Continue Through 2030?”, Epoch AI (yes)

gwern · 24 Aug 2024 1:40 UTC
136 points
4 comments · 3 min read · LW link
(epochai.org)

chinchilla’s wild implications

nostalgebraist · 31 Jul 2022 1:18 UTC
425 points
128 comments · 10 min read · LW link · 1 review

/r/MLScaling: new subreddit for NN scaling research/discussion

gwern · 30 Oct 2020 20:50 UTC
21 points
0 comments · 1 min read · LW link
(www.reddit.com)

What will GPT-2030 look like?

jsteinhardt · 7 Jun 2023 23:40 UTC
185 points
43 comments · 23 min read · LW link
(bounded-regret.ghost.io)

My ML Scaling bibliography

gwern · 23 Oct 2021 14:41 UTC
35 points
9 comments · 1 min read · LW link
(www.gwern.net)

Google’s new text-to-image model—Parti, a demonstration of scaling benefits

Kayden · 22 Jun 2022 20:00 UTC
32 points
4 comments · 1 min read · LW link

Thoughts on the Alignment Implications of Scaling Language Models

leogao · 2 Jun 2021 21:32 UTC
82 points
11 comments · 17 min read · LW link

Inverse Scaling Prize: Second Round Winners

24 Jan 2023 20:12 UTC
58 points
17 comments · 15 min read · LW link

On AI Scaling

harsimony · 5 Feb 2025 20:24 UTC
6 points
3 comments · 8 min read · LW link
(splittinginfinity.substack.com)

Musings on LLM Scale (Jul 2024)

Vladimir_Nesov · 3 Jul 2024 18:35 UTC
34 points
0 comments · 3 min read · LW link

Densing Law of LLMs

Bogdan Ionut Cirstea · 8 Dec 2024 19:35 UTC
9 points
2 comments · 1 min read · LW link
(arxiv.org)

[Question] Nonlinear limitations of ReLUs

magfrump · 26 Oct 2023 18:51 UTC
13 points
1 comment · 1 min read · LW link

Musings on Text Data Wall (Oct 2024)

Vladimir_Nesov · 5 Oct 2024 19:00 UTC
41 points
2 comments · 5 min read · LW link

NVIDIA and Microsoft releases 530B parameter transformer model, Megatron-Turing NLG

Ozyrus · 11 Oct 2021 15:28 UTC
51 points
36 comments · 1 min read · LW link
(developer.nvidia.com)

[Link] Training Compute-Optimal Large Language Models

nostalgebraist · 31 Mar 2022 18:01 UTC
51 points
23 comments · 1 min read · LW link
(arxiv.org)

An “Optimistic” 2027 Timeline

Yitz · 6 Apr 2025 16:39 UTC
13 points
16 comments · 9 min read · LW link

A closer look at chess scalings (into the past)

hippke · 15 Jul 2021 8:13 UTC
50 points
14 comments · 4 min read · LW link

[Question] Is there a “critical threshold” for LLM scaling laws?

Logan Zoellner · 30 Mar 2024 12:23 UTC
7 points
1 comment · 1 min read · LW link

Superhuman Coders in AI 2027 - Not So Fast

1 May 2025 18:56 UTC
67 points
0 comments · 5 min read · LW link

Ethan Caballero on Private Scaling Progress

Michaël Trazzi · 5 May 2022 18:32 UTC
63 points
2 comments · 2 min read · LW link
(theinsideview.github.io)

[Linkpost] Scaling Laws for Generative Mixed-Modal Language Models

Amal · 12 Jan 2023 14:24 UTC
15 points
2 comments · 1 min read · LW link
(arxiv.org)

Dmitry’s Koan

Dmitry Vaintrob · 10 Jan 2025 4:27 UTC
44 points
8 comments · 22 min read · LW link

Inverse scaling can become U-shaped

Edouard Harris · 8 Nov 2022 19:04 UTC
27 points
15 comments · 1 min read · LW link
(arxiv.org)

The effect of horizon length on scaling laws

Jacob_Hilton · 1 Feb 2023 3:59 UTC
23 points
2 comments · 1 min read · LW link
(arxiv.org)

o1: A Technical Primer

Jesse Hoogland · 9 Dec 2024 19:09 UTC
172 points
19 comments · 9 min read · LW link
(www.youtube.com)

Paper: On measuring situational awareness in LLMs

4 Sep 2023 12:54 UTC
109 points
17 comments · 5 min read · LW link
(arxiv.org)

The Spiral of Coherence

22 Nov 2025 21:22 UTC
1 point
0 comments · 47 min read · LW link

Smoke without fire is scary

Adam Jermyn · 4 Oct 2022 21:08 UTC
52 points
22 comments · 4 min read · LW link

Scaling Laws for Reward Model Overoptimization

20 Oct 2022 0:20 UTC
103 points
13 comments · 1 min read · LW link
(arxiv.org)

Transformative AI and Compute [Summary]

lennart · 26 Sep 2021 11:41 UTC
14 points
0 comments · 9 min read · LW link

Scaling Laws and Superposition

Pavan Katta · 10 Apr 2024 15:36 UTC
9 points
4 comments · 5 min read · LW link
(www.pavankatta.com)

Why Recursive Self-Improvement Might Not Be the Existential Risk We Fear

Nassim_A · 24 Nov 2024 17:17 UTC
1 point
0 comments · 9 min read · LW link

Machine Learning Model Sizes and the Parameter Gap [abridged]

Pablo Villalobos · 18 Jul 2022 16:51 UTC
20 points
0 comments · 1 min read · LW link
(epochai.org)

Transfer learning and generalization-qua-capability in Babbage and Davinci (or, why division is better than Spanish)

RP and agg
9 Feb 2024 7:00 UTC
50 points
6 comments · 3 min read · LW link

Announcing the Inverse Scaling Prize ($250k Prize Pool)

27 Jun 2022 15:58 UTC
171 points
14 comments · 7 min read · LW link

Parameter counts in Machine Learning

19 Jun 2021 16:04 UTC
47 points
18 comments · 7 min read · LW link

Massive Scaling Should be Frowned Upon

harsimony · 17 Nov 2022 8:43 UTC
5 points
6 comments · 5 min read · LW link

Why Job Displacement Predictions are Wrong: Explanations of Cognitive Automation

Moritz Wallawitsch · 30 May 2023 20:43 UTC
−5 points
0 comments · 8 min read · LW link

Scaling Laws Literature Review

Pablo Villalobos · 27 Jan 2023 19:57 UTC
36 points
1 comment · 4 min read · LW link
(epochai.org)

Log-linear Scaling is Worth the Cost due to Gains in Long-Horizon Tasks

shash42 · 7 Apr 2025 21:50 UTC
16 points
2 comments · 1 min read · LW link

prÆy

oimrqs · 11 Jan 2025 19:42 UTC
1 point
0 comments · 1 min read · LW link

The Perceptron Controversy

Yuxi_Liu · 10 Jan 2024 23:07 UTC
65 points
18 comments · 1 min read · LW link
(yuxi-liu-wired.github.io)

Parameter Scaling Comes for RL, Maybe

1a3orn · 24 Jan 2023 13:55 UTC
100 points
3 comments · 14 min read · LW link

The Stability of Understanding: What Compression Decay Reveals About LLMs

rb125 · 16 Nov 2025 18:48 UTC
1 point
0 comments · 2 min read · LW link

The Quantization Model of Neural Scaling

nz · 31 Mar 2023 16:02 UTC
17 points
0 comments · 1 min read · LW link
(arxiv.org)

[Linkpost] Applicability of scaling laws to vision encoding models

Bogdan Ionut Cirstea · 5 Aug 2023 11:10 UTC
11 points
2 comments · 1 min read · LW link

How should DeepMind’s Chinchilla revise our AI forecasts?

Cleo Nardo · 15 Sep 2022 17:54 UTC
35 points
12 comments · 13 min read · LW link

Predicting AGI by the Turing Test

Yuxi_Liu · 22 Jan 2024 4:22 UTC
21 points
2 comments · 10 min read · LW link
(yuxi-liu-wired.github.io)

Why Scaling Creates “Out-of-Nowhere” Jumps

Deckard · 14 Aug 2025 20:26 UTC
1 point
0 comments · 1 min read · LW link

Trends in GPU price-performance

1 Jul 2022 15:51 UTC
85 points
13 comments · 1 min read · LW link · 1 review
(epochai.org)

Two Hypotheses to Bridge the Gap Between General Relativity and Quantum Mechanics

MaybeItWorks · 29 May 2025 7:15 UTC
1 point
0 comments · 1 min read · LW link

A Quick Note on AI Scaling Asymptotes

alyssavance · 25 May 2022 2:55 UTC
44 points
7 comments · 1 min read · LW link

Estimating training compute of Deep Learning models

20 Jan 2022 16:12 UTC
37 points
4 comments · 1 min read · LW link

Compute Trends Across Three eras of Machine Learning

16 Feb 2022 14:18 UTC
94 points
13 comments · 2 min read · LW link

[linkpost] The final AI benchmark: BIG-bench

RomanS · 10 Jun 2022 8:53 UTC
25 points
21 comments · 1 min read · LW link

Compute Governance and Conclusions—Transformative AI and Compute [3/4]

lennart · 14 Oct 2021 8:23 UTC
13 points
0 comments · 5 min read · LW link

Causal confusion as an argument against the scaling hypothesis

20 Jun 2022 10:54 UTC
86 points
30 comments · 15 min read · LW link

Implications of the inference scaling paradigm for AI safety

Ryan Kidd · 14 Jan 2025 2:14 UTC
96 points
70 comments · 5 min read · LW link

What’s new at FAR AI

4 Dec 2023 21:18 UTC
41 points
0 comments · 5 min read · LW link
(far.ai)

[Question] Clarifying how misalignment can arise from scaling LLMs

Util · 19 Aug 2023 14:16 UTC
3 points
1 comment · 1 min read · LW link

[Question] Updates on scaling laws for foundation models from ‘Transcending Scaling Laws with 0.1% Extra Compute’

Nick_Greig · 18 Nov 2022 12:46 UTC
15 points
2 comments · 1 min read · LW link

Some Arguments Against Strong Scaling

Joar Skalse · 13 Jan 2023 12:04 UTC
25 points
21 comments · 16 min read · LW link

Tall Tales at Different Scales: Evaluating Scaling Trends For Deception In Language Models

8 Nov 2023 11:37 UTC
49 points
0 comments · 18 min read · LW link

Proposal: Scaling laws for RL generalization

axioman · 1 Oct 2021 21:32 UTC
14 points
12 comments · 11 min read · LW link

How I’m thinking about GPT-N

delton137 · 17 Jan 2022 17:11 UTC
54 points
21 comments · 18 min read · LW link

Analyzing DeepMind’s Probabilistic Methods for Evaluating Agent Capabilities

22 Jul 2024 16:17 UTC
69 points
0 comments · 16 min read · LW link

Trends – Artificial Intelligence

Archimedes · 3 Jun 2025 0:48 UTC
1 point
1 comment · 1 min read · LW link
(www.bondcap.com)

Intelligence Is Jagged

Adam Train · 19 Feb 2025 7:08 UTC
6 points
1 comment · 3 min read · LW link

How much chess engine progress is about adapting to bigger computers?

paulfchristiano · 7 Jul 2021 22:35 UTC
114 points
23 comments · 6 min read · LW link

Skepticism About DeepMind’s “Grandmaster-Level” Chess Without Search

Arjun Panickssery · 12 Feb 2024 0:56 UTC
57 points
13 comments · 3 min read · LW link

Compute Research Questions and Metrics—Transformative AI and Compute [4/4]

lennart · 28 Nov 2021 22:49 UTC
7 points
0 comments · 16 min read · LW link

Forecasting Compute—Transformative AI and Compute [2/4]

lennart · 2 Oct 2021 15:54 UTC
17 points
0 comments · 19 min read · LW link

Neural Scaling Laws Rooted in the Data Distribution

aribrill · 20 Feb 2025 21:22 UTC
8 points
0 comments · 1 min read · LW link
(arxiv.org)

How LLMs Learn: What We Know, What We Don’t (Yet) Know, and What Comes Next

Jonasb · 9 Jul 2024 9:58 UTC
2 points
0 comments · 16 min read · LW link
(www.denominations.io)

Speculative inferences about path dependence in LLM supervised fine-tuning from results on linear mode connectivity and model souping

RobertKirk · 20 Jul 2023 9:56 UTC
39 points
2 comments · 5 min read · LW link

Whisper’s Wild Implications

Ollie J · 3 Jan 2023 12:17 UTC
24 points
6 comments · 5 min read · LW link

The Math of Meaning: A Potential Law of Semantic Structure

Erichcurtis91 · 13 Aug 2025 3:34 UTC
1 point
0 comments · 2 min read · LW link

How to measure FLOP/s for Neural Networks empirically?

Marius Hobbhahn · 29 Nov 2021 15:18 UTC
16 points
5 comments · 7 min read · LW link

Compute Trends — Comparison to OpenAI’s AI and Compute

12 Mar 2022 18:09 UTC
24 points
3 comments · 3 min read · LW link

Data and “tokens” a 30 year old human “trains” on

Jose Miguel Cruz y Celis · 23 May 2023 5:34 UTC
16 points
15 comments · 1 min read · LW link

What is Compute? - Transformative AI and Compute [1/4]

lennart · 23 Sep 2021 16:25 UTC
27 points
9 comments · 19 min read · LW link