
Scaling Laws

Last edit: 18 Jun 2023 23:35 UTC by riley

Scaling Laws refer to the observed trend that the scaling behavior of deep neural networks follows variants of power laws. "Scaling behavior" here means how the evaluation metric of interest varies as one varies the amount of compute used for training or inference, the number of model parameters, the training dataset size, the model input size, or the number of training steps.
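As a minimal illustration of what "follows a power law" means in practice, the sketch below fits y = a · x^(−α) to (x, y) pairs by ordinary least squares on log y versus log x. This works because a pure power law is a straight line on log-log axes. The function names `fit_power_law` and `predict` and the synthetic data are hypothetical, chosen for illustration only; they are not taken from any of the posts or papers listed on this page.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**(-alpha) by least squares on log(y) versus log(x).

    Returns (a, alpha). All xs and ys must be strictly positive, since a
    pure power law is a straight line only on log-log axes.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired (x, y) points")
    if any(v <= 0 for v in list(xs) + list(ys)):
        raise ValueError("power-law fit requires positive values")
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    slope = sxy / sxx                    # slope in log-log space equals -alpha
    intercept = mean_y - slope * mean_x  # intercept in log-log space equals log(a)
    return math.exp(intercept), -slope


def predict(a, alpha, x):
    """Evaluate the fitted power law at x (extrapolation is only as good as the fit)."""
    return a * x ** (-alpha)


if __name__ == "__main__":
    # Synthetic, noise-free points from an assumed law y = 10 * x**-0.1.
    # These numbers are illustrative only, not measurements from any paper.
    sizes = [1e6, 1e7, 1e8, 1e9]
    losses = [10 * n ** -0.1 for n in sizes]
    a, alpha = fit_power_law(sizes, losses)
    print(round(a, 3), round(alpha, 3))       # 10.0 0.1
    print(round(predict(a, alpha, 1e10), 3))  # 1.0
```

A plain log-log fit assumes there is no irreducible-loss term. Many published scaling laws are variants that add a constant, for example the form L(N, D) = E + A/N^α + B/D^β used in "Training Compute-Optimal Large Language Models"; fitting that form requires nonlinear least squares rather than a straight-line fit.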

External links

Scaling laws graph from Scaling Laws for Neural Language Models

“Can AI Scaling Continue Through 2030?”, Epoch AI (yes)

gwern, 24 Aug 2024 1:40 UTC
129 points
4 comments, 3 min read
(epochai.org)

chinchilla’s wild implications

nostalgebraist, 31 Jul 2022 1:18 UTC
420 points
128 comments, 10 min read, 1 review

What will GPT-2030 look like?

jsteinhardt, 7 Jun 2023 23:40 UTC
185 points
43 comments, 23 min read
(bounded-regret.ghost.io)

/r/MLScaling: new subreddit for NN scaling research/discussion

gwern, 30 Oct 2020 20:50 UTC
21 points
0 comments, 1 min read
(www.reddit.com)

Ethan Caballero on Broken Neural Scaling Laws, Deception, and Recursive Self Improvement

4 Nov 2022 18:09 UTC
16 points
11 comments, 10 min read
(theinsideview.ai)

Thoughts on the Alignment Implications of Scaling Language Models

leogao, 2 Jun 2021 21:32 UTC
82 points
11 comments, 17 min read

My ML Scaling bibliography

gwern, 23 Oct 2021 14:41 UTC
35 points
9 comments, 1 min read
(www.gwern.net)

Google’s new text-to-image model—Parti, a demonstration of scaling benefits

Kayden, 22 Jun 2022 20:00 UTC
32 points
4 comments, 1 min read

[Linkpost] Scaling Laws for Generative Mixed-Modal Language Models

Amal, 12 Jan 2023 14:24 UTC
15 points
2 comments, 1 min read
(arxiv.org)

Inverse Scaling Prize: Second Round Winners

24 Jan 2023 20:12 UTC
58 points
17 comments, 15 min read

[Link] Training Compute-Optimal Large Language Models

nostalgebraist, 31 Mar 2022 18:01 UTC
51 points
23 comments, 1 min read
(arxiv.org)

Ethan Caballero on Private Scaling Progress

Michaël Trazzi, 5 May 2022 18:32 UTC
63 points
2 comments, 2 min read
(theinsideview.github.io)

The effect of horizon length on scaling laws

Jacob_Hilton, 1 Feb 2023 3:59 UTC
23 points
2 comments, 1 min read
(arxiv.org)

[Question] Nonlinear limitations of ReLUs

magfrump, 26 Oct 2023 18:51 UTC
13 points
1 comment, 1 min read

Musings on Text Data Wall (Oct 2024)

Vladimir_Nesov, 5 Oct 2024 19:00 UTC
20 points
2 comments, 5 min read

NVIDIA and Microsoft releases 530B parameter transformer model, Megatron-Turing NLG

Ozyrus, 11 Oct 2021 15:28 UTC
51 points
36 comments, 1 min read
(developer.nvidia.com)

Paper: On measuring situational awareness in LLMs

4 Sep 2023 12:54 UTC
108 points
16 comments, 5 min read
(arxiv.org)

Inverse scaling can become U-shaped

Edouard Harris, 8 Nov 2022 19:04 UTC
27 points
15 comments, 1 min read
(arxiv.org)

Musings on LLM Scale (Jul 2024)

Vladimir_Nesov, 3 Jul 2024 18:35 UTC
33 points
0 comments, 3 min read

[Question] Is there a “critical threshold” for LLM scaling laws?

Logan Zoellner, 30 Mar 2024 12:23 UTC
7 points
1 comment, 1 min read

A closer look at chess scalings (into the past)

hippke, 15 Jul 2021 8:13 UTC
50 points
14 comments, 4 min read

Parameter counts in Machine Learning

19 Jun 2021 16:04 UTC
47 points
18 comments, 7 min read

How much chess engine progress is about adapting to bigger computers?

paulfchristiano, 7 Jul 2021 22:35 UTC
114 points
23 comments, 6 min read

Transformative AI and Compute [Summary]

lennart, 26 Sep 2021 11:41 UTC
14 points
0 comments, 9 min read

What is Compute? - Transformative AI and Compute [1/4]

lennart, 23 Sep 2021 16:25 UTC
27 points
9 comments, 19 min read

Proposal: Scaling laws for RL generalization

axioman, 1 Oct 2021 21:32 UTC
14 points
12 comments, 11 min read

Forecasting Compute—Transformative AI and Compute [2/4]

lennart, 2 Oct 2021 15:54 UTC
17 points
0 comments, 19 min read

Compute Governance and Conclusions—Transformative AI and Compute [3/4]

lennart, 14 Oct 2021 8:23 UTC
13 points
0 comments, 5 min read

Compute Research Questions and Metrics—Transformative AI and Compute [4/4]

lennart, 28 Nov 2021 22:49 UTC
7 points
0 comments, 16 min read

How to measure FLOP/s for Neural Networks empirically?

Marius Hobbhahn, 29 Nov 2021 15:18 UTC
16 points
5 comments, 7 min read

How I’m thinking about GPT-N

delton137, 17 Jan 2022 17:11 UTC
54 points
21 comments, 18 min read

Estimating training compute of Deep Learning models

20 Jan 2022 16:12 UTC
37 points
4 comments, 1 min read

Compute Trends Across Three eras of Machine Learning

16 Feb 2022 14:18 UTC
94 points
13 comments, 2 min read

Compute Trends — Comparison to OpenAI’s AI and Compute

12 Mar 2022 18:09 UTC
23 points
3 comments, 3 min read

[linkpost] The final AI benchmark: BIG-bench

RomanS, 10 Jun 2022 8:53 UTC
25 points
21 comments, 1 min read

Causal confusion as an argument against the scaling hypothesis

20 Jun 2022 10:54 UTC
86 points
30 comments, 15 min read

Announcing the Inverse Scaling Prize ($250k Prize Pool)

27 Jun 2022 15:58 UTC
171 points
14 comments, 7 min read

Trends in GPU price-performance

1 Jul 2022 15:51 UTC
85 points
12 comments, 1 min read, 1 review
(epochai.org)

Machine Learning Model Sizes and the Parameter Gap [abridged]

Pablo Villalobos, 18 Jul 2022 16:51 UTC
20 points
0 comments, 1 min read
(epochai.org)

A Quick Note on AI Scaling Asymptotes

alyssavance, 25 May 2022 2:55 UTC
44 points
7 comments, 1 min read

How should DeepMind’s Chinchilla revise our AI forecasts?

Cleo Nardo, 15 Sep 2022 17:54 UTC
35 points
12 comments, 13 min read

Smoke without fire is scary

Adam Jermyn, 4 Oct 2022 21:08 UTC
51 points
22 comments, 4 min read

Scaling Laws for Reward Model Overoptimization

20 Oct 2022 0:20 UTC
103 points
13 comments, 1 min read
(arxiv.org)

Massive Scaling Should be Frowned Upon

harsimony, 17 Nov 2022 8:43 UTC
4 points
6 comments, 5 min read

[Question] Updates on scaling laws for foundation models from ‘Transcending Scaling Laws with 0.1% Extra Compute’

Nick_Greig, 18 Nov 2022 12:46 UTC
15 points
2 comments, 1 min read

Some Arguments Against Strong Scaling

Joar Skalse, 13 Jan 2023 12:04 UTC
26 points
21 comments, 16 min read

Whisper’s Wild Implications

Ollie J, 3 Jan 2023 12:17 UTC
19 points
6 comments, 5 min read

Parameter Scaling Comes for RL, Maybe

1a3orn, 24 Jan 2023 13:55 UTC
99 points
3 comments, 14 min read

Scaling Laws Literature Review

Pablo Villalobos, 27 Jan 2023 19:57 UTC
36 points
1 comment, 4 min read
(epochai.org)

The Perceptron Controversy

Yuxi_Liu, 10 Jan 2024 23:07 UTC
65 points
18 comments, 1 min read
(yuxi-liu-wired.github.io)

Predicting AGI by the Turing Test

Yuxi_Liu, 22 Jan 2024 4:22 UTC
21 points
2 comments, 10 min read
(yuxi-liu-wired.github.io)

Transfer learning and generalization-qua-capability in Babbage and Davinci (or, why division is better than Spanish)

RP and agg, 9 Feb 2024 7:00 UTC
50 points
6 comments, 3 min read

Skepticism About DeepMind’s “Grandmaster-Level” Chess Without Search

Arjun Panickssery, 12 Feb 2024 0:56 UTC
55 points
13 comments, 3 min read

Scaling Laws and Superposition

Pavan Katta, 10 Apr 2024 15:36 UTC
9 points
4 comments, 5 min read
(www.pavankatta.com)

How LLMs Learn: What We Know, What We Don’t (Yet) Know, and What Comes Next

Jonasb, 9 Jul 2024 9:58 UTC
2 points
0 comments, 16 min read
(www.denominations.io)

Analyzing DeepMind’s Probabilistic Methods for Evaluating Agent Capabilities

22 Jul 2024 16:17 UTC
69 points
0 comments, 16 min read

Why Recursive Self-Improvement Might Not Be the Existential Risk We Fear

Nassim_A, 24 Nov 2024 17:17 UTC
1 point
0 comments, 9 min read

The Quantization Model of Neural Scaling

nz, 31 Mar 2023 16:02 UTC
17 points
0 comments, 1 min read
(arxiv.org)

Tall Tales at Different Scales: Evaluating Scaling Trends For Deception In Language Models

8 Nov 2023 11:37 UTC
49 points
0 comments, 18 min read

What’s new at FAR AI

4 Dec 2023 21:18 UTC
41 points
0 comments, 5 min read
(far.ai)

Data and “tokens” a 30 year old human “trains” on

Jose Miguel Cruz y Celis, 23 May 2023 5:34 UTC
15 points
15 comments, 1 min read

Why Job Displacement Predictions are Wrong: Explanations of Cognitive Automation

Moritz Wallawitsch, 30 May 2023 20:43 UTC
−4 points
0 comments, 8 min read

Speculative inferences about path dependence in LLM supervised fine-tuning from results on linear mode connectivity and model souping

RobertKirk, 20 Jul 2023 9:56 UTC
39 points
2 comments, 5 min read

[Linkpost] Applicability of scaling laws to vision encoding models

Bogdan Ionut Cirstea, 5 Aug 2023 11:10 UTC
11 points
2 comments, 1 min read

[Question] Clarifying how misalignment can arise from scaling LLMs

Util, 19 Aug 2023 14:16 UTC
3 points
1 comment, 1 min read