
Scaling Laws


Scaling laws refer to the observed tendency of some machine learning architectures (notably transformers) to improve performance according to a predictable power law as they are given more compute, data, or parameters (model size), assuming the model is not bottlenecked on one of the other resources. This trend has held with high consistency over more than six orders of magnitude.

Figure: scaling-law curves from Scaling Laws for Neural Language Models (Kaplan et al., 2020).
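For concreteness, here is a minimal Python sketch of the parameter-count power law from Scaling Laws for Neural Language Models (Kaplan et al., 2020), L(N) = (N_c / N)^alpha_N. The constants below are the paper's approximate fitted values and are used purely for illustration, not as authoritative predictions.

    # Minimal sketch of the Kaplan et al. (2020) parameter-count power law:
    #     L(N) = (N_c / N) ** ALPHA_N
    # Constants are approximate values reported in the paper (illustrative only).
    ALPHA_N = 0.076   # fitted exponent for (non-embedding) model size
    N_C = 8.8e13      # fitted critical parameter count

    def predicted_loss(n_params: float) -> float:
        """Predicted cross-entropy loss (nats/token) for a model with
        n_params non-embedding parameters, assuming it is not bottlenecked
        on data or compute."""
        return (N_C / n_params) ** ALPHA_N

    # A power law is a straight line in log-log space, which is why the
    # trend can stay predictable across many orders of magnitude:
    for n in (1e6, 1e8, 1e10, 1e12):
        print(f"N = {n:.0e} -> predicted loss ~ {predicted_loss(n):.2f}")

The paper fits the same functional form, with different constants, for dataset size and training compute.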

Thoughts on the Alignment Implications of Scaling Language Models

leogao, 2 Jun 2021 21:32 UTC
79 points
11 comments, 17 min read, LW link

/r/MLScaling: new subreddit for NN scaling research/discussion

gwern, 30 Oct 2020 20:50 UTC
20 points
0 comments, 1 min read, LW link
(www.reddit.com)

My ML Scaling bibliography

gwern, 23 Oct 2021 14:41 UTC
35 points
9 comments, 1 min read, LW link
(www.gwern.net)

Google’s new text-to-image model—Parti, a demonstration of scaling benefits

Kayden, 22 Jun 2022 20:00 UTC
32 points
4 comments, 1 min read, LW link

A closer look at chess scalings (into the past)

hippke, 15 Jul 2021 8:13 UTC
47 points
14 comments, 4 min read, LW link

NVIDIA and Microsoft release 530B parameter transformer model, Megatron-Turing NLG

Ozyrus, 11 Oct 2021 15:28 UTC
51 points
36 comments, 1 min read, LW link
(developer.nvidia.com)

[Link] Training Compute-Optimal Large Language Models

nostalgebraist, 31 Mar 2022 18:01 UTC
50 points
23 comments, 1 min read, LW link
(arxiv.org)

Ethan Caballero on Private Scaling Progress

Michaël Trazzi, 5 May 2022 18:32 UTC
54 points
1 comment, 2 min read, LW link
(theinsideview.github.io)

Parameter counts in Machine Learning

19 Jun 2021 16:04 UTC
45 points
16 comments, 7 min read, LW link

How much chess engine progress is about adapting to bigger computers?

paulfchristiano, 7 Jul 2021 22:35 UTC
112 points
23 comments, 6 min read, LW link

Transformative AI and Compute [Summary]

lennart, 26 Sep 2021 11:41 UTC
12 points
0 comments, 9 min read, LW link

What is Compute? - Transformative AI and Compute [1/4]

lennart, 23 Sep 2021 16:25 UTC
24 points
8 comments, 19 min read, LW link

Proposal: Scaling laws for RL generalization

flodorner, 1 Oct 2021 21:32 UTC
14 points
10 comments, 11 min read, LW link

Forecasting Compute—Transformative AI and Compute [2/4]

lennart, 2 Oct 2021 15:54 UTC
17 points
0 comments, 19 min read, LW link

Compute Governance and Conclusions—Transformative AI and Compute [3/4]

lennart, 14 Oct 2021 8:23 UTC
13 points
0 comments, 5 min read, LW link

Compute Research Questions and Metrics—Transformative AI and Compute [4/4]

lennart, 28 Nov 2021 22:49 UTC
6 points
0 comments, 16 min read, LW link

How to measure FLOP/s for Neural Networks empirically?

Marius Hobbhahn, 29 Nov 2021 15:18 UTC
15 points
3 comments, 7 min read, LW link

How I’m thinking about GPT-N

delton137, 17 Jan 2022 17:11 UTC
44 points
21 comments, 18 min read, LW link

Estimating training compute of Deep Learning models

20 Jan 2022 16:12 UTC
33 points
4 comments, 1 min read, LW link

Compute Trends Across Three eras of Machine Learning

16 Feb 2022 14:18 UTC
91 points
13 comments, 2 min read, LW link

Compute Trends — Comparison to OpenAI’s AI and Compute

12 Mar 2022 18:09 UTC
23 points
3 comments, 3 min read, LW link

[linkpost] The final AI benchmark: BIG-bench

RomanS, 10 Jun 2022 8:53 UTC
29 points
17 comments, 1 min read, LW link

Causal confusion as an argument against the scaling hypothesis

20 Jun 2022 10:54 UTC
80 points
26 comments, 18 min read, LW link

Announcing the Inverse Scaling Prize ($250k Prize Pool)

27 Jun 2022 15:58 UTC
151 points
10 comments, 7 min read, LW link