
Language Models

Last edit: 24 Sep 2021 14:16 UTC by plex

Language models are a class of AI trained on text, usually to predict the next word or a word that has been obscured. They can generate novel prose or code from an initial prompt, which gives rise to a kind of natural-language programming called prompt engineering. The most popular architecture for very large language models is the transformer, which follows consistent scaling laws with respect to model size: a larger model trained with the same amount of compute produces results that are better by a predictable amount, as measured by ‘perplexity’, i.e. how surprised the model is by a held-out set of human-written text.
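To make the ‘perplexity’ metric concrete, here is a minimal sketch (the `perplexity` helper is illustrative, not from any particular library): perplexity is the exponential of the average negative log-probability the model assigned to each token it actually observed, so a lower score means the model was less surprised.

```python
import math

def perplexity(token_probs):
    """Perplexity of a model on a token sequence, given the probability
    the model assigned to each observed token: exp of the average
    negative log-probability. Lower means the model was less surprised."""
    neg_log_likelihoods = [-math.log(p) for p in token_probs]
    return math.exp(sum(neg_log_likelihoods) / len(neg_log_likelihoods))

# A model that assigns probability 1/4 to every observed token is
# exactly as uncertain as a uniform guess among 4 choices.
print(perplexity([0.25, 0.25, 0.25, 0.25]))  # perplexity of 4
```

Measured on a fixed test set of human-generated text, this number falls predictably as models are scaled up, which is what the scaling-law results quantify.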

See also

Truthful LMs as a warm-up for aligned AGI

Jacob_Hilton, 17 Jan 2022 16:49 UTC
64 points
14 comments, 13 min read, LW link

Transformer Circuits

evhub, 22 Dec 2021 21:09 UTC
127 points
4 comments, 3 min read, LW link
(transformer-circuits.pub)

Testing PaLM prompts on GPT3

Yitz, 6 Apr 2022 5:21 UTC
100 points
15 comments, 8 min read, LW link

Cognitive Biases in Large Language Models

Jan, 25 Sep 2021 20:59 UTC
17 points
3 comments, 12 min read, LW link
(universalprior.substack.com)

NVIDIA and Microsoft release 530B parameter transformer model, Megatron-Turing NLG

Ozyrus, 11 Oct 2021 15:28 UTC
51 points
36 comments, 1 min read, LW link
(developer.nvidia.com)

NLP Position Paper: When Combatting Hype, Proceed with Caution

Sam Bowman, 15 Oct 2021 20:57 UTC
46 points
15 comments, 1 min read, LW link

Forecasting progress in language models

28 Oct 2021 20:40 UTC
52 points
5 comments, 11 min read, LW link
(www.metaculus.com)

Deepmind’s Gopher—more powerful than GPT-3

hath, 8 Dec 2021 17:06 UTC
86 points
27 comments, 1 min read, LW link
(deepmind.com)

Teaser: Hard-coding Transformer Models

MadHatter, 12 Dec 2021 22:04 UTC
71 points
19 comments, 1 min read, LW link

Language Model Alignment Research Internship

Ethan Perez, 13 Dec 2021 19:53 UTC
68 points
1 comment, 1 min read, LW link

Understanding the tensor product formulation in Transformer Circuits

Tom Lieberum, 24 Dec 2021 18:05 UTC
15 points
2 comments, 3 min read, LW link

A one-question Turing test for GPT-3

22 Jan 2022 18:17 UTC
84 points
23 comments, 5 min read, LW link

[ASoT] Some thoughts about LM monologue limitations and ELK

leogao, 30 Mar 2022 14:26 UTC
9 points
0 comments, 2 min read, LW link

Procedurally evaluating factual accuracy: a request for research

Jacob_Hilton, 30 Mar 2022 16:37 UTC
24 points
2 comments, 6 min read, LW link

[Link] Training Compute-Optimal Large Language Models

nostalgebraist, 31 Mar 2022 18:01 UTC
50 points
23 comments, 1 min read, LW link
(arxiv.org)

Inflection AI: New startup related to language models

Nisan, 2 Apr 2022 5:35 UTC
21 points
1 comment, 1 min read, LW link

New Scaling Laws for Large Language Models

1a3orn, 1 Apr 2022 20:41 UTC
199 points
20 comments, 5 min read, LW link

How to train your transformer

p.b., 7 Apr 2022 9:34 UTC
5 points
0 comments, 8 min read, LW link

Language Model Tools for Alignment Research

Logan Riggs, 8 Apr 2022 17:32 UTC
23 points
0 comments, 2 min read, LW link

AMA Conjecture, A New Alignment Startup

adamShimi, 9 Apr 2022 9:43 UTC
43 points
40 comments, 1 min read, LW link

[Linkpost] New multi-modal Deepmind model fusing Chinchilla with images and videos

p.b., 30 Apr 2022 3:47 UTC
52 points
16 comments, 1 min read, LW link

[Question] What would a 10T Chinchilla cost?

Tomás B., 3 May 2022 14:48 UTC
17 points
6 comments, 1 min read, LW link

Bootstrapping Language Models

harsimony, 27 May 2022 19:43 UTC
3 points
0 comments, 2 min read, LW link

Thoughts on the Alignment Implications of Scaling Language Models

leogao, 2 Jun 2021 21:32 UTC
79 points
11 comments, 17 min read, LW link

[AN #144]: How language models can also be finetuned for non-language tasks

Rohin Shah, 2 Apr 2021 17:20 UTC
19 points
0 comments, 6 min read, LW link
(mailchi.mp)

How truthful is GPT-3? A benchmark for language models

Owain_Evans, 16 Sep 2021 10:09 UTC
54 points
24 comments, 6 min read, LW link

[Question] How does OpenAI’s language model affect our AI timeline estimates?

jimrandomh, 15 Feb 2019 3:11 UTC
50 points
7 comments, 1 min read, LW link

Building AGI Using Language Models

leogao, 9 Nov 2020 16:33 UTC
11 points
1 comment, 1 min read, LW link
(leogao.dev)

Sufficiently Advanced Language Models Can Do Reinforcement Learning

Zachary Robertson, 2 Aug 2020 15:32 UTC
21 points
7 comments, 7 min read, LW link

The case for aligning narrowly superhuman models

Ajeya Cotra, 5 Mar 2021 22:29 UTC
180 points
74 comments, 38 min read, LW link

The Codex Skeptic FAQ

Michaël Trazzi, 24 Aug 2021 16:01 UTC
48 points
24 comments, 2 min read, LW link

On language modeling and future abstract reasoning research

alexlyzhov, 25 Mar 2021 17:43 UTC
3 points
1 comment, 1 min read, LW link
(docs.google.com)

Agentic Language Model Memes

FactorialCode, 1 Aug 2020 18:03 UTC
16 points
1 comment, 2 min read, LW link

Structured Tasks for Language Models

Zachary Robertson, 29 Jul 2020 14:17 UTC
5 points
0 comments, 1 min read, LW link

[AN #164]: How well can language models write code?

Rohin Shah, 15 Sep 2021 17:20 UTC
13 points
7 comments, 9 min read, LW link
(mailchi.mp)

[AN #113]: Checking the ethical intuitions of large language models

Rohin Shah, 19 Aug 2020 17:10 UTC
23 points
0 comments, 9 min read, LW link
(mailchi.mp)

New GPT-3 competitor

Quintin Pope, 12 Aug 2021 7:05 UTC
32 points
10 comments, 1 min read, LW link

OpenAI Codex: First Impressions

Rishit Vora, 13 Aug 2021 16:52 UTC
49 points
8 comments, 4 min read, LW link
(sixeleven.in)

AMA on Truthful AI: Owen Cotton-Barratt, Owain Evans & co-authors

Owain_Evans, 22 Oct 2021 16:23 UTC
31 points
15 comments, 1 min read, LW link

Truthful and honest AI

29 Oct 2021 7:28 UTC
41 points
1 comment, 13 min read, LW link

larger language models may disappoint you [or, an eternally unfinished draft]

nostalgebraist, 26 Nov 2021 23:08 UTC
221 points
28 comments, 31 min read, LW link

Hard-Coding Neural Computation

MadHatter, 13 Dec 2021 4:35 UTC
30 points
8 comments, 27 min read, LW link

Evidence Sets: Towards Inductive-Biases based Analysis of Prosaic AGI

bayesian_kitten, 16 Dec 2021 22:41 UTC
19 points
10 comments, 21 min read, LW link

GPT-3: a disappointing paper

nostalgebraist, 29 May 2020 19:06 UTC
68 points
44 comments, 8 min read, LW link, 1 review

A Summary Of Anthropic’s First Paper

Sam Ringer, 30 Dec 2021 0:48 UTC
75 points
0 comments, 8 min read, LW link

How I’m thinking about GPT-N

delton137, 17 Jan 2022 17:11 UTC
44 points
21 comments, 18 min read, LW link

Extrapolating GPT-N performance

Lanrian, 18 Dec 2020 21:41 UTC
96 points
31 comments, 25 min read, LW link, 1 review

2+2: Ontological Framework

Lyrialtus, 1 Feb 2022 1:07 UTC
−15 points
2 comments, 15 min read, LW link

EleutherAI’s GPT-NeoX-20B release

leogao, 10 Feb 2022 6:56 UTC
29 points
3 comments, 1 min read, LW link
(eaidata.bmk.sh)

New GPT3 Impressive Capabilities—InstructGPT3 [1/2]

WayZ, 13 Mar 2022 10:58 UTC
71 points
10 comments, 7 min read, LW link

Gears-Level Mental Models of Transformer Interpretability

KevinRoWang, 29 Mar 2022 20:09 UTC
45 points
3 comments, 6 min read, LW link

My agenda for research into transformer capabilities—Introduction

p.b., 5 Apr 2022 21:23 UTC
11 points
1 comment, 3 min read, LW link

Research agenda: Can transformers do system 2 thinking?

p.b., 6 Apr 2022 13:31 UTC
18 points
0 comments, 2 min read, LW link

PaLM in “Extrapolating GPT-N performance”

Lanrian, 6 Apr 2022 13:05 UTC
76 points
17 comments, 2 min read, LW link

Research agenda—Building a multi-modal chess-language model

p.b., 7 Apr 2022 12:25 UTC
8 points
2 comments, 2 min read, LW link

Is GPT3 a Good Rationalist? - InstructGPT3 [2/2]

WayZ, 7 Apr 2022 13:46 UTC
10 points
0 comments, 7 min read, LW link

Elicit: Language Models as Research Assistants

9 Apr 2022 14:56 UTC
62 points
5 comments, 13 min read, LW link

[Question] “Fragility of Value” vs. LLMs

Not Relevant, 13 Apr 2022 2:02 UTC
32 points
32 comments, 1 min read, LW link

Why Copilot Accelerates Timelines

Michaël Trazzi, 26 Apr 2022 22:06 UTC
31 points
14 comments, 7 min read, LW link

A possible check against motivated reasoning using elicit.org

david reinstein, 18 May 2022 20:52 UTC
4 points
0 comments, 1 min read, LW link

RL with KL penalties is better seen as Bayesian inference

25 May 2022 9:23 UTC
39 points
3 comments, 12 min read, LW link