Language Models

Last edit: 11 May 2023 8:20 UTC by Yaakov T

Language models are computer programs that estimate the likelihood of a piece of text. “Hello, how are you?” is likely. “Hello, fnarg horses” is unlikely.
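
To make this concrete, here is a minimal sketch of likelihood scoring, assuming the Hugging Face transformers library and the small public GPT-2 model (illustrative choices, not anything this tag prescribes):

```python
# A minimal sketch of likelihood scoring, assuming the Hugging Face
# `transformers` library and the public GPT-2 model (illustrative choices).
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def log_likelihood(text: str) -> float:
    """Total log-probability the model assigns to `text` (higher = more likely)."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=input_ids, the model returns the mean cross-entropy
        # over the predicted tokens; undo the averaging to get a total.
        mean_nll = model(ids, labels=ids).loss.item()
    return -mean_nll * (ids.shape[1] - 1)

print(log_likelihood("Hello, how are you?"))  # comparatively high
print(log_likelihood("Hello, fnarg horses"))  # comparatively low
```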

Language models can answer questions by estimating the likelihood of candidate question-and-answer pairs and selecting the most likely one. “Q: How are you? A: Very well, thank you” is a likely question-and-answer pair. “Q: How are you? A: Correct horse battery staple” is an unlikely one.
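
The same scoring function can rank candidate answers; a sketch continuing the code above (the candidate answers here are made up for illustration):

```python
# Answering by ranking candidate question-and-answer pairs with the
# `log_likelihood` function defined above (hypothetical candidates).
candidates = ["Very well, thank you", "Correct horse battery staple"]
best = max(candidates,
           key=lambda a: log_likelihood(f"Q: How are you? A: {a}"))
print(best)  # expected: "Very well, thank you"
```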

The language models most relevant to AI safety are based on “deep learning”. Deep-learning-based language models can be “trained” to understand language better by exposing them to text written by humans, and the internet supplies an enormous amount of such training material.

Deep-learning-based language models are getting bigger and better trained. As the models grow stronger, they acquire new skills, including arithmetic, explaining jokes, programming, and solving math problems.

There is a risk that these models will develop dangerous capabilities as they grow larger and better trained. What additional skills will they acquire within a few years?

See also

[Question] How does OpenAI's language model affect our AI timeline estimates?

jimrandomh15 Feb 2019 3:11 UTC
50 points
7 comments1 min readLW link

GPT-3: a disappointing paper

nostalgebraist29 May 2020 19:06 UTC
65 points
43 comments8 min readLW link1 review

Agentic Language Model Memes

FactorialCode1 Aug 2020 18:03 UTC
16 points
1 comment2 min readLW link

[AN #113]: Checking the ethical intuitions of large language models

Rohin Shah19 Aug 2020 17:10 UTC
23 points
0 comments9 min readLW link
(mailchi.mp)

Building AGI Using Language Models

leogao9 Nov 2020 16:33 UTC
11 points
1 comment1 min readLW link
(leogao.dev)

Extrapolating GPT-N performance

Lukas Finnveden18 Dec 2020 21:41 UTC
108 points
31 comments22 min readLW link1 review

The case for aligning narrowly superhuman models

Ajeya Cotra5 Mar 2021 22:29 UTC
184 points
75 comments38 min readLW link1 review

On language modeling and future abstract reasoning research

alexlyzhov25 Mar 2021 17:43 UTC
3 points
1 comment1 min readLW link
(docs.google.com)

[AN #144]: How language models can also be finetuned for non-language tasks

Rohin Shah2 Apr 2021 17:20 UTC
19 points
0 comments6 min readLW link
(mailchi.mp)

Thoughts on the Alignment Implications of Scaling Language Models

leogao2 Jun 2021 21:32 UTC
82 points
11 comments17 min readLW link

New GPT-3 competitor

Quintin Pope12 Aug 2021 7:05 UTC
32 points
10 comments1 min readLW link

OpenAI Codex: First Impressions

specbug13 Aug 2021 16:52 UTC
49 points
8 comments4 min readLW link
(sixeleven.in)

The Codex Skeptic FAQ

Michaël Trazzi24 Aug 2021 16:01 UTC
49 points
24 comments2 min readLW link

[AN #164]: How well can language models write code?

Rohin Shah15 Sep 2021 17:20 UTC
13 points
7 comments9 min readLW link
(mailchi.mp)

How truthful is GPT-3? A benchmark for language models

Owain_Evans16 Sep 2021 10:09 UTC
58 points
24 comments6 min readLW link

Cognitive Biases in Large Language Models

Jan25 Sep 2021 20:59 UTC
18 points
3 comments12 min readLW link
(universalprior.substack.com)

NVIDIA and Microsoft releases 530B parameter transformer model, Megatron-Turing NLG

Ozyrus11 Oct 2021 15:28 UTC
51 points
36 comments1 min readLW link
(developer.nvidia.com)

NLP Position Paper: When Combatting Hype, Proceed with Caution

Sam Bowman15 Oct 2021 20:57 UTC
46 points
14 comments1 min readLW link

AMA on Truthful AI: Owen Cotton-Barratt, Owain Evans & co-authors

Owain_Evans22 Oct 2021 16:23 UTC
31 points
15 comments1 min readLW link

Forecasting progress in language models

28 Oct 2021 20:40 UTC
62 points
6 comments11 min readLW link
(www.metaculus.com)

Truthful and honest AI

29 Oct 2021 7:28 UTC
42 points
1 comment13 min readLW link

larger language models may disappoint you [or, an eternally unfinished draft]

nostalgebraist26 Nov 2021 23:08 UTC
253 points
31 comments31 min readLW link2 reviews

Deepmind's Gopher—more powerful than GPT-3

hath8 Dec 2021 17:06 UTC
86 points
26 comments1 min readLW link
(deepmind.com)

Teaser: Hard-coding Transformer Models

MadHatter12 Dec 2021 22:04 UTC
74 points
19 comments1 min readLW link

Hard-Coding Neural Computation

MadHatter13 Dec 2021 4:35 UTC
34 points
8 comments27 min readLW link

Language Model Alignment Research Internships

Ethan Perez13 Dec 2021 19:53 UTC
74 points
1 comment1 min readLW link

Evidence Sets: Towards Inductive-Biases based Analysis of Prosaic AGI

bayesian_kitten16 Dec 2021 22:41 UTC
22 points
10 comments21 min readLW link

Transformer Circuits

evhub22 Dec 2021 21:09 UTC
144 points
4 comments3 min readLW link
(transformer-circuits.pub)

Understanding the tensor product formulation in Transformer Circuits

Tom Lieberum24 Dec 2021 18:05 UTC
16 points
2 comments3 min readLW link

A Summary Of Anthropic's First Paper

Sam Ringer30 Dec 2021 0:48 UTC
82 points
1 comment8 min readLW link

Truthful LMs as a warm-up for aligned AGI

Jacob_Hilton17 Jan 2022 16:49 UTC
65 points
14 comments13 min readLW link

How I'm thinking about GPT-N

delton13717 Jan 2022 17:11 UTC
54 points
21 comments18 min readLW link

A one-question Turing test for GPT-3

22 Jan 2022 18:17 UTC
84 points
25 comments5 min readLW link

2+2: Ontological Framework

Lyrialtus1 Feb 2022 1:07 UTC
−15 points
2 comments12 min readLW link

QNR prospects are important for AI alignment research

Eric Drexler3 Feb 2022 15:20 UTC
85 points
12 comments11 min readLW link1 review

EleutherAI's GPT-NeoX-20B release

leogao10 Feb 2022 6:56 UTC
30 points
3 comments1 min readLW link
(eaidata.bmk.sh)

New GPT3 Impressive Capabilities—InstructGPT3 [1/2]

simeon_c13 Mar 2022 10:58 UTC
72 points
10 comments7 min readLW link

Gears-Level Mental Models of Transformer Interpretability

KevinRoWang29 Mar 2022 20:09 UTC
70 points
4 comments6 min readLW link

[ASoT] Some thoughts about LM monologue limitations and ELK

leogao30 Mar 2022 14:26 UTC
10 points
0 comments2 min readLW link

Procedurally evaluating factual accuracy: a request for research

Jacob_Hilton30 Mar 2022 16:37 UTC
25 points
2 comments6 min readLW link

[Link] Training Compute-Optimal Large Language Models

nostalgebraist31 Mar 2022 18:01 UTC
51 points
23 comments1 min readLW link
(arxiv.org)

New Scaling Laws for Large Language Models

1a3orn1 Apr 2022 20:41 UTC
243 points
22 comments5 min readLW link

Inflection AI: New startup related to language models

Nisan2 Apr 2022 5:35 UTC
21 points
1 comment1 min readLW link

My agenda for research into transformer capabilities—Introduction

p.b.5 Apr 2022 21:23 UTC
11 points
1 comment3 min readLW link

Testing PaLM prompts on GPT3

Yitz6 Apr 2022 5:21 UTC
103 points
14 comments8 min readLW link

PaLM in “Extrapolating GPT-N performance”

Lukas Finnveden6 Apr 2022 13:05 UTC
83 points
19 comments2 min readLW link

Research agenda: Can transformers do system 2 thinking?

p.b.6 Apr 2022 13:31 UTC
20 points
0 comments2 min readLW link

How to train your transformer

p.b.7 Apr 2022 9:34 UTC
6 points
0 comments8 min readLW link

Research agenda—Building a multi-modal chess-language model

p.b.7 Apr 2022 12:25 UTC
8 points
2 comments2 min readLW link

Is GPT3 a Good Rationalist? - InstructGPT3 [2/2]

simeon_c7 Apr 2022 13:46 UTC
11 points
0 comments7 min readLW link

Language Model Tools for Alignment Research

Logan Riggs8 Apr 2022 17:32 UTC
28 points
0 comments2 min readLW link

AMA Conjecture, A New Alignment Startup

adamShimi9 Apr 2022 9:43 UTC
47 points
42 comments1 min readLW link

Elicit: Language Models as Research Assistants

9 Apr 2022 14:56 UTC
71 points
6 comments13 min readLW link

[Question] “Fragility of Value” vs. LLMs

Not Relevant13 Apr 2022 2:02 UTC
34 points
33 comments1 min readLW link

Why Copilot Accelerates Timelines

Michaël Trazzi26 Apr 2022 22:06 UTC
35 points
14 comments7 min readLW link

[Linkpost] New multi-modal Deepmind model fusing Chinchilla with images and videos

p.b.30 Apr 2022 3:47 UTC
53 points
18 comments1 min readLW link

A possible check against motivated reasoning using elicit.org

david reinstein18 May 2022 20:52 UTC
3 points
0 comments1 min readLW link

RL with KL penalties is better seen as Bayesian inference

25 May 2022 9:23 UTC
114 points
17 comments12 min readLW link

Bootstrapping Language Models

harsimony27 May 2022 19:43 UTC
7 points
5 comments2 min readLW link

Paper: Teaching GPT3 to express uncertainty in words

Owain_Evans31 May 2022 13:27 UTC
97 points
7 comments4 min readLW link

Who models the models that model models? An exploration of GPT-3's in-context model fitting ability

Lovre7 Jun 2022 19:37 UTC
112 points
15 comments9 min readLW link

[linkpost] The final AI benchmark: BIG-bench

RomanS10 Jun 2022 8:53 UTC
25 points
21 comments1 min readLW link

Investigating causal understanding in LLMs

14 Jun 2022 13:57 UTC
28 points
6 comments13 min readLW link

Contra Hofstadter on GPT-3 Nonsense

rictic15 Jun 2022 21:53 UTC
236 points
24 comments2 min readLW link

Lamda is not an LLM

Kevin19 Jun 2022 11:13 UTC
7 points
10 comments1 min readLW link
(www.wired.com)

Causal confusion as an argument against the scaling hypothesis

20 Jun 2022 10:54 UTC
85 points
30 comments18 min readLW link

Conditioning Generative Models

Adam Jermyn25 Jun 2022 22:15 UTC
24 points
18 comments10 min readLW link

Announcing the Inverse Scaling Prize ($250k Prize Pool)

27 Jun 2022 15:58 UTC
169 points
14 comments7 min readLW link

Yann LeCun, A Path Towards Autonomous Machine Intelligence [link]

Bill Benzon27 Jun 2022 23:29 UTC
5 points
1 comment1 min readLW link

Assessing AlephAlphas Multimodal Model

p.b.28 Jun 2022 9:28 UTC
30 points
5 comments3 min readLW link

[Linkpost] Solving Quantitative Reasoning Problems with Language Models

Yitz30 Jun 2022 18:58 UTC
76 points
15 comments2 min readLW link
(storage.googleapis.com)

GPT-3 Catching Fish in Morse Code

Megan Kinniment30 Jun 2022 21:22 UTC
117 points
27 comments8 min readLW link

Minerva

Algon1 Jul 2022 20:06 UTC
35 points
6 comments2 min readLW link
(ai.googleblog.com)

Deep learning curriculum for large language model alignment

Jacob_Hilton13 Jul 2022 21:58 UTC
57 points
3 comments1 min readLW link
(github.com)

Training goals for large language models

Johannes Treutlein18 Jul 2022 7:09 UTC
28 points
5 comments19 min readLW link

Conditioning Generative Models for Alignment

Jozdien18 Jul 2022 7:11 UTC
58 points
8 comments20 min readLW link

Help ARC evaluate capabilities of current language models (still need people)

Beth Barnes19 Jul 2022 4:55 UTC
95 points
6 comments2 min readLW link

Conditioning Generative Models with Restrictions

Adam Jermyn21 Jul 2022 20:33 UTC
18 points
4 comments8 min readLW link

[Question] Impact of “ ‘Let's think step by step' is all you need”?

yrimon24 Jul 2022 20:59 UTC
20 points
2 comments1 min readLW link

chinchilla's wild implications

nostalgebraist31 Jul 2022 1:18 UTC
410 points
128 comments11 min readLW link1 review

Externalized reasoning oversight: a research direction for language model alignment

tamera3 Aug 2022 12:03 UTC
130 points
23 comments6 min readLW link

Transformer language models are doing something more general

Numendil3 Aug 2022 21:13 UTC
53 points
6 comments2 min readLW link

Emergent Abilities of Large Language Models [Linkpost]

aogara10 Aug 2022 18:02 UTC
25 points
2 comments1 min readLW link
(arxiv.org)

Language models seem to be much better than humans at next-token prediction

11 Aug 2022 17:45 UTC
182 points
59 comments13 min readLW link1 review

A little playing around with Blenderbot3

Nathan Helm-Burger12 Aug 2022 16:06 UTC
9 points
0 comments1 min readLW link

Conditioning, Prompts, and Fine-Tuning

Adam Jermyn17 Aug 2022 20:52 UTC
38 points
9 comments4 min readLW link

[Question] Are language models close to the superhuman level in philosophy?

Roman Leventov19 Aug 2022 4:43 UTC
6 points
2 comments2 min readLW link

Google AI integrates PaLM with robotics: SayCan update [Linkpost]

Evan R. Murphy24 Aug 2022 20:54 UTC
25 points
0 comments1 min readLW link
(sites.research.google)

A Test for Language Model Consciousness

Ethan Perez25 Aug 2022 19:41 UTC
18 points
14 comments9 min readLW link

Strategy For Conditioning Generative Models

1 Sep 2022 4:34 UTC
31 points
4 comments18 min readLW link

Simulators

janus2 Sep 2022 12:45 UTC
594 points
161 comments41 min readLW link8 reviews
(generative.ink)

Is training data going to be diluted by AI-generated content?

Hannes Thurnherr7 Sep 2022 18:13 UTC
10 points
7 comments1 min readLW link

AlexaTM − 20 Billion Parameter Model With Impressive Performance

ViktorThink9 Sep 2022 21:46 UTC
5 points
0 comments1 min readLW link

How should DeepMind's Chinchilla revise our AI forecasts?

Cleo Nardo15 Sep 2022 17:54 UTC
35 points
12 comments13 min readLW link

Takeaways from our robust injury classifier project [Redwood Research]

dmz17 Sep 2022 3:55 UTC
143 points
12 comments6 min readLW link1 review

Sparse trinary weighted RNNs as a path to better language model interpretability

Am8ryllis17 Sep 2022 19:48 UTC
19 points
13 comments3 min readLW link

[Question] If we have Human-level chatbots, won't we end up being ruled by possible people?

Erlja Jkdf.20 Sep 2022 13:59 UTC
5 points
13 comments1 min readLW link

An Unexpected GPT-3 Decision in a Simple Gamble

hatta_afiq25 Sep 2022 16:46 UTC
8 points
4 comments1 min readLW link

Brief Notes on Transformers

Adam Jermyn26 Sep 2022 14:46 UTC
46 points
3 comments2 min readLW link

Inverse Scaling Prize: Round 1 Winners

26 Sep 2022 19:57 UTC
93 points
16 comments4 min readLW link
(irmckenzie.co.uk)

Paper: Large Language Models Can Self-improve [Linkpost]

Evan R. Murphy2 Oct 2022 1:29 UTC
52 points
14 comments1 min readLW link
(openreview.net)

Recall and Regurgitation in GPT2

Megan Kinniment3 Oct 2022 19:35 UTC
43 points
1 comment26 min readLW link

Smoke without fire is scary

Adam Jermyn4 Oct 2022 21:08 UTC
51 points
22 comments4 min readLW link

Results from the language model hackathon

Esben Kran10 Oct 2022 8:29 UTC
22 points
1 comment4 min readLW link

They gave LLMs access to physics simulators

ryan_b17 Oct 2022 21:21 UTC
50 points
18 comments1 min readLW link
(arxiv.org)

Is GPT-N bounded by human capabilities? No.

Cleo Nardo17 Oct 2022 23:26 UTC
46 points
8 comments2 min readLW link

Learning societal values from law as part of an AGI alignment strategy

John Nay21 Oct 2022 2:03 UTC
5 points
18 comments54 min readLW link

What will the scaled up GATO look like? (Updated with questions)

Amal 25 Oct 2022 12:44 UTC
34 points
22 comments1 min readLW link

[simulation] 4chan user claiming to be the attorney hired by Google's sentient chatbot LaMDA shares wild details of encounter

janus10 Nov 2022 21:39 UTC
19 points
1 comment13 min readLW link
(generative.ink)

LLMs may capture key components of human agency

catubc17 Nov 2022 20:14 UTC
26 points
0 comments4 min readLW link

Human-level Full-Press Diplomacy (some bare facts).

Cleo Nardo22 Nov 2022 20:59 UTC
50 points
7 comments3 min readLW link

Gliders in Language Models

Alexandre Variengien25 Nov 2022 0:38 UTC
30 points
11 comments10 min readLW link

Did ChatGPT just gaslight me?

ThomasW1 Dec 2022 5:41 UTC
123 points
45 comments9 min readLW link
(aiwatchtower.substack.com)

[ASoT] Finetuning, RL, and GPT's world prior

Jozdien2 Dec 2022 16:33 UTC
44 points
8 comments5 min readLW link

Chat GPT's views on Metaphysics and Ethics

Cole Killian3 Dec 2022 18:12 UTC
5 points
3 comments1 min readLW link
(twitter.com)

[Question] Will the first AGI agent have been designed as an agent (in addition to an AGI)?

nahoj3 Dec 2022 20:32 UTC
1 point
8 comments1 min readLW link

Is the “Valley of Confused Abstractions” real?

jacquesthibs5 Dec 2022 13:36 UTC
19 points
11 comments2 min readLW link

Steering Behaviour: Testing for (Non-)Myopia in Language Models

5 Dec 2022 20:28 UTC
40 points
19 comments10 min readLW link

Shh, don’t tell the AI it’s likely to be evil

naterush6 Dec 2022 3:35 UTC
19 points
9 comments1 min readLW link

[Question] Does a LLM have a utility function?

Dagon9 Dec 2022 17:19 UTC
17 points
11 comments1 min readLW link

Prosaic misalignment from the Solomonoff Predictor

Cleo Nardo9 Dec 2022 17:53 UTC
40 points
2 comments5 min readLW link

A brainteaser for language models

Adam Scherlis12 Dec 2022 2:43 UTC
47 points
3 comments2 min readLW link

An exploration of GPT-2's embedding weights

Adam Scherlis13 Dec 2022 0:46 UTC
41 points
4 comments10 min readLW link

Discovering Latent Knowledge in Language Models Without Supervision

Xodarap14 Dec 2022 12:32 UTC
45 points
1 comment1 min readLW link
(arxiv.org)

Extracting and Evaluating Causal Direction in LLMs' Activations

14 Dec 2022 14:33 UTC
29 points
5 comments11 min readLW link

Take 11: “Aligning language models” should be weirder.

Charlie Steiner18 Dec 2022 14:14 UTC
32 points
0 comments2 min readLW link

Properties of current AIs and some predictions of the evolution of AI from the perspective of scale-free theories of agency and regulative development

Roman Leventov20 Dec 2022 17:13 UTC
33 points
3 comments36 min readLW link

Discovering Language Model Behaviors with Model-Written Evaluations

20 Dec 2022 20:08 UTC
100 points
34 comments1 min readLW link
(www.anthropic.com)

Podcast: Tamera Lanham on AI risk, threat models, alignment proposals, externalized reasoning oversight, and working at Anthropic

Akash20 Dec 2022 21:39 UTC
18 points
2 comments11 min readLW link

Notes on Meta's Diplomacy-Playing AI

Erich_Grunewald22 Dec 2022 11:34 UTC
9 points
2 comments14 min readLW link
(www.erichgrunewald.com)

How evolutionary lineages of LLMs can plan their own future and act on these plans

Roman Leventov25 Dec 2022 18:11 UTC
39 points
16 comments8 min readLW link

Mlyyrczo

lsusr26 Dec 2022 7:58 UTC
41 points
14 comments3 min readLW link

Recent advances in Natural Language Processing—Some Woolly speculations (2019 essay on semantics and language models)

philosophybear27 Dec 2022 2:11 UTC
1 point
0 comments7 min readLW link

‘simulator’ framing and confusions about LLMs

Beth Barnes31 Dec 2022 23:38 UTC
104 points
11 comments4 min readLW link

Large language models can provide “normative assumptions” for learning human preferences

Stuart_Armstrong2 Jan 2023 19:39 UTC
29 points
12 comments3 min readLW link

MAKE IT BETTER (a poetic demonstration of the banality of GPT-3)

rogersbacon2 Jan 2023 20:47 UTC
7 points
2 comments5 min readLW link

On the naturalistic study of the linguistic behavior of artificial intelligence

Bill Benzon3 Jan 2023 9:06 UTC
1 point
0 comments4 min readLW link

Whisper’s Wild Implications

Ollie J3 Jan 2023 12:17 UTC
19 points
6 comments5 min readLW link

The Limit of Language Models

DragonGod6 Jan 2023 23:53 UTC
43 points
26 comments4 min readLW link

How it feels to have your mind hacked by an AI

blaked12 Jan 2023 0:33 UTC
354 points
219 comments17 min readLW link

[Linkpost] Scaling Laws for Generative Mixed-Modal Language Models

Amal 12 Jan 2023 14:24 UTC
15 points
2 comments1 min readLW link
(arxiv.org)

Proposal for Inducing Steganography in LMs

Logan Riggs12 Jan 2023 22:15 UTC
22 points
2 comments2 min readLW link

Some Arguments Against Strong Scaling

Joar Skalse13 Jan 2023 12:04 UTC
25 points
21 comments16 min readLW link

[Question] Basic Question about LLMs: how do they know what task to perform

Garak14 Jan 2023 13:13 UTC
1 point
3 comments1 min readLW link

Speculation on Path-Dependance in Large Language Models.

NickyP15 Jan 2023 20:42 UTC
16 points
2 comments7 min readLW link

Understanding the diffusion of large language models: summary

Ben Cottier16 Jan 2023 1:37 UTC
26 points
1 comment1 min readLW link

Language models can generate superior text compared to their input

ChristianKl17 Jan 2023 10:57 UTC
47 points
28 comments1 min readLW link

Thoughts on refusing harmful requests to large language models

William_S19 Jan 2023 19:49 UTC
30 points
4 comments2 min readLW link

Critique of some recent philosophy of LLMs' minds

Roman Leventov20 Jan 2023 12:53 UTC
51 points
8 comments20 min readLW link

Emotional attachment to AIs opens doors to problems

Igor Ivanov22 Jan 2023 20:28 UTC
20 points
10 comments4 min readLW link

ChatGPT intimates a tantalizing future; its core LLM is organized on multiple levels; and it has broken the idea of thinking.

Bill Benzon24 Jan 2023 19:05 UTC
5 points
0 comments5 min readLW link

Inverse Scaling Prize: Second Round Winners

24 Jan 2023 20:12 UTC
58 points
17 comments15 min readLW link

Inner Misalignment in “Simulator” LLMs

Adam Scherlis31 Jan 2023 8:33 UTC
84 points
11 comments4 min readLW link

Conditioning Predictive Models: Large language models as predictors

2 Feb 2023 20:28 UTC
88 points
4 comments13 min readLW link

Conditioning Predictive Models: Outer alignment via careful conditioning

2 Feb 2023 20:28 UTC
70 points
13 comments57 min readLW link

SolidGoldMagikarp (plus, prompt generation)

5 Feb 2023 22:02 UTC
660 points
204 comments12 min readLW link

SolidGoldMagikarp II: technical details and more recent findings

6 Feb 2023 19:09 UTC
109 points
45 comments13 min readLW link

Conditioning Predictive Models: The case for competitiveness

6 Feb 2023 20:08 UTC
20 points
3 comments11 min readLW link

Early situational awareness and its implications, a story

Jacob Pfau6 Feb 2023 20:45 UTC
29 points
6 comments3 min readLW link

Two very different experiences with ChatGPT

Sherrinford7 Feb 2023 13:09 UTC
38 points
15 comments5 min readLW link

On The Current Status Of AI Dating

Nikita Brancatisano7 Feb 2023 20:00 UTC
52 points
8 comments6 min readLW link

Conditioning Predictive Models: Interactions with other approaches

8 Feb 2023 18:19 UTC
32 points
2 comments11 min readLW link

Notes on the Mathematics of LLM Architectures

Spencer Becker-Kahn9 Feb 2023 1:45 UTC
12 points
2 comments1 min readLW link
(drive.google.com)

Conditioning Predictive Models: Deployment strategy

9 Feb 2023 20:59 UTC
28 points
0 comments10 min readLW link

A note on ‘semiotic physics’

metasemi11 Feb 2023 5:12 UTC
11 points
13 comments6 min readLW link

In Defense of Chatbot Romance

Kaj_Sotala11 Feb 2023 14:30 UTC
123 points
52 comments11 min readLW link
(kajsotala.fi)

LLM Basics: Embedding Spaces—Transformer Token Vectors Are Not Points in Space

NickyP13 Feb 2023 18:52 UTC
70 points
11 comments15 min readLW link

[Question] Is InstructGPT Following Instructions in Other Languages Surprising?

DragonGod13 Feb 2023 23:26 UTC
39 points
15 comments1 min readLW link

Bing Chat is blatantly, aggressively misaligned

evhub15 Feb 2023 5:29 UTC
396 points
167 comments2 min readLW link

A poem co-written by ChatGPT

Sherrinford16 Feb 2023 10:17 UTC
13 points
0 comments7 min readLW link

Powerful mesa-optimisation is already here

Roman Leventov17 Feb 2023 4:59 UTC
35 points
1 comment2 min readLW link
(arxiv.org)

Bing chat is the AI fire alarm

Ratios17 Feb 2023 6:51 UTC
112 points
62 comments3 min readLW link

Microsoft and OpenAI, stop telling chatbots to roleplay as AI

hold_my_fish17 Feb 2023 19:55 UTC
49 points
10 comments1 min readLW link

GPT-4 Predictions

Stephen McAleese17 Feb 2023 23:20 UTC
109 points
27 comments11 min readLW link

Stop posting prompt injections on Twitter and calling it “misalignment”

lc19 Feb 2023 2:21 UTC
138 points
9 comments1 min readLW link

The idea that ChatGPT is simply “predicting” the next word is, at best, misleading

Bill Benzon20 Feb 2023 11:32 UTC
55 points
87 comments5 min readLW link

Sydney the Bingenator Can't Think, But It Still Threatens People

Valentin Baltadzhiev20 Feb 2023 18:37 UTC
−3 points
2 comments8 min readLW link

[Preprint] Pretraining Language Models with Human Preferences

Giulio21 Feb 2023 11:44 UTC
12 points
0 comments1 min readLW link
(arxiv.org)

Pretraining Language Models with Human Preferences

21 Feb 2023 17:57 UTC
133 points
18 comments11 min readLW link

What do language models know about fictional characters?

skybrian22 Feb 2023 5:58 UTC
6 points
0 comments4 min readLW link

[Question] Injecting noise to GPT to get multiple answers

bipolo22 Feb 2023 20:02 UTC
1 point
1 comment1 min readLW link

Hello, Elua.

Tamsin Leake23 Feb 2023 5:19 UTC
37 points
18 comments4 min readLW link
(carado.moe)

Meta “open sources” LMs competitive with Chinchilla, PaLM, and code-davinci-002 (Paper)

LawrenceC24 Feb 2023 19:57 UTC
38 points
19 comments1 min readLW link
(research.facebook.com)

A Proposed Test to Determine the Extent to Which Large Language Models Understand the Real World

Bruce G24 Feb 2023 20:20 UTC
4 points
7 comments8 min readLW link

Evil autocomplete: Existential Risk and Next-Token Predictors

Yitz28 Feb 2023 8:47 UTC
9 points
3 comments5 min readLW link

How truthful can LLMs be: a theoretical perspective with a request for help from experts on Theoretical CS

sergia1 Mar 2023 18:39 UTC
3 points
7 comments3 min readLW link

Reflection Mechanisms as an Alignment Target—Attitudes on “near-term” AI

2 Mar 2023 4:29 UTC
20 points
0 comments8 min readLW link

The Waluigi Effect (mega-post)

Cleo Nardo3 Mar 2023 3:22 UTC
615 points
187 comments16 min readLW link

Situational awareness in Large Language Models

Simon Möller3 Mar 2023 18:59 UTC
28 points
2 comments7 min readLW link

Google's PaLM-E: An Embodied Multimodal Language Model

SandXbox7 Mar 2023 4:11 UTC
86 points
7 comments1 min readLW link
(palm-e.github.io)

The View from 30,000 Feet: Preface to the Second EleutherAI Retrospective

7 Mar 2023 16:22 UTC
14 points
0 comments4 min readLW link
(blog.eleuther.ai)

Language models are not inherently safe

Olli Järviniemi7 Mar 2023 21:15 UTC
11 points
1 comment3 min readLW link

Against LLM Reductionism

Erich_Grunewald8 Mar 2023 15:52 UTC
137 points
16 comments18 min readLW link
(www.erichgrunewald.com)

Stop calling it “jailbreaking” ChatGPT

Templarrr10 Mar 2023 11:41 UTC
10 points
9 comments2 min readLW link

The issue of meaning in large language models (LLMs)

Bill Benzon11 Mar 2023 23:00 UTC
1 point
34 comments8 min readLW link

GPT can write Quines now (GPT-4)

Andrew_Critch14 Mar 2023 19:18 UTC
111 points
30 comments1 min readLW link

Nokens: A potential method of investigating glitch tokens

Hoagy15 Mar 2023 16:23 UTC
18 points
0 comments4 min readLW link

ChatGPT (and now GPT4) is very easily distracted from its rules

dmcs15 Mar 2023 17:55 UTC
178 points
41 comments1 min readLW link

[Question] Will 2023 be the last year you can write short stories and receive most of the intellectual credit for writing them?

lc16 Mar 2023 21:36 UTC
20 points
11 comments1 min readLW link

Gradual takeoff, fast failure

Max H16 Mar 2023 22:02 UTC
15 points
4 comments5 min readLW link

Super-Luigi = Luigi + (Luigi—Waluigi)

Alexei17 Mar 2023 15:27 UTC
16 points
9 comments1 min readLW link

[Question] Are nested jailbreaks inevitable?

judson17 Mar 2023 17:43 UTC
1 point
0 comments1 min readLW link

Instantiating an agent with GPT-4 and text-davinci-003

Max H19 Mar 2023 23:57 UTC
13 points
3 comments32 min readLW link

What does it mean for an LLM such as GPT to be aligned / good / positive impact?

PashaKamyshev20 Mar 2023 9:21 UTC
4 points
3 comments10 min readLW link

RLHF does not appear to differentially cause mode-collapse

20 Mar 2023 15:39 UTC
95 points
9 comments3 min readLW link

Emergent Analogical Reasoning in Large Language Models

Roman Leventov22 Mar 2023 5:18 UTC
13 points
2 comments1 min readLW link
(arxiv.org)

GPT-4 aligning with acasual decision theory when instructed to play games, but includes a CDT explanation that's incorrect if they differ

Christopher King23 Mar 2023 16:16 UTC
7 points
4 comments8 min readLW link

Does GPT-4 exhibit agency when summarizing articles?

Christopher King24 Mar 2023 15:49 UTC
16 points
2 comments5 min readLW link

More experiments in GPT-4 agency: writing memos

Christopher King24 Mar 2023 17:51 UTC
5 points
2 comments10 min readLW link

Hutter-Prize for Prompts

rokosbasilisk24 Mar 2023 21:26 UTC
5 points
10 comments1 min readLW link

Chronostasis: The Time-Capsule Conundrum of Language Models

RationalMindset26 Mar 2023 18:54 UTC
−5 points
0 comments1 min readLW link

If it quacks like a duck...

RationalMindset26 Mar 2023 18:54 UTC
−4 points
0 comments4 min readLW link

Sentience in Machines—How Do We Test for This Objectively?

Mayowa Osibodu26 Mar 2023 18:56 UTC
−2 points
0 comments2 min readLW link
(www.researchgate.net)

LLM Modularity: The Separability of Capabilities in Large Language Models

NickyP26 Mar 2023 21:57 UTC
97 points
3 comments41 min readLW link

CAIS-inspired approach towards safer and more interpretable AGIs

Peter Hroššo27 Mar 2023 14:36 UTC
13 points
7 comments1 min readLW link

GPT-4 is bad at strategic thinking

Christopher King27 Mar 2023 15:11 UTC
22 points
8 comments1 min readLW link

the tensor is a lonely place

jml627 Mar 2023 18:22 UTC
−11 points
0 comments4 min readLW link
(ekjsgrjelrbno.substack.com)

Three of my beliefs about upcoming AGI

Robert_AIZI27 Mar 2023 20:27 UTC
6 points
0 comments3 min readLW link
(aizi.substack.com)

The Prospect of an AI Winter

Erich_Grunewald27 Mar 2023 20:55 UTC
62 points
24 comments15 min readLW link
(www.erichgrunewald.com)

Adapting to Change: Overcoming Chronostasis in AI Language Models

RationalMindset28 Mar 2023 14:32 UTC
−1 points
0 comments6 min readLW link

[Question] Why no major LLMs with memory?

Kaj_Sotala28 Mar 2023 16:34 UTC
41 points
15 comments1 min readLW link

Corrigibility, Self-Deletion, and Identical Strawberries

Robert_AIZI28 Mar 2023 16:54 UTC
8 points
2 comments6 min readLW link
(aizi.substack.com)

[Question] Which parts of the existing internet are already likely to be in (GPT-5/other soon-to-be-trained LLMs)'s training corpus?

AnnaSalamon29 Mar 2023 5:17 UTC
49 points
2 comments1 min readLW link

Role Architectures: Applying LLMs to consequential tasks

Eric Drexler30 Mar 2023 15:00 UTC
53 points
7 comments9 min readLW link

The Quantization Model of Neural Scaling

nz31 Mar 2023 16:02 UTC
17 points
0 comments1 min readLW link
(arxiv.org)

GPT-4 busted? Clear self-interest when summarizing articles about itself vs when article talks about Claude, LLaMA, or DALL·E 2

Christopher King31 Mar 2023 17:05 UTC
6 points
4 comments4 min readLW link

Imagine a world where Microsoft employees used Bing

Christopher King31 Mar 2023 18:36 UTC
6 points
2 comments2 min readLW link

AI Safety via Luck

Jozdien1 Apr 2023 20:13 UTC
74 points
6 comments11 min readLW link

Why I Think the Current Trajectory of AI Research has Low P(doom) - LLMs

GaPa1 Apr 2023 20:35 UTC
2 points
1 comment10 min readLW link

Invocations: The Other Capabilities Overhang?

Robert_AIZI4 Apr 2023 13:38 UTC
29 points
4 comments4 min readLW link
(aizi.substack.com)

[Question] Where to begin in ML/AI?

Jake the Student6 Apr 2023 20:45 UTC
8 points
4 comments1 min readLW link

Pre-registering a study

Robert_AIZI7 Apr 2023 15:46 UTC
10 points
0 comments6 min readLW link
(aizi.substack.com)

Upcoming Changes in Large Language Models

Andrew Keenan Richardson8 Apr 2023 3:41 UTC
43 points
8 comments4 min readLW link
(mechanisticmind.com)

Contra LeCun on “Autoregressive LLMs are doomed”

rotatingpaguro10 Apr 2023 4:05 UTC
19 points
18 comments8 min readLW link

LW is probably not the place for “I asked this LLM (x) and here's what it said!”, but where is?

lillybaeum12 Apr 2023 10:12 UTC
21 points
3 comments1 min readLW link

Scaffolded LLMs as natural language computers

beren12 Apr 2023 10:47 UTC
92 points
10 comments11 min readLW link

[Question] Goals of model vs. goals of simulacra?

dr_s12 Apr 2023 13:02 UTC
5 points
7 comments1 min readLW link

Natural language alignment

Jacy Reese Anthis12 Apr 2023 19:02 UTC
30 points
2 comments2 min readLW link

Was Homer a stochastic parrot? Meaning in literary texts and LLMs

Bill Benzon13 Apr 2023 16:44 UTC
7 points
4 comments3 min readLW link

LLMs and hallucination, like white on rice?

Bill Benzon14 Apr 2023 19:53 UTC
5 points
0 comments3 min readLW link

The ‘ petertodd’ phenomenon

mwatkins15 Apr 2023 0:59 UTC
176 points
50 comments38 min readLW link

SmartyHeaderCode: anomalous tokens for GPT3.5 and GPT-4

AdamYedidia15 Apr 2023 22:35 UTC
71 points
18 comments6 min readLW link

The Soul of the Writer (on LLMs, the psychology of writers, and the nature of intelligence)

rogersbacon16 Apr 2023 16:02 UTC
11 points
1 comment3 min readLW link
(www.secretorum.life)

An alternative of PPO towards alignment

ml hkust17 Apr 2023 17:58 UTC
2 points
2 comments4 min readLW link

No, really, it predicts next tokens.

simon18 Apr 2023 3:47 UTC
58 points
37 comments3 min readLW link

Language Models are a Potentially Safe Path to Human-Level AGI

Nadav Brandes20 Apr 2023 0:40 UTC
28 points
6 comments8 min readLW link

A poem written by a fancy autocomplete

Christopher King20 Apr 2023 2:31 UTC
1 point
0 comments1 min readLW link

Proposal: Using Monte Carlo tree search instead of RLHF for alignment research

Christopher King20 Apr 2023 19:57 UTC
2 points
7 comments3 min readLW link

Readability is mostly a waste of characters

vlad.proex21 Apr 2023 22:05 UTC
21 points
7 comments3 min readLW link

[Question] Could transformer network models learn motor planning like they can learn language and image generation?

mu_(negative)23 Apr 2023 17:24 UTC
2 points
4 comments1 min readLW link

Do LLMs dream of emergent sheep?

shminux24 Apr 2023 3:26 UTC
15 points
2 comments1 min readLW link

A response to Conjecture's CoEm proposal

Kristian Freed24 Apr 2023 17:23 UTC
7 points
0 comments4 min readLW link

Implementing a Transformer from scratch in PyTorch—a write-up on my experience

Mislav Jurić25 Apr 2023 20:51 UTC
16 points
0 comments10 min readLW link

LM Situational Awareness, Evaluation Proposal: Violating Imitation

Jacob Pfau26 Apr 2023 22:53 UTC
13 points
2 comments2 min readLW link

Romance, misunderstanding, social stances, and the human LLM

Kaj_Sotala27 Apr 2023 12:59 UTC
69 points
32 comments16 min readLW link

AI doom from an LLM-plateau-ist perspective

Steven Byrnes27 Apr 2023 13:58 UTC
144 points
23 comments6 min readLW link

What are the limits of superintelligence?

rainy27 Apr 2023 18:29 UTC
4 points
3 comments5 min readLW link

LLMs and computation complexity

Jonathan Marcus28 Apr 2023 17:48 UTC
55 points
29 comments5 min readLW link

Finding Neurons in a Haystack: Case Studies with Sparse Probing

3 May 2023 13:30 UTC
30 points
5 comments2 min readLW link
(arxiv.org)

Residual stream norms grow exponentially over the forward pass

7 May 2023 0:46 UTC
72 points
24 comments11 min readLW link

LLM cognition is probably not human-like

Max H8 May 2023 1:22 UTC
26 points
14 comments7 min readLW link

A Search for More ChatGPT / GPT-3.5 / GPT-4 “Unspeakable” Glitch Tokens

Martin Fell9 May 2023 14:36 UTC
22 points
9 comments6 min readLW link

Language models can explain neurons in language models

nz9 May 2023 17:29 UTC
23 points
0 comments1 min readLW link
(openai.com)

New OpenAI Paper—Language models can explain neurons in language models

ViktorThink10 May 2023 7:46 UTC
47 points
14 comments1 min readLW link

Steering GPT-2-XL by adding an activation vector

13 May 2023 18:42 UTC
416 points
97 comments50 min readLW link

PCAST Working Group on Generative AI Invites Public Input

Christopher King13 May 2023 22:49 UTC
7 points
0 comments1 min readLW link
(terrytao.wordpress.com)

LLM Guardrails Should Have Better Customer Service Tuning

Jiao Bu13 May 2023 22:54 UTC
2 points
0 comments2 min readLW link

My current workflow to study the internal mechanisms of LLM

Yulu Pi16 May 2023 15:27 UTC
3 points
0 comments1 min readLW link

[Question] Is there a ‘time series forecasting’ equivalent of AIXI?

Solenoid_Entity17 May 2023 4:35 UTC
12 points
2 comments1 min readLW link

Microsoft and Google using LLMs for Cybersecurity

Phosphorous18 May 2023 17:42 UTC
6 points
0 comments5 min readLW link

The Compleat Cybornaut

19 May 2023 8:44 UTC
64 points
2 comments16 min readLW link

Seeing Ghosts by GPT-4

Christopher King20 May 2023 0:11 UTC
−13 points
0 comments1 min readLW link

Transformer Architecture Choice for Resisting Prompt Injection and Jail-Breaking Attacks

RogerDearnaley21 May 2023 8:29 UTC
9 points
1 comment4 min readLW link

Why I Believe LLMs Do Not Have Human-like Emotions

OneManyNone22 May 2023 15:46 UTC
8 points
6 comments7 min readLW link

Data and “tokens” a 30 year old human “trains” on

Jose Miguel Cruz y Celis23 May 2023 5:34 UTC
15 points
15 comments1 min readLW link

Aligning an H-JEPA agent via training on the outputs of an LLM-based “exemplary actor”

Roman Leventov29 May 2023 11:08 UTC
12 points
10 comments30 min readLW link

An LLM-based “exemplary actor”

Roman Leventov29 May 2023 11:12 UTC
16 points
0 comments12 min readLW link

LIMA: Less Is More for Alignment

Ulisse Mini30 May 2023 17:10 UTC
16 points
6 comments1 min readLW link
(arxiv.org)

PaLM-2 & GPT-4 in “Extrapolating GPT-N performance”

Lukas Finnveden30 May 2023 18:33 UTC
55 points
6 comments6 min readLW link

Programming AGI is impossible

Áron Ecsenyi30 May 2023 23:05 UTC
1 point
0 comments4 min readLW link

“LLMs Don't Have a Coherent Model of the World”—What it Means, Why it Matters

Davidmanheim1 Jun 2023 7:46 UTC
30 points
2 comments7 min readLW link

Open Source LLMs Can Now Actively Lie

Josh Levy1 Jun 2023 22:03 UTC
6 points
0 comments3 min readLW link

Unfaithful Explanations in Chain-of-Thought Prompting

miles3 Jun 2023 0:22 UTC
38 points
8 comments7 min readLW link

LEAst-squares Concept Erasure (LEACE)

tricky_labyrinth7 Jun 2023 21:51 UTC
68 points
10 comments1 min readLW link
(twitter.com)

[Linkpost] Scaling laws for language encoding models in fMRI

Bogdan Ionut Cirstea8 Jun 2023 10:52 UTC
30 points
0 comments1 min readLW link

[Linkpost] Large Language Models Converge on Brain-Like Word Representations

Bogdan Ionut Cirstea11 Jun 2023 11:20 UTC
36 points
12 comments1 min readLW link

MetaAI: less is less for alignment.

Cleo Nardo13 Jun 2023 14:08 UTC
68 points
17 comments5 min readLW link

[Linkpost] Mapping Brains with Language Models: A Survey

Bogdan Ionut Cirstea16 Jun 2023 9:49 UTC
5 points
0 comments1 min readLW link

[Linkpost] Faith and Fate: Limits of Transformers on Compositionality

Joe Kwon16 Jun 2023 15:04 UTC
19 points
4 comments1 min readLW link
(arxiv.org)

Experiments in Evaluating Steering Vectors

Gytis Daujotas19 Jun 2023 15:11 UTC
32 points
3 comments4 min readLW link

OpenAI introduces function calling for GPT-4

20 Jun 2023 1:58 UTC
24 points
3 comments4 min readLW link
(openai.com)

Relational Speaking

jefftk21 Jun 2023 14:40 UTC
11 points
0 comments2 min readLW link
(www.jefftk.com)

“textbooks are all you need”

bhauth21 Jun 2023 17:06 UTC
65 points
18 comments2 min readLW link
(arxiv.org)

Using Claude to convert dialog transcripts into great posts?

mako yass21 Jun 2023 20:19 UTC
6 points
4 comments4 min readLW link

Challenge proposal: smallest possible self-hardening backdoor for RLHF

Christopher King29 Jun 2023 16:56 UTC
7 points
0 comments2 min readLW link

Elements of Computational Philosophy, Vol. I: Truth

1 Jul 2023 11:44 UTC
11 points
6 comments1 min readLW link
(compphil.github.io)

[Linkpost] A shared linguistic space for transmitting our thoughts from brain to brain in natural conversations

Bogdan Ionut Cirstea1 Jul 2023 13:57 UTC
17 points
2 comments1 min readLW link

Douglas Hofstadter changes his mind on Deep Learning & AI risk (June 2023)?

gwern3 Jul 2023 0:48 UTC
410 points
54 comments7 min readLW link
(www.youtube.com)

The world where LLMs are possible

Ape in the coat10 Jul 2023 8:00 UTC
20 points
10 comments3 min readLW link

Goal-Direction for Simulated Agents

Raymond D12 Jul 2023 17:06 UTC
33 points
2 comments6 min readLW link

Unsafe AI as Dynamical Systems

Robert_AIZI14 Jul 2023 15:31 UTC
11 points
0 comments3 min readLW link
(aizi.substack.com)

Activation adding experiments with llama-7b

Nina Rimsky16 Jul 2023 4:17 UTC
49 points
1 comment3 min readLW link

Quick Thoughts on Language Models

RohanS18 Jul 2023 20:38 UTC
6 points
0 comments4 min readLW link

Speculative inferences about path dependence in LLM supervised fine-tuning from results on linear mode connectivity and model souping

RobertKirk20 Jul 2023 9:56 UTC
38 points
2 comments5 min readLW link

Case for Foundation Models beyond English

Varshul Gupta21 Jul 2023 13:59 UTC
1 point
0 comments3 min readLW link
(dubverseblack.substack.com)

GPTs' ability to keep a secret is weirdly prompt-dependent

22 Jul 2023 12:21 UTC
31 points
0 comments9 min readLW link

Anticipation in LLMs

derek shiller24 Jul 2023 15:53 UTC
6 points
0 comments13 min readLW link

How LLMs are and are not myopic

janus25 Jul 2023 2:19 UTC
122 points
14 comments8 min readLW link

GPT-4 can catch subtle cross-language translation mistakes

Michael Tontchev27 Jul 2023 1:39 UTC
7 points
1 comment1 min readLW link

Reducing sycophancy and improving honesty via activation steering

Nina Rimsky28 Jul 2023 2:46 UTC
116 points
14 comments9 min readLW link

AI Awareness through Interaction with Blatantly Alien Models

VojtaKovarik28 Jul 2023 8:41 UTC
7 points
5 comments3 min readLW link

Universal and Transferable Adversarial Attacks on Aligned Language Models [paper link]

Sodium29 Jul 2023 3:21 UTC
16 points
0 comments1 min readLW link
(arxiv.org)

Watermarking considered overrated?

DanielFilan31 Jul 2023 21:36 UTC
18 points
4 comments1 min readLW link

[Linkpost] Deception Abilities Emerged in Large Language Models

Bogdan Ionut Cirstea3 Aug 2023 17:28 UTC
12 points
0 comments1 min readLW link

[Linkpost] Multimodal Neurons in Pretrained Text-Only Transformers

Bogdan Ionut Cirstea4 Aug 2023 15:29 UTC
11 points
0 comments1 min readLW link

Exploring the Multiverse of Large Language Models

franky6 Aug 2023 2:38 UTC
1 point
0 comments5 min readLW link

Model Organisms of Misalignment: The Case for a New Pillar of Alignment Research

8 Aug 2023 1:30 UTC
305 points
26 comments18 min readLW link

A Simple Theory Of Consciousness

SherlockHolmes8 Aug 2023 18:05 UTC
2 points
5 comments1 min readLW link
(peterholmes.medium.com)

Inflection.ai is a major AGI lab

nikola9 Aug 2023 1:05 UTC
137 points
13 comments2 min readLW link

Modulating sycophancy in an RLHF model via activation steering

Nina Rimsky9 Aug 2023 7:06 UTC
64 points
20 comments12 min readLW link

Google DeepMind's RT-2

SandXbox11 Aug 2023 11:26 UTC
9 points
1 comment1 min readLW link
(robotics-transformer2.github.io)

Coherence Therapy with LLMs—quick demo

Chipmonk14 Aug 2023 3:34 UTC
19 points
11 comments1 min readLW link

[Question] Any research in “probe-tuning” of LLMs?

Roman Leventov15 Aug 2023 21:01 UTC
20 points
3 comments1 min readLW link

Memetic Judo #3: The Intelligence of Stochastic Parrots v.2

Max TK20 Aug 2023 15:18 UTC
8 points
33 comments6 min readLW link

Large Language Models will be Great for Censorship

Ethan Edwards21 Aug 2023 19:03 UTC
183 points
14 comments8 min readLW link
(ethanedwards.substack.com)

[Question] Would it be useful to collect the contexts, where various LLMs think the same?

Martin Vlach24 Aug 2023 22:01 UTC
6 points
1 comment1 min readLW link

Xanadu, GPT, and Beyond: An adventure of the mind

Bill Benzon27 Aug 2023 16:19 UTC
2 points
0 comments5 min readLW link

An Interpretability Illusion for Activation Patching of Arbitrary Subspaces

29 Aug 2023 1:04 UTC
74 points
3 comments1 min readLW link

An adversarial example for Direct Logit Attribution: memory management in gelu-4l

30 Aug 2023 17:36 UTC
17 points
0 comments8 min readLW link
(arxiv.org)

Report on Analyzing Connotation Frames in Evolving Wikipedia Biographies

Maira30 Aug 2023 22:02 UTC
1 point
0 comments4 min readLW link

Can an LLM identify ring-composition in a literary text? [ChatGPT]

Bill Benzon1 Sep 2023 14:18 UTC
4 points
2 comments11 min readLW link

[Linkpost] Large language models converge toward human-like concept organization

Bogdan Ionut Cirstea2 Sep 2023 6:00 UTC
22 points
1 comment1 min readLW link

What must be the case that ChatGPT would have memorized “To be or not to be”? – Three kinds of conceptual objects for LLMs

Bill Benzon3 Sep 2023 18:39 UTC
19 points
0 comments12 min readLW link

Paper: On measuring situational awareness in LLMs

4 Sep 2023 12:54 UTC
106 points
16 comments5 min readLW link
(arxiv.org)

World, mind, and learnability: A note on the metaphysical structure of the cosmos [& LLMs]

Bill Benzon5 Sep 2023 12:19 UTC
4 points
1 comment5 min readLW link

ActAdd: Steering Language Models without Optimization

6 Sep 2023 17:21 UTC
105 points
3 comments2 min readLW link
(arxiv.org)

Automatically finding feature vectors in the OV circuits of Transformers without using probing

Jacob Dunefsky12 Sep 2023 17:38 UTC
13 points
0 comments29 min readLW link

Uncovering Latent Human Wellbeing in LLM Embeddings

14 Sep 2023 1:40 UTC
32 points
7 comments8 min readLW link
(far.ai)

[untitled post]

verwindung14 Sep 2023 16:22 UTC
1 point
0 comments1 min readLW link

Can I take ducks home from the park?

dynomight14 Sep 2023 21:03 UTC
64 points
8 comments3 min readLW link
(dynomight.net)

Image Hijacks: Adversarial Images can Control Generative Models at Runtime

20 Sep 2023 15:23 UTC
58 points
9 comments1 min readLW link
(arxiv.org)

Notes on ChatGPT's “memory” for strings and for events

Bill Benzon20 Sep 2023 18:12 UTC
3 points
0 comments10 min readLW link

Sparse Autoencoders Find Highly Interpretable Directions in Language Models

21 Sep 2023 15:30 UTC
154 points
7 comments5 min readLW link

A quick remark on so-called “hallucinations” in LLMs and humans

Bill Benzon23 Sep 2023 12:17 UTC
4 points
4 comments1 min readLW link

Paper: LLMs trained on “A is B” fail to learn “B is A”

23 Sep 2023 19:55 UTC
120 points
73 comments4 min readLW link
(arxiv.org)

Evaluating hidden directions on the utility dataset: classification, steering and removal

25 Sep 2023 17:19 UTC
25 points
3 comments7 min readLW link

Discursive Competence in ChatGPT, Part 2: Memory for Texts

Bill Benzon28 Sep 2023 16:34 UTC
1 point
0 comments3 min readLW link

Expectations for Gemini: hopefully not a big deal

Maxime Riché2 Oct 2023 15:38 UTC
15 points
5 comments1 min readLW link

Some Quick Follow-Up Experiments to “Taken out of context: On measuring situational awareness in LLMs”

miles3 Oct 2023 2:22 UTC
31 points
0 comments9 min readLW link

What would it mean to understand how a large language model (LLM) works? Some quick notes.

Bill Benzon3 Oct 2023 15:11 UTC
20 points
4 comments8 min readLW link

Graphical tensor notation for interpretability

Jordan Taylor4 Oct 2023 8:04 UTC
132 points
8 comments19 min readLW link

Entanglement and intuition about words and meaning

Bill Benzon4 Oct 2023 14:16 UTC
4 points
0 comments2 min readLW link

[Question] What evidence is there of LLM's containing world models?

Chris_Leong4 Oct 2023 14:33 UTC
17 points
17 comments1 min readLW link

I don't find the lie detection results that surprising (by an author of the paper)

JanB4 Oct 2023 17:10 UTC
97 points
8 comments3 min readLW link

An explanation for every token: using an LLM to sample another LLM

Max H11 Oct 2023 0:53 UTC
34 points
4 comments11 min readLW link

LLMs — Pure Reason Without The Critique

Rosco-Hunter11 Oct 2023 13:11 UTC
5 points
0 comments3 min readLW link

Understanding LLMs: Some basic observations about words, syntax, and discourse [w/ a conjecture about grokking]

Bill Benzon11 Oct 2023 19:13 UTC
5 points
0 comments5 min readLW link

ChatGPT tells 20 versions of its prototypical story, with a short note on method

Bill Benzon14 Oct 2023 15:27 UTC
6 points
0 comments5 min readLW link

Mapping ChatGPT's ontological landscape, gradients and choices [interpretability]

Bill Benzon15 Oct 2023 20:12 UTC
1 point
0 comments18 min readLW link

ChatGPT Plays 20 Questions [sometimes needs help]

Bill Benzon17 Oct 2023 17:30 UTC
5 points
3 comments12 min readLW link

Eleuther releases Llemma: An Open Language Model For Mathematics

mako yass17 Oct 2023 20:03 UTC
22 points
0 comments1 min readLW link
(blog.eleuther.ai)

Revealing Intentionality In Language Models Through AdaVAE Guided Sampling

jdp20 Oct 2023 7:32 UTC
117 points
14 comments22 min readLW link

Are (at least some) Large Language Models Holographic Memory Stores?

Bill Benzon20 Oct 2023 13:07 UTC
11 points
4 comments6 min readLW link

Alignment Implications of LLM Successes: a Debate in One Act

Zack_M_Davis21 Oct 2023 15:22 UTC
238 points
50 comments13 min readLW link

VLM-RM: Specifying Rewards with Natural Language

23 Oct 2023 14:11 UTC
20 points
2 comments5 min readLW link
(far.ai)

Machine Unlearning Evaluations as Interpretability Benchmarks

23 Oct 2023 16:33 UTC
33 points
2 comments11 min readLW link

Towards Understanding Sycophancy in Language Models

24 Oct 2023 0:30 UTC
65 points
0 comments2 min readLW link
(arxiv.org)

Compositional preference models for aligning LMs

Tomek Korbak25 Oct 2023 12:17 UTC
18 points
2 comments5 min readLW link

Send LLMs to School: Instruction Tuning with Human Curriculum

Bruce W. Lee31 Oct 2023 0:07 UTC
4 points
0 comments5 min readLW link

Robustness of Contrast-Consistent Search to Adversarial Prompting

1 Nov 2023 12:46 UTC
15 points
1 comment7 min readLW link

ChatGPT's Ontological Landscape

Bill Benzon1 Nov 2023 15:12 UTC
7 points
0 comments4 min readLW link

Preface to the Sequence on LLM Psychology

Quentin FEUILLADE--MONTIXI7 Nov 2023 16:12 UTC
31 points
0 comments2 min readLW link

The Stochastic Parrot Hypothesis is debatable for the last generation of LLMs

7 Nov 2023 16:12 UTC
50 points
20 comments6 min readLW link

Scalable And Transferable Black-Box Jailbreaks For Language Models Via Persona Modulation

7 Nov 2023 17:59 UTC
36 points
2 comments2 min readLW link
(arxiv.org)

What's going on? LLMs and IS-A sentences

Bill Benzon8 Nov 2023 16:58 UTC
6 points
15 comments4 min readLW link

Polysemantic Attention Head in a 4-Layer Transformer

9 Nov 2023 16:16 UTC
46 points
0 comments6 min readLW link

Linear encoding of character-level information in GPT-J token embeddings

10 Nov 2023 22:19 UTC
34 points
4 comments28 min readLW link

AISC Project: Modelling Trajectories of Language Models

NickyP13 Nov 2023 14:33 UTC
25 points
0 comments12 min readLW link

Is Interpretability All We Need?

RogerDearnaley14 Nov 2023 5:31 UTC
1 point
1 comment1 min readLW link

LLMs May Find It Hard to FOOM

RogerDearnaley15 Nov 2023 2:52 UTC
11 points
30 comments12 min readLW link

A conceptual precursor to today's language machines [Shannon]

Bill Benzon15 Nov 2023 13:50 UTC
24 points
6 comments2 min readLW link

Extrapolating from Five Words

Gordon Seidoh Worley15 Nov 2023 23:21 UTC
40 points
11 comments2 min readLW link

Towards Evaluating AI Systems for Moral Status Using Self-Reports

16 Nov 2023 20:18 UTC
45 points
3 comments1 min readLW link
(arxiv.org)

Classifying representations of sparse autoencoders (SAEs)

Annah17 Nov 2023 13:54 UTC
15 points
6 comments2 min readLW link

AISC project: TinyEvals

Jett22 Nov 2023 20:47 UTC
17 points
0 comments4 min readLW link

An Idea on How LLMs Can Show Self-Serving Bias

Bruce W. Lee23 Nov 2023 20:25 UTC
6 points
6 comments3 min readLW link

How to Control an LLM's Behavior (why my P(DOOM) went down)

RogerDearnaley28 Nov 2023 19:56 UTC
64 points
30 comments11 min readLW link

Researchers and writers can apply for proxy access to the GPT-3.5 base model (code-davinci-002)

ampdot1 Dec 2023 18:48 UTC
14 points
0 comments1 min readLW link
(airtable.com)

The Method of Loci: With some brief remarks, including transformers and evaluating AIs

Bill Benzon2 Dec 2023 14:36 UTC
6 points
0 comments3 min readLW link

Interview with Vanessa Kosoy on the Value of Theoretical Research for AI

WillPetillo4 Dec 2023 22:58 UTC
35 points
0 comments35 min readLW link

Studying The Alien Mind

5 Dec 2023 17:27 UTC
75 points
10 comments15 min readLW link

Language Model Memorization, Copyright Law, and Conditional Pretraining Alignment

RogerDearnaley7 Dec 2023 6:14 UTC
3 points
0 comments11 min readLW link

LLM keys—A Proposal of a Solution to Prompt Injection Attacks

Peter Hroššo7 Dec 2023 17:36 UTC
1 point
2 comments1 min readLW link

Does Chat-GPT display ‘Scope Insensitivity'?

callum7 Dec 2023 18:58 UTC
11 points
0 comments3 min readLW link

Refusal mechanisms: initial experiments with Llama-2-7b-chat

8 Dec 2023 17:08 UTC
78 points
7 comments7 min readLW link

Finding Sparse Linear Connections between Features in LLMs

9 Dec 2023 2:27 UTC
66 points
5 comments10 min readLW link

Categorical Organization in Memory: ChatGPT Organizes the 665 Topic Tags from My New Savanna Blog

Bill Benzon14 Dec 2023 13:02 UTC
0 points
6 comments2 min readLW link

Mapping the semantic void: Strange goings-on in GPT embedding spaces

mwatkins14 Dec 2023 13:10 UTC
114 points
30 comments14 min readLW link

A visual analogy for text generation by LLMs?

Bill Benzon16 Dec 2023 17:58 UTC
3 points
0 comments1 min readLW link

Discussion: Challenges with Unsupervised LLM Knowledge Discovery

18 Dec 2023 11:58 UTC
145 points
21 comments10 min readLW link

SociaLLM: proposal for a language model design for personalised apps, social science, and AI safety research

Roman Leventov19 Dec 2023 16:49 UTC
17 points
5 comments3 min readLW link

Paper: Tell, Don't Show- Declarative facts influence how LLMs generalize

19 Dec 2023 19:14 UTC
45 points
4 comments6 min readLW link
(arxiv.org)

On the future of language models

owencb20 Dec 2023 16:58 UTC
105 points
17 comments1 min readLW link

AI Safety Chatbot

21 Dec 2023 14:06 UTC
48 points
11 comments4 min readLW link

Exploring the Residual Stream of Transformers for Mechanistic Interpretability — Explained

Zeping Yu26 Dec 2023 0:36 UTC
7 points
1 comment11 min readLW link

AGI will be made of heterogeneous components, Transformer and Selective SSM blocks will be among them

Roman Leventov27 Dec 2023 14:51 UTC
33 points
9 comments4 min readLW link

The future of Humans: Operators of AI

François-Joseph Lacroix30 Dec 2023 23:46 UTC
1 point
0 comments1 min readLW link
(medium.com)

Does ChatGPT know what a tragedy is?

Bill Benzon31 Dec 2023 7:10 UTC
2 points
4 comments5 min readLW link

What's up with LLMs representing XORs of arbitrary features?

Sam Marks3 Jan 2024 19:44 UTC
154 points
61 comments16 min readLW link

Striking Implications for Learning Theory, Interpretability — and Safety?

RogerDearnaley5 Jan 2024 8:46 UTC
35 points
4 comments2 min readLW link

Benchmark Study #1: MMLU (Pile, MCQ)

Bruce W. Lee5 Jan 2024 21:35 UTC
10 points
0 comments5 min readLW link
(arxiv.org)

Benchmark Study #2: TruthfulQA (Task, MCQ)

Bruce W. Lee6 Jan 2024 2:39 UTC
11 points
2 comments4 min readLW link
(arxiv.org)

Benchmark Study #3: HellaSwag (Task, MCQ)

Bruce W. Lee7 Jan 2024 4:59 UTC
2 points
4 comments6 min readLW link
(arxiv.org)

Bench­mark Study #4: AI2 Rea­son­ing Challenge (Task(s), MCQ)

Bruce W. Lee7 Jan 2024 17:13 UTC
6 points
0 comments5 min readLW link

An­nounc­ing the Dou­ble Crux Bot

9 Jan 2024 18:54 UTC
39 points
3 comments3 min readLW link

Good­bye, Shog­goth: The Stage, its An­i­ma­tron­ics, & the Pup­peteer – a New Metaphor

RogerDearnaley9 Jan 2024 20:42 UTC
46 points
8 comments36 min readLW link

Three changes that I’m mak­ing to the Bench­mark Study Series

Bruce W. Lee10 Jan 2024 0:43 UTC
2 points
0 comments2 min readLW link

Mo­ti­vat­ing Align­ment of LLM-Pow­ered Agents: Easy for AGI, Hard for ASI?

RogerDearnaley11 Jan 2024 12:56 UTC
22 points
4 comments39 min readLW link

A Chi­nese Room Con­tain­ing a Stack of Stochas­tic Parrots

RogerDearnaley12 Jan 2024 6:29 UTC
18 points
2 comments5 min readLW link

Sparse Au­toen­coders Work on At­ten­tion Layer Outputs

16 Jan 2024 0:26 UTC
81 points
5 comments19 min readLW link

Maybe talk­ing isn’t the best way to com­mu­ni­cate with LLMs

mnvr17 Jan 2024 6:24 UTC
3 points
1 comment1 min readLW link
(mrmr.io)

Wor­ri­some mi­s­un­der­stand­ing of the core is­sues with AI transition

Roman Leventov18 Jan 2024 10:05 UTC
5 points
2 comments4 min readLW link

OpenAI Credit Ac­count (2510$)

Emirhan BULUT21 Jan 2024 2:30 UTC
1 point
0 comments1 min readLW link

Pre­dict­ing AGI by the Tur­ing Test

Yuxi_Liu22 Jan 2024 4:22 UTC
21 points
2 comments10 min readLW link
(yuxi-liu-wired.github.io)

In­terLab – a toolkit for ex­per­i­ments with multi-agent interactions

22 Jan 2024 18:23 UTC
57 points
0 comments8 min readLW link
(acsresearch.org)

‘ pe­ter­todd’’s last stand: The fi­nal days of open GPT-3 research

mwatkins22 Jan 2024 18:47 UTC
101 points
13 comments45 min readLW link

RAND re­port finds no effect of cur­rent LLMs on vi­a­bil­ity of bioter­ror­ism attacks

StellaAthena25 Jan 2024 19:17 UTC
94 points
14 comments1 min readLW link
(www.rand.org)

Why I take short timelines seriously

NicholasKees28 Jan 2024 22:27 UTC
115 points
29 comments4 min readLW link

The case for more am­bi­tious lan­guage model evals

Jozdien30 Jan 2024 0:01 UTC
104 points
25 comments5 min readLW link

Put­ting mul­ti­modal LLMs to the Tetris test

1 Feb 2024 16:02 UTC
30 points
5 comments7 min readLW link

Align­ment has a Basin of At­trac­tion: Beyond the Orthog­o­nal­ity Thesis

RogerDearnaley1 Feb 2024 21:15 UTC
4 points
15 comments13 min readLW link

At­ten­tion SAEs Scale to GPT-2 Small

3 Feb 2024 6:50 UTC
75 points
4 comments8 min readLW link

Im­ple­ment­ing ac­ti­va­tion steering

Annah5 Feb 2024 17:51 UTC
59 points
5 comments7 min readLW link

Bench­mark Study #5: So­cial In­tel­li­gence QA (Task, MCQ)

Bruce W. Lee7 Feb 2024 4:41 UTC
6 points
0 comments5 min readLW link
(arxiv.org)

De­bat­ing with More Per­sua­sive LLMs Leads to More Truth­ful Answers

7 Feb 2024 21:28 UTC
87 points
14 comments9 min readLW link
(arxiv.org)

What’s ChatGPT’s Fa­vorite Ice Cream Fla­vor? An In­ves­ti­ga­tion Into Syn­thetic Respondents

Greg Robison9 Feb 2024 18:38 UTC
19 points
4 comments15 min readLW link

And All the Shog­goths Merely Players

Zack_M_Davis10 Feb 2024 19:56 UTC
138 points
56 comments12 min readLW link

The Last Laugh: Ex­plor­ing the Role of Hu­mor as a Bench­mark for Large Lan­guage Models

Greg Robison12 Feb 2024 18:34 UTC
4 points
5 comments11 min readLW link

Re­quire­ments for a Basin of At­trac­tion to Alignment

RogerDearnaley14 Feb 2024 7:10 UTC
20 points
6 comments31 min readLW link

[Question] What ex­per­i­ment set­tles the Gary Mar­cus vs Ge­offrey Hin­ton de­bate?

Valentin Baltadzhiev14 Feb 2024 9:06 UTC
12 points
8 comments1 min readLW link

Map­ping the se­man­tic void II: Above, be­low and be­tween to­ken em­bed­dings

mwatkins15 Feb 2024 23:00 UTC
31 points
4 comments10 min readLW link

Phal­lo­cen­tric­ity in GPT-J’s bizarre strat­ified ontology

mwatkins17 Feb 2024 0:16 UTC
53 points
36 comments9 min readLW link

In­duc­ing hu­man-like bi­ases in moral rea­son­ing LMs

20 Feb 2024 16:28 UTC
18 points
3 comments14 min readLW link

Re­search Post: Tasks That Lan­guage Models Don’t Learn

22 Feb 2024 18:52 UTC
39 points
23 comments2 min readLW link
(arxiv.org)

The role of philo­soph­i­cal think­ing in un­der­stand­ing large lan­guage mod­els: Cal­ibrat­ing and clos­ing the gap be­tween first-per­son ex­pe­rience and un­der­ly­ing mechanisms

Bill Benzon23 Feb 2024 12:19 UTC
4 points
0 comments10 min readLW link

In­stru­men­tal de­cep­tion and ma­nipu­la­tion in LLMs—a case study

Olli Järviniemi24 Feb 2024 2:07 UTC
33 points
13 comments12 min readLW link

[Question] Sup­pos­ing the 1bit LLM pa­per pans out

O O29 Feb 2024 5:31 UTC
27 points
11 comments1 min readLW link

Ap­proach­ing Hu­man-Level Fore­cast­ing with Lan­guage Models

29 Feb 2024 22:36 UTC
59 points
6 comments3 min readLW link

An­thropic re­leases Claude 3, claims >GPT-4 Performance

LawrenceC4 Mar 2024 18:23 UTC
114 points
40 comments2 min readLW link
(www.anthropic.com)

Claude 3 claims it’s con­scious, doesn’t want to die or be modified

Mikhail Samin4 Mar 2024 23:05 UTC
66 points
99 comments14 min readLW link

Many ar­gu­ments for AI x-risk are wrong

TurnTrout5 Mar 2024 2:31 UTC
151 points
75 comments12 min readLW link

Claude Doesn’t Want to Die

garrison5 Mar 2024 6:00 UTC
21 points
3 comments1 min readLW link
(garrisonlovely.substack.com)

Re­search Re­port: Sparse Au­toen­coders find only 9/180 board state fea­tures in OthelloGPT

Robert_AIZI5 Mar 2024 13:55 UTC
52 points
24 comments10 min readLW link
(aizi.substack.com)

We In­spected Every Head In GPT-2 Small us­ing SAEs So You Don’t Have To

6 Mar 2024 5:03 UTC
56 points
0 comments12 min readLW link

Un­der­stand­ing SAE Fea­tures with the Logit Lens

11 Mar 2024 0:16 UTC
53 points
0 comments14 min readLW link

Bias-Aug­mented Con­sis­tency Train­ing Re­duces Bi­ased Rea­son­ing in Chain-of-Thought

miles11 Mar 2024 23:46 UTC
16 points
0 comments1 min readLW link
(arxiv.org)

[Question] Can any LLM be rep­re­sented as an Equa­tion?

Valentin Baltadzhiev14 Mar 2024 9:51 UTC
1 point
2 comments1 min readLW link

In­tro­duc­ing METR’s Au­ton­omy Eval­u­a­tion Resources

15 Mar 2024 23:16 UTC
90 points
0 comments1 min readLW link
(metr.github.io)

xAI re­leases Grok base model

g-w118 Mar 2024 0:47 UTC
7 points
3 comments1 min readLW link
(x.ai)

In­fer­ring the model di­men­sion of API-pro­tected LLMs

Ege Erdil18 Mar 2024 6:19 UTC
32 points
1 comment4 min readLW link
(arxiv.org)

[Linkpost] Vague Ver­biage in Forecasting

trevor22 Mar 2024 18:05 UTC
11 points
9 comments3 min readLW link
(goodjudgment.com)

Can quan­tised au­toen­coders find and in­ter­pret cir­cuits in lan­guage mod­els?

charlieoneill24 Mar 2024 20:05 UTC
31 points
1 comment24 min readLW link

[Question] Could LLMs Help Gen­er­ate New Con­cepts in Hu­man Lan­guage?

Pekka Lampelto24 Mar 2024 20:13 UTC
10 points
4 comments2 min readLW link

En­hanc­ing biose­cu­rity with lan­guage mod­els: defin­ing re­search directions

mic26 Mar 2024 12:30 UTC
12 points
0 comments1 min readLW link
(papers.ssrn.com)