The Case for Predictive Models

Rubi J. Hudson Apr 3, 2024, 6:22 PM
43 points
7 comments 8 min read LW link

Concrete empirical research projects in mechanistic anomaly detection

Apr 3, 2024, 11:07 PM
43 points
3 comments 10 min read LW link

List your AI X-Risk cruxes!

Aryeh Englander Apr 28, 2024, 6:26 PM
42 points
7 comments 2 min read LW link

Forget Everything (Statistical Mechanics Part 1)

J Bostock Apr 22, 2024, 1:33 PM
42 points
7 comments 3 min read LW link

Gated Attention Blocks: Preliminary Progress toward Removing Attention Head Superposition

Apr 8, 2024, 11:14 AM
42 points
4 comments 15 min read LW link

Notes on Dwarkesh Patel’s Podcast with Sholto Douglas and Trenton Bricken

Zvi Apr 1, 2024, 7:10 PM
41 points
1 comment 16 min read LW link
(thezvi.wordpress.com)

Scaling of AI training runs will slow down after GPT-5

Maxime Riché Apr 26, 2024, 4:05 PM
40 points
5 comments 3 min read LW link

Conflict in Posthuman Literature

Martín Soto Apr 6, 2024, 10:26 PM
40 points
1 comment 2 min read LW link
(twitter.com)

What’s up with all the non-Mormons? Weirdly specific universalities across LLMs

mwatkins Apr 19, 2024, 1:43 PM
40 points
13 comments 27 min read LW link

Dequantifying first-order theories

jessicata Apr 23, 2024, 7:04 PM
40 points
9 comments 8 min read LW link
(unstableontology.com)

AI Regulation is Unsafe

Maxwell Tabarrok Apr 22, 2024, 4:37 PM
40 points
41 comments 4 min read LW link
(www.maximum-progress.com)

Losing Faith In Contrarianism

Bentham's Bulldog Apr 25, 2024, 8:53 PM
39 points
44 comments 5 min read LW link

On what research policymakers actually need

MondSemmel Apr 23, 2024, 7:50 PM
38 points
0 comments 3 min read LW link
(www.slowboring.com)

Inducing Unprompted Misalignment in LLMs

Apr 19, 2024, 8:00 PM
38 points
7 comments 16 min read LW link

[Fiction] A Confession

Arjun Panickssery Apr 18, 2024, 4:28 PM
38 points
2 comments 5 min read LW link
(arjunpanickssery.substack.com)

Tinker

Richard_Ngo Apr 16, 2024, 6:26 PM
38 points
0 comments 1 min read LW link
(press.asimov.com)

Thousands of malicious actors on the future of AI misuse

Apr 1, 2024, 10:08 AM
37 points
0 comments 1 min read LW link

Medical Roundup #2

Zvi Apr 9, 2024, 1:40 PM
37 points
18 comments 16 min read LW link
(thezvi.wordpress.com)

Effectively Handling Disagreements—Introducing a New Workshop

Camille Berger Apr 15, 2024, 4:33 PM
37 points
2 comments 7 min read LW link

A High Decoupling Failure

Maxwell Tabarrok Apr 14, 2024, 7:46 PM
37 points
5 comments 3 min read LW link
(www.maximum-progress.com)

[Question] Is there software to practice reading expressions?

lsusr Apr 23, 2024, 9:53 PM
37 points
11 comments 1 min read LW link

WSJ: Inside Amazon’s Secret Operation to Gather Intel on Rivals

trevor Apr 23, 2024, 9:33 PM
37 points
5 comments 5 min read LW link
(www.wsj.com)

The Evolution of Humans Was Net-Negative for Human Values

Zack_M_Davis Apr 1, 2024, 4:01 PM
37 points
1 comment 2 min read LW link

Claude 3 Opus can operate as a Turing machine

Gunnar_Zarncke Apr 17, 2024, 8:41 AM
36 points
2 comments 1 min read LW link
(twitter.com)

Childhood and Education Roundup #5

Zvi Apr 17, 2024, 1:00 PM
36 points
3 comments 25 min read LW link
(thezvi.wordpress.com)

LessWrong: After Dark, a new side of LessWrong

So8res Apr 1, 2024, 10:44 PM
36 points
6 comments 1 min read LW link

How I select alignment research projects

Apr 10, 2024, 4:33 AM
36 points
4 comments 24 min read LW link

UDT1.01: Logical Inductors and Implicit Beliefs (5/10)

Diffractor Apr 18, 2024, 8:39 AM
34 points
2 comments 19 min read LW link

hydrogen tube transport

bhauth Apr 18, 2024, 10:47 PM
34 points
12 comments 5 min read LW link
(www.bhauth.com)

A quick experiment on LMs’ inductive biases in performing search

Alex Mallen Apr 14, 2024, 3:41 AM
32 points
2 comments 4 min read LW link

Protestants Trading Acausally

Martin Sustrik Apr 1, 2024, 2:46 PM
31 points
4 comments 1 min read LW link

Falling fertility explanations and Israel

Yair Halberstadt Apr 3, 2024, 3:27 AM
31 points
4 comments 2 min read LW link

Thoughts on Zero Points

depressurize Apr 23, 2024, 2:22 AM
31 points
1 comment 4 min read LW link
(sexandchicago.substack.com)

Good Bings copy, great Bings steal

dr_s Apr 21, 2024, 9:52 AM
31 points
6 comments 9 min read LW link

Quick evidence review of bulking & cutting

jp Apr 4, 2024, 9:43 PM
31 points
5 comments 4 min read LW link

UDT1.01: Plannable and Unplanned Observations (3/10)

Diffractor Apr 12, 2024, 5:24 AM
31 points
0 comments 7 min read LW link

New report: A review of the empirical evidence for existential risk from AI via misaligned power-seeking

Apr 4, 2024, 11:41 PM
31 points
5 comments 1 min read LW link
(blog.aiimpacts.org)

Announcing SPAR Summer 2024!

laurenmarie12 Apr 16, 2024, 8:30 AM
30 points
2 comments 1 min read LW link

AI #59: Model Updates

Zvi Apr 11, 2024, 2:20 PM
30 points
2 comments 63 min read LW link
(thezvi.wordpress.com)

Big-endian is better than little-endian

Menotim Apr 29, 2024, 2:30 AM
30 points
17 comments 3 min read LW link

The Poker Theory of Poker Night

omark Apr 7, 2024, 9:47 AM
29 points
13 comments 9 min read LW link
(www.codeandbugs.com)

End-to-end hacking with language models

tchauvin Apr 5, 2024, 3:06 PM
29 points
0 comments 8 min read LW link

Experiments with an alternative method to promote sparsity in sparse autoencoders

Eoin Farrell Apr 15, 2024, 6:21 PM
29 points
7 comments 12 min read LW link

Experience Report—ML4Good AI Safety Bootcamp

Kieron Kretschmar Apr 11, 2024, 6:03 PM
29 points
0 comments 4 min read LW link

Please Understand

samhealy Apr 1, 2024, 12:33 PM
28 points
11 comments 6 min read LW link

[Question] Is LLM Translation Without Rosetta Stone possible?

cubefox Apr 11, 2024, 12:36 AM
28 points
15 comments 1 min read LW link

{Book Summary} The Art of Gathering

Tristan Williams Apr 16, 2024, 10:48 AM
28 points
0 comments 13 min read LW link

Structured Transparency: a framework for addressing use/mis-use trade-offs when sharing information

habryka Apr 11, 2024, 6:35 PM
28 points
0 comments 2 min read LW link
(arxiv.org)

Ackshually, many worlds is wrong

tailcalled Apr 11, 2024, 8:23 PM
27 points
42 comments 4 min read LW link

On the 2nd CWT with Jonathan Haidt

Zvi Apr 5, 2024, 5:30 PM
27 points
3 comments 33 min read LW link
(thezvi.wordpress.com)