The AI Driver’s Licence—A Policy Proposal

Jul 21, 2024, 8:38 PM
0 points
1 comment · 19 min read · LW link

Demography and Destiny

Zero Contradictions · Jul 21, 2024, 8:34 PM
6 points
11 comments · 1 min read · LW link
(thewaywardaxolotl.blogspot.com)

The $100B plan with “70% risk of killing us all” w Stephen Fry [video]

Oleg Trott · Jul 21, 2024, 8:06 PM
35 points
8 comments · 1 min read · LW link
(www.youtube.com)

Raising Welfare for Lab Rodents

xanderbalwit · Jul 21, 2024, 7:18 PM
−2 points
0 comments · 1 min read · LW link
(press.asimov.com)

A simple model of math skill

Alex_Altair · Jul 21, 2024, 6:57 PM
101 points
16 comments · 8 min read · LW link

Using an LLM perplexity filter to detect weight exfiltration

Adam Karvonen · Jul 21, 2024, 6:18 PM
25 points
11 comments · 2 min read · LW link

[Question] Would a scope-insensitive AGI be less likely to incapacitate humanity?

Jim Buhler · Jul 21, 2024, 2:15 PM
2 points
3 comments · 1 min read · LW link

Holomorphic surjection theorem (Picard’s little theorem)

dkl9 · Jul 21, 2024, 1:24 PM
15 points
0 comments · 2 min read · LW link
(dkl9.net)

aimless ace analyzes active amateur: a micro-aaaaalignment proposal

lemonhope · Jul 21, 2024, 12:37 PM
12 points
0 comments · 1 min read · LW link

Pivotal Acts are easier than Alignment?

Michael Soareverix · Jul 21, 2024, 12:15 PM
2 points
4 comments · 1 min read · LW link

Ball Sq Pathways

jefftk · Jul 21, 2024, 2:20 AM
13 points
1 comment · 1 min read · LW link
(www.jefftk.com)

Freedom and Privacy of Thought Architectures

SebastianG · Jul 20, 2024, 9:43 PM
5 points
2 comments · 1 min read · LW link

Introduction to Modern Dating: Strategic Dating Advice for beginners

Jesper Lindholm · Jul 20, 2024, 3:45 PM
6 points
6 comments · 13 min read · LW link

Why Georgism Lost Its Popularity

Zero Contradictions · Jul 20, 2024, 3:08 PM
45 points
54 comments · 1 min read · LW link
(zerocontradictions.net)

Only Fools Avoid Hindsight Bias

Kevin Dorst · Jul 20, 2024, 1:42 PM
−11 points
5 comments · 6 min read · LW link
(kevindorst.substack.com)

A more systematic case for inner misalignment

Richard_Ngo · Jul 20, 2024, 5:03 AM
31 points
4 comments · 5 min read · LW link

BatchTopK: A Simple Improvement for TopK-SAEs

Jul 20, 2024, 2:20 AM
61 points
0 comments · 4 min read · LW link

Krona Compare

jefftk · Jul 20, 2024, 1:10 AM
10 points
0 comments · 2 min read · LW link
(www.jefftk.com)

(Approximately) Deterministic Natural Latents

Jul 19, 2024, 11:02 PM
42 points
1 comment · 4 min read · LW link

Feature Targeted LLC Estimation Distinguishes SAE Features from Random Directions

Jul 19, 2024, 8:32 PM
59 points
6 comments · 16 min read · LW link

JumpReLU SAEs + Early Access to Gemma 2 SAEs

Jul 19, 2024, 4:10 PM
49 points
10 comments · 1 min read · LW link
(storage.googleapis.com)

Truth is Universal: Robust Detection of Lies in LLMs

Lennart Buerger · Jul 19, 2024, 2:07 PM
24 points
3 comments · 2 min read · LW link
(arxiv.org)

Sustainability of Digital Life Form Societies

Hiroshi Yamakawa · Jul 19, 2024, 1:59 PM
19 points
1 comment · 20 min read · LW link

Romae Industriae

Maxwell Tabarrok · Jul 19, 2024, 1:03 PM
34 points
2 comments · 7 min read · LW link
(www.maximum-progress.com)

[Question] Have people given up on iterated distillation and amplification?

Chris_Leong · Jul 19, 2024, 12:23 PM
20 points
1 comment · 1 min read · LW link

How do we know that “good research” is good? (aka “direct evaluation” vs “eigen-evaluation”)

Ruby · Jul 19, 2024, 12:31 AM
49 points
21 comments · 6 min read · LW link

Linkpost: Surely you can be serious

kave · Jul 18, 2024, 10:18 PM
62 points
8 comments · 1 min read · LW link
(www.experimental-history.com)

My experience applying to MATS 6.0

mic · Jul 18, 2024, 7:02 PM
17 points
3 comments · 5 min read · LW link

[Question] What are the actual arguments in favor of computationalism as a theory of identity?

sunwillrise · Jul 18, 2024, 6:44 PM
12 points
26 comments · 5 min read · LW link

Yet Another Critique of “Luxury Beliefs”

ymeskhout · Jul 18, 2024, 6:37 PM
6 points
10 comments · 9 min read · LW link
(www.ymeskhout.com)

[Interim research report] Evaluating the Goal-Directedness of Language Models

Jul 18, 2024, 6:19 PM
40 points
4 comments · 11 min read · LW link

Interpretability in Action: Exploratory Analysis of VPT, a Minecraft Agent

Jul 18, 2024, 5:02 PM
9 points
0 comments · 1 min read · LW link
(arxiv.org)

Activation Engineering Theories of Impact

kubanetics · Jul 18, 2024, 4:44 PM
6 points
1 comment · 2 min read · LW link

[Question] Me & My Clone

SimonBaars · Jul 18, 2024, 4:25 PM
27 points
22 comments · 1 min read · LW link

AI #73: Openly Evil AI

Zvi · Jul 18, 2024, 2:40 PM
89 points
20 comments · 52 min read · LW link
(thezvi.wordpress.com)

A List of 45+ Mech Interp Project Ideas from Apollo Research’s Interpretability Team

Jul 18, 2024, 2:15 PM
122 points
18 comments · 18 min read · LW link

SAEs (usually) Transfer Between Base and Chat Models

Jul 18, 2024, 10:29 AM
67 points
0 comments · 10 min read · LW link

[Question] Should we exclude alignment research from LLM training datasets?

Ben Millwood · Jul 18, 2024, 10:27 AM
3 points
5 comments · 1 min read · LW link

Keeping content out of LLM training datasets

Ben Millwood · Jul 18, 2024, 10:27 AM
3 points
0 comments · 5 min read · LW link

The Assassination of Trump’s Ear is Evidence for Time-Travel

elv · Jul 18, 2024, 7:01 AM
−9 points
5 comments · 5 min read · LW link

Friendship is transactional, unconditional friendship is insurance

Ruby · Jul 17, 2024, 10:52 PM
67 points
24 comments · 2 min read · LW link

D&D.Sci: Whom Shall You Call? [Evaluation and Ruleset]

abstractapplic · Jul 17, 2024, 10:34 PM
17 points
5 comments · 5 min read · LW link

Optimistic Assumptions, Longterm Planning, and “Cope”

Raemon · Jul 17, 2024, 10:14 PM
215 points
46 comments · 7 min read · LW link

Baking vs Patissing vs Cooking, the HPS explanation

adamShimi · Jul 17, 2024, 8:29 PM
30 points
16 comments · 3 min read · LW link
(epistemologicalfascinations.substack.com)

Launching the Respiratory Outlook 2024/25 Forecasting Series

ChristianWilliams · Jul 17, 2024, 7:51 PM
5 points
0 comments · LW link
(www.metaculus.com)

What are you getting paid in?

Austin Chen · Jul 17, 2024, 7:23 PM
92 points
14 comments · 4 min read · LW link
(www.approachwithalacrity.com)

Individually incentivized safe Pareto improvements in open-source bargaining

Jul 17, 2024, 6:26 PM
41 points
2 comments · 17 min read · LW link

Profit and Value

kwang · Jul 17, 2024, 6:06 PM
22 points
3 comments · 6 min read · LW link
(open.substack.com)

So You’ve Learned To Teleport by Tom Scott

landscape_kiwi · Jul 17, 2024, 6:04 PM
4 points
0 comments · 1 min read · LW link
(www.youtube.com)

How does generalized accessibility compare to targeted accessibility?

ErioirE · Jul 17, 2024, 5:07 PM
3 points
0 comments · 2 min read · LW link