How useful is “AI Control” as a framing on AI X-Risk?

Mar 14, 2024, 6:06 PM
70 points
4 comments · 34 min read

Understanding SAE Features with the Logit Lens

Mar 11, 2024, 12:16 AM
68 points
0 comments · 14 min read

AE Studio @ SXSW: We need more AI consciousness research (and further resources)

Mar 26, 2024, 8:59 PM
67 points
8 comments · 3 min read

Social status part 2/2: everything else

Steven Byrnes · Mar 5, 2024, 4:29 PM
65 points
2 comments · 23 min read

All About Concave and Convex Agents

mako yass · Mar 24, 2024, 9:37 PM
64 points
24 comments · 8 min read

Superforecasting the Origins of the Covid-19 Pandemic

DanielFilan · Mar 12, 2024, 7:01 PM
64 points
0 comments · 1 min read
(goodjudgment.substack.com)

On the Gladstone Report

Zvi · Mar 20, 2024, 7:50 PM
64 points
11 comments · 40 min read
(thezvi.wordpress.com)

We Inspected Every Head In GPT-2 Small using SAEs So You Don’t Have To

Mar 6, 2024, 5:03 AM
63 points
0 comments · 12 min read

AI #55: Keep Clauding Along

Zvi · Mar 14, 2024, 3:40 PM
62 points
16 comments · 70 min read
(thezvi.wordpress.com)

Do not delete your misaligned AGI.

mako yass · Mar 24, 2024, 9:37 PM
62 points
13 comments · 3 min read

More people getting into AI safety should do a PhD

AdamGleave · Mar 14, 2024, 10:14 PM
61 points
24 comments · 12 min read
(gleave.me)

DeepMind: Evaluating Frontier Models for Dangerous Capabilities

Zach Stein-Perlman · Mar 21, 2024, 3:00 AM
61 points
8 comments · 1 min read
(arxiv.org)

Research Report: Sparse Autoencoders find only 9/180 board state features in OthelloGPT

Robert_AIZI · Mar 5, 2024, 1:55 PM
61 points
24 comments · 10 min read
(aizi.substack.com)

Results from an Adversarial Collaboration on AI Risk (FRI)

Mar 11, 2024, 8:00 PM
61 points
3 comments · 9 min read
(forecastingresearch.org)

[Question] What do we know about the AI knowledge and views, especially about existential risk, of the new OpenAI board members?

Zvi · Mar 11, 2024, 2:55 PM
60 points
2 comments · 2 min read

5 Physics Problems

Mar 18, 2024, 8:05 AM
60 points
0 comments · 15 min read

0th Person and 1st Person Logic

Adele Lopez · Mar 10, 2024, 12:56 AM
60 points
28 comments · 6 min read

Measuring Coherence of Policies in Toy Environments

Mar 18, 2024, 5:59 PM
59 points
9 comments · 14 min read

D&D.Sci: The Mad Tyrant’s Pet Turtles

abstractapplic · Mar 29, 2024, 4:22 PM
59 points
18 comments · 2 min read

Woods’ new preprint on object permanence

Steven Byrnes · Mar 7, 2024, 9:29 PM
58 points
1 comment · 6 min read

On the Latest TikTok Bill

Zvi · Mar 13, 2024, 6:50 PM
58 points
7 comments · 29 min read
(thezvi.wordpress.com)

AI things that are perhaps as important as human-controlled AI

Chi Nguyen · Mar 3, 2024, 6:07 PM
55 points
4 comments

Come to Manifest 2024 (June 7-9 in Berkeley)

Saul Munn · Mar 27, 2024, 9:30 PM
54 points
2 comments
(news.manifold.markets)

Be More Katja

Nathan Young · Mar 11, 2024, 9:12 PM
53 points
0 comments · 3 min read

Was Releasing Claude-3 Net-Negative?

Logan Riggs · Mar 27, 2024, 5:41 PM
52 points
5 comments · 4 min read

On Lex Fridman’s Second Podcast with Altman

Zvi · Mar 25, 2024, 12:20 PM
51 points
10 comments · 10 min read
(thezvi.wordpress.com)

Vipassana Meditation and Active Inference: A Framework for Understanding Suffering and its Cessation

sturb · Mar 21, 2024, 12:32 PM
50 points
8 comments · 19 min read

Scenario Forecasting Workshop: Materials and Learnings

Mar 8, 2024, 2:30 AM
50 points
3 comments · 2 min read

The Broken Screwdriver and other parables

bhauth · Mar 4, 2024, 3:34 AM
49 points
1 comment · 2 min read

Should rationalists be spiritual / Spirituality as overcoming delusion

Mar 25, 2024, 4:48 PM
49 points
57 comments · 29 min read

Highlights from Lex Fridman’s interview of Yann LeCun

Joel Burget · Mar 13, 2024, 8:58 PM
48 points
15 comments · 41 min read

Constructive Cauchy sequences vs. Dedekind cuts

jessicata · Mar 14, 2024, 11:04 PM
47 points
23 comments · 4 min read
(unstableontology.com)

How to safely use an optimizer

Simon Fischer · Mar 28, 2024, 4:11 PM
47 points
21 comments · 7 min read

AI Safety 101 : Capabilities - Human Level AI, What? How? and When?

Mar 7, 2024, 5:29 PM
46 points
8 comments · 54 min read

Metascience of the Vesuvius Challenge

Maxwell Tabarrok · Mar 30, 2024, 12:02 PM
46 points
2 comments · 6 min read
(www.maximum-progress.com)

Some costs of superposition

Linda Linsefors · Mar 3, 2024, 4:08 PM
46 points
11 comments · 3 min read

How people stopped dying from diarrhea so much (& other life-saving decisions)

Writer · Mar 16, 2024, 4:00 PM
45 points
0 comments
(youtu.be)

AI #54: Clauding Along

Zvi · Mar 7, 2024, 4:00 PM
45 points
11 comments · 51 min read
(thezvi.wordpress.com)

Laying the Foundations for Vision and Multimodal Mechanistic Interpretability & Open Problems

Mar 13, 2024, 5:09 PM
44 points
13 comments · 14 min read

Back to Basics: Truth is Unitary

lsusr · Mar 29, 2024, 9:10 PM
44 points
13 comments · 6 min read

Housing Roundup #7

Zvi · Mar 4, 2024, 3:00 PM
42 points
1 comment · 44 min read
(thezvi.wordpress.com)

One-shot strategy games?

Raemon · Mar 11, 2024, 12:19 AM
41 points
42 comments · 1 min read

A Teacher vs. Everyone Else

ronak69 · Mar 21, 2024, 5:45 PM
41 points
8 comments · 2 min read

Jobs, Relationships, and Other Cults

Mar 13, 2024, 5:58 AM
40 points
9 comments · 35 min read

Movie posters

KatjaGrace · Mar 6, 2024, 6:20 AM
40 points
0 comments · 2 min read
(worldspiritsockpuppet.com)

Neuroscience and Alignment

Garrett Baker · Mar 18, 2024, 9:09 PM
40 points
25 comments · 2 min read

Mud and Despair (Part 4 of “The Sense Of Physical Necessity”)

LoganStrohl · Mar 7, 2024, 12:14 AM
38 points
0 comments · 2 min read

Elon files grave charges against OpenAI

mako yass · Mar 1, 2024, 5:42 PM
38 points
10 comments · 1 min read
(www.courthousenews.com)

Increasing IQ is trivial

George3d6 · Mar 1, 2024, 10:43 PM
38 points
61 comments · 6 min read
(epistemink.substack.com)

Simple Kelly betting in prediction markets

jessicata · Mar 6, 2024, 6:59 PM
38 points
3 comments · 3 min read
(unstableontology.substack.com)