Ablations for “Frontier Models are Capable of In-context Scheming”

Dec 17, 2024, 11:58 PM
115 points

35 votes

Overall karma indicates overall quality.

1 comment, 2 min read, LW link

Careless thinking: A theory of bad thinking

Nathan Young
Dec 17, 2024, 6:23 PM
49 points

16 votes

17 comments, 9 min read, LW link
(nathanpmyoung.substack.com)

The Second Gemini

Zvi
Dec 17, 2024, 3:50 PM
23 points

9 votes

0 comments, 11 min read, LW link
(thezvi.wordpress.com)

AIS Hungary is hiring a part-time Technical Lead! (Deadline: Dec 31st)

gergogaspar
Dec 17, 2024, 2:12 PM
1 point

1 vote

0 comments, 2 min read, LW link

Everything you care about is in the map

Tahp
Dec 17, 2024, 2:05 PM
17 points

12 votes

27 comments, 3 min read, LW link

Reality is Fractal-Shaped

silentbob
Dec 17, 2024, 1:52 PM
18 points

12 votes

1 comment, 8 min read, LW link

Trying to translate when people talk past each other

Kaj_Sotala
Dec 17, 2024, 9:40 AM
41 points

17 votes

12 comments, 6 min read, LW link
(kajsotala.fi)

What is “wireheading”?

Dec 17, 2024, 7:49 AM
10 points

3 votes

0 comments, 1 min read, LW link
(aisafety.info)

1 What If We Rebuild Motivation with the Fermi ESTIMATion?

P. João
Dec 17, 2024, 7:46 AM
6 points

6 votes

0 comments, 3 min read, LW link

Where do you put your ideas?

CstineSublime
Dec 17, 2024, 7:26 AM
9 points

5 votes

20 comments, 1 min read, LW link

Elevating Air Purifiers

jefftk
Dec 17, 2024, 1:40 AM
25 points

5 votes

0 comments, 1 min read, LW link
(www.jefftk.com)

A dataset of questions on decision-theoretic reasoning in Newcomb-like problems

Dec 16, 2024, 10:42 PM
50 points

27 votes

1 comment, 2 min read, LW link
(arxiv.org)

A practical guide to tiling the universe with hedonium

Vittu Perkele
Dec 16, 2024, 9:25 PM
−8 points

4 votes

1 comment, 1 min read, LW link
(perkeleperusing.substack.com)

AI Safety Seed Funding Network - Join as a Donor or Investor

Alexandra Bos
Dec 16, 2024, 7:30 PM
30 points

9 votes

0 comments, 2 min read, LW link

I read every major AI lab’s safety plan so you don’t have to

sarahhw
Dec 16, 2024, 6:51 PM
20 points

5 votes

0 comments, 12 min read, LW link
(longerramblings.substack.com)

Grokking revisited: reverse engineering grokking modulo addition in LSTM

Dec 16, 2024, 6:48 PM
4 points

4 votes

0 comments, 6 min read, LW link

Progress links and short notes, 2024-12-16

jasoncrawford
Dec 16, 2024, 5:24 PM
7 points

2 votes

0 comments, 2 min read, LW link
(newsletter.rootsofprogress.org)

Effective Altruism FAQ

Bentham's Bulldog
Dec 16, 2024, 4:27 PM
0 points

9 votes

7 comments, 12 min read, LW link

Variably compressibly studies are fun

dkl9
Dec 16, 2024, 4:00 PM
0 points

5 votes

0 comments, 2 min read, LW link
(dkl9.net)

AIs Will Increasingly Attempt Shenanigans

Zvi
Dec 16, 2024, 3:20 PM
118 points

59 votes

2 comments, 26 min read, LW link
(thezvi.wordpress.com)

Testing which LLM architectures can do hidden serial reasoning

Filip Sondej
Dec 16, 2024, 1:48 PM
84 points

29 votes

9 comments, 4 min read, LW link

NeuroAI for AI safety: A Differential Path

Dec 16, 2024, 1:17 PM
22 points

6 votes

0 comments, 7 min read, LW link
(arxiv.org)

Circling as practice for “just be yourself”

Kaj_Sotala
Dec 16, 2024, 7:40 AM
87 points

40 votes

6 comments, 4 min read, LW link
(kajsotala.fi)

Reanalyzing the 2023 Expert Survey on Progress in AI

AI Impacts
Dec 16, 2024, 6:10 AM
8 points

2 votes

0 comments, 1 min read, LW link
(blog.aiimpacts.org)

Ideas for benchmarking LLM creativity

gwern
Dec 16, 2024, 5:18 AM
60 points

23 votes

11 comments, 1 min read, LW link
(gwern.net)

Comparing the AirFanta 3Pro to the Coway AP-1512

jefftk
Dec 16, 2024, 1:40 AM
13 points

3 votes

0 comments, 1 min read, LW link
(www.jefftk.com)

[Question] are IQ tests a good measure of intelligence?

KvmanThinking
Dec 15, 2024, 11:06 PM
0 points

8 votes

5 comments, 1 min read, LW link

Madison Secular Solstice

svfritz
Dec 15, 2024, 9:52 PM
1 point

1 vote

0 comments, 1 min read, LW link

[Question] Is AI alignment a purely functional property?

Roko
Dec 15, 2024, 9:42 PM
13 points

5 votes

8 comments, 1 min read, LW link

[Question] How counterfactual are logical counterfactuals?

Donald Hobson
Dec 15, 2024, 9:16 PM
11 points

7 votes

10 comments, 1 min read, LW link

Debunking the myth of safe AI

henophilia
Dec 15, 2024, 5:44 PM
−11 points

5 votes

8 comments, 1 min read, LW link
(henophilia.substack.com)

Introducing Avatarism: A Rational Framework for Building actual Heaven

ratiba ro
Dec 15, 2024, 5:17 PM
2 points

4 votes

2 comments, 2 min read, LW link

A Public Choice Take on Effective Altruism

vaishnav92
Dec 15, 2024, 4:58 PM
9 points

3 votes

4 comments, 3 min read, LW link
(www.optimaloutliers.com)

World Models I’m Currently Building

temporary
Dec 15, 2024, 4:29 PM
5 points

4 votes

1 comment, 1 min read, LW link
(samuelshadrach.com)

Dress Up For Secular Solstice

Gordon H.S.
Dec 15, 2024, 4:28 PM
33 points

21 votes

13 comments, 7 min read, LW link

Remap your caps lock key

bilalchughtai
Dec 15, 2024, 2:03 PM
81 points

59 votes

21 comments, 1 min read, LW link

Effective Evil’s AI Misalignment Plan

lsusr
Dec 15, 2024, 7:39 AM
83 points

57 votes

9 comments, 3 min read, LW link

How to Edit an Essay into a Solstice Speech?

Czynski
Dec 15, 2024, 4:30 AM
5 points

2 votes

1 comment, 1 min read, LW link
(thepdv.wordpress.com)

How Your Physiology Affects the Mind’s Projection Fallacy

YanLyutnev
Dec 14, 2024, 9:10 PM
−1 points

11 votes

0 comments, 6 min read, LW link

Introducing the Evidence Color Wheel

Larry Lee
Dec 14, 2024, 4:08 PM
6 points

4 votes

0 comments, 3 min read, LW link

An Illustrated Summary of “Robust Agents Learn Causal World Model”

Dalcy
Dec 14, 2024, 3:02 PM
67 points

21 votes

2 comments, 10 min read, LW link

Best-of-N Jailbreaking

Dec 14, 2024, 4:58 AM
78 points

30 votes

5 comments, 2 min read, LW link
(arxiv.org)

D&D.Sci Dungeonbuilding: the Dungeon Tournament

aphyer
Dec 14, 2024, 4:30 AM
50 points

11 votes

16 comments, 3 min read, LW link

Creating Interpretable Latent Spaces with Gradient Routing

Jacob G-W
Dec 14, 2024, 4:00 AM
26 points

9 votes

6 comments, 2 min read, LW link
(jacobgw.com)

Probability of death by suicide by a 26 year old

John Wiseman
Dec 14, 2024, 3:33 AM
−25 points

14 votes

4 comments, 1 min read, LW link

Matryoshka Sparse Autoencoders

Noa Nabeshima
Dec 14, 2024, 2:52 AM
98 points

36 votes

15 comments, 11 min read, LW link

[Question] What is MIRI currently doing?

Roko
Dec 14, 2024, 2:39 AM
33 points

18 votes

14 comments, 1 min read, LW link

The o1 System Card Is Not About o1

Zvi
Dec 13, 2024, 8:30 PM
116 points

39 votes

5 comments, 16 min read, LW link
(thezvi.wordpress.com)

Arch-anarchy and The Fable of the Dragon-Tyrant

Peter lawless
Dec 13, 2024, 8:15 PM
−10 points

6 votes

0 comments, 1 min read, LW link

Communications in Hard Mode (My new job at MIRI)

tanagrabeast
Dec 13, 2024, 8:13 PM
208 points

97 votes

25 comments, 5 min read, LW link