“Think it Faster” worksheet

Raemon 8 Feb 2025 22:02 UTC
66 points
11 comments 4 min read LW link

Seven sources of goals in LLM agents

Seth Herd 8 Feb 2025 21:54 UTC
23 points
3 comments 2 min read LW link

[Question] p(s-risks to contemporary humans)?

mhampton 8 Feb 2025 21:19 UTC
6 points
5 comments 6 min read LW link

Cross-Layer Feature Alignment and Steering in Large Language Model

dlaptev 8 Feb 2025 20:18 UTC
8 points
0 comments 6 min read LW link

Towards building blocks of ontologies

8 Feb 2025 16:03 UTC
29 points
0 comments 26 min read LW link

Can Knowledge Hurt You? The Dangers of Infohazards (and Exfohazards)

8 Feb 2025 15:51 UTC
19 points
0 comments 5 min read LW link
(www.youtube.com)

Distilling the Internal Model Principle

JoseFaustino 8 Feb 2025 14:59 UTC
21 points
0 comments 16 min read LW link

Knocking Down My AI Optimist Strawman

tailcalled 8 Feb 2025 10:52 UTC
31 points
3 comments 6 min read LW link

Preserving Epistemic Novelty in AI: Experiments, Insights, and the Case for Decentralized Collective Intelligence

Andy E Williams 8 Feb 2025 10:25 UTC
−4 points
8 comments 7 min read LW link

Chaos Investments v0.31

Screwtape 8 Feb 2025 6:53 UTC
16 points
1 comment 9 min read LW link

AI Safety Oversights

Davey Morse 8 Feb 2025 6:15 UTC
3 points
0 comments 1 min read LW link

Wiki on Suspects in Lind, Zajko, and Maland Killings

Rebecca_Records 8 Feb 2025 4:16 UTC
20 points
4 comments 1 min read LW link

Research directions Open Phil wants to fund in technical AI safety

8 Feb 2025 1:40 UTC
117 points
21 comments 58 min read LW link
(www.openphilanthropy.org)

So You Want To Make Marginal Progress...

johnswentworth 7 Feb 2025 23:22 UTC
298 points
42 comments 4 min read LW link

Reasons-based choice and cluelessness

JesseClifton 7 Feb 2025 22:21 UTC
34 points
0 comments 10 min read LW link

[Translation] In the Age of AI don’t Look for Unicorns

mushroomsoup 7 Feb 2025 21:06 UTC
3 points
0 comments 10 min read LW link

Racing Towards Fusion and AI

Jeffrey Heninger 7 Feb 2025 20:40 UTC
49 points
11 comments 7 min read LW link

‘High-Level Machine Intelligence’ and ‘Full Automation of Labor’ in the AI Impacts Surveys

Jeffrey Heninger 7 Feb 2025 20:40 UTC
11 points
1 comment 7 min read LW link

Request for Information for a new US AI Action Plan (OSTP RFI)

agucova 7 Feb 2025 20:40 UTC
5 points
0 comments 2 min read LW link
(www.federalregister.gov)

A Problem to Solve Before Building a Deception Detector

7 Feb 2025 19:35 UTC
76 points
12 comments 14 min read LW link

Request for proposals: improving capability evaluations

cb 7 Feb 2025 18:51 UTC
1 point
0 comments 1 min read LW link
(www.openphilanthropy.org)

How AI Takeover Might Happen in 2 Years

joshc 7 Feb 2025 17:10 UTC
432 points
140 comments 29 min read LW link
(x.com)

the devil’s ontology

lostinwilliamsburg 7 Feb 2025 14:18 UTC
−1 points
14 comments 6 min read LW link

On the Meta and DeepMind Safety Frameworks

Zvi 7 Feb 2025 13:10 UTC
45 points
1 comment 17 min read LW link
(thezvi.wordpress.com)

Illusory Safety: Redteaming DeepSeek R1 and the Strongest Fine-Tunable Models of OpenAI, Anthropic, and Google

7 Feb 2025 3:57 UTC
37 points
0 comments 10 min read LW link

When you downvote, explain why

KvmanThinking 7 Feb 2025 1:03 UTC
5 points
31 comments 1 min read LW link

Medical Windfall Prizes

PeterMcCluskey 6 Feb 2025 23:33 UTC
5 points
1 comment 5 min read LW link
(bayesianinvestor.com)

Do No Harm? Navigating and Nudging AI Moral Choices

6 Feb 2025 19:18 UTC
11 points
0 comments 9 min read LW link

Open Philanthropy Technical AI Safety RFP - $40M Available Across 21 Research Areas

6 Feb 2025 18:58 UTC
111 points
0 comments 1 min read LW link
(www.openphilanthropy.org)

AISN #47: Reasoning Models

6 Feb 2025 18:52 UTC
3 points
0 comments 4 min read LW link
(newsletter.safe.ai)

Wild Animal Suffering Is The Worst Thing In The World

Bentham's Bulldog 6 Feb 2025 16:15 UTC
26 points
18 comments 7 min read LW link

Detecting Strategic Deception Using Linear Probes

6 Feb 2025 15:46 UTC
104 points
9 comments 2 min read LW link
(arxiv.org)

AI #102: Made in America

Zvi 6 Feb 2025 14:20 UTC
26 points
18 comments 67 min read LW link
(thezvi.wordpress.com)

Biology, Ideology and Violence

Zero Contradictions 6 Feb 2025 11:26 UTC
−3 points
5 comments 2 min read LW link
(thewaywardaxolotl.blogspot.com)

MATS Applications + Research Directions I’m Currently Excited About

Neel Nanda 6 Feb 2025 11:03 UTC
73 points
7 comments 8 min read LW link

Don’t go bankrupt, don’t go rogue

Nathan Young 6 Feb 2025 10:31 UTC
20 points
1 comment 7 min read LW link

Voting Results for the 2023 Review

Raemon 6 Feb 2025 8:00 UTC
86 points
3 comments 69 min read LW link

Chicanery: No

Screwtape 6 Feb 2025 5:42 UTC
31 points
10 comments 5 min read LW link

[Question] hypnosis question

KvmanThinking 6 Feb 2025 2:41 UTC
3 points
5 comments 1 min read LW link

BIDA Calendar iCal Feed

jefftk 6 Feb 2025 1:30 UTC
9 points
0 comments 1 min read LW link
(www.jefftk.com)

C’mon guys, Deliberate Practice is Real

Raemon 5 Feb 2025 22:33 UTC
99 points
25 comments 9 min read LW link

The Risk of Gradual Disempowerment from AI

Zvi 5 Feb 2025 22:10 UTC
87 points
20 comments 20 min read LW link
(thezvi.wordpress.com)

Wired on: “DOGE personnel with admin access to Federal Payment System”

Raemon 5 Feb 2025 21:32 UTC
88 points
45 comments 2 min read LW link
(web.archive.org)

On AI Scaling

harsimony 5 Feb 2025 20:24 UTC
6 points
3 comments 8 min read LW link
(splittinginfinity.substack.com)

The State of Metaculus

ChristianWilliams 5 Feb 2025 19:17 UTC
21 points
0 comments 6 min read LW link
(www.metaculus.com)

Post-hoc reasoning in chain of thought

Kyle Cox 5 Feb 2025 18:58 UTC
19 points
0 comments 11 min read LW link

DeepSeek-R1 for Beginners

Anton Razzhigaev 5 Feb 2025 18:58 UTC
13 points
0 comments 8 min read LW link

Making the case for average-case AI Control

Nathaniel Mitrani 5 Feb 2025 18:56 UTC
4 points
0 comments 5 min read LW link

[Question] Alignment Paradox and a Request for Harsh Criticism

Bridgett Kay 5 Feb 2025 18:17 UTC
6 points
7 comments 1 min read LW link

Introducing International AI Governance Alliance (IAIGA)

jamesnorris 5 Feb 2025 16:02 UTC
7 points
0 comments 1 min read LW link