Your AI Safety focus is downstream of your AGI timeline

Michael Flood 17 Jan 2025 21:24 UTC
9 points
0 comments 4 min read LW link

Thoughts on the conservative assumptions in AI control

Buck 17 Jan 2025 19:23 UTC
91 points
5 comments 13 min read LW link

Timaeus is hiring researchers & engineers

17 Jan 2025 19:13 UTC
65 points
4 comments 4 min read LW link

Model Amnesty Project

themis 17 Jan 2025 18:53 UTC
3 points
2 comments 3 min read LW link

Addressing doubts of AI progress: Why GPT-5 is not late, and why data scarcity isn’t a fundamental limiter near term.

LDJ 17 Jan 2025 18:53 UTC
2 points
0 comments 2 min read LW link

Playing Dixit with AI: How Well LLMs Detect ‘Me-ness’

Mariia Koroliuk 17 Jan 2025 18:52 UTC
5 points
0 comments 2 min read LW link

Doing a self-randomized study of the impacts of glycine on sleep (Science is hard)

thedissonance.net 17 Jan 2025 18:49 UTC
11 points
5 comments 11 min read LW link

How sci-fi can have drama without dystopia or doomerism

jasoncrawford 17 Jan 2025 15:22 UTC
19 points
3 comments 3 min read LW link
(newsletter.rootsofprogress.org)

[Question] What do you mean with ‘alignment is solvable in principle’?

Remmelt 17 Jan 2025 15:03 UTC
3 points
9 comments 1 min read LW link

Meta Pivots on Content Moderation

Zvi 17 Jan 2025 14:20 UTC
47 points
3 comments 10 min read LW link
(thezvi.wordpress.com)

Tax Price Gouging?

jefftk 17 Jan 2025 14:10 UTC
55 points
22 comments 3 min read LW link
(www.jefftk.com)

The quantum red pill or: They lied to you, we live in the (density) matrix

Dmitry Vaintrob 17 Jan 2025 13:58 UTC
37 points
34 comments 12 min read LW link

Bednets -- 4 longer malaria studies

Hzn 17 Jan 2025 8:47 UTC
4 points
0 comments 4 min read LW link

Patent Trolling to Save the World

Double 17 Jan 2025 4:13 UTC
23 points
7 comments 3 min read LW link

Call Booth External Monitor

jefftk 17 Jan 2025 3:10 UTC
15 points
0 comments 1 min read LW link
(www.jefftk.com)

[Cross-post] Welcome to the Essay Meta

davekasten 16 Jan 2025 23:36 UTC
14 points
2 comments 8 min read LW link

AI for Resolving Forecasting Questions: An Early Exploration

ozziegooen 16 Jan 2025 21:41 UTC
10 points
2 comments 9 min read LW link

[Question] How Do You Interpret the Goal of LessWrong and Its Community?

ashen8461 16 Jan 2025 19:08 UTC
−2 points
2 comments 1 min read LW link

Experts’ AI timelines are longer than you have been told?

Vasco Grilo 16 Jan 2025 18:03 UTC
10 points
4 comments 3 min read LW link
(bayes.net)

Numberwang: LLMs Doing Autonomous Research, and a Call for Input

16 Jan 2025 17:20 UTC
71 points
30 comments 31 min read LW link

Topological Debate Framework

lunatic_at_large 16 Jan 2025 17:19 UTC
10 points
5 comments 9 min read LW link

AI #99: Farewell to Biden

Zvi 16 Jan 2025 14:20 UTC
54 points
5 comments 58 min read LW link
(thezvi.wordpress.com)

Deceptive Alignment and Homuncularity

16 Jan 2025 13:55 UTC
26 points
12 comments 22 min read LW link

Introducing the WeirdML Benchmark

Håvard Tveit Ihle 16 Jan 2025 11:38 UTC
57 points
13 comments 11 min read LW link

The Mathematical Reason You should have 9 Kids

Zero Contradictions 16 Jan 2025 11:24 UTC
−9 points
6 comments 1 min read LW link
(eternalanglo.com)

Quantum without complication

16 Jan 2025 8:53 UTC
30 points
2 comments 10 min read LW link

Permanents: much more than you wanted to know

Dmitry Vaintrob 16 Jan 2025 8:04 UTC
17 points
2 comments 15 min read LW link

Gaming TruthfulQA: Simple Heuristics Exposed Dataset Weaknesses

TurnTrout 16 Jan 2025 2:14 UTC
65 points
3 comments 1 min read LW link
(turntrout.com)

What Is The Alignment Problem?

johnswentworth 16 Jan 2025 1:20 UTC
181 points
49 comments 25 min read LW link

Improving Our Safety Cases Using Upper and Lower Bounds

Yonatan Cale 16 Jan 2025 0:01 UTC
23 points
0 comments 3 min read LW link

Unregulated Peptides: Does BPC-157 hold its promises?

ChristianKl 15 Jan 2025 23:36 UTC
28 points
7 comments 4 min read LW link

New, improved multiple-choice TruthfulQA

15 Jan 2025 23:32 UTC
72 points
1 comment 3 min read LW link

The Difference Between Prediction Markets and Debate (Argument) Maps

Jamie Joyce 15 Jan 2025 23:19 UTC
7 points
3 comments 3 min read LW link

A Novel Emergence of Meta-Awareness in LLM Fine-Tuning

rife 15 Jan 2025 22:59 UTC
57 points
32 comments 2 min read LW link

Six Small Cohabitive Games

Screwtape 15 Jan 2025 21:59 UTC
40 points
7 comments 13 min read LW link

LLMs are really good at k-order thinking (where k is even)

charlieoneill 15 Jan 2025 20:43 UTC
7 points
0 comments 2 min read LW link

Everywhere I Look, I See Kat Woods

just_browsing 15 Jan 2025 19:29 UTC
19 points
45 comments 5 min read LW link

[untitled post]

Emre 15 Jan 2025 18:52 UTC
−1 points
0 comments 1 min read LW link

“Pick Two” AI Trilemma: Generality, Agency, Alignment.

Black Flag 15 Jan 2025 18:52 UTC
7 points
0 comments 2 min read LW link

Myths about Nonduality and Science by Gary Weber

Vadim Golub 15 Jan 2025 18:33 UTC
2 points
0 comments 23 min read LW link

Marx and the Machine

DAL 15 Jan 2025 18:33 UTC
5 points
2 comments 9 min read LW link

Code4Compassion 2025: a hackathon transforming animal advocacy through technology

superbeneficiary 15 Jan 2025 18:31 UTC
3 points
0 comments 1 min read LW link

Applications Open for the Cooperative AI Summer School 2025!

JesseClifton 15 Jan 2025 18:16 UTC
7 points
0 comments 1 min read LW link

List of AI safety papers from companies, 2023–2024

Zach Stein-Perlman 15 Jan 2025 18:00 UTC
11 points
0 comments 1 min read LW link

AI Alignment Meme Viruses

RationalDino 15 Jan 2025 15:55 UTC
5 points
0 comments 2 min read LW link

Looking for humanness in the world wide social

Itay Dreyfus 15 Jan 2025 14:50 UTC
11 points
0 comments 6 min read LW link
(productidentity.co)

On the OpenAI Economic Blueprint

Zvi 15 Jan 2025 14:30 UTC
81 points
2 comments 9 min read LW link
(thezvi.wordpress.com)

A problem shared by many different alignment targets

ThomasCederborg 15 Jan 2025 14:22 UTC
13 points
18 comments 36 min read LW link

LLMs for language learning

Benquo 15 Jan 2025 14:08 UTC
10 points
2 comments 7 min read LW link
(benjaminrosshoffman.com)

Feature request: comment bookmarks

dirk 15 Jan 2025 6:45 UTC
18 points
2 comments 1 min read LW link