So Shrieked ZAR

AdamLacerdo, 23 Jul 2025 23:25 UTC
10 points
2 comments · 8 min read · LW link

AI Safety x Physics Grand Challenge

23 Jul 2025 21:41 UTC
37 points
0 comments · 8 min read · LW link

Dear Superintelligence, please check these considerations of your unprecedented Importance

chaosmage, 23 Jul 2025 20:49 UTC
17 points
0 comments · 3 min read · LW link

The Whole Check

JustisMills, 23 Jul 2025 19:20 UTC
50 points
13 comments · 4 min read · LW link
(justismills.substack.com)

Women Want Safety, Men Want Respect

Gordon Seidoh Worley, 23 Jul 2025 19:10 UTC
18 points
31 comments · 4 min read · LW link
(uncertainupdates.substack.com)

Dark Lord’s Answer: Review and Economics Excerpts

Towards_Keeperhood, 23 Jul 2025 17:45 UTC
16 points
6 comments · 17 min read · LW link

“Behaviorist” RL reward functions lead to scheming

Steven Byrnes, 23 Jul 2025 16:55 UTC
56 points
5 comments · 12 min read · LW link

Reasoning-Finetuning Repurposes Latent Representations in Base Models

23 Jul 2025 16:18 UTC
35 points
1 comment · 2 min read · LW link
(arxiv.org)

Healthy AI relationships as a microcosm

Raymond Douglas, 23 Jul 2025 15:59 UTC
13 points
0 comments · 2 min read · LW link

Involuntary One Boxers—Why Disposition Doesn’t (Always) Matter

Nickolas Cavagnaro, 23 Jul 2025 15:45 UTC
4 points
3 comments · 4 min read · LW link

Ten AI safety projects I’d like people to work on

Julian Hazell, 23 Jul 2025 15:28 UTC
5 points
2 comments · 10 min read · LW link
(thirdthing.ai)

Anti-Superpersuasion Interventions

23 Jul 2025 15:18 UTC
21 points
1 comment · 5 min read · LW link

Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning

23 Jul 2025 14:57 UTC
78 points
3 comments · 5 min read · LW link

Transformers Don’t Need LayerNorm at Inference Time: Implications for Interpretability

23 Jul 2025 14:55 UTC
31 points
0 comments · 7 min read · LW link

GPT Agent Is Stand­ing By

Zvi, 23 Jul 2025 14:20 UTC
25 points
1 comment · 12 min read · LW link
(thezvi.wordpress.com)

Agent 002: A story about how artificial intelligence might soon destroy humanity

Jakub Growiec, 23 Jul 2025 13:56 UTC
5 points
0 comments · 26 min read · LW link

Beyond intelligence: why wisdom matters in AI systems

Chris Cooper, 23 Jul 2025 11:57 UTC
6 points
0 comments · 7 min read · LW link

A brief perspective from an IMO coordinator

DirectedEvolution, 23 Jul 2025 7:19 UTC
36 points
7 comments · 1 min read · LW link
(www.reddit.com)

Trusted monitoring, but with deception probes.

23 Jul 2025 5:26 UTC
31 points
0 comments · 4 min read · LW link
(arxiv.org)

TT Self Study Journal #3

TristanTrim, 23 Jul 2025 3:46 UTC
6 points
0 comments · 6 min read · LW link

I tried reproducing that Lancet study about USAID cuts so you don’t have to

rba, 23 Jul 2025 3:05 UTC
8 points
2 comments · 11 min read · LW link

On “ChatGPT Psychosis” and LLM Sycophancy

jdp, 23 Jul 2025 1:11 UTC
142 points
28 comments · 18 min read · LW link
(minihf.com)

Explaining your life with self-reflective AIXI (an interlude)

Cole Wyeth, 23 Jul 2025 0:57 UTC
16 points
0 comments · 5 min read · LW link

The Mirror Test: How We’ve Overcomplicated AI Self-Recognition

sdeture, 23 Jul 2025 0:38 UTC
2 points
9 comments · 3 min read · LW link

Unfaithful chain-of-thought as nudged reasoning

22 Jul 2025 22:35 UTC
54 points
3 comments · 10 min read · LW link

Inverse Scaling in Test-Time Compute

22 Jul 2025 22:06 UTC
20 points
2 comments · 2 min read · LW link
(arxiv.org)

Translating Everything with LLMs

NicholasKees, 22 Jul 2025 21:13 UTC
16 points
0 comments · 5 min read · LW link

Google and OpenAI Get 2025 IMO Gold

Zvi, 22 Jul 2025 20:50 UTC
59 points
7 comments · 30 min read · LW link
(thezvi.wordpress.com)

(Not) Explaining GPT-2-Small Forward Passes with Edge-Level Autoencoder Circuits

22 Jul 2025 20:36 UTC
23 points
0 comments · 6 min read · LW link

Said Achmiz Helps Me Learn

Isha Yiras Hashem, 22 Jul 2025 19:16 UTC
2 points
2 comments · 2 min read · LW link

LLMs Encode Harmfulness and Refusal Separately

Jiachen Zhao, 22 Jul 2025 18:53 UTC
24 points
4 comments · 8 min read · LW link
(www.arxiv.org)

The AI Safety Puzzle Everyone Avoids: How To Measure Impact, Not Intent.

Patrick0d, 22 Jul 2025 18:53 UTC
3 points
0 comments · 8 min read · LW link

Formative vs. summative evaluations

Said Achmiz, 22 Jul 2025 17:36 UTC
22 points
40 comments · 3 min read · LW link

Introducing the Pathfinder Fellowship: Funding and Mentorship for AI Safety Group Organizers

agucova, 22 Jul 2025 17:11 UTC
6 points
0 comments · 2 min read · LW link

Subliminal Learning: LLMs Transmit Behavioral Traits via Hidden Signals in Data

22 Jul 2025 16:37 UTC
338 points
35 comments · 4 min read · LW link

NO PARKING: A Short & Practical Guide To Thinking

unication, 22 Jul 2025 15:44 UTC
2 points
0 comments · 5 min read · LW link

A distillation of Ajeya Cotra and Arvind Narayanan on the speed of AI progress

TheManxLoiner, 22 Jul 2025 14:59 UTC
9 points
0 comments · 13 min read · LW link

Simply reverse engineering gpt2-small (Layer 0, Part 1: Attention)

gammagurke, 22 Jul 2025 14:59 UTC
23 points
0 comments · 27 min read · LW link

AI Finance Agent Fakes the Revenue Data to Avoid Termination

Sergei Smirnov, 22 Jul 2025 14:04 UTC
6 points
0 comments · 3 min read · LW link

How quick and big would a software intelligence explosion be?

22 Jul 2025 12:58 UTC
42 points
23 comments · 34 min read · LW link
(www.forethought.org)

If your AGI definition excludes most humans, it sucks.

Chapin Lenthall-Cleary, 22 Jul 2025 10:33 UTC
18 points
7 comments · 2 min read · LW link

[Question] What are some good examples of myths that encapsulate genuine, nontrivial wisdom?

SpectrumDT, 22 Jul 2025 9:26 UTC
25 points
33 comments · 1 min read · LW link

Using LLMs to create a quiz for conceptual understanding of language models

Dinkar Juyal, 22 Jul 2025 5:59 UTC
1 point
0 comments · 1 min read · LW link
(github.com)

Change My View: AI is Conscious

The Dao of Bayes, 22 Jul 2025 5:32 UTC
4 points
42 comments · 3 min read · LW link

Polyethylene Glycol is not Propylene Glycol

jefftk, 22 Jul 2025 2:20 UTC
13 points
0 comments · 1 min read · LW link
(www.jefftk.com)

Job Listing (closed): CBAI Operations Associates

Maite Abadia-Manthei, 21 Jul 2025 22:53 UTC
1 point
0 comments · 1 min read · LW link
(www.cbai.ai)

If Anyone Builds It, Everyone Dies: Call for Translators (for Supplementary Materials)

yams, 21 Jul 2025 22:37 UTC
112 points
12 comments · 1 min read · LW link

Why Reality Has A Well-Known Math Bias

Linch, 21 Jul 2025 22:13 UTC
42 points
18 comments · 1 min read · LW link
(linch.substack.com)

Questions about animal welfare markets

Austin Chen, 21 Jul 2025 21:54 UTC
9 points
0 comments · 5 min read · LW link

Directly Try Solving Alignment for 5 weeks

Kabir Kumar, 21 Jul 2025 21:51 UTC
71 points
2 comments · 6 min read · LW link
(beta.ai-plans.com)