Three Quotes on Transformative Technology

Chris_Leong, 1 Aug 2025 22:57 UTC
8 points
3 comments, 1 min read

SB-1047 Documentary: The Post-Mortem

Michaël Trazzi, 1 Aug 2025 21:42 UTC
130 points
0 comments, 5 min read

Persona vectors: monitoring and controlling character traits in language models

1 Aug 2025 21:19 UTC
25 points
3 comments, 5 min read
(arxiv.org)

Boots theory and Wikipedia

philh, 1 Aug 2025 20:30 UTC
8 points
12 comments, 12 min read
(reasonableapproximation.net)

Podcast: Lincoln Quirk from Wave

Elizabeth, 1 Aug 2025 19:00 UTC
40 points
1 comment, 1 min read
(acesounderglass.com)

AI in a vat: Fundamental limits of efficient world modelling for safe agent sandboxing

Fernando Rosas, 1 Aug 2025 18:37 UTC
34 points
3 comments, 15 min read

The Dark Arts As A Scaffolding Skill For Rationality

Screwtape, 1 Aug 2025 17:12 UTC
82 points
25 comments, 7 min read

Steve Petersen seeking funding

abramdemski, 1 Aug 2025 17:03 UTC
87 points
0 comments, 1 min read

The Week in AI Governance

Zvi, 1 Aug 2025 12:20 UTC
18 points
1 comment, 24 min read
(thezvi.wordpress.com)

Research Areas in AI Control (The Alignment Project by UK AISI)

1 Aug 2025 10:27 UTC
25 points
0 comments, 18 min read
(alignmentproject.aisi.gov.uk)

Research Areas in Methods for Post-training and Elicitation (The Alignment Project by UK AISI)

1 Aug 2025 10:27 UTC
12 points
0 comments, 6 min read
(alignmentproject.aisi.gov.uk)

Research Areas in Benchmark Design and Evaluation (The Alignment Project by UK AISI)

1 Aug 2025 10:26 UTC
10 points
0 comments, 9 min read
(alignmentproject.aisi.gov.uk)

Research Areas in Interpretability (The Alignment Project by UK AISI)

Joseph Bloom, 1 Aug 2025 10:26 UTC
14 points
0 comments, 5 min read
(alignmentproject.aisi.gov.uk)

Research Areas in Cognitive Science (The Alignment Project by UK AISI)

Geoffrey Irving, 1 Aug 2025 10:26 UTC
12 points
0 comments, 6 min read
(alignmentproject.aisi.gov.uk)

Research Areas in Learning Theory (The Alignment Project by UK AISI)

1 Aug 2025 10:26 UTC
15 points
0 comments, 24 min read
(alignmentproject.aisi.gov.uk)

Research Areas in Probabilistic Methods (The Alignment Project by UK AISI)

1 Aug 2025 10:26 UTC
3 points
0 comments, 4 min read
(alignmentproject.aisi.gov.uk)

Research Areas in Economic Theory and Game Theory (The Alignment Project by UK AISI)

Cecilia Wood, 1 Aug 2025 10:25 UTC
4 points
0 comments, 6 min read
(alignmentproject.aisi.gov.uk)

Research Areas in Computational Complexity Theory (The Alignment Project by UK AISI)

Simon Marshall, 1 Aug 2025 10:25 UTC
6 points
0 comments, 10 min read
(alignmentproject.aisi.gov.uk)

Research Areas in Information Theory and Cryptography (The Alignment Project by UK AISI)

Simon Marshall, 1 Aug 2025 10:25 UTC
6 points
0 comments, 3 min read
(alignmentproject.aisi.gov.uk)

Self-Alignment: Exploring the perspective of Analytical Psychology

JakeArgent, 1 Aug 2025 10:17 UTC
4 points
0 comments, 12 min read

Research Areas in Evaluation and Guarantees in Reinforcement Learning (The Alignment Project by UK AISI)

1 Aug 2025 9:53 UTC
14 points
0 comments, 11 min read
(alignmentproject.aisi.gov.uk)

The Alignment Project by UK AISI

1 Aug 2025 9:52 UTC
29 points
0 comments, 2 min read
(alignmentproject.aisi.gov.uk)

Prolific.com survey on AI pause

samuelshadrach, 1 Aug 2025 8:33 UTC
9 points
3 comments, 7 min read
(samuelshadrach.com)

Some mistakes in thinking about AGI evolution and control

Remmelt, 1 Aug 2025 8:08 UTC
7 points
0 comments, 1 min read

“Opponent shaping” as a model for manipulation and cooperation

Dan MacKinlay, 1 Aug 2025 7:50 UTC
9 points
0 comments, 17 min read
(danmackinlay.name)

Two Kinds of Do Overs

jefftk, 1 Aug 2025 2:30 UTC
65 points
1 comment, 2 min read
(www.jefftk.com)

Call on AI Companies: Publish Your Whistleblowing Policies

karl, 31 Jul 2025 22:04 UTC
20 points
3 comments, 7 min read

Do Not Render Your Counterfactuals

AlphaAndOmega, 31 Jul 2025 21:35 UTC
110 points
19 comments, 5 min read
(open.substack.com)

Emergence Is Beautiful—beauty and meaning in an entropic universe

James Stephen Brown, 31 Jul 2025 19:00 UTC
8 points
0 comments, 5 min read

Sharpening the Shears: 8 Lessons from Garden Leave

Jordan Rubin, 31 Jul 2025 18:57 UTC
8 points
0 comments, 4 min read
(jordanmrubin.substack.com)

AISN #60: The AI Action Plan

31 Jul 2025 18:20 UTC
6 points
0 comments, 4 min read
(newsletter.safe.ai)

Approximating Human Preferences Using a Multi-Judge Learned System

31 Jul 2025 18:01 UTC
19 points
0 comments, 13 min read

Follow-up to “My Empathy Is Rarely Kind”

johnswentworth, 31 Jul 2025 17:21 UTC
80 points
42 comments, 2 min read

Book Review: The MANIAC

Annapurna, 31 Jul 2025 15:18 UTC
15 points
6 comments, 2 min read
(jorgevelez.substack.com)

Red-Thing-Ism

J Bostock, 31 Jul 2025 14:09 UTC
101 points
9 comments, 3 min read

AI #127: Continued Claude Code Complications

Zvi, 31 Jul 2025 13:40 UTC
32 points
4 comments, 43 min read
(thezvi.wordpress.com)

I am worried about near-term non-LLM AI developments

testingthewaters, 31 Jul 2025 13:15 UTC
251 points
56 comments, 5 min read

What do we do about the Inevitable?

CSDD, 31 Jul 2025 10:22 UTC
−7 points
0 comments, 4 min read

[Question] Several questions about Zen koans

Said Achmiz, 31 Jul 2025 6:35 UTC
24 points
21 comments, 3 min read

Beyond Hangriness: A Deeper Framework for Emotional Clarity

jaredclucas, 30 Jul 2025 23:59 UTC
−7 points
0 comments, 5 min read

LLMs Are Already Misaligned: Simple Experiments Prove It

Mackam, 30 Jul 2025 23:48 UTC
12 points
10 comments, 7 min read

Replicators—Pandora’s dangerous children

James Stephen Brown, 30 Jul 2025 22:39 UTC
19 points
2 comments, 3 min read

Exploration hacking: can reasoning models subvert RL?

30 Jul 2025 22:02 UTC
16 points
4 comments, 9 min read

Optimizing The Final Output Can Obfuscate CoT (Research Note)

30 Jul 2025 21:26 UTC
196 points
22 comments, 6 min read

A Timing Problem for Instrumental Convergence

rhys southan, 30 Jul 2025 19:15 UTC
2 points
44 comments, 1 min read
(link.springer.com)

Childhood and Education: College Admissions

Zvi, 30 Jul 2025 17:40 UTC
51 points
11 comments, 18 min read
(thezvi.wordpress.com)

Apply to SPAR Fall 2025—80+ projects!

agucova, 30 Jul 2025 17:34 UTC
19 points
0 comments, 1 min read

Dimensions of logical time as economic strategies

tayzzyronth, 30 Jul 2025 16:56 UTC
10 points
2 comments, 7 min read

On Wireheading

Dave92F1, 30 Jul 2025 16:26 UTC
9 points
4 comments, 3 min read

Uncertain Updates: July 2025

Gordon Seidoh Worley, 30 Jul 2025 14:50 UTC
8 points
0 comments, 2 min read
(uncertainupdates.substack.com)