Alice Blair

Karma: 1,075

Dumping out a lot of thoughts on LW in hopes that something sticks. Eternally upskilling.

I write the ML Safety Newsletter

DMs open, especially for promising opportunities in AI Safety and potential collaborators. I’m maybe interested in helping you optimize the communications of your new project.

The Weakest Model in the Selector

Alice Blair · 29 Dec 2025 6:55 UTC
13 points
4 comments · 1 min read · LW link

In Favor of Inkhaven-But-Less

Alice Blair · 13 Dec 2025 23:16 UTC
26 points
6 comments · 2 min read · LW link

Reasons to care about Canary Strings

Alice Blair · 5 Dec 2025 21:41 UTC
27 points
3 comments · 2 min read · LW link

Slack Observability

Alice Blair · 1 Dec 2025 7:52 UTC
32 points
0 comments · 2 min read · LW link

Gemini 3 is Evaluation-Paranoid and Contaminated

Alice Blair · 20 Nov 2025 21:02 UTC
173 points
42 comments · 7 min read · LW link

MLSN #17: Measuring General AI Abilities and Mitigating Deception

19 Nov 2025 20:11 UTC
5 points
0 comments · 6 min read · LW link
(newsletter.mlsafety.org)

In-Context Writing with Sonnet 4.5

Alice Blair · 17 Nov 2025 7:51 UTC
9 points
0 comments · 3 min read · LW link

AISN #65: Measuring Automation and Superintelligence Moratorium Letter

29 Oct 2025 16:05 UTC
5 points
0 comments · 3 min read · LW link
(newsletter.safe.ai)

Uncommon Utilitarianism #3: Bounded Utility Functions

Alice Blair · 27 Oct 2025 5:06 UTC
16 points
10 comments · 6 min read · LW link

Uncommon Utilitarianism #2: Positive Utilitarianism

Alice Blair · 20 Oct 2025 4:17 UTC
6 points
1 comment · 2 min read · LW link

Sublinear Utility in Population and other Uncommon Utilitarianism

Alice Blair · 13 Oct 2025 6:19 UTC
68 points
15 comments · 7 min read · LW link

Alignment Faking Demo for Congressional Staffers

Alice Blair · 6 Oct 2025 1:44 UTC
21 points
2 comments · 3 min read · LW link

Applied Murphyjitsu Meditation

Alice Blair · 29 Sep 2025 6:31 UTC
21 points
0 comments · 3 min read · LW link

IABIED is on the NYT bestseller list

Alice Blair · 25 Sep 2025 2:32 UTC
125 points
5 comments · 1 min read · LW link

Warmth, Light, Flame

Alice Blair · 22 Sep 2025 4:19 UTC
39 points
0 comments · 4 min read · LW link

Being Handed Puzzles

Alice Blair · 8 Sep 2025 6:44 UTC
14 points
1 comment · 2 min read · LW link

Handing People Puzzles

Alice Blair · 18 Aug 2025 6:27 UTC
28 points
2 comments · 3 min read · LW link

Listening Before Speaking

Alice Blair · 11 Aug 2025 5:23 UTC
15 points
3 comments · 3 min read · LW link

Notes from Dopamine Detoxing

Alice Blair · 20 May 2025 23:43 UTC
15 points
4 comments · 9 min read · LW link

Moral Obligation and Moral Opportunity

Alice Blair · 14 May 2025 16:42 UTC
37 points
7 comments · 3 min read · LW link