Alice Blair

Karma: 1,123

Dumping out a lot of thoughts on LW in hopes that something sticks. Eternally upskilling.

DMs open, especially for promising opportunities in AI Safety and potential collaborators. I’m maybe interested in helping you optimize the communications of your new project.

ML Safety Newsletter #20: AI Wellbeing, Classifier Jailbreaking and Honest Pushback Benchmarking

Alice Blair and Dan H

28 Apr 2026 19:16 UTC

16 points

0 comments · 5 min read · LW link

AISN #71: Cyberattacks & Datacenter Moratorium Bill

Alice Blair and Dan H

10 Apr 2026 14:18 UTC

6 points

0 comments · 4 min read · LW link

(newsletter.safe.ai)

AI Safety Newsletter #70: Automated Warfare and AI Layoffs

Alice Blair, Laura Hiscott and Dan H

24 Mar 2026 15:30 UTC

8 points

0 comments · 4 min read · LW link

(newsletter.safe.ai)

AI Safety Newsletter #69: Department of War, Anthropic, and National Security

Alice Blair, Laura Hiscott and Dan H

13 Mar 2026 16:05 UTC

10 points

0 comments · 4 min read · LW link

(newsletter.safe.ai)

MLSN #19: Honesty, Disempowerment, & Cybersecurity

Alice Blair

12 Mar 2026 15:42 UTC

6 points

0 comments · 5 min read · LW link

(newsletter.mlsafety.org)

MLSN #18: Adversarial Diffusion, Activation Oracles, Weird Generalization

Alice Blair and Dan H

20 Jan 2026 17:03 UTC

14 points

3 comments · 5 min read · LW link

The Weakest Model in the Selector

Alice Blair

29 Dec 2025 6:55 UTC

13 points

6 comments · 1 min read · LW link

In Favor of Inkhaven-But-Less

Alice Blair

13 Dec 2025 23:16 UTC

26 points

6 comments · 2 min read · LW link

Reasons to care about Canary Strings

Alice Blair

5 Dec 2025 21:41 UTC

27 points

3 comments · 2 min read · LW link

Slack Observability

Alice Blair

1 Dec 2025 7:52 UTC

32 points

0 comments · 2 min read · LW link

Gemini 3 is Evaluation-Paranoid and Contaminated

Alice Blair

20 Nov 2025 21:02 UTC

181 points

42 comments · 7 min read · LW link

MLSN #17: Measuring General AI Abilities and Mitigating Deception

Alice Blair and Dan H

19 Nov 2025 20:11 UTC

5 points

0 comments · 6 min read · LW link

(newsletter.mlsafety.org)

In-Context Writing with Sonnet 4.5

Alice Blair

17 Nov 2025 7:51 UTC

9 points

0 comments · 3 min read · LW link

AISN #65: Measuring Automation and Superintelligence Moratorium Letter

Alice Blair and Dan H

29 Oct 2025 16:05 UTC

5 points

0 comments · 3 min read · LW link

(newsletter.safe.ai)

Uncommon Utilitarianism #3: Bounded Utility Functions

Alice Blair

27 Oct 2025 5:06 UTC

16 points

10 comments · 6 min read · LW link

Uncommon Utilitarianism #2: Positive Utilitarianism

Alice Blair

20 Oct 2025 4:17 UTC

6 points

1 comment · 2 min read · LW link

Sublinear Utility in Population and other Uncommon Utilitarianism

Alice Blair

13 Oct 2025 6:19 UTC

69 points

15 comments · 7 min read · LW link

Alignment Faking Demo for Congressional Staffers

Alice Blair

6 Oct 2025 1:44 UTC

21 points

2 comments · 3 min read · LW link

Applied Murphyjitsu Meditation

Alice Blair

29 Sep 2025 6:31 UTC

21 points

0 comments · 3 min read · LW link

IABIED is on the NYT bestseller list

Alice Blair

25 Sep 2025 2:32 UTC

125 points

5 comments · 1 min read · LW link