Alexa Pan

Karma: 321

Alexa Pan’s Shortform

Alexa Pan, 22 Apr 2026 23:35 UTC
4 points
2 comments, 1 min read, LW link

A taxonomy of barriers to trading with early misaligned AIs

Alexa Pan, 21 Apr 2026 19:02 UTC
73 points
2 comments, 47 min read, LW link

Will misaligned AIs know that they're misaligned?

Alexa Pan, 4 Dec 2025 21:58 UTC
13 points
5 comments, 9 min read, LW link

What would an IRB-like policy for AI experiments look like?

Alexa Pan, 24 Nov 2025 19:36 UTC
22 points
0 comments, 15 min read, LW link

Sonnet 4.5's eval gaming seriously undermines alignment evals, and this seems caused by training on alignment evals

30 Oct 2025 15:34 UTC
144 points
22 comments, 14 min read, LW link

AI Safety Newsletter #43: White House Issues First National Security Memo on AI Plus, AI and Job Displacement, and AI Takes Over the Nobels

28 Oct 2024 16:03 UTC
6 points
0 comments, 6 min read, LW link
(newsletter.safe.ai)

AI Safety Newsletter #42: Newsom Vetoes SB 1047 Plus, OpenAI's o1, and AI Governance Summary

1 Oct 2024 20:35 UTC
8 points
0 comments, 6 min read, LW link
(newsletter.safe.ai)

AI Safety Newsletter #40: California AI Legislation Plus, NVIDIA Delays Chip Production, and Do AI Safety Benchmarks Actually Measure Safety?

21 Aug 2024 18:09 UTC
11 points
0 comments, 6 min read, LW link
(newsletter.safe.ai)