Simon Lermen

Karma: 1,472

Substack: https://substack.com/@simonlermen

Twitter: @SimonLermenAI

Will We Get Alignment by Default? — with Adrià Garriga-Alonso

27 Nov 2025 19:19 UTC
41 points
3 comments · 1 min read · LW link
(simonlermen.substack.com)

Comment on Natural Emergent Misalignment Paper by Anthropic

Simon Lermen · 23 Nov 2025 4:21 UTC
20 points
0 comments · 4 min read · LW link

Jailbreaking AI models to Phish Elderly Victims

18 Nov 2025 23:17 UTC
17 points
0 comments · 2 min read · LW link
(simonlermen.substack.com)

AI 2025 - Last Shipmas

Simon Lermen · 17 Nov 2025 19:39 UTC
55 points
5 comments · 7 min read · LW link

Universal Basic Income in an AGI Future

Simon Lermen · 11 Nov 2025 2:26 UTC
21 points
1 comment · 2 min read · LW link
(simonlermen.substack.com)

Anthropic & Dario’s dream

Simon Lermen · 8 Nov 2025 1:19 UTC
54 points
1 comment · 5 min read · LW link

Comparative advantage & AI

Simon Lermen · 3 Nov 2025 21:50 UTC
113 points
28 comments · 4 min read · LW link

Model welfare and open source

Simon Lermen · 2 Nov 2025 2:29 UTC
15 points
1 comment · 5 min read · LW link

Simon Lermen’s Shortform

Simon Lermen · 6 Oct 2025 15:04 UTC
5 points
43 comments · 1 min read · LW link

Why I don’t believe Superalignment will work

Simon Lermen · 22 Sep 2025 17:10 UTC
47 points
6 comments · 5 min read · LW link

Human study on AI spear phishing campaigns

3 Jan 2025 15:11 UTC
81 points
8 comments · 5 min read · LW link

Current safety training techniques do not fully transfer to the agent setting

3 Nov 2024 19:24 UTC
162 points
9 comments · 5 min read · LW link

Deceptive agents can collude to hide dangerous features in SAEs

15 Jul 2024 17:07 UTC
33 points
2 comments · 7 min read · LW link

Applying refusal-vector ablation to a Llama 3 70B agent

Simon Lermen · 11 May 2024 0:08 UTC
51 points
14 comments · 7 min read · LW link

Creating unrestricted AI Agents with Command R+

Simon Lermen · 16 Apr 2024 14:52 UTC
77 points
13 comments · 5 min read · LW link

unRLHF—Efficiently undoing LLM safeguards

12 Oct 2023 19:58 UTC
117 points
15 comments · 20 min read · LW link

LoRA Fine-tuning Efficiently Undoes Safety Training from Llama 2-Chat 70B

12 Oct 2023 19:58 UTC
151 points
29 comments · 14 min read · LW link

Robustness of Model-Graded Evaluations and Automated Interpretability

15 Jul 2023 19:12 UTC
47 points
5 comments · 9 min read · LW link

Evaluating Language Model Behaviours for Shutdown Avoidance in Textual Scenarios

16 May 2023 10:53 UTC
26 points
0 comments · 13 min read · LW link