I’m a software engineer who is interested in AI, futurism, space, and the big questions of life.
https://www.linkedin.com/in/jamessullivan092/
James Sullivan
Karma: 58
What Sentences Cause Alignment Faking?
Are we aligning the model or just its mask?
Playing Dumb: Detecting Sandbagging in Frontier LLMs via Consistency Checks
Jailbreaking Claude 4 and Other Frontier Language Models
How do AI agents work together when they can’t trust each other?
Of the people who wanted to go to a frontier lab, how many had a mentor who worked at one? I assume that would make finding a role easier.
Would we really say that a human is a “narrow intelligence” when trying any new task until they sleep on it? I think the only thing that would meet the definition of “general intelligence” this implies is something that generalizes to all situations, no matter how foreign. By that definition, I’m not sure general intelligence is possible.