james.lucassen

Karma: 727

jlucassen.com

Hidden Role Games as a Trusted Model Eval

james.lucassen
16 Mar 2026 4:46 UTC
13 points
0 comments
9 min read
(jlucassen.com)

BashArena: A Control Setting for Highly Privileged AI Agents

18 Dec 2025 18:19 UTC
58 points
0 comments
15 min read
(blog.redwoodresearch.org)

Decision Theory Guarding is Sufficient for Scheming

james.lucassen
9 Sep 2025 14:49 UTC
36 points
4 comments
2 min read

How Can You Tell if You’ve Instilled a False Belief in Your LLM?

james.lucassen
6 Sep 2025 16:45 UTC
20 points
1 comment
10 min read
(jlucassen.com)

On Contact, Part 1

james.lucassen
21 Jan 2025 3:10 UTC
14 points
1 comment
11 min read

Retrospective: 12 [sic] Months Since MIRI

james.lucassen
21 Jan 2025 2:52 UTC
68 points
0 comments
9 min read