carissacullen
Karma: 22
Towards Shutdownable Agents: Generalizing Stochastic Choice in RL Agents and LLMs
Elliott Thornley (EJT), carissacullen, christosi, alexr, LAThomson and Harry Garland
3 Jun 2026 14:24 UTC · 20 points · 3 comments · 19 min read · LW link (arxiv.org)
Detecting collusion through multi-agent interpretability
schroederdewitt, aaronrose227 and carissacullen
3 Apr 2026 9:17 UTC · 15 points · 1 comment · 6 min read · LW link
Detecting collusion through multi-agent interpretability
schroederdewitt, aaronrose227 and carissacullen
2 Apr 2026 22:20 UTC · 2 points · 0 comments · 5 min read · LW link