carissacullen

Karma: 22

Towards Shutdownable Agents: Generalizing Stochastic Choice in RL Agents and LLMs

Elliott Thornley (EJT), carissacullen, christosi, alexr, LAThomson and Harry Garland

3 Jun 2026 14:24 UTC

20 points

3 comments · 19 min read · LW link

(arxiv.org)

Detecting collusion through multi-agent interpretability

schroederdewitt, aaronrose227 and carissacullen

3 Apr 2026 9:17 UTC

15 points

1 comment · 6 min read · LW link

Detecting collusion through multi-agent interpretability

schroederdewitt, aaronrose227 and carissacullen

2 Apr 2026 22:20 UTC

2 points

0 comments · 5 min read · LW link