carissacullen

Karma: 22

Towards Shutdownable Agents: Generalizing Stochastic Choice in RL Agents and LLMs

3 Jun 2026 14:24 UTC
20 points
3 comments · 19 min read · LW link
(arxiv.org)

Detecting collusion through multi-agent interpretability

3 Apr 2026 9:17 UTC
15 points
1 comment · 6 min read · LW link

Detecting collusion through multi-agent interpretability

2 Apr 2026 22:20 UTC
2 points
0 comments · 5 min read · LW link