
schroederdewitt

Karma: 112

Detecting collusion through multi-agent interpretability

3 Apr 2026 9:17 UTC
15 points
1 comment, 6 min read

Detecting collusion through multi-agent interpretability

2 Apr 2026 22:20 UTC
2 points
0 comments, 5 min read

Secret Collusion: Will We Know When to Unplug AI?

16 Sep 2024 16:07 UTC
66 points
8 comments, 31 min read

Take SCIFs, it’s dangerous to go alone

1 May 2024 8:02 UTC
43 points
1 comment, 3 min read