Kei

Karma: 423

Early Signs of Steganographic Capabilities in Frontier LLMs

4 Jul 2025 16:36 UTC
30 points
5 comments · 2 min read

Reward hacking is becoming more sophisticated and deliberate in frontier LLMs

Kei · 24 Apr 2025 16:03 UTC
95 points
6 comments · 1 min read

Auditing language models for hidden objectives

13 Mar 2025 19:18 UTC
141 points
15 comments · 13 min read

Kei’s Shortform

Kei · 27 Jan 2025 7:23 UTC
3 points
5 comments