Collin

Karma: 472

http://collinpburns.com/

What AI Safety Materials Do ML Researchers Find Compelling?

28 Dec 2022 2:03 UTC
175 points
34 comments · 2 min read · LW link

How “Discovering Latent Knowledge in Language Models Without Supervision” Fits Into a Broader Alignment Scheme

Collin · 15 Dec 2022 18:22 UTC
243 points
39 comments · 16 min read · LW link · 1 review