Simon Lermen comments on The Compendium, A full argument about extinction risk from AGI

Simon Lermen 3 Nov 2024 19:46 UTC
2 points
I've had finishing this up on my to-do list for a while. I just made a full-length post on it.

https://www.lesswrong.com/posts/ZoFxTqWRBkyanonyb/current-safety-training-techniques-do-not-fully-transfer-to

I think it’s fair to say that some smarter models do better at this; however, it’s still worrisome that there is a gap. Also, attacks continue to transfer.