I have trouble imagining how you would get the Game Tree of Alignment exercise started. I would guess you’d take the common alignment approaches from Engineering Alignment under the AI Safety tag, then the next level would be why each of these doesn’t work, and the level after that would be how to patch them?
Sure, that’s a great place to start!