Re: 1, during my time at OpenAI I also strongly got the impression that inner alignment was heavily underinvested in. The Alignment team's agenda seemed, IMO, to be basically about better values/behavior specification, not about making the model want those things on the inside (though this impression is now 7 months out of date). (Also, there are at least a few folks within OAI who I'm sure know and care about these issues.)