Incorrect: OpenAI leadership is dismissive of existential risk from AI.
So the reason I think this is that very high-level people have made claims like, “the orthogonality thesis is probably false”, and someone I know who talked to a very, very, very high-level person at OpenAI had to explain to them that inner alignment is a thing. If they actually cared, I would expect the leadership to be more familiar with their critics’ arguments.
No one remembers now, but the founding rhetoric was also pretty bad, though it was walked back, I suppose.
Also, I often see them claim their AI ethics work (train a model not to offend the average Berkeley humanities grad—possibly not useless, I suppose, but not exactly going to save our lightcone) is important alignment work. Obviously, what is going on inside is not legible to me, but what I see from the outside has mostly been disheartening. Their recent blog on alignment was an exception to this.
Though there are people with their priorities straight at OpenAI, I see little evidence that this is true of their leadership. I’m not confident an organization can be net beneficial when this is the case.
If we’re thinking about the same “very, very, very high-level person at OpenAI”, it does seem like this person now buys that inner alignment is a thing and is concerned about it (or says he’s concerned). It is scary that people at these AI labs don’t know all that much about AI alignment, but also hopeful that they don’t seem to disagree with it, and maybe they just need to be given the arguments in a good way by someone they would listen to?
Also, I often see them claim their AI ethics work (train a model not to offend the average Berkeley humanities grad—possibly not useless, I suppose, but not exactly going to save our lightcone) is important alignment work.
Wait, you don’t think this (I mean the training, not the offending) is a safety problem in and of itself? (See also my previous comment about this.)
I suspect we are thinking about the same person, and it is heartening that they changed their mind.