Maybe the real issue is that we don't know what AGI will be like, so we can't do science on it yet. Like pre-LLM alignment research, we're pretty clueless.
(This is my position, FWIW. We can ~know some things, e.g. that convergent instrumental goals are very likely to either be pursued or be obsoleted by some even more powerful plan. For example, highly capable agents will hack into lots of computers to run themselves, or maybe manufacture new computer chips, or maybe invent some surprising way of doing lots of computation cheaply.)
I don't think we necessarily know that convergent instrumentality will happen. If the human-level AI understands that this is wrong AND genuinely cares (as it does now), and we augment its intelligence, it's pretty likely that it won't do it.
It could change its mind, sure, but we'd like a bit more assurance than that.
I guess I'll have to find a way to avoid twiddling my thumbs till we do know what AGI will look like.