I’d be interested if you have any more specific ideas here. I can’t think of anything, because:
The question of “How can an AGI self-modify into a safe and beneficial AGI?” seems pretty similar to “How can a person program a safe and beneficial AGI?”, at least until the system is so superhumanly advanced that it can hopefully figure out the answer itself. So in that sense, everyone is thinking about it all the time.
The challenges of safe self-modification don’t seem wildly different from the challenges of safe learning (after all, learning changes the agent too; see the toy sketch below), including things like goal stability, ontological crises, etc. And whereas learning is basically mandatory, deeper self-modification could probably (IMO) be turned off if necessary, again at least until the system is so superhumanly advanced that it can solve the problem itself. So in that sense, at least some people are sorta thinking about it these days.
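(To make the “learning changes the agent too” point concrete, here’s a toy sketch in Python; the one-parameter “agent”, the loss, and the numbers are all made up purely for illustration.)

```python
# Toy illustration: ordinary learning is already self-modification.
# A one-parameter "agent" minimizing loss(w) = (w - 1)^2 via one SGD step.

w = 2.0                          # the agent's only parameter

def loss_grad(w):
    # gradient of (w - 1)^2 with respect to w
    return 2.0 * (w - 1.0)

lr = 0.1                         # learning rate
w_before = w
w = w - lr * loss_grad(w)        # the learning update rewrites the parameter

print(f"agent before: w = {w_before}")   # 2.0
print(f"agent after:  w = {w}")          # 1.8, a (slightly) different agent
```

Deeper self-modification would edit the update rule or the architecture rather than just the parameters, but the basic issue (the agent you end up with is not the agent you started with) is the same either way.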
I dunno, I just can’t think of any experiment we could do with today’s AI in this domain that would discover or prove something that wasn’t already obvious. (...Which of course doesn’t mean that such experiments don’t exist.)