Yes, I think we are looking at “seeds of feasible ideas” at this stage, not at “ready to go” ideas...
I tried to look at what it would take for super-powerful AIs
not to destroy the fabric of their environment (together with themselves and everything else),
and instead to care about the “interests, freedom, and well-being of all sentient beings”.
That’s not too easy, but might be doable in a fashion invariant with respect to recursive self-modification (and might be more feasible than more traditional approaches to alignment).
Of course, the fact that we don’t know what’s sentient and what’s not sentient does not help, to say the least ;-) But perhaps we and/or AIs and/or our collaborations with AIs might figure this out sooner rather than later...
Anyway, I did scribble a short write-up on this direction of thinking a few months ago: “Exploring non-anthropocentric aspects of AI existential safety”.