The transcript is posted.
The overall conversation is very interesting and brings plenty of new information about various AI-related technical aspects. Thanks for the link!
On AI safety: regardless of whether one agrees or disagrees with a decentralized approach, we should certainly do more work on decentralized approaches (it seems that the bulk of AI safety work is currently dedicated to aligning a “single system”, and we really should “cover all bases”).
Lex and George don’t even start to discuss the crucial problem with decentralized approaches: that offense seems to be much more powerful than defense along a number of axes (in particular, it is highly likely that it’s much easier to cause irreparable damage to the overall ecosystem or to a population of humans than to defend against such damage).
So one needs some form of “weak semi-alignment” universally adopted by all sufficiently strong AI systems: a single AI entity always refrains from actions that might cause a lot of damage unless it has reached something “between a strong super-majority and near-consensus of all AI systems”, and the whole AI ecosystem is structured to enforce this (a distributed AI ecosystem needs such mechanisms anyway, because there are existential safety threats which put the whole AI ecosystem in danger; see, for example, my write-up Exploring non-anthropocentric aspects of AI existential safety).
And then one might ponder how to build the level of AI existential safety and human flourishing we need on top of such a “weak semi-alignment” mechanism. What makes that easier is that once one does have this kind of “weak semi-alignment”, the additional mechanisms don’t need to be universal: every sufficiently strong AI system should be “weakly semi-aligned” against dangerous unilateralism, whereas the additional requirements needed for human flourishing must hold for the AI ecosystem as a whole, yet might not need to be enforced for each “weakly semi-aligned” system as a separate entity.
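Purely as an illustration (and not something from the write-up), here is a minimal Python sketch of the kind of decision rule that “weak semi-alignment” gestures at: a system may act unilaterally on low-damage actions, but refrains from potentially high-damage actions unless a supermajority of peer AI systems approves. All names, the 0.9 threshold, and the damage flag are hypothetical placeholders; the hard parts (assessing damage, making the vote trustworthy) are exactly what the sketch leaves out.

```python
# Hypothetical illustration of a "refrain unless near-consensus" rule.
# Names and the 0.9 threshold are placeholders, not a proposed standard.
from dataclasses import dataclass


@dataclass
class Action:
    description: str
    potentially_high_damage: bool  # assumed to come from some prior damage-assessment step


def approved_by_supermajority(votes: list[bool], threshold: float = 0.9) -> bool:
    """True if the share of approving peers meets the (placeholder) threshold."""
    if not votes:
        return False
    return sum(votes) / len(votes) >= threshold


def may_execute(action: Action, peer_votes: list[bool]) -> bool:
    """Low-damage actions may proceed unilaterally; high-damage ones need near-consensus."""
    if not action.potentially_high_damage:
        return True
    return approved_by_supermajority(peer_votes)


# Example: 8 of 10 peers approving is not enough at a 0.9 threshold, so the system refrains.
risky = Action("modify shared infrastructure", potentially_high_damage=True)
print(may_execute(risky, [True] * 8 + [False] * 2))  # False
```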
I also share the intuition that defense is harder than offense.
However, hmmm… What if, e.g., AGI#1 kills a person. “Oh no! They died! Irreparable damage! We can’t bring them back!” Then AGI#2 just brings them back.
Right!
Indeed, when the state of the art is capable of fixing a particular kind of damage, that damage no longer needs to be covered by “weak semi-alignment” (and might just become part of the routine “economics” of the AI ecosystem, if some “property rights” or “costs” are involved)...
So, yes, various taboos can gradually be repealed as technology becomes able to undo the consequences of violating those taboos...
So basically, are we mainly concerned about the short transitional period where it’s possible to actually lose things/people/etc. permanently? (And where we also don’t have the information to reconstruct them?)
At least, there seems to be growing understanding that “safety of the transition period” and “safety in the limit” are very different, and that “the transition period” is particularly uncertain, particularly difficult to understand, and might be a period of particularly high vulnerability.
This is why many people are arguing for shorter timelines (in addition to the argument that shorter timelines make a smoother take-off more likely)...
And it does seem that we have already entered the early part of “the transition period”, and that various technology-related risks have been gradually increasing lately (in addition to our losing things/people/etc. on a daily basis at “the normal rate” anyway).
The bulk of “AI existential safety research” seems to be mostly about “safety in the limit”. And, in particular, what I have been doing in relation to this has been mostly about “safety in the limit” (when there is way more cognitive power in our world compared to the present state).
But it would be good to see more studies trying to model the dynamics of “the transition period”, if that’s at all possible...