I note that to my eyes, you appear to be straightforwardly accepting the need-to-generalize claim and arguing for ability-to-generalize. Putting words in your mouth a little, what I see you saying is that, by the time we hit a true loss-of-control-can-be-catastrophic moment where failure kills boazbarak, we will have had enough failure recoveries on highly similar systems to be sure that deadly-failure probability is indistinguishable from zero, and that maximum-likely-failure-consequence is shrinking as fast as or faster than model capability is growing.
But current approaches don’t seem to me to zero out the rate of failures above a certain level of catastrophicness. They’re best seen as continuous in probability, not continuous in failure size.
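To make the distinction concrete, here is a minimal toy sketch of my own (every number in it is made up for illustration: a Pareto tail exponent of 1.5, a catastrophicness threshold of 100, and a per-step failure rate I vary). Reducing the failure rate thins out failures of every size, but with a heavy-tailed size distribution the chance of a failure above any fixed threshold never reaches zero.

```python
# Toy model (illustrative only, all parameters made up): safety work that
# lowers the overall failure rate does not cap failure size when sizes are
# heavy-tailed.
import numpy as np

rng = np.random.default_rng(0)

def exceedance_prob(failure_rate, size_alpha, threshold, horizon=1_000, trials=20_000):
    """Estimate P(at least one failure above `threshold` within `horizon` steps),
    where a failure occurs each step with probability `failure_rate` and its
    size is drawn from a classical Pareto(size_alpha) distribution."""
    hits = 0
    for _ in range(trials):
        n_failures = rng.binomial(horizon, failure_rate)
        if n_failures == 0:
            continue
        sizes = rng.pareto(size_alpha, n_failures) + 1.0  # shift Lomax -> Pareto with x_min = 1
        if sizes.max() > threshold:
            hits += 1
    return hits / trials

for rate in (0.1, 0.01, 0.001):  # "better safety" = fewer failures per step
    p = exceedance_prob(rate, size_alpha=1.5, threshold=100.0)
    print(f"per-step failure rate {rate:>6}: P(catastrophic-sized failure) ~ {p:.3f}")
```

Cutting the rate by 100x cuts the exceedance probability roughly in step, which is what I mean by continuous in probability; nothing in this picture ever bounds how big the failure that does slip through can be.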
I am not sure I 100% understand what you are saying. Again, as I wrote elsewhere, it is possible that, for one reason or another, rather than systems becoming safer and more controlled over time, they will become less safe and riskier. It is possible that we will have a sequence of failures growing in magnitude over time but, for one reason or another, fail to address them, and hence end up in a very large-scale catastrophe.
It is possible that current approaches are not good enough and will not improve fast enough to match the stakes at which we want to deploy AI. If that is the case, then it will end badly, but I believe we will see many bad outcomes well before an extinction event. To put it crudely, if we are on a path to that ending, I would expect the magnitude of harms caused by AI to climb on an exponential scale over time, similar to how other capabilities are growing.
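As a back-of-the-envelope sketch of that claim (all numbers below are assumptions chosen for illustration, not forecasts): if the worst AI-caused incident in a given year grows by roughly an order of magnitude per year, there are many clearly visible rungs between an early million-scale incident and anything approaching civilization scale.

```python
# Back-of-the-envelope: if AI-caused harm grows exponentially, how many
# intermediate "warning shot" rungs separate a first visible incident from an
# extinction-scale one? All numbers are illustrative assumptions.
import math

first_visible_harm = 1e6     # assumed scale of an early, clearly attributed incident
extinction_scale   = 1e14    # assumed scale of a civilization-ending outcome
growth_per_year    = 10.0    # assumed yearly multiplier on worst-incident size

rungs = math.log(extinction_scale / first_visible_harm, growth_per_year)
print(f"~{rungs:.0f} years of escalating incidents in between")

harm, year = first_visible_harm, 0
while harm < extinction_scale:
    print(f"year {year}: worst plausible incident ~ {harm:.0e}")
    harm *= growth_per_year
    year += 1
```

The point of the toy ladder is just that an exponential trajectory passes through many intermediate, highly visible magnitudes before reaching the top, which is why I expect warning shots rather than a silent jump to an extinction-level event.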