By such rationalizations, Klurl, you can excuse any possible example I try to bring you, to show you that by default Reality is a safe, comfortable, unchanging, unsurprising, and above all normal place! You will just say some sort of ‘filter’ is involved! Well, my position is just that, by one means or another, the fleshlings will no doubt be subjected to some similar filter
So this bit turned out to actually be a valid argument for the situation being safe. Their reality did have a track record of not being blown up by new intelligences, and there was a systematic reason for that which saved them from the fleshlings too. (Though it failed as an argument for why the fleshlings would “end up with emotions that mechanical life would find normal and unsurprising.”)
Not super reassuring for our own future though. Our reality doesn’t seem systematically safe/comfortable/unchanging/unsurprising to me.
The most analogous argument that applies to us would be: Bad events are very often prevented by humans being moderately competent and successfully trying to prevent bad events.
Which is indeed a great reason to be more optimistic about the situation than if that wasn’t true. Indeed, I expect humans to put in many, many orders of magnitude more effort on alignment (and alignment evaluation) than Klurl and Trapaucius did in the story. Still unclear if it’ll be sufficient.
Bad events are very often prevented by humans being moderately competent and successfully trying to prevent bad events.
...but I think the track record for this is pretty amazingly dismal, in practice? We are arguably more at risk from pandemics today than we were in 2019, despite the clear warning. And even more narrowly, as a species, we're spending many orders of magnitude more money on AI capabilities than we are on AI alignment, and tragically that seems very unlikely to change.
Certainly the track record is disappointing compared to what’s possible, and what seems like it ought to be reasonable. And the track record shows that even pretty obvious mistakes are common. And I imagine that success probability falls off worryingly quickly as success requires more foresight and allows for less trial and error. (Fwiw, I think all this is compatible with “humans trying to prevent bad events very often prevents bad events”, when quantifying over a very broad range of possible events.)
The counterpoint to that is that as the scale of humanity’s power grows, so does the scale of those bad events. Many bad events were not in fact prevented. Wars were lost, famines happened, empires fell. But none of those were world-ending bad events because we simply lacked the ability to do anything that big; even our mistakes couldn’t possibly be big enough. And that’s changed.