I don’t think it is much stronger; I think Zvi is shorthanding an idea that has been discussed many times, and at much greater length, on this site and elsewhere. The fact that scientists usually know which clusters of hypotheses are worth testing, long before our scientific institutions would consider anyone justified in claiming to know the answer, is already strong evidence that “the scientific method,” as it is currently instantiated in the world, demands much stricter standards of evidence than epistemology fundamentally requires. Things like the replication crisis are similarly strong evidence that its standards are somewhat misaligned with good epistemology, in that they can lead scientists astray for a long time before evidence builds up that forces the field back on track.
The specific claim here is not “science as a whole and scientific reasoning are irrelevant.” It’s “If we rely on getting hard, reliable scientific evidence to align AGI, that will generally require many failed experiments and disproven hypotheses, because that’s how Science accumulates knowledge. But in a context where a single failed experiment can result in human extinction, that’s just not going to be a process that makes survival likely.” We can disagree on the premise, on whether we’re actually facing such a scenario, but I really don’t see how to meaningfully dispute the conclusion given the premise.
As an example: if the Manhattan Project physicists had been wrong in their calculations and the Trinity test had triggered a self-sustaining atmospheric nitrogen fission/fusion reaction, humanity would have gone extinct in seconds. That would have been the only experimental evidence in favor of the hypothesis, and it would have arrived too late to save us. In that case we were triply lucky: the physicists thought of the possibility, took it seriously enough to do the calculations, and were correct in concluding it would not happen. Years later, they were wrong about a similar (but more complicated) calculation, on whether lithium-7 would contribute significantly to H-bomb yields, but thankfully that error was not the existential one.
Similarly, there were extremely strong reasons why we knew the LHC was not going to destroy the planet with tiny black holes or negative strangelets or whatever other nonsense was thrown around in popular media before it started up, and the scientists involved still thought carefully about the possibilities anyway. But the whole point of experiments is to look for the places where our models and predictions are wrong, and AI doesn’t have anywhere near the theoretical basis needed to make the kind of strong predictions that particle physics can.
I have sort of changed my mind on this, in that while I still disagree with Zvi and you, I now think that my response to Zvi was way too uncharitable, and as a consequence I’ll probably retract my first comment.
I disagree with the premise, though, and one of the assumptions used very often in EA/LW analyses isn’t enough to show it without other assumptions.
I might respond to the rest later on.
I look forward to reading it if you do!