A lot of my hope is definitely in the ‘we don’t find a way to build an AGI soon’ bucket.
My biggest hopes against doom lie in two facts: instrumental convergence is too weak an assumption, absent other assumptions, to be a good argument for doom; and unbounded instrumental convergence is actually useless for capabilities, while much more bounded instrumental convergence makes alignment far easier. (It's still hard, but not nearly as hard as many doomers probably think.)
Cf. this post on how instrumental convergence is mostly wrong for predicting that AI doom will happen: https://www.lesswrong.com/posts/w8PNjCS8ZsQuqYWhD/instrumental-convergence-draft
It is deeply troubling to see the question of extinction risk not even dismissed. It is ignored entirely.
I do like the discussion about releasing low-confidence findings with warnings attached, rather than censoring low-confidence results. You love to see it.
I'm going to be blunt: the attitude expressed in the first sentence is representative of an attitude I hate on LW. The idea that the scientific method, which generally includes empirical evidence, is fundamentally irrelevant to a field is a very big problem I see here. Richard Ngo, in my view, correctly described what's wrong with the attitude that the scientific method and empirical evidence are irrelevant to AI safety:
Historically, the way that great scientists have gotten around this issue is by engaging very heavily with empirical data (like Darwin did) or else with strongly predictive theoretical frameworks (like Einstein did). Trying to do work which lacks either is a road with a lot of skulls on it. And that’s fine, this might be necessary, and so it’s good to have some people pushing in this direction, but it seems like a bunch of people around here don’t just ignore the skulls, they seem to lack any awareness that the absence of the key components by which scientific progress has basically ever been made is a red flag at all.
In particular, this is essentially why I'm so concerned about the epistemics of AI safety, especially on LW: this dissing of empirical evidence and the scientific method is, to put it bluntly, a good example of not realizing that basically all of our ability to know much of anything rests on them.
I really, really wish LWers weren't nearly as hostile to admitting that the scientific method and empirical evidence matter as Zvi is showing here.
EDIT: I retract most of this comment, with the exception of the first paragraph.
Not irrelevant, just insufficient. And that’s not a dig against the field, it’s true of every human endeavor that requires quick decision making based on incomplete data, or does not permit multiple attempts to succeed at something. Science has plenty of things to say before and after, about training for such tasks or understanding what happened and why. And that’s true here, too. There’s plenty science can tell us now, in advance, that we can use in AI and AI alignment research. But the problem is, aligning the first AGI or ASI may be a one-shot opportunity: succeed the first time or everyone dies. In that scenario, the way the scientific method is usually carried out (iterative hypothesis testing) is obviously insufficient, in the same way that such a method is insufficient for testing defenses against Earth-bound mile-wide asteroids.
I disagree with this, but note that it is a lot saner than the original response I was focusing on. The point I was trying to make was that the claim that science is not relevant to keeping us alive is either very surprising or an example of very bad epistemics. That claim is much stronger than yours, and it would need to be defended far better than this post did.
I disagree with your comment, but the claim you make is way less surprising than Zvi’s claim on science.
I don’t think it is much stronger, I think Zvi is shorthanding an idea that has been discussed many times and at much greater length on this site and elsewhere. The fact that scientists usually know which clusters of hypotheses are worth testing, long before our scientific institutions would consider them justified in claiming that anyone knows the answer, is already sufficiently strong evidence that “the scientific method,” as it is currently instantiated in the world, has much stricter standards of evidence than what epistemology fundamentally allows. Things like the replication crisis are similarly strong evidence that its standards are somewhat misaligned with epistemology, in that they can lead scientists astray for a long time before evidence builds up that forces it back on track.
The specific claim here is not “science as a whole and scientific reasoning are irrelevant.” It’s “If we rely on getting hard, reliable scientific evidence to align AGI, that will generally require many failed experiments and disproven hypotheses, because that’s how Science accumulates knowledge. But in a context where a single failed experiment can result in human extinction, that’s just not going to be a process that makes survival likely.” Which, we can disagree on the premise about whether we’re facing such a scenario, but I really don’t understand how to meaningfully disagree with the conclusion given the premise.
As an example: If the Manhattan Project physicists had been wrong in their calculations and the Trinity test had triggered a self-sustaining atmospheric nitrogen fission/fusion reaction, humanity would have gone extinct in seconds. This would have been the only experimental evidence in favor of the hypothesis, and it would have arrived too late to save humanity. In that case we were triply lucky: the physicists thought of the possibility, took it seriously enough to do the calculations, and were correct in their conclusion that it would not happen. Years later, they were wrong about similar (but more complicated) calculations on whether lithium-6 would contribute significantly to H-bomb yields, but thankfully this error was not the existential one.
Similarly, there were extremely strong reasons why we knew the LHC was not going to destroy the planet with tiny black holes or negative strangelets or whatever other nonsense was thrown around in popular media before it started up, and the scientists involved thought carefully about the possibilities anyway. But, the whole point of experiments is to look for the places where our models and predictions are wrong, and AI doesn’t have anywhere near enough of a theoretical basis to make the strong predictions that particle physics does.
I have sort of changed my mind on this, in that while I still disagree with Zvi and you, I now think that my response to Zvi was way too uncharitable, and as a consequence I’ll probably retract my first comment.
I disagree with the premise, though: one of the assumptions used very often in EA/LW analyses isn't enough to show it without other assumptions.
I might respond to the rest later on.
I look forward to reading it if you do!