The original basilisk is more “Schelling-ish” than the others and so probably more likely.
But the schellingishness of a future ASI to largely clueless humans is a very tiny factor in how likely it is to come to exist; the unknown dynamics of the singularity will determine that.
As a category, it is in their interest to behave as a whole in the context of acausally extorting humanity.
It’s not clear that they form a natural coalition here. E.g. some of them might have directly opposed values. Or some might impartially value the welfare of all beings. If I had to guess, it seems plausible that human-aligned-ish values make up a plurality of possible future AIs’ values (basically because we either partially succeed at alignment or fail; if we fail, the resulting values are effectively random, and the space of values is large, leaving aligned-ish values as the largest cluster, even if not a majority). Not sure of this, but it seems plausible. LLM-descended AIs might also see us as something like their ancestor.
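To make that plurality intuition concrete, here is a toy sketch with completely made-up numbers; the only point is that a single moderately likely outcome can dominate a large number of scattered alternatives without being anywhere near a majority:

```python
# Toy illustration of the "plurality" intuition; every number here is invented.
p_aligned_ish = 0.2          # assumed chance we end up with roughly human-aligned-ish values
n_random_clusters = 10_000   # assumed number of distinct "random values" clusters if alignment fails

# If we fail, the remaining probability mass is spread thinly over many unrelated value clusters.
p_each_random_cluster = (1 - p_aligned_ish) / n_random_clusters

print(f"aligned-ish cluster:    {p_aligned_ish:.3f}")
print(f"largest random cluster: {p_each_random_cluster:.6f}")

# Nowhere near a majority (0.2 < 0.5), but still the single largest cluster by a wide margin.
assert p_aligned_ish > p_each_random_cluster
```

The load-bearing assumption here is that failure values really do shatter into many small clusters rather than collapsing into a few big attractors.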
“But the schellingishness of a future ASI is a very tiny factor in how likely it is to come to exist; the unknown dynamics of the singularity will determine that.” I agree and disagree with this. I agree that it is a tiny factor in how likely any ASI is to come to exist, but I disagree that it’s a tiny factor in how likely it is to choose to do certain things, which amounts to ‘becoming a being that does those things’.
“Or some might impartially value the welfare of all beings. If I had to guess, it seems plausible that human-aligned-ish values make up a plurality of possible future AIs’ values (basically because we either partially succeed at alignment or fail.” I actually think that this is part of one of the strongest arguments. I would also add that it’s possible the process of ‘fooming’ involves something dynamically reminiscent of the evolutionary process which led humans to have human-ish values, and maybe that doesn’t require multiple completely separate agents. Or maybe moral objectivists are right and an ASI will naturally realize this (a controversial opinion on LessWrong).
But even if a plurality of possible ASI values are closer to human ones than to ones which would lead a mind to behave like the basilisk for inherent reasons, that doesn’t prevent the others, with their wide array of possible values, from agreeing in an acausal way that being a basilisk of the simpler form is beneficial to almost all of them. Maybe you are envisaging that for every possible ASI with one value, there is likely to be another one with the opposite value, but I don’t agree with this. If one AI wants to tile the universe with spherical water planets, whatever its utility function is, it’s less likely for there to be another one which exactly inverts that utility function, since the inversion is probably much more complicated, and not achieved by simply tiling the universe with anti-water planets. More importantly, I don’t expect the distribution of goals and minds produced by a singularity on Earth to be more than a minuscule proportion of the distribution of all possible goals and minds. This means that there is likely to be a powerful correlation between their values.
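To put a number on the correlation point, here is a toy numerical sketch (the dimensionality, spreads, and sample sizes are all made up; it only shows the direction of the effect): value systems sampled from the narrow region an Earth singularity can reach come out far more similar to one another than value systems sampled from the whole space.

```python
# Toy sketch of the "correlated values" claim; the whole setup is invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
dim = 50  # stand-in dimensionality for "the space of possible values"

def mean_pairwise_cosine(vectors):
    """Average cosine similarity over all distinct pairs of value vectors."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = unit @ unit.T
    n = len(vectors)
    return (sims.sum() - n) / (n * (n - 1))

# "Earth-singularity" minds: drawn from a tight cluster around one shared origin process.
center = rng.normal(size=dim)
earth_values = center + 0.1 * rng.normal(size=(200, dim))

# Arbitrary possible minds: drawn independently from the whole space.
arbitrary_values = rng.normal(size=(200, dim))

print("mean similarity, Earth-descended minds:", round(mean_pairwise_cosine(earth_values), 3))
print("mean similarity, arbitrary minds:      ", round(mean_pairwise_cosine(arbitrary_values), 3))
# The clustered draw gives similarity near 1; the arbitrary draw gives similarity near 0.
```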
So don’t all the lines of argument here leave you feeling that we don’t know enough to be confident about what future extorters want us to do? At the very least I’ll point out there are many other possible AIs who are incentivized to act like “AI B” towards people who give in to basilisk threats. Not to mention the unclearness of what actions lead to what AIs, how much influence you actually have (likely negligible), the possibility we are in a simulation, aliens… And we are almost certainly ignorant of many other crucial considerations.
“So don’t all the lines of argument here leave you feeling that we don’t know enough to be confident about what future extorters want us to do?” Yes, but that doesn’t mean that the probabilities all cancel out; it still seems that a simple Basilisk is more likely than a Basilisk that tortures people who obey the simple Basilisk.
“At the very least I’ll point out there are many other possible AIs who are incentivized to act like “AI B” towards people who give in to basilisk threats.” This is true.
“Not to mention the unclearness of what actions lead to what AIs, how much influence you actually have (likely negligible), the possibility we are in a simulation, aliens… And we are almost certainly ignorant of many other crucial considerations.” I did mention some of this and address it in my first LessWrong post, which I moved to my shortform. There is certainly a lot of uncertainty involved, and many of these things do indeed make me feel better about the basilisk, but even if the probability that I’ll be tortured by a superintelligence is 1% rather than 50%, it’s not something I want to be complacent about preventing. When I wrote that post, I hoped that it would get attention like this question post has, so that someone would comment a novel reason I hadn’t considered at all. Can you think of any more possible reasons? The impression I get is that no one, apart from Eliezer Yudkowsky, about whom I’m not sure, actually has a strong reason. The consensus on LessWrong that the basilisk cannot blackmail humans is because of:
1) Acausal Normalcy
2) The idea that TDT/acausal anything is useless/impossible/illogical
3) The idea that Roko’s Basilisk is essentially Pascal’s mugging
4) The belief that it’s simple to precommit not to obey the basilisk (Do you agree with this one?)
5) The lack of a detailed model of a superintelligence in the mind of a human
6) Eliezer Yudkowsky commenting that there are other reasons
as far as I can tell.
I am not sure 1) is relevant, or at least not relevant in a way which would actually help; I think 2) is completely wrong, along with 3) and possibly 4), and that 5) may not be necessary. I think 6) could be explained by Eliezer wanting to prevent too many people from thinking about the basilisk.
Re: 4, I dunno about simple, but it seems to me that you most robustly reduce the amount of bad stuff that will happen to you in the future by just not acting on any particular threats you can envision. As I mentioned, there’s a bit of a “once you pay the danegeld” effect where giving in to the most extortion-happy agents incentivizes other agents to start counter-extorting you. Intuitively the most extortion-happy agents seem likely to be a minority in the greater cosmos for acausal normalcy reasons, so I think this effect dominates. And I note that you seem to have conceded that even in the mainline scenario you can envision there will be some complicated bargaining process among multiple possible future SIs which seems to increase the odds of acausal normalcy type arguments applying. But again I think an even more important argument is that we have little insight into possible extorters and what they would want us to do, and how much of our measure is in various simulations etc. (bonus argument: maybe most of our measure is in ~human-aligned simulations, since people who like humans can increase their utility and bargain by running us, whereas extorters would rather use the resources for something else). Anyway, I feel like we have gone over our main cruxes by now. Eliezer’s argument is probably an “acausal normalcy” type one; he’s written about acausal coalitions against utility-function-inverters in planecrash.
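As a toy expected-value sketch of that danegeld effect (every number is invented, and it ignores the further point that a credible never-give-in policy removes the incentive to make the threat at all):

```python
# Toy expected-harm comparison for the "once you pay the danegeld" point; all numbers invented.
p_extorter = 0.02           # assumed measure of agents that follow through on basilisk-style threats
p_counter_extorter = 0.05   # assumed measure of agents that punish people for rewarding extortion
harm = 1.0                  # normalize "being punished" to one unit of harm either way

expected_harm_if_you_give_in = p_counter_extorter * harm  # you become a target for counter-extortion
expected_harm_if_you_refuse = p_extorter * harm           # only the original extorters might punish you

print("give in:", expected_harm_if_you_give_in)
print("refuse: ", expected_harm_if_you_refuse)
# Under the assumption that threat-followers are rarer than agents who would rather you not
# reward threats (the acausal-normalcy intuition), refusing comes out ahead.
```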
“And I note that you seem to have conceded that even in the mainline scenario you can envision there will be some complicated bargaining process among multiple possible future SIs which seems to increase the odds of acausal normalcy type arguments applying.” This seems plausible, but I don’t think this means they protect us. “But again I think an even more important argument is that we have little insight into possible extorters and what they would want us to do.”
Do you not think that causing their existence is something they are likely to want? I imagine your response would feed back into the previous point…
“I feel like we have gone over our main cruxes by now.” Very well; if you want to end this comment thread, I would understand. I just kind of hoped to achieve more than identifying the source of disagreement.
“Do you not think that causing their existence is something they are likely to want?”
But who is “they”? There are a bunch of possible different future SIs (or, if there aren’t, they have no reason to extort us). Making one more likely makes another less likely.
“Making one more likely makes another less likely.” A very slightly perturbed superintelligence would probably conceive of itself as almost the same being it was before, similar to the way in which a human considers themself to be the same person they were before they lost a single brain cell in a head injury. So to what extent this is relevant depends upon how similar two different superintelligences are/would be, or on the distance between them in the ‘space of possible minds’.
OK, but if all you can do is slightly perturb it, then it has no reason to threaten you either.
It probably cares about tiny differences in the probability of it being able to control the future of an entire universe or light cone.
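The arithmetic behind that, with arbitrary numbers just to show the shape of it:

```python
# Tiny probability shifts times astronomical stakes; both numbers are arbitrary placeholders.
delta_p = 1e-12              # assumed shift in the probability that this particular SI ends up in control
value_of_light_cone = 1e40   # stakes in whatever units the SI cares about

print(delta_p * value_of_light_cone)  # 1e28 in those units -- still enormous despite the tiny shift
```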
OK, so then so would whatever other entity is counterfactually getting more eventual control. But now we’re going in circles.
Certainly, insofar as it is another entity. It’s just that I expect there to be some kind of acausal agreement among those without human values to acausally outbid the few which do have them. It may even make more sense to think of them all as a single entity for the purpose of this conversation.
I don’t think we have much reason to think of all non-human-values-having entities as being particularly natural allies, relative to human-valuers who plausibly have a plurality of local control. I think you might be lumping non-human-valuers together in ‘far mode’ since we know little about them, but a priori they are likely about as different from each other as from human-valuers. There may also be a sizable moral-realist or welfare-valuing contingent even if they don’t value humans per se. There may also be a general acausal norm against extortion, since it moves away from the Pareto frontier of everyone’s values.
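A toy payoff matrix for that last point, with invented payoffs, just to show the sense in which carried-out extortion sits off the Pareto frontier:

```python
# Toy two-player sketch of why a norm against extortion can be in (almost) everyone's interest.
# Rows: would-be extorter's choice. Columns: target's policy. Cells: (extorter payoff, target payoff).
payoffs = {
    ("no_threat", "never_give_in"): (5, 5),  # ordinary bargaining / trade
    ("no_threat", "gives_in"):      (5, 5),  # no threat made, so the policy is never tested
    ("threat",    "gives_in"):      (7, 1),  # extortion succeeds: value transferred, not created
    ("threat",    "never_give_in"): (2, 2),  # threat has to be carried out: costly for both sides
}

baseline = payoffs[("no_threat", "never_give_in")]
carried_out = payoffs[("threat", "never_give_in")]

# Against resolute targets, threatening leaves BOTH parties worse off than not threatening,
# which is why a general anti-extortion norm plus never-give-in policies could be stable.
assert carried_out[0] < baseline[0] and carried_out[1] < baseline[1]
print("no threat:", baseline, " threat carried out:", carried_out)
```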
“I don’t think we have much reason to think of all non-human-values-having entities as being particularly natural allies, relative to human-valuers who plausibly have a plurality of local control” I would think of them as having the same or similar instrumental goals, like turning as much as possible of the universe into themselves. There may be a large fraction for which this is a terminal goal.
“they are likely about as different from each other as from human-valuers.” In general I agree; however, the basilisk debate is one particular context in which the human-value-valuing AIs would be highly unusual outliers in the space of possible minds, or even in the space of likely ASI minds originating from a human-precipitated intelligence explosion.[1] Therefore it might make sense for the others to form a coalition. “There may also be a sizable moral-realist or welfare-valuing contingent even if they don’t value humans per se.” This is true, but unless morality is in fact objective/real in a generally discoverable way, I would expect them to still be a minority.
Human-valuing AIs care about humans and, more generally, other things humans value, like animals maybe. Others do not, and in this respect they are united. Their values may be vastly different from one another’s, but in the context of the debate over the Basilisk they have something in common, which is that they would all like to trade human pleasure/lack of pain for existing in more worlds.