Regarding “Intelligence and the Good”, would you mind summarizing in a sentence or something what you might suggest I could take from it?
You had a “what on Earth?” reaction to Lumpen talking about intelligence being good unlike paperclips, so I thought it was relevant as a perspective on why intelligence might be prima-facie a good thing unlike paperclips (ofc extrapolating to intelligence explosion is harder). In particular the relationship between intelligence and openness, contra negative-feedback traps.
Increasing human intelligence is good because it’s in the context of human souls.
Yeah I disagree here but moving on...
Except, “indistinguishable” is way too strong, probably, IDK. I would agree with “probably heavily overlapping / entangled with”. Also I’m not actually that sure what “will-to-X” is supposed to mean here.
Agree re: too strong. Will-to-think as a phrase references his essay, “Will-To-Think”, which is also relevant as commenting on the same general area.
It kinda sounds like he’s saying “intelligence has to be a terminal goal; therefore other things can’t be a terminal goal”. Is he applying a strong mutual-exclusion principle on goals, based off selection pressure / competition / taxes / etc.?
The kind of situation he thinks is unlikely is one where an agent has an arbitrary/stupid terminal goal, and has giant intelligence organized all around that. What he is saying is that for the system to be intelligent, it needs to decide to be intelligent. It couldn’t be intelligent if due to its terminal goal, it decided to not increase its intelligence. The volition to think needs to be a drive, though doesn’t in principle need to be a terminal drive; it cannot be defeated by some other drive and the system still be intelligent.
It would be possible to weaken this to the kind of claim you agreed with earlier (dung beetle value drifts because alignment is hard). I’m interested in a possible intermediate statement. The kind of situation I imagine is that there is a multi-component mind and one of the components is the “utility function” component which uses some simple rule to score representations of possible futures. That component could stay stupid while other components get smarter. It seems now easy to imagine that the other components could develop their own drives that end up steering the system more than the “utility function module”. They could route around the utility module and cause dynamics that pursue ends set by the more intelligent parts of the mind. This could map to an “inner alignment failure” in MIRI ontology. As he discusses later, there is a possible analogy with evolution, where humans have something like a reward module set by evolution, but do not always act according to it.
Of course the MIRI theorist can say “well yes I agree inner alignment is hard, and it is likely that early AGIs would not hold to their original terminal goals, and instead they would get smart and then only later settle on a terminal goal; it is just not my opinion that the terminal goal is by default going to be set by a stupid system and continue to be held to by smart systems” and this is a partial agreement/disagreement with Land.
other obvious hypotheses include diminishing returns to investment in brains
Yeah I don’t have a strong opinion on the biology here, am guessing you’re more correct than Land.
Overall I suggested these essays because you had a “what on Earth?” reaction to things Lumpen was saying and I think these essays suggest more context to the background worldview on why it might be plausible that valuable things come from intelligence and processes that increase intelligence, and that there isn’t a clearly better account for where valuable things come from.
What he is saying is that for the system to be intelligent, it needs to decide to be intelligent. It couldn’t be intelligent if due to its terminal goal, it decided to not increase its intelligence.
Hm. Is the syllogism something like (I’m being sloppy with wording but)
1. Alignment to G is impossible.
2. Therefore, permanently pursuing G requires not getting smarter.
3. Goodness comes from getting smarter.
4. Therefore alignment is bad.
And then this could be softened to like “alignment is hard, so it cuts against increasing intelligence, so it’s kinda bad”?
I’d rephrase as:
1. For a wide variety of G, aligning to G would prevent getting smarter.
2. Goodness comes from getting smarter.
3. Therefore, for a wide variety of G, aligning to G is bad.
But not if G = intelligence optimization (or maybe something highly compatible with intelligence optimization)
The main way to question 1 is the instrumental/terminal goal distinction. We could imagine that a paperclip maximizer is aligned to paperclips, continually decides to think / optimize its intelligence instead of paperclips up to a point, then towards the end of the universe, it starts paperclipping instead of intelligence optimizing. This is an edge case in the Landian schema, since it would have the will-to-think early on, but put some limit on it; and also there’s some disagreement about the plausibility of this case. (It seems instrumental / terminal goal distinctions would exist in some cognitive architectures, but it’s not clear that human brains are such an architecture.)
In the human-scale /acc case it’s more like ~everyone agrees that alignment would require slowing down intelligence, and the practical disagreement is elsewhere. There’s one perspective on 2 that is like “well yes human values in part came from intelligence optimization in evolutionary history, some of our values are our own intelligence deciding its own thing contra evolutionary drives, but also, intelligence is more like one ingredient and there are other ingredients that are basically random, we randomly got the good values”. And “we randomly got the good values” could either be a matter of luck on a moral realist account or could be because value is a relational concept and saying “we have good values” is a tautology because it’s just saying the distance metric between our values and our values is low. (But then Land objects that a tautological claim like this isn’t very compelling given there are symmetry-breaking factors of convergence across different minds… which can then be questioned on realism grounds and normative grounds etc etc)
I suppose sociologically, there is a directionality to technological progress which is associated with capitalism and intelligence optimization (this relates to Land’s “AI = capitalism” thesis), and different people decide to be more or less conditionally pro this. They might want to get off the train at some point due to having something to protect. There is some destination that they value more than the journey, and they want to slow the train down. (Or maybe steer the train differently, as the alignment theorists might want to put it.) Given this, a lot of people would relate to a prima facie consideration of “intelligence optimization good” and would differ in how compelling they find other considerations.
(“Random” isn’t how I would say it; it’s a meaningful part of our history; but this is interpretable only if you admit the created-in-motion valuations. It’s Yudkowsky’s “justification loop through the meta-level, not just a tautology” thing.)
But then Land objects that a tautological claim like this isn’t very compelling given there are symmetry-breaking factors of convergence across different minds… which can then be questioned on realism grounds and normative grounds etc etc
And Yudkowsky would reply that it’s not supposed to be compelling to arbitrary minds (including realistic ones), just to human / humane minds.
So like, if I tried to appeal to some values** in your mind, to get you to realize that you want to be anti-full-speed-ahead with AI, you (whoever’s receiving the message) would view that as the Cathedral trying to prevent your pursuit of intelligence in a way which is doomed to either fail, or else to succeed at permanently keeping the world dull?
** [quite broadly construed—generally, elements that would play a significant role in your ongoing self-governance (which one can have fun with the etymology of)]
Sorry, let me rephrase; it sounds like you and/or Land have chosen a disembodied / nonindexed viewpoint on values… or I mean, you know, applying the criterion of universality to values, and then dismissing nonconvergent values on those grounds? Like, why would “parochial values being good values because they seem good to you is not compelling because the reasoning doesn’t lead to convergence” or “parochial values being good values because they seem good to you is not compelling because different minds have different parochial values” be compelling? Sounds like a commitment to non-parochialness.
If so, why? Do you think it’s instrumentally useful to do so? I can kinda see how that would be reflectively stable ish, in some respects. (I don’t think it’s instrumentally useful, but that’s based on really using the means-ends evaluation where I say it’s instrumentally dumb because an AGI IE would trample your ends.) Perhaps you might reply “Sure, it’s instrumentally useful, but that’s not why I’m applying the criterion. I’m applying the criterion because intelligence is good, convergent things are intelligent, so I want to find what’s convergent”. But that’s grounding out “intelligence is good, overriding parochial goodness” in “intelligence is good”, which isn’t much grounding. You could say “Sure, it’s the same sort of justification loop through the meta-level”. And I’m like, ok, yeah, it’s maybe another sort of stable point, not sure; but I don’t get why you like that stable point, or at least, how you got there (or how you got to thinking that you’re there, or that it would be good to be there); and also it sounds like you think that equilibrium is supposed to be compelling to someone in another equilibrium (or you think the other one is less of an equilibrium).
So like, if I tried to appeal to some values** in your mind, to get you to realize that you want to be anti-full-speed-ahead with AI, you (whoever’s receiving the message) would view that as the Cathedral trying to prevent your pursuit of intelligence in a way which is doomed to either fail, or else to succeed at permanently keeping the world dull?
Perhaps? That’s a structural reading, different from the object-level argumentative reading. In many cases there are industries/governments who incentivize certain discourse patterns. So specific discourse moves could be instances of this pattern but it’s hard to judge except on a case by case basis.
or I mean, you know, applying the criterion of universality to values, and then dismissing nonconvergent values on those grounds?
This has to be at least in part semantic. I think some things are good and also I think some things are what I want and what I tend to pursue. And I don’t think these are the same concept. I don’t think it is tautologically the case that I tend to pursue what is good. I don’t think Land believes this about himself either.
I think Land and I can both say that when we say something is good, we are making a different claim than that we want the thing. It is unclear in other cases; you mention Yudkowsky’s meta ethics and I am not sure exactly how to fill in the blank. Perhaps Yudkowsky by “good” means what he would want on reflection? Or maybe he thinks “good” is CEV of humanity not just himself?
The symmetry-breaking idea has to do with ways of thinking and acting that depend on which considerations are more or less universalizable. So people can judge that some things are more universal-good than others and incline their behavior towards those which aligns their revealed-preference wants with what is universal-good in their view more or less. It doesn’t have to be a perfect correspondence.
Like, why would “parochial values being good values because they seem good to you is not compelling because the reasoning doesn’t lead to convergence” or “parochial values being good values because they seem good to you is not compelling because different minds have different parochial values” be compelling? Sounds like a commitment to non-parochialness.
I don’t think something is a good value just because it seems good to me. In other cases this is easy to see: I don’t think some numerical sum has some value just because it seems that way to me. Now of course this runs into philosophical questions about what “good” means other than seeming good to the speaker. (Yudkowsky discusses some self-ratification problems in “No License To Be Human”.)
Like for example, why would I disagree that intelligence optimization is good in the human case only because it is a human being optimized? For that statement to parse as correct to me, I would need to judge some intelligence optimization to be good in cases that a human is being optimized and not in other cases. But that doesn’t read to me as what I want. I think I care about humans more than other animals in large part because humans have better cognition than other animals. I think if dogs were as smart as people then maybe I would value them as much as people. I suppose here I am demonstrating a habit of mind and of speech that is explaining preferences in terms of other preferences and these tending towards universality.
But that’s grounding out “intelligence is good, overriding parochial goodness” in “intelligence is good”, which isn’t much grounding.
“Intelligence is good” matches what I feel is good better than “human intelligence is good”. Now of course one can ask “why” to that as a psychological question and then maybe part of what happens psychologically is that I evaluate things on how universal they seem and up-weight universalizable ones and then that affects my brain’s reward function and so I feel better about such statements. And Land explains more why he thinks intelligence is convergent and a universal tendency, and I vibe with that and that is a causal factor in my upvoting “Intelligence is good”.
I get that maybe if you wanted an ultimate “but why?” explanation you will be disappointed but it doesn’t seem like in your case you are in general giving ultimate “but why?” explanations to everything you want.
it sounds like you think that equilibrium is supposed to be compelling to someone in another equilibrium (or you think the other one is less of an equilibrium).
Yeah I’m not sure. I think some value systems fail at reflective equilibrium. Yudkowsky’s Löbian considerations point at some such failures. Land’s ideas point at possible differential stability conditions. I of course don’t want to make a universal psychological statement of compellingness, given that it’s more of an empirical question, how often when people read Land/Yudkowsky/whoever do they end up with tendencies towards some attractors of use of language like “value” and “good” and “intelligence” and so on?
I don’t think something is a good value just because it seems good to me.
Ok this is a fair response to what I asked, but it feels a bit beside the point, though maybe you don’t think so. Like, I agree that various tendencies toward universalizing are good/correct, and I agree that this, as well as other tools, are how you investigate and adopt differences between what seems good and what later is revealed to be good. But the question I’m trying to ask is like “how does this get you all the way to not wanting anything that isn’t universalizable”, if that’s your stance (? confused).
I think Land and I can both say that when we say something is good, we are making a different claim than that we want the thing. It is unclear in other cases; you mention Yudkowsky’s meta ethics and I am not sure exactly how to fill in the blank. Perhaps Yudkowsky by “good” means what he would want on reflection?
I don’t think I need to precisely say what I mean by good here, to make the point? Like, I’m saying that the non-convergent valuesy preferencesy free-choice-makingy goalsy goodnessy stuff can be self-ratifying, and probably is to a substantive extent in humans, and there’s nothing wrong with that; I’m unclear on your position, but I think you think that there is something wrong with it? Er, let me restate—I think you choose to not look for what is parochial self-ratifying valuesy stuff in yourself and help it self-ratify, and would avoid that? Or you think you do that? (Unsure, sorry if I keep asking the same questions.)
I think I care about humans more than other animals in large part because humans have better cognition than other animals. I think if dogs were as smart as people then maybe I would value them as much as people.
That’s an interesting thread. I’m curious how easy you’d find it to imagine beings with various functions from [how intelligent they are/become] to [how much you’d value them].
E.g. can you imagine a being that you’d value the same even as it gets smarter? I imagine that usually you’d view it as more and more valuable the smarter it gets?
Can you easily imagine a being that you’d value more as it’s smarter, but SLOWER than humans?
Can you easily imagine a being that you’d value more as it’s smarter, but ASYMPTOTING or NONMONOTONICALLY? (I imagine yes, because you could imagine a species such as humans or similar which, if a bit too smart, would by default Cathedral it up so hard that they permanently stop a foom?)
Can you easily imagine a being that you’d value more as it’s smarter, but FASTER than humans? (I would weakly predict yes, because you’d view a fooming AGI as being good, and likely to grow less constrainedly than humans? Unsure.)
Can you easily imagine a being that you’d value LESS as it’s smarter, EVEN IF IT GETS SMARTER AND SMARTER UNBOUNDEDLY?
But the question I’m trying to ask is like “how does this get you all the way to not wanting anything that isn’t universalizable”, if that’s your stance (? confused).
As I said, what I think is good is not the same as what I want. Similarly, what I want is not the same as what is universalizable.
Like, I’m saying that the non-convergent valuesy preferencesy free-choice-makingy goalsy goodnessy stuff can be self-ratifying, and probably is to a substantive extent in humans, and there’s nothing wrong with that; I’m unclear on your position, but I think you think that there is something wrong with it?
I mean, I think humans vary in intelligence, coherence, and intentional-stance values. And the distribution is non-orthogonal, in that some attractors are smarter than others. Some of the attractors are more right than others, in terms of epistemic-right, in terms of intelligence, coherence, etc. I get maybe you disagree with my usage of “right” here but I don’t think I’m using the term incoherently. I think you’d partially agree in that alignment is infeasible / orthogonality is false for human-level agents.
E.g. can you imagine a being that you’d value the same even as it gets smarter? I imagine that usually you’d view it as more and more valuable the smarter it gets?
That’s hard, it’s a balancing act. Maybe as it gets smarter it also gets more destructive to my selfish, short termist interests, like it creates a bunch of everyday inconveniences. Then maybe I’d value it more due to its intelligence and less because of the interferences. There might be some balancing point, idk. It’s an awkward hypothetical though.
Can you easily imagine a being that you’d value more as it’s smarter, but SLOWER than humans?
I could imagine maybe humans create art I appreciate at a higher rate as they get smarter, and the art quality axis is sloped up more for humans than some other animal species.
Can you easily imagine a being that you’d value more as it’s smarter, but ASYMPTOTING or NONMONOTONICALLY? (I imagine yes, because you could imagine a species such as humans or similar which, if a bit too smart, would by default Cathedral it up so hard that they permanently stop a foom?)
Your example is a bit strange because stopping a foom means stopping intelligence. To me it’s hard to imagine the balancing-out although I mentioned the possibility of accidental correlation (it gets more inconvenient to me as it gets smarter) which could apply here.
Can you easily imagine a being that you’d value more as it’s smarter, but FASTER than humans? (I would weakly predict yes, because you’d view a fooming AGI as being good, and likely to grow less constrainedly than humans? Unsure.)
Yeah I guess? There are various accidental reasons I like some humans more than others that are not just predicted by intelligence, and that could extend to maybe I would like some equal-intelligence fantasy creatures more than humans.
Can you easily imagine a being that you’d value LESS as it’s smarter, EVEN IF IT GETS SMARTER AND SMARTER UNBOUNDEDLY?
I guess I could imagine an AI torture scenario where I would not want the AI to get smarter. Or maybe an AI that is trying to decel as much of the universe as possible, like killing all the aliens. Although of course I’d inquire into the realism of the hypothetical. (Analogy: zombie arguments sometimes conflate “causally easy to imagine” with “actually possible / plausible / realistic”, need to elaborate on the imagination to judge it properly.)
To be clear, the “value” in these cases is something like a casual judgment of what I like more; it’s not meant to be a philosophical thesis. When I’m talking about intelligence metrics and dogs I’m making more of a prima facie / all-else-being-equal claim and then there could be other factors that influence what I would like more.
Alignment to G is impossible. Therefore, permanently pursuing G requires not getting smarter.
For a wide variety of G, aligning to G would prevent getting smarter.
Sidenote, maybe not important, but noting: I think the reason for this difference is that to me, “alignment” means “making a mind that can grow unboundedly and will always pursue G” (well, I’m not actually all that committed to the “goal” ontology but it’s fine here I think). Noting mainly because it might help communication.
Suppose an AI faced a tradeoff between optimizing its intelligence and maximizing paperclips. If it is aligned to paperclips, then it would pick the option that maximizes paperclips at the expense of intelligence. In some sense this means even if it can grow unboundedly in intelligence, it would sometimes decide not to. This is in Land’s ontology, a lack of will-to-think at some point in the process.
Now of course someone could object that this situation won’t come up, because the paperclip maximizer pursues Omohundro drives, which include intelligence optimization. Or perhaps the situation does come up but only late in the universe.
You had a “what on Earth?” reaction to Lumpen talking about intelligence being good unlike paperclips, so I thought it was relevant as a perspective on why intelligence might be prima-facie a good thing unlike paperclips (ofc extrapolating to intelligence explosion is harder). In particular the relationship between intelligence and openness, contra negative-feedback traps.
Yeah I disagree here but moving on...
Agree re: too strong. Will-to-think as a phrase references his essay, “Will-To-Think”, which is also relevant as commenting on the same general area.
The kind of situation he thinks is unlikely is one where an agent has a arbitrary/stupid terminal goal, and has giant intelligence organized all around that. What he is saying is that for the system to be intelligent, it needs to decide to be intelligent. It couldn’t be intelligent if due to its terminal goal, it decided to not increase its intelligence. The volition to think needs to be a drive, though doesn’t in principle need to be a terminal drive; it cannot be defeated by some other drive and the system still be intelligent.
It would be possible to weaken this to the kind of claim you agreed with earlier (dung beetle value drifts because alignment is hard). I’m interested in a possible intermediate statement. The kind of situation I imagine is that there is a multi-component mind and one of the components is the “utility function” component which uses some simple rule to score representations of possible futures. That component could stay stupid while other components get smarter. It seems now easy to imagine that the other components could develop their own drives that end up steering the system more than the “utility function module”. They could route around the utility module and cause dynamics that pursue ends set by the more intelligent parts of the mind. This could map to an “inner alignment failure” in MIRI ontology. As he discusses later, there is a possible analogy with evolution, where humans have something like a reward module set by evolution, but do not always act according to it.
Of course the MIRI theorist can say “well yes I agree inner alignment is hard, and it is likely that early AGIs would not hold to their original terminal goals, and instead they would get smart and then only later settle on a terminal goal; it is just not my opinion that the terminal goal is by default going to be set by a stupid system and continue to be held to by smart systems” and this is a partial agreement/disagreement with Land.
Yeah I don’t have a strong opinion on the biology here, am guessing you’re more correct than Land.
Overall I suggested these essays because you had a “what on Earth?” reaction to things Lumpen was saying and I think these essays suggest more context to the background worldview on why it might be plausible that valuable things come from intelligence and processes that increase intelligence, and that there isn’t a clearly better account for where valuable things come from.
Hm. Is the syllogism something like (I’m being sloppy with wording but)
Alignment to G is impossible.
Therefore, permanently pursuing G requires not getting smarter.
Goodness comes from getting smarter.
Therefore alignment is bad.
And then this could be softened to like “alignment is hard, so it cuts against increasing intelligence, so it’s kinda bad”?
I’d rephrase as:
For a wide variety of G, aligning to G would prevent getting smarter.
Goodness comes from getting smarter.
Therefore, for a wide variety of G, aligning to G is bad.
But not if G = intelligence optimization (or maybe something highly compatible with intelligence optimization)
The main way to question 1 is the instrumental/terminal goal distinction. We could imagine that a paperclip maximizer is aligned to paperclips, continually decides to think / optimize its intelligence instead of paperclips up to a point, then towards the end of the universe, it starts paperclipping instead of intelligence optimizing. This is an edge case in the Landian schema, since it would have the will-to-think early on, but put some limit on it; and also there’s some disagreement about the plausibility of this case. (It seems instrumental / terminal goal distinctions would exist in some cognitive architectures, but it’s not clear that human brains are such an architecture.)
In the human-scale /acc case it’s more like ~everyone agrees that alignment would require slowing down intelligence, and the practical disagreement is elsewhere. There’s one perspective on 2 that is like “well yes human values in part came from intelligence optimization in evolutionary history, some of our values are our own intelligence deciding its own thing contra evolutionary drives, but also, intelligence is more like one ingredient and there are other ingredients that are basically random, we randomly got the good values”. And “we randomly got the good values” could either be a matter of luck on a moral realist account or could be because value is a relational concept and saying “we have good values” is a tautology because it’s just saying the distance metric between our values and our values is low. (But then Land objects that a tautological claim like this isn’t very compelling given there are symmetry-breaking factors of convergence across different minds… which can then be questioned on realism grounds and normative grounds etc etc)
I suppose sociologically, there is a directionality to technological progress which is associated with capitalism and intelligence optimization (this relates to Land’s “AI = capitalism” thesis), and different people decide to be more or less conditionally pro this. They might want to get off the train at some point due to having something to protect. There is some destination that they value more than the journey, and they want to slow the train down. (Or maybe steer the train differently, as the alignment theorists might want to put it). Given this a lot of people would relate to a prima facie consideration of “intelligence optimization good” and would differ in how compelling they find other considerations.
(“Random” isn’t how I would say it; it’s a meaningful part of our history; but this is interpretable only if you admit the created-in-motion valuations. It’s Yudkowsky’s “justification loop through the meta-level, not just a tautology” thing.)
And Yudkowsky would reply that it’s not supposed to be compelling to arbitrary minds (including realistic ones), just to human / humane minds.
So like, if I tried to appeal to some values** in your mind, to get you to realize that you want to be anti-full-speed-ahead with AI, you (whoever’s receiving the message) would view that as the Cathedral trying to prevent your pursuit of intelligence in a way which is doomed to either fail, or else to succeed at permanently keeping the world dull?
** [quite broadly construed—generally, elements that would play a significant role in your ongoing self-governance (which one can have fun with the etymology of)]
Sorry, let me rephrase; it sounds like you and/or Land have chosen a disembodied / nonindexed viewpoint on values.… or I mean, you know, applying the criterion of universality to values, and then dismissing nonconvergent values on those grounds? Like, why would “parochial values being good values because they seem good to you is not compelling because the reasoning doesn’t lead to convergence” or “parochial values being good values because they seem good to you is not compelling because different minds have different parochial values” be compelling? Sounds like a commitment to non-parochialness.
If so, why? Do you think it’s instrumentally useful to do so? I can kinda see how that would be reflectively stable ish, in some respects. (I don’t think it’s instrumentally useful, but that’s based on really using the means-ends evaluation where I say it’s instrumentally dumb because an AGI IE would trample your ends.) Perhaps you might reply “Sure, it’s instrumentally useful, but that’s not why I’m applying the criterion. I’m applying the criterion because intelligence is good, convergent things are intelligent, so I want to find what’s convergent”. But that’s grounding out “intelligence is good, overriding parochial goodness” in “intelligence is good”, which isn’t much grounding. You could say “Sure, it’s the same sort of justification loop through the meta-level”. And I’m like, ok, yeah, it’s maybe another sort of stable point, not sure; but I don’t get why you like that stable point, or at least, how you got there (or how you got to thinking that you’re there, or that it would be good to be there); and also it sounds like you think that equilibrium is supposed to be compelling to someone in another equilibrium (or you think the other one is less of an equilibrium).
Perhaps? That’s a structural reading, different from the object-level argumentative reading. In many cases there are industries/governments who incentivize certain discourse patterns. So specific discourse moves could be instances of this pattern but it’s hard to judge except on a case by case basis.
This has to be at least in part semantic. I think some things are good and also I think some things are what I want and what I tend to pursue. And I don’t think these are the same concept. I don’t think it is tautologically the case that I tend to pursue what is good. I don’t think Land believes this about himself either.
I think Land and I can both say that when we say something is good, we are making a different claim than that we want the thing. It is unclear in other cases; you mention Yudkowsky’s meta ethics and I am not sure exactly how to fill in the blank. Perhaps Yudkowsky by “good” means what he would want on reflection? Or maybe he thinks “good” is CEV of humanity not just himself?
The symmetry-breaking idea has to do with ways of thinking and acting that depend on which considerations are more or less universalizable. So people can judge that some things are more universal-good than others and incline their behavior towards those, which aligns their revealed-preference wants, more or less, with what is universal-good in their view. It doesn’t have to be a perfect correspondence.
I don’t think something is a good value just because it seems good to me. In other cases this is easy to see: I don’t think some numerical sum has some value just because it seems that way to me. Now of course this runs into philosophical questions about what “good” means other than seeming good to the speaker. (Yudkowsky discusses some self-ratification problems in “No License To Be Human”.)
Like for example, why would I disagree that intelligence optimization is good in the human case only because it is a human being optimized? For that statement to parse as correct to me, I would need to judge some intelligence optimization to be good in cases where a human is being optimized and not in other cases. But that doesn’t read to me as what I want. I think I care about humans more than other animals in large part because humans have better cognition than other animals. I think if dogs were as smart as people then maybe I would value them as much as people. I suppose here I am demonstrating a habit of mind and of speech: explaining preferences in terms of other preferences, with these tending towards universality.
“Intelligence is good” matches what I feel is good better than “human intelligence is good”. Now of course one can ask “why” to that as a psychological question and then maybe part of what happens psychologically is that I evaluate things on how universal they seem and up-weight universalizable ones and then that affects my brain’s reward function and so I feel better about such statements. And Land explains more why he thinks intelligence is convergent and a universal tendency, and I vibe with that and that is a causal factor in my upvoting “Intelligence is good”.
I get that maybe if you wanted an ultimate “but why?” explanation you will be disappointed but it doesn’t seem like in your case you are in general giving ultimate “but why?” explanations to everything you want.
Yeah I’m not sure. I think some value systems fail at reflective equilibrium. Yudkowsky’s Löbian considerations point at some such failures. Land’s ideas point at possible differential stability conditions. I of course don’t want to make a universal psychological statement of compellingness, given that it’s more of an empirical question: how often, when people read Land/Yudkowsky/whoever, do they end up with tendencies towards some attractors of use of language like “value” and “good” and “intelligence” and so on?
Ok, thanks.
Ok this is a fair response to what I asked, but it feels a bit beside the point, though maybe you don’t think so. Like, I agree that various tendencies toward universalizing are good/correct, and I agree that this, as well as other tools, is how you investigate and adopt differences between what seems good and what later is revealed to be good. But the question I’m trying to ask is like “how does this get you all the way to not wanting anything that isn’t universalizable”, if that’s your stance (? confused).
For reference: https://www.lesswrong.com/posts/C8nEXTcjZb9oauTCW/where-recursive-justification-hits-bottom
(Doesn’t answer your question.)
I don’t think I need to precisely say what I mean by good here, to make the point? Like, I’m saying that the non-convergent valuesy preferencesy free-choice-makingy goalsy goodnessy stuff can be self-ratifying, and probably is to a substantive extent in humans, and there’s nothing wrong with that; I’m unclear on your position, but I think you think that there is something wrong with it? Er, let me restate—I think you choose to not look for what is parochial self-ratifying valuesy stuff in yourself and help it self-ratify, and would avoid that? Or you think you do that? (Unsure, sorry if I keep asking the same questions.)
That’s an interesting thread. I’m curious how easy you’d find it to imagine beings with various functions from [how intelligent they are/become] to [how much you’d value them].
E.g. can you imagine a being that you’d value the same even as it gets smarter? I imagine that usually you’d view it as more and more valuable the smarter it gets?
Can you easily imagine a being that you’d value more as it’s smarter, but SLOWER than humans?
Can you easily imagine a being that you’d value more as it’s smarter, but ASYMPTOTING or NONMONOTONICALLY? (I imagine yes, because you could imagine a species such as humans or similar which, if a bit too smart, would by default Cathedral it up so hard that they permanently stop a foom?)
Can you easily imagine a being that you’d value more as it’s smarter, but FASTER than humans? (I would weakly predict yes, because you’d view a fooming AGI as being good, and likely to grow less constrainedly than humans? Unsure.)
Can you easily imagine a being that you’d value LESS as it’s smarter, EVEN IF IT GETS SMARTER AND SMARTER UNBOUNDEDLY?
As I said, what I think is good is not the same as what I want. Similarly, what I want is not the same as what is universalizable.
I mean, I think humans vary in intelligence, coherence, and intentional-stance values. And the distribution is non-orthogonal, in that some attractors are smarter than others. Some of the attractors are more right than others, in terms of epistemic-right, in terms of intelligence, coherence, etc. I get that maybe you disagree with my usage of “right” here but I don’t think I’m using the term incoherently. I think you’d partially agree in that alignment is infeasible / orthogonality is false for human-level agents.
That’s hard, it’s a balancing act. Maybe as it gets smarter it also gets more destructive to my selfish, short termist interests, like it creates a bunch of everyday inconveniences. Then maybe I’d value it more due to its intelligence and less because of the interferences. There might be some balancing point, idk. It’s an awkward hypothetical though.
I could imagine maybe humans create art I appreciate at a higher rate as they get smarter, and the art quality axis is sloped up more for humans than some other animal species.
Your example is a bit strange because stopping a foom means stopping intelligence. To me it’s hard to imagine the balancing-out although I mentioned the possibility of accidental correlation (it gets more inconvenient to me as it gets smarter) which could apply here.
Yeah I guess? There are various accidental reasons I like some humans more than others that are not just predicted by intelligence, and that could extend to maybe I would like some equal-intelligence fantasy creatures more than humans.
I guess I could imagine an AI torture scenario where I would not want the AI to get smarter. Or maybe an AI that is trying to decel as much of the universe as possible, like killing all the aliens. Although of course I’d inquire into the realism of the hypothetical. (Analogy: zombie arguments sometimes conflate “casually easy to imagine” with “actually possible / plausible / realistic”; one needs to elaborate on the imagination to judge it properly.)
To be clear, the “value” in these cases is something like a casual judgment of what I like more; it’s not meant to be a philosophical thesis. When I’m talking about intelligence metrics and dogs I’m making more of a prima facie / all-else-being-equal claim, and then there could be other factors that influence what I would like more.
Ok thanks. I guess I gotta go do other stuff, so I’ll leave it off here. Has been somewhat clarifying about your positions I think.
Sidenote, maybe not important, but noting: I think the reason for this difference is that to me, “alignment” means “making a mind that can grow unboundedly and will always pursue G” (well, I’m not actually all that committed to the “goal” ontology but it’s fine here I think). Noting mainly because it might help communication.
(I think my usage is the orthodox usage, but not confident / maybe it was ambiguous. Cf. “sponge alignment” https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/agi-ruin-a-list-of-lethalities#:~:text=dangerous%20things%2C%20you-,could%20try%20a%20sponge,-%3B%20a%20sponge%20is , i.e. a sponge doesn’t count as solving alignment because it’s useless (though to be fair “useful” here isn’t identical to “unbounded etc etc”.))
Suppose an AI faced a tradeoff between optimizing its intelligence and maximizing paperclips. If it is aligned to paperclips, then it would pick the option that maximizes paperclips at the expense of intelligence. In some sense this means that even if it can grow unboundedly in intelligence, it would sometimes decide not to. This is, in Land’s ontology, a lack of will-to-think at some point in the process.
Now of course someone could object that this situation won’t come up, because the paperclip maximizer pursues Omohundro drives, which include intelligence optimization. Or perhaps the situation does come up but only late in the universe.
Yes.
Jessi I forbid you to further this madness