This sounds like you’re saying, it’s not “our” (any of our?) job to solve technical problems in (third person non-human-AGI) alignment.
Right, but I mean something precise by that.
I agree with you. There’s a technical problem, and it takes intelligent effort over time to solve it. And that’s worthwhile.
It’s also not up to any one individual whether or how that happens, and choice only ever happens (for now) at the scale of individuals.
So “should”s applied at the egregore scale don’t make any coherent sense. They’re mostly stupefying forces, those “should”s.
If you want to work on technical AGI alignment, great! Go for it. I don’t think that’s a mistake.
I also don’t think it’s a mistake to look around and model the world and say “If we don’t sort out AI alignment, we all die.”
But something disastrous happens when people start using fear of death to try to pressure themselves and others to do collective action.
That’s super bad. Demon food. Really, really awful. Stupefying and horror-summoning.
I think getting super clear about that distinction is upstream of anyone doing any useful work on AI alignment.
I could be wrong. Maybe we’ve made enough collective (i.e., egregoric) progress in this area that the steps remaining for the technical problem aren’t superhuman. Maybe some smart graduate student could figure it out over this summer.
I really wouldn’t bet on it though.
Hrm… I agree with what you say in this comment, but I still don’t get how it’s coherent with what you said here:
Anything else is playing at the wrong level. Not our job. Can’t be our job. Not as individuals, and it’s individuals who seem to have something mimicking free will.
I guess if I interpret “our job” as meaning “a Should that is put on an individual by a group” then “Not our job” makes sense and I agree. I want to distinguish that from generally “the landscape of effects of different strategies an individual can choose, as induced by their environment, especially the environment of what other people are choosing to do or not do”. “Role” sort of means this, but is ambiguous with Shoulds (as well as with “performance” like in a staged play); I mean “role” in the sense of “my role is to carry this end of the table, yours is to carry that end, and together we can move the table”.
So I’m saying I think it makes sense to take on, as individuals, a role of solving technical alignment. It sounds like we agree on that.… Though still, the sentence I quoted, “Anything else is playing at the wrong level”, seems to critique that decision if it trades off against playing at the egregore level. I mostly disagree with that critique insofar as I understand it. I agree that the distinction between being Shoulded into pretending to work on X, vs. wanting to solve X, is absolutely crucial, but avoiding a failure mode isn’t the only right level to exercise free will on, even if it’s a common and crucial failure mode.
Mmm, there’s an ambiguity in the word “our” that I think is creating confusion.
When I say “not our job”, what I mean is: it’s not up to me, or you, or Eliezer, or Qiaochu, or Sam, or…. For every individual X, it’s not X’s job.
Of course, if “we/us” is another name for the superduperhypercreature of literally all of humanity, then obviously that single entity very much is responsible for sorting out AI risk.
The problem is, people get their identity confused here and try to act on the wrong level. By which I mean, individuals cannot control beyond their power range. Which in practice means that most people cannot meaningfully affect the battlefield of the gods.
Most applications of urgency (like “should”ing) don’t track real power. “Damn, I should exercise.” Really? So if you in practice cannot get yourself to exercise, what is that “should” doing? Seems like it’s creating pain and dissociating you from what’s true.
“Damn, this AI risk thing is really big, we should figure out alignment” is just as stupid. Well, actually it’s much more so because the gap between mental ambition and real power is utterly fucking gargantuan. But we’ll solve that by scaring ourselves with how big and important the problem is, right?
This is madness. Stupefaction.
Playing at the wrong level.
(…which encourages dissociation from the truth of what you actually can in fact choose, which makes it easier for unFriendly hypercreatures to have their way with you, which adds to the overall problem.)
Does that make more sense?
[I’ll keep going since this seems important, though sort of obscured/slippery; but feel free to duck out.]
Mmm, there’s an ambiguity in the word “our” that I think is creating confusion.
I think there’s also ambiguity in “job”. I think it makes sense for it to be “up to” Eliezer in the sense of being Eliezer’s role (role as in task allocation, not as in Should-field, and not as in performance in a play).
is responsible for sorting out AI risk
Like, I think I heard the OP as maybe saying “giving and taking responsibility for AI alignment is acting at the wrong level”, which is ambiguous because “responsibility” is ambiguous; who is taking whom to be responsible, and how are they doing that? Are they threatening punishment? Are they making plans on that assumption? Are they telling other people to make plans on that assumption? Etc.
I think we agree that:
Research goes vastly or even infinitely better when motivated by concrete considerations about likely futures, or by what is called curiosity.
Doubling down on Shoulds (whether intra- or inter-personal) is rarely helpful and usually harmful.
Participating in Shoulds (giving or receiving) is very prone to be or become part of an egregore.
I don’t know whether we agree that:
There is a kind of “mental ambition” which is the only thing that has a chance at crossing the gap to real power from where any of us is, however utterly fucking gargantuan that gap may be.
There is a way of being scared about the problem (including how big and important it is, though not primarily in those words) that is healthy and a part of at least one correct way of orienting.
Sometimes “Damn, I should exercise” is what someone says when they feel bloopiness in their body and want to move it, but haven’t found a fun way to move their body.
It’s not correct that “Sorting out AI alignment in computers is focusing entirely on the endgame. That’s not where the causal power is.”, because ideas are to a great extent had by small numbers of people, and ideas have a large causal effect on what sort of control ends up being exercised. I could interpret this statement as a true proposition, though, if it’s said to someone (and implicitly, about just that person) who is sufficiently embedded in an egregore that they just can’t feasibly aim at the important technical problems (which I think we’d agree is very common).
If the whole world were only exactly 90% unified on AI alignment being an issue, it would NOT just be a problem to solve. That is, it would still probably spell doom, if the other 10% are still incentivized to go full steam ahead on AGI, and the technical problem turns out to be really hard, and the technical problem isn’t something that can be solved just by throwing money and people at it.
A top priority in free-willing into existence “The kind of math/programming/etc. needed to solve it [which] is literally superhuman”, is to actually work on it.
…I think I heard the OP as maybe saying “giving and taking responsibility for AI alignment is acting at the wrong level”, which is ambiguous because “responsibility” is ambiguous; who is taking whom to be responsible, and how are they doing that? Are they threatening punishment? Are they making plans on that assumption? Are they telling other people to make plans on that assumption? Etc.
Ah!
No, for me, responsibility is a fact. Like asking who has admin powers over these posts.
This isn’t a precise definition. It’s a very practical one. I’m responsible for what’s in range of my capacity to choose. I’m responsible for how my fingers move. I’m not responsible for who gets elected POTUS.
In practice people seem to add the emotional tone of blame or shame or something to “responsibility”. Like “You’re responsible for your credit score.” Blame is a horrid organizing principle and it obfuscates the question of who can affect what. Who is capable of responding (response-able) as opposed to forced to mechanically react.
Stupefaction encourages this weird thing where people pretend they’re responsible for some things they in fact cannot control (and vice versa). My comment about exercise is pointing at this. It’s not that using inner pain can’t work sometimes for some people. It’s that whether it can or can’t work seems to have close to zero effect on whether people try and continue to try. This is just bonkers.
So, like, if you want to be healthy but you can’t seem to do the things you think make sense for you to do, “be healthy” isn’t your responsibility. Because it can’t be. Out of your range. Not your job.
Likewise, it’s not a toddler’s job to calm themselves down while they’re having a meltdown. They can’t. This falls on the adults around the toddler — unless those adults haven’t learned the skill. In which case they can’t be responsible for the toddler. Not as a judgment. As a fact.
Does that clarify?
I think we agree that:
Research goes vastly or even infinitely better when motivated by concrete considerations about likely futures, or by what is called curiosity.
Doubling down on Shoulds (whether intra- or inter-personal) is rarely helpful and usually harmful.
Participating in Shoulds (giving or receiving) is very prone to be or become part of an egregore.
Basically yes.
Nuance: Participating in “should”s is very prone to feeding stupefaction and often comes from a stupefying egregore. More precise than “be or become part of an egregore”.
But the underlying tone of “Participating in ’should’s is a bad idea” is there for sure.
There is a kind of “mental ambition” which is the only thing that has a chance at crossing the gap to real power from where any of us is, however utterly fucking gargantuan that gap may be.
Depends on what you mean by “ambition”.
I do think there’s a thing that extends influence (and in extreme cases lets individuals operate on the god level — see e.g. Putin), and this works through the mind for sure. Sort of like working through a telescope extends your senses.
There is a way of being scared about the problem (including how big and important it is, though not primarily in those words) that is healthy and a part of at least one correct way of orienting.
Yes, as literally stated, I agree.
I don’t think most people have reliable access to this way of being scared in practice though. Most fear becomes food for unFriendly hypercreatures.
Sometimes “Damn, I should exercise” is what someone says when they feel bloopiness in their body and want to move it, but haven’t found a fun way to move their body.
Agreed, and also irrelevant. Did my spelling out of responsibility up above clarify why?
It’s not correct that “Sorting out AI alignment in computers is focusing entirely on the endgame. That’s not where the causal power is.”, because ideas are to a great extent had by small numbers of people, and ideas have a large causal effect on what sort of control ends up being exercised. I could interpret this statement as a true proposition, though, if it’s said to someone (and implicitly, about just that person) who is sufficiently embedded in an egregore that they just can’t feasibly aim at the important technical problems (which I think we’d agree is very common).
I don’t quite understand this objection. I think you’re saying it’s possible for one or a few individuals to have a key technical idea that outwits all the egregores…? Sure, that’s possible, but that doesn’t seem like the winning strategy to aim for here by a long shot. It seemed worth trying 20 years ago, and I’m glad someone tried. Now it’s way, way more obvious (at least to me) that that path just isn’t a viable one. Now we know.
(I think we knew this five years ago too. We just didn’t know what else to do and so kind of ignored this point.)
If the whole world were only exactly 90% unified on AI alignment being an issue, it would NOT just be a problem to solve. That is, it would still probably spell doom, if the other 10% are still incentivized to go full steam ahead on AGI, and the technical problem turns out to be really hard, and the technical problem isn’t something that can be solved just by throwing money and people at it.
Yeah, I think we just disagree here. Where are those 10% getting their resources from? How are they operating without any effects that the 90% can notice? What was the process by which the 90% got aligned? I have a hard time imagining a plausible world here that doesn’t just pull the plug on that 10% and either persuade them or make them irrelevant.
Also, I do think that 90% would work on the technical problem. I don’t mean to say no one would. I mean that the technical problem is downstream of the social one.
A top priority in free-willing into existence “The kind of math/programming/etc. needed to solve it [which] is literally superhuman”, is to actually work on it.
Sure. I’m not saying no one should work on this. I’m saying that these calls to collective action to work on it without addressing the current hostile superintelligences hacking our minds and cultures is just ludicrous.
It clarifies some of your statements, yeah. (I think it’s not the normal usage; related to but not equal to blame, there’s roles, and causal fault routing through people’s expectations, like “So-and-so took responsibility for calming down the toddler, so we left, but they weren’t able, that’s why there wasn’t anyone there who successfully calmed them down”.)
I don’t think most people have reliable access to this way of being scared in practice though. Most fear becomes food for unFriendly hypercreatures.
Agreed; possibly I’d be more optimistic than you about some instances of fear, on the margin, but whatever. Someone ~~Should~~ would be helping others if they were to write about healthy fear...
Agreed, and also irrelevant. Did my spelling out of responsibility up above clarify why?
Not exactly? I think you’re saying, the point is, they can’t make themselves exercise, so they can’t be responsible, and it doesn’t help to bang their head against a non-motivating wall.
What’s important to me here is something like: there’s (usually? often?) some things “right there inside” the Should which are very worth saving. Like, it’s obviously not a coincidence which Shoulds people have, and the practice of Shoulding oneself isn’t only there because of egregores. I think that the Shoulds often have to do with what people really care for, and that their caring shows itself (obscurely, mediatedly, and cooptably/fakeably) in the application of “external” willpower. (I think of Dua Lipa’s song New Rules.)
So I want to avoid people being sort of gaslit into not trusting their reason—not trusting that when they reach an abstract conclusion about what would have consequences they like, it’s worth putting weight on—by bluntly pressuring them to treat their explicit/symbolic “decisions” as suspect. (I mean, they are suspect, and as you argue, they aren’t exactly “decisions” if you then have to try and fail to make yourself carry them out, and clearly all is not well with the supposed practice of being motivated by abstract conclusions. Nevertheless, you maybe thought they were decisions and were intending to make that decision, and your intention to make the decision to exercise / remove X-risk was likely connected to real care.)
Now it’s way, way more obvious (at least to me) that that path just isn’t a viable one.
Huh. Are you saying that you’ve updated to think that solving technical AI alignment is so extremely difficult that there’s just no chance? That doesn’t sound like your other statements. Maybe you’re saying that we / roughly all people can’t even really work on alignment, because being in egregores messes with one’s ability to access what an AI is supposed to be aligned to (and therefore to analyze the hypothetical situation of alignedness), so “purely technical” alignment work is doomed?
I’m saying that technical alignment seems (1) necessary and (2) difficult and (3) maaaaaybe feasible. So there’s causal power there. If you’re saying, people can’t decide to really try solving alignment, so there’s no causal power there… Well, I think that’s mostly right in some sense, but not the right way to use the concept of causal power. There’s still causal power in the node “technical alignment theory”. For most people there’s no causal power in “decide to solve alignment, and then beat yourself about not doing it”. You have to track these separately! Otherwise you say
Sorting out AI alignment in computers is focusing entirely on the endgame. That’s not where the causal power is.
Instead of saying what I think you mean(??), which is “you (almost all readers) can’t decide to help with technical AI alignment, so pressing the button in your head labeled ‘solve alignment’ just hurts yourself and makes you good egregore food, and if you want to solve AI alignment you have to first sort that out”. Maybe I’m missing you though!
I have a hard time imagining a plausible world here that doesn’t just pull the plug on that 10% and either persuade them or make them irrelevant.
Maybe we have different ideas of “unified”? I was responding to
If the whole world were unified on AI alignment being an issue, it’d just be a problem to solve.
The problem that’s upstream of this is the lack of will. Same thing with cryonics really. Or aging.[....] The problem is that people’s minds aren’t clear enough to look at the problem for real.
I agree with:
If 90% of the world were unified on X being an issue, it’d just be a problem to solve.
if X is aging or cryonics, because aging and cryonics aren’t things that have terrible deadlines imposed by a smallish, unilaterally acting, highly economically incentivized research field.
Where are those 10% getting their resources from?
Investors who don’t particularly have to be in the public eye.
How are they operating without any effects that the 90% can notice?
By camouflaging their activities. Generally, governmentally imposed restrictions can be routed around, I think, given enough incentive (cf. tax evasion)? Especially in a realm where everything is totally ethereal electrical signals that most people don’t understand (except the server farms).
What was the process by which the 90% got aligned?
I don’t know. Are you perhaps suggesting that your vision of human alignedness implies that the remaining 10% would also become aligned, e.g. because everyone else is so much happier and alive, or can offer arguments that are very persuasive to their aligned souls? Or, it implies competence to really tactically prevent the 10% from doing mad science? Something like that is vaguely plausible, and therefore indeed promising, but not obviously the case!
I’m saying that these calls to collective action to work on it without addressing the current hostile superintelligences hacking our minds and cultures is just ludicrous.
Agreed.… I think.… though I’d maybe admit a lot more than you would as “just stating propositions” and therefore fine. IDK. Examples could be interesting (and the OP might possibly have been less confusing to me with some examples of what you’re responding to).