How would one arrive at a value system that supports the latter but rejects the former?
It’s a boundary concept (element of a deontological agent design), not a value system (in the sense of preference such as expected utility, a key ingredient of an optimizer). An example application is robustly leaving aliens alone even if you don’t like them (without a compulsion to give them the universe), or closer to home leaving humans alone (in a sense where not stepping on them with your megaprojects is part of the concept), even if your preference doesn’t consider them particularly valuable.
This makes the alignment target something other than preference, a larger target that’s easier to hit. It’s not CEV and leaves value on the table; it doesn’t make efficient use of all resources according to any particular preference. But it might suffice for establishing AGI-backed security against overeager maximizers, with aligned optimizers coming later, when there is time to design them properly.
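(To make the constraint-versus-preference distinction concrete, here is a minimal formalization sketch with assumed notation: a utility function $U$ over actions and a boundary-respecting predicate $C$, neither of which appears in the linked posts.)

$$a^*_{\text{optimizer}} = \arg\max_{a} U(a) \qquad \text{vs.} \qquad a^*_{\text{bounded}} = \arg\max_{a \,:\, C(a)} U(a)$$

In this sketch the boundary predicate $C$ is not a term added to $U$; it restricts the feasible set, so no amount of utility elsewhere can compensate for violating it, and the aliens (or humans) stay unharmed even when $U$ assigns them little value.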
What is this in reference to?
The Stanford Encyclopedia of Philosophy has no reference entry for “boundary concept” nor any string matches at all to “deontological agent” or “deontological agent design”.
It’s a reference to Critch’s Boundaries Sequence and related ideas; see in particular the introductory post and Acausal Normalcy.
It’s an element of a deontological agent design in the literal sense of being an element of a design of an agent that acts in a somewhat deontological manner, instead of being a naive consequentialist maximizer, even if the same design falls out of some acausal society norm equilibrium on consequentialist game theoretic grounds.
I don’t get this; it seems you’re exclusively referencing another LW user’s personal opinions?
I’ve never heard of this ‘Andrew_Critch’ or any of his writings before today, nor do they appear that popular, so I’m quite baffled.
Here’s where I think the conversation went off the rails. :( I think what happened is M.Y.Zuo’s bullshit/woo detector went off, and they started asking pointed questions about the credentials of Critch and his ideas. Vlad and LW more generally react allergically to arguments from authority/status, so downvoted M.Y.Zuo for making this about Critch’s authority instead of about the quality of his arguments.
Personally I feel like this was all a tragic misunderstanding but I generally side with M.Y.Zuo here—I like Critch a lot as a person & I think he’s really smart, but his ideas here are far from rigorous clear argumentation as far as I can tell (I’ve read them all and still came away confused, which of course could be my fault, but still...) so I think M.Y.Zuo’s bullshit/woo detector was well-functioning.
That said, I’d advise M.Y.Zuo to instead say something like “Hmm, a brief skim of those posts leaves me confused and skeptical, and a brief google makes it seem like this is just Critch’s opinion rather than something I should trust on authority. Got any better arguments to show me? If not, cool, we can part ways in peace having different opinions.”
[edit]
I appreciate the attempt at diagnosing what went wrong here. I agree this is ~where it went off the rails, and I think you are (maybe?) correctly describing what was going on from M.Y. Zuo’s perspective. But this doesn’t feel like it captured what I found frustrating.
[/edit]
What feels wrong to me about this is that, for the question of:

How would one arrive at a value system that supports the latter but rejects the former?

it just doesn’t make sense to me to be that worried about either authority or rigor. I think the nonrigorous concept, generally held in society, of “respect people’s boundaries/autonomy” is sufficient to answer the question, without even linking to Critch’s sequence. Critch’s sequence is a nice-to-have that sketches out a direction for how you might formalize this, but I don’t get why this level of formalization is even particularly desired here.
(Like, last I checked we don’t have any rigorous conceptions of functioning human value systems that actually work, either for respecting boundaries or aggregating utility or anything else. For purposes of this conversation this just feels like an isolated demand for rigor)
I think that there are many answers along these lines (like “I’m not talking about a whole value system, I’m talking about a deontological constraint”) which would have been fine here.
The issue was that sentences like “It’s a boundary concept (element of a deontological agent design), not a value system (in the sense of preference such as expected utility, a key ingredient of an optimizer)” use the phrasing of someone pointing to a well-known, clearly-defined concept, but then only link to Critch’s high-level metaphor.
Okay, I get where you’re coming from now. Will have to mull over whether I agree, but I at least no longer feel confused about what the disagreement is about.
(updated the previous comment with some clearer context-setting)
Thanks, & thanks for putting in your own perspective here. I sympathize with that too; fwiw Vladimir_Nesov’s answer would have satisfied me, because I am sufficiently familiar with what the terms mean. But for someone new to those terms, they are just unexplained jargon, with links to lots of lengthy but difficult to understand writing. (I agree with Richard’s comment nearby). Like, I don’t think Vladimir did anything wrong by giving a jargon-heavy, links-heavy answer instead of saying something like “It may be hard to construct a utility function that supports the latter but rejects the former, but if instead of utility maximization we are doing something like utility-maximization-subject-to-deontological-constraints, it’s easy: just have a constraint that you shouldn’t harm sentient beings. This constraint doesn’t require you to produce more sentient beings, or squeeze existing ones into optimized shapes.” But I predict that this blowup wouldn’t have happened if he had instead said that.
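(A minimal runnable sketch of the “utility-maximization-subject-to-deontological-constraints” idea described in the suggested answer above; every name here is illustrative, not anyone’s actual proposal.)

```python
# Sketch: utility maximization subject to a deontological constraint.
# The constraint is a hard filter on the action set, not a penalty term
# in the utility function, so no utility gain can buy a violation.
from typing import Callable, Iterable, Optional, Set

def choose_action(
    actions: Iterable[str],
    utility: Callable[[str], float],
    harms: Callable[[str, str], bool],   # illustrative: does action a harm being b?
    existing_sentients: Set[str],
) -> Optional[str]:
    # Deontological step: actions that harm any existing sentient being
    # are simply unavailable, regardless of their utility.
    permissible = [
        a for a in actions
        if not any(harms(a, b) for b in existing_sentients)
    ]
    if not permissible:
        return None  # do nothing rather than cross the boundary
    # Consequentialist step: optimize only among what remains.
    return max(permissible, key=utility)
```

Note what the filter does and doesn’t do: it forbids harming existing sentients, but nothing in it requires producing more sentient beings or squeezing existing ones into optimized shapes, which is exactly the asymmetry the original question was about.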
I may be misinterpreting things of course, wading in here thinking I can grok what either side was thinking. Open to being corrected!
To be clear I super appreciate you stepping in and trying to see where people were coming from (I think ideally I’d have been doing a better job with that in the first place, but it was kinda hard to do so from inside the conversation)
I found Richard’s explanation about what-was-up-with-Vlad’s comment to be helpful.
Thanks for the insight. After looking into Vladimir_Nesov’s background, I would tend to agree that it was some issue with the phrasing of the parent comment that triggered the increasingly odd replies, rather than any substantive confusion.
At the time I gave him the benefit of the doubt in confusing what SEP is, what referencing an entry in an encyclopedia means, what I wanted to convey, etc., but considering there are 1505 seemingly coherent wiki contributions to the account’s credit since 2009, these pretty common usages should not have been difficult to understand.
To be fair, I didn’t consider his possible emotional states nor how my phrasing might be construed as being an attack on his beliefs. Perhaps I’m too used to the more formal STEM culture instead of this new culture that appears to be developing.
I’d describe this as “Critch listed a bunch of arguments, and the arguments are compelling.”
I’m genuinely not seeing any linked or attached proofs for these arguments, whether logical, statistical, mathematical, etc.
EDIT: Can you link to, or quote, what you believe is a credible argument?
I think upon reflection I maybe agree that there isn’t exactly an “argument” here – I think most of what Critch is doing is saying “here is a frame of how to think about a lot of game theoretic stuff.” He doesn’t (much) argue for that frame, but he lays out how it works, shows a bunch of examples, and basically is hoping (at this point) that the examples resonate.
(I haven’t reread the whole sequence in detail but that was actually my recollection of it last time I read it)
So, I’ll retract my particular phrasing here.
I do think that intuitively, boundaries exist, and as soon as they are pointed out as a frame that’d be good to formalize and incorporate into game/decision theory, I’m like “oh, yeah obviously.” I don’t know how much I think lawful-neutral aliens would automatically respect boundaries, but I would be highly surprised if they didn’t at least include them as a term to be considered as they developed their coordination theories.
Your original comment said “How would one arrive at a value system that supports the latter but rejects the former?”, Vlad said (paraphrased) “by invoking boundaries as a concept”. If that doesn’t make sense to you, okay, but, while I agree Critch doesn’t quite argue for the concept’s applicability, I do think he lays out a bunch of concepts and how they could relate, and this should at least be an existence proof for “it is possible to develop a theory that accomplishes ‘care about allowing the continued survival of existing things without wanting to create more’”. And I still don’t think it makes sense to summarize this as a “personal opinion.” It’s a framework, you can buy the framework or not.
I appreciate the update. The actual meaning behind “invoking boundaries as a concept” is what I’m interested in, if that is the right paraphrase.
If it made intuitive sense then the question wouldn’t have been asked, so you’re right that the concepts could relate, but the crux is that this has not been proven to any degree. Thus, I’m still inclined to consider it a personal opinion.
For the latter part, I don’t get the meaning; from what I understand, there’s no such thing as a ‘should at least be an existence proof’.
There’s ‘proven correct’, ‘proven incorrect’, ‘unproven’, ‘conjecture’, ‘hypothesis’, etc...
Why do you need more than one description of such a value system in order to answer your original question? This isn’t about arguing the value system is ideal or that you should adopt it.
And, like, respecting boundaries is a pretty mainstream concept lots of people care about.
I don’t think I am asking for multiple descriptions of ‘such a value system’.
What value system are you referring to and where does it appear I’m asking that?
Also, I’m not quite sure how ‘respecting boundaries’ relates to this discussion; is it something to do with the idea of ‘invoking boundaries as a concept’?
Research is full of instances of having nothing to go on but the argument itself, not even a reason to consider the argument.
(Among Critch’s legible contributions is Parametric Bounded Löb, wrapping up one line of research in modal embedded agency. See also the recent paper on open source game theory institution design, which works as an introduction with grounding in the informal motivations behind the topic and its relevance to the real world.)
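(Background for readers who haven’t seen it: the classic, unbounded Löb’s theorem, which the parametric bounded result generalizes, says that for a theory like $\mathrm{PA}$ with provability operator $\Box$:)

$$\text{if } \; \mathrm{PA} \vdash \Box P \rightarrow P, \; \text{ then } \; \mathrm{PA} \vdash P, \qquad \text{internalized as the schema} \qquad \mathrm{PA} \vdash \Box(\Box P \rightarrow P) \rightarrow \Box P.$$

Roughly, the parametric bounded version replaces $\Box$ with proof-length-bounded provability $\Box_k$ and makes the implication hold uniformly in a parameter, which is what lets resource-bounded agents use Löbian reasoning about each other; this gloss is from memory, so see the linked paper for the exact statement.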
The work seems interesting, but none of it makes an individual’s personal opinions a credible reference. If it were a group of folks with credible track records expressing a joint opinion at a conference, I’d be more willing to consider it, but literally a single individual just doesn’t make sense.
Research is full of instances of having nothing to go on but the argument itself, not even a reason to consider the argument.

I’m not sure how to parse this; the commonly accepted view is that research is based on experiments, observations, logical proofs, mathematical proofs, etc… do you not believe this?
It’s not a “credible reference” in the sense of having behind it massive evidence of being probably worthwhile to study. But I in turn find the background demand for credible references (in their absence) baffling, both in principle and given that it’s not a constraint that non-mainstream research could survive under.
I personally think it’s important to separate philosophical speculation from well-developed rigorous work, and Critch’s stuff on boundaries seems to land well in the former category.
This is a communicative norm not an epistemic norm—you’re welcome to believe whatever you like about Critch’s stuff, but when you cite it as if it’s widely-understood (across the LW community, or elsewhere) to be a credible, well-developed idea, then this undermines our ability to convey the ideas that are widely-understood to be credible.
important to separate philosophical speculation from well-developed rigorous work

Sure.
when you cite it as if it’s widely-understood (across the LW community, or elsewhere) to be credible

I don’t think I did though? My use of “reference” was merely in the sense of explaining the intended meaning of the word “boundary” I used in the top level comment, so it’s mostly about definitions and context of what I was saying. (I did assume that the reference would plausibly be understood, and I linked to a post on the topic right there in the original comment to gesture at the intended sense and context of the word. There’s also been a post on the meaning of this very word just yesterday.)
And then M. Y. Zuo started talking about credibility, which still leaves me confused about what’s going on, despite some clarifying back and forth.
A reference implies some associated credibility, as in the example found in comment #4:

The Stanford Encyclopedia of Philosophy has no reference entry for “boundary concept” nor any string matches at all to “deontological agent” or “deontological agent design”.
e.g. referencing entries in an encyclopedia, usually presumed to be authoritative to some degree, which grants some credibility to what’s written regarding the topic
By the way, I’m not implying Andrew_Critch’s credibility is zero, but it’s certainly a lot lower than SEP’s, so much so that I think most LW readers, who likely haven’t heard of him, would sooner group his writings with random musings than with SEP entries.
Hence my surprise.
Well, I’m pretty sure that’s not what the word means, but in any case that’s not what I meant by it, so that point isn’t relevant to any substantive disagreement, which does seem present; it’s best to taboo “reference” in this context.
It appears you linked to tvtropes.org?
I’m fairly certain the widely accepted definition of ‘reference’ encompasses the idea of referencing entries in an encyclopedia. So in this case I wouldn’t trust ‘TVTropes’ at all.
Here’s Merriam-Webster:
I personally think it’s important to separate philosophical speculation from well-developed rigorous work

Yes, but of course Critch is the tip of a rather large iceberg. Rationalists tend to think you should familiarise yourself with a mass of ideas virtually none of which have been rigorously proven.
But I in turn find the background demand for credible references (in their absence) baffling, both in principle and given that it’s not a constraint that non-mainstream research could survive under.

The writings linked don’t exclude the possibility of ‘non-mainstream research’ having experiments, observations, logical proofs, mathematical proofs, etc...
In fact the opposite: that happens every day on the internet, including on LW at least once a week.
Did you intend to link to something else?
Critch is a “local hero”...well known in rationalist circles.
Huh, I would never have guessed that by looking at the karma his posts received on average. Guess that shows how misleading the karma score can sometimes be.
? He has over 3000 karma.
I suggest rereading the first sentence.
For example, if an account has 20 posts and 1000 post karma, that’s still only an average of 50 per post, which would indicate the account holder is not that well known.
If you were more like the person you wish to be, and you were smarter, do you think you’d still want our descendants to hold back from optimising when needed, so as to leave alone beings who’d prefer to be left alone? If you would still think that, why is it not CEV?
It’s probably implied by CEV. The point is that you don’t need the whole of CEV to get it: it’s probably easier to get, a simpler concept and a larger alignment target that might be sufficient to at least notkilleveryone, even if in the end we lose most of the universe. Also, you gain the opportunity to work on CEV and eventually get there, even if you have many OOMs fewer resources to work with. It would of course be better to get CEV before building ASIs with different values or going on a long value drift trip ourselves.
I’d suggest that long-term corrigibility is a still easier target. If respecting future sentients’ preferences is the goal, why not make that the alignment target?
While boundaries are a coherent idea, imposing them in our alignment solutions would seem to very much be dictating the future rather than letting it unfold with protection from benevolent ASI.
In an easy world, boundaries are neutral, because you can set up corrigibility on the other side to eventually get aligned optimization there. The utility of boundaries is for worlds where we get values alignment or corrigibility wrong, and most of the universe eventually gets optimized in an at least somewhat misaligned way.
The concern about slight misalignment also makes personal boundaries in this sense an important thing to set up first, before any meaningful optimization changes people, as people are different from each other and the initial optimization pressure might be less than maximally nuanced.
So it’s complementary and I suspect it’s a shard of human values that’s significantly easier to instill in this different-than-values role than either the whole thing or corrigibility towards it.