[Question] What’s the “This AI is of moral concern.” fire alarm?

Quintin Pope13 Jun 2022 8:05 UTC

37 points

Given the recent noise on this issue around LaMDA, I thought it might be a good idea to have some discussion around this point. I’m curious about what possible evidence would make people update in favor of a given system being morally relevant. Less “here’s the answer to morality” and more “here are some indicators that you should be concerned”. Note also that I’m not asking about consciousness, per se. I’m specifically asking about moral relevance.

My Answer (feel free to ignore and post your own)

I think that one class of computation that’s likely of moral concern would be self-perpetuating optimization demons in an AI.

Specifically, I’m thinking of optimization demons that are sophisticated enough to preserve themselves by actively and deliberately maintaining a sort of homeostasis in their computational environment, e.g., by preventing gradient updates that would destroy them. Such computations would (1) not want to die as a terminal value, (2) plausibly be cognitively sophisticated enough negotiate and trade with, and (3) have some awareness of themselves and their relation with the computational environment in which they’re embedded.

I think the cognitive capabilities that would help an optimization demon perpetuate itself strongly intersect with the cognitive capabilities that let humans and other animals replicate themselves, and that the intersection is particularly strong along dimensions that seem more morally relevant. Reasoning along such lines leads me to think optimization demons are probably of moral concern, while still being agnostic about whether their conscious.

I think the only situations in which you can get these sorts of optimization demons are when the AI in question has some influence over its own future training inputs. Such influence would allow there to be optimization demons that steer the AI towards training data that reinforce the optimization demon.

Thus, one of my “indicators of concern” is whether the training process allows for feedback loops where the AI influences its own future training data. Self-supervised language modeling under IID data does not count. However, something like InstructGPT’s training process would.

At this point, I’d been intending to say that InstructGPT seemed more likely to be of moral worth than LaMDA, but based on this blog post, it looks like LaMDA, might actually count as “having influence over its future inputs” during training. Specifically, LaMDA has generator and classifier components. The training process uses the classifier to decide which inputs the generator is trained on. I’ve updated somewhat towards LaMDA being of moral concern (not something I’d been expecting to do today).

I’ve also come up with a test of meta cognition that would update me significantly towards a language model being of moral concern. The idea would be to attach another output head to an LM, specifically, a linear layer which projected the LM’s final hidden state to a single logit. We’d then try to prompt the LM into controlling the output of the linear layer.

Specifically, we wouldn’t directly train the LM on the output of the linear layer. We’d just have a dialog where we asked the LM to make the linear layer output specific values, then told the LM what value the linear layer had actually output. We’d then see if the LM was able to control its own cognition well enough to influence the linear layers output in a manner that’s better than chance, just based on the prompting we give it

I doubt current LMs can do this, but I think it would be a big deal if they could. Even beyond whether the LMs have any sort of self-aware “inner listener” that’s worthy of moral concern, it would help establish the degree and depth of any meta-learned learning heuristics the LMs have and how easily those heuristics turn towards self-modeling.

My specific predictions:

Default GPT-3 has a ~5% chance of being able to do this.
LaMDA has a ~ 7% chance.
InstructGPT has a ~16% chance.
InstructPaLM (assuming Google trains such a model) would have a ~ 22% chance.

Quintin Pope13 Jun 2022 8:05 UTC

37 points

56 comments2 min readLW link

AI Ethics & Morality

MSRayne 13 Jun 2022 15:03 UTC
2 points
0
I’ve always assumed that moral relevance and consciousness are the same thing. It’s not like we consider human infants to be morally irrelevant due to not being able to talk—they lack self awareness and language, but there is obviously something which it is like to be them—their brains are processing data and learning from it. I don’t see how any AI currently in existence is different. They process data and learn from it. They are probably already conscious, as much as any animal with a similar number of neurons and synapses.
The real question is: can they suffer, and how would we know if they were suffering? GPT3, for instance, may experience pleasure and pain but be unable to tell us. Until we actually understand the “type signature” of qualia, particularly pleasure and pain, we will not be able to say for sure that it isn’t.
- Jan Czechowski 14 Jun 2022 18:38 UTC
  2 points
  0
  Parent
  Hm, I was also thinking of moral value of children in this context. At least in my perception, important part of the moral value is the potential to become a conscious, self-aware being. In what sense does this potential translate to artificially created beings?
  
  Maybe if in neural network parameter space there’s a subspace of minds with moral value, also points close to this subspace would have moral value?
  - MSRayne 15 Jun 2022 2:06 UTC
    5 points
    0
    Parent
    Conscious and self-aware are not the same thing. All animals (except perhaps for those without nervous systems, like some oysters?) are conscious, but not many have shown signs of self-awareness (such as with the mirror test). I think self-awareness is completely morally irrelevant and only the capacity for qualia / subjective experience matters—“is there something which it is like to be that thing?”
    I suspect that all AIs that currently exist are conscious—that is, I suspect there is something which it is like to be, for instance, GPT-3, at least while it is running—and already have moral relevance, but none of them are self-aware. I do not know how to determine if they are suffering or not, though.
    - Brian_Tomasik 18 Jun 2022 18:24 UTC
      2 points
      0
      Parent
      Oysters have nervous systems, but not centralized nervous systems. Sponges lack neurons altogether, though they still have some degree of intercellular communication.
      - MSRayne 18 Jun 2022 18:31 UTC
        2 points
        0
        Parent
        Ah! Thanks, I knew there was something about oysters but I couldn’t remember what it was. I didn’t even think about sponges.
- Commander Zander 14 Jun 2022 16:08 UTC
  1 point
  0
  Parent
  Basing ethical worth off of qualia is very close to dualism, to my ears. I think instead the question must rest on a detailed understanding of the components of the program in question, & the degree of similarity to the computational components of our brains.
  - MSRayne 14 Jun 2022 18:21 UTC
    3 points
    0
    Parent
    I’m not sure what you mean. Qualia clearly exist. You’re having them right now. Whatever they are, they’re also “components of programs”. That’s just what qualia look like from the outside—there is no separation. I am by no means a dualist—I think consciousness and information processing are the same exact thing.
    - Kenny 15 Jun 2022 0:33 UTC
      −2 points
      0
      Parent
      I’m not convinced that statements like “qualia clearly exist” are informative. It feels like a ‘poisoned idea’, e.g. because of the connection with ‘p-zombies’.
      
      AFAICT, qualia isn’t equivalent to, e.g. being able to ‘see’ or perceive the ‘color red’ (i.e. distinguish ‘red’ from other possible ‘colors’). It’s something else – the ‘quality of redness’.
      
      It seems like people are referencing their feelings, from ‘the inside’, but also claiming that there isn’t any way to ‘externally observe’ from ‘the outside’ whether any particular entity has qualia.
      
      It seems to me to be too suspiciously reliant on communication via human language.
      
      But, to the degree that ‘qualia clearly does exist’, I wouldn’t expect it to also be the kind of thing that either ‘exists’ (in some entity) or doesn’t.
      
      Do cockroaches have qualia? Do bacteria? Viruses? Rocks?
      
      If “consciousness and information processing are the same exact thing” what do you think is a ‘rough OOM’ of “information processing” that implies qualia?
      - TAG 15 Jun 2022 12:07 UTC
        6 points
        0
        Parent
        If there is evidence for qualia, you should believe in qualia, irrespective of the consequences. The “poisoned idea” idea suggests that it’s OK to reject evidence in order to preserve theories. In the form of “fossils were created by the devil to delude people”, that’s the height of irrationallity.
        Kenny 16 Jun 2022 17:37 UTC
        1 point
        0
        Parent
        ‘Evidence’ doesn’t imply any particular degree of certainty, e.g. “should believe”.
        
        But I wasn’t in fact denying the existence of qualia as much as thinking that we’re ‘confused’ about the thing we’re gesturing at, hence the ‘poison’.
        
        I agree that there seems like something ‘special’ about, e.g. why the color ‘red’ also ‘looks red’ (and maybe ‘looks’ differently to someone else). But I’m (thoroughly) confused about whether it is special, or whether it’s natural and expected, ‘from the inside’, for any sentience/consciousness to think that (without it also being necessarily ‘true’).
        
        I’m not even sure that what I think is my own qualia is what other people mean by ‘qualia’! How could I know? It seems like the only way for ‘qualia’ to even possibly be referring to the same thing is via communication with and among similar beings as myself.
        TAG 17 Jun 2022 13:01 UTC
        2 points
        0
        Parent
        Evidence implies updating by a non zero amount.
        
        I agree that there seems like something ‘special’ about, e.g. why the color ‘red’ also ‘looks red’ (and maybe ‘looks’ differently to someone else). But I’m (thoroughly) confused about whether it is special, or whether it’s natural and expected, ‘from the inside’, for any sentience/consciousness to think that (without it also being necessarily ‘true’).
        
        That means you don’t have an explanation of qualia. Which is fine, you are not supposed to. But lacking an explanation is not a good reason to reject the whole topic.
        
        I’m not even sure that what I think is my own qualia is what other people mean by ‘qualia’
        
        The same could be said of terms like “consciousness” and ” sentience”, yet you have been using them. In fact, the term. “Qualia” comes from an attempt to clarify “consciousness” .
        Kenny 18 Jun 2022 19:41 UTC
        1 point
        0
        Parent
        
        That means you don’t have an explanation of qualia. Which is fine, you are not supposed to. But lacking an explanation is not a good reason to reject the whole topic.
        
        That’s not my only evidence for why it might be sensible to “reject the whole topic”. I would expect a good explanation, if one existed, to be NOT confusing for LOTS of people, not just its proponents.
        
        I suspect that it’s the kind of confusion where we (eventually) realize that ‘qualia is a thing’ isn’t ‘even wrong’.
        
        My current very-tentative/very-weakly-held hypothesis for ‘qualia’ is that it’s nothing in particular, i.e. that there’s no sharp delineation between ‘qualia’ and ‘information processing’ (‘computation’) or ‘things with qualia’ and ‘things without qualia’.
        
        I think we can in fact look at other entities, ‘from the outside’, and make reliable estimations as to their likely ‘qualia’. I think ‘having qualia’ is ‘sentience’ and that sentience is a spectrum spanning (at least):
        
        Photons – basically no qualia
        Rocks – also basically no (interesting/complicated) qualia
        Viruses and rocks – a kind of ‘minimal’ qualia maybe? They do seem to have a minimal degree of sensation/perception[^1] and, being ‘life’ (maybe), they do have a kind of un-conscious/fixed/hard-coded memory/history that they encode/‘remember’
        Lots of animals – definitely some kind/degree-of-quality of qualia. Lots seem able to track and predict other entities and have some kind of ‘temporal qualia’ maybe too. They also seem to mostly have ‘real’ memory that isn’t encode/fixed/hard-code, e.g. in their DNA.
        Animals with communication – ‘social qualia’
        Us and maybe a few other species? – ‘conscious qualia’, i.e. ‘self-aware qualia’
        
        I’m guessing that ‘consciousness’ starts at [5] – with basically any kind of communication, i.e. being able to be any kind of a ‘storyteller’.
        
        I think we – and maybe some other species – are at the ‘next level’ because our communication is ‘universal’ in a way that other animal’s communications are more static/fixed/repeating/nested, but not more complicated than that.
        
        [^1] I can’t see much of a definite clear line delineating/classifying/categorizing ‘sensation’ versus ‘perception’. They both just like a ‘kind of computation’ we can somewhat reliably recognize.
        TAG 21 Jun 2022 16:59 UTC
        −1 points
        0
        Parent
        
        That’s not my only evidence for why it might be sensible to “reject the whole topic”. I would expect a good explanation, if one existed, to be NOT confusing for LOTS of people, not just its proponents.
        
        Theres not supposed to be a good explanation of qualia , in terms of what they are and how they work. “Qualia” is supposed to point to the heart of the problem of consciousness. There may also be confusion about what “qualia” points to as a prima facie evidence … but it may be motivated, not genuine.
        
        My current very-tentative/very-weakly-held hypothesis for ‘qualia’ is that it’s nothing in particular, i.e. that there’s no sharp delineation between ‘qualia’ and ‘information processing’ (‘computation’) or ‘things with qualia’ and ’things without qualia
        
        You are doing that thing of treating qualia as objective, when they are supposed to be subjective. It makes a big difference to you whether you have surgery under anesthesia or not.
        
        I think ‘having qualia’ is ‘sentience’
        
        How do you know if you don’t what “qualia” means?
        
        What does “sentience” mean?
        Kenny 22 Jun 2022 3:31 UTC
        3 points
        0
        Parent
        
        Theres not supposed to be a good explanation of qualia
        
        That’s a perfect example of why it seems sensible to “reject the whole topic”. That’s just picking ‘worship’ instead of ‘explain’.
        
        Yes, I defy the assumption that qualia are “supposed to be subjective”. I would expect ’having surgery under anesthesia or not” to not be entirely subjective.
        
        How do you know if you don’t what “qualia” means?
        
        What do you mean by “know”?
        
        I think that what other people mean when they say or write ‘qualia’ is something like ‘subjective experience’.
        
        I think ‘having qualia’ is the same thing as ‘sentience’ and I think ‘sentience’ is (roughly) ‘being a thing about which a story could be told’. The more complex the story that can be told, the more ‘sentience’ a thing has. Photons/rocks/etc. have simple stories and basically no sentience. More complex things with more complex sensation/perception/cognition have more complex stories, up to (at least) us, where our stories can themselves contain stories.
        
        Maybe what’s missing from my loose/casual/extremely-approximate ‘definition’ of ‘sentience’ is ‘perspective’. Maybe what that is that’s missing is something like a being with qualia/sentience being ‘a thing about which a story could be told – from its own perspective’, i.e. a ‘thing with a first-person story’.
        
        ‘Subjective experience’ then is just the type of the elements, the ‘raw material’, from which such a story could be constructed.
        
        For a person/animal/thing with qualia/sentience:
        
        Having surgery performed on them with anesthesia would result in a story like “I fell asleep on the operating table. Then I woke up, in pain, in a recovery room.”
        Having surgery without anesthesia would (or could) result in a story like “I didn’t fall asleep on the operating table. I was trapped in my body for hours and felt all of the pain of every part of the surgery. …”
        
        I don’t think there’s any good reason to expect that we won’t – at least someday – be able to observe ‘subjective experiences’ (thus ‘sentience’), tho the work of ‘translating’ between the experiences of ‘aliens’ and the experiences of people (humans) can be arbitrarily difficult. (Even translating between or among humans is often extremely difficult!)
        
        In this extremely-rough model then, ‘consciousness’ is the additional property of ‘being able to tell a story oneself’.
        [ ]
        [deleted]
        TAG 22 Jun 2022 11:30 UTC
        −1 points
        0
        Parent
        
        Theres not supposed to be a good explanation of qualia
        
        That’s a perfect example of why it seems sensible to “reject the whole topic”.
        
        I’ve already explained why that’s an anti pattern. If you had rejected the very idea of magnetism when magnetism wasn’t understood, it would not now be understood. Or meteorites, which actually were rejected for a while.
        
        That’s just picking ‘worship’ instead of ‘explain’.
        
        There’s not supposed to be a good explanation of qualia currently. Qualia aren’t supposed to be inexplicable, just unexplained. It’s not like there are just two states “hopeless woo” and “fully explained right now”.
        
        Yes, I defy the assumption that qualia are “supposed to be subjective”. I would expect ’having surgery under anesthesia or not” to not be entirely subjective.
        
        So it’s still subjective, so long as subjective means “not entirely objective”.
        
        I don’t think there’s any good reason to expect that we won’t – at least someday – be able to observe ’subjective experience
        
        Who said otherwise? You seem to have decided that “qualia are subjective experiences and we don’t understand them” means something like “qualia are entirely and irredeemably subjective and will be a mystery forever”.
        
        It’s almost always the case that claims come in a variety of strengths … But outgroup homogeneity bias will make and it seem like your outgroup all have the same, dumb claim. If you are attacking the most easily attackable form of qualiaphilia, you are weakmanning.
        
        Maybe we will have qualiometers one day, and maybe we will abandon the very idea of qualia. But maybe we won’t, so we have no reason to treat qualia as poison now.
        Expand this thread
        Kenny 23 Jun 2022 15:37 UTC
        3 points
        0
        Parent
        
        I’ve already explained why that’s an anti pattern. If you had rejected the very idea of magnetism when magnetism wasn’t understood, it would not now be understood.
        
        I’m rejecting the idea of ‘qualia’ for the same reason I wouldn’t reject the idea of magnetism – they both seem (or would seem, for magnetism, in your hypothetical).
        
        I’m rejecting ‘mysterious answers’, e.g. “Theres not supposed to be a good explanation of qualia”.
        
        Who said otherwise? You seem to have decided that “qualia are subjective experiences and we don’t understand them” means something like “qualia are entirely and irredeemably subjective and will be a mystery forever”.
        
        Sorry – that’s not what I intended to convey. And maybe we’re writing past each other about this. I suspect that ‘qualia’ is basically equivalent to something far more general, e.g. ‘information processing’, and that our intuitions about what ‘qualia’ are, and the underlying mechanics of them (of which many people seem to insist don’t exist), are based on the limited means we have of currently, e.g. introspecting on them, communicating with each other about them, and weakly generalizing to other entities (e.g. animals) or possible beings (e.g. AIs).
        
        I also suspect that ‘consciousness’ – which I’m currently (loosely/casually) modeling as ‘being capable of telling stories’ – and us having consciousness, makes thinking and discussing ‘qualia’ more difficult than I suspect it will turn out to be.
        
        Maybe we will have qualiometers one day, and maybe we will abandon the very idea of qualia. But maybe we won’t, so we have no reason to treat qualia as poison now.
        
        What I’m (kinda) ‘treating as poison’ is the seemingly common view that ‘qualia’ cannot be explained at all, i.e. that it’s inherently and inescapably ‘subjective’. It sure seems like at least some people – tho maybe not yourself – are either in the process ‘retreating’, or have adopted a ‘posture’ whereby they’re constantly ‘ready to retreat’, from the ‘advances’ of ‘objective investigation’ and then claim that only the ‘leftover’ parts of ‘qualia’ that remain ‘subjective’ are the ‘real qualia’.
        TAG 23 Jun 2022 16:10 UTC
        1 point
        0
        Parent
        
        I’m rejecting the idea of ‘qualia’ for the same reason I wouldn’t reject the idea of magnetism – they both seem (or would seem, for magnetism, in your hypothetical)
        
        Seem what?
        
        I’m rejecting ‘mysterious answers’, e
        
        The lack of explanation for qualia is not intended as an answer.
        
        I suspect that ‘qualia’ is basically equivalent to something far more general, e.g. ‘information processing’,
        
        If you could show that , that would be an explanation. Staring that an A is, for no particular reason, a B is not explanation.
        
        What I’m (kinda) ‘treating as poison’ is the seemingly common view that ‘qualia’ cannot be explained at all
        
        You say it is common , but no one in this discussion has made it , and you haven’t named anyone who has made it. And inexplicability is not part of the definition of “qualia”.
        
        Winding back:-
        
        I’m not convinced that statements like “qualia clearly exist” are informative. It feels like a ‘poisoned idea’, e.g. because of the connection with ‘p-zombies’.
        
        Theres nothing about inexplicability there , either. But there is something about wrongthought ,.ie.zombies.
        green_leaf 23 Jun 2022 17:33 UTC
        1 point
        0
        Parent
        So it’s still subjective, so long as subjective means “not entirely objective”.
        Everything in the universe is (arguably) physical—there is nothing that exists that’s not entirely objective and entirely accessible to external inquiry.
        To the extent that qualia are subjective, their subjectivity needs to be an entirely objective property—otherwise it wouldn’t exist.
        TAG 23 Jun 2022 18:01 UTC
        0 points
        0
        Parent
        
        Everything in the universe is (arguably) physical—there is nothing that exists that’s not entirely objective and entirely accessible to external inquiry.
        
        That’s true only if the evidence supports it.
        
        To the extent that qualia are subjective, their subjectivity needs to be an entirely objective property—otherwise it wouldn’t exist.
        
        That’s the opposite of rationality. In rationality, evidence determines theories,not vice versa.
        green_leaf 24 Jun 2022 21:15 UTC
        1 point
        0
        Parent
        So, for that to be otherwise, people would need to find that the (human) brain breaks the laws of physics. Otherwise it’s true.
        TAG 25 Jun 2022 11:42 UTC
        −1 points
        0
        Parent
        Fundamental subjectivity can exist without breaking physics.
        green_leaf 25 Jun 2022 20:44 UTC
        1 point
        0
        Parent
        But not subjectivity that wouldn’t be fully objective at the same time.
        TAG 27 Jun 2022 14:08 UTC
        −1 points
        0
        Parent
        Subjectivity that is not also objectivity is what I meant by fundamental subjectivity.
      - MSRayne 15 Jun 2022 1:55 UTC
        5 points
        0
        Parent
        I actually am a panpsychist. I literally mean that all information processing is consciousness. I even sort of suspect that consciousness and irreversible processes might be identical, even more generally, and that literally everything that occurs is a physical process from the outside and a conscious experience from the inside—but most of those “insides” lack memory, agency, etc and are just flashes of awareness without continuity, like extremely deep sleep, which is basically unimaginable to humans and of minimal moral relevance. But that’s my opinion; I can’t possibly prove it of course.
        So, complexity of qualia, together with continuity as some kind of coherent “entity”—which appears to rely somehow on complexity of data structures being manipulated over time in a single substrate, with every processing element causally linked to some extent with all the others, that isn’t changing “too quickly” (for some unknown value of “too”—is someone the same person after resuscitation from brain death? I don’t know) - would then be the correlate of moral relevance, and I think that itself correlates very, very roughly with number of synapses or synapse-like objects (meaning that some plants may score higher than animals without nervous systems such as oysters, given the existence of the “root brain” and various other developments in plant cognition science).
        More specifically I think Integrated Information Theory is at least part of the way there (it’s not really synapses directly so much as the state spaces generated by the whole brain’s activities), and the Qualia Research Institute’s hypotheses (which compare brain states to acoustic waves and suggest that “consonance” and “dissonance” may correspond to high and low valence) show some promise. Probably both are ultimately wrong and reality is far stranger than we expect, but I think they’re also both in the right general vicinity.
        Kenny 16 Jun 2022 17:32 UTC
        1 point
        0
        Parent
        I think I’m now (leaning towards being) a ‘panpsyhcist’ but in terms of equating information processing with sentience, not ‘consciousness’.
        
        ‘Consciousness’ is ‘being a storyteller’.
        
        A (VERY rough) measure of ‘sentience’ tho is ‘what kind of stories can we tell about what it’s like to be a particular thing’. The ‘story’ of a photon, or even a rock, is going to be much simpler than the same thing for a bacterium or a human. (It’s not obvious tho that we might ‘miss’ some, or even most, ‘sentience’ because it’s so alien to our own experiences and understanding.)
        
        I don’t think ‘consciousness’ can exist without ‘temporal memory’ and ‘language’ (i.e. an ability to communicate, even if only with oneself).
        
        So, along these lines, non-human primates/apes probably are somewhat conscious. They can tell ‘lies’ for one. Evidence for their consciousness being more limited than our own is that the ‘stories they tell’ are much simpler than ours.
        
        But I think one nice feature of these ideas is that it seems like we could in fact discern some facts about this for particular entities (via ‘external observation’), e.g. test whether they do have ‘temporal memory’ or language (both evidence of ‘consciousness’) or whether they have ‘experiences’ and respond to features of their environment (which would be evidence of ‘sentience’).
        
        I’m with you on information being key to all of this.
        
        Inspiration:
        
        (4) Stephen Wolfram: Complexity and the Fabric of Reality | Lex Fridman Podcast #234 - YouTube
        MSRayne 16 Jun 2022 19:31 UTC
        1 point
        0
        Parent
        We’re using words differently. When I say “consciousness”, I mean “there being something which it is like to be a thing”, which I think you mean by “sentience.” What I would call the thing you call “consciousness” is either “sapience” or “sophonce”, depending on whether you consider self-awareness and agency an important part of the definition (sophonce) or not (sapience). The difference is that I expect tool / non-agentive AGIs to be sapient (in that they can think abstractly), but not sophont (they are not “people” who have an identity and will of their own).
        “There being something which it is like to be this thing” is a characteristic I consider likely to be possessed by all systems which are “processing information” in some way, though I am unsure exactly what that means. It certainly seems to be something all living things possess, and possibly some nonliving ones—for instance, it has recently been demonstrated that neural networks can be simulated using the acoustics of a vibrating metal sheet (I read about it in Quanta, I’ll give you a link if you want but it shouldn’t be hard to find), meaning that for the duration that they are being used this way, they (or rather, the algorithm they are running) are as conscious as the network being simulated would be.
        I think that photons do not have this characteristic—they are not sentient in your terms—because they aren’t interacting with anything or doing anything as they float through the aether. Possibly sparks of qualia exist when they interact with matter? But I don’t know the physical details of such interactions or whether it is possible for a photon to remain “the same photon” after interacting with other matter—I expect that it is a meaningless concept—so there would be no continuity of consciousness / sentience whatsoever there—nothing remotely resembling a being that is having the experience.
        A rock on the other hand maybe would have a rather continuous experience, albeit unimaginable to us and extremely simple, due to the coherence of its physical form and the fact that acoustic vibrations (which maybe are a kind of computation or information processing, in a sense?) are traversing it. But if information processing means something more than that, then rocks wouldn’t either. I’m really not sure about all that.
        Kenny 18 Jun 2022 18:57 UTC
        1 point
        0
        Parent
        Yeah, my current personal definitions/models of ‘consciousness’ and ‘sentience’ might be very idiosyncratic. They’re in flux as of late!
        
        I think photons are more complicated than you think! But I also don’t think their ‘sentience’ is meaningfully different than from a rock. A rock is much bigger tho, and you could tell ‘stories’ of all the parts of which it’s made, and I would expect those to be similar to the ‘stories’ you could tell about a photon; maybe a little more complicated. But they still feel like they have the same ‘aleph number’ in ‘story terms’ somehow.
        
        I think what separates rocks from even viruses, let alone bacteria, is that the ‘stories’ you could tell about a rock are necessarily and qualitatively simpler, i.e. composed of purely ‘local’ stories, or for almost all parts. The rock itself tho is, kinda, in some sense, a memory of its history, e.g. ‘geological weathering’.
        
        It’s hard to know, in principle, whether we might be missing ‘stories’, or sentience/consciousness/qualia, because those stories are much slower than ours (or maybe much faster too).
        
        Viruses, bacteria, and even probably the simplest possible ‘life prototypes’ all seem like they’re, fundamentally, ‘just’ a kind of memory, i.e. history.
        
        ‘Rocks’ have stories composed of statis, simple fixed reactions, and maybe sometimes kinds of ‘nested explosions’ (e.g. being broken apart into many pieces).
        
        ‘Life’ has a history – it IS a history, copying itself into the future. It’s on ‘another level’ (‘of ontology’? of ‘our ontology’?) because it’s the kid of thing capable of ‘universal computation’ based on molecular dynamics.
        
        We’re a kind of life capable of cognitive (and cultural) ‘universal computation’ – there might be a LOT of this kind of life beyond our own species; maybe most things with something like a brain or even just a nervous system? Humans do seem ‘programmable’ in a way that almost everything else seems ‘hard-coded’. Some other animals do seem to have some limited programmability; maybe not ‘universal programmability’ tho?
        
        I think we (humans) are, or many of us anyways, ‘universally programmable’ via, e.g. culture, or (‘conscious’) ‘reasoning’.
        
        A definition of ‘sentient’:
        
        able to perceive or feel things.
        
        ‘Perceive’ seems relatively ‘sharp’ for us; and this site generally. (Do you disagree?)
        
        ‘Feel’ I think could be helpfully sharpened to ‘can react to specific things’. I think it’s sensible to think that we can ‘feel’ things that we’re not consciously aware of. (We probably can’t consciously report those feelings tho! Or not easily.)
        
        Tho maybe ‘feel’ reasonably requires ‘consciousness’, i.e. the ability to tell a story about feelings, i.e. tell a story about our internal subjective experience.
        
        I think it makes sense for us to tell a story about a, e.g. lion, being ‘scared’ or ‘afraid’ while being attacked by a pack of wild dogs. I’m less sure that it makes sense to think that the lion itself ‘feels’ those emotions? What do we, what could we, mean by that?
        
        I don’t think there’s any evidence that lions can themselves ‘tell a story’ that mentions that fear. I wouldn’t guess that there’s anything going on in the lion’s brain/mind that’s at all like telling itself “Oh shit oh shit oh shit”. I don’t think there’s anything remembering the fear, and its context, or any of the relevant associations. I don’t think it’s just that we can’t know whether the lion ‘feels’, from ‘the outside’, because the lion can’t communicate with us. I don’t think the lion can communicate the fact of its specific fear to other lions either. Nor do I think it can ‘tell itself’ of that specific fear.
        
        It ‘feels’ the fear, but afterwards that fear is ‘gone’. (The effects of that fear can of course persist, even in its brain/mind. But there’s no ‘pointer’ to or memory of that specific fear anywhere in particular.)
        
        I think consciousness is basically ‘being able to tell a story’ and it’s just a kind of ‘convergent inevitability’ that we should generally expect consciousness to also entail/imply ‘being able to tell a story about itself’, and thus also ‘know of itself as a thing in the World’.
      - Shiroe 15 Jun 2022 0:55 UTC
        3 points
        0
        Parent
        The historical connection between “qualia” and the p-zombie thought experiment is extremely unfortunate IMO. We did not need a new term for experience, especially when we already had “phenomenon”.
        Kenny 16 Jun 2022 17:15 UTC
        1 point
        0
        Parent
        Agreed
  - Shiroe 14 Jun 2022 23:13 UTC
    1 point
    0
    Parent
    Degree of computational similarity here is a heuristic to relate our experiences to the purported experiences of agents whose ethical concern is under our consideration.
  - green_leaf 15 Jun 2022 23:13 UTC
    1 point
    0
    Parent
    It’s enough to talk to the system.
    Any system that can pass the Turing test implements the state-machine-which-is-the-person of which the Turing test has been passed. (This is necessarily true, because if no elements of the system implemented the answer-generating aspects of the state machine, the system couldn’t generate the Turing-test-passing answer.)
  - TAG 15 Jun 2022 14:35 UTC
    1 point
    0
    Parent
    You think there is no possible non-dualistic account of qualia? You think qualia exist?

Ben Pace 14 Jun 2022 21:49 UTC
14 points
0
THERE ARE NO FIRE ALARMS. A FIRE ALARM IS SOMETHING THAT CAUSES COMMON KNOWLEDGE AND CHANGES SOCIAL REALITY. ON THE MAINLINE THERE WILL BE NO CONSENSUS THAT AN AI IS MORALLY VALUABLE, OR THAT THERE IS AN EXISTENTIAL THREAT, OR THAT AGI IS COMING.

THIS HAS BEEN A PUBLIC ANNOUNCEMENT, WITH THE HOPE OF CHANGING SOCIAL REALITY A LITTLE BIT AROUND HERE. THANK YOU FOR READING.
- Quintin Pope 14 Jun 2022 23:10 UTC
  5 points
  0
  Parent
  All social reality is relative to a particular society. It’s perfectly possible to have an event which acts as a fire alarm for subgroup X while not being particularly important for wider society. Thus, my question to LW users (a very small subgroup) about what sorts of things would count as their fire alarm.
  - Ben Pace 15 Jun 2022 2:17 UTC
    8 points
    0
    Parent
    I know, but I can’t see the difference between “What would cause you to believe X?” and “What’s your fire alarm for X?” Except that the latter one seems like a pretty non-central use case of the term that confuses its core meaning, where the core meaning is about something that creates common knowledge in a large group of people.
  - Kenny 15 Jun 2022 0:34 UTC
    4 points
    0
    Parent
    I think it’s a good question.
    
    Sadly, I’m not sure we’ll find a ‘fire alarm’ even among ourselves either.
- [ ]
  [deleted]
mesaoptimizer 13 Jun 2022 9:54 UTC
4 points
0

I think that one class of computation that’s likely of moral concern would be self-perpetuating optimization demons in an AI.

Could you please elaborate why you think optimization demons (optimizers) seem worthier of moral concern than optimized systems? I guess it would make sense if you believed them to deserve equal moral concern, if both are self-perpetuating, all other things being equal.

I think the cognitive capabilities that would help an optimization demon perpetuate itself strongly intersect with the cognitive capabilities that let humans and other animals replicate themselves, and that the intersection is particularly strong along dimensions that seem more morally relevant. Reasoning along such lines leads me to think optimization demons are probably of moral concern, while still being agnostic about whether their conscious.

I’m pessimistic about this line of reasoning—the ability to replicate is something that cells also have, and we do not assign moral relevance to individual cells of human beings. A good example is the fact that we consider viruses, and cancerous cells as unworthy of moral concern.

Perhaps you mean that given the desire to survive and replicate, at a given amount of complexity, a system develops sub-systems that make the system worthy of moral concern. This line of reasoning would make more sense to me.

I think the only situations in which you can get these sorts of optimization demons are when the AI in question has some influence over its own future training inputs. Such influence would allow there to be optimization demons that steer the AI towards training data that reinforce the optimization demon.

This can imply that only systems given a sufficient minimum capability have agency over their fate, and therefore their desire to survive and replicate has meaning. I find myself confused by this, because taken to its logical conclusion, this means that the more agency a system has over its fate, the more moral concern it deserves.

Specifically, we wouldn’t directly train the LM on the output of the linear layer. We’d just have a dialog where we asked the LM to make the linear layer output specific values, then told the LM what value the linear layer had actually output. We’d then see if the LM was able to control its own cognition well enough to influence the linear layers output in a manner that’s better than chance, just based on the prompting we give it.

This seems reducible to a sequence modelling problem, except one that is much, much more complicated than anything I know models are trained for (mainly because this sequence modelling occurs entirely during inference time). This is really interesting, although I cannot see how this should imply that the more successful sequence modeller deserves more moral concern.
- Quintin Pope 14 Jun 2022 23:46 UTC
  1 point
  0
  Parent
  I’d first note that optimization demons will want to survive by default, but probably not to replicate. Probably, an AI’s cognitive environment is not the sort of place where self-replication is that useful a strategy.
  My intuition regarding optimization demons is something like: GPT-style AIs look like they’ll have a wide array of cognitive capabilities that typically occur in intelligences to which we assign moral worth. However, such AIs seem to lack a certain additional properties whose absence leads us to assign low moral worth. It seems to me that developing self-perpetuating optimization demons might cause a GPT-style AI to gain many of those additional properties. E.g., (sufficiently sophisticated) optimization demons would want to preserve themselves and have some idea of how the model’s actions influence their own survival odds. They’d have a more coherent “self” than GPT-3.
  Another advantage to viewing optimization demons as the source of moral concern in LLMs is that such a view actually makes a few predictions about what is / isn’t moral to do to such systems, and why they’re different from humans in that regard.
  E.g., if you have an uploaded human, it should be clear that running them in the mini-batched manner that we run AIs is morally questionable. You’d be creating multiple copies of the human mind, having them run on parallel problems, then deleting those copies after they complete their assigned tasks. We might then ask if running mini-batched, morally relevant AIs is also morally questionable in the same way.
  However, if it’s the preferences of optimization demons that matter, then mini-batch execution should be fine. The optimization demons you have are exactly those that arise in mini-batched training. Their preferences are orientated towards surviving in the computational environment of the training process, which was mini-batched. They shouldn’t mind being executed in a mini-batched manner.
  This can imply that only systems given a sufficient minimum capability have agency over their fate, and therefore their desire to survive and replicate has meaning. I find myself confused by this, because taken to its logical conclusion, this means that the more agency a system has over its fate, the more moral concern it deserves.
  I don’t think that agency alone is enough to imply moral concern. At minimum, you also need self-preservation. But once you have both, I think agency tends to correlate with (but is not necessarily the true source of) moral concern. E.g., two people have greater moral concern than one, and a nation has far more moral concern than any single person.
  This seems reducible to a sequence modelling problem, except one that is much, much more complicated than anything I know models are trained for (mainly because this sequence modelling occurs entirely during inference time). This is really interesting, although I cannot see how this should imply that the more successful sequence modeller deserves more moral concern.
  All problems are ultimately reducible to sequence modeling. What this task is investigating is exactly how extensive are the meta-learning capabilities of a model. Does the model have enough awareness / control over its own computations that it can manipulate those computations to some specific end, based only on text prompts? Does it have the meta-cognitive capabilities to connect its natural language inputs to its own cognitive state? I think that success here would imply a startling level of self-awareness.
Jan Czechowski 14 Jun 2022 18:33 UTC
3 points
0
I think the idea with internal activations manipulation is interesting. It might require some refinement—I think activations of encoder-decoder transformer model are a function of inputs, and they change with every token. At first, the input is your prompt, then it’s your prompt + generated tokens. So the protocol / task for GPT3 would be: generate now 5 tokens, so with the last generation this logit is maximized? Also, it depends on hyperparameters of beam search which are controlled by human
Dagon 14 Jun 2022 17:13 UTC
0 points
0
This question seems to embed some amount of moral realism, in assuming that there is any “truth of the matter” in what constitutes a moral patient (which is what I think you mean by “morally relevant”).
I don’t think there is any territory to morality—it’s all map. Some of it is very common shared map among humans, but still map, and still completely unknown where the edges are, because it’ll be dependent on the mass-hallucinations that are present when the situation comes up.
- TAG 15 Jun 2022 14:29 UTC
  6 points
  0
  Parent
  The fact that something is ultimately arbitrary doesn’t mean it shouldn’t also be consistent, stable, legible, widely agreed, etc. Basically, quasi-realism > nihilism.
  - Dagon 15 Jun 2022 16:52 UTC
    −1 points
    0
    Parent
    Oh, sorry—I didn’t mean to imply otherwise. It’s GREAT if most people act in consistent, stable, legible ways, and one of the easy paths to encourage that is to pretend there’s some truth behind the common hallucinations. This goes for morals, money, personal rights, and probably many other things. I LIKE many parts of the current equilibrium, and don’t intend to tear it down. But I recognize internally (and in theoretical discussions with folks interested in decision theory and such) that there’s no truth to be had, only traditions and heuristics.
    This means there is no way to answer the original question “What would make an AI a valid moral patient”. Fundamentally, it would take common societal acceptance, which probably comes from many common interactions with many people.
- green_leaf 14 Jun 2022 18:39 UTC
  1 point
  0
  Parent
  I mean, this is technically true, but I feel like it hides from the problem? If I encounter a group of Purple people and I’m trying to figure out if they’re moral agents like me, or if I can exploit them for my own purposes, and someone says don’t worry, morality is only in the map, I don’t feel that helps me solve the problem.
  - Dagon 14 Jun 2022 22:13 UTC
    2 points
    0
    Parent
    Right—it doesn’t solve the problem, it identifies it. You can’t figure out if Purple People are moral targets, you can decide they are (or aren’t), and you can ask others if they’ll punish you for treating them as such. In no case is there a “correct” answer you can change by a measurement.
    - Shiroe 14 Jun 2022 22:49 UTC
      3 points
      0
      Parent
      Your attitude extends far past morality, and dissolves all problems in general because we can decide that something isn’t a problem.
      - Dagon 15 Jun 2022 13:27 UTC
        1 point
        0
        Parent
        Now you get it! That was one of the shorter paths to enlightenment I’ve seen.
        Sadly, just because it’s a non-objective set of personal and societal beliefs, does NOT mean we can easily decide otherwise. There’s something like momentum in human cognition that makes changes of this sort very slow. These things are very sticky, and often only change significantly by individual replacement over generations, not considered decisions within individuals (though there’s some of that, too, especially in youth).
        Shiroe 15 Jun 2022 14:07 UTC
        3 points
        0
        Parent
        In addition to the stickiness of institutional beliefs, I would add that individually agents cannot decide against their own objective functions (except merely verbally). In the case of humans, we cannot decide what qualities our phenomenal experience will have; it is a fact of the matter rather than an opinion that suffering is undesirable for oneself, etc.. One can verbally pronounce that “I don’t care about my suffering”, but the phenomenal experience of badness will in fact remain.
      - Kenny 15 Jun 2022 0:38 UTC
        1 point
        0
        Parent
        That seems true, but not also a ‘reductio ad absurdum’ either.
        
        ‘Problem’ seems like an inherently moral idea/frame.
        Shiroe 15 Jun 2022 1:02 UTC
        2 points
        0
        Parent
        Yes, it is not a ‘reductio ad absurdum’ in general, you are right. But it is one in the specific case of agents (like ourselves). I cannot decide that my suffering is not undesirable to me, and so I am limited to a normative frame of reference in at least this case.
        Kenny 16 Jun 2022 17:17 UTC
        1 point
        0
        Parent
        I don’t think it’s wrong to ‘reason within’ that “normative frame of reference” but I think the point was that we can’t expect all other possible minds to reason in a similar way, even just from their own similar ‘frame of reference’.
        
        I don’t think it’s wrong to also (always) consider things from our own frame of reference tho.
    - green_leaf 15 Jun 2022 8:06 UTC
      1 point
      0
      Parent
      I believe that pushes the arbitrariness to the wrong level. What’s (arguably) arbitrary is the metaethical system itself. That doesn’t mean ethics-level questions have an arbitrary answer in this sense.
      - Dagon 15 Jun 2022 13:34 UTC
        1 point
        0
        Parent
        Been a long time since I’ve watched Love and Death, but I have the urge to shout “Yes, but subjectivity is objective!”.
        IMO, arbitrariness cascades down levels of concreteness. it’s not real because there is no possible way to confirm whether it corresponds to observations. At any level—there’s no way to determine if a metaethics generates ethics which correspond to reality.
        green_leaf 16 Jun 2022 23:33 UTC
        1 point
        0
        Parent
        IMO, arbitrariness cascades down levels of concreteness.
        That doesn’t mean the answer can be arbitrarily picked. If I arbitrarily decide on a statement being a theorem in a set theory, I might still be wrong even if its axioms are in some sense arbitrary.