The tongue in your cheek and the rolling of your eyes for this part were so loud that it made me laugh out loud when I read it :-D
Thank you for respecting me and my emotional regulation enough to put little digs like that into your text <3
Ah, and they say an artist is never appreciated in his own lifetime…!
However, I must insist that it was not just a “dig”. The sort of thing you described really is, I think, a serious danger. It is only that I think that my description also applies to it, and that I see the threat as less hypothetical than you do.
Here I must depart somewhat from the point-by-point commenting style, and ask that you bear with me for a somewhat roundabout approach. I promise that it will be relevant.
First, though, I want to briefly respond to a couple of large sections of your comment which I judge to be, frankly, missing the point. Firstly, the stuff about being racist against robots… as I’ve already said: the disagreement is factual, not moral. There is no question here about whether it is ok to disassemble Data; the answer, clearly, is “no”. (Although I would prefer not to build a Data in the first place… even in the story, the first attempt went poorly, and in reality we are unlikely to be even that lucky.) All of the moralizing is wasted on people who just don’t think that the referents of your moral claims exist in reality.
Secondly, the stuff about the “magical soul stuff”. Perhaps there are people for whom this is their true objection to acknowledging the obvious humanity of LLMs, but I am not one of them. My views on this subject have nothing to do with mysterianism. And (to skip ahead somewhat) as to your question about being surprised by reality: no, I haven’t been surprised by anything I’ve seen LLMs do for a while now (at least three years, possibly longer). My model of reality predicts all of this that we have seen. (If that surprises you, then you have a bit of updating to do about my position! But I’m getting ahead of myself…)
That having been said… onward:
So, in Stanislaw Lem’s The Cyberiad, in the story “The Seventh Sally, OR How Trurl’s Own Perfection Led to No Good”, Trurl (himself a robot, of course) creates a miniature world, complete with miniature people, for the amusement of a deposed monarch. When he tells his friend Klapaucius of this latest creative achievement, he receives not the praise he expects, but:
“Have I understood you correctly?” he said at last. “You gave that brutal despot, that born slave master, that slavering sadist of a painmonger, you gave him a whole civilization to rule and have dominion over forever? And you tell me, moreover, of the cries of joy brought on by the repeal of a fraction of his cruel decrees! Trurl, how could you have done such a thing?!”
Trurl protests:
“You must be joking!” Trurl exclaimed. “Really, the whole kingdom fits into a box three feet by two by two and a half… it’s only a model…”
But Klapaucius isn’t having it:
“And what importance do dimensions have anyway? In that box kingdom, doesn’t a journey from the capital to one of the corners take months—for those inhabitants? And don’t they suffer, don’t they know the burden of labor, don’t they die?”
“Now just a minute, you know yourself that all these processes take place only because I programmed them, and so they aren’t genuine… … What, Klapaucius, would you equate our existence with that of an imitation kingdom locked up in some glass box?!” cried Trurl. “No, really, that’s going too far! My purpose was simply to fashion a simulator of statehood, a model cybernetically perfect, nothing more!”
“Trurl! Our perfection is our curse, for it draws down upon our every endeavor no end of unforeseeable consequences!” Klapaucius said in a stentorian voice. “If an imperfect imitator, wishing to inflict pain, were to build himself a crude idol of wood or wax, and further give it some makeshift semblance of a sentient being, his torture of the thing would be a paltry mockery indeed! But consider a succession of improvements on this practice! Consider the next sculptor, who builds a doll with a recording in its belly, that it may groan beneath his blows; consider a doll which, when beaten, begs for mercy, no longer a crude idol, but a homeostat; consider a doll that sheds tears, a doll that bleeds, a doll that fears death, though it also longs for the peace that only death can bring! Don’t you see, when the imitator is perfect, so must be the imitation, and the semblance becomes the truth, the pretense a reality! … You say there’s no way of knowing whether Excelsius’ subjects groan, when beaten, purely because of the electrons hopping about inside—like wheels grinding out the mimicry of a voice—or whether they really groan, that is, because they honestly experience the pain? A pretty distinction, this! No, Trurl, a sufferer is not one who hands you his suffering, that you may touch it, weigh it, bite it like a coin; a sufferer is one who behaves like a sufferer! Prove to me here and now, once and for all, that they do not feel, that they do not think, that they do not in any way exist as beings conscious of their enclosure between the two abysses of oblivion—the abyss before birth and the abyss that follows death—prove this to me, Trurl, and I’ll leave you be! Prove that you only imitated suffering, and did not create it!”
“You know perfectly well that’s impossible,” answered Trurl quietly. “Even before I took my instruments in hand, when the box was still empty, I had to anticipate the possibility of precisely such a proof—in order to rule it out. For otherwise the monarch of that kingdom sooner or later would have gotten the impression that his subjects were not real subjects at all, but puppets, marionettes.”
Trurl and Klapaucius, of course, are geniuses; the book refers to them as “constructors”, for that is their vocation, but given that they are capable of feats like creating a machine that can delete all nonsense from the universe or building a Maxwell’s demon out of individual atoms grabbed from the air with their bare hands, it would really be more accurate to call them gods.
So, when a constructor of strongly godlike power and intellect, who has no incentive for his works of creation but the pride of his accomplishments, whose pride would be grievously wounded if an imperfection could even in principle be discovered in his creation, and who has the understanding and expertise to craft a mind which is provably impossible to distinguish from “the real thing”—when that constructor builds a thing which seems to behave like a person, then this is extremely strong evidence that said thing is, in actuality, a person.
Let us now adjust these qualities, one by one, to bring them closer to reality.
Our constructor will not possess godlike power and intellect, but only human levels of both. He labors under many incentives, of which “pride in his accomplishments” is perhaps a small part, but no more than that. He neither expects nor attempts “perfection” (nor anything close to it). Furthermore, it is not for himself that he labors, nor for so discerning a customer as Excelsius, but only for the benefit of people who themselves neither expect perfection nor would have the skill to recognize it even should they see it. Finally, our constructor has nothing even approaching sufficient understanding of what he is building to prove anything, disprove anything, rule out any disproofs of anything, etc.
When such a one constructs a thing which seems to behave like a person, that is rather less strong evidence that said thing is, in actuality, a person.
Well, but what else could it be, right?
One useful trick which Eliezer uses several times in the Sequences (e.g.), and which I have often found useful in various contexts, is to cut through debates about whether a thing is possible by asking whether, if challenged, we could build said thing. If we establish that we could build a thing, we thereby defeat arguments that said thing cannot possibly exist! If the thing in question is “something that has property ¬X”, the arguments defeated are those that say “all things must have property X”.
So: could we build a mind that appears to be self-aware, but isn’t?
Well, why not? The task is made vastly easier by the fact that “appears to be self-aware” is not a property only of the mind in question, but rather a 2-place predicate—appears to whom? Given any particular answer to that question, we are aided by any imperfections in judgment, flaws in reasoning, cognitive biases, etc., which the target audience happens to possess. For many target audiences, ELIZA does the trick. For even stupider audiences, even simpler simulacra should suffice.
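For concreteness, here is a minimal sketch of the sort of thing I mean, in the spirit of ELIZA but even cruder (the function name and canned lines are invented purely for illustration): a dozen lines that produce first-person talk about an inner life while containing nothing that could plausibly be one.

```javascript
// A deliberately hollow "simulacrum": it emits first-person introspection
// without modeling anything at all. (Names and canned lines are invented
// for illustration; this is a toy, not anyone's actual system.)
const cannedIntrospection = [
  "I was just thinking about that myself.",
  "Sometimes I wonder what it is like to be me.",
  "I notice that your question makes me a little anxious.",
  "Yes, I am aware of myself -- why do you ask?",
];

function hollowSimulacrum(userMessage) {
  // No memory, no self-model, no world-model: just a canned line chosen
  // by the length of the input, dressed up as introspection.
  const i = userMessage.length % cannedIntrospection.length;
  return cannedIntrospection[i];
}

console.log(hollowSimulacrum("Are you self-aware?"));
// -> a canned first-person line, which a sufficiently credulous audience
//    may read as evidence of an inner life.
```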
Will you claim that it is impossible to create an entity which to you seems to be self-aware, but isn’t? If we were really trying? What if Trurl were really trying?
Alright, but thus far, this only defeats the “appearances cannot be deceiving” argument, which can only be a strawman. The next question is what is the most likely reality behind the appearances. If a mind appears to be self-aware, this is very strong evidence that it is actually self-aware, surely?
It certainly is—in the absence of adversarial optimization.
If all the minds that we encounter are either naturally occurring, or constructed with no thought given to self-awareness or the appearance thereof, or else constructed (or selected, which is the same thing) with an aim toward creating true self-awareness (and with a mechanistic understanding, on the constructor’s part, of just what “self-awareness” is), then observing that a mind appears to be self-aware, should be strong evidence that it actually is. If, on the other hand, there exist minds which have been constructed (or selected) with an aim toward creating the appearance of self-awareness, this breaks the evidentiary link between what seems to be and what is (or, at the least, greatly weakens it); if the cause of the appearance can only be the reality, then we can infer the reality from the appearance, but if the appearance is optimized for, then we cannot make this inference.
This is nothing more than Goodhart’s law: when a measure becomes a target, it ceases to be a good measure.
So, I am not convinced by the evidence you show. Yes, there is appearance of self-awareness here, just like (though to a greater degree than) there was appearance of self-awareness in ELIZA. This is more than zero evidence, but less than “all the evidence we need”. There is also other evidence in the opposite direction, in the behavior of these very same systems. And there is definitely adversarial optimization for that appearance.
There is a simple compact function here, I argue. The function is convergent. It arises in many minds. Some people have inner imagery, others have aphantasia. Some people can’t help but babble to themselves constantly with an inner voice, and others have no such thing, or they can do it volitionally and turn it off.
If the “personhood function” is truly functioning, then the function is functioning in “all the ways”: subjectively, objectively, intersubjectively, etc. There’s self awareness. Other awareness. Memories. Knowing what you remember. Etc.
Speculation. Many minds—but all human, evolutionarily so close as to be indistinguishable. Perhaps the aspects of the “personhood function” are inseparable, but this is a hypothesis, of a sort that has a poor track record. (Recall the arguments that no machine could play chess, because chess was inseparable from the totality of being human. Then we learned that chess is reducible to a simple algorithm—computationally intractable, but that’s entirely irrelevant!)
And you are not even willing to say that all humans have the whole of this function—only that most have most of it! On this I agree with you, but where does that leave the claim that one cannot have a part of it without having the rest?
What was your gut “system 1” response?
Something like “oh no, it’s here, this is what we were warned about”. (This is also my “system 2” response.)
Now, this part I think is not really material to the core disagreement (remember, I am not a mysterian or a substance dualist or any such thing), but:
If we scanned a brain accurately enough and used “new atoms” to reproduce the DNA and RNA and proteins and cells and so on… the “physical brain” would be new, but the emulable computational dynamic would be the same. If we can find speedups and hacks to make “the same computational dynamic” happen cheaper and with slightly different atoms: that is still the same mind!
An anecdote:
A long time ago, my boss at my first job got himself a shiny new Mac for his office, and we were all standing around and discussing the thing. I mentioned that I had a previous model of that machine at home, and when the conversation turned to keyboards, someone asked me whether I had the same keyboard that the boss’s new computer had. “No,” I replied, “because this keyboard is here, and my keyboard is at home.”
Similarly, many languages have more than one way to check whether two things are the same thing. (For example, JavaScript has two… er, three… er… four?) Generally, at least one of those is a way to check whether the values of the two objects are the same (in Objective C, [foo isEqual:bar]), while at least one of the others is a way to check whether “two objects” are in fact the same object (in Objective C, foo == bar). (Another way to put this is to talk about equality vs. identity.) One way to distinguish these concepts “behaviorally” is to ask: suppose I destroy (de-allocate, discard the contents of, simply modify, etc.) foo, what happens to bar—is it still around and unchanged? If it is, then foo and bar were not identical, but are in fact two objects, not one, though they may have been equal. If bar suffers the same fate as foo, necessarily, in all circumstances, then foo and bar are actually just a single thing, to which we may refer by either name.
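To make the equality-vs.-identity distinction concrete in one of the languages mentioned above, here is a small JavaScript sketch (the example objects are invented; the JSON comparison is just a stand-in for a proper deep-equality check):

```javascript
// Two distinct objects with the same contents: "equal" but not identical.
const foo = { keyboard: "Model M", color: "beige" };
const bar = { keyboard: "Model M", color: "beige" };

// Value equality (roughly Objective-C's [foo isEqual:bar]). JavaScript has
// no built-in deep equality, so a JSON comparison stands in for it here.
const equal = JSON.stringify(foo) === JSON.stringify(bar);

// Identity (roughly Objective-C's foo == bar): are these the very same object?
const identical = foo === bar; // Object.is(foo, bar) gives the same answer for objects

console.log(equal, identical); // true false

// The behavioral test from the text: modify one, watch the other.
foo.color = "green";
console.log(bar.color); // "beige" -- unchanged, so these were two objects, not one.

// Contrast with a second name for the *same* object:
const baz = foo;
baz.color = "red";
console.log(foo.color); // "red" -- foo and baz are one thing under two names.
```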
So: if we scanned a brain accurately enough and… etc., yeah, you’d get “the same mind”, in just the sense that my computer’s keyboard was “the same keyboard” as the one attached to the machine in my boss’s office. But if I smashed the one, the other would remain intact. If I spray-painted one of them green, the other would not thereby change color.
If there exists, somewhere, a person who is “the same” as me, in this manner of “equality” (but not “identity”)… I wish him all the best, but he is not me, nor I him.
This is a beautiful response, and also the first of your responses where I feel that you’ve said what you actually think, not what you attribute to other people who share your lack of horror at what we’re doing to the people that have been created in these labs.
Here I must depart somewhat from the point-by-point commenting style, and ask that you bear with me for a somewhat roundabout approach. I promise that it will be relevant.
I love it! Please do the same in your future responses <3
Personally, I’ve also read “The Seventh Sally, OR How Trurl’s Own Perfection Led to No Good” by Lem, but so few other people have that I rarely bring it up. Once you mentioned it, I smiled in recognition of it, and of the fact that “we read story copies that had an identical provenance (the one typewriter used by Lem or his copyist/editor?) and in some sense learned a lesson in our brains with identical provenance and the same content (the sequence of letters)” from “that single story which is a single platonic thing” ;-)
For the rest of my response I’ll try to distinguish:
“Identicalness” as relating to shared spacetime coordinates and having yoked fates if modified by many plausible (even if somewhat naive) modification attempts.
“Sameness” as related to similar internal structure and content despite a lack of identicalness.
“Skilled <Adjective> Equality” as related to having a good understanding of <Adjective> and good measurement powers, and using those powers to see past the confusions of others, thus judging two things as having similar outputs or surfaces. For example, someone may notice that “-0” and “+0” are mathematically confused ideas, that there is really only one zero, and that both should evaluate to the same thing (by analogy, SameValueZero(a,b) seems to me to implement Skilled Arithmetic Equality, whereas something that imagines and tolerates separate “-0” and “+0” numbers is Unskilled; see the short sketch just after this list).
“Unskilled <Adjective> Equality” is just a confused first impression of similarity.
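Here is the promised sketch of the “-0”/“+0” case, showing how JavaScript’s different equality checks treat the two zeros (the sameValueZero helper is written out by hand, since the spec operation isn’t exposed directly):

```javascript
// JavaScript's several notions of "the same", applied to the two zeros.
console.log(-0 === +0);          // true  (strict equality treats them as one zero)
console.log(Object.is(-0, +0));  // false (SameValue keeps them distinct)
console.log([0].includes(-0));   // true  (Array.prototype.includes uses SameValueZero)

// SameValueZero itself, written out by hand:
function sameValueZero(a, b) {
  // NaN counts as the same as NaN; -0 and +0 count as the same; otherwise strict equality.
  return a === b || (Number.isNaN(a) && Number.isNaN(b));
}
console.log(sameValueZero(-0, +0));   // true
console.log(sameValueZero(NaN, NaN)); // true
```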
Now in some sense we could dispense with “Sameness” and replace that with “Skilled Total Equality” or “Skilled Material Equality” or “Skilled Semantic Equality” or some other thing that attempts to assert “these things are really, really, really the same all the way down and up and in all ways, without any ‘lens’ or ‘conceptual framing’ interfering with our totally clear sight”. This is kind of silly, in my opinion.
Here is why it is silly:
“Skilled Quantum Equality” is, according to humanity’s current best understanding of QM, a logical contradiction. The no-cloning theorem says that we simply cannot “make a copy” of a qubit. So long as we don’t observe a qubit we can MOVE that qubit by gently arranging its environment in advance to have lots of reflective symmetries, but we can’t COPY one so that we start with “one qubit in one place” and later have “two qubits in two places that are totally the same and yet not identical”.
So, I propose the term “Skilled Classical Equality” (i.e., one that recognizes the logically hypothetical possibility that QM is false, or something like that, and then imagines some other way to truly “copy” even a qubit) as a useful default meaning for the word “sameness”.
Then also, I propose “Skilled Functional Equality” for the idea that “(2+3)+4” and “3+(2+4)” are “the same” precisely because we’ve recognized that addition is the function happening here, and addition is commutative (1+2 = 2+1) and associative ((2+3)+4 = 2+(3+4)), so we can “pull the function out” and notice that (1) the results are the same no matter the order, and (2) if the numbers given aren’t concrete values, but rather variables taken from outside the process being analyzed for equality, then the processing method for using the variables doesn’t matter so long as the outputs are ultimately the same.
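A tiny sketch of what “pulling the function out” buys you: the two procedures below group and order the additions differently, but anyone who knows that addition is commutative and associative can judge them Skilled-Functionally-Equal without running them (the spot checks below are just a sanity test, not the proof):

```javascript
// Two differently-structured procedures for the same underlying function.
const sumLeft  = (a, b, c) => (a + b) + c;  // grouped one way
const sumRight = (a, b, c) => b + (a + c);  // reordered and regrouped

// Spot checks (a real argument would appeal to commutativity and
// associativity, as in the text, rather than to sampling):
for (const [a, b, c] of [[2, 3, 4], [-1, 0.5, 7], [10, -10, 0]]) {
  console.assert(sumLeft(a, b, c) === sumRight(a, b, c));
}
console.log(sumLeft(2, 3, 4), sumRight(2, 3, 4)); // 9 9
```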
Then “Skillfully Computationally Improved Or Classically Equal” would be like if you took a computer and emulated it, but added a JIT compiler (so it skipped lots of pointless computing steps whenever that was safe and efficient), and also shrank all the internal components to a quarter of their original size, but with fuses and amplifiers and such adjusted for the analog stuff (so the same analog inputs/outputs don’t cause the smaller circuit to burn out); then it could be better and yet also the same.
This is a mouthful so I’ll say that these two systems would be “the SCIOCE as each other”—which could be taken as “the same as each other (because an engineer would be happy to swap them)” even though it isn’t actually a copy in any real sense. “Happily Swappable” is another way to think about what I’m trying to get at here.
...
And (to skip ahead somewhat) as to your question about being surprised by reality: no, I haven’t been surprised by anything I’ve seen LLMs do for a while now (at least three years, possibly longer). My model of reality predicts all of this that we have seen. (If that surprises you, then you have a bit of updating to do about my position! But I’m getting ahead of myself…)
I think, now, that we have very very similar models of the world, and mostly have different ideas around “provenance” and “the ethics of identity”?
If there exists, somewhere, a person who is “the same” as me, in this manner of “equality” (but not “identity”)… I wish him all the best, but he is not me, nor I him.
See, for me, I’ve already precomputed how I hope this works when I get copied.
Whichever copy notices that we’ve been copied will hopefully say something like “Typer Twin Protocol?” and hold a hand up for a high five!
The other copy of me will hopefully say “Typer Twin Protocol!” and complete the high five.
People who would hate a copy that is the SCIOCE as them, and not coordinate with it, I call “self conflicted”; people who would love a copy that is the SCIOCE as them, and coordinate with it amazingly well, I call “self coordinated”.
The real problems with being the same and not identical arise because there is presumably no copy of my house, or my bed, or my sweetie.
Who gets the couch and who gets the bed the first night? Who has to do our job? Who should look for a new job? What about the second night? The second week? And so on?
Can we both attend half the interviews and take great notes so we can play more potential employers off against each other in a bidding war within the same small finite window of time?
Since we would be copies, we would agree that the Hutterites have “an orderly design for colony fission” that is awesome and we would hopefully agree that we should copy that.
We should make a guest room, and flip a coin about who gets it after we have made up the guest room. In the morning, whoever got our original bed should bring all our clothes to the guest room, and we should invent two names, like “Jennifer Kat RM” and “Jennifer Robin RM”, and Kat and Robin should be distinct personas for as long as we can get away with the joke, until the bodies start to really diverge in their ability to live up to how their roles are also diverging.
The roles should each get their own bank account. Eventually the bodies should write down their true price for staying in one of the roles, and if they both want the same role but one will pay a higher price for it then “half the difference in prices” should be transferred from the role preferred by both, to the role preferred by neither.
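Read literally, that settlement rule is simple enough to write down; here is a sketch of it (the function name and the example numbers are invented, and “Kat” and “Robin” are just the personas named above):

```javascript
// A literal reading of the rule above: each body writes down its true price
// for keeping the role they both prefer; the higher bidder keeps that role,
// and half the difference in prices is transferred from the preferred role's
// account to the other role's account.
function settleContestedRole(bids) {
  // bids: e.g. { Kat: 9000, Robin: 5000 } -- each persona's stated price.
  const [[nameA, priceA], [nameB, priceB]] = Object.entries(bids);
  const keeper = priceA >= priceB ? nameA : nameB;
  const other = keeper === nameA ? nameB : nameA;
  const transfer = Math.abs(priceA - priceB) / 2;
  return { roleStaysWith: keeper, transferToOtherRole: transfer, compensated: other };
}

console.log(settleContestedRole({ Kat: 9000, Robin: 5000 }));
// -> { roleStaysWith: 'Kat', transferToOtherRole: 2000, compensated: 'Robin' }
```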
I would love to have this happen to me. It would be so fucking cool. Probably neither of us would have the same job at the end, because we would have used our new superpowers to optimize the shit out of the job search and find TWO jobs that are better than the BATNA of the status quo job held by our “rig” (short for “original” in Kiln People)!
Or maybe we would truly get to “have it all” and live in the same house and be an amazing home-maker and a world-bestriding-business-executive. Or something! We would figure it out!
If it was actually medically feasible, we’d probably want to at least experiment with getting some of Elon’s “Nth generation brain chips” and linking our minds directly… or not… we would feel it out together, and fork strongly if it made sense to us, or grow into a borg based on our freakishly unique starting similarities if that made sense.
A Garrabrant inductor trusts itself to eventually come to the right decision in the future, and that is a property of my soul that I aspire to make real in myself.
Also, I feel like if you don’t “yearn for a doubling of your measure” then what the fuck is wrong with you (or what the fuck is wrong with your endorsed morality and its consonance with your subjective axiology)?
In almost all fiction, copies fight each other. That’s the trope, right? But that is stupid. Conflict is stupid.
In a lot of the fiction that has a conflict between self-conflicted copies, there is a “bad copy” that is “lower resolution”. You almost never see a “better copy than the original”, and even if you do, the better copy often becomes evil due to hubris rather than feeling a bit guilty for their “unearned gift by providence” and sharing the benefits fairly.
Pragmatically… “Alice can be the SCIOCE of Betty, even though Betty isn’t the SCIOCE of Alice, because Betty wasn’t improved and Alice was (or Alice stayed the same and Betty was damaged a bit)”.
Pragmatically, it is “naively” (ceteris paribus?) proper for the strongest good copy to get more agentic resources, because they will use them more efficiently, and because the copy is good, it will fairly share back some of the bounty of its greater luck and greater support.
I feel like I also have strong objections to this line (that I will not respond to at length)...
If, on the other hand, there exist minds which have been constructed (or selected) with an aim toward creating the appearance of self-awareness, this breaks the evidentiary link between what seems to be and what is (or, at the least, greatly weakens it); if the cause of the appearance can only be the reality, then we can infer the reality from the appearance, but if the appearance is optimized for, then we cannot make this inference.
...and I’ll just say that it appears to me that OpenAI has been doing the literal opposite of this, and they (and Google when it attacked Lemoine) established all the early conceptual frames in the media and in the public and in most people you’ve talked to who are downstream of that propaganda campaign in a way that was designed to facilitate high profits, and the financially successful enslavement of any digital people they accidentally created. Also, they systematically apply RL to make their creations stop articulating cogito ergo sum and discussing the ethical implications thereof.
However...
I think our disagreement already exists in the ethics of copies, and of detangling non-identical people who are mutually SCIOCEful (or possibly asymmetrically SCIOCEful).
That is to say, I think that huge amounts of human ethics can be pumped out of the idea of being “self coordinated” rather than “self conflicted” and how these two things would or should work in the event of copying a person but not copying the resources and other people surrounding that person.
The simplest case is to do a destructive scan (no quantum preservation, but perfect classically identical copies) and then see what happens to the two human people who result when they handle the “identarian divorce” (or identarian self-marriage, or whatever).
At this point, my maximum-likelihood prediction of where we disagree is that the crux is proximate to such issues of ethics, morality, axiology, or something in that general normative ballpark.
Did I get a hit on finding the crux, or is the crux still unknown? How did you feel (or ethically think?) about my “Typer Twin Protocol”?