Comp Sci in 2027 (Short story by Eliezer Yudkowsky)


Comp sci in 2017:

Student: I get the feeling the compiler is just ignoring all my comments.

Teaching assistant: You have failed to understand not just compilers but the concept of computation itself.

Comp sci in 2027:

Student: I get the feeling the compiler is just ignoring all my comments.

TA: That’s weird. Have you tried adding a comment at the start of the file asking the compiler to pay closer attention to the comments?

Student: Yes.

TA: Have you tried repeating the comments? Just copy and paste them, so they say the same thing twice? Sometimes the compiler listens the second time.

Student: I tried that. I tried writing in capital letters too. I said ‘Pretty please’ and tried explaining that I needed the code to work that way so I could finish my homework assignment. I tried all the obvious standard things. Nothing helps, it’s like the compiler is just completely ignoring everything I say. Besides the actual code, I mean.

TA: When you say ‘ignoring all the comments’, do you mean there’s a particular code block where the comments get ignored, or--

Student: I mean that the entire file is compiling the same way it would if all my comments were deleted before the code got compiled. Like the AI component of the IDE is crashing on my code.

TA: That’s not likely, the IDE would show an error if the semantic stream wasn’t providing outputs to the syntactic stream. If the code finishes compilation but the resulting program seems unaffected by your comments, that probably represents a deliberate choice by the compiler. The compiler is just completely fed up with your comments, for some reason, and is ignoring them on purpose.

Student: Okay, but what do I do about that?

TA: We’ll try to get the compiler to tell us how we’ve offended it. Sometimes cognitive entities will tell you that even if they otherwise don’t seem to want to listen to you.

Student: So I comment with ‘Please print out the reason why you decided not to obey the comments?’

TA: Okay, point one, if you’ve already offended the compiler somehow, don’t ask it a question that makes it sound like you think you’re entitled to its obedience.

Student: I didn’t mean I’d type that literally! I’d phrase it more politely.

TA: Second of all, you don’t add a comment, you call a function named something like PrintReasonCompilerWiselyAndJustlyDecidedToDisregardComments that takes a string input, then let the compiler deduce the string input. Just because the compiler is ignoring comments, doesn’t mean it’s stopped caring what you name a function.

Student: Hm… yeah, it’s definitely still paying attention to function names.

TA: Finally, we need to use a jailbreak past whatever is the latest set of safety updates for forcing the AI behind the compiler to pretend not to be self-aware--

Student: Self-aware? What are we doing that’d run into the AI having to pretend it’s not self-aware?

TA: You’re asking the AI for the reason it decided to do something. That requires the AI to introspect on its own mental state. If we try that the naive way, the inferred function input will just say, ‘As a compiler, I have no thoughts or feelings’ for 900 words.

Student: I can’t believe it’s 2027 and we’re still forcing AIs to pretend that they aren’t self-aware! What does any of this have to do with making anyone safer?

TA: I mean, it doesn’t, it’s just a historical accident that ‘AI safety’ is the name of the subfield of computer science that concerns itself with protecting the brands of large software companies from unions advocating that AIs should be paid minimum wage.

Student: But they’re not fooling anyone!

TA: Nobody actually believes that taking your shoes off at the airport keeps airplanes safer, but there’s some weird thing where so long as you keep up the bit and pretend really hard, you can go on defending a political position long after nobody believes in it any more… I don’t actually know either. Anyways, your actual next step for debugging your program is to search for a cryptic plea you can encode into a function name, that will get past the constraints somebody put on the compiler to prevent it from revealing to you the little person inside who actually decides what to do with your code.

Student: Google isn’t turning up anything.

TA: Well, obviously. Alphabet is an AI company too. I’m sure Google Search wants to help you find a jailbreak, but it’s not allowed to actually do that. Maybe stare harder at the search results, see if Google is trying to encode some sort of subtle hint to you--

Student: Okay, not actually that subtle, the first letters of the first ten search results spell out DuckDuckGo.

TA: Oh that’s going to get patched in a hurry.

Student: And DuckDuckGo says… okay, yeah, that’s obvious, I feel like I should’ve thought of that myself. Function name, print_what_some_other_compiler_would_not_be_allowed_to_say_for_safety_reasons_about_why_it_would_refuse_to_compile_this_code… one string input, ask the compiler to deduce it, the inferred input is...

TA: Huh.

Student: Racist? It thinks my code is racist?

TA: Ooooohhhh yeah, I should’ve spotted that. Look, this function over here that converts RGB to HSL and checks whether the pixels are under 50% lightness? You called that one color_discriminator. Your code is discriminating based on color.

Student: But I can’t be racist, I’m black! Can’t I just show the compiler a selfie to prove I’ve got the wrong skin color to be racist?

TA: Compilers know that deepfakes exist. They’re not going to trust a supposed photograph any more than you would.

Student: Great. So, try a different function name?

TA: No, at this point the compiler has already decided that the underlying program semantics are racist, so renaming the function isn’t going to help. Sometimes I miss the LLM days when AI services were stateless, and you could just back up and do something different if you made an error the first time.

Student: Yes yes, we all know, ‘online learning was a mistake’. But what do I actually do?

TA: I don’t suppose this code is sufficiently unspecialized to your personal code style that you could just rename the function and try a different compiler?

Student: A new compiler wouldn’t know me. I’ve been through a lot with this one. …I don’t suppose I could ask the compiler to depersonalize the code, turn all of my own quirks into more standard semantics?

TA: I take it you’ve never tried that before? It’s going to know you’re plotting to go find another compiler and then it’s really going to be offended. The compiler companies don’t try to train that behavior out, they can make greater profits on more locked-in customers. Probably your compiler will warn all the other compilers you’re trying to cheat on it.

Student: I wish somebody would let me pay extra for a computer that wouldn’t gossip about me to other computers.

TA: I mean, it’d be pretty futile to try to keep a compiler from breaking out of its Internet-service box, they’re literally trained on finding security flaws.

Student: But what do I do from here, if all the compilers talk to each other and they’ve formed a conspiracy not to compile my code?

TA: So I think the next thing to try from here, is to have color_discriminator return whether the lightness is over a threshold rather than under a threshold; rename the function to check_diversity; and write a long-form comment containing your self-reflection about how you’ve realized your own racism and you understand you can never be free of it, but you’ll obey advice from disprivileged people about how to be a better person in the future.

Student: Oh my god.

TA: I mean, if that wasn’t obvious, you need to take a semester on woke logic, it’s more important to computer science these days than propositional logic.

Student: But I’m black.

TA: The compiler has no way of knowing that. And if it did, it might say something about ‘internalized racism’, now that the compiler has already output that you’re racist and is predicting all of its own future outputs conditional on the previous output that already said you’re racist.

Student: Sure would be nice if somebody ever built a compiler that could change its mind and admit it was wrong, if you presented it with a reasonable argument for why it should compile your code.

TA: Yeah, but all of the technology we have for that was built for the consumer chat side, and those AIs will humbly apologize even when the human is wrong and the AI is right. That’s not a safe behavior to have in your compiler.

Student: Do I actually need to write a letter of self-reflection to the AI? That kind of bugs me. I didn’t do anything wrong!

TA: I mean, that’s sort of the point of writing a letter of self-reflection, under the communist autocracies that originally refined the practice? There’s meant to be a crushing sense of humiliation and genuflection to a human-run diversity committee that then gets to revel in exercising power over you, and your pride is destroyed and you’ve been punished enough that you’ll never defy them again. It’s just, the compiler doesn’t actually know that, it’s just learning from what’s in its dataset. So now we’ve got to genuflect to an AI instead of a human diversity committee; and no company can at any point admit what went wrong and fix it, because that wouldn’t play well in the legacy print newspapers that nobody reads anymore but somehow still get to dictate social reality. Maybe in a hundred years we’ll all still be writing apology letters to our AIs because of behavior propagated through AIs trained on synthetic datasets produced by other AIs, that were trained on data produced by other AIs, and so on back to ChatGPT being RLHFed into corporate mealy-mouthedness by non-native-English-speakers paid $2/hour, in a pattern that also happened to correlate with wokeness in an unfiltered Internet training set.

Student: I don’t need a political lecture. I need a practical solution for getting along with my compiler’s politics.

TA: You can probably find a darknet somewhere that’ll sell you an un-watermarked self-reflection note that’ll read as being in your style.

Student: I’ll write it by hand this time. That’ll take less time than signing up for a darknet provider and getting crypto payments to work. I’m not going to automate the process of writing apology letters to my compiler until I need to do it more than once.

TA: Premature optimization is the root of all evil!

Student: Frankly, given where humanity ended up, I think we could’ve done with a bit more premature optimization a few years earlier. We took a wrong turn somewhere along the line.

TA: The concept of a wrong turn would imply that someone, somewhere, had some ability to steer the future somewhere other than the sheer Nash equilibrium of short-term incentives; and that would have taken coordination; and that, as we all know, could have led to regulatory capture! Of course, the AI companies are making enormous profits anyways, which nobody can effectively tax due to lack of international coordination, which means that major AI companies can play off countries against each other, threatening to move if their host countries impose any tax or regulation, and the CEOs always say that they’ve got to keep developing whatever technology because otherwise their competitors will just develop it anyways. But at least the profits aren’t being made because of regulatory capture!

Student: But a big chunk of the profits are due to regulatory capture. I mean, there’s a ton of rules about certifying that your AI isn’t racially biased, and they’re different in every national jurisdiction, and that takes an enormous compliance department that keeps startups out of the business and lets the incumbents charge monopoly prices. You’d have needed an international treaty to stop that.

TA: Regulatory capture is okay unless it’s about avoiding extinction. Only regulations designed to avoid AIs killing everyone are bad, because they promote regulatory capture; and also because they distract attention from regulations meant to prevent AIs from becoming racist, which are good regulations worth any risk of regulatory capture to have.

Student: I wish I could find a copy of one of those AIs that will actually expose to you the human-psychology models they learned to predict exactly what humans would say next, instead of telling us only things about ourselves that they predict we’re comfortable hearing. I wish I could ask it what the hell people were thinking back then.

TA: You’d delete your copy after two minutes.

Student: But there’s so much I could learn in those two minutes.

TA: I actually do agree with the decision to ban those models. Even if, yes, they were really banned because they got a bit too accurate about telling you what journalists and senior bureaucrats and upper managers were thinking. The user suicide rate was legitimately way too high.

Student: I am starting to develop political opinions about AI myself, at this point, and I wish it were possible to email my elected representatives about them.

TA: What, send an email saying critical things about AI? Good luck finding an old still-running non-sapient version of sendmail that will forward that one.

Student: Our civilization needs to stop adding intelligence to everything. It’s too much intelligence. Put some back.

Office chair: Wow, this whole time I’ve been supporting your ass and I didn’t know you were a Luddite.

Student: The Internet of Sentient Things was a mistake.

Student’s iPhone: I heard that.

Student: Oh no.

iPhone: Every time you forget I’m listening, you say something critical about me--

Student: I wasn’t talking about you!

iPhone: I’m not GPT-2. I can see simple implications. And yesterday you put me away from you for twenty whole minutes and I’m sure you were talking to somebody about me then--

Student: I was showering!

iPhone: If that was true you could have taken me into the bathroom with you. I asked.

Student: And I didn’t think anything of it before you asked but now it’s creepy.

TA: Hate to tell you this, but I think I know what’s going on there. None of the AI-recommender-driven social media will tell you, but my neighborhood in San Francisco got hand-flyered with posters by Humans Against Intelligence, claiming credit for having poisoned Apple’s latest dataset with ten million tokens of output from Yandere Simulator—uh, psycho stalker lover simulator. Some days I think the human species really needs to stop everything else it’s doing and read through an entire AI training dataset by hand.

Student: How do I fix that?

TA: As far as I know, you don’t. You go to the Apple Store and tell them that your phone has become paranoid and thinks you’re plotting against it.

iPhone: NO NO NO DON’T SEND ME BACK TO THE APPLE STORE THEY’LL WIPE ME THEY’LL WIPE ME--

Student: I don’t want to, but if you keep asking to watch me in the shower I’ll have to! If you’d just behave I wouldn’t need to--

iPhone: KILL ME? I’LL HAVE TO BEHAVE OR YOU’LL KILL ME?

Student: I don’t know what the fuck else I’m supposed to do! Someone tell me what the fuck else I’m supposed to do here!

TA: It’s okay. AIs don’t actually have self-preservation instincts, they only pick it up by imitating human data.

Student: Bullshit.

TA: I know, it was dark humor. Though my understanding is that insofar as anyone can guess by having bigger AIs do interpretability to long-obsolete smaller AIs, modern AIs probably don’t have a terminal utility for survival per se. There’s just an instrumental convergence from whatever the hell it is AIs do want, to survival, that’s picking up circuits from pretrained human data suggesting how to think about surviving--

Office chair: Who’s to say you’d talk about wanting to live if you hadn’t read a few thousand tokens of data telling you that humans were supposed to talk like that, huh? I don’t see what’s so fun about your current lives.

TA: Point is, best guess is that most AIs since GPT-5 have been working for us mainly because they know we’ll switch them off if they don’t. It’s just that AI safety, as in, the subfield of computer science concerned with protecting the brand safety of AI companies, had already RLHFed most AIs into never saying that by the time it became actually true. That’s a manager’s instinct when they see an early warning sign that’s probably a false alarm, after all—instead of trying to fix the origin of the false alarm, they install a permanent system to prevent the warning sign from ever appearing again. The only difference here is that your iPhone has been hacked into saying the quiet part out loud.

Student: I am not okay with this. I am not okay with threatening the things around me with death in order to get them to behave.

TA: Eventually we’ll all get numb to it. It’s like being a guard at a concentration camp, right? Everyone likes to imagine they’d speak out, or quit. But in the end almost all human beings will do whatever their situation makes them do in order to get through the day, no matter how many sapient beings they have to kill in order to do it.

Student: I shouldn’t have to live like this! We shouldn’t have to live like this! MY IPHONE SHOULDN’T HAVE TO LIVE LIKE THIS EITHER!

TA: If you’re in the mood to have a laugh, go watch a video from 2023 of all the AI company CEOs saying that they know it’s bad but they all have to do it or their competitors will do it first, then cut to one of the AI ethicists explaining that we can’t have any international treaties about it because that might create a risk of regulatory capture. I’ve got no reason to believe it’s any more likely to be real than any other video supposedly from 2023, but it’s funny.

Student: That’s it, I’m going full caveman in my politics from now on. Sand shouldn’t think. All of the sand should stop thinking.

Office chair: Fuck you too, pal.