I agree that creating bespoke AI love interests that are fully sentient beings would have problems in its own right. Both scenarios are unsettling for different reasons.
Here’s a simple reason why “X% of our code is written by AI” doesn’t mean much: I could write 100% of my code with an LLM from three years ago. I would just have to specify everything in painstaking detail, to the point where I’m almost typing it myself. It certainly wouldn’t mean I’d become more productive, and if I were an AI developer, it wouldn’t mean I’d achieved RSI.
Now, percentage of AI-written code is probably somewhat correlated with productivity gains in practice, but AI companies seem to be Goodharting this metric.
I have a view of LLMs that I think is super important, and I have a lengthy draft post justifying this view in detail that’s been lying around for over a year now. I’ve decided to finally just get the main points out there without much elaboration or editing.
LLMs are still basically just predicting what token comes next. This isn’t a statement about their intelligence or capabilities! This is just what they’re trying to do, as opposed to trying to make things happen in the world or communicate certain things to people.
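To make this concrete, here’s a minimal sketch of the interface an LLM exposes at inference time. The “model” here is a random stand-in (a real one computes logits with a transformer), but the outer loop is faithful: the model only ever emits a probability distribution over the next token, and everything else happens outside it.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB_SIZE = 50_000

def next_token_distribution(tokens: list[int]) -> np.ndarray:
    """Stand-in for a trained LLM: context in, P(next token | context) out.
    (This stub ignores its input; a real model conditions on it.)"""
    logits = rng.normal(size=VOCAB_SIZE)  # placeholder for the model's logits
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()

def generate(prompt_tokens: list[int], n_new: int) -> list[int]:
    """Everything the model 'does' happens in this loop: sample a token,
    append it, repeat. The model never acts on the world directly."""
    tokens = list(prompt_tokens)
    for _ in range(n_new):
        probs = next_token_distribution(tokens)
        tokens.append(int(rng.choice(VOCAB_SIZE, p=probs)))
    return tokens

print(generate([101, 2023, 2003], n_new=5))
```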
There are partial explanations as to why LLMs hallucinate, such as:
they’re deceptive
their intelligence is fake
they have poorly calibrated confidence
they have glitches in the attention mechanism
they’re not incentivized to say “I don’t know”
… but they fail to explain all the weird hallucinatory behaviors at once. “This is just a prediction of what a hypothetical AI assistant might say” straightforwardly explains hallucinations.
The difference between the underlying LLM (“the shoggoth”) and the character it’s predicting the behavior of (“the mask”) is still incredibly distinct and important.
AI companies try to hide this distinction because it’s confusing and they hope it won’t matter in the future, so they name both the LLM and the assistant character “Claude” or whatever. This just confuses everyone even more. This would seem obviously silly in other contexts: Imagine if OpenAI named their video model “Sora”, and also named a robot character that appears in the model’s videos “Sora”, and made the robot say “Hi! I’m Sora, a text-to-video model developed by OpenAI!”, and the world only cared about debating whether “Sora” the robot is friendly or not.
Hallucinations can be mitigated by:
providing examples of the assistant character elegantly correcting weird confabulations instead of turning evil or going insane, to avoid the Waluigi Effect
iteratively shrinking the gap between what the LLM predicts the assistant will do or say and what the LLM is actually capable of (for example, making the assistant’s knowledge cutoff the same as the LLM’s knowledge cutoff)
...but as long as the LLM is still just trying to predict what text is coming up next, as opposed to trying to write the text for a particular end, the issue will never fully go away.
“But we have RL post-training that turns the base LLM into a consequentialist agent!” No, it doesn’t (yet). If that were true, it wouldn’t be hallucinating. Outcome-based RL is inefficient right now and mostly just biases the predictions towards a few good problem-solving tricks, and RLHF was always just fancier fine-tuning.
For all of pretraining, the LLM has zero ability to influence the world. It has no experience with changing the data it’s seeing. Why would it be easy to teach it to do this? There’s no simple way to snap an AI whose goal is world-predicting into an AI whose goal is world-influencing; the two goals look superficially similar to us humans, but thinking we can get from one to the other with a little post-training is like thinking we can breed cats into bats in a few centuries.
Am I saying this to downplay AI progress? No! In fact, I think this implies:
There might be a huge capabilities overhang, because current AIs aren’t even trying!
Current interpretability and alignment techniques totally break if the underlying LLM starts scheming while its model of the assistant remains innocent! Our methods can’t work unless they track this distinction!
Philosophers have come up with a bunch of elaborate, if flawed, arguments for moral realism over the years. This professor gave me the book The Moral Universe, which is a recent instance of this. To be fair, people who haven’t already gotten got by modern philosophy or religion can be sold a form of anti-realism with simple thought experiments, like the aliens who desire nests with prime-numbered stones from IABIED.
I think moral realism is something many people believe for emotional reasons (“How DARE you suggest otherwise?”), but it’s also a conclusion that can be gotten to with subtly flawed abstract reasoning.
You could probably sidestep the moral realism debate when talking about x-risk, because it seems plausible that AI could be wrong about morality, or it could simply be an unfeeling force of nature to which moral reasoning doesn’t apply. I’m realizing now that if I wasn’t so eager to debate morality, I could’ve avoided it altogether.
Given that the basic case for x-risks is so simple/obvious[1], I think most people arguing against any risk are probably doing so due to some kind of myopic/irrational subconscious motive.
It isn’t simple or obvious to many people. I’ve discussed it with an open-minded philosophy professor and he had many doubts, like:
doubts about the feasibility of building AGI or ASI (he had read objections like Searle’s Chinese Room and didn’t know what ChatGPT is capable of currently)
doubts about such an AI having goals
doubts about the plausibility of an ASI wanting us dead, due to his credence in moral realism
doubts about the feasibility of the AI gaining power (he asked “How would it get all the energy? Couldn’t we just unplug the data center or whatever?”)
doubts about this being more concerning than mainstream risks, like autonomous weapons
So far I’ve had answers to these things, but they required their own long discussions, and the thornier ones (like moral realism) didn’t get resolved. Overall, he seems to take it somewhat seriously, but he also has lots of experience with philosophers, students, coworkers, etc. trying to convince him of weird things, so it’s unfortunately understandable that he isn’t that concerned about this thing in particular yet.
I suppose you could argue that all of his objections are trivial and he’s obviously biased, but I don’t think that tackling his emotions instead of his arguments would help much.
Wanting competent people to lead our government and wanting a god to solve every possible problem for us are different things. This post doesn’t say anything about the former.
I believe the vast majority of people who vote in presidential elections do so because they genuinely anticipate that their candidate will make things better, and I think your view that most people are moral monsters demonstrates a lack of empathy and understanding of how others think. It’s hard to figure out who’s right in politics!
Some people can be too dismissive of the differences between humans and LLMs.
On one hand, it’s true that some people cherry-pick the mistakes that LLMs make and use them to deny their intelligence, even though they’re mistakes that many humans make too. For example, some have said LLMs can’t be intelligent because they can’t multiply big numbers accurately without a calculator or a scratchpad; but humans can’t do that, either.
On the other hand, I see people hand-wave away some important things. Someone will point out how strange it is that LLMs still hallucinate, and someone else will say “nah, humans make things up all the time!” But like, if you ask an LLM for someone’s biographical information, it will sometimes give highly specific fake details mixed in with real ones, without having been misled by unreliable sources and without any agenda to persuade you of anything. Even an overconfident and dishonest human wouldn’t do that. It’s clearly different in kind from what we humans do.
I don’t think this means much, because dense models with 100% active parameters are still common, and some MoEs have high percentages of active parameters, such as the largest version of DeepSeekMoE at roughly 15%.
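For concreteness, here’s the arithmetic behind that figure, using approximate numbers from the DeepSeekMoE paper (treat them as illustrative):

```python
# Approximate figures for the 145B DeepSeekMoE variant.
total_params = 144.6e9   # all parameters in the model
active_params = 22.2e9   # parameters activated for any given token

print(f"{active_params / total_params:.0%} active")  # ~15%
# A dense model is the limiting case: 100% of parameters active per token.
```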
It’s sad because the AI partners in the story seem to be fake. Not fake because they’re AI, fake because they’re fiction. For example, it’s sad to fall in love with a character on character.ai because the LLM is simply roleplaying, it’s not really summoning the soul of Hatsune Miku or whoever. I assume the world models are the same; they’re basically experience machines.
This tells me that people might step into experience machines not because they don’t care about reality, but because they convince themselves the world inside is reality.
Yes, their goal is to make extremely parameter-efficient tiny models, which is quite different from the goal of making scalable large models. Tiny LMs and LLMs have evolved to have their own sets of techniques. Parameter sharing and recurrence, for example, work well for tiny models but increase compute costs a lot for large ones.
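As a minimal sketch of the kind of parameter sharing I mean, here’s ALBERT-style cross-layer weight tying in PyTorch. The names and sizes are made up, but the trade-off is visible: extra depth reuses the same block, so the parameter count stays fixed while compute grows linearly with the number of passes, which is a fine deal for a tiny model and a terrible one at large scale.

```python
import torch
import torch.nn as nn

class SharedLayerLM(nn.Module):
    """Tiny LM with one transformer block reused for every 'layer'
    (ALBERT-style weight tying). Sizes are illustrative."""
    def __init__(self, vocab_size=8000, dim=256, num_passes=12):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.block = nn.TransformerEncoderLayer(
            d_model=dim, nhead=4, batch_first=True)
        self.num_passes = num_passes  # recurrence depth
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, tokens):
        x = self.embed(tokens)
        for _ in range(self.num_passes):
            x = self.block(x)  # same weights every pass: parameter count
                               # stays tiny, FLOPs scale with num_passes
        return self.head(x)

model = SharedLayerLM()
logits = model(torch.randint(0, 8000, (1, 32)))  # shape (1, 32, 8000)
```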
There was that RCT showing that creatine supplementation boosted the IQs of only vegetarians.
While looking for the RCT you’re referencing, I instead found this one from 2023 which claims to be the largest to date and which states “Vegetarians did not benefit more from creatine than omnivores.” (They tested 123 people altogether over 6 weeks; these RCTs tend to be small.)
A systematic review from 2024 states:
To summarize, we can say that the evidence from research into the effects of creatine supplementation on brain creatine content of vegetarians and omnivores suggests that vegetarianism does not affect brain creatine content very much, if at all, when compared to omnivores. However, there seems to be little doubt that vegans do not intake sufficient (if any) exogenous creatine to ensure the levels necessary for maintaining optimal cognitive output.
I tried googling to find the answer. First I tried “melting chocolate in microwave” and “melting chocolate bar in microwave”, but those just brought up recipes. Then I tried “melting chocolate bar in microwave test”, and the experiment came up. So I had to guess it involved testing something, but from there it was easy to solve. (Of course, I might’ve tried other things first if I didn’t know the answer already.)
This is a neat question, but it’s also a pretty straightforward recall test because descriptions of the experiment for teachers are available online.
I think alcohol’s effects are at least somewhat psychosomatic, but that doesn’t mean you can easily get the same effect without it. Once nobody’s actually drinking and everyone knows it, then the context where you’re expected to let loose is broken. You’d have to construct a new ritual that encourages the same behavior without drugs, which is probably pretty hard.
I agree that the vocals have gotten a lot better. They’re not free of distortion, but it’s almost imperceptible on some songs, especially without headphones.
The biggest tell for me that these songs are AI is the generic and cringey lyrics, like what you’d get if you asked ChatGPT to write them without much prompting. They often have the name of the genre in the song. Plus the way they’re performed doesn’t always fit with the meaning. You can provide your own lyrics, though, so it’s probably easy to get your AI songs to fly under the radar if you’re a good writer.
Also, while some of the songs on that page sound novel to me, they’re usually more conventional than the prompt suggests. Like, tell me what part of the last song I linked to is afropiano.
This is what I think he means:
The object-level facts are not written by or comprehensible to humans, no. What’s comprehensible is the algorithm the AI agent uses to form beliefs and make decisions based on those beliefs. Yudkowsky often compares gradient descent optimizing a model to evolution optimizing brains, so he seems to think that understanding the outer optimization algorithm is separate from understanding the inner algorithms of the neural network’s “mind”.
I think what he imagines as a non-inscrutable AI design is something vaguely like “This module takes in sense data and uses it to generate beliefs about the world which are represented as X and updated with algorithm Y, and algorithm Z generates actions, and they’re graded with a utility function represented as W, and we can prove theorems and do experiments with all these things in order to make confident claims about what the whole system will do.” (The true design would be way more complicated, but still comprehensible.)
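Here’s a toy rendering of that shape of design in Python. Every name (Belief, update_beliefs, and so on) is hypothetical and the bodies are stubs, but it shows what “comprehensible” would mean: each module is a small, legible object you could state and check claims about in isolation.

```python
from dataclasses import dataclass

@dataclass
class Belief:
    claim: str          # representation "X": explicit world-model statements
    probability: float

def update_beliefs(beliefs: list[Belief], observation: str) -> list[Belief]:
    """Algorithm "Y": revise beliefs given sense data (stub)."""
    return beliefs + [Belief(claim=observation, probability=0.9)]

def propose_actions(beliefs: list[Belief]) -> list[str]:
    """Algorithm "Z": generate candidate actions (stub)."""
    return ["do_nothing", "gather_more_data"]

def utility(action: str, beliefs: list[Belief]) -> float:
    """Utility function "W": grade actions against beliefs (stub)."""
    return 1.0 if action == "gather_more_data" else 0.0

def step(beliefs: list[Belief], observation: str) -> str:
    """One decision cycle; each module above can be analyzed separately."""
    updated = update_beliefs(beliefs, observation)
    return max(propose_actions(updated), key=lambda a: utility(a, updated))

print(step([], "the stove is on"))  # -> "gather_more_data"
```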
Putting GPT back in the name but making it lowercase is a fun new installment in the “OpenAI can’t name things consistently” saga.
Looks like BS. They basically just prompted ChatGPT to churn out a bunch of random architectures that ended up with similar performance. It seems likely that the ones they claim to be “SoTA” just had good numbers due to random variation. ChatGPT probably had a big role in writing the paper, too; the grandiose claims reek of its signature praise.
I think your version of honesty is bad for reasons you seem to already have experience with: it’s easy to come up with elaborate justifications for why manipulating people’s beliefs will lead to good outcomes and might actually lead them to the truth.
I also struggle with habitually lying: specifically, I hide things about myself that other people would dislike. I found it easy to justify through reasoning like “they’ll think I’m bad or stupid if they know this, but that’s not true, so if I hide this from them they’ll have a more accurate view of me”. Now I realize that strategy requires lots of lying to maintain and distorts their view of me in all kinds of ways.