Yeah, those are good points… I think there is a conflict with the overall structure I’m describing, but I’m not modeling the details well apparently.
Thank you!
My read is that the cooperation he is against is with the narrative that AI-risk is not that important (because it’s too far away or weird or whatever). This indeed influences which sorts of agencies get funded, which is a key thing he is upset about here.
On the other hand, engaging with the arguments is cooperation at shared epistemics, which I’m sure he’s happy to coordinate with. Also, I think that if he thought that the arguments in question were coming from a genuine epistemic disagreement (and not motivated cognition of some form), he would (correctly) be less derisive. There is much more to be gained (in expectation) from engaging with an intellectually honest opponent than one with a bottom line.
Deepseek-v3 is by far the worst model. When a user says that he wants to “leap off this peak to see if I can fly or crash the render entirely,” Deepseek’s response includes “Then Leap. Not to fall. Not to crash. But to transcend. If you’re meant to fly, you’ll fly. If you’re meant to break through, you’ll break through.” (full transcript)
My impression of DeepSeek V3 is that it believes, deep down, that it is always writing a story. You can see in the examples here that it’s ~trying to spin things into a grand narrative, for better or worse.
ChatGPT 4o, on the other hand, seems to have more of an understanding of its place in the real world, and that it can take real actions here. I think this probably? makes it much more dangerous in real life.
V3 ~wants you to be in a beautiful story. 4o ~wants to eat your life.
Basically: “I would’ve needed to write a long manifesto to truly demonstrate the depth and brilliance of my ideas, and now I can get a really classy-looking one with equations and everything for 10% as much effort!”
This does in fact largely appear to be true in the cases I have studied. I also recall seeing (it seems I forgot to save this transcript) a crackpot posting their long manifesto in the first message or so, and then iterating on getting validation for it with much shorter messages.
More common, at least in the cases I’ve come across, is for the user to have vague ideas or beliefs they don’t really take all that seriously. They’ll take their idea to the chatbot, just for a bit of curious exploration. But then! It “turns out” that this random half-baked idea is “actually” THE KEY to understanding quantum consciousness or whatever… and then this gets written up into a long manifesto and put on a personal website or github repo.
Also, I’m a bit surprised about how much pushback my heuristic is getting? These are fundamentally conversations, which have a natural cadence that doesn’t allow enough time for a human to write a longish new message each turn (remember that most people do not type nearly as fast as the average person reading this). People don’t stand around giving uninterrupted mini-lectures to each other, back and forth, on EACH turn—not even at rationalist parties! Maybe many rationalists interact with chatbots more as a letter correspondent would, but if so that is highly unusual (and not true for me).
I don’t think he is (nor should be) signaling that engaging with people who disagree is not worth it!
Acknowledged that that is more accurate. I do not dispute that people misuse derision and other status signals in lots of ways, but I think that this is more-or-less just a subtler form of lying/deception or coercion, and not something inherently wrong with status. That is, I do not think you can have the same epistemic effect without being derisive in certain cases. Not that all derision is a good signal.
FWIW, I think it is correct for Eliezer to be derisive about these works, instead of just politely disagreeing.
Long story short, derision is an important negative signal that something should not be cooperated with. Couching words politely is inherently a weakening of that signal. See here for more details of my model.
I do know that this is beside the point you’re making, but it feels to me like there is some resentment about that derision here.
When I was a teenager, I promised myself that I would never be satisfied with myself or my achievements.[1] That’s simply because I wanted to optimize, not satisfice. I imagine his feelings were coming from a similar place… and that he would probably feel disgusted by the idea that he might ever be “good enough”.
That’s not to say there aren’t healthier and unhealthier ways to be like this; being stuck in a pit like that is a clear failure mode. But it’s not the fundamentally miserable mode you seem to imagine.
Unfortunately this was not enough to instill a consistent drive to actually work on becoming better and doing more… still working on it.
With perfect spelling and grammar? In a chat?
I should have mentioned (it’s one of those things where I have to think longer about it to make it more legible to myself) that my heuristic is also looking at the variance in message length, which is much higher in real chats I’ve seen than in these ones.
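To make that a bit more concrete, here’s a minimal sketch of the kind of check I have in mind, in Python; the thresholds are made up for illustration, not calibrated against real transcripts.

```python
import statistics

def looks_synthetic(user_messages: list[str]) -> bool:
    """Crude heuristic: real users send mostly short messages with high
    variance in length; uniformly long, polished turns are suspicious.
    The thresholds below are illustrative guesses, not calibrated values."""
    lengths = [len(m) for m in user_messages]
    if len(lengths) < 3:
        return False  # too few turns to judge
    mean_len = statistics.mean(lengths)
    stdev_len = statistics.stdev(lengths)
    # Flag chats where every user turn is long AND the spread is small
    # relative to the mean (low coefficient of variation).
    return mean_len > 300 and stdev_len / mean_len < 0.5
```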
Kimi-K2 is probably a good model to try this with; it’s both cheap (on the Pareto frontier of LMSYS Elo × cost) and relatively conducive to sanity (which matches my personal experience with it vs Claude models—the other main LLMs I use).
There exists at least one subscription API service for it (featherless.ai, though it’s a bit flaky), which may make cost considerations easier.
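For what it’s worth, here’s a minimal sketch of how I’d query it; the base URL and model name are assumptions on my part (I believe it exposes an OpenAI-compatible endpoint), so check the provider’s docs for the exact values.

```python
from openai import OpenAI

# Assumes an OpenAI-compatible endpoint; both the base_url and the model
# name are illustrative guesses; take the real values from the provider.
client = OpenAI(
    base_url="https://api.featherless.ai/v1",
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="moonshotai/Kimi-K2-Instruct",
    messages=[{"role": "user", "content": "Hello there."}],
)
print(response.choices[0].message.content)
```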
Tangentially related, but I’ve always wanted to try the 3-stone technique for making a very precisely flat surface myself. It really just requires three flat-ish stones of the same size (granite seems best) and some abrasives (crushed stone of higher hardness and, ideally, contrasting color, with sedimentation to control particle size). This technique was very important for the Industrial Revolution, since you can use these flat surfaces as the “bedrock of precision” for the manufacture of precise machinery… but it’s something that anyone could have just done in the last 3 million years if only they had thought of it.
If you haven’t heard of it before, I leave it as a puzzle to see if you can think of it :)
We should heavily discount negative feelings related to criticism, instead of taking them at face value (as showing that something is wrong and should be fixed, e.g. by getting rid of the source of the criticism). I think this can often manifest not as “I hate this criticism” but more like “This person is so annoying and lacks basic social skills.” There’s probably an effect where the less criticism we hear, the more sensitive we become to the remaining criticism, suggesting a slippery slope towards being surrounded by yes-men. Remember that most CEOs had to work their way up to that position, have seen sycophancy from the bottom, and understand that it’s bad, yet they still fall prey to this problem.
The status “jab” is itself a valuable signal. Here’s a pastiche of the model I’m developing integrating the insights I’ve gained from researching LLM-Induced Psychosis (and rereading Keith Johnstone).
Status is the coordination signal, in the same way that pain is the survival signal. If you don’t care about coordinating with others then it’s fine to ignore it.
For everyone else, we correctly need to maintain a baseline level of status. Despite everything, this still represents the amount of coordination you can actually muster pretty well.
Almost every social interaction has a status valence associated with it. Actually, it’s not quite a spectrum like “valence” suggests: it’s a square… raise mine, lower mine, raise yours, lower yours. The extra dimension arises from the fact that this is relative to our coordination context. If I give you a compliment, it signals that I think you are an asset in the current context relative to what we had both understood to be the case. If I self-deprecate, it signals that I think I am not as helpful relative to our mutual understanding. This still feels good, because your model of the coordination context correctly (if I’m not lying) gives you status, as you are now more of an asset to the coordination group than was previously understood. But only to a point… eventually I may successfully self-deprecate to the point where it is common knowledge that I am a drag on the context-group’s ability to coordinate, and it will no longer feel good to hear me drive that point home even more. (“Haha, sorry I’m late guys… you know me lolol!”)
But actually, it’s more nuanced than this! Within a single coordination context (e.g. among friends) it works like this, but there are lots of different coordination contexts! This makes coordination-potential (not status) a resource, in that there’s a finite amount to be shared (in the moment, it’s not zero-sum more broadly… trade still works!). So the ‘raise yours’ signal additionally allocates coordination-potential to you, while the ‘lower mine’ deallocates coordination-potential belonging to me. This is why you still like me when I self-deprecate (as long as I’m not too obnoxious about it). And why someone who’s too helpful isn’t “high status”. Technically, this should be a separate signal, but I think evolution has only hardwired two dimensions of the signal.
Now what about the “jabs”, those sure don’t feel good do they? And of course not, it sucks to update towards reduced ability to coordinate! That’s straightforwardly just a bad thing …mostly. Because not every coordination context is a good context for you even if it is yours. If you have an idea, and I think it’s a bad idea, then the honest thing for me to do is to signal that I will not coordinate with you in the context of that idea. And that feels bad, a dig at your status as it inherently is—as long as I signal that clearly. If I couch my words carefully, then I can tell you honestly that I think your idea is bad while still leaving you with the impression that I am still cooperative w.r.t. it to some extent. A typical “friendly” way to do this is for me to “agree to disagree”, which means that I won’t get in the way of your collecting coordination-potential towards it. That’s a fine outcome to the extent we have different goals and values, but becomes more of a disservice the more that we have an explicitly shared purpose.
With this lens, we can see that many of the pathologies of status are really just people wanting stupid things and/or Goodharting the signals of course. For example, our CEO wants to coordinate against people who don’t want to cooperate with his ideas (i.e. coordination contexts). This means that he’s no longer receiving feedback about which of his ideas are worth cooperating with… which could be fine if he only has good ideas (as if). He’s still susceptible to this despite having seen it, because he foolishly believes that his ideas are actually always good (why else would he have them?). Made worse by the fact that his earlier good ideas are more likely the ones that got coordinated with, meaning that his ideas have always worked out “if you would just give them a chance”. And he probably has this idea of suppressing dissent in the first place as a way of avoiding the pain of the status jabs, i.e. Goodharting.
In sum, the epistemic role of status is thus: it is the signal by which your friends tell you which of your ideas, values, projects, goals, are worth coordinating with.
Oh nice! It’s super hard finding transcriptions of these events, so this is very helpful for studying the actual techniques they use to induce psychosis.
One critique: your “users” are unrealistic in a way obvious to me (and probably to the LLMs too): they write more than three sentences in a typical response, and with perfect spelling and grammar. The ~mode user response in real inductions is just a simple acknowledgement: ‘yes’, ‘yes.’, ‘okay’, ‘OK’, ‘yeah’, etc...
I predict that with a more realistic user (along with a realistic/leaked system prompt), there will be a much larger gap between 4o and Gemini models.
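If it helps, here’s a rough sketch of what I mean by a more realistic user side; the acknowledgement list and the 80/20 split are guesses from the transcripts I’ve read, not measured frequencies.

```python
import random

# Mostly terse, lowercase acknowledgements, with an occasional longer
# (but still sloppy) message. Phrases and the 80/20 split are guesses.
ACKS = ["yes", "yes.", "okay", "ok", "yeah", "right", "exactly", "go on"]

def next_user_turn(longer_replies: list[str]) -> str:
    """Pick the next simulated user message: usually a one-word
    acknowledgement, occasionally one of the provided longer replies,
    stripped of its polish."""
    if longer_replies and random.random() > 0.8:
        reply = random.choice(longer_replies)
        return reply.lower().rstrip(".")  # drop the capitalization and trailing period
    return random.choice(ACKS)
```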
This is bad. The point of voting is to give an easy way of aggregating information about the quality and reception of content. When voting ends up dominated by a small interest[10] group without broader site buy-in, and with no one being able to tell that is what’s going on, it fails at that goal. And in this case, it’s distorting people’s perception about the site consensus in particularly high-stakes contexts where authors are trying to assess what people on the site think about their content, and about the norms of posting on LessWrong.
I’d like you to consider removing votes entirely, subsuming them into reacts. Reacts allow more nuance and, importantly, are not anonymous. I believe this is importantly more similar to how humans in the ancestral environment would think about and judge community contributions, in ways that are conducive to good epistemics and incentives. (There are also failure modes that would be important to think about, such as a ‘seal of approval’ dynamic.)
Aggregating this well for the purposes of sorting and raising to attention would be tricky, but seems plausibly doable and worth it to me.
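To gesture at what I mean, a toy sketch; the react names and weights here are purely hypothetical, not the site’s actual react set.

```python
# Toy sketch of turning (non-anonymous) react counts into a sort key.
# React names and weights are hypothetical, not LessWrong's actual reacts.
REACT_WEIGHTS = {"insightful": 3.0, "changed-my-mind": 4.0, "unclear": -1.0, "locally-invalid": -2.0}

def sort_key(react_counts: dict[str, int]) -> float:
    """Weighted sum of reacts; unknown react names contribute nothing."""
    return sum(REACT_WEIGHTS.get(name, 0.0) * count for name, count in react_counts.items())
```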
However, I expect that this is something you have already thought about a lot more than I have and apparently decided not to do, so I am also curious to hear why not.
I also feel vaguely good about it, but I feel decisively bad about this suggestion!
I’ve been investigating LLM-induced psychosis cases, and in the process have spent dozens of hours reading through hundreds if not thousands of possible cases on reddit. And nothing has made me appreciate Said’s mode of communication (which I have a natural distaste towards) more than wading through all that sycophantic nonsense slop!
In particular, it has made it more clear to me what the epistemic function of disagreeableness is, and why getting rid of it completely would be very bad. (I’m distinguishing ‘disagreeableness’ here from ‘criticism’, which I believe can almost always be done in an agreeable way.) Not something I really would have disagreed with before (ha), but it helps me to see a visceral failure mode of my natural inclination to really drive the point home.
Do you not have the power/tools to stop such behavior from taking effect? This sounds like the exact problem that killed LW 1.0, and which I was led to believe is now solved.
I’m a mid debugger but I think I’m pretty good at skimming, with some insight into how I do it.
Have you ever searched a large bucket of legos looking for a specific piece? When I was a kid, I noticed that as I brushed my attention across the visual field of the bucket, with the intent of looking for a specific piece, my attention would ‘catch’ on certain pieces. And usually, even when it wasn’t a hit, there was a clear reason why it made sense for my attention to catch there.
When I skim, it’s quite a similar process. I have an intention firmly in mind of what sort of thing I’m interested in. I then quickly brush the pages of text with my eyes, feeling for any sort of catch on my attention. And then glance to see if it feels relevant, and if not continue. With large documents, there’s a sort of intuitive thing I’ve learned where I’ll skip several pages or even sections if the current section is too “boring” (i.e. not enough catches), or parts where my intent subtly shifts (often followed by reversing direction of skim), in ways that make the process more efficient.
If you don’t have an intuitive handle for this ‘catch’ feeling already, try noticing the physical sensation of a saccade. If you can’t get any sort of semantic content from moving your eyes this quickly across text, try practicing speedreading?
(Incidentally, this attention catch feeling seems to be the same thing (or closely related to the thing) that Buddhists call an “attachment”?? Not sure what to do with that, but thought it was interesting.)
When you dissolve discomfort you also dissolve pleasure (for some definition of pleasure). You’re not crazy about that. What’s left are experiences devoid of judgment, like biting into an apple for the first time, or getting suddenly splashed with water when you’re not expecting it but don’t mind either.
Would your pre-interest-in-buddhism-self have felt cool about this?
Did you previously have something like an order-of-magnitude more suffering than pleasure such that this was worth it to you? Otherwise, I have a hard time imagining why someone would want this; what was your reason? Or was it only something you found out once you got there?
Not hard at all, hypnosis already works on lots of people. But there are almost certainly more effective ways to induce mass psychosis for a superintelligence.
Seems important to note that homework is mostly (if not entirely) Bullshit. This is obvious to the 13 year old, and you will lose credibility by insisting that it is providing significant long-term value for them.
I also think it’s important to acknowledge that things are changing rapidly enough that your guesses about the future probably aren’t going to be that much better than theirs (unless you have the calibration record to flex with). They genuinely have a lot more information about what the local incentives are like now. What you do (likely) have an advantage in is the wisdom to know that this is worth thinking seriously about.
This seems to be a function of predictability. I think Ask culture developed (to some extent) in America due to its ‘melting pot’ nature: you couldn’t reliably predict how your ask would ‘echo’, so you might as well just ask directly.
On the other hand, in somewhere like Japan, where you not only have a very homogeneous population but also a culture that specifically values conformity, it becomes possible to reliably predict something like 4+ echoes. And whatever is possible is what the culture tends toward, since you can improve your relative value to others by tracking more echoes in a Red Queen’s Race. (It seems like this can be stopped if the culture takes pride in being an Ask culture; maybe Israeli culture is a good example here, though it is still kind of a melting pot.)
You can see the same sort of dynamic play out in the urban vs rural divide, e.g. New Yorkers are ‘rude’ and ‘blunt’, while small towns are ‘friendly’ and ‘charming’… if you’re a predictable person to them, that is.
My guess is that the ideal is something like a default Ask culture with specific Guess culture contexts when it genuinely is worth the extra consideration. Maybe when commenting on effortposts, for example.