jimmy comments on Beyond Hardcoded Evolutionary Psychology

jimmy 10 Jun 2026 16:27 UTC
3 points
0
But I don’t know whether you’re trying to change cultural or individual rationality here:
Like, I’m mostly optimistic about getting a few individuals to not do Crappy Epistemics, whereas I feel like you’re targeting groups,
If it’s from the criticisms of the rationality community for not berating people into rationality, oops. I should have put those in quotation marks. The point was “you’ll hear this from someone in this mode”, not to assert those things myself. I mostly see those exhortations as examples of the thing they’re railing against.
I’m not really trying to change anything here, just describing what is actually going on.
which seems difficult if I’m one of very few people who get what you’re saying.
Oh, for sure :)
But to be clear, I’m excited that you seem to be picking up on the same thing from another angle, separate from anything I’ve said.
I’ve gotten pretty comfortable with my ability to help people see the things needed to realign themselves on some object level issue, but I’m still new at communicating the meta thing of how to help people see how this alignment process works. Difficult, but fun/interesting, and I don’t think too difficult to learn.
If you’re saying “we need dramatically better instrumental rationality, of which short-term optimization targets are a big component” then yes, strong agree. I feel like you’re saying something else though, maybe about coordination between humans?
Yeah, I think that’s a facet of it. But also, yeah, it’s more than that. The stuff about coordinating between humans is just another facet too.
There’s a failure mode in changework where people try to “fix their irrational fear”, or “walk the client through what they need to do to fix their irrational fear”. These seem like perfectly reasonable responses which is why people do them, but watch it play out enough times and you start to notice that the stubborn attempt to control away the fear is the problem. That once you notice that you don’t actually know your fear to be irrational, you naturally turn towards noticing whether you’re actually safe, and that’s the move that conditions away inappropriate fears. That once you notice that trying to push people towards having a certain “correct” orientation to their pain is actually the thing that causes the suffering, you naturally turn towards “what’s the real problem here?” and that’s what dissolves the suffering in a word or “a few messages”.
This move of “Oh, control isn’t working. I wonder why?” turns out to be very general. Not just applied to one’s own mind, or helping others with their own minds, but also helping others learn to help others with their own minds and so on and so on. When my friend asked for help getting her four-year-old to take her eyedrops, I was able to play with my friend’s discomfort, which led to her being able to play with her daughter’s discomfort, which led to her daughter playing with her own discomfort. “You’re a brown belt in jiu jitsu, what do you mean how do I get my child to take her eyedrops!?” → “Sigh. I just feel like I shouldn’t have to use force… and I guess this is another one of those things where my own tension is telling her to be tense, huh?” → “Mommy, can we play the eyedrop game!?”
Valentine has a good post “We are already in AI takeoff”, which I took a stab at putting into my own words in a comment there.
In short, and translating into the language we’re using here, “trying to align AGI” is itself an instance of this same exact failure to align ourselves. Because no one is looking at it like “Oh, yeah, easy peasy. I predict that I will not experience prediction error, because I got this”. It’s all pushing back in an attempt to control away prediction error because the consequences of failing are unimaginably bad, while failing to act on the uneasiness coming from predictions of this not panning out. Which turns out to be where all the most useful information is.
When I look instead towards “If this goes well, what does that look like? How did we get here?”, the answer I see is one where the people guiding the development of AI aren’t pushing away from any of the relevant information, let alone the information that they themselves perceive as most important with respect to whether what they’re doing is working and what kind of moves might actually work.
In other words, it’s one where the researchers themselves have enough embodied skill in alignment that they can approach the problem with their full faculties. Not just because “That’s what rationality is, and good instrumental rationality is necessary for succeeding at hard things”, but also “it’s literally the same skill”. In the same way that relating to one’s own mind is the same skill as helping a friend relate to theirs, which is the same as helping a friend help their kid relate to theirs.
Same skill, applied on multiple levels. The skill in “becoming rational”/“coordinating groups of humans”/“aligning AI” is all skill in alignment. Borrowing Val’s words again, “It’s fractal”.
When I think about “how to align AI”, I notice that I don’t actually know how to do this. There’s nothing I can see, where I think “Ah, this is the code I need to write” or “here are the things I need to exhort people to do” that will predictably yield the outcomes I want. Not through “targeting groups”, or “targeting individuals”, or “targeting code”.
And I notice that stopping to notice this is by far the most important thing I can do, since “trying” would necessarily be blind, and therefore doomed to succeed by luck at best. And that one of the better “object level applications” for me right now is to highlight the nature of this move, since a big part of “Why is this control not working?” is “Because people aren’t aware of how control works”. Okay, cool. So if we change that, this part of the problem dissolves.
There’s something even more general and self referential that I’m fumbling towards though, since doing that thing is the thing I can actually expect to lead to the best possible outcomes—structurally and necessarily. But it’s a bit mindbending because “trying to generalize” is itself an instance of the thing I’d be trying to avoid (and so would “trying to not try” or concluding “we shouldn’t try”). “Generalizing is hard. I wonder why?” is the generalization. So I guess that’s the next thing to wonder, once I have some mental room for it.
Anyway, your post on BCI facilitated AI alignment looks to me like a step in the same direction. A step towards noticing that AI alignment is downstream of human alignment (in this case, because aligned and augmented humans are more competent which is instrumentally useful), and that the solutions which actually work have more competent humans more tightly integrated in the alignment process for longer—rather than keeping a stance of “I’m outside the system, aligning THAT THING is what I’m trying to do, dammit”.
I don’t think you’ve been explicitly thinking about it in the same terms I’m laying out here, but it does seem like it might be downstream of beginning to sense and act on the same thing I’m fumbling towards. Like you might be already on the same path fumbling towards the same thing that I’m trying to put a finger on (and noticing myself not fully having yet, in this sentence. Lol). Does this fit?
- Elliot Callender 17 Jun 2026 0:04 UTC
  1 point
  0
  Same skill, applied on multiple levels. The skill in “becoming rational”/“coordinating groups of humans”/“aligning AI” is all skill in alignment.
  I see what these words might say, but don’t follow the link. Like, seems basically true that rationalism → human coordination works, but AI alignment is such a different thing, so alien to whatever concepts help humans self-align and coordinate.
  Perhaps I just need more time to work through this concept. Right now I’m more focused on understanding my own mind, to make better decisions, because I’m finding a crapton of low-hanging fruit very quickly.
  I’ve rolled this around in my head for a bit, and it seems to me like, for rationality, “control” of lower processes is better done by something like “improved training data” than operant force.
  This is a vignette from my life, not a quote from anyone:
  I’ve noticed since I was about 16 that I get sad when seeing attractive women. This always sucked! I tried introspecting, but I was looking at the feeling of sadness (and often trying misguidedly to control that feeling).
  Two days ago, I started looking at what the other parts of my mind were doing when I see women; what am I pulling towards, where’s the tension? Oh, partnered romance feels unreachable? Where’s that coming from?
  It mostly seems to me that this (unreachable romance) is the wrong conceptualization. Like, my non-deliberate processes are using the wrong concepts. My intuitive ontology is factually wrong, in the sense I care about; and this wrongness tied itself into a self-perpetuating loop (what I’m calling a cognitive attractor).
  And so most of the art of rationality, in fact possibly exactly all of it, is to intuitively correct these errors:
  - I can in fact date, I’ve been accidentally choosing not to
  - It doesn’t actually hurt me to admit I’m wrong, in most cases
  - Eyedrops aren’t scary :D
  because imposing top-down “control” misaligns:
  - The optimizer which calls itself an Elliot, wants human flourishing, wants to hold someone
  - vs the constellation of smaller optimizers Elliot is an intelligible supervenience of
  Anyway, your post on BCI facilitated AI alignment looks to me like a step in the same direction. A step towards noticing that AI alignment is downstream of human alignment (in this case, because aligned and augmented humans are more competent which is instrumentally useful), and that the solutions which actually work have more competent humans more tightly integrated in the alignment process for longer
  Sounds plausible
  rather than keeping a stance of “I’m outside the system, aligning THAT THING is what I’m trying to do, dammit”.
  I very much have come to think of myself as a system, nothing special other than this weird consciousness phlogiston, though I haven’t stopped using self/other borders.
  Does this fit?
  Maybe? I find myself confused about your explanation of AI alignment; but after reading Valentine’s post about memeplexes, I’m thinking you’re talking about being in entirely the wrong frame, where “align the AI” straightforwardly might not be a thing. And “stop doom” might also not be a thing.
  (Thanks for linking that, by the way. I preliminarily do expect, if BCIs or similar enhancements work, that moderately superintelligent humans will birth more powerful Friendly hypercreatures. If you have a list of older, similarly gearsy posts, I’d definitely like to read them.)