But I don’t know whether you’re trying to change cultural or individual rationality here:
Like, I’m mostly optimistic about getting a few individuals to not do Crappy Epistemics, whereas I feel like you’re targeting groups,
If it’s from the criticisms of the rationality community for not berating people into rationality, oops. I should have put those in quotation marks. The point was “you’ll hear this from someone in this mode”, not to assert those things myself. I mostly see those exhortations as examples of the thing they’re railing against.
I’m not really trying to change anything here, just describing what is actually going on.
which seems difficult if I’m one of very few people who get what you’re saying.
Oh, for sure :)
But to be clear, I’m excited that you seem to be picking up on the same thing from another angle, separate from anything I’ve said.
I’ve gotten pretty comfortable with my ability to help people see the things needed to realign themselves on some object level issue, but I’m still new at communicating the meta thing of how to help people see how this alignment process works. Difficult, but fun/interesting, and I don’t think too difficult to learn.
If you’re saying “we need dramatically better instrumental rationality, of which short-term optimization targets are a big component” then yes, strong agree. I feel like you’re saying something else though, maybe about coordination between humans?
Yeah, I think that’s a facet of it. But also, yeah, it’s more than that. The stuff about coordinating between humans is just another facet too.
There’s a failure mode in changework where people try to “fix their irrational fear”, or “walk the client through what they need to do to fix their irrational fear”. These seem like perfectly reasonable responses, which is why people do them, but watch it play out enough times and you start to notice that the stubborn attempt to control away the fear is the problem. That once you notice that you don’t actually know your fear to be irrational, you naturally turn towards noticing whether you’re actually safe, and that’s the move that conditions away inappropriate fears. That once you notice that trying to push people towards having a certain “correct” orientation to their pain is actually the thing that causes the suffering, you naturally turn towards “what’s the real problem here?”, and that’s what dissolves the suffering in a word or “a few messages”.
This move of “Oh, control isn’t working. I wonder why?” turns out to be very general. Not just applied to one’s own mind, or helping others with their own minds, but also helping others learn to help others with their own minds, and so on and so on. When my friend asked for help getting her four-year-old to take her eyedrops, I was able to play with my friend’s discomfort, which led to her being able to play with her daughter’s discomfort, which led to her daughter playing with her own discomfort. “You’re a brown belt in jiu-jitsu, what do you mean how do I get my child to take her eyedrops!?” → “Sigh. I just feel like I shouldn’t have to use force… and I guess this is another one of those things where my own tension is telling her to be tense, huh?” → “Mommy, can we play the eyedrop game!?”
Valentine has a good post “We are already in AI takeoff”, which I took a stab at putting into my own words in a comment there.
In short, and translating into the language we’re using here, “trying to align AGI” is itself an instance of this same exact failure to align ourselves. Because no one is looking at it like “Oh, yeah, easy peasy. I predict that I will not experience prediction error, because I got this”. It’s all pushing back in an attempt to control away prediction error, because the consequences of failing are unimaginably bad, while failing to act on the uneasiness coming from predictions of this not panning out. Which turns out to be where all the most useful information is.
When I look instead towards “If this goes well, what does that look like? How did we get here?”, the answer I see is one where the people guiding the development of AI aren’t pushing away any of the relevant information, least of all the information that they themselves perceive as most important for judging whether what they’re doing is working and what kind of moves might actually work.
In other words, it’s one where the researchers themselves have enough embodied skill in alignment that they can approach the problem with their full faculties. Not just because “That’s what rationality is, and good instrumental rationality is necessary for succeeding at hard things”, but also because “it’s literally the same skill”. In the same way that relating to one’s own mind is the same skill as helping a friend relate to theirs, which is the same skill as helping a friend help their kid relate to theirs.
Same skill, applied on multiple levels. The skill in “becoming rational”/“coordinating groups of humans”/“aligning AI” is all skill in alignment. Borrowing Val’s words again, “It’s fractal”.
When I think about “how to align AI”, I notice that I don’t actually know how to do this. There’s nothing I can see where I think “Ah, this is the code I need to write” or “here are the things I need to exhort people to do” that will predictably yield the outcomes I want. Not through “targeting groups”, or “targeting individuals”, or “targeting code”.
And I notice that stopping to notice this is by far the most important thing I can do, since “trying” would necessarily be blind, and therefore could succeed only by luck, at best. And that one of the better “object level applications” for me right now is to highlight the nature of this move, since a big part of “Why is this control not working?” is “Because people aren’t aware of how control works.” Okay, cool. So if we change that, this part of the problem dissolves.
There’s something even more general and self-referential that I’m fumbling towards, though, since doing that thing is what I can actually expect to lead to the best possible outcomes, structurally and necessarily. But it’s a bit mind-bending, because “trying to generalize” is itself an instance of the thing I’d be trying to avoid (and so would “trying to not try” or concluding “we shouldn’t try”). “Generalizing is hard. I wonder why?” is the generalization. So I guess that’s the next thing to wonder about, once I have some mental room for it.
Anyway, your post on BCI-facilitated AI alignment looks to me like a step in the same direction. A step towards noticing that AI alignment is downstream of human alignment (in this case, because aligned and augmented humans are more competent, which is instrumentally useful), and that the solutions which actually work have more competent humans more tightly integrated into the alignment process for longer, rather than keeping a stance of “I’m outside the system, aligning THAT THING is what I’m trying to do, dammit”.
I don’t think you’ve been explicitly thinking about it in the same terms I’m laying out here, but it does seem like it might be downstream of beginning to sense and act on the same thing I’m fumbling towards. Like you might already be on the same path, fumbling towards the thing I’m trying to put a finger on (and noticing myself not fully having it yet, in this sentence. Lol). Does this fit?