This fits what I’ve seen from talking to women who have experienced rape/sexual abuse.
In one case, the abuse happened when she was a kid, and she didn’t feel traumatized for many years, until she heard people talking about such things as if they’re supposed to be traumatic. It was helpful to her to have someone give her permission to disregard those pressures as ill-informed and stupid.
In another case, the woman was much older and somewhat traumatized by the experience, but talking to her friend about it just made her feel more traumatized because he emphasized how bad the situation was even more than she did. There was actually a considerable amount of humor in that particular case, and being invited to see it and recognize that she’s actually totally fine and doesn’t have to freak out about it anymore was helpful.
With respect to the question of “How do we communicate that sexual abuse is really not ok, without making victims of it feel like it’s worse than it actually is?”, it depends on where the harm is. Hypothetically, if the only harm is in the psychological trauma, and the act isn’t actually psychologically traumatic in itself, then you bite that bullet and say that nothing bad happened and no one did anything that wrong. Or perhaps you treat it like drunk driving, to the extent that you believe the lack of trauma was not predictable.
However, the premise that the harm resides solely in the psychological trauma is false. Trauma serves an important function, and aiming away from experiencing trauma at all costs can be dangerous.
Imagine one day you go driving after a first rain, fail to account for the added dangers of water on the road, and as a result nearly drive off a cliff. No harm happened, and if you’re sufficiently unintelligent or unobservant—or afraid to experience fear—you might not recognize what’s scary about this and end up non-traumatized. Given the situation though, that lack of trauma doesn’t mean you’re “psychologically healthy” or well adjusted; it means the opposite. It means that you didn’t notice the problem, and so next time it rains you’re going to drive again like the roads are dry, putting yourself at further unnecessary risk. If instead you were a little more traumatized, you might be appropriately scared of driving fast on wet roads—at least, until you learn the new limits.
You want psychological trauma to match objective threats and your ability to handle them. It’s possible to be overtraumatized, or to fail to recover from trauma through more specific learning, but trauma has its legitimate place. In the second example, where the woman talked to her friend and ended up more traumatized, I don’t know that the friend was wrong to do so—certainly the two of them as a system decided that the situation was more dangerous than she had been giving it credit for, and maybe it was. She got out of that one fine, but she did some dumb things to get there, and the next time she might not be so lucky.
There is a more fundamental and more squishy issue here too.
In the first example I gave, society’s local reaction was clearly maladaptive and just harmed the kid while giving her nothing actionable to do to either reduce risk or get out of the trauma. I’ll even hazard a guess that the initial experience wasn’t particularly harmful to her in this specific case. Still, the squickiness I feel when thinking about those things doesn’t stem from “I think it’s likely to be traumatic for the kids”, and the reason I think she got away from it unharmed is that such things are so frowned upon by society that it was kept from growing to a scale that likely would have been harmful to her—even if nontraumatic.
This stuff is all much harder to figure out, so I can’t point to concrete damages and justify them well, but “this stuff is all much harder to figure out” is kinda the point; no one really knows what the effects or damages are, so no one really knows what’s safe and what’s harmful and how. “How do our early experiences (sexual and otherwise) determine who we grow up to be, and what do we want to grow up to be?” is a big messy question, and “Well, the kid didn’t feel harmed” isn’t very strong evidence that they weren’t harmed (even if it is strong evidence that they aren’t traumatized, and that you shouldn’t be pushing unaimed trauma willy-nilly).

To give a stupid example to illustrate the type of non-obvious things that can happen: if a kid were to hypothetically “play doctor” and pick up a medical instrument fetish from it, the kid doesn’t have to be traumatized and the fetish doesn’t have to be morally wrong for it to negatively impact their life when their partner pool is narrowed to “people who also have medical fetishes”. “What kind of people we become, and how” is just a bigger question than we know how to navigate deliberately, as kids or as adults or as a society. We have a sense that certain things are not okay, and it’s not entirely clear where that sense came from because our justifications break down here, but that doesn’t mean there’s no informational value there (or that it’s correct at face value either).
So as the bottom line, returning to the question of “How do we communicate that sexual abuse is really not ok, without making victims of it feel like it’s worse than it actually is?”: it is a specific instance of the question of how to relate to people in general, and the same principles apply. We want to orient empathically both to the experience of the person we’re talking to and also to everything else we have that bears on the reality of what happened, and then see what happens when we integrate all of it.
Sometimes this will simplify when it turns out that the kind of “sexual abuse” in question wasn’t actually abusive in that context, or not as harmful as it’s made out to be, and that society is wrong.
Sometimes this will simplify when the causes of harm are nice and clear, and we can lead people to recognize these dangers and how to avoid them—no (permanent or debilitating) trauma needed. Maybe that means soberly showing them that while they got away with it this time, there’s this other risk. Or maybe it means recognizing that they actually have learned their lesson, and showing them that it’s okay to laugh and to move on.
In the specific instances where it resolves to neither of these, and you’re stuck with the apparent paradox of “Yes, that was actually very bad in my best estimate” and “I can’t point to a particular harm to avoid”, then until we can figure out more about where that sense is coming from and what the lurking danger is (which is a potential option to pursue), we’re kinda stuck acknowledging our uncertainty. Instead of “This is super bad including in your specific case!!!” or “Don’t worry, nothing at all is wrong with what happened”, just stick to what is known to be true. I can’t be too concrete as it will depend both on the specific situation and also what you know about it, but “There was likely nothing harmful in this case, but that’s a slippery slope that most likely leads to really bad things, even if it’s hard to explain exactly what those are” is one potential resolution of the apparent paradox.
What impact are you concerned about?
You’re giving up an opportunity to play with developing their capacity to orient to intensity!
It’s not that kissing it better is not okay or something, but there’s an opportunity for useful fun, and the rule of “be very suspicious of saying things that look false” can help point you in the direction to find it.
The dishonesty isn’t in “It’ll be okay after I kiss it”, it’s in the idea that it was ever not okay in the first place.
Think about what purpose the idea that kissing it makes it better serves for you, and what leads you to use it. The kid won’t accept “Aw, you’re fine”, and therefore you can’t help him feel better with that take. However, if you pretend that he really is not okay, then he can believe that, and maybe he’ll also believe you when you offer a solution. It allows you to sidestep the part where you have to convince the kid that he’s wrong, and allows you to lead him to the correct conclusion that he’s okay. Which is nice, since having your toddler hurt and in distress can be uncomfortable, and we’d often like to get out of that discomfort ourselves.
But that work is where all the cool stuff happens, and that’s where you get to teach them the skill of reorienting to unpleasant sensations effectively. The alternative starts with orienting towards our own discomforts skillfully, and then modeling that for them as applied to their problem.
So, like, what happens when your kid scrapes his knee, cries about it, and doesn’t stop crying for five minutes when you could have stopped it at two with a kiss? Is that okay, and something you feel comfortable playing with (assuming you see a reason to play with it), or is it something uncomfortable which you’d rather stop?
To the extent that it’s the latter, you have your own little puzzle to sort out, and to the extent that it’s the former, you have a new game to play with your toddler. When your own emotional take on the booboo and ensuing distress is “Ooh! An opportunity to play! How bad is this one!?”, then it tends to come across and the kid can learn that little booboos and a little distress aren’t the end of the world and can actually be a fun learning challenge in an interesting sort of way. You get to engage with the experience they’re having without trying to minimize it, or to pretend to agree with it, and that gives them a lot more room to figure out if they’re actually okay and what lessons (if any) they want to take from it. And you often get really cool experiences, like watching your kid reframe the problem from “I’m hurt and not okay” to “It was scary and I cried, but I’m okay!” and “Can we do it again!?”.
In principle, this can result in more distress because there’s more willingness to entertain distress, but in practice I haven’t found it to be the case. There aren’t really any times where I could offer a kiss to make it feel better, because in the cases where it’d work it never really becomes a problem in the first place. Sometimes my toddler will insist that she needs a bandaid to make it better, and I’ll give her one even though she doesn’t physically need one, but that’s very much led by her and the nudges she gets from me are actually away from the idea that the bandaids are necessary.
“Narrative syncing” took a moment to click for me, but when it did it brought up connotations that I don’t see in the examples alone. Personally, the words that first came to mind were “Presupposing into existence”, and then, after getting a better idea of which facet of this you were intending to convey, “Coordination through presupposition”.
While it obviously can be problematic in the ways you describe, I wouldn’t view it as “a bad thing” or “a thing to be minimized”. It’s like… well, telling someone what to do can be “bossy” and “controlling”, and maybe as a society we think we see too much of this failure mode, but sometimes commands really are called for, and so too little willingness to command “Take cover!” when necessary can be just as bad.
Before getting into what I see as the proper role of this form of communication, I think it’s worth pointing out something relevant about the impression I got when meeting you forever ago, which I’d expect others to get as well, and which would be expected to lead to this kind of difficulty and this kind of explanation of it.
It’s a little hard to put into words, and not at all a bad thing, but it’s this paradoxically “intimidating in reverse” sort of thing. It’s this “I care what you think. I will listen and update my models based on what you say” aura that provokes anxieties of “Wait a minute, my status isn’t that high here. This doesn’t make sense, and I’m afraid that if I don’t denounce the status elevation I might fall less gracefully soon”—though without the verbal explanation, of course. But then, when you look at it, it’s *not* that you were holding other people above you, and there were no signals of “I will *believe* what you say” or “I see you as claiming relevant authority here”, just a lack of “threatened projection of rejection”. Like, there was going to be no “That’s dumb. You’re dumb for thinking that”, and no passive aggression in “Hm. Okay.”, just an honest attempt to take things for what they appear to be worth. It’s unusually respectful, and therefore jarring when people aren’t used to being given the opportunity to take that kind of responsibility.
I think this is a good thing, but if you lack an awareness of how it clashes with the expectations people are likely to have, it can be harder to notice and preempt the issues that come up when people get too intimidated by what you’re asking of them, which they are likely to flinch from. Your proposed fix addresses part of this because you’re at least saying the “We expect you to think for yourself” part explicitly rather than presupposing it on them, but there are a couple pieces missing. One is that it doesn’t acknowledge the “scariness” of being expected to come up with one’s own perspectives and offer them up to be criticized by very intelligent people who have thought about the subject matter more than you have. Your phrasing downplays it a bit (“no vetted-by-the-group answer to this” is almost like “no right answer here”) and that can help, but I suspect that it ends up sweeping some of the intimidation under the rug rather than integrating it.
The other bit is that it doesn’t really address the conceptual possibility that “You should go study ML” is actually the right answer here. This needs a little unpacking, I think.
Respect, including self respect or lack thereof, is a big part of how we reason collectively. When someone makes an explicit argument (or otherwise makes a bid for pointing our attention in a certain direction), we cannot default to always engaging with and trying to fully evaluate the argument on the object level. Before even beginning to do that, we have to decide whether or not and to what extent their claim is worth engaging with, and we do that based on a sense of how likely it is that this person’s thoughts will prove useful to engage with. “Respect” is a pretty good term for that valuation, and it is incredibly useful for communicating across inferential distances. It’s always necessary to *some* degree (or else discussions go the way political arguments go even about trivial things), and excess amounts let you bridge much larger distances usefully because things don’t have to be supported immediately relative to a vastly different perspective. When the homeless guy starts talking about the multiverse, you don’t think quite so hard about whether it could be true as you would if it were a respected physics professor saying the same things. When someone you can tell sees things you miss tells you that you’re in danger and to follow their instructions if you want to live, it can be viscerally unnerving, and you might find yourself motivated to follow precautions you don’t understand—and it might very well be the right thing to do.
Returning to Alec, he’s coming to *you*. Anna freakin’ Salamon. He’s asking you “What should I do? Tell me what I should do, because *I don’t know* what I should do”. In response one, you’re missing his presupposition that he belongs in a “follower” role, as relates to this question, and elevating to “peer” someone who doesn’t feel up to the job, without acknowledging his concerns or addressing them.
In response two, you’re accepting the role and feeling uneasy about it, presumably because you intuitively feel like that leadership role is appropriate there, regardless of whether you’ve put it to words.
In response three, you lead yourself out of a leadership role. This is nice because it actually addresses the issue somewhat, and is a potentially valid use of leadership, but open to unintentional abuse of the same type that your unease with the second response warns of.
Returning to “narrative syncing”, I don’t see it so much as “syncing”, as that implies a sort of symmetry that doesn’t exist. It’s not “I’m over here, where are you? How do we best meet up?”. It’s “We’re meeting up *here*. This is where you will be, or you won’t be part of the group”. It’s a decision coming from someone who has the authority to decide.
So when’s that a good thing?
Well, put simply, when it’s coming from someone who actually has the authority to decide, and when the decision is a good one. Is the statement *true?*
“We don’t do that here” might be questionable. Do people there really not do it, or do you just frown at them when they do? Do you actually *know* that people will continue to meet your expectations of them, or is there a little discord that you’re “shoulding” at them? Is that a good rule in the first place?
It’s worth noticing that we do this all the time without noticing anything weird about it. What else is “My birthday party is this Saturday!”, if not syncing narratives around a decision that is stated as fact? But it’s *true*, so what’s the problem? Or internally, “You know, I *will* go to that party!”. They’re both decisions and predictions simultaneously, because that’s how decisions fundamentally work. As long as it’s an actual prediction and not a “shoulding”, it doesn’t suddenly become dishonest if the person predicting has some choice in the matter. Nor is there anything wrong with exercising choice in good directions.
So as applied to things like “What should I do for AI risk?”, where the person is to some degree asking to be coordinated, and telling you that they want your belief or your community’s belief because they don’t trust themselves to be able to do better themselves, do you have something worth coordinating them toward? Are you sure you don’t, given how strongly they believe they need the direction, and how much longer you’ve been thinking about this?
An answer which denies neither possibility might look like…
“ML. Computer science in general. AI safety orgs. Those are the legible options that most of us currently guess to be best for most, but there’s dissent and no one really knows. If you don’t know what else to do, start with computer science while working to develop your own inside views about what the right path is, and ditch my advice the moment you don’t believe it to be right for you. There’s plenty of room for new answers here, and finding them might be one of the more valuable things you could contribute, if you think you have some ideas”.
I don’t have the equipment on hand to easily measure power consumption.
It’s pretty easy to get that data if you want it: $14 on Amazon.
It’s worth noting (and the video acknowledges this) that “Maybe it’s more like raising a child than putting a slave to work” is a very, very different statement than “You just have to raise it like a kid”.
In particular, there is no “just” about raising a kid to have good values—especially when the kid isn’t biologically yours and quickly grows to be more intelligent than you are.
It’s possible that it “wouldn’t use all its potential power” in the same sense that a high-IQ neurotic mess of a person wouldn’t use all of their potential power either if they’re too poorly aligned internally to get out of bed and get things done. And while still not harmless, crazy people aren’t as scary as coherently ruthless people optimized for doing harm.
But “People aren’t ruthless” isn’t true in any meaningful sense. If you’re an ant colony, and the humans pave over you to make a house, the fact that they aren’t completely coherent in their optimization for future states over feelings doesn’t change the fact that their successful optimization for having a house where your colony was has destroyed everything you care about.
People generally aren’t in positions of so much power over other people that reality stops suggesting that being ruthful will help them with their goals. When they do perceive themselves to be in such a position, you see an awful lot of ruthless behavior. Whether the guy in power is completely ruthless is much less important than whether you have enough threat of power to keep him feeling ruthful towards your existence and values.
When you start positing superintelligence, and it gets smart enough that it actually can take over the world regardless of what stupid humans want, that becomes a real problem to grapple with. So it makes sense that it gets a lot of attention, and we’d have to figure it out even if it were just a massively IQ and internal-coherence boosted human.
With respect to the “smart troubled person, dumb therapist” thing, I think you have some very fundamental misconceptions about human aims and therapy. It’s by no means trivial to explain in a tangent of a LW comment, but “if the person knew how to feel better in the future, they would just do that” is simply untrue. We do “optimize for feelings” in a sense, but not that one. People choose their unhappiness and their suffering because the alternative is subjectively worse (as a trivial example, would you take a pill that made you blissfully happy for the rest of your life if it came at the cost of happily watching your loved ones get tortured to death?). In the course of doing “therapy-like stuff”, sometimes you have to make this explicit so that they can reconsider their choice. I had one client, for example, whom I led to the realization that his suffering was a result of his unthinking refusal to give up hope on a (seemingly) impossible goal. Once he could see that this was his choice, he did in fact choose to suffer less and give up on that goal. However, that was because the goal was actually impossible to achieve; there’s no way in hell he’d have given up and chosen happiness if it were at all possible for him to succeed in his hopes.
It’s possible for “dumb therapists” to play a useful role, but either those “dumb” therapists are still wiser than the hyperintelligent fool, or else it’s the smart one leading the whole show.
As far as I can tell, the reasoning is that things that help Trump hurt America, so Putin should help Trump? I mean, fair, but a little on the nose and saying the quiet part out loud, even for him.
Obviously, it’s that Trump’s America is great, and Biden’s America is bad. So Putin should spank Biden for making America bad again, so that Trump can help Make America Great Again (Again).
I don’t think your conclusions follow.
Humans get neurotic and Goodhart on feelings, so would you say “either it’s not really GAI, or it’s not really friendly/unfriendly” about humans? We seem pretty general, and if you give a human a gun, either they use it to shoot you and take your stuff or they don’t.
Similarly, with respect to “They might still be able to “negotiate” “win-win” accommodations by nudging the AI to different local optima of its “feelings” GAN”, that’s analogous to smart people going to dumb therapists. In my experience, helping people sort out their feelings pretty much requires having thought through the landscape better than they have, otherwise the person “trying to help” just gets sucked into the same troubled framing or has to disengage. That doesn’t mean there isn’t some room for lower IQ people to be able to help higher IQ people, but it does mean that this only really exists until the higher IQ person has done some competent self reflection. Not something I’d want to rely on.
If we’re using humans as a model, there are two different kinds of “unfriendliness” to worry about. The normal one we worry about is when people do violent things which aren’t helpful to the individual, like school shootings or unabombering. The other one is what we do to colonies of ants when they’re living where we want to build a house. Humans generally don’t get powerful enough for the latter to be much of a concern (except in local ways that are really part of the former failure mode), but anything superintelligent would. That gets us right back to thinking about what the hell a human even wants when unconstrained, and how to reliably ensure that things end up aligned once external constraints are ineffective.
Thanks for the feedback. To be clear, I didn’t mean that I inferred that you took it that way, just that after I finished writing I realized I was doing the “pretty critical of people for doing very normal things” thing, and that it often comes off that way if I’m not careful to credibly disclaim that interpretation.
Right, it sounds like you mostly get what I’m saying.
I’d quibble that “the people their coalition is forcing to wear masks” are the anti-maskers (since pro-maskers are being nice and obedient, and therefore aren’t being “forced”). It’s pretty easy to slip into contempt for people not respecting your well-deserved authoritah, so that even when they start doing it you think “About fucking time!” and judge them for not doing it earlier or more enthusiastically, instead of showing gratitude for the fact that they’re moving in the right direction. I know I’ve been guilty of it in the past.
I don’t mean to imply that the people behind the ads are to be seen as shitty people, or in this light alone, and I think that in the course of describing this perspective, which I viewed as needing to be conveyed, I may have failed to make that clear. I do actually agree with your take on what they see themselves as doing, and that it’s not entirely illegitimate.
I responded to my own comment trying to lay out better what I meant exactly by “alignment failure” and how “they’re not (meta) trying to be hostile” and “they’re trying to humiliate and degrade” aren’t actually mutually exclusive.
After hitting “submit” I realized that “alignment failure” is upstream of this divergence of analyses.
By “alignment failure”, I mean “the thing they are optimizing for isn’t aligned with the thing they claim to be optimizing for”. It’s a bit “agnostic” on the cause of this, because the cause isn’t so clearly separable into “evil vs incompetent”. Alignment failure happens by default, and it takes active work to avoid.
Goodharting is an example. Maybe you think “Well, COVID kills people, so we want people to not get COVID, so… let’s fine people for positive COVID tests!”. Okay, sure, that might work if you have mandatory testing. If testing is voluntary though, that just incentivizes people to not get tested, which will probably make things worse. At this point, someone could complain that you’re aiming to make COVID *look* like it’s not a problem, not actually aiming to solve the problem. They will be right in that this is the direction your interventions are pointing, *even if you didn’t mean to and don’t like it*. In order to actually help keep people healthy and COVID-free, you have to keep your eyes on the prize and adjust your aim point as necessary. In order to aim at aiming to keep people healthy and COVID-free, you have to keep your eyes on the prize of alignment, and act to correct things when you see that your method of aiming is no longer converging on it.
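As a toy illustration of that incentive dynamic, here’s a minimal sketch with entirely invented numbers; it’s a sketch of the incentive structure, not a model of real epidemiology or any actual policy:

```python
# Toy model of the Goodhart failure described above: fining positive
# COVID tests when testing is voluntary. All numbers are invented.

def simulate(fine: float, weeks: int = 10) -> None:
    infected = 0.05  # fraction of the population currently infected
    for week in range(weeks):
        # The bigger the fine, the fewer people volunteer to get tested.
        test_rate = max(0.0, 0.8 - 0.1 * fine)
        detected = infected * test_rate   # what the metric sees
        undetected = infected - detected  # cases that never isolate
        # Detected cases isolate and mostly stop spreading;
        # undetected cases keep spreading.
        infected = min(1.0, 0.5 * detected + 1.3 * undetected)
        print(f"week {week}: measured={detected:.3f}, actual={infected:.3f}")

simulate(fine=0.0)  # no fine: the metric tracks reality, infections decline
simulate(fine=5.0)  # big fine: the metric looks great while infections climb
```

The fine makes the *measured* numbers look better precisely by making the *actual* numbers worse, which is the sense in which the intervention points at “make COVID look like it’s not a problem” whether or not anyone meant it to.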
When it comes to things like pro-mask advertisements, it’s oversimplifying to say “It’s an honest mistake”, and it’s *also* oversimplifying to say “They WANT to exercise power, not save lives” (hopefully). The question is where, *exactly*, the alignment between stated goals and effects breaks. And the way to tell is to try different interventions and see what happens.
What happens if you say “All I got from your ad was ‘eat shit’! Go to hell you evil condescending jerk!”? Do they look genuinely surprised and say “Shoot, I’m so sorry. I definitely care about your opinion and I have no idea how I came off that way. Can you please explain so that I can see where I went wrong and make it more clear that my respect for your opinion and autonomy is genuine?”?
Do they think “Hm. This person seems to think that I’m condescending to him, and I don’t want them to think that, yet I notice that I’m not surprised. Is it true? Do I have to check my inner alignment to the goal of saving lives, and maybe humble myself somewhat?”
What if you state the case more politely? What if you go out of your way to explain it in a way that makes it easy for them to continue to see themselves as good people, while also making it unmistakable that remaining a “good person who cares about saving lives” requires running ads which don’t leak contempt? Do they change the ad, mind how they’re coming off and how they’re feeling more closely, and thank you for helping them out? Or do they try making up nonsense to justify things before finally admitting “Okay, I don’t actually care about people I just like being a jerk”?
My own answer is that the contempt is likely real. It’s likely something they aren’t very aware of, but that they likely would be if they were motivated to find these things. It’s likely that they are not so virtuous and committed to alignment to their stated goals of being a good person that you can rudely shove this in their face and have them fix their mistakes. If you play the part of someone being stomped on, and cast them as a stomper, they will play into the role you’ve cast them in while dismissing the idea that they’re doing it. How evil!
However, it’s also overwhelmingly likely that if you sit down with them and see them for where they’re at, and explain things in a way that makes it feel okay to be who they are and shows them *how* they can be more of who they want to see themselves as being, they’ll choose to better align themselves and be grateful for the help. If you play the part of someone who recognizes their good intent and who recognizes that there are causal reasons which are beyond them for all of their failures, and cast them in the role of someone who is virtuous enough to choose good… they’ll probably still choose to play the part you cast them in.
That’s why it’s not “Simple mistake, nothing to see here” and also not “They’re doing it on purpose, those irredeemable bastards!”. It’s kinda “accidentally on purpose”. You can’t just point at what they did on purpose and expect them to change because they did in fact “do it on purpose” (in a sense). You *can*, however, point out the accident of how they allowed their purpose to become misaligned (if you know how to do so), and expect that to work.
Aligning ourselves (and others) with good takes active work, and active re-aiming, both of object-level goals and of the meta-goals of what we’re aiming for. Framing things as either “innocent mistakes” or “purposeful actions of coherent agents” misses the important opportunity to realign and to teach alignment.
It’s not that when the people behind the ad sat down and asked “What are we trying to do?”, they twirled their mustaches and said “I know! Let’s degrade and humiliate!”. It’s about what bleeds through about their attitude when they “try to get people to wear masks”, which they fail to catch.
For example, if a microwave salesman said “Microwaves are like women. Great in the kitchen!”, you don’t have to reject the idea that they’re trying to sell microwaves to notice what their ad implies about their perspective on women. Maybe it’s incompetence that they’d love to fix if anyone informs them about why it might not be the most universally non-offensive line to use, but it still shows something about how they view women.
However, if they use this line at a feminist convention, and they aren’t paid on commission… and you don’t quickly hear “Oops! Sorry, I fucked up!”… it starts to say something not just about their perspective on women, but also about their ability and/or inclination to take into account the perspectives of their target audience. The more the context makes the offensiveness difficult to miss, the harder it becomes to believe that the person is trying oh so hard to be not offensive so that they can sell microwaves, and the more it starts to seem like provoking offense and failing to sell microwaves is something they’re at least indifferent to, if not actively enjoying.
So when someone says “Masks are like opinions” and reminds you that opinions are like assholes (and stinky assholes at that, which the full saying specifies), right before encouraging you to have an opinion, it’s pretty hard to hear that as expressing “I’d love to hear your opinion!”. Do you really think that’s the best way they can think of to convey a heartfelt attitude of “Let’s all expose our opinions to each other so that we can share their contents and take them in!”? Or do you notice that they went out of their way to point at “No one wants that shit, so keep it hidden behind multiple layers”, and then didn’t disclaim that interpretation, and infer that maybe the fact that this slipped past their filters signals that “We’re not interested in your dissent” isn’t actually something they’re trying super hard to avoid signalling?
Keep in mind, this isn’t some “orthogonal” failure mode that makes for a small deviation from an otherwise good ad—the way “simple oversight” predicts. The people who aren’t wearing masks have actively formed an opinion on the topic which contradicts the idea of wearing masks. The anti-mask sentiment is *explicitly* about giving the finger to an authority who they see as trying to condescend to them while sneering at them, and the ad that is “trying” to combat this literally associates their opinions with shit—while portraying itself as supportive, no less. It is quite literally the exact wrong signal to send if you want to get people to wear masks, so as far as “simple oversights” go, it’d have to be an amazing one. However, it is dead nuts center of what “alignment failure of the type pointed at by anti-maskers” predicts.
“Masks = assholes” is just the wrong explanation for the valid observation that there’s an “Eat shit” vibe coming through.
The way to die with dignity is to genuinely intend to succeed even as we accept that we will likely fail.
“How many bits” isn’t a very well defined question. Give it 14k words to persuade you to let it out, and it might succeed. Give it 140k chances to predict “rain or no rain, in this location and time?” and it has no chance. The problem is that if it works well for that, you’ll probably want to start using it for more… and more…
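To put rough numbers on that words-vs-predictions comparison (assuming, as a loose ballpark, something on the order of 10 bits of entropy per English word; the exact figure doesn’t matter for the point):

$$14{,}000 \text{ words} \times 10 \ \text{bits/word} \approx 140{,}000 \text{ bits} = 140{,}000 \text{ yes/no answers}$$

Same nominal bit budget either way, yet one channel is plausibly lethal and the other is harmless. The danger isn’t in the raw count of bits; it’s in what each bit is allowed to touch in your mind.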
That path of escalating use is the concerning thing. So long as we’re sufficiently worried about having our minds changed, we’re really good at ignoring good arguments and deluding ourselves into missing the fact that the arguments really ought to be compelling according to our own ways of reasoning. When people get their minds changed (against their best interests or not), it tends to be because their guards were not up, or were not up in the right places, or were not up high enough.
The question, so far as I can tell, is whether you recognize that you ought to be scared away from going further before or after you reach the point of no return, where the AI has enough influence over you that it can convince you to give it power faster than you can ensure that power is safe. Drugs make a useful analogy here. How much heroin can you take before you get addicted? Well, it depends on how terrified you are of addiction and how competent you are at these things. “Do you get hooked after one dose?” isn’t really the right question if quitting after the first dose is so easy that you let yourself take more. If you recognize the threat as significant enough, it’s possible to get shaken out of things quite far down the road (the whole “hitting rock bottom” thing sometimes provides enough impetus to scare people sober).
Superhuman AI is a new type of threat that’s likely very easy to underestimate (perhaps to an extent that is also easy to underestimate), but I don’t think the idea of “Give it carefully chosen ways to influence the world, and then stop and reassess” is doomed to failure regardless of the level of care.
Some evidence I think Alexey’s model fails to explain:
“do all my friends hate me or do I just need a nap”
I don’t think that’s much of a problem. People get “hangry” too and that doesn’t invalidate the usefulness of fasting nor does it imply that fasting has to be miserable like that.
I think any prolonged discomfort that is unaddressed is going to make people cranky and bias them towards negative explanations of things. Address the discomfort, and the experience can change dramatically.
Perhaps the most useful thing I got out of experimenting with polyphasic sleep was a recognition of how well I could function when sleep deprived so long as I wasn’t constantly battling an unaddressed urge to sleep. Sleep deprivation still made me dumber, but the crankiness and most of the dysfunction were actually a result of trying to ignore (and yet not completely tuning out) my body screaming at me to sleep. Having a reference experience of being sleep deprived without craving sleep made it a lot easier to function when an hour or two short, and also made me more willing to go take a nap when I needed one.
This clearly isn’t fair. For one, the “really hard” modifier is completely made up (did Guzey ever imply that the way to train sleep-deprivation resilience is to go “really hard” at not-sleeping rather than easing into it?), and for two, physical stress to one’s toes is clearly a much more local thing than caloric or sleep deprivation, so the hypothesis would be “kicking things in a controlled fashion strengthens the parts you’re striking with”.
And I’m not sure if it’s true or not, but it’s definitely a thing that professional fighters do, and it does not at all strike me as “obviously false”.
“…we should not flinch away…” is another instance of the thing.
Thanks for the reminder.
This isn’t just banishing the word “should”: the ability not to flinch away from hard things is a skill, and trying to bypass development of that skill with moral panic actually makes everything worse.
We don’t have to actually answer that question to participate in feeding Friendliness in the egregoric wars. We just have to sincerely ask.
Good point. Agreed, and worth pointing out explicitly.
I’m not totally sure I follow. Do you mean a hard line against “shoulding”?
Yes. You don’t really need it, things tend to work better without it, and the fact that no one even noticed that it didn’t show up in this post is a good example of that. At the same time, “I shouldn’t ever use ‘should’” obviously has the exact same problems, and it’s possible to miss that you’re taking that stance if you don’t ever say it out loud. I watched some of your videos after Kaj linked one, and… it’s not that it looked like you were doing that, but it looked like you might be doing that. Like there wasn’t any sort of self-caricaturing or anything that showed me that “Val is well aware of this failure mode, and is actively steering clear”, so I couldn’t rule it out and wanted to mark it as a point of uncertainty and a thing you might want to watch out for.
That said, I think trying to make my point more compelling would in fact be an example of the corruption I’m trying to purify myself of. Instead I want to be correct and clear. That might happen to result in what I’m saying being more compelling… but I need to be clean of the need for that to happen in order for it to unfold in a Friendly way.
Ah, but I never said you should try to make your point more compelling! What do you notice when you ask yourself why “X would have effect Y” led you to respond with a reason to not do X? ;)
Now that I’ve had a few days to let the ideas roll around in the back of my head, I’m gonna take a stab at answering this.
I think there are a few different things going on here which are getting confused.
1) What does “memetic forces precede AGI” even mean?
“Individuals”, “memetic forces”, and “that which is upstream of memetics” all act on different scales. As an example of each, I suggest “What will I eat for lunch?”, “Who gets elected POTUS?”, and “Will people eat food?”, respectively.
“What will I eat for lunch?” is an example of an individual decision because I can actually choose the outcome there. While sometimes things like “veganism” will tell me what I should eat, and while I might let that influence me, I don’t actually have to. If I realize that my life depends on eating steak, I will actually end up eating steak.
“Who gets elected POTUS” is a much tougher problem. I can vote. I can probably persuade friends to vote. If I really dedicate myself to the cause, and I do an exceptionally good job, and I get lucky, I might be able to get my ideas into the minds of enough people that my impact is noticeable. Even then though, it’s a drop in the bucket and pretty far outside my ability to “choose” who gets elected president. If I realize that my life depends on a certain person getting elected who would not get elected without my influence… I almost certainly just die. If a popular memeplex decides that a certain candidate threatens it, that actually can move enough people to plausibly change the outcome of an election.
However there’s a limitation to which memeplexes can become dominant and what they can tell people to do. If a hypercreature tells people to not eat meat, it may get some traction there. If it tries to tell people not to eat at all, it’s almost certainly going to fail and die. Not only will it have a large rate of attrition from adherents dying, but it’s going to be a real hard sell to get people to take its ideas on, and therefore it will have a very hard time spreading.
My reading of the claim “memetic forces precede AGI” is that, like getting someone elected POTUS, the problem is simply too big for there to be any reasonable chance that a few guys in a basement can just go do it on their own when not supported by friendly hypercreatures. Val is predicting that our current set of hypercreatures won’t allow that task to be possible without superhuman abilities, and that our only hope is that we end up with sufficiently friendly hypercreatures that this task becomes humanly possible. Kinda like if your dream was to run an openly gay weed dispensary: it’s humanly possible today, but it wasn’t further in the past, and it isn’t in Saudi Arabia today; you need that cultural support or it ain’t gonna happen.
2) “Fight egregores” sure sounds like “trying to act on the god level” if anything does. How is this not at least as bad as “build FAI”? What could we possibly do which isn’t foolishly trying to act above our level?
This is a confusing one, because our words for things like “trying” are all muddled together. I think basically, yes, trying to “fight egregores” is “trying to act on the god level”, and is likely to lead to problems. However, that doesn’t mean you can’t make progress against egregores.
So, the problem with “trying to act on a god level” isn’t so much that you’re not a god and therefore “don’t have permission to act on this level” or “ability to touch this level”, it’s that you’re not a god and therefore attempting to act as if you were a god fundamentally requires you to fail to notice and update on that fact. And because you’re failing to update, you’re doing something that doesn’t make sense in light of the information at hand. And not just any information either; it’s information that’s telling you that what you’re trying to do will not work. So of course you’re not going to get where you want if you ignore the road signs saying “WRONG WAY!”.
What you can do, which will help free you from the stupefying factors and unfriendly egregores, and (Val claims) will have the best chance of leading to a FAI, is to look at what’s true. Rather than “I have to do this, or we all die! I must do the impossible”, just “Can I do this? Is it impossible? If so, and I’m [likely] going to die, I can look at that anyway. Given what’s true, what do I want to do?”
If this has a ”...but that doesn’t solve the problem” bit to it, that’s kinda the point. You don’t necessarily get to solve the problem. That’s the uncomfortable thing we should not flinch away from updating on. You might not be able to solve the problem. And then what?
(Not flinching from these things is hard. And important.)
3) What’s wrong with talking about what AI researchers should do? There’s actually a good chance they listen! Should they not voice their opinions on the matter? Isn’t that kinda what you’re doing here by talking about what the rationality community should do?
Yes. Kinda. Kinda not.
There’s a question of how careful one has to be, and Val is making a case for much increased caution but not really stating it this way explicitly. Bear with me here, since I’m going to be making points that necessarily seem like “unimportant nitpicking pedantry” relative to an implicit level of caution that is more tolerant to rounding errors of this type, but I’m not actually presupposing anything here about whether increased caution is necessary in general or as it applies to AGI. It is, however, necessary in order to understand Val’s perspective on this, since it is central to his point.
If you look closely, Val never said anything about what the rationality community “should” do. He didn’t use the word “should” once.
He said things like “We can’t align AGI. That’s too big.” and “So, I think raising the sanity waterline is upstream of AI alignment.” and “We have an advantage in that this war happens on and through us. So if we take responsibility for this, we can influence the terrain and bias egregoric/memetic evolution to favor Friendliness”. These things seem to imply that we shouldn’t try to align AGI and should instead do something like “take responsibility” so we can “influence the terrain and bias egregoric/memetic evolution to favor Friendliness”, and as far as rounding errors go, that’s not a huge one. However, he did leave the decision of what to do with the information he presented up to you, and consciously refrained from imbuing it with any “shouldness”. The lack of “should” in his post or comments is very intentional, and is an example of him doing the thing he views as necessary for FAI to have a chance of working out.
In (my understanding of) Val’s perspective, this “shouldness” is a powerful stupefying factor that works itself into everything—if you let it. It prevents you from seeing the truth, and in doing so blocks you from any path which might succeed. It’s so damn seductive and self-protecting that we all get drawn into it all the time and don’t really realize—or worse, rationalize and believe that “it’s not really that big a deal; I can achieve my object level goals anyway (or I can’t anyway, and so it makes no difference if I look)”. His claim is that it is that big a deal, because you can’t achieve your goals—and that you know you can’t, which is the whole reason you’re stuck in your thoughts of “should” in the first place. He’s saying that the annoying effort to be more precise about what exactly we are aiming to share and holding ourselves to be squeaky clean from any “impotent shoulding” at things is actually a necessary precondition for success. That if we try to “Shut up and do the impossible”, we fail. That if we “Think about what we should do”, we fail. That if we “try to convince people”, even if we are right and pointing at the right thing, we fail. That if we allow ourselves to casually “should” at things, instead of recognizing it as so incredibly dangerous as to avoid out of principle, we get seduced into being slaves for unfriendly egregores and fail.
That last line is something I’m less sure Val would agree with. He seems to be doing the “hard line avoid shoulding, aim for maximally clean cognition and communication” thing and the “make a point about doing it to highlight the difference” thing, but I haven’t heard him say explicitly that he thinks it has to be a hard line thing.
And I don’t think it does, or should be (case in point). Taking a hard line can be evidence of flinching from a different truth, or a lack of self trust to only use that way of communicating/relating to things in a productive way. I think by not highlighting the fact that it can be done wisely, he clouds his point and makes his case less compelling than it could be. However, I do think he’s correct about it being both a deceptively huge deal and also something that takes a very high level of caution before you start to recognize the issues with lower levels of caution.
I feel like “a healthy rationalist community should not make arguments” is pretty much just a slight rephrasing of “strong rationalist communication is healthiest and most efficient when practically empty of arguments”, but I’m open to suggestions for alternative phrasings (especially if Valentine wants to comment).
They’re quite different. The latter is a qualified description. The former is an unqualified prescription. Even if the prescription were qualified, it does not automatically follow from the description, because it is not necessarily the case that focusing on making fewer arguments is a way to get healthier—in the same way that “healthiest people tend to exercise” doesn’t imply “get out of bed and go for a jog” is gonna help sick people. Goodhart’s law has a tendency to screw these kinds of things up.
Sometimes these kinds of things can work (maybe exercising more will keep you from getting sick?), but in those cases it is still an additional piece which is not contained in “healthiest tends to look like X”. Every time you add in an additional piece because it seems implied from your perspective, you risk changing the meaning to something that the person saying it wouldn’t endorse. When you’re reading someone whose worldview is quite different from your own, this can happen very rapidly, so it’s crucial to read precisely what they are saying and note which inferences are your own rather than theirs.