One thing I didn’t have time for in the post proper is that ask culture (or something like it) is crucial for diplomacy—diplomatic cosmopolitan contexts require that everyone set aside their knee-jerk assumptions about what “everyone knows” or what X “obviously means,” etc. I think part of why it came about (/has almost certainly been reinvented thousands of times) is that people wanted to interact nondestructively with people whose cultural assumptions greatly differed from their own.
Do environments widely recognized for excellence and intellectual progress generally have cultures of harsh and blunt criticism?
To the best of my ability to detect, the answer is clearly and obviously “no” — there’s an important property of people not-bullshitting and not doing the LinkedIn thing, but you can actually do clear and honest and constructively critical communication without assholery (and it seems to me that the people who lump the two together have a skill issue and some sort of color-blindness; because they don’t know how to get the good parts of candor and criticism while not unduly hurting feelings, they assume that it can’t be done).
I think there’s a conflation here between “the internal experience of the emotion of caring” and “the act of caring, visible to external observers.”
I think you’re saying “I might take actions that look like not-caring, due to constraints like memory, and I don’t want this to be misconstrued as not having the internal experience of caring,” but I think that ultimately if one has … akrasiatic caring? … this doesn’t matter. I think that whether it’s down to unfortunate memory constraints or like deliberate callousness, what usually carries weight is “does this person take the actions that I need them to take, in order to be happy interacting?” and if the answer is “nope” then the reason doesn’t matter all that much to me.
Or to put it another way, “sufficiently advanced obliviousness is indistinguishable from malice” is a sentence pattern that has a lot of other versions, e.g. “sufficiently advanced inability to scrape together spoons, due to chronic illness, is indistinguishable from apathy.”
It’s not that an “I don’t care about you” signal was accidentally sent, it’s that the action of care, sufficient for the needs of the other person, wasn’t taken. It’s tragic when it wasn’t taken due to reality constraints, as opposed to due to a free and unpressured choice, but what matters is whether the action is on the table.
Your colleague who has trouble learning non-WASP names, for instance … in my culture, it’s bad to excoriate that colleague if it’s a genuine constraint issue, but whether it’s inability or unwillingness isn’t pertinent if what matters is “can we say each other’s names, reliably?”
I volunteer as tribute
I was going to type a longer comment for the people who are observing this interaction, but I think the phrase “case in point” is superior to what I originally drafted.
(It was me, and in the place where I encouraged DrShiny to come here and repeat what they’d already said unprompted, I also offered $5 to anybody who disagreed with the Said ban to please come and leave that comment as well.)
Just noting that
one should object to tendentious and question-begging formulations, to sneaking in connotations, and to presuming, in an unjustified way, that your view is correct and that any disagreement comes merely from your interlocutor having failed to understand your obviously correct view
is a strong argument for objecting to the median and modal Said comment.
But I think a lot of Said’s confusions would actually make more sense to Said if he came to the realization that he’s odd, actually, and that the way he uses words is quite nonstandard, and that many of the things which baffle and confuse him are not, in fact, fundamentally baffling or confusing but rather make sense to many non-Said people.
(My own writing, from here.)
Separately, I will note (shifting the (loose) analogy a little) that if someone were to propose “hey, why don’t we put ourselves in the position of wolves circa 20,000 years ago? Like, it’s actually fine to end up corralled and controlled and mutated according to the whims of a higher power, away from our present values; this is actually not a bad outcome at all; we should definitely build a machine that does this to us,”
they would be rightly squinted at.
Like, sometimes one person is like “I’m pretty sure it’ll kill everyone!” and another person responds “nuh-uh! It’ll just take the lightcone and the vast majority of all the resources and keep a tiny token population alive under dubious circumstances!” as if this is, like, sufficiently better to be considered good, and to have meaningfully dismissed the original concern.
It is better in an absolute sense, but again: “c’mon, man.” There’s a missing mood in being like “yeah, it’s only going to be as bad as what happened to monkeys!” as if that’s anything other than a catastrophe.
(And again: it isn’t likely to only be as bad as what happened to monkeys.)
(But even if it were, wolves of 20,000 years ago, if you could contrive to ask them, would not endorse the present state of wolves-and-dogs today. They would not choose that future. Anyone who wants to impose an analogous future on humanity is not a friend, from the perspective of humanity’s values. Being at all enthusiastic about that outcome feels like a cope, or something.)
No, the edit completely fails to address or incorporate
You have to be careful with the metaphor, because it can lead people to erroneously assume that an AI would be at least that nice, which is not at all obvious or likely for various reasons
...and now I’m more confused at what’s going on. Like, I’m not sure how you missed (twice) the explicitly stated point that there is an important disanalogy here, and that the example given was more meant to be an intuition pump. Instead you seem to be sort of like “yeah, see, the analogy means that at least some humans would not die!” which, um. No. It would imply that, if the analogy were tight, but I explicitly noted that it isn’t and then highlighted the part where I noted that, when you missed it the first time.
(I probably won’t check in on this again; it feels doomy given that you seem to have genuinely expected your edit to improve things.)
I disagree with your “obviously,” which seems both wrong and dismissive, and seems like you skipped over the sentence that was written specifically in the hopes of preventing such a comment:
You have to be careful with the metaphor, because it can lead people to erroneously assume that an AI would be at least that nice, which is not at all obvious or likely for various reasons
(Like, c’mon, man.)
Why would modern technology-using humans ‘want’ to destroy the habitats of the monkeys and apes that are the closest thing they still have to a living ancestor in the first place? Don’t we feel gratitude and warmth and empathy and care-for-the-monkey’s-values such that we’re willing to make small sacrifices on their behalf?
(Spoilers: no, not in the vast majority of cases. :/ )
The answer is “we didn’t want to destroy their habitats, in the sense of actively desiring it, but we had better things to do with the land and the resources, according to our values, and we didn’t let the needs of the monkeys and apes slow us down even the slightest bit until we’d already taken like 96% of everything and even then preservation and conservation were and remain hugely contentious.”
You have to be careful with the metaphor, because it can lead people to erroneously assume that an AI would be at least that nice, which is not at all obvious or likely for various reasons (that you can read about in the book when it comes out in September!). But the thing that justifies treating catastrophic outcomes as the default is that catastrophic outcomes are the default. There are rounds-to-zero examples of things that are 10-10000x smarter than Other Things cooperating with those Other Things’ hopes and dreams and goals and values. That humans do this at all is part of our weirdness, and worth celebrating, but we’re not taking seriously the challenge involved in robustly installing such a virtue into a thing that will then outstrip us in every possible way. We don’t even possess this virtue ourselves to a degree sufficient that an ant or a squirrel standing between a human and something that human wants should feel no anxiety.
The problem is, evolution generally doesn’t build in large buffers. Human brains are “pretty functional” in the sense that they just barely managed to be adequate to the challenges that we faced in the ancestral environment. Now that we are radically changing that environment, the baseline “barely adequate” doesn’t have to degrade very much at all before we have concerningly high rates of stuff like obesity, depression, schizophrenia, etc.
(There are other larger problems, but this is a first gentle gesture in the direction of “I think your point is sound but still not reassuring.” I agree you could productively make a list of proxies that are still working versus ones that aren’t holding up in the modern era.)
Yes (alas)
… I will not be responding further because the confidence you’re displaying is not in line with (my sense of) LessWrong’s bare minimum standard of quality for assertion. You seem not to be bothering at all with questions like “why, specifically, do I believe what I believe?” or “how would I notice if I were wrong?”
I read the above as, essentially, saying “I know that an ASI will behave a certain way because I just thought about it and told myself that it would, and now I’m using that conclusion as evidence.” (I’m particularly pointing at “as we learn from this particular example.”)
On the surface level, that may seem to be the same thing that MIRI researchers are doing, but there are several orders of magnitude difference in the depth and detail of the reasoning, which makes (what seems to me to be) a large qualitative difference.
You seem to believe we have the capacity to “tell” a superintelligence (or burgeoning, nascent proto-superintelligence) anything at all, and this is false, as the world’s foremost interpretability experts generally confirm. “Amass power and resources while minimizing risks to yourself” is still a proxy, and what the pressure of that proxy brings-into-being under the hood is straightforwardly not predictable with our current or near-future levels of understanding.
This link isn’t pointing straight at my claim (it’s not direct support), but still: https://x.com/nabla_theta/status/1802292064824242632
I am not a MIRI researcher, but I do nonzero work for MIRI and my sense is “yes, correct.”
imo it “makes sense” to have a default prior of suspicion that things which have hurt you are Bad, things which you’ve never seen provide value Don’t Provide Value, and things which you’ve never seen at all Don’t Exist.
Like, there’s a kind of innate Occam’s Razor built into people that often takes the form of doubt, and it’s good to have this pressure as opposed to being infinitely credulous or letting the complexity of the territory spiral out of control until your map isn’t helpful.
But I think a key part of maturity is coming to recognize the fact that your subset of experience is impoverished, and there are things that are outside of your view (and not written on your map) that are nevertheless real, and learning to productively ask the question “have I not seen evidence of this before because no evidence exists, or because I’ve been blinded by the RNG gods?”
Just noting that when LW 2.0 revived, I did this on my own for a month, writing under the pseudonym Conor Moreton. I was already somewhat accomplished as a writer, but I can affirm that it was nevertheless an extremely useful exercise on multiple axes. If you’re sitting there being, like, “ah, sounds pretty cool, shame I can’t do it for [reason],” maybe pause and do some goal factoring and see if you can’t get around [reason].
I deleted it for such poor reading comprehension and adversarially selective quotation of the Facebook post in question—
(which is over 2200 words long and has tons of relevant context that softens the impression of the above text, which also didn’t contain the added bolding that pushes it in an even more straw direction)
—that it was inescapably either malice or negligence sufficiently advanced so as to be indistinguishable from malice. I would’ve greatly preferred that DirectedEvolution take the hint rather than reposting elsewhere, but since that hint was not taken I am now banning DirectedEvolution from being able to do any similarly shitty psychologizing on my future posts (and lodging this brief defense of myself, which I would have preferred not to have to write in the first place, and was with the original deletion trying to avoid needing to write).
From that same Facebook post:
And from discussion beneath it:
DirectedEvolution’s overt attempt to categorize me as mentally ill (which my models suspect is based purely on that categorization) is unjustified and not particularly welcome.
(I also found a bunch of the reasoning in the four bullet points to be pretty poor, but that just made me unenthusiastic about trying to bridge gaps; it was the last paragraph that earned intended-to-be-silent deletion.)