I was going to type a longer comment for the people who are observing this interaction, but I think the phrase “case in point” is superior to what I originally drafted.
(It was me, and in the place where I encouraged DrShiny to come here and repeat what they’d already said unprompted, I also offered $5 to anybody who disagreed with the Said ban to please come and leave that comment as well.)
Just noting that
one should object to tendentious and question-begging formulations, to sneaking in connotations, and to presuming, in an unjustified way, that your view is correct and that any disagreement comes merely from your interlocutor having failed to understand your obviously correct view
is a strong argument for objecting to the median and modal Said comment.
But I think a lot of Said’s confusions would actually make more sense to Said if he came to the realization that he’s odd, actually, and that the way he uses words is quite nonstandard, and that many of the things which baffle and confuse him are not, in fact, fundamentally baffling or confusing but rather make sense to many non-Said people.
(My own writing, from here.)
Separately, I will note (shifting the (loose) analogy a little) that if someone were to propose “hey, why don’t we put ourselves in the position of wolves circa 20,000 years ago? Like, it’s actually fine to end up corralled and controlled and mutated according to the whims of a higher power, away from our present values; this is actually not a bad outcome at all; we should definitely build a machine that does this to us,”
they would be rightly squinted at.
Like, sometimes one person is like “I’m pretty sure it’ll kill everyone!” and another person responds “nuh-uh! It’ll just take the lightcone and the vast majority of all the resources and keep a tiny token population alive under dubious circumstances!” as if this is, like, sufficiently better to be considered good, and to have meaningfully dismissed the original concern.
It is better in an absolute sense, but again: “c’mon, man.” There’s a missing mood in being like “yeah, it’s only going to be as bad as what happened to monkeys!” as if that’s anything other than a catastrophe.
(And again: it isn’t likely to only be as bad as what happened to monkeys.)
(But even if it were, wolves of 20,000 years ago, if you could contrive to ask them, would not endorse the present state of wolves-and-dogs today. They would not choose that future. Anyone who wants to impose an analogous future on humanity is not a friend, from the perspective of humanity’s values. Being at all enthusiastic about that outcome feels like a cope, or something.)
No, the edit completely fails to address or incorporate
You have to be careful with the metaphor, because it can lead people to erroneously assume that an AI would be at least that nice, which is not at all obvious or likely for various reasons
...and now I’m more confused about what’s going on. Like, I’m not sure how you missed (twice) the explicitly stated point that there is an important disanalogy here, and that the example given was meant more as an intuition pump. Instead you seem to be sort of like “yeah, see, the analogy means that at least some humans would not die!” which, um. No. It would imply that, if the analogy were tight, but I explicitly noted that it isn’t and then highlighted the part where I noted that, when you missed it the first time.
(I probably won’t check in on this again; it feels doomy given that you seem to have genuinely expected your edit to improve things.)
I disagree with your “obviously,” which seems both wrong and dismissive, and it seems like you skipped over the sentence that was written specifically in the hopes of preventing such a comment:
You have to be careful with the metaphor, because it can lead people to erroneously assume that an AI would be at least that nice, which is not at all obvious or likely for various reasons
(Like, c’mon, man.)
Why would modern technology-using humans ‘want’ to destroy the habitats of the monkeys and apes that are the closest thing they still have to a living ancestor in the first place? Don’t we feel gratitude and warmth and empathy and care-for-the-monkeys’-values such that we’re willing to make small sacrifices on their behalf?
(Spoilers: no, not in the vast majority of cases. :/ )
The answer is “we didn’t want to destroy their habitats, in the sense of actively desiring it, but we had better things to do with the land and the resources, according to our values, and we didn’t let the needs of the monkeys and apes slow us down even the slightest bit until we’d already taken like 96% of everything and even then preservation and conservation were and remain hugely contentious.”
You have to be careful with the metaphor, because it can lead people to erroneously assume that an AI would be at least that nice, which is not at all obvious or likely for various reasons (that you can read about in the book when it comes out in September!). But the thing that justifies treating catastrophic outcomes as the default is that catastrophic outcomes are the default. There are rounds-to-zero examples of things that are 10-10000x smarter than Other Things cooperating with those Other Things’ hopes and dreams and goals and values. That humans do this at all is part of our weirdness, and worth celebrating, but we’re not taking seriously the challenge involved in robustly installing such a virtue into a thing that will then outstrip us in every possible way. We don’t even possess this virtue ourselves to a degree sufficient that an ant or a squirrel standing between a human and something that human wants should feel no anxiety.
The problem is, evolution generally doesn’t build in large buffers. Human brains are “pretty functional” in the sense that they just barely managed to be adequate to the challenges that we faced in the ancestral environment. Now that we are radically changing that environment, the baseline “barely adequate” doesn’t have to degrade very much at all before we have concerningly high rates of stuff like obesity, depression, schizophrenia, etc.
(There are other larger problems, but this is a first gentle gesture in the direction of “I think your point is sound but still not reassuring.” I agree you could productively make a list of proxies that are still working versus ones that aren’t holding up in the modern era.)
Yes (alas)
… I will not be responding further because the confidence you’re displaying is not in line with (my sense of) LessWrong’s bare minimum standard of quality for assertion. You seem not to be bothering at all with questions like “why, specifically, do I believe what I believe?” or “how would I notice if I were wrong?”
I read the above as, essentially, saying “I know that an ASI will behave a certain way because I just thought about it and told myself that it would, and now I’m using that conclusion as evidence.” (I’m particularly pointing at “as we learn from this particular example.”)
On the surface level, that may seem to be the same thing that MIRI researchers are doing, but there are several orders of magnitude difference in the depth and detail of the reasoning, which makes (what seems to me to be) a large qualitative difference.
You seem to believe we have the capacity to “tell” a superintelligence (or burgeoning, nascent proto-superintelligence) anything at all, and this is false, as the world’s foremost interpretability experts generally confirm. “Amass power and resources while minimizing risks to yourself” is still a proxy, and what the pressure of that proxy brings-into-being under the hood is straightforwardly not predictable with our current or near-future levels of understanding.
This link isn’t pointing straight at my claim (it’s not direct support), but still: https://x.com/nabla_theta/status/1802292064824242632
I am not a MIRI researcher, but I do nonzero work for MIRI and my sense is “yes, correct.”
imo it “makes sense” to have a default prior of suspicion that things which have hurt you are Bad, things which you’ve never seen provide value Don’t Provide Value, and things which you’ve never seen at all Don’t Exist.
Like, there’s a kind of innate Occam’s Razor built into people that often takes the form of doubt, and it’s good to have this pressure as opposed to being infinitely credulous or letting the complexity of the territory spiral out of control until your map isn’t helpful.
But I think a key part of maturity is coming to recognize the fact that your subset of experience is impoverished, and there are things that are outside of your view (and not written on your map) that are nevertheless real, and learning to productively ask the question “have I not seen evidence of this before because no evidence exists, or because I’ve been blinded by the RNG gods?”
Just noting that when LW 2.0 revived, I did this on my own for a month, writing under the pseudonym Conor Moreton. I was already somewhat accomplished as a writer, but I can affirm that it was nevertheless an extremely useful exercise on multiple axes. If you’re sitting there being, like, “ah, sounds pretty cool, shame I can’t do it for [reason],” maybe pause and do some goal factoring and see if you can’t get around [reason].
LessWrong is still a really rough place for me to try to do anything other than “present complete thoughts that I am thoroughly ready to defend.”
The fact that some other comments trickled in, and that the vote ratios stabilized, was definitely an improvement over the situation in the first 36h, but I think it’s not super cruxy? It was more of a crisp example (at the time) of the larger gestalt that demoralizes me, and it ceasing to be an example doesn’t mean the gestalt went away.
One of the things that hurts me when I try to be on LessWrong is something like…
A person will make a comment that has lots of skewed summaries and misinterpretations and just-plain-wrong claims about the claims it says it’s responding to (but is actually strawmanning)
Someone else will make an effortful rebuttal that corrects the misconceptions of the top-level comment and sometimes even answers the steel version of its complaint
The top comment will continue to accrue upvotes at a substantially faster clip than the lower comment, despite being meaningfully wrong, and it won’t ever get edited or fixed and it just keeps anchoring people on wrongthoughts forever
The lower comment gets largely ignored and often people don’t even engage with it at all
...all of which lives in my soul as a sort of despair that might be described as “yeah, so, if you want to give people a platform upon which to strawman you and then gain lots of local status and also infect the broader crowd very efficiently with their uncanny-valley version of your ideas such that it becomes even harder to actually talk about the thing you wanted to talk about … post it on LW!”
A whole other way to gesture at the same problem is something like, out in the real world I often find myself completely alone against the mob, fighting for either [truth] or [goodness] or both. And that’s fine. The real world makes no promises about me-not-finding-myself-alone.
But LessWrong, by its name and mission and sometimes by explicit promise or encouragement, is “supposed” to be the sort of place where I’m not completely alone against the mob. For one thing, the mob is supposed to be good instead of bad, and for another, there are supposed to be other people around who are also fighting the good fight, and not just me all by myself.
(There’s some hyperbole here, but.)
Instead, definitely not all the time but enough times that it matters, and enough times that it’s hurt me over and over again, and enough times that it produces strong hesitation, I’ve found myself completely alone with even high-status LWers (sometimes even senior mods!) just straightforwardly acting as the forces of darkness and madness and enacting mindkilledness and advocating for terrible and anti-epistemic things and it just hurts real bad. The feeling of betrayal and rug-pulled-out is much worse because this was supposed to be a place where people who care about collaborative truth-seeking could reliably find others to collaborate with.
I can find people to think productively with on LessWrong. But I can’t rely on it. More than 20% of the time, it goes badly, and “do an expansive and vulnerable thing in an environment where you will be stabbed for it one time out of five” just … kinda doesn’t work.
I definitely endorse people using whatever terms work for them, but I predict that “intermediary spaces” is going to work less well.
Honestly, Rowan is providing a pretty solid case study in exactly the topic at hand.
I volunteer as tribute