I’ve posted on LW before, but after a long hiatus I posted again here because of recent AI news, entirely unaware of the Good Heart thing. I then made several comments after reading the original post, thinking it was a joke. Now I understand why the site was so strangely active.
homunq
- “An animal looking curiously in the mirror, but the reflection is a different kind of animal; in digital style.”
- “A cat looking curiously in the mirror, but the reflection is a different kind of animal; in digital style.”
- “A cat looking curiously in the mirror, but the reflection is a dog; in digital style.”
- Curious to see how it handles the modified reflection and the lack of specificity.
- Another thing whose True Name is probably a key ingredient for alignment (and which I’ve spent a lot of time trying to think rigorously about): collective values.
- Which is interesting, because most of what we know so far about collective values is that, for naive definitions of “collective” and “values”, they don’t exist. Condorcet, Arrow, Gibbard and Satterthwaite, and (crucially) Sen have all helped show that.
- I personally don’t think that means that the only useful things one can say about “collective values” are negative results like the ones above. I think there are positive things to say: definitions of collectivity (for instance, of democracy) that are both non-trivial and robust. But finding them means abandoning the naive concepts of “collective values”.
- I think that this is probably a common pattern. You go looking for the True Name of X, but even if that search ever bears fruit, you’d rarely if ever look back and say “Y is the True Name of X”. Instead, you’d say something like “(long math notation) is the True Name of itself, or for short, of Y. Though I found this by looking for X, calling it ‘X’ was actually a misnomer; that phrase has baked-in misconceptions and/or red herrings, so from now on, let’s call it ‘Y’ instead.”
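For anyone who hasn’t met the simplest of those negative results, Condorcet’s paradox: pairwise majority preference can cycle even when every individual ranking is perfectly consistent, so “what the group prefers” has no well-defined answer. Here is a minimal Python sketch of it. The three ballots are the standard textbook example, not real data, and the function names are hypothetical ones of my own.

```python
from itertools import combinations

# Hypothetical textbook ballots: each voter ranks options from most to least preferred.
BALLOTS = [
    ["A", "B", "C"],
    ["B", "C", "A"],
    ["C", "A", "B"],
]


def prefers(ballot, x, y):
    """True if this ballot ranks x above y."""
    return ballot.index(x) < ballot.index(y)


def majority_winner(ballots, x, y):
    """Return whichever of x, y a strict majority ranks higher, or None on a tie."""
    x_votes = sum(prefers(b, x, y) for b in ballots)
    y_votes = len(ballots) - x_votes
    if x_votes > y_votes:
        return x
    if y_votes > x_votes:
        return y
    return None


def condorcet_winner(ballots):
    """Return the option that beats every other option head-to-head, or None."""
    options = ballots[0]
    for x in options:
        if all(majority_winner(ballots, x, y) == x for y in options if y != x):
            return x
    return None


if __name__ == "__main__":
    for x, y in combinations(BALLOTS[0], 2):
        print(f"{x} vs {y}: majority prefers {majority_winner(BALLOTS, x, y)}")
    print("Condorcet winner:", condorcet_winner(BALLOTS))
```

Running it reports that A beats B, C beats A, and B beats C, so there is no Condorcet winner: each voter is individually coherent, but the “collective” preference is a cycle. With less symmetric ballots (say, two of the three voters ranking A first), a winner does exist, which is part of why the naive concept only breaks some of the time.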
- I think this post makes sense given the premises/arguments that I think many people here accept: that AG(S)I is either amazingly good or amazingly bad, that getting the good outcome is a priori vastly improbable, and that the work needed to close the gap between that prior and a good posterior is not being done nearly fast enough.
- I don’t reject those premises/arguments out of hand, but I definitely don’t think they’re nearly as solid as many here seem to. In my opinion, the goodness of reasonably thinkable post-AGSI futures varies mind-bogglingly widely, but its distribution is still probably a bell curve, with greater probability density in the “middle” than in super-heaven or ultra-hell. I also think that just making the world a better place here and now probably usually helps with alignment.
- This is probably not the place for debating these premises/arguments; they’re the background of this post, not its point. But I do want to say that having a different view on that background is (at least potentially) a valid reason for not buying into the “containment” strategy suggested here.
- Again, I think my point here is worth mentioning as one part of the answer to the post’s question, “why don’t more people think in terms of containment?” I don’t think that we’re going to resolve whether there’s space between “friendly” and “unfriendly” right here, though.
- Sure, humans are effectively ruthless in wiping out individual ant colonies. We’ve even wiped out more than a few entire species of ant. But our ruthfulness about our ultimate goals... well, I guess it’s not exactly ruthfulness that I’m talking about...
- ...The fact that it’s not in our nature to simply define an easy-to-evaluate utility function and then optimize means that it’s not mere coincidence that we don’t want anything radical enough to imply the elimination of all ant-kind. In fact, I’m pretty sure that for a large majority of people, there’s no utopian ideal you could pitch, and that they’d buy into, that’s radical enough that getting there would imply or even suggest actions that would kill all ants. That’s not because humanity wouldn’t be capable of doing it; it’s because we’re not capable of wanting it, and that fact may be related to our (residual) ruthfulness and to our intelligence itself. And metaphorically, from a superintelligence’s perspective, I think that humanity-as-a-whole is probably closer to being Formicidae than it is to being one species of ant.
- ...
- This post, and its line of argument, is not about saying “AI alignment doesn’t matter”. Of fucking course it does. What I’m saying is: “it may not be the case that any tiny misalignment of a superintelligence is fatal/permanent”. Because yes, a superintelligence can and probably will change the world to suit its goals, but it won’t ruthlessly change the whole world to perfectly suit its goals, because those goals will not, themselves, be perfectly coherent. And in that gap, I believe there will probably still be room for some amount of humanity, or of posthumanity-that’s-still-commensurate-with-extrapolated-human-values, to have some say in their own fates.
- The response I’m looking for is not at all “well, that’s all OK then, we can stop worrying about alignment”. Because there’s a huge difference between future (post)humans living meagerly under sufferance in some tiny remnant of the world that a superintelligence doesn’t happen to care about coherently enough to change, future (post)humans thriving as an integral part of the future that it does care about and is building, and other possibilities better or worse than those. But what I am arguing is that the “win big or lose big are the only options” attitude I see as common in alignment circles (I know that Eliezer isn’t really cutting-edge anymore, but look at his recent April Fools’ “joke” for an example) may be misguided. Not every superintelligence that isn’t perfectly friendly is terrifyingly unfriendly, and I think that admitting other possibilities (without being complacent about them) might help us make useful progress on alignment.
- ...
- As for your points about therapy: yes, of course, my off-the-cuff one-paragraph just-so story was oversimplified. And yes, you seem to know a lot more about this than I do. But I’m not sure the metaphor is strong enough to make all that complexity matter here.
- I guess we’re using different definitions of “friendly/unfriendly” here. I mean something like “ruthlessly friendly/unfriendly”, in the sense that humans (neurotic as they are) aren’t. (Yes, some humans appear ruthless, but that’s just because their “ruths” happen not to apply. They’re still not effectively optimizing for future world-states, only for present feels.)
- I think many of the arguments about friendly/unfriendly AI, at least in the earlier stages of that idea (I’m not up on all the latest), implicitly rely on that “ruthless” definition of (un)friendliness.
- You (if I understand correctly) mean “friendly/unfriendly” in a weaker sense, in which humans can be said to be friendly/unfriendly (or neither? Not sure what you’d say about that, but it probably doesn’t matter).
- As for the “smart people going to dumb therapists” argument, I think you’re going back to a hidden assumption of ruthlessness: if the person knew how to feel better in the future, they would just do that. But what if, for instance, they know how to feel better in the future, but doing that thing wouldn’t make them feel better right now unless they first simplified it enough to explain it to their dumb therapist? The dumb therapist is still playing a role.
- My point is NOT to say that non-ruthless GASI isn’t dangerous. My point is that it’s not an automatic “game over”, because if it’s not ruthless, it doesn’t just institute its (un)friendly goals; it is at least possible that it would not use all its potential power.
- Why does the AI even “want” failure mode 3? If it’s an RL agent, it’s not “motivated to maximize its reward”; it’s “motivated to use generalized cognitive patterns that in its training runs would have marginally maximized its reward”. Failure mode 3 is the peak of an entirely separate mountain from the one RL is climbing, and I think a well-designed box setup can (more-or-less “provably”) prevent any cross-peak bridges in the form of cognitive strategies that undermine the box.
- That is to say: yes, it can (or at least, it’s not provable that it can’t) imagine a way to break the box, and it can know that the reward it would actually get from breaking the box would be “infinite”, but it can be successfully prevented from “feeling” the infinite-ness of that potential reward, because the RL procedure itself doesn’t consider a broken-box outcome to be a valid target of cognitive optimization.
- Now, this creates a new failure mode, where it hacks its own RL optimizer. But that just makes it unfit, not dangerous. If something goes wrong and lets this happen, it would be obvious and easy to deal with, because the AI would be optimizing for thinking it would succeed, not for succeeding.
- (Of course, that last sentence could also fail. But at least that would require two simultaneous failures to become dangerous; and it seems in principle possible to create sufficient safeguards and warning lights around each of those separately, because the AI itself isn’t subverting those safeguards unless they’ve already failed.)
What if “friendly/unfriendly” GAI isn’t a thing?
- One way of dividing up the options is: fix the current platform, or find new platform(s). The natural decay process seems to be tilting towards the latter, but there are downsides: the diaspora loses cohesion, and while the new platforms obviously offer some things the current one doesn’t, they are worse than the current one in various ways (it’s really hard to be an occasional lurker on FB or tumblr, especially if you are more interested in the discussion than the “OP”).
- If the consensus is to fix the current platform, I suggest trying the simple fixes first. As far as I can tell, that means breaking the Discussion/Main dichotomy and doing something about “deletionist” downvoting. It also means making it clearer how to contribute to the codebase, and giving the codebase a clearer owner. I think that these things should be tried and given a chance to work before more radical stuff is attempted.
- If the consensus is to find something new, I suggest that it should be something with a corporation behind it: something smallish but on the up-and-up, and willing to give enough “tagging” capability for the community to curate itself and keep itself reasonably separate from the main body of the site’s users. It should be smaller than FB, but willing to take the community’s requests seriously. Reddit, Quora, StackExchange, Medium… this kind of thing, though I can see problems with each of those specific suggestions.
- I disagree. I think the issue is whether “pro-liberty” is the best descriptive term in this context. Does it point to the key difference between things it describes and things it doesn’t? Does it avoid unnecessary and controversial leaps of abstraction? Are there no other terms which all discussants would recognize as valid, if not ideal? No, no, and no. 
- Whether something is a defensible position, and whether it should be embedded in the very terms you use when more-neutral terms are available, are separate questions.
- If you say “I’m pro-liberty”, and somebody else says “no you’re not, and I think we could have a better discussion if you used more specific terms”, you don’t get to say “why won’t you accept me at face value”.
- When you say “Nothing short of X can get you to Y”, the strong implication is that it’s a safe bet that X will at least not move you away from Y, and will sometimes move you toward it. So OK, I’ll rephrase:
- The OP suggests that colonization is in fact a proven way to turn at least some poor countries into more productive ones.
- Note that my post just above was basically an off-the-cuff response to what I felt was a ludicrously wrong assumption buried in the OP. I’m not an expert on African history, and I could be wrong. I think that I gave the OP’s idea about the level of refutation it deserved, but I should have qualified my statements more (“I’d guess...”), so I certainly didn’t deserve 5 upvotes for this (5 points currently; I deserve 1-3 at most). 
- I think that it’s worth being more explicit in your critique here.
- The OP suggests that colonization is in fact a proven way to turn poor countries into productive ones. But in fact, it does the opposite. Several parts of Africa were at or above average productivity before colonization¹, and well below it afterward; and this pattern has happened at varied enough places and times to be considered a general rule. The examples of successful transitions from poor countries to rich ones, such as South Korea, do not involve colonization.
- ¹Note that I’m considering the triangular trade as a form of colonization; even if it didn’t involve proconsuls, it involved an external actor explicitly fomenting a hierarchical and extractive social order.
- I think you can make this critique more pointed. That is: “pro-liberty” is flag-waving rhetoric which makes us all stupider.
- I dislike the “politics is a mind-killer” idea if it means we can’t talk about politically touchy subjects. But I entirely agree with it if it means that we should be careful to keep our language as concrete and precise as possible when we approach these subjects. I could write several paragraphs about all the ways that the term “pro-liberty” takes us in the wrong direction, but I expect that most of you can figure all that out for yourselves.
- It appears that you need to be logged in via FB or Twitter to be a full (non-guest) user. That seems like a… strange… choice for an anti-akrasia tool.
- (Tangentially related to above, not really a reply)
- Fair enough. Thanks. Again, I agree with some of your points. I like blemish-picking as long as it doesn’t require open-ended back-and-forth. 
- You’re raising some valid questions, but I can’t respond to all of them. Or rather, I could respond (granting some of your arguments, refining some, and disputing some), but I don’t know if it’s worth it. Do you have an underlying point to make, or are you just looking for quibbles? If it’s the latter, I still thank you for responding (it’s always gratifying to see people care about issues that I think are important, even if they disagree); but I think I’ll disengage, because I expect that whatever response I give would have its own blemishes for you to find.
- In other words: OK, so what?
- Full direct democracy is a bad idea because it’s incredibly inefficient (and thus also boring/annoying, and also subject to manipulation by people willing to exploit others’ boredom/annoyance). This has little or nothing to do with whether people’s preferences correlate with their utilities, which is the question I was focused on. In essence, this isn’t a true Goldilocks situation (“you want just the right amount of heat”) but rather a simple tradeoff (“you want good decisions, but don’t want to spend all your time making them”).
- As to the other related concepts… I think this is getting a bit off-topic. The question is whether energy (money) spent on pursuing better voting systems is more of a valid “saving throw” than energy spent on pursuing better individual rationality. That’s connected to the question of the preference/utility correlation of current-day, imperfectly rational voters. I’m not seeing the connection to rule of law &c.
Is “do whatever action you predict to maximize the electricity in this particular piece of wire” really “general”? You’re basically claiming that the more intelligent someone is, the more likely they are to wirehead. With humans, in my experience, and for a loose definition of “wirehead”, the pattern seems to be the opposite; and that seems to me solid enough, given how RL works, that I doubt it’s worth digging deep enough to resolve our disagreement here.