I mostly agree, but the word “gentile” is a medieval translation of “goyim,” so it’s a bit weird to differentiate between them. (And the idea that non-Jews are ritually impure is both confused and a frequent antisemitic trope. In fact, it was idol worshippers who were deemed impure, based on verses in the Bible, specifically Leviticus 18:24, and there were much later rabbinic decrees to discourage intermingling even with non-idol worshippers.)
Also, both Judaism and LDS (with the latter obviously more proselytizing) have a route for such excluded individuals to join, so calling this “a state of being which outsiders cannot attain” is also a bit strange to claim.
Your dismissive view of “conservatism” as a general movement is noted, and not even unreasonable—but it seems basically irrelevant to what we were discussing in the post, both in terms of what we called conservatism and the way you tied it to “hostile to AGI.” And the latter seems deeply confused, or at least needs much more background explanation.
I’d be more interested in tools that detected downvotes that occur before people start reading, on the basis of the title—because I’d give even odds that more than half of the downvotes on this post came within a minute of opening it, based on the title or a reaction to the first paragraph, not due to the discussion of CEV.
I agree that Eliezer has made different points in different places, and don’t think that the Fun Theory series makes this clear, and CEV as described seems not to say it. (I can’t try to resolve all the internal tensions between the multiple bookshelves’ worth of content he’s produced, so I referred to “fun theory, as written.”)
And I certainly don’t think conflict as such is good! (I’ve written about the benefits of avoiding conflict at some length on my substack about cooperation.) My point here was subtly different, and more specific to CEV; I think that solutions for eliminating conflict which route around humans themselves solving the problems might be fundamentally destructive of our values.
I don’t think this is true in the important sense; yes, we’ll plausibly get material abundance, but we will still have just as much conflict, because humans want scarcity, and they want conflict. So which resources are “important” will shift. (I should note that Eliezer made something like this point in a tweet, where he said “And yet somehow there is a Poverty Equilibrium which beat a 100-fold increase in productivity plus everything else that went right over the last thousand years”—but his version assumes that once all the necessities are available, poverty would be gone. I think the fact that we now treat luxuries that were clearly impossible in the past, like internet connectivity and access to laundry machines, as minimal requirements shows that the hedonic treadmill is stronger than wealth generation!)
Thank you for noticing the raft of reflexive downvotes; it’s disappointing how reflexively even LessWrong seems to react; even the commenters seem not to have read the piece, or at least not to have engaged with the arguments.
On your response—I agree that CEV as a process could arrive at the outcomes you’re describing, where ineliminable conflict gets it to throw an error—but I think that CEV as approximated, and as people assume it will work, is, as you note, making a prediction that disagreements will dissolve. Not only that, but it asserts that this will have an outcome that preserves what we value. If the tenets of agonism are correct, however, any solution geared towards “efficiently resolving conflict” is destructive of human values—because, as we said, “conflict is central to the way society works, not something to overcome.” Still, I agree that Eliezer got parts of this right (a decade before almost anyone else even noticed the problem), and agree that keeping things as multiplayer games with complex novelty, where conflict still matters, is critical. The further point, which I think Eliezer’s fun theory, as written, somewhat elides, is that we also need limits and pain for the conflict to matter. That is, again, it seems possible that part of what makes things meaningful is that we need to engage in the conflict ourselves, instead of having it “solved” via extrapolation of our values.
As a separate point, as I argued in a different post, we lack the conceptual understanding needed to deal with the question of whether there is some extrapolated version of most agents that is anywhere “close” to their values and is coherent. But at the very least, “the odds that an arbitrary complex system is pursuing some coherent outcome” approach zero, and that at least slightly implies that almost all agents might not be “close” to a rational agent in the important senses we care about for CEV.
“This is what’s happening and we’re not going to change it” isn’t helpful—both because it’s just saying we’re all going to die, and because it fails to specify what we’d like to have happen instead. We’re not proposing a specific course for influencing AI developers; we’re first trying to figure out what future we’d want.
The viability of what approach, exactly? You again seem to be reading something different than what was written.
You said “There is no point in this post where the authors present a sliver of evidence for why it’s possible to maintain the ‘barriers’ and norms that exist in current societies, when the fundamental phase change of the Singularity happens.”
Did we make an argument that it was possible somewhere, which I didn’t notice writing? Or can I point to the conclusion of the piece, which might be useful:
“...the question we should be asking now is where [this] view leads, and how it could be achieved. That is going to include working towards understanding what it means to align AI after embracing this conservative view, and seeing status and power as a feature, not a bug. But we don’t claim to have ‘the’ answer to the question, just thoughts in that direction—so we’d very much appreciate contributions, criticisms, and suggestions on what we should be thinking about, or what you think we are getting wrong.”
We’re very interested in seeing where people see flaws, and there’s a real chance that they could change our views. This is a forum post, not a book, and the format and our intent in sharing it differ. That is, if we had completed the entire sequence before starting to get public feedback, the idea of sharing the full sequence at the start would work—but we have not. We have ideas, partial drafts, and some thoughts on directions to pursue, but it’s not obvious that the problems we’re addressing are solvable, so we certainly don’t have final conclusions, nor do I think we will have them when we conclude the sequence.
Yes, this is partly true, but assumes that we can manage technical alignment in a way that is separable from the values we are aiming towards—something that I would have assumed was true before we saw the shape of LLM “alignment” solutions, but no longer think is obvious.
And instruction-following is deeply worrying as an ‘alignment target’, since it doesn’t say anything about what ends up happening, much less actually guarantee corrigibility—especially since we’re not getting meaningful oversight. But that’s a very different argument than the one we’re making here.
“AI is very likely to drastically move us away from scarcity and towards abundance”
That makes a huge number of assumptions about the values and goals of the AI, and is certainly not obvious—unless you’ve already assumed things about the shape of the likely future, and the one we desire. But that’s a large part of what we’re questioning.
Yes, agreed that the concept of value is very often confused, mixing economic utility and decision theory with human preferences, constraints, and goals. Harry Law also discussed the collapse of different conceptions into a single idea of “values” here: https://www.learningfromexamples.com/p/weighed-measured-and-found-wanting
I’m confused by this criticism. You jumped on the most basic objection that comes to mind based on what you thought we were saying—but you were wrong. We said, explicitly, that this is “our lens on parts of the conservative-liberal conceptual conflict” and then said “In the next post, we want to outline what we see as a more workable version of humanity’s relationship with AGI moving forward.”
My reply wasn’t backing out of a claim; it was clarifying the scope by restating, and slightly elaborating on, something we already said in the very first section of the post!
I won’t try to speak for my co-author, but yes, we agree that this doesn’t try to capture the variety of views that exist, much less what you think political discourse should mean by conservatism—this is a conservative vision, not the conservative vision. And given that, we find the analogy useful in motivating our thinking and illustrating an important point, despite the fact that all analogies are inexact.
That said, yes, I don’t “feel the AGI” in the sense that if you presume that the singularity will happen in the typically imagined way, humanity as we know it doesn’t make it. And with it goes any ability to preserve our current values. I certainly do “feel the AGI” in thinking that the default trajectory is pointed in that direction, and accelerating, and that it’s not happening in a fashion that preserves any values whatsoever, conservative or otherwise. But that’s exactly the point—we don’t think that the AGI which is being aimed for is a good thing, and we do think that the conversation about the possible futures humanity could be aiming for is (weirdly) narrowly constrained to either a pretty bland techno-utopianism or human extinction. We certainly don’t think that it’s necessary for there to be no AGI, and we don’t think that eternal stasis is a viable conservative vision either, contrary to what you assume we meant. But as we said, this is the first post in a series about our thinking on the topic, not a specific plan, much less the final word on how things should happen.
A Conservative Vision For AI Alignment
And the receiver won’t know where to look, and we don’t know where to send it, so given attenuation, even 10^4 light years seems pretty optimistic.
There’s a lot of energy expenditure needed to make the kinds of broadcasts you discuss, and even then, it won’t go far—aiming for specific targets with relatively high probability of reception helps a bit, but not much. Given that, this all seems low value compared to even short term benefits that we could get with the same effort or at the same cost.
Seems important to consider threats, and the game-theoretic reasons to carry them out, when considering pessimizing. That is, one actor may pessimize another as retaliation rather than as a goal, or carry out an action to ensure future threats are credible. The dynamics in these cases are critical, since the goals may be amenable to cooperative solutions rather than being fundamentally and unavoidably zero- or negative-sum.
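To make that concrete, here’s a minimal toy sketch of the dynamic I mean, with entirely made-up payoffs (none of these numbers come from the original discussion): an actor who would find retaliation costly can’t make the threat credible without commitment, but with a commitment device the target’s best response flips to conceding—so the “pessimizing” outcome is about credibility, not about anyone valuing it for its own sake.

```python
# Toy sequential "threat" game with illustrative (made-up) payoffs.
# Player A threatens to pessimize B unless B concedes a resource.
# Payoffs are tuples of (A's payoff, B's payoff).
PAYOFFS = {
    ("concede",):            (3, 1),  # B gives in; no retaliation needed
    ("refuse", "retaliate"): (0, 0),  # A carries out the costly threat
    ("refuse", "back_down"): (1, 3),  # A's threat turns out to be a bluff
}

def outcome(a_committed: bool):
    """Solve by backward induction, with or without A pre-committing to retaliate."""
    # A's contingent response to a refusal: a committed A retaliates; otherwise A
    # picks whichever response gives A the higher payoff.
    if a_committed:
        a_response = "retaliate"
    else:
        a_response = max(["retaliate", "back_down"],
                         key=lambda r: PAYOFFS[("refuse", r)][0])
    # B anticipates A's response and picks the better of conceding or refusing.
    refuse_payoff = PAYOFFS[("refuse", a_response)][1]
    concede_payoff = PAYOFFS[("concede",)][1]
    b_choice = "concede" if concede_payoff >= refuse_payoff else "refuse"
    return b_choice, a_response

print(outcome(a_committed=False))  # ('refuse', 'back_down'): the threat isn't credible
print(outcome(a_committed=True))   # ('concede', 'retaliate'): commitment makes it work
```

The point is just that whether pessimizing actually happens depends on commitment and credibility dynamics, not on the underlying goals being irreconcilable.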
I’m guessing we don’t actually strongly disagree here, but I think that unless you’re broadening / shortening “information processing and person modelling technologies” to “technologies”, it’s only been a trend for a couple decades at most—and even with that broadening, it’s only been true under some very narrow circumstances in the west recently.
I asked an LLM to do the math explicitly, and I think it shows that it’s pretty infeasible—you need a large portion of total global power output, and even then you need to know who’s receiving the message; you can’t do a broad transmission.
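For what it’s worth, here is a rough version of that math, a minimal sketch assuming a 10^4 light year range, a narrow-band beacon, and a generously large, cold receiver on the other end—all of the receiver and threshold numbers below are illustrative assumptions, not anything from the original discussion.

```python
import math

# Rough link-budget estimate for an interstellar radio broadcast.
# All receiver parameters below are illustrative assumptions.
K_B = 1.380649e-23       # Boltzmann constant, J/K
LIGHT_YEAR = 9.4607e15   # meters

distance = 1e4 * LIGHT_YEAR   # 10^4 light years, in meters
a_eff = 5e4                   # receiver effective area, m^2 (large radio-dish scale)
t_sys = 20.0                  # receiver system temperature, K
bandwidth = 1.0               # Hz (very narrow-band beacon)
snr_required = 10.0           # detection threshold

# Minimum detectable received power: SNR * k_B * T_sys * B
p_rx_min = snr_required * K_B * t_sys * bandwidth

# Isotropic broadcast: the receiver intercepts a fraction A_eff / (4*pi*d^2) of the power.
p_tx_isotropic = p_rx_min * 4 * math.pi * distance**2 / a_eff

# A directional transmitter with gain G reduces the requirement by that factor,
# but only if you know exactly where to point it.
gain_directional = 1e7
p_tx_directional = p_tx_isotropic / gain_directional

GLOBAL_POWER = 2e13  # ~20 TW, rough total human power use

print(f"Isotropic broadcast: {p_tx_isotropic:.2e} W "
      f"(~{p_tx_isotropic / GLOBAL_POWER:.0f}x global power use)")
print(f"Aimed at a known target: {p_tx_directional:.2e} W")
```

Under these particular assumptions the broad broadcast comes out even worse than “a large portion of global power”—hundreds of times it—while a tightly aimed transmission drops to power-plant scale; the exact factors are very sensitive to the assumed receiver, but the qualitative conclusion is the same.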
I also think this plan preserves almost nothing I care about. At the same time, at least it’s realistic about our current trajectory, so I think planning along these lines and making the case for doing it clearly and publicly is on net good, even if I’m skeptical of the specific details you suggested, and don’t think it’s particularly great even if we succeed.