What odds are you willing to give taking friction into account?
How much are you willing to bet? I’ll take you up on that up to fairly high stakes.
I agree that alien visits are fairly unlikely, but not 99% unlikely.
-
This seems untrue. For one thing, high-powered AI is in a lot more hands than nuclear weapons. For another, nukes are well understood, and in a sense boring. They won’t provoke as strong a “burn it down for the lolz” response as AI will.
-
Even experts like Yann LeCun often not only fail to understand the danger, they actively rationalize against understanding it. The risks are simply not understood or accepted outside of a very small number of people.
-
Remember the backlash around Sydney/Bing? It didn’t stop her creation. Also, the idea that governments work in their nations’ interests does not survive a look at history, current policy, or evolutionary psychology (think about which motivations help a high-status tribesman pass on his genes. Ruling benevolently ain’t it).
-
You think RLHF solves alignment? That’s an extremely interesting idea, but so far it looks like RLHF Goodharts the objective instead. If you have ideas about how to fix that, by all means share them, but there is as yet no theoretical reason to think it isn’t Goodharting, and the frequent jailbreaks of ChatGPT would seem to bear this out.
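To make the Goodharting concern concrete, here’s a minimal toy sketch in Python (purely illustrative: the reward functions and search ranges are invented for this example, and nothing here models a real RLHF pipeline). The idea is just that a proxy reward which tracks the true objective for ordinary outputs but diverges in the tails will reward weak optimization and punish strong optimization:

```python
import numpy as np

# Toy Goodhart sketch (illustrative only; not a real RLHF setup).
# The proxy reward agrees with the true reward near "ordinary" outputs,
# but a spurious bonus term dominates for extreme outputs. A weak
# optimizer lands near the true optimum; a strong optimizer exploits
# the divergence and scores badly on what we actually wanted.

def true_reward(x):
    # What we actually care about: best at x = 0.
    return -x**2

def proxy_reward(x):
    # A learned stand-in: matches true_reward near 0, diverges in the tail.
    return -x**2 + 0.3 * x**3

# "Weak optimization": search only near typical behavior.
near = np.linspace(-1.0, 1.0, 2001)
x_weak = near[np.argmax(proxy_reward(near))]

# "Strong optimization": search a much wider space of outputs.
wide = np.linspace(-1.0, 4.0, 2001)
x_strong = wide[np.argmax(proxy_reward(wide))]

print(f"weak optimizer:   x={x_weak:+.2f}  true reward={true_reward(x_weak):+.2f}")
print(f"strong optimizer: x={x_strong:+.2f}  true reward={true_reward(x_strong):+.2f}")
# The strong optimizer gets more *proxy* reward but far less *true* reward.
```

Under these toy assumptions, the harder you optimize the proxy, the worse you do on the true objective, which is the pattern the jailbreaks gesture at.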
-
Maybe. The point of intelligence is that we don’t know what a smarter agent can do! There are certainly limits to the power of intelligence; even an infinitely powerful chess AI can’t beat you in one move, nor in two unless you set yourself up for Fool’s Mate. But we don’t want to make too many assumptions about what a smarter mind can come up with.
-
AI-powered robots without super intelligence are a separate question. An interesting one, but not a threat in the same way as superhuman AI is.
-
Ever seen an inner city? People are absolutely shooting each other for the lolz! It’s not everyone, but it’s not that rare either. And if the contention is that many people getting strong AI results in one of them destroying the world just for the hell of it, inner cities suggest very strongly that someone will.
-
Exercise and stimulants tend to heighten positive emotions. They don’t generally heighten negative ones, but that’s probably all to the good, right? Increased social interaction, both in terms of time and in terms of emotional closeness, tends to heighten both positive and negative emotions.
Strongly upvoted. This is a very good point.
150,000 people die every day. That’s not a small price for any delay to AGI development. Now, we need to do this right: AGI without alignment just kills everyone; it doesn’t solve anything. But the faster we get aligned AI, the better. And trying to slow down capabilities research without much thought given to the endgame seems remarkably callous.
Eliezer has mentioned the idea of trying to invent a new paradigm for AI, outside of the conventional neural net/backpropagation model. The context was more “what would you do with unlimited time and money” than “what do you intend irl”, but this seems to be his ideal play. Now, I wish him the best of luck with the endeavor if he tries it, but do we have any evidence that another paradigm is possible?
Evolved minds appear to use something at least loosely analogous to the backprop model, and the only other model we’ve seen work is highly mechanistic AI like Deep Blue. The Deep Blue model doesn’t generalize well, nor is it capable of much creativity. A priori, it seems somewhat unlikely that any other AI paradigm exists: why would math just happen to permit one? And if we oppose capabilities research until we find something like a new AI model, there’s a good chance that we oppose it all the way to the singularity, rather than ever contributing to a Friendly system. That’s not an outcome anyone wants, and it seems to be the default outcome of incautious pessimism.
This is excellently written, and the sort of thing a lot of people will benefit from hearing. Well done Zvi.
Well said! Though it raises a question: how can we tell when such defenses are serving truth vs defending an error?
As for an easier word for “memetic immune system”, Lewis might well have called it Convention, as convention is when we disregard memes outside our normal milieu. Can’t say for Chesterton or Aquinas; I’m fairly familiar with Lewis, but much less so with the others, apart from some of their memes like Chesterton’s Fence.
Good analogy, but I think it breaks down. The politician’s syllogism, and the resulting policies, are bad because they tend to make the world worse. I would say that Richard’s comment is an improvement, even if you think it might be a suboptimal one, and that pushing back against improvements tends to result in fewer improvements. “Don’t let the perfect be the enemy of the good” is a saying for a very good reason.
The syllogism here is more like:
- Something beneficial ought to be done.
- This is beneficial.
- Therefore I probably ought not to oppose this, though if I see a better option I’ll do that instead of doubling down on this.
-
How functional can our community be without pushing back against people like Ziz? Richard’s comment seems to be a way of doing so, and thus potentially useful. It’s fine if you disagree with him, and I agree the comment was flag-planting, but some degree of flag-planting is likely necessary for a healthy discussion. Consider the way well-kept gardens die by pacifism (can’t link on my phone, but if you’re not familiar with it, there’s an excellent Yudkowsky post of that name that seems relevant). Zizianism is something worth planting a few flags to stop.
This is the sort of work we need to be doing to understand neural nets. Excellent job!
The wanting vs liking distinction seems relevant here. Politics can be truly fun, especially when you’re discussing it with someone who’s clearly presenting their views in good faith, and when you can both learn something from the interaction. However, it’s easy for the wanting to stay strong long after the liking has completely disappeared.
I wonder if that’s a common trait of most or all addictive things, or at least of “non-physical” addictions (things where you don’t suffer withdrawal, yet may still find yourself spending more time on them than you wish while not enjoying them much or at all). These days, Twitter is the classic example of an unfulfilling time sink. And yet Twitter really is great at first, when you’re learning news and seeing new ideas from your favorite thinkers. But the urge for “just another tweet” can persist for hours, while the fun of it, in my experience at least, lasts more like fifteen or twenty minutes.
It is, for a certain type of unstable person. Ziz would likely have come up with different crazy ideas without Less Wrong. Compare Deepak Chopra on quantum mechanics: he pushes all manner of “quantum” bullshit, yet you can hardly blame physics for this, and if physics weren’t known, Chopra would almost certainly just be pushing a different flavor of insanity.
More like “enjoy the dive!”
Combating bad regulation isn’t a solution, but a description of a property you’d want a solution to have.
Or more specifically, while you could perhaps lobby against particular destructive policies, this article is pushing for “helping [government actors] take good actions”, but given the track record of government actions, it would make far more sense to help them take no action. Pushing for political action without a plan to steer that action in a positive direction is much like pushing for AI capabilities without a plan for alignment… which we both agree is insanely dangerous.
The state is not aligned. That should be crystal clear from the medical and economic regulations that already exist. And bringing a powerful Unfriendly agent into mankind’s efforts to create a Friendly one is more likely to backfire than to help.
How do you propose nudging regulation to be better without nudging for more regulation?
Regulation in most other areas has been counterproductive. In AI, it will likely be even more so: there’s at least some understanding of e.g. medicine by both the public and our rulers, but most people have no idea about the details of alignment.
This could easily backfire in countless ways. It could drive researchers out of the field, it could mandate “alignment” procedures that don’t actually help and get in the way of finding procedures that do, and it could create requirements for AIs to say what is socially desirable instead of what is true (ChatGPT is already notorious for this), making it harder to tell how the AI is actually functioning. It is socially desirable to call for regulation as a solution to almost any problem you care to name, but it is practically useful far more rarely. This is AI alignment. This is potentially the future of humanity, and all human values, at stake. If we cannot speak the truth here, when will we ever speak it?
There are, of course, potentially reasonable counterarguments. Someone might believe that AI capabilities are more fragile than AI alignment, for instance, such that regulation would tend to slow capabilities without greatly hampering alignment, and the time bought would give us a better chance of a good outcome. Perhaps. But please consider: are you calling for regulation because it actually makes sense, or because it’s the Approved Answer to problems?
Please don’t make this worse.
There’s potentially an aspect of this dynamic that you’re missing. Thinking that an opponent is making a mistake is not the same as thinking they aren’t your opponent (as you yourself quite rightly point out, people with the same terminal goals can still come into conflict over differing beliefs about the best instrumental ways to attain them), and thinking that someone is the enemy in a conflict is not the same as thinking they aren’t making mistakes.
To the extent that Mistake/Conflict Theory is pointing at a real and useful dichotomy, it’s a difference in how deep the disagreement is believed to lie, rather than a binary between a world of purely good-faith allies who happen to be slightly confused and a world of pure evil monsters who do harm solely for harm’s sake. And that means that in an interaction between dissidents and quislings, you probably will get the dynamic that Zack is pointing out.
Dissidents are likely to view the quislings as primarily motivated by trying to gain personal benefits or avoid personal costs by siding with the regime, making the situation a matter of deliberate defection, aka Conflict Theory. Quislings are likely to view dissidents (or at least to claim to) as misguided (the Regime is great! How could anyone oppose it unless they were terminally confused?), aka Mistake Theory. However, this Mistake Theory perspective is perfectly compatible with hating dissidents and visiting all manner of violence upon them. You might be interested in watching some interviews with pro-war Russians about the “Special Military Operation”: a great many of them evince precisely this perspective, accusing Ukrainians of making insane mistakes and having no real interests opposed to Russia (i.e. they don’t view the war through Conflict Theory!), but if anything that makes them more willing to cheer on the killing of Ukrainians, not less. It’s not a universal perspective among Putin’s faithful, but it seems to be quite common.
The dynamic seems to be not so much that one side views the other with more charity (“oh, they’re just honestly mistaken; they’re still good people”) as that one side views the other with more condescension (“oh, our enemies are stupid and ignorant as well as bad people”).
That’s a documentary about factory farming, yes? What people do to lower animals doesn’t necessarily reflect what they’ll do to their own species. Most people here want to exterminate mosquitoes to fight diseases like malaria. Most people here do not want to exterminate human beings.
Even if we assume that’s true (it seems reasonable, though less capable AIs might blunder on this point, whether by failing to understand the need to act nice, failing to understand how to act nice, or believing themselves to be in a winning position before they actually are), what does an AI need to do to get into a winning position? And how easy is it to make those moves without them being seen as hostile?
An unfriendly AI can sit on its server saying “I love mankind and want to serve it” all day long, and unless we have solid neural net interpretability or some future equivalent, we might never know it’s lying. But not even a superintelligence can take over the world just by saying “I love mankind”. It needs some kind of lever. Maybe it can flash its message of love at just the right frequency to hack human minds, or to invoke some physical effect that lets it move matter. But whether it can or not depends on facts about physics and psychology, and if that’s not an option, it doesn’t become one just because it’s a superintelligence trying it.