cousin_it
Is that you in the photo? Why is your left hand a claw?
Thank you for writing this! I have a question though. The post says “many cases” and so on. Can we get some estimates of how many people are affected now, and whether that number is growing or shrinking?
I’m not very confident, but will try to explain where the intuition comes from.
Basically I think the idea of “good” might be completely cultural. As in, if you extrapolate what an individual wants, that’s basically a world optimized for that individual’s selfishness. Then there’s what groups can agree on by rational negotiation, which is a kind of group selfishness that cuts out everyone too weak to push back (so for example factory farming would be fine, because animals can’t fight back). And on top of that there’s the abstract idea of “good”, saying you shouldn’t hurt the weak at all. That idea is not necessitated by rational negotiation. It’s just a cultural artifact that we ended up with, I’m not sure how.
So if you ask AI to optimize for what individuals want, and go through negotiations and such, there seems to be a high chance that the resulting world won’t contain “good” at all, only what I called group selfishness. Even if we start with individuals who strongly believe in the cultural idea of good, they can still get corrupted by power. The only way to get “good” is to point AI at the cultural idea to begin with.
You are of course right that culture also contains a lot of nasty stuff. The only way to get something good out of it is with a bunch of extrapolation, philosophy, and yeah I don’t know what else. It’s not reliable. But the starting materials for “good” are contained only there. Hope that makes sense.
Also to your other question: how to train philosophical ability? I think yeah, there isn’t any reliable reward signal, just as there wasn’t for us. The way our philosophical ability seems to work is by learning heuristics and ways of reasoning from fields where verification is possible (like math, or everyday common sense) and applying them to philosophy. And it’s very unreliable of course. So for AIs maybe this kind of carry-over to philosophy is also the best we can hope for.
It’s complicated.
First, I think there’s enough overlap between different reasoning skills that we should expect a smarter-than-human AI to be really good at most such skills, including philosophy. So this part is ok.
Second, I don’t think philosophical skill alone is enough to figure out the right morality. For example, let’s say you like apples but don’t like oranges. Then when choosing between philosophical theory X, which says apples are better than oranges, and theory Y, which says the opposite, you’ll use the pre-theoretic intuition as a tiebreaker. And I think when humans do moral philosophy, they often do exactly that: they fall back on pre-theoretic intuitions to check what’s palatable and what isn’t. It’s a tree with many choices, and even big questions like consequentialism vs deontology vs virtue ethics may ultimately depend on many such case-by-case intuitions, not just pure philosophical reasoning.
Third, I think morality is part of culture. It didn’t come from the nature of an individual person: kids are often cruel. It came from constraints that people put on each other, and cultural generalization of these constraints. “Don’t kill.” When someone gets powerful enough to ignore these constraints, the default outcome we should expect is amorality. “Power corrupts.” Though of course there can be exceptions.
Fourth—and this is the payoff—I think the only good outcome is if the first smarter-than-human AIs start out with “good” culture, derived from what human societies think is good. Not aligned to an individual human operator, and certainly not to money and power. Then AIs can take it from there and we’ll be ok. But I don’t know how to achieve that. It might require human organizational forms that are not money- or power-seeking. I wrote a question about it some time ago, but didn’t get any answers.
All good points, and I wanted to reply with some of them, so thanks. But there’s also another point where I might disagree more with LW folks (including you and Carl and maybe even Wei): I no longer believe that technological whoopsie is the main risk. I think we have enough geniuses working on the thing that technological whoopsie probably won’t happen. The main risk to me now is that AI gets pretty well aligned to money and power, and then money and power throws most humans by the wayside. I’ve mentioned it many times; the cleanest formulation is probably in this book review.
In that light, Redwood and others are just making better tools for money and power, to help align AI to their ends. Export controls are a tool of international conflict: if they happen, they happen as part of a package of measures which basically intensify the arms race. And even the CAIS letter is now looking to me like a bit of a PR move, where Altman and others got to say they cared about risk and then went on increasing risk anyway. Not to mention the other things done by safety-conscious money, like starting OpenAI and Anthropic. You could say the biggest things that safety-conscious money achieved were basically enabling stuff that money and power wanted. So the endgame wouldn’t be some kind of war between humans and AI; it would be AI simply joining up with money and power, and cutting out everyone else.
I think a lot of rationalists accepted these Molochian offers (“build the Torment Nexus before others do it”, “invest in the Torment Nexus and spend the proceeds on Torment Nexus safety”) and the net result is simply that the Nexus is getting built earlier, with most safety work ending up as enabling capabilities or safetywashing. The rewards promised by Moloch have a way of receding into the future as the arms race expands, while the harms are already here and growing.
Wow. There’s a very “room where it happens” vibe about this post. Lots of consequential people mentioned, and showing up in the comments. And it’s making me feel like...
Like, there was this discussion club online, ok? Full of people who seemed to talk about interesting things. So I started posting there too, did a little bit of math, got invited to one or two events. There was a bit of money floating around too. But I always stayed a bit at arm’s length, was a bit less sharp than the central folks, less smart, less quick to jump on opportunities.
And now that folks from the same circle essentially ended up doing this huge consequential thing—the whole AI thing I mean, not just Anthropic—and many got rich in the process… the main feeling in my mind isn’t envy, but relief. That my being a bit dull, lazy and distant saved me from being part of something very ugly. This huge wheel of history crushing the human form, and I almost ended up pushing it along, but didn’t.
Or as Mike Monteiro put it:
Tech, which has always made progress in astounding leaps and bounds, is just speedrunning the cycle faster than any industry we’ve seen before. It’s gone from good vibes, to a real thing, to unicorns, to let’s build the Torment Nexus in record time. All in my lifetime...
I was lucky (by which I mean old) to enter this field when I felt, for my own peculiar reasons, that it was at its most interesting. And as it went through each phase, it got less and less interesting to me, to the point where I have little desire to interact too much with it now… In fact, when I think about all the folks I used to work on web shit with and what they’re currently doing, the majority are now woodworkers, ceramicists, knitters, painters, writers, etc. People who make things tend to move on when there’s nothing left to make. Nothing to make but the Torment Nexus.
Great writing. And yeah, the human-cat relationship is indeed one of the better ways that an AI-human relationship could turn out.
It’s not quite perfect though. We neuter cats. In the Middle Ages, people burned live cats for fun.
I remember you from the Pugs days. Two questions about this presentation. One is more aspirational: do you think of this society of AIs as more egalitarian (many superhuman AIs at roughly the same level) or more hierarchical (a range of AI sizes, with the largest hopefully being the most aligned to those below)? And the other is more practical. Right now the AI market is locked in an arms-race kind of situation, and in particular, scrambling to make AIs that will bring commercial profit. That can lead to nasty incentives, e.g. an AI working for a tax software company can help it lobby the government to keep tax filing difficult, and of course much worse things can be imagined as well. If this continues, the whole nice vision of kami and so on will simply fail to materialize. What is to be done, in your opinion?
Gonna be full of lies about living people, including billionaires, celebrities and politicians. Takedown in 3...2...1...
Hmm. To me it always felt more natural to “compare myself to the task rather than to my peers”, no matter what task and what level, even when I’m a complete beginner at something. It just makes more sense. The only reason to look at peers is to steal their tricks :-)
I don’t know if there are official definitions, but to me, the connotation of “aligned AI” is something that’s aligned to the user. For example, an aligned AI working for TurboTax will happily help it lobby the government to keep tax filing difficult. That’s the kind of AI that corporations are trying to build and sell now. While the connotation of “friendly AI” to me is more about being aligned to the concept of good in general, without privileging the user. It probably needs to be built by a non-market organization.
Hey, I think I share a lot of these emotions. Also left my corp job some time ago. But the change that happened to me was a bit different: I think I just don’t like corporate programming anymore. (Or corporate design, writing and so on.) When I try to make stuff that isn’t meant for corporate in any shape or form, I find myself doing that happily. Without using AI, of course.
Sorry, I wrote a response and deleted it. Let me try again.
I don’t know what exactly makes AI images so off-putting to me. The bare fact is that this image to me looks obviously AI-made and really unpleasant to see. I don’t know why some people react to AI images this way and others don’t.
My best guess is that AI images would begin to look more “cursed” to you if you spent some days or weeks drawing stuff with pencil and paper, maybe starting with some Betty Edwards exercises. But that’s just a guess, and maybe you’ve done that already.
My two cents. There’s a certain kind of post on LW that to me feels almost painfully anti-rational. I don’t want to name names, but such posts often get highly upvoted. Said was one of the very few people willing to vocally disagree with such posts. As such, he was a voice for a larger and less vocal set of people, including me. Essentially, from now on it will be harder to disagree with bullshit on LW—because the example is gone, and you know that if you disagree too hard, you might become another example. So I’m not happy to see him kicked out, at all.
DSL search isn’t accessible without login, and the site seems to disallow Google search as well. I patiently Ctrl+F’d through the very long Trump Shuts Down USAID thread, but didn’t find any good arguments why PEPFAR wasn’t good. If you know such arguments, maybe you can summarize?
I don’t think it’s due to evolution or material conditions. I think it’s cultural and goes back to the rise of Christianity. Pre-Christian stories, like the Greek myths, glorified the strong. Now we glorify the weak.
As an aside, it’s a bit of a miracle that the pro-weak worldview became so strong and in many places won outright. It’s inherently strange, and the result wasn’t a given at all. I’ll never stop recommending “The Girl in a Swing” by Richard Adams, which examines this conflict maybe better than any other, despite being fiction.
To me this philosophical counterargument still counts as “galaxy brain”. Doing vs allowing harm is not the issue. It’s much simpler: if one president does a good thing and another one cancels it, then we’re allowed to compare who’s better or worse on this aspect. The only possible way to defeat this is to argue that PEPFAR wasn’t a good thing. If you or datasecretslox folks want to argue that, go ahead!
I’m just wondering though, what’s meant by high-quality arguments here? Sophistication? Unfortunately I’ve found that someone can make very erudite galaxy-brain arguments and still be wrong about almost everything, like Yarvin. And some simple-minded argument for the opposite side may in fact be right. So argument quality is not a superficial thing; you can’t tell it from the tone.
With regard to the arguments in the OP, I mean yeah, they sound pretty basic. But where they overlap with stuff I actually know, they seem right to me. The Laos thing for example I’ve known about for many years, and the post’s simple-minded condemnation of US actions there is simply right. While the galaxy-brained justifications of these actions (at least the ones I’ve read) are wrong.
Yeah. I think the main reason democracy exists at all is because people are necessary for skilled labor and for fighting wars. If that goes away, the result will be a world where money and power just discards most people. Why some people think “oh we’ll implement UBI if it comes to that”, I have no idea. When it “comes to that”, there won’t be any force powerful enough to implement UBI and interested in doing so. My cynical view is that the promise of distributing AI benefits in the future is a distraction: look over there, while we take all power.