cousin_it
It seems you interpreted my comment as “the essay argues against something nobody believes anyway”. What I meant was more like “the essay keeps making its point in an angry and tedious way, over and over”.
My favorite example of fiction influencing reality (or maybe just predicting it really well, it’s hard to tell) is how Arthur Conan Doyle’s detective stories basically created forensic science from thin air. For example, the very first Sherlock Holmes story “A Study in Scarlet”, published in 1887, describes Holmes inventing a chemical test to distinguish dried bloodstains from dirt stains. Then exactly that test was invented in 1900. Another example is analysis of tiny differences between typewriters, which appeared in Holmes stories a few years before anyone did it in reality.
Reading this felt like watching someone kick a dead horse for 30 straight minutes, except at the 21st minute the guy forgets for a second that he needs to kick the horse, turns to the camera and makes a couple really good jokes. (The bit where they try and fail to change the topic reminded me of the “who reads this stuff” bit in HPMOR, one of the finest bits you ever wrote in my opinion.) Then the guy remembers himself, resumes kicking the horse and it continues in that manner until the end.
By which I’m trying to say, though not in a top-tier literary way maybe, that you’re a cool writer. A cool writer who has convinced himself that he has to be a horse-kicker, otherwise the world will end. And I do agree that the world will end! But… hmm how to put it… there is maybe a better ratio of cool writing to horse-kicking, which HPMOR often achieved. Which made it more effective at saving the world, more fun to read, and maybe more fun to write as well.
Though I could be wrong about that. Maybe the cool bit in the middle wasn’t a release valve for you, but actually took more effort than laying out the arguments in the rest of the essay. In that case never mind.
But, like, the memetic egregore “Goodness” clearly does not track that in a robust generalizable way, any more than people’s feelings of yumminess do.
I feel you’re overstating the “any more” part, or at least it doesn’t match my experience. My feelings of “goodness” often track what would be good for other people, while my feelings of “yumminess” mostly track what would be good for me. Though of course there are exceptions to both.
So why are you attached to the whole egregore, rather than wanting to jettison the bulk of the egregore and focus directly on getting people to not defect?
This can be understood two ways. 1) A moral argument: “We shouldn’t have so much extra stuff in the morality we’re blasting in everyone’s ears, it should focus more on the golden rule / unselfishness”. That’s fine, everyone can propose changes to morality, go for it. 2) “Everyone should stop listening to morality radio and follow their feels instead”. Ok, but if nobody listens to the radio, by what mechanism do you get other people to not defect? Plenty of people are happy to defect by feels; I think I’ve proved that sufficiently. Do you use police? Money? The radio was pretty useful for that actually, so I’m not with you on this.
“Oh,” says the computer scientist. “Well, in that case — hm. Well, utility functions are invariant under scaling, so how about you scale the two utility functions U1 and U2 such that the AI expects it can get the same utility from each of them, so it doesn’t have an incentive one way or the other.”
That can work for a single moment, but not much longer. The AI’s options change over time. For instance, whenever it has a setback, its expected U1-utility drops below the scaled shutdown utility, so then it would mash the shutdown button to get all that sweet, sweet shutdown utility.
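To make that concrete, here’s a tiny Python sketch of the failure mode, with numbers I made up purely for illustration:

```python
# Toy illustration (my own numbers, not from the book): scale the shutdown
# utility U2 so that at time 0 the agent is exactly indifferent between
# working toward U1 and being shut down.
p_success = 0.5
value_of_success = 10.0
expected_u1 = p_success * value_of_success  # 5.0: expected utility from working
u2_shutdown = expected_u1                   # scaled to match at time 0

# After a setback the prospects for U1 get worse, but the shutdown payoff
# stays fixed, so shutdown now looks strictly better.
p_success_after_setback = 0.2
expected_u1_after_setback = p_success_after_setback * value_of_success  # 2.0
assert u2_shutdown > expected_u1_after_setback  # the agent now wants the button pressed
```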
“Ah,” says the computer scientist. “Well, in that case, how about if [some other clever idea]?”
Well, you see, that clever idea is isomorphic to the AI believing that it’s impossible for the button to ever be pressed, which incentivizes it to terrify the user whenever it gets a setback, so as to correlate setbacks with button-presses, which (relative to its injured belief system) causes it to think the setbacks can’t happen.
And so on.
Lessons from the Trenches
We ran some workshops, and the workshops had various mathematicians of various stripes (including an International Mathematical Olympiad gold medalist), but nobody came up with a really good idea.
This passage sniped me a bit. I thought about it for a few seconds and found what felt like a good idea. A few minutes more and I couldn’t find any faults, so I wrote a quick post. Then Abram saw it and suggested that I should look back and compare it with Stuart’s old corrigibility papers.
And indeed: it turned out my idea was very similar to Stuart’s “utility indifference” idea plus a known tweak to avoid the “managing the news” problem. To me it fully solves the narrow problem of how to swap between U1 and U2 at an arbitrary moment, without ever giving the AI an incentive to control the swap button. And since Nate was also part of the discussion back then, it makes me wonder a bit why the book describes this as an open problem (or at least implies as much).
For completeness’ sake, here’s a simple rephrasing of the idea, copy-pasted from my post from yesterday, which I ended up removing because it wasn’t new work:
Imagine two people, Alice and Bob, wandering around London. Bob’s goal is to get to Tower Bridge. When he gets there, he’ll get a reward of £1 per minute of time remaining until midnight, so he’s incentivized to go fast. He’s also carrying a radio receiver.
Alice is also walking around, doing some chores of her own which we don’t need to be concerned with. She is carrying a radio transmitter with a button. If/when the button is pressed (maybe because Alice presses it, or Bob takes it from her and presses it, or she randomly bumps into something), Bob gets notified that his goal changes: there’ll be no more reward for getting to Tower Bridge; he needs to get to St Paul’s Cathedral instead. His reward formula also changes: the device notes Bob’s location at the time the button is pressed, calculates the expected travel times to Tower Bridge and to St Paul’s from that location, and adds or subtracts a payment so that the expected reward stays the same. For example, if Bob is 20 minutes away from the bridge and 30 minutes away from the cathedral when the button is pressed, the reward will be increased by £10 to compensate for the 10 minutes of delay.
I think this can serve as a toy model of corrigibility, with Alice as the “operator” and Bob as the “AI”. It’s clear enough that Bob has no incentive to manipulate the button at any point, but actually Bob’s indifference goes even further than that. For example, let’s say Bob can sacrifice just a minute of travel time to choose an alternate route, one which will take him close to both Tower Bridge and St Paul’s, to prepare for both eventualities in case Alice decides to press the button. Will he do so? No. He won’t spare even one second. He’ll take the absolute fastest way to Tower Bridge, secure in the knowledge that if the button gets pressed while he’s on the move, the reward will get adjusted and he won’t lose anything.
We can also make the setup more complicated and the general approach will still work. For example, let’s say traffic conditions change unpredictably during the day, slowing Bob down or speeding him up. Then all we need to say is that the button does the calculation at the moment it’s pressed, taking into account the traffic conditions and projections known at that moment.
Are we unrealistically relying on the button having magical calculation abilities? Not necessarily. Formally speaking, we don’t need the button to do any calculation at all. Instead, we can write out Bob’s utility function as a big complicated case statement which is fixed from the start: “if the button gets pressed at time T when I’m at position P, then my reward will be calculated as...” and so on. Or maybe this calculation is done after the fact, by the actuary who pays out Bob’s reward, knowing everything that happened. The formal details are pretty flexible.
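If it helps, here’s a minimal Python sketch of that case statement. The function names and the specific press time (100 minutes before midnight) are my own, chosen only to reproduce the £10 example above:

```python
# Minimal sketch of Bob's reward rule (illustrative names and numbers).
DEADLINE = 24 * 60  # midnight, measured in minutes since the previous midnight

def reward_no_press(arrival_time):
    """£1 per minute remaining until midnight when Bob reaches Tower Bridge."""
    return max(DEADLINE - arrival_time, 0)

def reward_after_press(arrival_time, eta_bridge, eta_cathedral):
    """Same £1-per-minute reward for reaching St Paul's, plus a compensation
    term fixed at press time so Bob's expected reward is unchanged."""
    compensation = eta_cathedral - eta_bridge  # £1 per extra expected minute of travel
    return max(DEADLINE - arrival_time, 0) + compensation

# Reproducing the example: the button is pressed 100 minutes before midnight,
# when Bob is 20 minutes from the bridge and 30 minutes from the cathedral.
press_time = DEADLINE - 100
expected_if_unpressed = reward_no_press(press_time + 20)           # £80
expected_if_pressed = reward_after_press(press_time + 30, 20, 30)  # £70 + £10 = £80
assert expected_if_unpressed == expected_if_pressed == 80
```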
If only one entity is building AI, that reduces the risk from race dynamics, but increases the risk that the entity will become world dictator. I think the reduction in the first risk is smaller than the increase in the second. So to me the first-best option is that nobody has AI, the second-best is that everyone has it, and the worst is that one group monopolizes it.
But why do you think that people’s feelings of “yumminess” track the reality of whether an action is cooperate/cooperate? I’ve explained that it hasn’t been true throughout most of history: people have been able to feel “yummy” about actions that were pure defection. Maybe today the two coincide unusually well, but then that demands an explanation.
I think it’s just not true. There are too many ways to defect and end up better off, and people are too good at rationalizing why it’s ok for them specifically to take one of those ways. That’s why we need an evolving mechanism of social indoctrination, “goodness”, to make people choose the cooperative action even when it doesn’t feel “yummy” to them in the moment.
Most people do not actually like screwing over other people
I think this is very culturally dependent. For example, wars of conquest were considered glorious in most places and times, and that’s pretty much the ultimate form of screwing over other people. Or for another example, the first orphanages were built by early Christians; before that, orphans were usually just disposed of. Or recall how common slavery and serfdom have been throughout history.
Basically my view is that human nature without indoctrination into “goodness” is quite nasty by default. Empathy is indeed a feeling we have, and we can feel it deeply (...sometimes). But we ended up with this feeling mainly due to indoctrination into “goodness” over generations. We wouldn’t have nearly as much empathy if that indoctrination hadn’t happened, and it probably wouldn’t stay long term if that indoctrination went away.
Can’t comment much on the trans stuff, but the main thing I wanna say is that if you were lonely in high school, it wasn’t your fault. Don’t blame yourself for it. Society should do a much better job of making schools more accepting, or sorting kids into schools where they’ll be accepted, or at minimum just not forcing them to be there all the time. School does serve a purpose, but it’s still a miserable place for too many of the children confined in it, and that should be fixed.
In any case it’s great that you didn’t get hung up on “improving social skills” somewhere that didn’t accept you, and instead found a group that accepted you. This is the only real way, I think. Next I’d encourage you to find more such groups and live a fun life between them, unless of course you’re doing that already :-)
Hmm. In all your examples, Albert goes against “goodness” and ends up with less “yumminess” as a result. But my point was about a different kind of situation: some hypothetical Albert goes against “goodness” and actually ends up with more “yumminess”, but someone else ends up with less. What do you think about such situations?
I agree that the distinction is important. However, my view is that a lot of what you call “goodness” is part of society’s mechanism to ensure cooperate/cooperate. It helps other people get yummy stuff, not just you.
You can of course free yourself from that mechanism, and explicitly strategize how to get the most “yumminess” for yourself without ending up broke/addicted/imprisoned/etc. If the rest of society still follows “goodness”, that leads to defect/cooperate, and indeed you end up better off. But there’s a flaw in this plan.
Thanks for the suggestion! I went over the corrigibility paper again. The “utility indifference” proposal in the paper is similar to mine. Then in section 4.2 it says that the proposal is vulnerable to a “managing the news” problem, and that spooked me into deleting my post for a while.
Then I thought some more and restored the post again, because I no longer see why Bob would want to “manage the news”, e.g. by asking Carol to bump into Alice and press the button whenever there’s a jam on Abbey Road, and so on. My setup doesn’t seem to incentivize such things.
And then I read some more past discussions, and found that the “managing the news” problem was already solved back then in the same simple way, so my post is nothing new. Again back to drafts.
Hmm. Maybe not inoculation exactly, but the trope of creating an external enemy to achieve unity at home seems pretty popular (e.g. Watchmen, HPMOR) and it’s usually done by villains, so that doesn’t fill me with confidence.
I don’t think that violates free trade. Trump may think so, but that’s on him.
Putting a tariff on foreign cars certainly violates free trade, because it discriminates between domestic and foreign sellers. But requiring e.g. catalytic converters on all cars sold in your country, domestic and foreign alike, is okay. Banning leaded gasoline in your country is likewise okay, as long as you don’t discriminate on the origin of that gasoline. Countries should be allowed to pass laws like that.
ETA: looking at actual history, it seems different European countries banned leaded gasoline at different times, and the EU was already well established by then. Which seems to confirm my point.
I don’t agree with this. In my mind there’s a pretty clear line between good and evil in AI-related matters; it goes something like this:
- If you don’t want anyone to have AI, you’re probably on the side of good.
- If you want everyone equally to have AI, you may also be on the side of good. Though there’s a factual question of how well this will work out.
- But if you think that you and your band of good guys should have AI, but they and their band of bad guys shouldn’t—or at least, your band should get world domination first, because you’re good—then in my mind this crosses the line. It’s where bad things happen. And I don’t really make an exception if the “good guys” are MIRI, or OpenAI, or the US, or whichever group.
Isn’t the obvious solution to allow only early-screened eggs to be sold in Germany, no matter where they came from? And similarly for other kinds of goods that can be made in unethical or polluting ways: require both domestic producers and importers to prove that the goods were produced ethically/cleanly/etc. And this doesn’t require a shared policy between many countries; each country can impose such rules on its own.
Hi Felix! I’ve been thinking about the same topics for awhile, and came to pretty much the opposite conclusions.
most humans, who do have some nonzero preference for being altruistic along with their other goals
No nononono. So many people making this argument and it’s so wrong to me.
The thing is: altruistic urges aren’t the only “nonzero urges” that people have. People also have an urge to power, an urge to lord it over others. And for a lot of people it’s much stronger than the altruistic urge. So a world where most people are at the whim of “nonzero urges” of a handful of superpowerful people will be a world of power abuse, with maybe a little altruism here and there. And if you think people will have exit rights from the whims of the powerful, unfortunately history shows that it won’t necessarily be so.
advanced AI can plausibly allow you to make cheap, ultra-destructive weapons… until we hit a point where a few people are empowered to destroy the world at the expense of everyone else
I think we’ll never be at a point where a handful of people can defeat the strongest entities. Bioweapons are slow; drone swarms can be stopped by other drone swarms. I can’t imagine any weapon at all that would allow a terrorist cell to defeat an army of equal tech level. Well, maybe if you have a nanotech-ASI in a test tube, but we’re dead before then.
It is however possible that a handful of people can harm the strongest entities. And that state of affairs is desirable. When the powerful could exploit the masses with impunity in the past, they did so. But when firearms got invented, and a peasant could learn to shoot a knight dead, the masses became politically relevant. That’s basically why we have democracy now: the political power of the masses comes from their threat-value. (Not economic value! The masses were always economically valuable to the powerful. Without threat-value, that just leads to exploitation. You can be mining for diamonds and still be a slave.) So the only way the masses can avoid a world of total subjugation to the powerful in the future is by keeping threat-value. And for that, cheap offense-dominant weapons are a good thing.
Even though the U.S has unbelievable conventional military superiority to North Korea, for instance, the fact that they have nuclear weapons means that we cannot arbitrarily impose our preferences about how North Korea should act onto them… Roughly speaking, you can swap out “U.S” and “North Korea” with “Optimizers” and “Altruists”.
Making an analogy with altruism here is strange. North Korea is a horrifying oppressive regime. The fact that they can use the nuke threat to protect themselves, and their citizens have no analogous “gun” to hold to the head of their own government, is a perfect example of the power abuse that I described above. A world with big actors holding all threat-power will be a world of NKs.
But I don’t believe that inequality is intrinsically problematic from a welfare perspective: it’s far more important that the people at the bottom meet the absolute threshold for comfort than it is for a society’s Gini coefficient to be lower.
There’s a standard response to this argument: namely, inequality of money always tries to convert itself into inequality of power, through lobbying and media ownership and the like. Those at the bottom may have comfort, but that comfort will be short lived if they don’t have the power to ensure it. The “Gini coefficient of power” is the most important variable.
So yeah, to me these all converge on a pretty clear answer to your question. Concentration of power, specifically of threat-power and offense-power, would be very bad. Spreading it out would be good. That’s how the world looks to me.
I agree this distinction is very important, thank you for highlighting it. I’m in camp B and just signed the statement.
It seems to me that such “unhealthiness” is pretty normal for labor and property markets: when I read books from different countries and time periods, the fear of losing one’s job and home is a very common theme. Things were easier in some times and places, but these were rare.
So it might make more sense to focus on reasons for “unhealthiness” that apply generally. Overregulation can be the culprit in today’s US, but I don’t see it applying equally to India in the 1980s, Turkey in the 1920s, or England in the early 1800s (these are the settings of some books on my shelf whose protagonists had very precarious jobs and housing). And even if you defeat overregulation, the more general underlying reasons might still remain.
What are these general reasons? In the previous comment I said “exploitation”, but a more neutral way of putting it is that markets don’t always protect one particular side. Markets are two-sided: there’s no law of economics saying a healthy labor market must be a seller’s market, while housing must be a buyer’s market. Things could just as easily go the other way. So if we want to make the masses less threatened, it’s not enough to make markets more healthy overall; we need to empower the masses’ side of the market in particular.
Yeah, getting outbid or otherwise deprived of resources we need to survive is one of the main concerns to me as well. It can happen completely legally and within market rules, and if you add AI-enhanced manipulation and lobbying to the mix, it’s almost assured to happen.
One thing I’ve been wondering about is, how fixed is the “human minimum wage” really? I mean, in the limit it’s the cost of running an upload, which could be really low. And even if we stay biological, I can imagine lots of technologies that would allow us to live more cheaply: food-producing nanotech, biotech that makes us smaller and so on.
The scary thing though is that when such technologies appear, that’ll create pressure to use them. Everyone would have to choose between staying human and converting themselves into a bee in beehive #12345, living much more cheaply but with a similar quality of life because the hive is internet-enabled.