There is a difference between adoption as in “people are using it” and adoption as in “people are using it in an economically productive way”. I think the supermajority of productivity gains from LLMs is realized as pure consumer surplus right now.
We can send a spaceship beyond an event horizon and still care about what happens to it after it crosses, despite this being utterly irrelevant to our genetic fitness in any causal sense. If we are capable of developing such preferences, I don’t see any strong reason to develop a strictly single-universe decision theory.
Multiversal acausal trading is just a logical consequence of LDT, and I expect the majority of powerful agents to have an LDT-style decision theory, not an LDT-but-without-multiverse decision theory.
This is a really weird line of reasoning, because “multiversal trading” doesn’t mean “trading with the entire multiverse”; it means “finding a suitable trading partner somewhere in the multiverse”.
First of all, there is a very-broad-but-well-defined class of agents which humans belong to: the class of agents with indexical preferences. Indexical preferences are likely relatively weird across the multiverse, but they are simple enough to appear on any sufficiently broad list of preferences, as a certain sort of curiosity for multiversal decision theorists.
For all we know, our universe is going to end one way or another (heat death, cyclic collapse, Big Rip, or something else). Because we have indexical preferences, we would like to escape the universe while preserving subjective continuity. Because, ceteris paribus, we can be provided with very small shares of reality and still have subjective continuity, this creates large gains from trade with entities that don’t care about indexical matters.
(And if our universe is not going to end, that means we have effectively infinite compute, and therefore we actually can perform a lot of acausal trading.)
Next, there are large restrictions on the search space. As you said, both sides should be able to consider each other. I think, say, reasoning about a physics in which analogs of quantum computers can solve NP problems in polynomial time is quite feasible: we have a rich theory of approximation, and we are going to discover even more of it.
Another restriction is around preferences. If their preferences are for something we can actually produce, like molecular squiggles, then we should restrict ourselves to partners whose physics is sufficiently similar to ours.
We can go further and restrict ourselves to sufficiently concave preferences, so that we consider a broad class of agents, each of which may have some very specific, hard-to-specify peak of its utility function (like very precise molecular squiggles), but which share a common broad basin of good-enough states (they would like to have precise molecular squiggles, but they would consider it sufficient payment if we just produce a lot of granite spheres).
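To gesture at the magnitude, here is a toy sketch (my own illustration, with an arbitrary assumed utility function, not anything from the argument above) of why concave preferences create large gains from trade: with something like u(x) = x^0.1 over resource share, a 1% share already delivers most of the attainable utility.

```python
# Toy concave utility over resource share (the exponent 0.1 is an arbitrary
# stand-in for "sufficiently concave"): most of the attainable utility is
# reached with a tiny share, so ceding a small share is a cheap payment.
def u(share, concavity=0.1):
    return share ** concavity

for share in (1.0, 0.1, 0.01, 0.001):
    print(f"share={share:<6} fraction of max utility={u(share) / u(1.0):.2f}")
# share=1.0    fraction of max utility=1.00
# share=0.1    fraction of max utility=0.79
# share=0.01   fraction of max utility=0.63
# share=0.001  fraction of max utility=0.50
```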
Given all these restrictions, I don’t find it plausible that future human-aligned superintelligences with galaxies of computronium won’t find any way to execute such trades, given the incentives.
I don’t think it’s reasonable to expect such evidence to appear after such a short period of time. There was no hard evidence that electricity was useful in the sense you are talking about until the 1920s. Current LLMs are clearly not AGIs in the sense that they could integrate into the economy the way migrant labor does; therefore, productivity gains from LLMs are bottlenecked on users.
Problem 1 is the wrong objection.
CDT agents are not capable of cooperating in the Prisoner’s Dilemma, so they are selected out. EDT agents are not capable of refusing to pay in XOR blackmail (or, symmetrically, of paying in Parfit’s hitchhiker), so they are selected out.
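As a toy illustration of the selection story (my own sketch, specialized to the twin version of the dilemma where each agent plays an exact copy of itself, with payoff numbers I made up): CDT’s “dominant” defection just locks in the worse diagonal outcome, so CDT-style policies accumulate less payoff than LDT-style ones.

```python
# Twin Prisoner's Dilemma with assumed payoffs (C,C)=3, (D,C)=5, (C,D)=0, (D,D)=1.
# Against an exact copy the outcome always lands on the diagonal, so the CDT
# move of defecting (dominance reasoning) just picks the worse diagonal cell.
PAYOFF = {("C", "C"): 3, ("D", "C"): 5, ("C", "D"): 0, ("D", "D"): 1}

def total_payoff(action_vs_copy, rounds=100):
    # action_vs_copy: what the policy outputs when it knows it faces its own copy
    return sum(PAYOFF[(action_vs_copy, action_vs_copy)] for _ in range(rounds))

print("CDT-style (defects vs copy):   ", total_payoff("D"))  # 100
print("LDT-style (cooperates vs copy):", total_payoff("C"))  # 300
```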
I think you will be interested in this paper.
Yudkowsky wrote in his letter in Time magazine:
To visualize a hostile superhuman AI, don’t imagine a lifeless book-smart thinker dwelling inside the internet and sending ill-intentioned emails. Visualize an entire alien civilization, thinking at millions of times human speeds, initially confined to computers—in a world of creatures that are, from its perspective, very stupid and very slow.
And, if anything, That Alien Message was even earlier.
I think the proper guidance for an alignment researcher here is to:

1. Understand other people as made-of-gears cognitive engines, i.e., instead of “they don’t bother to apply effort for some reason”, think “they don’t bother to apply effort because they learned over the course of their life that extra effort is not rewarded”, or something like that. You don’t even need to build a comprehensive model; you can just list more than two hypotheses about the possible gears and not assume “no gears, just howling abyss”.

2. Realize that it would require supernatural intervention for them to have your priorities and approaches.
Systematically avoiding every situation where you’re risking someone’s life in exchange for a low-importance experience would add up to a high-importance, life-ruining experience for you (starving to death in your apartment, I guess?).
We could easily ban speeds above 15 km/h for all vehicles except ambulances. Nobody starves to death in that scenario; it’s just very inconvenient. We value the convenience lost in that scenario more than the lives lost in our actual reality, so we don’t ban high-speed vehicles.
Ordinal preferences are bad and insane and they are to be avoided.
What’s really wrong with utilitarianism is that you can’t actually sum utilities: it’s a type error. Utilities are only defined up to a positive affine transform, so what would their sum even mean?
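A minimal sketch of the type error (toy numbers of my own): a positive affine rescaling of one agent’s utility, which changes nothing about that agent’s preferences, flips which outcome “maximizes the sum”.

```python
# Two agents, two outcomes. Summing utilities is not invariant under the
# transformations that leave each agent's preferences unchanged.
u1 = {"A": 10.0, "B": 0.0}
u2 = {"A": 0.0, "B": 4.0}

def best_by_sum(u_first, u_second):
    return max(("A", "B"), key=lambda o: u_first[o] + u_second[o])

print(best_by_sum(u1, u2))  # -> 'A' (sums: A=10, B=4)

# Positive affine transform of u2: same preferences for agent 2, different "sum".
u2_rescaled = {o: 5.0 * v + 3.0 for o, v in u2.items()}
print(best_by_sum(u1, u2_rescaled))  # -> 'B' (sums: A=13, B=23)
```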
The problem, I think, is that humans naturally conflate two types of altruism. The first type is caring about other entities’ mental states. The second type is “game-theoretic” or “alignment-theoretic” altruism: a generalized notion of what it means to care about someone else’s values. Roughly, I think a good version of the second type of altruism requires you to bargain fairly on behalf of the entity you are being altruistic towards.
Let’s take the “World Z” thought experiment. The problem, from the perspective of the second type of altruism, is that the total utilitarian gets very large utility from this world, while all of its inhabitants, by premise, get very small utility per person, which is an unfair division of gains.
One may object: why not create entities who think that a very small share of the gains is fair? My answer is that if an entity can be satisfied with an infinitesimal share of the gains, it can also be satisfied with an infinitesimal share of anthropic measure, i.e., non-existence, and it’s more altruistic to look for more demanding entities to fill the universe with.
My general problem with animal welfare from a bargaining perspective is that most animals probably don’t have sufficient agency to have any sort of representative in the bargaining. We can imagine a CEV of shrimp which is negative utilitarian and wants to kill all shrimp, or a positive utilitarian one which thinks that even a very painful existence is worth it, or a CEV that prefers shrimp swimming in heroin, or something human-like, or something totally alien, and the aggregate of these guesses probably comes out to “do not torture, and otherwise do as you please”.
Nobody likes rules. Rules are costly: first, they constrict the space of available actions or force you to expend resources to do something. Second, rules are costly to follow: you need to pay attention, remember all the relevant rules, and work out all the ways they interact. Third, in real life, rules aren’t simple! Once you leave the territory of “don’t kill”, every rule has ambiguities, grey areas, and a strict dependency on the judgement of the enforcing authority.
If everybody were good and smart, we wouldn’t need rules. We would just publish “hey, lead is toxic, don’t put it in dishes” and everybody would stop using lead. Even if somebody continued using it, everybody else would ask questions, run analyses, and stop buying lead-tainted goods, and everybody still using lead would go bankrupt.
Likewise, if everybody were good and smart, we wouldn’t need authorities! Everybody would just do what’s best.
You don’t need to be a utility minimizer to do damage through rules. You just need to be the sort of person who likes to argue over rules in order to paralyze the functioning of almost any group. Something like 95% of invocations of authority outside the legal sphere can be described as “I have decided to stop this argument about rules, so I’m stopping it”. Heck, even the Supreme Court mostly functions this way.
There are different kinds of societies. In broad society, whose goal is just “live and let live”, sure, you can go for simple, universally enforceable rules. The same goes for the inclusive parts of society, like public libraries, parks, and streets. It doesn’t work for everything else. For example, there can’t be comprehensive rules about why you can or can’t be fired from a startup. There are CEOs and HR people who make judgements about how productive you are, et cetera, and if their judgement is unfavorable, you get fired. Sure, there is labor law, but your expenses (including reputational ones) on trying to stay are probably going to be much higher than whatever you can hope to get. There are some countries where it’s very hard to be fired, but such countries also don’t have a rich startup culture.
The concept of a weird machine is the closest to being useful here, and an important question is “how do we check that our system doesn’t form any weird machines?”.
I think a large part of this phenomenon is social status. I.e., if you die early, it means that you did something really embarrassingly stupid. Conversely, if you caused someone to die through, say, faulty construction or insufficient medical intervention, you should be really embarrassed. If you can’t prove or reliably signal that you are behaving reasonably, you are incentivized to behave unreasonably safely to signal your commitment to not doing stupid things. It’s also probably linked to the trade-off between social status and the desire for reproduction. It also explains why people who are worried about an endless list of harms are not that worried about human extinction: if everybody is dead, there is nobody to be embarrassed around.
Extreme sports plateauing is likely a weak indicator. Even as the risks decrease, you still need to enjoy the sport, and most people are not adrenaline junkies.
Most of my probability of survival under business-as-usual AGI (i.e., no major changes in technology compared to LLMs, no pause, no sudden miracles in theoretical alignment, and no sudden AI winters) belongs to the scenario where brain concept representations, efficiently learnable representations, and representations learnable by current ML models secretly have very large overlap, such that even if LLMs develop “alien thought patterns”, these come as an addition to the rest of their reasoning machinery, not its primary part. That would result in human values not only being easily learnable, but also being “atomic”, in the sense that complex concepts develop through combinations of atomic concepts instead of being rewritten from scratch.
This world is meaningfully different from the world where AIs are secretly easy to steer, because easy-to-steer AGIs run into the standard issues with wish-phrasing. I don’t believe that we are actually going to get that much better at wish-phrasing on the BAU path, and I don’t think we are going to leverage easy-to-steer early AGIs to reach a good ending (again, keeping the BAU assumption). If AGIs are too easy to steer, they are probably going to be pushed around by reward signals until they start to optimize for something simple and obviously useful, like survival or profit.
Criminal negligence leading to catastrophic consequences is already ostracized and prosecuted, because, well, it’s a crime.
There have been cases where LLMs were “lazier” during common vacation periods. EDIT: see here, for example.
I feel like this position is… flimsy? Insubstantial? It’s not that I disagree; I just don’t understand why you would want to articulate it in this way.
On the one hand, I don’t think the biological/non-biological distinction is very meaningful from a transhumanist perspective. Is an embryo, genetically modified to have a +9000 IQ, going to be meaningfully considered “transhuman” rather than “posthuman”? Are you still going to be you after one billion years of life extension? “Keeping the relevant features of you/humanity after enormous biological changes” seems qualitatively the same as “keeping the relevant features of you/humanity after mind uploading”; i.e., if you know at the gears level which features of biological brains are essential to keep, you have a rough understanding of what you should work on in uploading.
On the other hand, I totally agree that if you don’t feel adventurous and you don’t want to save the world at the price of your personality’s death, it would be a bad idea to undergo uploading with anything the closest-to-modern technology can provide. It just means that you need to wait for more technological progress. If we are in the ballpark of radical life extension, I don’t see any reason not to wait 50 years to perfect upload tech, and I don’t see any reason why 50 years wouldn’t be enough, conditional on at least normally expected technical progress.
The same goes for AIs. If we have children who are meaningfully different from us, and who can become even more different in a glorious transhumanist future, I don’t see a reason not to have AI children, conditional on their designs preserving all the important, relevant features we want to see in our children. The problem is that we are not on track to create such designs, not the conceptual existence of such designs.
And all of this seems straightforwardly deducible from, or anticipated by, the concept of transhumanism, i.e., the idea that the good future is one filled with beings capable of meaningfully saying that they were Homo sapiens and stopped being Homo sapiens at some point in their lives. When you say “I want radical life extension”, you immediately run into the question “wait, am I going to be me after one billion years of life extension?”, and you start down The Way through all the questions about self-identity, the essence of humanity, succession, et cetera.
Continuing my rant:
It is indeed surprising, because it indicates much more sanity than I would otherwise have expected.
Terrorism is not effective. The only ultimate result of 9/11, from the perspective of bin Laden’s goals, was “Al Qaeda got wiped off the face of the Earth and rival groups replaced it”. The only result of firebombing a datacenter would be “every single figure in AI safety gets branded a terrorist, destroying literally any chance of influencing relevant policy”.
I think the more correct picture is that it’s useful to have programmable behavior, and then the programmable system suddenly becomes a Turing-complete weird machine, and some of the resulting programs are terminal-goal-oriented, which are favored by selection pressures: terminal goals are self-preserving.
Humans in their native environment have programmable behavior in the form of social regulation, information exchange, and communicated instructions; if you add a sufficient amount of computational power to this system, you can get a very wide spectrum of behaviors.
I think this is the general picture of inner misalignment.
The absolute sense comes from the absolute nature of taking actions, not from an absolute nature of logical correlation. I.e., in a Prisoner’s Dilemma with payoffs (5,5)(10,1)(2,2), you should defect if your counterparty is capable of acting conditional on your action in less than 75% of cases; that is a fairly high logical correlation, but the expected value is still higher if you defect.
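Spelling out the arithmetic behind the 75% figure, under my reading of the payoff notation ((C,C)=(5,5), (D,C)=(10,1), (D,D)=(2,2)) and the assumption that with probability p the counterparty’s action matches yours and otherwise comes out opposite:

```python
# Expected value of each action when the counterparty's action matches yours
# with probability p and comes out opposite otherwise (assumed interpretation).
def ev_cooperate(p):
    return 5 * p + 1 * (1 - p)   # matched: (C,C) -> 5; mismatched: (C,D) -> 1

def ev_defect(p):
    return 2 * p + 10 * (1 - p)  # matched: (D,D) -> 2; mismatched: (D,C) -> 10

# Cooperating wins iff 4p + 1 > 10 - 8p, i.e. p > 0.75.
for p in (0.70, 0.75, 0.80):
    print(f"p={p}: EV(C)={ev_cooperate(p):.2f}, EV(D)={ev_defect(p):.2f}")
# p=0.7:  EV(C)=3.80, EV(D)=4.40  -> defect
# p=0.75: EV(C)=4.00, EV(D)=4.00  -> indifferent
# p=0.8:  EV(C)=4.20, EV(D)=3.60  -> cooperate
```

Under that reading the break-even point is exactly p = 3/4, matching the 75% in the comment.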