Alternative hypothesis: when you recontextualize, you train the model that "you will get away with X, given this signal (encouragement to do X in the system prompt)". When the model sees no such signal, it concludes that it won't get away with bad behavior and behaves well.
I wonder what would happen if you recontextualized with a pair of prompts: "your actions and thoughts will be recorded and evaluated" vs. "you are acting completely autonomously, with no oversight".
Stanislaw Lem once wrote (I translated this from Russian, since I couldn't find the passage in an openly available source):