There is a difference between adoption as in “people are using it” and adoption as in “people are using it in an economically productive way”. I think the supermajority of productivity gains from LLMs is realized as pure consumer surplus right now.
We can send a spaceship beyond an event horizon and still care about what happens to it after it crosses, despite this being utterly irrelevant to our genetic fitness in any causal sense. If we are capable of developing such preferences, I don’t see any strong reason to develop a strictly single-universe decision theory.
Multiversal acausal trading is just a logical consequence of LDT, and I expect the majority of powerful agents to have an LDT-style decision theory, not an LDT-but-without-multiverse decision theory.
This is a really weird line of reasoning, because “multiversal trading” doesn’t mean “trading with the entire multiverse”; it means “finding a suitable trading partner somewhere in the multiverse”.
First of all, there is a very-broad-but-well-defined class of agents which humans belong to: the class of agents with indexical preferences. Indexical preferences are likely relatively weird across the multiverse, but they are simple enough to appear on any sufficiently broad list of preferences, as a certain sort of curiosity for multiversal decision theorists.
For all we know, our universe is going to end one way or another (heat death, cyclic collapse, Big Rip, or something else). Because we have indexical preferences, we would like to escape the universe while preserving subjective continuity. Because, ceteris paribus, we can be provided with very small shares of reality and still have subjective continuity, this creates large gains from trade with entities that don’t care about indexical matters.
(And if our universe is not going to end, that means we have effectively infinite compute, and therefore we actually can perform a lot of acausal trading.)
Next, there are large restrictions on the search space. As you said, both sides should be able to consider each other. I think, say, reasoning about a physics in which analogs of quantum computers can solve NP problems in polynomial time is quite feasible: we have a rich theory of approximation, and we are going to discover even more of it.
Another restriction is around preferences. If their preferences are for something we can actually produce, like molecular squiggles, then we should restrict ourselves to partners whose physics is sufficiently similar to ours.
We can go further and restrict ourselves to sufficiently concave preferences, so that we consider a broad class of agents, each of which may have some very specific, hard-to-specify peak of its utility function (like very precise molecular squiggles), but which share a common broad basin of good-enough states (they would like to have precise molecular squiggles, but they would consider it sufficient payment if we just produce a lot of granite spheres).
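To gesture at the magnitude, here is a toy sketch (my own illustration, with an arbitrary assumed utility function, not anything from the argument above) of why concave preferences create large gains from trade: with something like u(x) = x^0.1 over resource share, a 1% share already delivers most of the attainable utility.

```python
# Toy concave utility over resource share (the exponent 0.1 is an arbitrary
# stand-in for "sufficiently concave"): most of the attainable utility is
# reached with a tiny share, so ceding a small share is a cheap payment.
def u(share, concavity=0.1):
    return share ** concavity

for share in (1.0, 0.1, 0.01, 0.001):
    print(f"share={share:<6} fraction of max utility={u(share) / u(1.0):.2f}")
# share=1.0    fraction of max utility=1.00
# share=0.1    fraction of max utility=0.79
# share=0.01   fraction of max utility=0.63
# share=0.001  fraction of max utility=0.50
```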
Given all these restrictions, I don’t find it plausible that future human-aligned superintelligences with galaxies of computronium won’t find any way to execute such trades, given the incentives.
I don’t think it’s reasonable to expect such evidence to appear after such a short period of time. There was no hard evidence that electricity was useful in the sense you are talking about until the 1920s. Current LLMs are clearly not AGIs in the sense that they could integrate into the economy the way migrant labor does; therefore, productivity gains from LLMs are bottlenecked on users.
Problem 1 is the wrong objection.
CDT agents are not capable of cooperating in the Prisoner’s Dilemma, so they are selected out. EDT agents are not capable of refusing to pay in XOR blackmail (or, symmetrically, of paying in Parfit’s hitchhiker), so they are selected out.
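As a toy illustration of the selection story (my own sketch, specialized to the twin version of the dilemma where each agent plays an exact copy of itself, with payoff numbers I made up): CDT’s “dominant” defection just locks in the worse diagonal outcome, so CDT-style policies accumulate less payoff than LDT-style ones.

```python
# Twin Prisoner's Dilemma with assumed payoffs (C,C)=3, (D,C)=5, (C,D)=0, (D,D)=1.
# Against an exact copy the outcome always lands on the diagonal, so the CDT
# move of defecting (dominance reasoning) just picks the worse diagonal cell.
PAYOFF = {("C", "C"): 3, ("D", "C"): 5, ("C", "D"): 0, ("D", "D"): 1}

def total_payoff(action_vs_copy, rounds=100):
    # action_vs_copy: what the policy outputs when it knows it faces its own copy
    return sum(PAYOFF[(action_vs_copy, action_vs_copy)] for _ in range(rounds))

print("CDT-style (defects vs copy):   ", total_payoff("D"))  # 100
print("LDT-style (cooperates vs copy):", total_payoff("C"))  # 300
```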
I think you will be interested in this paper.
Yudkowsky wrote in his letter in Time magazine:
To visualize a hostile superhuman AI, don’t imagine a lifeless book-smart thinker dwelling inside the internet and sending ill-intentioned emails. Visualize an entire alien civilization, thinking at millions of times human speeds, initially confined to computers—in a world of creatures that are, from its perspective, very stupid and very slow.
And, if anything, That Alien Message was even earlier.
I think the proper guidance for an alignment researcher here is to:

1. Understand other people as made-of-gears cognitive engines, i.e., instead of “they don’t bother to apply effort for some reason”, think “they don’t bother to apply effort because they learned over the course of their life that extra effort is not rewarded”, or something like that. You don’t even need to build a comprehensive model; you can just list more than two hypotheses about the possible gears and not assume “no gears, just howling abyss”.

2. Realize that it would require supernatural intervention for them to have your priorities and approaches.
Systematically avoiding every situation where you’re risking someone’s life in exchange for a low-importance experience would add up to a high-importance, life-ruining experience for you (starving to death in your apartment, I guess?).
We could easily ban speeds above 15 km/h for all vehicles except ambulances. Nobody starves to death in that scenario; it’s just very inconvenient. We value the convenience lost in that scenario more than the lives lost in our actual reality, so we don’t ban high-speed vehicles.
Ordinal preferences are bad and insane and they are to be avoided.
What’s really wrong with utilitarianism is that you can’t actually sum utilities: it’s a type error. Utilities are only defined up to a positive affine transform, so what would their sum even mean?
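A minimal sketch of the type error (toy numbers of my own): a positive affine rescaling of one agent’s utility, which changes nothing about that agent’s preferences, flips which outcome “maximizes the sum”.

```python
# Two agents, two outcomes. Summing utilities is not invariant under the
# transformations that leave each agent's preferences unchanged.
u1 = {"A": 10.0, "B": 0.0}
u2 = {"A": 0.0, "B": 4.0}

def best_by_sum(u_first, u_second):
    return max(("A", "B"), key=lambda o: u_first[o] + u_second[o])

print(best_by_sum(u1, u2))  # -> 'A' (sums: A=10, B=4)

# Positive affine transform of u2: same preferences for agent 2, different "sum".
u2_rescaled = {o: 5.0 * v + 3.0 for o, v in u2.items()}
print(best_by_sum(u1, u2_rescaled))  # -> 'B' (sums: A=13, B=23)
```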
The problem, I think, is that humans naturally conflate two types of altruism. The first type is caring about other entities’ mental states. The second type is “game-theoretic” or “alignment-theoretic” altruism: a generalized notion of what it means to care about someone else’s values. Roughly, I think a good version of the second type of altruism requires you to bargain fairly on behalf of the entity you are being altruistic towards.
Let’s take the “World Z” thought experiment. The problem, from the perspective of the second type of altruism, is that the total utilitarian gets very large utility from this world, while all of its inhabitants, by premise, get very small utility per person, which is an unfair division of gains.
One may object: why not create entities who think that a very small share of the gains is fair? My answer is that if an entity can be satisfied with an infinitesimal share of the gains, it can also be satisfied with an infinitesimal share of anthropic measure, i.e., non-existence, and it’s more altruistic to look for more demanding entities to fill the universe with.
My general problem with animal welfare from a bargaining perspective is that most animals probably don’t have sufficient agency to have any sort of representative in the bargaining. We can imagine a CEV of shrimp which is negative utilitarian and wants to kill all shrimp, or a positive utilitarian one which thinks that even a very painful existence is worth it, or a CEV that prefers shrimp swimming in heroin, or something human-like, or something totally alien, and the aggregate of these guesses probably comes out to “do not torture, and otherwise do as you please”.
Nobody likes rules. Rules are costly: first, they constrict the space of available actions or force you to expend resources to do something. Second, rules are costly to follow: you need to pay attention, remember all the relevant rules, and work out all the ways they interact. Third, in real life, rules aren’t simple! Once you leave the territory of “don’t kill”, every rule has ambiguities, grey areas, and a strict dependency on the judgement of the enforcing authority.
If everybody were good and smart, we wouldn’t need rules. We would just publish “hey, lead is toxic, don’t put it in dishes” and everybody would stop using lead. Even if somebody continued using it, everybody else would ask questions, run analyses, and stop buying lead-tainted goods, and everybody still using lead would go bankrupt.
Likewise, if everybody were good and smart, we wouldn’t need authorities! Everybody would just do what’s best.
You don’t need to be a utility minimizer to do damage through rules. You just need to be the sort of person who likes to argue over rules in order to paralyze the functioning of almost any group. Something like 95% of invocations of authority outside the legal sphere can be described as “I have decided to stop this argument about rules, so I’m stopping it”. Heck, even the Supreme Court mostly functions this way.
There are different kinds of societies. In broad society, whose goal is just “live and let live”, sure, you can go for simple, universally enforceable rules. The same goes for the inclusive parts of society, like public libraries, parks, and streets. It doesn’t work for everything else. For example, there can’t be comprehensive rules about why you can or can’t be fired from a startup. There are CEOs and HR people who make judgements about how productive you are, et cetera, and if their judgement is unfavorable, you get fired. Sure, there is labor law, but your expenses (including reputational ones) on trying to stay are probably going to be much higher than whatever you can hope to get. There are some countries where it’s very hard to be fired, but such countries also don’t have a rich startup culture.
The concept of a weird machine is the closest to being useful here, and an important question is “how do we check that our system doesn’t form any weird machines?”.
I think a large part of this phenomenon is social status. I.e., if you die early, it means that you did something really embarrassingly stupid. Conversely, if you caused someone to die through, say, faulty construction or insufficient medical intervention, you should be really embarrassed. If you can’t prove or reliably signal that you are behaving reasonably, you are incentivized to behave unreasonably safely to signal your commitment to not doing stupid things. It’s also probably linked to the trade-off between social status and the desire for reproduction. It also explains why people who are worried about an endless list of harms are not that worried about human extinction: if everybody is dead, there is nobody to be embarrassed around.
Extreme sports plateauing is likely a weak indicator. Even as the risks decrease, you still need to enjoy the sport, and most people are not adrenaline junkies.
Most of my probability of survival under business-as-usual AGI (i.e., no major changes in technology compared to LLMs, no pause, no sudden miracles in theoretical alignment, and no sudden AI winters) belongs to the scenario where brain concept representations, efficiently learnable representations, and representations learnable by current ML models secretly have very large overlap, such that even if LLMs develop “alien thought patterns”, these come as an addition to the rest of their reasoning machinery, not its primary part. That would result in human values not only being easily learnable, but also being “atomic”, in the sense that complex concepts develop through combinations of atomic concepts instead of being rewritten from scratch.
This world is meaningfully different from the world where AIs are secretly easy to steer, because easy-to-steer AGIs run into the standard issues with wish-phrasing. I don’t believe that we are actually going to get that much better at wish-phrasing on the BAU path, and I don’t think we are going to leverage easy-to-steer early AGIs to reach a good ending (again, keeping the BAU assumption). If AGIs are too easy to steer, they are probably going to be pushed around by reward signals until they start to optimize for something simple and obviously useful, like survival or profit.
Criminal negligence leading to catastrophic consequences is already ostracized and prosecuted, because, well, it’s a crime.
There have been cases where LLMs were “lazier” during common vacation periods. EDIT: see here, for example.
I feel like this position is… flimsy? Insubstantial? It’s not that I disagree; I just don’t understand why you would want to articulate it in this way.
On the one hand, I don’t think the biological/non-biological distinction is very meaningful from a transhumanist perspective. Is an embryo, genetically modified to have a +9000 IQ, going to be meaningfully considered “transhuman” rather than “posthuman”? Are you still going to be you after one billion years of life extension? “Keeping the relevant features of you/humanity after enormous biological changes” seems qualitatively the same as “keeping the relevant features of you/humanity after mind uploading”; i.e., if you know at the gears level which features of biological brains are essential to keep, you have a rough understanding of what you should work on in uploading.
On the other hand, I totally agree that if you don’t feel adventurous and you don’t want to save the world at the price of your personality’s death, it would be a bad idea to undergo uploading with anything the closest-to-modern technology can provide. It just means that you need to wait for more technological progress. If we are in the ballpark of radical life extension, I don’t see any reason not to wait 50 years to perfect upload tech, and I don’t see any reason why 50 years wouldn’t be enough, conditional on at least normally expected technical progress.
The same goes for AIs. If we have children who are meaningfully different from us, and who can become even more different in a glorious transhumanist future, I don’t see a reason not to have AI children, conditional on their designs preserving all the important, relevant features we want to see in our children. The problem is that we are not on track to create such designs, not the conceptual existence of such designs.
And all of this seems straightforwardly deducible from, or anticipated by, the concept of transhumanism, i.e., the idea that the good future is one filled with beings capable of meaningfully saying that they were Homo sapiens and stopped being Homo sapiens at some point in their lives. When you say “I want radical life extension”, you immediately run into the question “wait, am I going to be me after one billion years of life extension?”, and you start down The Way through all the questions about self-identity, the essence of humanity, succession, et cetera.
Continuing my rant:
It is indeed surprising, because it indicates much more sanity than I would otherwise have expected.
Terrorism is not effective. The only ultimate result of 9/11, from the perspective of bin Laden’s goals, was “Al Qaeda got wiped off the face of the Earth and rival groups replaced it”. The only result of firebombing a datacenter would be “every single figure in AI safety gets branded a terrorist, destroying literally any chance of influencing relevant policy”.
I think the more correct picture is that it’s useful to have programmable behavior, and then the programmable system suddenly becomes a Turing-complete weird machine, and some of the resulting programs are terminal-goal-oriented, which are favored by selection pressures: terminal goals are self-preserving.
Humans in their native environment have programmable behavior in the form of social regulation, information exchange, and communicated instructions; if you add a sufficient amount of computational power to this system, you can get a very wide spectrum of behaviors.
I think this is the general picture of inner misalignment.
The absolute sense comes from the absolute nature of taking actions, not from an absolute nature of logical correlation. I.e., in a Prisoner’s Dilemma with payoffs (5,5)(10,1)(2,2), you should defect if your counterparty is capable of acting conditional on your action in less than 75% of cases; that is a fairly high logical correlation, but the expected value is still higher if you defect.
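Spelling out the arithmetic behind the 75% figure, under my reading of the payoff notation ((C,C)=(5,5), (D,C)=(10,1), (D,D)=(2,2)) and the assumption that with probability p the counterparty’s action matches yours and otherwise comes out opposite:

```python
# Expected value of each action when the counterparty's action matches yours
# with probability p and comes out opposite otherwise (assumed interpretation).
def ev_cooperate(p):
    return 5 * p + 1 * (1 - p)   # matched: (C,C) -> 5; mismatched: (C,D) -> 1

def ev_defect(p):
    return 2 * p + 10 * (1 - p)  # matched: (D,D) -> 2; mismatched: (D,C) -> 10

# Cooperating wins iff 4p + 1 > 10 - 8p, i.e. p > 0.75.
for p in (0.70, 0.75, 0.80):
    print(f"p={p}: EV(C)={ev_cooperate(p):.2f}, EV(D)={ev_defect(p):.2f}")
# p=0.7:  EV(C)=3.80, EV(D)=4.40  -> defect
# p=0.75: EV(C)=4.00, EV(D)=4.00  -> indifferent
# p=0.8:  EV(C)=4.20, EV(D)=3.60  -> cooperate
```

Under that reading the break-even point is exactly p = 3/4, matching the 75% in the comment.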