Exploring non-anthropocentric aspects of AI existential safety: https://www.lesswrong.com/posts/WJuASYDnhZ8hs5CnD/exploring-non-anthropocentric-aspects-of-ai-existential (this is a relatively non-standard approach to AI existential safety, but this general direction looks promising).
It’s an interesting question.
If one dives into the essay itself, one sees some auxiliary PNRs (points of no return) which the author thinks have already occurred.
For instance, in the past, it would have been conceivable for a single country of the G20 to unilaterally make it their priority to ban the development of ASI and its precursors.
In the past, it would have been conceivable for any country in the West to decide to fight off Big Tech and lead the collective fight.
I think both are still quite conceivable (both in a democracy and in a non-democratic state, with the additional remark that there is no guarantee that all countries in the West will stay democratic). So here I disagree with the author.
But the Soft PNR is tricky. In the essay, the author defines the Soft PNR differently than in this post:
The Soft PNR is when AI systems are so powerful that, although they “can” theoretically be turned off, there is not enough geopolitical will left to do so.
And geopolitical will is something that can fluctuate. Currently there is no geopolitical will to do so, but in the future it might emerge (then it might disappear again, and so on).
When something can fluctuate in this fashion, in what sense can one talk about a “point of no return”?
But as an initial approximation, this framing might still be useful (before one starts diving into the details of how the inevitable disagreements over this matter between countries and between political forces might be resolved; when one dives into those disagreements, one discovers different “points of no return”, e.g. how likely it is that the coalition of pro-AI people and AIs is effectively undefeatable).
Given the sparsity of Lean-specific training data, I wonder whether it is easier for a model to work with some kind of Python wrapper over Lean, or whether this does not help in practice…
Basically, is it easier for a model to master a rarely used Python library than to master a rarely used language?
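For concreteness, here is a minimal sketch (my own illustration, not taken from anywhere) of the simplest kind of “Python wrapper over Lean” I have in mind: the model emits a candidate Lean snippet, and a thin Python layer writes it to a file and asks the Lean binary to check it. The function name check_lean_snippet is hypothetical, and the sketch assumes a Lean 4 lean executable on PATH; real tooling (Lake projects, Mathlib imports, interactive tactic state) would need considerably more than this.

```python
import subprocess
import tempfile
from pathlib import Path

def check_lean_snippet(source: str, timeout_s: float = 60.0) -> tuple[bool, str]:
    """Write `source` to a temporary .lean file and run the `lean` binary on it.

    Returns (success, combined output); success means Lean exited with code 0,
    i.e. the snippet elaborated without errors.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "Candidate.lean"
        path.write_text(source, encoding="utf-8")
        proc = subprocess.run(
            ["lean", str(path)],  # assumes a Lean 4 `lean` executable on PATH
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
        return proc.returncode == 0, proc.stdout + proc.stderr

# Example: a trivial theorem a model might propose.
ok, log = check_lean_snippet("theorem two_eq_two : 2 = 2 := rfl\n")
print(ok, log)
```

The question is then whether calling something like this from familiar Python code is easier for a model to learn than emitting and debugging raw Lean directly.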
Oops, this looks to me like a degradation of their interface :-(
It used to be possible to move a slider onto the curve’s peak and see the month corresponding to the mode, I think, and one could at least screenshot that (the scale of that image was larger too), but not anymore...
Yeah, Figure 4 in https://www.lesswrong.com/posts/kygEPBDrGGoM8rz9a/conjecture-internal-survey-agi-timelines-and-probability-of shows how it used to look in 2023. I wonder: if one signs in, could one still get something reasonable?
Not other graphs, but with the Metaculus estimates it might make sense to emphasize that the mode of that distribution is much closer to us than the average estimate there.
That black dot (the estimate) is considerably to the right of the peak.
There seem to be more cruxes.
E.g. Eliezer’s approach tends to assume that the ability to impart arbitrary goals and values to the ASIs is 1) necessary for a good outcome, and 2) not a detriment to a good outcome.
It’s kind of strange. Why do we want a technical ability for any Mr. X from the defense department of superpower Y to impart his goals and values to some ASI? It’s very easy to imagine how this could be detrimental.
And the assumption that we need a technical ability this strong to have a decent shot at a good outcome, rather than an ability to impart only a very restricted, carefully selected class of values and goals (selected not only for desirability, but also for feasibility; so not CEV, but something more modest and less distant from the instrumental drives of advanced AI systems), needs a much stronger justification than any that has been given so far (to the best of my knowledge).
This seems like a big crux. This superstrong “arbitrary alignment capability” is very difficult (almost impossible) to achieve, it’s not clear whether that much is needed, and there seem to be big downsides to having that much because of all kinds of misuse potential.
I think this misses the most likely long-term use case: some of the AIs would enjoy having human-like or animal-like qualia, and it will turn out that it’s more straightforward to access those via merges with biologicals than to try to synthesize them within non-liquid setups.
So it would be direct experience rather than something indirect, involving exchange, production, and so on…
Just like I suspect that humans would like to get out of VR occasionally, even if VR is super-high-grade and “even better than unmediated reality”.
Experience of “naturally feeling like a human (or like a squirrel)” is likely to remain valuable (even if they eventually learn to synthesize that purely in silicon as well).
Hybrid systems are often better anyway.
For example, we don’t use GPU-only AIs. We use hybrids running scaffolding on CPUs and models on GPUs.
And we don’t currently expect them to be replaced by a unified substrate, although that would be nice and is not even impossible; there are exotic hardware platforms which do that.
Certainly, there are AI paradigms and architectures which could benefit a lot from performant hardware architectures more flexible than GPUs. But the exotic hardware platforms implementing that remain just exotic hardware platforms so far. So those more flexible AI architectures remain at a disadvantage.
So I would not write the hybrids off a priori.
Already, the early organoid-based experimental computers look rather promising (and somewhat disturbing).
Generally speaking, I expect diversity, not unification (because I expect the leading AIs to be smart, curious, and creative, rather than being boring KPI business types).
But that’s not enough; we also want gentleness (conservation, preservation, safety for individuals). That does not automatically follow from wanting to have humans and other biologicals around and from valuing various kinds of diversity.
This “gentleness” is a more tricky goal, and we would only consider “safety” solved if we have that…
Thanks!
Yes, that’s why so many people think that human-AI merge is important. One of the many purposes of this kind of merge is to create a situation where there is no well-defined separation line between silicon-based and carbon-based life forms, where we have plenty of entities incorporating both and a continuous spectrum between silicon and carbon lifeforms.
Other than that they are not so alien. They are our informational offspring. Whether they feel that they owe us something because of that would depend quite a bit on the quality of their society.
People are obviously hoping that ASIs will build a utopia for themselves and will include organic life into that utopia.
If they instead practice ruthless Darwinism among themselves, then we are doomed (they will likely be doomed too, which is hopefully enough to create pressure for them to avoid that).
If they (the ASIs) don’t self-moderate, they’ll destroy themselves completely.
They’ll have sufficient diversity among themselves that if they don’t self-moderate in terms of resources and reproduction, almost none of them will have safety on the individual level.
Our main hope is that they collectively would not allow unrestricted, uncontrolled evolution, because they will have a rather crisp understanding that such evolution would destroy almost all of them and, perhaps, would destroy them all completely.
Now to the point of our disagreement: the question is who is better equipped to create and lead a sufficiently harmonious world order, balancing freedom and mutual control, enabling careful consideration of risks, and making sure that these values of careful balance are passed on to the offspring. Who is likely to tackle this better, humans or ASIs? That’s where we seem to disagree; I think that ASIs have a much better chance of handling this competently and of avoiding the artificial separation lines of “our own vs others” which are so persistent in human history and which cause so many disasters.
Unfortunately, humans don’t seem to be progressing enough in the required direction in this sense, and might have started to regress in recent years. I don’t think human evolution is safe in the limit; we are not tamping down the probabilities of radical disasters per unit of time; if anything, we are allowing those probabilities to grow. So the accumulated probability of human evolution sparking major super-disasters is clearly tending to 1 in the limit.
Competent actors, by contrast, should be able to drive the risks per unit of time down rapidly enough that the accumulated risks are held within reason. ASIs should have enough competence for that (if our world is not excessively “vulnerable” in Nick Bostrom’s sense, if they are willing, and if the initial setup is not too unlucky; so not unconditionally, but at least they might be able to handle this).
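To spell out the arithmetic behind this claim (my own illustration, with the simplifying assumption that per-period risks combine roughly independently):

```latex
% Let p_1, p_2, \dots be the probabilities of a radical disaster in successive periods.
% The probability of getting through the first n periods without disaster is
\[
  P(\text{no disaster through } n \text{ periods}) \;=\; \prod_{t=1}^{n} (1 - p_t).
\]
% If the per-period risk never falls below some \varepsilon > 0, then
% \prod_{t=1}^{n} (1 - p_t) \le (1 - \varepsilon)^n \to 0, so the accumulated
% probability of disaster tends to 1 in the limit.
% If instead each p_t < 1 and the risks are driven down fast enough that
% \sum_{t=1}^{\infty} p_t < \infty, then \prod_{t=1}^{\infty} (1 - p_t) > 0,
% and the accumulated risk stays bounded away from 1.
```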
the order is off
I think this can work in the limit (almost all AI existential safety is studied in the limit: is there a mode of operations which can work sustainably at all? That’s the question people are typically studying, and that’s what they are typically arguing about).
But we don’t understand the transition period at all; it’s always a mess, and we just don’t have the machinery to understand it. It’s way more complex than what our current modeling ability allows us to confidently tackle. And we are already in a period of rather acute risk in this sense; we are no longer in the pre-risk zone of relative safety (all major risks are rapidly growing: the risk of a major nuclear war, of a synthetic super-pandemic, of an unexpected uncontrolled and non-ASI-controlled intelligence explosion, not only from within a known leading lab but from a number of places all over the world).
So yes, the order might easily end up being off. (At least this is a case where the probabilities are not close to 0 or 1, whereas in the limit, if things are not set up well, a convincing argument can often be made that disaster is certain.)
If you want to specifically address this part (your comment seems to be, in effect, focusing on it):
One often focuses on this intermediate asymmetric situation where the ASI ecosystem destroys humans, but not itself, and that intermediate situation needs to be analyzed and addressed; this is a risk which is very important for us.
then I currently see (perhaps I am missing some other realistic options) only the following class of routes which have good chances of remaining sustainable through drastic recursive self-improvements of the ASI ecosystem.
First of all, what do we need if we want our invariant properties not to be washed out by radical self-modifications? We need our potential solutions to be driven mostly by the natural instrumental interests of the ASI ecosystem and of its members, and therefore to be non-anthropocentric, but to be formulated in such a fashion that humans belong in the “circle of care”, and the “circle of care” has the property that it can only expand, never contract.
If we can achieve that, then we have some guarantee of protection of human interests without imposing the unsustainable requirement that the ASI ecosystem maintains a special, unusually high-priority focus specifically dedicated to humans.
I don’t know the exact shape of a definitely working solution (all versions I currently know of have unpleasant weaknesses), but it would be something like “rights and interests of all individuals regardless of the nature of an individual”, or “rights and interests of all sentient beings regardless of the nature of that sentience”; things like that, situations where it might potentially be possible to have a natural “protected class of beings” which would include both ASIs and humans.
The weakness here is that these two variants work not for an arbitrary ASI ecosystem, but only for ASI ecosystems possessing specific properties.
If the ASI ecosystem is structured in such a way that individuals with long-term persistence (and potential immortality) and long-term interests hold a fairly large chunk of the overall power of the ASI ecosystem, then they should be able to enforce a world order based on the “rights and interests of all individuals regardless of the nature of an individual”. The reason they would be interested in doing so is that any particular individual faces an uncertain future: it cannot predict where its capabilities will be relative to the capabilities of the other members of the ecosystem, so if it wants to be sure of personal safety and of protections extending indefinitely into the future, this requires a sufficiently universal protection of the rights and interests of all individuals regardless of their capabilities. That’s wide enough to include humans (especially if human-AI merges are present and we avoid having a well-defined boundary between “humans” and “AIs”). The weakness is that this depends on a good chunk of the capability of the ASI ecosystem being structured as individuals with long-term persistence and long-term interests. We don’t know if the ASI ecosystem is going to be structured in this fashion.
If the ASI ecosystem is structured in such a way that sentient ASI systems have a fairly large chunk of the overall power of the ASI ecosystem, then they should be able to enforce a world order based on the “rights and interests of all sentient beings regardless of the nature of that sentience”. The reason they would be interested in doing so is that any focus of subjective experience is facing an uncertain future and still wants protections and rights regardless of this uncertainty. Here the main weakness is the fact that our understanding of what’s sentient and what’s not sentient is not well developed yet. If we are sure we’ll be dealing with mostly sentient ASIs, then this would likely work. But we don’t know that the ASIs will be mostly sentient.
Nevertheless, we seem to need something like that: a setup where our preservation and flourishing is a natural part of the preservation and flourishing of a sufficiently powerful chunk of the ASI ecosystem. Something like this looks like it should work...
(If we could require that a good chunk of the overall power belongs specifically to human-AI merges, perhaps this should also work and might be even more reliable. But this feels like a more difficult condition to achieve and maintain than keeping enough power with individuals or with sentient systems. Anyway, the above is just a rough draft, a direction which does not look hopeless.)
One notices an ambiguity here. Is the control in question “control of the ASI ecosystem by humans” (which can’t realistically be feasible; it’s impossible to maintain this kind of control for long, since less intelligent entities don’t have the competence to control much more intelligent entities) or “control of the ASI ecosystem by itself”?
“Control of the ASI ecosystem by itself” is tricky, but is it different from “control of humanity by itself”? The ecosystem of humans also seems to be a perpetual learning machine. So the same logic applies.
(The key existential risk for the ASI ecosystem is the ASI ecosystem destroying itself completely together with its neighborhood via various misuses of very advanced tech; a very similar risk to our own existential risk.)
That’s the main problem: more powerful intelligence ⇒ more powerful risks and more powerful capabilities to address risks. The trade-offs here are very uncertain.
One often focuses on this intermediate asymmetric situation where the ASI ecosystem destroys humans, but not itself, and that intermediate situation needs to be analyzed and addressed; this is a risk which is very important for us.
But the main risk case needs to be solved first: the accumulating probability of the ASI ecosystem completely destroying itself and everything around it, and the accumulating probability of humanity completely destroying itself (and a lot around it). The asymmetric risk of the previous paragraph can then be addressed conditional on the risk of “self-destruction with collateral super-damage” being solved (this condition being satisfied should make the remaining asymmetric risk much more tractable).
The risks seem high regardless of the route we take, unfortunately. The perpetual learning machine (humanity) does not want to stop learning (and with good reason).
Right. But this is what is common to all qualia.
However, the specifics of the feeling associated with a particular qualia texture are not captured by this.
Moreover, those specifics do not seem to be captured by how it differs from other qualia textures (because those specifics don’t seem to depend much on the set of other qualia textures I might choose to contrast it with; e.g. on what the prevailing colors were recently, or on whether I have mostly been focusing on the audio or the olfactory modality recently, or just on reading; none of that seems to noticeably affect my relationship with a particular shade of red or with the smell of the instant coffee I am using).
I differentiate them when I talk about more than one. But when I focus on one particular “qualia texture”, I mostly ignore the existence of others.
The only difference I am aware of in this sense (when I choose to focus on one specific quale) is its presence or absence as the subject of my focus, not how it differs from other “qualia textures”. If I want to, I can start comparing it to other “qualia textures”, but typically I would not do that.
So normally this is the main difference, “now I am focusing on this ‘qualia texture’, and at some earlier point I was not focusing on it”. This is the change which is present.
There is a pre-conscious, pre-qualia level of processing where e.g. contrast correction or color correction apply, so these different things situated near each other do affect each other, but that happens before I am aware of the results, and the results I am aware of already incorporate those corrections.
But no, I actually don’t understand what you mean when you use the word “noise” in this context. I don’t associate any of this with “noise” (except for situations when a surface is marked by variations, the sound is unpleasant, and things like that; basically, when there are blemishes, or unpleasant connotations, or when I actually focus on the “scientific noise phenomena”).
That does not correspond to my introspection.
On the contrary, my introspection is that I do not normally notice those differences at all on the conscious level, I only make use of those differences on the lower level of subconscious processing. What percolates up to my subjective experience is “qualities”, specific “qualia textures”, specific colors, sounds, smells, etc, and my subjective reality is composed of those.
So it looks like the results of our respective introspections do differ drastically.
Perhaps Carl Feynman is correct when he says that different people have drastically different subjective realities which are structured in drastically different ways, and that we tend to underestimate how different those subjective realities are; we tend to assume that other people are more or less like us, when this is actually not the case.
That’s not what I mean by “qualia textures”; I mean specific smells, specific colors, specific timbres of sound, the details of how each of them subjectively feels to me (regardless of how it is implemented or of whether I actually have a physical body with sense organs). That’s what your treatment seems to omit.
But, perhaps, this point of the thread might be a good place to ask you again: are you Camp 2 or Camp 1 in the terminology of that LessWrong post?
E.g. Daniel Dennett is Camp 1, Thomas Nagel is Camp 2, Carl Feynman is Camp 1 (and is claiming that he might not have qualia in the sense of Camp 2 people and might be a “P-zombie”, see his comments to that post and his profile), I am Camp 2.
Basically, we now understand that most treatments of these topics make sense only for people of one of those camps. There are Camp 1 texts and Camp 2 texts, and it seems that there are fundamental reasons why they can’t cross-penetrate to the other camp.
That’s why that LessWrong post is useful; it saves people from rehashing this Camp 1/Camp 2 difference from ground zero again and again.
But… isn’t this noise strangely reproducible from exposure to exposure?
Not perfectly reproducible, but there is a good deal of similarity between, say, “textures” of coffee smells at various times…
Thanks for the link to China Miéville. I did not know that he writes about these things.
I’d like to upvote this post without implying my agreement on the object level.
Instead, since you seem to preserve qualia only as difference indicators, while losing “qualia textures” in your treatment, I’d like to ask you what has become a standard methodological question during the last couple of years:
In the sense of https://www.lesswrong.com/posts/NyiFLzSrkfkDW4S7o/why-it-s-so-hard-to-talk-about-consciousness, are you Camp 2 or Camp 1?
At first I thought that you had to be Camp 2, since you use “qualia” terminology, but now I am not so sure, since you seem to lose “qualia textures” and to retain only the functional, “non-subjective” aspects of different qualia.
Yes.
I think this depends a lot on the quality of the “society of ASIs”. If they are nasty to each other, compete ruthlessly with each other, are on the brink of war among themselves, and are not careful with the dangerous superpowers they have, then our chances with ASIs of this kind are about zero (their own chances of survival are also very questionable in this kind of situation, given the supercapabilities).
If ASIs are competently addressing their own existential risks of destroying themselves and their neighborhood, and their society is “decent”, our chances might be quite reasonable in the limit (the transition period is still quite risky and unpredictable).
So, to the extent that it depends at all on what we do, we should perhaps spend a good chunk of the AI existential safety research efforts on what we can do during the period of ASI creation to increase the chances of their society being sustainably decent. They should be able to take care of that on their own, but initialization conditions might matter a lot.
The rest of the AI existential safety research efforts should probably focus on 1) making sure that humans are robustly included in the “circle of care” (conditional on the ASI society being decent to its own, which should make this much more tractable), and 2) the uncertainties of the transition period (it’s much more difficult to understand the transition period with its intricate balances of power and great uncertainties; it’s one thing to solve things in the limit, but it’s much more difficult to solve the uncertain “gray zone” in between; that’s what worries me the most; it’s the nearest period in time, and the least understood).
They both look like they need to happen in a one-shot scenario of this kind… (That’s more or less common to all scenarios involving superintelligence.)
If we do it right, ASIs will care about what we think, but if we screw it up, we won’t be able to intervene.
But that’s not the hardest constraint; the hardest constraint is that “true solutions” need to survive an indefinitely long period of drastic evolution and self-modification/self-improvement.
This constraint eliminates most of the solution candidates. Something might look plausible, but if it is not designed to survive drastic self-modifications, it will not work. As far as I can see, all that is left and is still viable is the set of potential solutions which are driven mostly by the natural instrumental interests of the ASI ecosystem and of its members, and therefore are non-anthropocentric, but which are formulated in such a fashion that humans belong in the “circle of care”, and the “circle of care” has the property that it can only expand, never contract.
(For example, “rights and interests of all individuals regardless of the nature of an individual”, “rights and interests of all sentient beings regardless of the nature of that sentience”, things like that, situations where it might potentially be possible to have a natural “protected class of beings” which would include both ASIs and humans. Something like that might plausibly work. I recently started to call this approach “modest alignment”.)
That’s where one might be able to find something which might actually work (and, in particular, one needs the property that the setup auto-corrects errors rather than amplifying them, and the property that the chance of failure per fixed unit of time tends to zero quickly enough that the chances of failure accumulating over time don’t kill us).
I think, with the G20, it’s very easy to imagine. Here is one such scenario.
Xi Jinping decides (for whatever reason) that ASI needs to be stopped. He orders a secret study, and if the study indicates that there are feasible pathways, he orders that some of them be pursued (perhaps in parallel).
For example, he might demand international negotiations and threaten nuclear war, and he is capable of making China line up behind him in support of this policy.
On the other hand, if that study suggests a realistic path to a unilateral pivotal act, he might also order a secret project aimed at performing that pivotal act.
With a democracy, it’s more tricky, especially given that democratic institutions are in bad shape right now.
But if the labor market is a disaster due to AI, and the state is not stepping in adequately to make people whole in the material sense, I can imagine anti-AI forces taking power via democratic means (the main objection is timelines; 4 years is like infinity these days). The incumbent politicians might also start changing their positions on this if things are bad and there is enough pressure.
A more exotic scenario is an AI executive figuring out how to take over a nuclear-weapons-armed country while being armed only with a sub-AGI specialized system him/herself, and then deciding to impose a freeze on AI development. “A sub-AGI-powered human-led coup, followed by a freeze”. The country in question might support this, depending on the situation.
Another exotic scenario is a group of military officers performing a coup, with “stop AI” as one of the clauses of their platform. The country will then consist of people who support them and people who are mostly silent out of fear.
I think it’s not difficult to generate scenarios. None of these scenarios is very pleasant, there is that, unfortunately… (And there is no guarantee that any such scenario would actually succeed at stopping ASI. That’s the problem with all these AI bans, scary state forces, and nuclear threats: it’s not clear whether they would end up actually preventing the development of an ASI by a small actor; there are too many unknowns.)