kaarelh AT gmail DOT com
Kaarel
Positives of a future with human capability improvement over an AI future.
Beren Millidge has an essay arguing for the claim that a future in which humanity proceeds with increasing biological capabilities is scarier than a future in which we develop AGI. Here are some concluding sentences from his essay:
Ultimately, human intelligence amplification and the resulting biosingularity has a deeper and more intractable alignment problem than AI alignment, at least if we don’t just assume it away by asserting that humans and our transhuman creations just have some intrinsic and ineffable access to ‘human values’ that potential AIs lack.
The only potential positive as regards alignment of the biosingularity is that it will happen much later, most likely in the closing decades of the 21st century and around the end of the natural lifespans of my personal cohort. This gives significantly more time to prepare than AGI, which is likely coming much sooner, but the problem is much harder and requires huge advances in neuroscience and understanding of brain algorithms to even reach the level of control we have over today’s AI systems (which is likely far from sufficient).
I disagree with his thesis — I think that instead of creating AIs smarter than humans, it would be much better to proceed with increasing the capabilities of humans (for at least the next 100 years). [1] [2] Millidge’s claim that the only potential positive of a biofoom is that it starts later seems clearly false. In this note, I will list other imo important (pro tanto) positives of a human foom. I agree with Millidge that there are also positives of the AGI path; [3] these won’t be discussed in the present note. To assess which path is better overall, one could want to compare the positives of the human path to the positives of the AGI path, [4] but I will not do that here. The rest of this note is the list of positives. [5]
more similar entity ⇒ more similar values
an argument in favor of the human path:
assumption 1. it makes sense to speak of the values of an entity, and the values of an entity are some sort of kinda-smooth function of the entity’s structure/constitution
assumption 2. humanity (taken as an entity) currently has pretty good values
conclusion. somewhat modified humanity will still have pretty good values. in particular, a somewhat bioenhanced humanity will still have pretty good values
in contrast: an AGI future will involve the-process-happening-on-earth changing a lot more from what it is now, certainly by each point in time, but also by the time each higher capability level is reached
we can also give a similar argument for single humans:
assumption 1. it makes sense to speak of the values of an entity, and the values of an entity are some sort of kinda-smooth function of the entity’s structure/constitution
assumption 2′. individual humans growing up in current human societies have pretty good values
conclusion’. somewhat modified individual humans growing up in somewhat modified societies will still have pretty good values
in contrast: an AGI future will involve AGIs that differ from current humans much more than these future humans would, growing up in contexts that are much more different from the contexts in which current humans grow up. this is certainly true by each time (or each time after the beginning of each “foom proper”), but also by the time each higher capability level is reached
said another way:
the target of right values is drawn largely around where the arrow(s) determining humans’ values landed. shooting more similar arrows in a similar way is a decent strategy for hitting that target again.
[humans are]/[humanity is] just cool
on the biofoom path, it will still be humanity, made of humans. humanity is cool. humans are cool. humans with increased capabilities and humanity with increased capabilities would be cool as well. like, of course it’s possible for us to become even much cooler, but we’re already occupying a very specific rare cool region in mindspace
the smarter humans you make will be slotting into existing human institutions/organizations/communities. existing human institutions/organizations/communities will still be useful to the smarter humans. this is a reason for existing human institutions/organizations/communities to persist/thrive/develop. [6] and existing human institutions are carriers/supporters/implementers of human values and also cool
the more capable humans will be continuing existing human (social, political, artistic, technological, scientific, mathematical, philosophical) projects and traditions. human thought will continue to develop. this is cool
like, even if the humans with improved capabilities were to (let’s say) form their own state and build a massive biodome around it and cause the outside to eventually become uninhabitable by polluting it or militarily leveling it to make room for automated factories or whatever, killing all other humans, that would be a very bad thing for them to do, but it would still be a human future, and that’s pretty important
an analogy:
suppose you are at intelligence level n and you have to replace yourself with something at intelligence level n+1. let’s say that you have the following options for making an agent at intelligence level n+1:
the agent could just be you after having taken a linear algebra course or after inventing some new methods in numerical linear algebra
the agent could just be you after getting gene therapy which changes a single genetic variant to one that better supports learning (even in adulthood)
the agent could be some sort of novel mind you create from scratch by some training procedure (ok maybe involving some culture which was also involved when you grew up)
now, there is at least some range of your mind-making precision/understanding parameter (from the lowest levels up to idk some quite high level [7]) in which the last option is massively worse on the axis of “how good is it for the resulting guy to live its life” than the first two options! you consider yourself cool! you shouldn’t commit suicide! [8]
to spell out the analogy: with humanity as the agent, the human path is like becoming smarter via gene therapy and learning linear algebra, whereas the AI path is like creating a novel thing happening on earth largely in novel ways from scratch [9]
reasons why human thought would be guiding development more/better in a biofoom
we have a lot of experience with and understanding about humans and specifically how to raise humans
for example, voters, politicians, and researchers will all have much higher-quality starting intuitions about what millions of human einsteins would be like than about what a bunch of AIs would be like
biofooming will be happening later, so we will have more time to prepare before it starts happening [10]
but also biofooming will be happening much slower, so humans will have more time to think about each improvement
like, the speed of development will be slower compared to the speed of human thought, so more human thought can go into each step of development. each unit step of development will be more human-thought-fully chosen
this is true at the level of individual humans thinking about what should be done, and also true at the level of institutions and governments (like, a government running at human speed will be better able to legislate a slow biofoom)
reasons a biofooming world would be diffusely human-friendly (sociopolitical factors, anti-[gradual disempowerment] stuff)
since humans will still be expensive to create and will continue to run on [at least order-100 watts]/[a similar nutrient budget] in the biofoom (unlike how soon after AGI, it would be very cheap to create a new copy of an AI more capable than any human), even a current-100-iq human will probably remain productively employable for a long time
it will be somewhat difficult/weird for more capable humans to render the environment unlivable to less capable humans, because all humans have basically the same environmental requirements (for now) [11]
ditto for good governance, laws, norms. e.g. it’d be quite natural for a law that makes it harder to manipulate the parents of an IVF einstein out of their property to also make it harder to manipulate the amish out of their property
ditto for organizations and economic structures. e.g. it would be very natural for AIs to have a research process that operates in neuralese on a server, with translation costs being high enough that even if a human could do some useful task, doing the task yourself is cheaper than “translating” the task and context to a human; this effect is still present in human-human interactions but the degree to which it is present on the human path is smaller than the degree to which it is present on the AI path
empathy is easier/”more natural” toward beings that are more like you. when agent B is more similar to agent A, it is more likely that [golden rule]/[categorical imperative] style reasoning leads A to treat B well [12]
power concentration stuff is less bad in the biofoom case. it’s not like there would be some entity controlling these more capable humans. these humans will be growing up in various families, involved in various communities/cultures/nations, going to various schools. in the biofoom case, RSI will not be localized to a single lab [13]. monopolies are much more natural in the AI case than in the biofoom case
the AGI path involves power shifting away from people (and to AIs or companies) a lot more
fooming happening slower means that the change in technological/social/cultural/environmental conditions during any given number of years of each human’s life is smaller. to the extent that this change endangers the survival/welfare/usefulness/employability of each human, it is good for it to happen slower
fooming happening slower compared to the pace of life means that more life can be lived by existing living beings before beings at the frontier of development start to find them boring and useless (or like, i don’t want to say that this happens necessarily, but there is a force pushing in this direction)
in a biofoom, there will be a graph of caring connecting most humans to most other humans with not that many steps; it will only have such-and-such clustering coefficients
human institutions, organizations, systems are generally more likely to survive for longer. in particular, the following specific human institutions/organizations/systems supporting the ability of each human to live a good long life are more likely to survive for longer: states, laws, systems enforcing laws, social safety nets, democracy, cryonics organizations
technologies which benefit humans with higher capabilities will also be somewhat likely to benefit current-100-iq humans. eg cures to diseases and other medical treatments, better educational methods, new words/concepts, most consumer technologies. if humans remain central to [doing stuff]/[the economy], then a large fraction of economically rewarded innovation will be making humans more capable, and so technological progress will generally be pointed in a more humane direction
in general: various mad local forces at play in the world (greed, status-seeking, etc) will stay pointed in a more humane direction
also, governments will be much better able to understand, track, and deal with messy world problems in the biofoom case
we know humans [can be and often are] kind/nice to other humans
about humans, we know the following:
most humans consider it very bad to personally kill other humans, and would not kill other humans in normal circumstances. most humans consider the killing of humans highly reprehensible in general [14]
most humans consider it bad to steal from humans. most humans think human property rights should basically be respected. most humans are not in favor of taking property from people. most people would consider it very bad to take property from a person such that this person would then be unable to live an alright life
probably there are some humans who have a deep enough commitment to humanity that they’d remain nice to all existing humans even after individually fooming a lot (though note this won’t be happening in the human foom)
also:
there are specific steps/niches in human evolutionary history which made us nicer to unrelated strangers
other factors
there are factors much more tightly constraining capabilities in the biofoom case than in the AI foom case:
brain volume is tough to increase by idk more than some modest factor (unlike AI compute)
roughly, there is some not-THAT-large finite number of intelligence-increasing “genetic ideas” in the current gene pool, and for at least some initial period these set a cap on how far you can go biologically. it will be very hard to genetically write novel more capable human learning algorithms. the genome interface to mind reprogramming is kinda cursed
it will be expensive and potentially immoral and illegal to “run experiments”
humanity’s correct values are somewhat well-tracked by reflection / self-endorsed development, and the biofoom path will be like that
it makes sense to speak of what sort of reflection and development should happen, distinguishing this from development that is likely to happen. it is very much not true that anything goes! if it were likely/natural that our society would get replaced by a molecular squiggle maximizer, that wouldn’t mean that our society’s true values are to make lots of those molecular squiggles, and that wouldn’t mean this was what should happen all along. whereas if we reflect carefully and understand more stuff and become better versions of ourselves and conclude that we should make lots of some molecular squiggles, then it’s plausible that this was what should happen all along.
it is (at least prima facie) extremely scary/reckless to have a step in your development where you hand things over to some novel mind created largely from scratch! this is done much more on the AI path than on the human path
i say some more on good development and also the general topic of this note here: https://www.lesswrong.com/posts/iemgJhjNLa5eyevWR/kh-s-shortform?commentId=PHm2ZkagfyrhT2Wvz
a concluding remark
self-improvement generally has many good properties over creating a new agent/mind from scratch [15]
- ↩︎
i think we should ban AGI
- ↩︎
It is also a possibility that both options are bad. My view is that we should push forward with increasing human capabilities biologically and culturally/educationally. But I think this question deserves serious analysis, and there are certainly specific things here that one should be very careful about and regulate. However, this note will not be analyzing this question.
- ↩︎
That said, I think the analysis of the positives in his essay gets very many things wrong.
- ↩︎
But one also doesn’t have to do that, to compare the two paths. One can also just “directly” think about what would happen along each path.
- ↩︎
They are not listed in order of importance.
- ↩︎
clearly this is a reason for these to be around for longer in wall clock time, but also it’s a reason for them to be around until higher capability levels
- ↩︎
in fact it seems plausible that, at least if the mind design has to be done from your own bounded perspective, it will keep being better to self-improve forever. i think this is plausible on this individual future life coolness axis and also all-things-considered.
- ↩︎
ok, if you (imo incorrectly) believe in some sort of soul theory of personal identity then maybe to make this a fair example you would need to imagine the soul getting detached in all three examples, but then maybe you will think all of these are suicides… so maybe this isn’t a good analogy for you… but hmm i guess maybe you should also in the same sense believe in there being a soul attached to each society though, and then it would be a good analogy actually
- ↩︎
ok, there’s a meaningful amount of shared culture. even if you thought human minds and AI minds are “mostly cultural” and that subbing out representation/learning/etc algorithms/structures and radically changing learning contexts doesn’t make it legit to say AGI will be a novel thing created largely in novel ways from scratch, it is still at least a really big step; it’s still creating some totally new guy
- ↩︎
as Millidge says
- ↩︎
yes, there are sort of examples like climate change. but it is still much easier to imagine entities that are just programs that can be run on arbitrary computers being totally fine with or even preferring a very different environment. there is a large difference in degree here
- ↩︎
and we at least know humans have a propensity to carry out and be moved by this style of reasoning
- ↩︎
or three labs or whatever. In reality, I think only a single lab will matter, absent strict capability regulation.
- ↩︎
there is a major issue around diffuse effects — like, people and institutions currently not tracking [decreasing the lifespans of very many people by a day each] as an instance of [killing people], or as minimally still being extremely bad, and ditto for probabilistic versions of this, but i think this is largely an epistemic/skill/intelligence issue - ↩︎
kinda agree, but a consideration worth noting: if your company currently carries out process P by spending a fraction p of work on tasks of type A and a fraction 1−p of work on tasks of type B, then if doing type-A stuff gets sped up a lot while type-B stuff isn’t sped up, Amdahl’s law style reasoning like what you say in your comment would give that you get at most a roughly 1/(1−p) speedup, but really you can quite plausibly get a much bigger speedup, because in reality [a sufficient quantity of type-A work can partly substitute for type-B work in pushing P forward] / [it wasn’t really necessary to do the type-B work, just good to do it at the previous relative speeds]. (the overall speedup number will of course depend on specifics of the example.) e.g.:
if algo research is sped up 1000x but compute buildup isn’t sped up, I think you will still have fast AI progress for some time even though in the past the two might have contributed similarly to AI progress
maybe: if high-iq algo research isn’t sped up much but kinda dumb algo research tasks are sped up a lot, and previously these contributed equally to AI progress, you could still get a significant speedup on AI progress
so, i think what you’re saying is technically true for things which are really bottlenecks — like, in the sense that you will really have to keep doing the same amount of the same thing later for each unit of AI progress — but i’m concerned about various things one would want to apply this thinking to not actually being bottlenecks in that sense
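to make the gap concrete, here is a toy calculation (the 50/50 split, the 1000x factor, and the `substitutable` knob are illustrative assumptions of mine, not numbers from the comment above):

```python
def speedup(p: float, s: float, substitutable: float = 0.0) -> float:
    """Overall speedup when a fraction p of the work is sped up by factor s.

    substitutable = 0 gives classic Amdahl's law: the remaining 1 - p of
    the work stays a hard bottleneck. substitutable = 1 models the case
    where the (now cheap) fast work can fully stand in for the slow work.
    """
    slow = (1.0 - p) * (1.0 - substitutable)  # truly bottlenecked work
    fast = 1.0 - slow                         # work running at speed s
    return 1.0 / (slow + fast / s)

# two task types that historically contributed equally (p = 0.5),
# one of them sped up 1000x:
print(speedup(0.5, 1000.0))       # Amdahl: just under 2x, the slow half dominates
print(speedup(0.5, 1000.0, 0.9))  # 90% of the "bottleneck" substitutable: ~20x
print(speedup(0.5, 1000.0, 1.0))  # fully substitutable: the full 1000x
```

so whether the Amdahl-style bound bites depends entirely on how substitutable the “bottleneck” work really is.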
I agree with your point in the canonical solomonoff sequence prediction case. I think your point is what I mean in my note by “you have pathology of not specifying even the hypothesis in the seq prediction case (like it’ll be better to drop bits and take the likelihood loss)”. I think this pathology is maybe not present in “function solomonoff” (I state this in the note as well but don’t really explain it), though I’m very much uncertain.
to state the hopeful “function solomonoff” story in more detail:
By “function solomonoff”, I mean that we have a data set of string pairs (x_i, y_i) for i = 1, …, N, and we think of one hypothesis as being a program that takes in an x and outputs a probability distribution on strings from which y is sampled. Let’s say that we are in the classification case, so always y_i ∈ {0, 1} (we’re distinguishing pictures which show dogs vs ones which don’t, say).
The “canonical loss” (from which one derives a posterior distribution via exponentiation) here would be the length of the program specifying the distribution plus the negative log likelihood assigned to y_i, summed over all i. What I’m suggesting is this loss but with a higher coefficient on the length of the program than on the likelihood terms.
Suppose that the classification boundary is most simply given by what we will consider a “simple model” of complexity m bits, together with “systematic human error” which changes the answer from the simple model on a fraction ε of the inputs, with the set of those inputs taking k bits to specify.
If we turned this into sequence prediction by interleaving like x_1 y_1 x_2 y_2 …, then I’d agree that if we penalize hypothesis length more steeply than likelihood, then: rather than getting a model which does not predict the errors, we would get a universal-like hypothesis, which in particular starts to predict the human errors after being conditioned on sufficiently many bits. So the idea of more steep penalization of hypothesis length doesn’t do what we want in the sequence prediction case. But I have some hope that the function case doesn’t have this pathology?
Some models of the given data in the function case:
the “good model”: The distribution is given by the simple model with an ε probability of flipping the answer on top (independently on each input). This gets complexity loss like m plus something small for specifying the flip model, and its expected neg log likelihood is N·H(ε), where H is the binary entropy function.
the “model that learns the errors”: This should generically take m + k bits to specify, and it gets 0 expected neg log likelihood.
the “50/50 random distribution” model: This takes O(1) bits to specify and has 1 bit of expected neg log likelihood per data point, so N bits in total.
some “universal hypothesis model”: I’m not actually sure what this would even be in the function setting? If you handled the likelihood part by giving a global string of random bits which gets conditioned on other input-output pairs, then I agree we could write something bad just like in the sequence prediction case. But if each input gets its own private randomness, then I don’t see how to write down a universal hypothesis that gets good loss here.
So at least given these models, it looks like the “good model” could be a vertex of the convex hull of the set of attainable (hypothesis complexity, expected neg log likelihood) tuples? If it’s on the convex hull, it’s picked out by some loss of the form described (even in the limit of many data points, though we will need to increase the hypothesis term coefficient compared to the sum of log likelihoods term as the data set size increases, ie in the bayesian picture we will need to pick a stronger prior when the data set is larger in this example).
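a quick numerical check of this comparison, with made-up constants (1000 bits for the simple model, 5000 bits for the error set, error rate 0.01, 100000 data points; none of these come from the comment), showing a coefficient alpha on hypothesis length under which the error-memorizing model wins and a steeper one under which the good model wins:

```python
import math

def binary_entropy(eps: float) -> float:
    """Expected neg log likelihood in bits per label under eps flip-noise."""
    return -eps * math.log2(eps) - (1 - eps) * math.log2(1 - eps)

def loss(complexity_bits: float, nll_bits: float, alpha: float) -> float:
    """Two-term loss: alpha * (hypothesis length) + total neg log likelihood."""
    return alpha * complexity_bits + nll_bits

M, K, EPS, N = 1000, 5000, 0.01, 100_000  # illustrative constants

def good(alpha):       # simple model plus independent eps-probability flips
    return loss(M, N * binary_entropy(EPS), alpha)

def memorizer(alpha):  # simple model plus an explicit list of the errors
    return loss(M + K, 0.0, alpha)

def coin(alpha):       # 50/50 on every input
    return loss(1, float(N), alpha)

print(good(1.0), memorizer(1.0))  # canonical alpha = 1: memorizer wins here
print(good(2.0), memorizer(2.0))  # steeper alpha = 2: the good model wins
```

so in this toy setting the good model does get picked out once the hypothesis-length coefficient is steep enough (here any alpha above N·H(ε)/K ≈ 1.6 works), matching the convex-hull picture.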
that said:
Maybe I’m just failing to construct the right “universal hypothesis” for this example?
It seems plausible that some other pathology is present that prevents nice behavior.
I haven’t spent that much time trying to come up with other pathological constructions or searching for a proof that sth like the good model is optimal for some hyperparameter setting.
I can see some other examples where this functional setup still doesn’t work nicely. I might write more about that in a later comment. The example here is definitely somewhat cherry-picked for the idea to work, though I also don’t consider it completely contrived.
I think it’s very unlikely this steeper penalization is anywhere close to a full solution to the philosophical problem here. I only have some hope that it works in some specific toy cases.
some more variations on this theme:
Nick Land in Meltdown: “Nothing human makes it out of the near-future.”, “Capital only retains anthropological characteristics as a symptom of underdevelopment; reformatting primate behaviour as inertia to be dissipated in self-reinforcing artificiality. Man is something for it to overcome: a problem, drag.”
Historical materialism views the organization of society throughout history as being the argmax of production (or maybe the argmax of the development of production or productive power or something), and after AGI, humans will not be part of the argmax of production for long.
“when you make something less useful (eg by introducing other things that can do its “jobs/functions” better), you make it less likely to stick around”, “what is no longer good for anything tends to get discarded” [1]
“messy futures are bad for humans” (in the limit: “a uniformly random configuration of atoms doesn’t have anything like humans in it”)
- ↩︎
conversely, you can make something more likely to be preserved by figuring out how to make it instrumental to more valued/productive/competitive things/processes — each such process then provides a reason to keep the thing around, and provides a constraint on any replacement to the thing. “instrumentalizing the terminal”, ie protecting good things this way, is a sort of dual to subgoal stomp. i think protection by instrumentality is the main way one gets conserved structures in biological evolution
maybe even more generally, there is a “game of questions/problems and answers/solutions” played by humans and human communities, that one can study to become better able to create a setup in which AIs are playing this game. some questions about this game: “how does an individual human or a human community remain truth-tracking?”, “what structures can do load-bearing work in a truth-tracking system?”, “to involve a new mind in a community of truth/knowledge/understanding, what is required of the new mind and what is required of its teachers/environment?”, “what interventions make a system more truth-tracking?”, “how does one avoid meaning drift/subversion?”. this includes the science stuff you talk about but also very basic stuff like a kid learning arithmetic from their parents or humans working successfully with integrals for two centuries before we could define them rigorously — like, how come we can mostly avoid goodharting answers against the judgment of other people, how come we can mostly avoid becoming predictors of what other people would say, how come we can do easy-to-hard generalization of notions, etc.. the usual losses/setups currently used by ML practitioners might be sorta wrong for these things, and maybe one could think carefully about the human case and come up with better losses/setups to use in an epistemic system. an obstacle is that in the human case, stuff working well is probably meaningfully aided by the agents already having shared human purposes [1] [2] and by already having similar “priors” coming from the human brain architecture and similar upbringings. another obstacle is that the human thing is probably relying on various low-level things that are hard to see and that probably lack equivalents in current ML systems and are too low-level to be created by any simple intervention on a community of LLMs. 
another obstacle is that there are probably just very many ideas involved in making humans truth-tracking (though you can then ask: how do we set up a meta-level thing that finds and implements good ideas for how an epistemic system should work). another obstacle is that in the human case, human purposes are broadly aligned with understanding stuff better in the systems of understanding we have (whereas if we force some system of presenting understanding on the LLMs and try to get them to produce some understanding and present it legibly in that system, their purposes are probably not well-aligned by default with doing that). (oh also, if your work results in understanding these questions well, you should worry about your work helping with capabilities. maybe don’t give capabilities researchers good answers to “how do we make it so the originators of good ideas get rewarded in an epistemic community?”, “how does one tell when a new notion is good to introduce into the shared lexicon?”, “what is the process of coming up with a good new notion like?”, “what sort of thing is a good model of a situation?”, “how does one avoid assigning a lot of resources to useless cancers like algebraic number theory?” [3] .) anyway, despite these issues, it still seems like an interesting direction to work on
copying a note i wrote for myself on a related question:
″
beating solomonoff induction at grokking a notion
how come as humans we can understand what someone means when using a word. as opposed to becoming a predictor of what they would say. it is possible for a human to not make the mistakes another person would make when eg classifying images for having dogs vs not! roughly speaking solomonoff would be making the same mistakes the person would make
this is a classic issue plaguing many (maybe even most?) things in alignment. eg ELK, AGI via predictive modeling, CIRL/RLHF or just pretty much anything involving human feedback
can’t we write an algo for that, and have that not be dumb like solomonoff is dumb
some ideas for ways to implement a thing that is good like this / what’s going on in making the human thing work:
an even stronger simplicity prior than solomonoff. eg if there are explainable mistakes on a simple model, you want the simple model that doesn’t predict the mistakes. this will have inf log loss but let’s just do a version of the simple hypothesis with noise, and then penalize the likelihood term less. have people not already considered this for solving the model + data split problem? does this attempt to solve the model data split problem introduce some pathologies?
you have pathology of not specifying even the hypothesis in the seq prediction case (like it’ll be better to drop bits and take the likelihood loss). but i think at least this pathology is not present in the function case, if we don’t get randomness in the universal semimeasure way (like if we make the randomness not shared between different inputs — each input has to sample its own random bits)
alternatively: just set abs bound on model complexity, rest has to be likelihood. this feels bad because if you get the bound wrong you get some nonsense. that said in a sense this is equivalent to the previous proposal (like if you pick the length bound the previous thing with some hyperparam would find). idk maybe in the function case you can look at how many bits of entropy are left given the hypothesis, like imagine this graphed as a function of hypothesis length, and like see some point at which the derivative changes or sth. (this doesn’t show up in the seq case because there it’s pretty much just 1 bit paying for 1 bit (until you specify it in full if it’s finite complexity))
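a self-contained sketch of what that graph looks like in the toy dog-classification example from the earlier comment (constants are again made-up: a 1000-bit simple model, a 5000-bit error set, error rate 0.01, 100000 labels). in this toy case the minimal achievable likelihood loss is a step function of the length budget, and the kink at M is the kind of point where “the derivative changes”:

```python
import math

M, K, EPS, N = 1000, 5000, 0.01, 100_000  # illustrative constants

def binary_entropy(eps: float) -> float:
    return -eps * math.log2(eps) - (1 - eps) * math.log2(1 - eps)

def min_nll(budget_bits: int) -> float:
    """Minimal achievable total neg log likelihood (bits) under a hard cap
    on hypothesis length, in this toy example."""
    if budget_bits >= M + K:
        return 0.0                      # can memorize the systematic errors too
    if budget_bits >= M:
        return N * binary_entropy(EPS)  # simple model + flip noise
    return float(N)                     # no better than a coin flip per label

# on average, the first M bits of budget buy ~92 bits of likelihood each,
# while the next K bits buy only ~1.6 bits of likelihood each
for budget in (10, M, M + K):
    print(budget, min_nll(budget))
```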
simplicity prior defined in terms of existing understanding
you specify properties of the thing or notion sometimes
eg [concrete] and [abstract] make a partition of things maybe, but [alice would think this is concrete] and [alice would think this is abstract] might not. eg knowing [if something is abstract, then it usually helps a lot to study examples to understand it] can help you understand when your teacher alice is making a mistake about an abstractness claim
or eg: 1+1=2 won’t be true if you accidentally assign 1->rabbit and 2->chicken from a demonstration (for any reasonable meaning of plus)
some sort of time complexity bound might help. tho really you aren’t gaining a mechanism when you learn what a dog is. you are more like learning a new question/problem
also as a human one can just ask: what is it that this person is trying to teach me. what is this person trying to point at. this is a question you can approach like any other question
when we gain a notion, we gain sth like a question that can be asked about a thing. and we have criteria on this notion. we gain “inference rules”/”axioms” involving the notion. ultimately we are wanting it to play some role in our thought and action. that role can guide the precisification/development/reworking of the concept. the role can be communicated. it can be shared between minds
to gain the chair notion is to gain the question “is this a chair?”. this has an immediate verifier (mostly visual), but also further questions: “can i sit on it?”, “is it comfortable to sit on it?”, “would i use it when working or dining?”, “does it have a back support part and a butt support part and legs?”. a chair should support the activities of sitting and working and dining. all these can have their own immediate verifiers and further questions
we understand “is this a chair?” as clearly separate from “would the person who taught me the chair notion consider it a chair?”. it is much closer to “should the person who taught me the chair notion consider it a chair?”. it is also close to “should i consider it a chair?”
important basic point here: our dog thing is NOT a classifier. classifiers or noticing trick circuits can be attached to our dog structure but the structure is not a classifier
toy problem here: how do you pin down the notion of a proof? (how did we historically?) how do you pin down the notion of an integral? (how did we historically?) maybe study these actual examples
pinning down the notion of a proof might be a good example to study in detail. like, how does one become able to tell whether something is a good proof? a valid reasoning step? how does one start to reason validly? one reason to be interested in this is that it’s analogous to: how does one become able to tell what’s good, and come to act well? both are examples of getting some sort of normativity into a system
another example: we have a notion of truth, not just some practical thing like provability (or in a broader context supporting action well maybe). our notion of truth is separate from our notion of provability eg because we have the “axiom/principle” when talking about truth that exactly one of a sentence and its negation is true, or alternatively/equivalently we have an inference rule of going from “P is not true” to “not-P is true”, and such a rule is just not right for provability (there are sentences such that the sentence and its negation are both not provable). by gödel’s completeness theorem, i guess a fine notion of truth, ie one which has a model, is precisely one which assigns 0/1 to all sentences and is coherent under proving. we operate with truth by relying on these properties, without having a decision algorithm or even a definition for truth (cf tarski’s thm).
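in symbols (a minimal sketch of the point, assuming classical first-order logic over a fixed language):

```latex
\begin{itemize}
  \item A \emph{truth assignment} is a map $v\colon \mathrm{Sent} \to \{0,1\}$ such that
    (i) $v(\neg P) = 1 - v(P)$ for every sentence $P$, and
    (ii) $v$ is coherent under proving: if $\Gamma \vdash P$ and $v(Q) = 1$
    for all $Q \in \Gamma$, then $v(P) = 1$.
  \item By G\"odel's completeness theorem, these are exactly the maps of the form
    $v(P) = 1 \iff \mathcal{M} \models P$ for some model $\mathcal{M}$.
  \item Provability satisfies (ii) but not (i): by incompleteness there are sentences
    $P$ with $\nvdash P$ and $\nvdash \neg P$, so the ``exactly one of $P$, $\neg P$''
    rule fails for $\vdash$.
\end{itemize}
```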
how did we understand what an integral is?
i think we were using integrals for like two centuries before we knew how to properly define them (eg via riemann sums). how come we were pretty successful with that? like, how come we did all this cool stuff, we came to all these correct conclusions, without properly knowing what integrals are? i think the general thing that happened is that we hypothesized an object with some properties and these properties turned out to be those of a real thing, and in fact to pin it down uniquely! though of course this leaves the following important question: how did we identify this set of properties as important?
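one way to make the last question concrete (a sketch, not a claim about the actual history): a few natural properties already pin the integral down uniquely on continuous functions.

```latex
Suppose $I(f; a, b)$ is defined for continuous $f$ on $[a,b]$ and satisfies:
\begin{enumerate}
  \item additivity: $I(f; a, c) = I(f; a, b) + I(f; b, c)$ for $a \le b \le c$;
  \item monotonicity: $f \le g \implies I(f; a, b) \le I(g; a, b)$;
  \item normalization: $I(c; a, b) = c \cdot (b - a)$ for constant functions $c$.
\end{enumerate}
For any partition $a = x_0 < \dots < x_n = b$, properties (2) and (3) give
$m_i (x_i - x_{i-1}) \le I(f; x_{i-1}, x_i) \le M_i (x_i - x_{i-1})$,
where $m_i, M_i$ are the min and max of $f$ on the $i$-th piece; summing via (1)
squeezes $I(f; a, b)$ between the lower and upper Darboux sums, so $I$ must be
the Riemann integral. Hypothesizing a short list of properties really can pin
down a unique object.
```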
But continental Europe historically and China today offer some counter evidence, as they’re technologically competitive without having a comparably competent philosophical tradition.
continental europe historically seems like a clear example of high technological competence together with high philosophical competence (both measured relative to the time)
today’s US has much higher incarceration rate than today’s China
i’d guess that the incarceration rate among chinese americans is at most roughly as large as the incarceration rate in china though. [1] controlling for the two countries having different people seems important if we’re trying to assess the repressiveness of each country’s governing system. (that said: chinese americans are also richer than chinese chinese, and one would want to control for that as well, introducing a correction in the other direction)
(that said: my overall position is that it is very bad for the US to race with china)
assorted thoughts in response:
I definitely want people to think more about what AIs would think and do over a lot of reflection/development, and when more powerful. People should think more about the effects of a mind. People should think of the AGI situation as us probably having to correctly determine the future via an extremely long causal chain. [1]
I don’t think it’s weird to speak of values the way I’m speaking of values. I think people accept this sort of value-talk in other contexts. E.g. it’s common for antirealists to think of ethical truths as being determined by some ideal reflection; e.g. the notion of CEV. I think people who in some contexts use “egregious misalignment” in this “egregious misbehavior in mundane situations” sense also sometimes make inferences as if they were using “misalignment” in the sense I suggest. That said, one could want to make a distinction between reflection and development-in-general, and certainly it makes sense to distinguish between more and less endorsed forms of development. I think I was somewhat sloppy with this in my first comment.
I think it’d in principle be fine for some ideal beings to use words however. In practice, [people are stupid]/[thinking is difficult], and it’s very natural to make the inference “the AI is egregiously misaligned”
“the AI wants to egregiously misbehave in normal circumstances” and also to make the inference “the AI endorses each step of a process which leads to all humans dying” “it was egregiously misaligned”, but I think there isn’t a concept that supports both of these inferences at once (or at least I think our language should leave this as an open question). So, I mostly don’t endorse using “catastrophic/egregious/large misalignment”, and trying to say what one means in other words. I should maybe have used different words in my first comment as well. I don’t have good alternative terms to suggest atm, except saying what one means with more words. I guess I’d want more people to try spending some time thinking about the AI situation while tabooing a bunch of Constellation-speak and MIRI-speak, building up their own Entish.
- ↩︎
Some people think they can avoid this difficulty by having a first mess-AI “solve alignment” and launch some sort of aligned ASI sovereign, with the first AI not being that weird. I think that to first order one should think of this as the original AI trying to determine the future via a bottleneck. And in real life, people would plausibly just let the AI self-improve with some monitoring lol, in which case it’s not exactly a tight bottleneck. The original AI will also already be doing a lot of reflection and development. Also, there will be a long chain of causality after the ASI sovereign that needs to go right. (Also, in practice, instead of some clever scheme with boxed AIs solving alignment, we will probably just get some total mess with AIs deployed broadly, connected to the internet, plausibly just running AI labs. And there’s the AIs breaking out, and there’s fooming being fast, and there’s not having much time to be careful.)
I think that if we try to make sense of “what a current AI would do after reflecting+developing for a long time”, that thing does not involve being nice to humans. I think it’s still not nice to humans if we add the constraint “and the reflection/development process has to be basically [endorsed by the AI]/[good according to the AI]”. I think it’s pretty standard to take what you would do [after a lot of reflection + if you were more powerful] to reflect your values better than what you would do instinctively. So, if I’m right about what would happen given further (self-endorsed) development, it seems like a standard use of language (at least in alignment and in philosophy) + true to say current AIs are bad? I’d agree it is also pretty standard + [maybe true] to say “current AIs are good” in the sense that they mostly have pretty acceptable instinctive behaviors. This situation is pretty unfortunate, and maybe calls on us to start explicitly making this distinction. [1]
- ↩︎
“Catastrophic misalignment” is a bad term, in addition to the reason I already gave in my comment, also because it could mean that this AI in fact would cause a catastrophe (without human help), which I don’t think is true for current AIs. That said, I think that’s prevented by capabilities, not by alignment — I think the closest thing to a current AI which is capable of causing a catastrophe would cause a catastrophe. I guess maybe one should say “misalignment sufficient for a catastrophic outcome if choosing the future were handed to the AI”.
- ↩︎
I think current AI systems are likely catastrophically misaligned, but instead of properly arguing for it here, I want to clear the much lower bar of making the position sound much less weird than it might at first. When I imagine a person to whom this position sounds weird, I imagine them saying sth like:
“AIs are acting nicely in various contexts. They look nice in our evaluations, and they look nice to users in practice. Isn’t it unlikely that they are really evil, hiding it, waiting to strike?”
While I think it’s likely that current AI systems are catastrophically misaligned, I don’t much feel like taking a position one way or the other about the “are really evil, hiding it, waiting to strike” part. I think the hypothetical interlocutor above is making a false equivalence. When I say current AI systems are “catastrophically misaligned”, what I have in mind is this:
When someone sets up some initial AI system and lets it develop a lot (ie lets it do RSI) [1] , with anything like this that could be done in practice [2] , this doesn’t go well for humans. I think that the default without strict regulation of AI development is: in the first 10 years after AGI (by which I mean AI that autonomously does conceptual research better than top humans), there will be a lot of development — like probably more development than there has been in total in all of history. [3] Like, after developing for a lot of “subjective time”, the AI systems that come out of this development process would trivially be able to replace humans with whatever other processes from some vast number of options; the negentropy/[free energy]/atoms I’m currently using could probably be used to run processes of similar complexity/interestingness. Despite it being trivial for the AI to do this, the AI needs to not do this (or, maybe disassemble me, but at least recreate me on a computer, I guess...). In fact, the AI doesn’t just need to leave me alone, it needs to protect me from being killed by any other beings, and make sure I have a bunch of resources so I can live a long life. It’s kinda like I need to be very close to the coolest possible process to this AI, despite being “objectively” extremely boring, slow, wasteful, with “objectively” nothing to offer to the AI. This seems like a really sharp property; it feels like a measure 0 sort of thing. Preserving this forever feels especially sharp. I think it’s unlikely that this property would be upheld. I don’t think it is that reassuring if this long development process is started by AIs whose cached policies for mundane situations are pretty nice(-looking). [4]
Maybe this at least makes it seem not weird to think that current AI systems are catastrophically misaligned. It’s plausible we’re just using the same words differently, but in that case I think my use better tracks the niceness-type property that really matters. Like, it ultimately matters whether our AIs will continue to protect us forever when everything is up to them, not whether they behave nicely in mundane interactions now. I guess the terms “catastrophic/egregious misalignment” or “a large amount of misalignment” are quite unfortunate because it’s sort of unclear if one should read them as [misalignment sufficient for things to end up being really bad] (in that case, given doomy views, even an extremely small failure to set up valuing properly constitutes catastrophic/egregious/large misalignment, and it’s plausible to me that humans are egregiously misaligned by default, tho I’m not sure [5] ) or as [the AI wanting to behave egregiously badly in mundane circumstances]. I think that there being these two really different interpretations of the same term has caused a bunch of confused thinking by people in alignment.
- ↩︎
this could be framed as asking the AI to develop a good successor; the initial setup might have some processes tasked with “solving alignment”; there might be multiple AIs involved doing different things, eg there can be monitors
- ↩︎
absent fundamental breakthroughs in alignment
- ↩︎
in practice, the only way to regulate this is by banning AGI or by having some AI(s) effectively take over the world and then self-regulate
- ↩︎
My guess is also that things will naively be looking worse once we get to AIs that are actually able to do research autonomously, because these AIs will be less based on human imitation, they will be actually able to come up with new thinky-stuff (new words/concepts/ideas/methods etc), they will not have nice chains of thought, and they will be more trained on clearly inhuman things like doing math/coding/science/tech.
- ↩︎
it maybe also depends on what self-improvement affordances are made available to a human
Who is working on this sort of thing?
Here’s a bunch of stuff off the top of my head, in no particular order, including people who aren’t thinking much about the issue in full generality, but are addressing aspects: [1]
economics has the subfields of social choice theory and mechanism/incentive/institution design. public economics is also relevant. internalizing externalities
there’s a bunch of econ stuff on people coordinating in/as a firm
there is a lot of political philosophy/theory/science on what sorts of political institutions we ought to have. eg see here for a bunch of pointers to contemporary thinking on sortition, or see communist proposals for how we should coordinate, or see anarcho-capitalist proposals
various groups are trying to get money out of politics, eg trying to get Citizens United v. FEC overturned. there are various anti-corruption groups and pro-transparency groups
there have been various attempts to establish a world government
the legal system is one of the main instruments society has for acting on its values and determining facts in specific cases. there’s a lot of work on what it should be like
i think a bunch of sociologists are studying polarization and social media echo chamber stuff
there’s a bunch of work on how to inform people / how to get people to pay attention / how to get people to believe something. eg advertising research, work on how to run propaganda campaigns, theory of journalism
there’s a bunch of work on how to make people able to understand stuff: education theory, designing curricula, teaching
there are various forecasting and (specifically) prediction market initiatives, eg metaculus and manifold
people who created and run twitter community notes, fact-checking in general
people running wikipedia
people running the alignment forum and lesswrong
work on reputation systems
metascience and in particular replication crisis stuff. people trying to improve academic publishing, peer review, academic credit assignment
the field of social epistemology. also just epistemology
there’s a bunch of work on the social and bioevolutionary development of cooperation and trust and trustworthiness. there’s psychology research and self-help stuff on developing into a trustworthy person
there’s a lot of work on how to reduce crime
probably many other directions in sociology and social theory
So, there’s a huge amount of work broadly on coordination. Maybe there should be a more systematic body of understanding here. Maybe there should be an academic field. My personal term for this is “weltgeistbehandlung”. Copying a note I wrote on this for myself:
“In a broad sense, “weltgeistbehandlung” just means improving the world. In a stricter sense, it’s about improving the more living parts over the inert parts (like, improving the academic credit assignment system, not making buildings more beautiful), the more procedural/meta parts over the more object-level parts (like, reducing dysfunction in democratic systems over reducing animal suffering). Even more strictly, it is about improving the more think-y parts of the world: about making the world more truth-tracking, about making the world generate new ideas faster when a need arises, about making decision-making more guided by the best thinking, about making it so our values are worked out more fully, about making it so our values are better heard when decisions are made.
related but clearly non-synonymous: social epistemology, metascience, incentive design, institutional design. i think tikkun olam is somewhat similar. LATER EDIT: Daniel Schmachtenberger’s The Consilience Project seems very similar
characterizing weltgeistbehandlung:
it is somewhat less a science and more an engineering discipline. it’s like medicine / medical science, but we’re healing the world-spirit
a central theme is setting up incentives, setting up hyperparameters, pushing the world toward goodness, with the heavy lifting being done by blind local mess incentives (even by stuff like greed and status-seeking), as opposed to being done by some pure correct judgments of goodness operating locally. like, if we’re setting up incentives with goodness in mind, ultimately the good stuff that happens is (to the extent that we’re successful) caused by a judgment of goodness, but this is happening indirectly. it’s about nudging a mad weltgeist subtly so it propels itself toward goodness. it’s about making goodness rewarded, comfortable, easy. it’s about making good processes/institutions/agents/etc outcompete others. it’s about preserving and expanding the niche/purpose of each good thing. it’s about making goodness win.
i think it should set out to be looking mostly for pareto improvements. despite being sort of about organizing our polis, it could still be kinda apolitical. that said, sometimes some groups just have to lose (eg people who explicitly want to make AIs even if they cause human extinction, eg paid lobbyists or companies effectively buying policies)
important components of weltgeistbehandlung:
coming up with general components for schemes. like patent auctions, prediction markets, accountability mechanisms
constructing particular incentive-fixing/goodness-promoting proposals
analyzing decisions between options (like, which voting scheme should we have?)
implementing these proposals (like what a doctor does)
identifying issues: like, noticing that there is a lot of lying in US business and politics, noticing that one isn’t sufficiently incentivized to provide some certain public good, noticing that academia is goodharting in various ways, etc”
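One of the “general components for schemes” above, prediction markets, can be made concrete with a small sketch of Hanson’s logarithmic market scoring rule (LMSR), a standard automated market maker used by many prediction-market implementations; the parameter values here are illustrative, not from any particular platform.

```python
import math

def lmsr_cost(q, b=100.0):
    """LMSR cost function C(q) = b * ln(sum_i exp(q_i / b)).

    q[i] is the number of outstanding shares of outcome i; b is the
    liquidity parameter (larger b = prices move less per trade)."""
    return b * math.log(sum(math.exp(qi / b) for qi in q))

def lmsr_prices(q, b=100.0):
    """Instantaneous prices p_i = exp(q_i/b) / sum_j exp(q_j/b); they sum to 1
    and can be read as the market's probabilities for the outcomes."""
    weights = [math.exp(qi / b) for qi in q]
    total = sum(weights)
    return [w / total for w in weights]

def buy(q, outcome, shares, b=100.0):
    """Cost of buying `shares` of `outcome` is C(q') - C(q).
    Returns (new share vector, cost to the trader)."""
    new_q = list(q)
    new_q[outcome] += shares
    return new_q, lmsr_cost(new_q, b) - lmsr_cost(q, b)

# two-outcome market starting at even odds
q = [0.0, 0.0]
print(lmsr_prices(q))  # [0.5, 0.5]
new_q, cost = buy(q, 0, 50.0)
print(lmsr_prices(new_q)[0] > 0.5)  # True: buying an outcome pushes its price up
```

The design-relevant point is that the market maker always quotes a price, so traders with information are always incentivized to move the probability toward what they believe, and the operator’s worst-case subsidy is bounded by b·ln(number of outcomes).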
- ↩︎
I’ll be taking a somewhat broad view on what counts as a “coordination failure”, as you seem to be doing.
It indeed seems deeply unnatural for a very smart AI to look at the human world from the outside, be able to replace it with whatever, and be like: “no, i’m not going to use these atoms and this negentropy/energy for anything else — this human world that is here by default is the best thing that could be here; in fact, I will make sure it has a lot of resources to flourish in the future”. It seems [deeply unnatural]/[extremely sharp] for anyone to have values like this. I think it’s unlikely that even humanity-after-developing-correctly-for-a-million-years would think like this if it encountered another Earth with a current-humanity-level alternate humanity on it. [1]
One approach to tackling this difficulty is to try to somehow make an AI that does this imo deeply unnatural thing anyway. But there is also the following alternative approach: to try to make it so there is not anyone that is judging the human world from the outside like this — i.e., that it’s just the human world judging itself. The judgment “we are cool, we have lots of cool projects going on, and we definitely should avoid killing ourselves” is very natural; in particular, it is much more natural than the judgment the AI looking at the human world from the outside needs to make. I think this alternative path requires banning AGI.
One more alternative approach (that overlaps with the previous one): one can also hope to have humans flourish for a long time without any judgment that humans are very cool directly controlling local decision-making. Instead, we can try to set up local incentives so that goodness/humanness is promoted. This way, humans might be able to flourish even in a “hot mess” world. For this, it is crucial that humans and human institutions remain useful. So, this also requires banning AGI.
- ↩︎
Indeed, human civilizations have historically not treated less developed civilizations with much kindness.
- ↩︎
It seems plausible that what you suggest is one significant contributor. Here’s one more thing that imo plausibly contributes significantly:
Most of these people are consequentialists, i.e. they think of ethics in terms of sth like designing a good spacetime block. [1] Like, when making a decision, you are making a decision as if standing outside the universe and choosing which of two spacetime blocks [2] is better. Given this view of ethics, it is very natural to imagine a future in which there actually is some guy that designs/chooses a good spacetime block, and it becomes somewhat less natural to imagine futures in which the spacetime block keeps getting “designed/chosen” in a messy way by all the messy stuff inside the spacetime block, with the designing/choosing and the being-valuable done by the same entities. A person who thinks in terms of duties or a person who thinks in terms of virtues would find it much less natural to have such a strong separation between the locus of moral-agent-hood and the locus of moral-patient-hood.
some additional recent AI x-risk things by Bernie Sanders:
it (correctly) claims to be so
it’s a bit complicated but there is a sense in which the following is true:
Taiwan does not currently claim to be a separate country from mainland China. There are currently two governing systems claiming to be the legitimate government of the entirety of China: one is in Taiwan, and the other in the mainland.
I haven’t thought a lot about this but my guess is that this approach basically can’t work because chaos is a thing, so you need to determine parameters on the fly, so you need to put some controllers inside
edit: oh i guess maybe you’re suggesting controlling cell division directly very precisely with optical stimulation at precise points inside the strawberry somehow? hmm. i guess you also need to control cell death very precisely
edit 2: oh also you have a major chicken and egg problem with the ovules and the surrounding structure in the parent plant right?
Aren’t we extremely confused about how one would go about making two strawberries which are identical down to the cellular level? Like, the simplest path might go through nanotech or some other pretty crazy thing? (Being able to do that probably implies it wouldn’t be much harder to mass-manufacture humans who are identical down to the cellular level?) I feel like you’re saying you basically know how to reduce it to a bunch of grad student gruntwork (or at least think someone else could) and that sounds really wild to me!
I think this doesn’t make sense capabilities-wise for solving genuinely hard scientific/technological/mathematical/philosophical problems such as the strawberry problem. (It makes sense when the big task has a basically known decomposition into a large number of small easy tasks though.) A central issue is that good high-level decisions are very important, there are very many of them, and they need to be made with deep understanding of the domain/[design space], which the human in this setup doesn’t have by default for hard problems and which can only be gained by spending a lot of time understanding novel stuff. Like, it would be extremely silly to have a setup in which a human with no university-level math education is suggesting high-level riemann hypothesis proof strategies to Terence Tao. That human could not be contributing basically anything positive to Tao’s ability to solve the problem.
Maybe the following is a key observation in this (you might have considered it already, but including it just in case):
The example we should have in mind is NOT having a research problem that takes an individual human 1000 years to solve, which has some clever decomposition into 5 problems, each of which takes 200 years to solve, with the human needing to provide the 5 problem decomposition and the AI solving the 200-year problems. This is NOT what we should have in mind because if the AI can solve 200 year problems, then we are already very close to making an AI that can autonomously solve 1000 year problems (in fact in practice for these particular numbers I expect we would be there basically instantly). Instead, for the question to be interesting, we should imagine the AI being able to solve much shorter problems, like idk 1-week problems. In that case, even if there is some reasonable decomposition into eventually 1-week tasks, it will be a really big complicated object.
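To put rough numbers on how big that decomposition object gets — assuming a lossless decomposition of a 1000-year problem into 1-week leaf tasks, and a made-up branching factor of 5 subtasks per decomposition step:

```python
import math

total_weeks = 1000 * 52      # ~52,000 one-week leaf tasks
branching = 5                # hypothetical: each decomposition step yields ~5 subtasks
depth = math.ceil(math.log(total_weeks, branching))
# internal nodes of a full 5-ary tree are roughly (leaves - 1) / (branching - 1);
# each internal node is one decomposition decision someone has to get right
decompositions = (total_weeks - 1) // (branching - 1)

print(total_weeks)     # 52000
print(depth)           # 7
print(decompositions)  # 12999
```

So even under these generous assumptions, the human in this scheme would be supplying on the order of ten thousand correct decomposition decisions, each of which needs the domain understanding discussed above.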
One also gets a bound on the capabilities-usefulness of this scheme from the consideration that if decomposition work is easy enough for a task, then one should just be able to have an AI with a small time horizon do it as well, at least if we trust time-horizon-thinking. And so either decomposition work is easy and you could replace the human in this scheme with that AI (or better yet, just have the AI solving the subproblems also make decomposition decisions) or decomposition work is hard and it takes a long time for this AI-human system to do the task. Quantitatively, this is saying that if a task can be done by a human-AI system but not an only-AI system, then it should take at least the AI time horizon in wall clock time. I guess this conclusion would be softened if decomposition-work is outlier-hard among things with the same human time horizon, which seems plausible.
That said, one can of course get some speedup as a human researcher from asking AIs to sometimes do small tasks, I’m just doubting that this can give a huge speedup for solving hard scientific/technological/mathematical/philosophical problems without the AI being basically able to solve them autonomously.
My experience is extremely different from yours. I think almost all the non-[rat/EA] people in my life whose positions on this I know consider it plausible that an AI substantially smarter than any human will be created this century. [1] Thinking of the set of non-[rat/EA] friends/[close-ish acquaintances] I haven’t discussed this topic with yet, my guess is that more than half of them already think this and almost all of them would think this after a 2 hour conversation with me. It’s probably important that my distribution skews very high iq (maybe importantly both quant and verbal) [2] and high openness. [3]
- ↩︎
this includes e.g. the 4 family members I’ve discussed this topic with
- ↩︎
like, these are mostly people I know from the international olympiad circuit, math and physics majors from my MIT undergrad, and classmates from the best high school in Estonia
- ↩︎
Some of them deferring to me partly on the question is probably also doing some work tbh, but I think this isn’t a big enough effect to change the broad strokes conditional on getting them to consider the hypothesis at all.
- ↩︎
generally, humans are cool. in fact probably all current humans are intrinsically cool. a few are suffering very badly and say they would rather not exist, and in some cases their lives have been net negative so far. we should try to help these people. some humans are doing bad things to other humans and that’s not cool. some humans are sufficiently bad to others that it would have been better if they were never born. such humans should be rehabilitated and/or contained, and conditions should be maintained/created in which this is disincentivized
not group specific in principle, but human life is pro tanto strongly cooler. but eg a mind uploaded human society would still be cool. continuing human life is very important. deep friendships with aliens should not be ruled out in principle, but should be approached with great caution. any claim that we should already care deeply about the possible lives of some not-specifically-chosen aliens that we might create, that we haven’t yet created, and so that we have great reason to create them, is prima facie very unlikely. this universe probably only has negentropy for so many beings (if you try to dovetail all possible lives, you won’t even get to running any human for a single step); we should think extremely carefully about which ones we create and befriend
i agree these are problems that would need to be handled on the human path