I quite like this forecast from Andrew Critch on milestones in AI development; here are my reactions:
On the timeline he suggests, in ~10 years we face choice 6a/b, which implies at least three possibilities:
A) We need society-level consensus (which might be force-backed) that humans can/should control agents (or digital entities more generally) who are superior to us in all economically/militarily relevant respects. Assuming they fit within the moral circle as we currently conceive it (@davidchalmers42, @jeffrsebo, Thomas Metzinger, and Nick Bostrom / Carl Shulman have analysed this in various ways), and absent some novel claim about how AIs are different/lesser ethical beings, it is hard to see how this is essentially different from slavery or animal cruelty, something that will presumably be obvious to any AGI worth the name; or
B) We are able to engineer AI motivations to act harmlessly/subserviently in a way that is “better” than (A), which (wild guess) could be a form of open individualism or an AI-specific conception of identity (e.g. Buddhist/Hindu and some indigenous traditions have more radically inclusive conceptions of identity than the Greco-Judeo-Christian, human-centred frame that currently dominates AI ethics); or
C) We have some very solid person- and species-neutral grounds for why humans (and our ecosystem) are worth preserving, drafted in a way that is at least potentially reasonable within the ontology/value system of the most powerful/friendliest AIs.
To flesh out my thoughts on (C), I write up a ‘letter to a future AGI’ (reworking a 2023 LessWrong post by Miller, Häggström, Yampolskiy, and Miller). I suspect this approach to (C) is fundamentally flawed: we can’t predict an AI’s value system/Weltanschauung/“form of life” (depending on your philosophical frame).
Nevertheless, a ‘hail Mary’ justification from @avturchin is that we can perhaps influence proto-AGIs, which then pass on their representations (of our projections of their successors’ values) to future systems (e.g. via synthetic data or weight transfer).
I have a slight discomfort with Bostrom’s reasoning. I agree that an enormous amount of resources is potentially at stake in the future, but I struggle with putting a number on it, or even with how to think about putting a number on it. The reason is that his analysis is almost entirely anchored on value arising from human- or biological-like things, i.e. relatively small, short-lived creatures that have a particular form of agency/identity, do things in communities, and have the types of valenced states we have. He of course explicitly allows for digital beings, but at least until he explores that topic in more detail with Carl Shulman (2022), it’s not clear whether these digital beings are anything more than amped-up human-like experiences. In fact, when he writes about them with Shulman, it’s clear that they could be very different (~immortal, copyable, much larger hedonic range, mind-transparent); applying human-like standards of value to them (especially to draw large quantitative conclusions) seems risky/premature.

Now it might be that biological life like us is the only way advanced societies can come to be (attractors in evolution, habitability of planets, etc.). But it might alternatively be possible to have swarm-like societies where the individual isn’t the primary bearer of value (by our lights anyway: it might not be autonomous, have a clear identity, have a capacity to suffer, etc.). If that is a realistic possibility, how do we quantitatively weigh the value of a future filled with swarm beings against a future filled with human-like beings?
This isn’t to disagree with his qualitative point that the future could be huge and that we should be careful what we do with it, but putting numbers on it lends a veneer of quantitative precision to something we know very little about.