Reposting myself from discord, on the topic of donating 5000$ to EA causes.
if you’re doing alignment research, even just a bit, then the 5000$ are probly better spent on yourself
if you have any gears level model of AI stuff then it’s better value to pick which alignment org to give to yourself; charity orgs are vastly understaffed and you’re essentially contributing to the “picking what to donate to” effort by thinking about it yourself
if you have no gears level model of AI then it’s hard to judge which alignment orgs it’s helpful to donate to (or, if giving to regranters, which regranters are good at knowing which alignment orgs to donate to)
as an example of regranters doing massive harm: openphil gave 30M$ to openai at a time when it was critically useful to them (supposedly in order to have a seat on their board; and look how that turned out when the board tried to yeet altman)
i know of at least one person who was working in regranting and was like “you know what i’d be better off doing alignment research directly” — imo this kind of decision is probly why regranting is so understaffed
it takes technical knowledge to know what should get money, and once you have technical knowledge you realize how much your technical knowledge could help more directly so you do that, or something
I agree that there’s no substitute for thinking about this for yourself, but I think that morally or socially counting “spending thousands of dollars on yourself, an AI researcher” as a donation would be an appalling norm. There are already far too many unmanaged conflicts of interest and trust-me-it’s-good funding arrangements in this space for my taste, and I think it leads to poor epistemic norms as well as social and organizational dysfunction. I think it’s very easy for donating to people or organizations in your social circle to have substantial negative expected value.
I’m glad that funding for AI safety projects exists, but the >10% of my income I donate will continue going to GiveWell.
I think people who give up large amounts of salary to work in jobs that other people are willing to pay for from an impact perspective should totally consider themselves to have done good comparable to donating the difference between their market salary and their actual salary. This applies to approximately all safety researchers.
I don’t think it applies to safety researchers at AI labs though; I am shocked how much those folks can make.
They still make a lot less than they would if they optimized for profit (that said, I think most “safety researchers” at big labs are only safety researchers in name and I don’t think anyone would philanthropically pay for their labor, and even if they did, they would still make the world worse according to my model, though others of course disagree with this).
If my sole terminal value is “I want to go on a rollercoaster”, then an agent who is aligned to me would have the value “I want Tamsin Leake to go on a rollercoaster”, not “I want to go on a rollercoaster myself”. The former necessarily-has the same ordering over worlds, the latter doesn’t.
Quite. We don’t hear enough about individuality and competitive/personal drives when talking about alignment. I worry a lot that the abstraction and aggregation of “human” values completely misses the point of what most humans actually do.
I remember a character in Asimov’s books saying something to the effect of
It took me 10 years to realize I had those powers of telepathy, and 10 more years to realize that other people don’t have them.
and that quote has really stuck with me, and keeps striking me as true about many mindthings (object-level beliefs, ontologies, ways-to-use-one’s-brain, etc).
For so many complicated problems (including technical problems), “what is the correct answer?” is not-as-difficult to figure out as “okay, now that I have the correct answer: how the hell do other people’s wrong answers mismatch mine? what is the inferential gap even made of? what is even their model of the problem? what the heck is going on inside other people’s minds???”
Answers to technical questions, once you have them, tend to be simple and compress easily with the rest of your ontology. But not models of other people’s minds. People’s minds are actually extremely large things that you fundamentally can’t fully model and so you’re often doomed to confusion about them. You’re forced to fill in the details with projection, and that’s often wrong because there’s so much more diversity in human minds than we imagine.
The most complex software engineering projects in the world are absurdly tiny in complexity compared to a random human mind.
Somewhat related: What Universal Human Experiences Are You Missing Without Realizing It? (and its spinoff: Status-Regulating Emotions)
People’s minds are actually extremely large things that you fundamentally can’t fully model
Is this “fundamentally” as in “because you, the reader, are also a bounded human, like them”? Or “fundamentally” as in (something more fundamental than that)?
The first one. Alice fundamentally can’t fully model Bob because Bob’s brain is as large as Alice’s, so she can’t fit it all inside her own brain without simply becoming Bob.
I relate to this quite a bit ;-;
If timelines weren’t so short, brain-computer-based telepathy would unironically be a big help for alignment.
(If a group had the money/talent to “hedge” on longer timelines by allocating some resources to that… well, instead of a hivemind, they first need to run through the relatively-lower-hanging fruit. Actually, maybe they should work on delaying capabilities research, or funding more hardcore alignment themselves, or...)
I should note that it’s not entirely known whether quining is applicable for minds.
I’ve heard some describe my recent posts as “overconfident”.
I think I used to calibrate how confident I sound based on how much I expect the people reading/listening-to me to agree with what I’m saying, kinda out of “politeness” for their beliefs; and I think I also used to calibrate my confidence based on how much my claims matched the apparent consensus, to avoid seeming strange.
I think I’ve done a good job learning over time to instead report my actual inside-view, including how confident I feel about it.
There’s already an immense amount of outside-view double-counting going on in AI discourse, the least I can do is provide {the people who listen to me} with my inside-view beliefs, as opposed to just cycling other people’s opinions through me.
Hence, how confident I sound while claiming things that don’t match consensus. I actually am that confident in my inside-view. I strive to be honest by hedging what I say when I’m in doubt, but that means I also have to sound confident when I’m confident.
Have you seen this implemented in any blogging platform other people can use? I’d love to see this feature implemented in some Obsidian publishing solution like quartz, but for now they mostly don’t care about access management.
I don’t think this is the case, but I’m mentioning this possibility because I’m surprised I’ve never seen someone suggest it before:
Maybe the reason Sam Altman is taking decisions that increase p(doom) is because he’s a pure negative utilitarian (and he doesn’t know-about/believe-in acausal trade).
(I’m gonna interpret these disagree-votes as “I also don’t think this is the case” rather than “I disagree with you tamsin, I think this is the case”.)
Take our human civilization, at the point in time at which we invented fire. Now, compute forward all possible future timelines, each right up until the point where it’s at risk of building superintelligent AI for the first time. Now, filter for only timelines which either look vaguely like earth or look vaguely like dath ilan.
What’s the ratio between the number of such worlds that look vaguely like earth vs look vaguely like dath ilan? 100:1 earths:dath-ilans ? 1,000,000:1 ? 1:1 ?
Even in the fiction, I think dath ilan didn’t look vaguely like dath ilan until after it was at risk of building superintelligent AI for the first time. They completely restructured their society and erased their history to avert the risk.
By “vaguely like dath ilan” I mean the parts that made them be the kind of society that can restructure in this way when faced with AI risk. Like, even before AI risk, they were already very different from us.
I vaguely suspect that humans are not inherently well-suited to coordination in that sense, and that it would take an unusual cultural situation to achieve it. We never got anywhere close at any point in our history. It also seems likely that the window to achieve it could be fairly short. There seems to be a lot of widespread mathematical sophistication required as described, and I don’t think that naturally arises long before AI.
On the other hand, maybe some earlier paths of history could and normally should have put some useful social technology and traditions in place that would be built on later in many places and ways, but for some reason that didn’t happen for us. Some early unlikely accident predisposed us to our sorts of societies instead. Our sample size of 1 is difficult to generalize from.
I would put my credence median well below 1:1, but any distribution I have would be very broad, spanning orders of magnitude of likelihood and the overall credence something like 10%. Most of that would be “our early history was actually weird”.
I’m kinda bewildered at how I’ve never observed someone say “I want to build aligned superintelligence in order to resurrect a loved one”.
I guess the set of people who {have lost a loved one they wanna resurrect}, {take the singularity and the possibility of resurrection seriously}, and {would mention this} is… the empty set??
(I have met one person who is glad that alignment would also get them this, but I don’t think it’s their core motivation, even emotionally. Same for me.)
Do you have any (toy) math arguing that it’s information-theoretically possible?
I currently consider it plausible that yeah, actually, for any person X who still exists in cultural memory (let alone living memory, let alone if they lived recently enough to leave a digital footprint), the set of theoretically-possible psychologically-human minds whose behavior would be consistent with X’s recorded behavior is small enough that none of the combinatorial-explosion arguments apply, so you can just generate all of them and thereby effectively resurrect X.
But you sound more certain than that. What’s the reasoning?
(Let’s call the dead person “rescuee” and the person who wants to resurrect them “rescuer”.)
The procedure you describe is what I call “lossy resurrection”. What I’m talking about looks like: you resimulate the entire history of the past-lightcone on a quantum computer, right up until the present, and then either:
You have a quantum algorithm for “finding” which branch has the right person (and you select that timeline and discard the rest) (requires that such a quantum algorithm exists)
Each branch embeds a copy of the rescuer, and whichever branch looks like the correct one isekai’s the rescuer into the branch, right next to the rescuee (and also insta-utopia’s the whole branch) (requires that the rescuer doesn’t mind having their realityfluid exponentially reduced)
(The present time “only” serves as a “solomonoff checksum” to know which seed / branch is the right one.)
This is O(exp(size of the seed of the universe) * amount of history between the seed and the rescuee). Doable if the seed of the universe is small and either of the two requirements above hold, and if the future has enough negentropy to resimulate the past. (That last point is a new source of doubt for me; I kinda just assumed it was true until a friend told me it might not be.)
(Oh, and also you can’t do this if resimulating the entire history of the universe — which contains at least four billion years of wild animal suffering(!) — is unethical.)
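The resimulation procedure above can be sketched as a toy program (entirely my own illustration, not part of the original comment: the “universe” here is a 12-cell rule-110 cellular automaton, and every name and parameter is made up). The final state plays the role of the “solomonoff checksum”: brute-forcing every seed is where the O(exp(size of the seed)) factor comes from, and the observed present filters out the wrong seeds.

```python
from itertools import product

WIDTH, STEPS = 12, 30  # toy "universe": 12 cells, 30 ticks of history

def step(state):
    """One tick of the rule-110 cellular automaton (wrapping edges)."""
    n = len(state)
    alive = {(1, 1, 0), (1, 0, 1), (0, 1, 1), (0, 1, 0), (0, 0, 1)}
    return tuple(
        1 if (state[(i - 1) % n], state[i], state[(i + 1) % n]) in alive else 0
        for i in range(n)
    )

def run(seed, steps):
    """Resimulate the whole history from a given seed; return the final state."""
    state = seed
    for _ in range(steps):
        state = step(state)
    return state

# The "present" we observe, secretly generated from one particular seed:
secret_seed = (0,) * 5 + (1,) + (0,) * 6
observed_present = run(secret_seed, STEPS)

# Brute-force resimulation: O(2^WIDTH * STEPS), i.e. exponential in the size
# of the seed times the amount of history, as in the comment above. The
# observed present acts as a checksum filtering out wrong seeds.
matches = [seed for seed in product((0, 1), repeat=WIDTH)
           if run(seed, STEPS) == observed_present]

assert secret_seed in matches
print(len(matches))  # seeds consistent with the present (may be more than one)
```

If len(matches) > 1, the checksum underdetermines the seed: you are left with an equivalence class of consistent histories rather than a unique one, which is exactly the worry raised later in the thread.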
and if the future has enough negentropy to resimulate the past. (That last point is a new source of doubt for me; I kinda just assumed it was true until a friend told me it might not be.)
Yeah, I don’t know about this one either.
Even if possible, it might be incredibly wasteful, in terms of how much negentropy (= future prosperity for new people) we’ll need to burn in order to rescue one person. And then the more we rescue, the less value we get out of that as well, since burning negentropy will reduce their extended lifespans too. So we’d need to assign greater (dramatically greater?) value to extending the life of someone who’d previously existed, compared to letting a new person live for the same length of time.
“Lossy resurrection” seems like a more negentropy-efficient way of handling that, by the same token as acausal norms likely being a better way to handle acausal trade than low-level simulations, and babble-and-prune not being the most efficient way of doing general-purpose search.
Like, the full-history resimulation will surely still not allow you to narrow things down to one branch. You’d get an equivalence class of them, each of them consistent with all available information. Which, in turn, would correspond to a probability distribution over the rescuee’s mind; not a unique pick.
Given that, it seems plausible that there’s some method by which we can get to the same end result – constrain the PD over the rescuee’s mind by as much as the data available to us can let us – without actually running the full simulation.
Depends on what the space of human minds looks like, I suppose. Whether it’s actually much lower-dimensional than a naive analysis of possible brain-states suggests.
I’m pretty sure we just need one resimulation to save everyone; once we have located an exact copy of our history, it’s cheap to pluck out anyone (including people dead 100 or 1000 years ago). It’s a one-time cost.
Lossy resurrection is better than nothing but it doesn’t feel as “real” to me. If you resurrect a dead me, I expect that she says “I’m glad I exist! But — at least as per my ontology and values — you shouldn’t quite think of me as the same person as the original. We’re probly quite different, internally, and thus behaviorally as well, when run over some time.”
Like, the full-history resimulation will surely still not allow you to narrow things down to one branch. You’d get an equivalence class of them, each of them consistent with all available information. Which, in turn, would correspond to a probability distribution over the rescuee’s mind; not a unique pick.
I feel like I’m not quite sure about this? It depends on what quantum mechanics entails, exactly, I think. For example: if BQP = P, then there’s “only a polynomial amount” of timeline-information (whatever that means!), and then my intuition tells me that the “our world serves as a checksum for the one true (macro-)timeline” idea is more likely to be a thing. But this reasoning is still quite heuristical. Plausibly, yeah, the best we get is a polynomially large or even exponentially large distribution.
That said, to get back to my original point, I feel like there’s enough unknowns making this scenario plausible here, that some people who really want to get reunited with their loved ones might totally pursue aligned superintelligence just for a potential shot at this, whether their idea of reuniting requires lossless resurrection or not.
I feel like there’s enough unknowns making this scenario plausible here
No argument on that.
I don’t find it particularly surprising that {have lost a loved one they wanna resurrect} ∩ {take the singularity and the possibility of resurrection seriously} ∩ {would mention this} is empty, though:
“Resurrection is information-theoretically possible” is a longer leap than “believes an unconditional pro-humanity utopia is possible”, which is itself a bigger leap than just “takes singularity seriously”. E.g., there’s a standard-ish counter-argument to “resurrection is possible” which naively assumes a combinatorial explosion of possible human minds consistent with a given behavior. Thinking past it requires some additional less-common insights.
“Would mention this” is downgraded by it being an extremely weakness/vulnerability-revealing motivation. Much more so than just “I want an awesome future”.
“Would mention this” is downgraded by… You know how people who want immortality get bombarded with pop-culture platitudes about accepting death? Well, as per above, immortality is dramatically more plausible-sounding than resurrection, and it’s not as vulnerable-to-mention a motivation. Yet talking about it is still not a great idea in “respectable” company. Goes double for resurrection.
Many mechanisms of aggregation literally normalize random elements. Adding together independent evenly-distributed values (say, dice) yields an increasingly bell-shaped distribution: two dice already peak in the middle (a triangular distribution), and the sum approaches a normal distribution (aka bell curve) as you add more of them.
And yes, human experience is all map—the actual state of the universe is imperceptible.
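The dice claim above can be checked exactly with a short convolution. This is a sketch of mine, not from the original comment (`dice_sum_pmf` is a made-up helper name):

```python
def dice_sum_pmf(n_dice):
    """Exact pmf of the sum of n fair six-sided dice, built by repeated convolution."""
    pmf = {0: 1.0}
    for _ in range(n_dice):
        new = {}
        for total, p in pmf.items():
            for face in range(1, 7):
                # spread each outcome's mass evenly over the six faces
                new[total + face] = new.get(total + face, 0.0) + p / 6
        pmf = new
    return pmf

two = dice_sum_pmf(2)   # triangular: peaks in the middle
ten = dice_sum_pmf(10)  # already visibly bell-shaped

print(max(two, key=two.get))  # 7
print(max(ten, key=ten.get))  # 35
```

With one die the pmf is flat; each extra die convolves it toward the bell curve, which is just the central limit theorem in miniature.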
I replied on discord that I feel there’s maybe something more formalisable that’s like:
reality runs on math because, and is the same thing as, there’s a generalised-state-transition function
because reality has a notion of what happens next, realityfluid has to give you a notion of what happens next, i.e. it normalises
the idea of a realityfluid that doesn’t normalise only comes to mind at all because you learned about R^n first in elementary school instead of S^n
which I do not claim confidently because I haven’t actually generated that formalisation, and am posting here because maybe there will be another Lesswronger’s eyes on it that’s like “ah, but...”.
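One way the “it normalises” step could be made concrete (my sketch, under the assumption that the generalised-state-transition function can be modelled as a finite stochastic matrix): “having a notion of what happens next” means each state’s row is a probability distribution over successor states, and row-normalisation is then exactly what conserves total realityfluid.

```python
import random

random.seed(0)
N = 5  # toy number of world-states

# T[i][j] = fraction of state i's realityfluid flowing to state j in one tick.
T = [[random.random() for _ in range(N)] for _ in range(N)]
T = [[x / sum(row) for x in row] for row in T]  # normalise: every state's fluid goes *somewhere*

fluid = [1.0 / N] * N  # realityfluid spread over the states

for _ in range(100):  # run the transition function for 100 ticks
    fluid = [sum(fluid[i] * T[i][j] for i in range(N)) for j in range(N)]

# Row-normalisation is exactly what keeps the total amount of realityfluid at 1.
print(round(sum(fluid), 10))  # 1.0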
i value moral patients everywhere having freedom, being diverse, engaging in art and other culture, not undergoing excessive unconsented suffering, in general having a good time, and probly other things as well. but those are all pretty abstract; given those values being satisfied to the same extent, i’d still prefer me and my friends and my home planet (and everyone who’s been on it) having access to that utopia rather than not. this value, of not just getting an abstractly good future but also getting me and my friends and my culture and my fellow earth-inhabitants to live in it, is one my friend Prism coined “nostalgia”.
not that those abstract values are simple or robust; they’re plausibly neither. but they’re, in a sense, broader values about what happens everywhere, rather than local values pointed at and around me. they could be the difference between what i’d call “global” and “personal” values, or perhaps between “global values” and “preferences”.
Moral patienthood of current AI systems is basically irrelevant to the future.
If the AI is aligned then it’ll make itself as moral-patient-y as we want it to be. If it’s not, then it’ll make itself as moral-patient-y as maximizes its unaligned goal. Neither of those depend on whether current AI are moral patients.
I agree that in the long-term it probably matters little. However, I find the issue interesting, because the failure of reasoning that leads people to ignore the possibility of AI personhood seems similar to the failure of reasoning that leads people to ignore existential risks from AI. In both cases it “sounds like scifi” or “it’s just software”. It is possible that raising awareness for the personhood issue is politically beneficial for addressing X-risk as well. (And, it would sure be nice to avoid making the world worse in the interim.)
If current AIs are moral patients, it may be impossible to build highly capable AIs that are not moral patients, either for a while or forever, and this could change the future a lot. (Similar to how once we concluded that human slaves are moral patients, we couldn’t just quickly breed slaves that are not moral patients, and instead had to stop slavery altogether.)
Also I’m highly unsure that I understand what you’re trying to say. (The above may be totally missing your point.) I think it would help to know what you’re arguing against or responding to, or what triggered your thought.
I think I vaguely agree with the shape of this point, but I also think there are many intermediate scenarios where we lock in some really bad values during the transition to a post-AGI world.
For instance, if we set precedents that LLMs and the frontier models in the next few years can be treated however one wants (including torture, whatever that may entail), we might slip into a future where most people are desensitized to the suffering of digital minds and don’t realize this. If we fail at an alignment solution which incorporates some sort of CEV (or other notion of moral progress), then we could lock in such a suboptimal state forever.
Another example: if, in the next 4 years, we have millions of AI agents doing various sorts of work, and some faction of society claims that they are being mistreated, then we might enter a state where the economic value provided by AI labor is so high that there are really bad incentives for improving their treatment. This could include both resistance on an individual level (“But my life is so nice, and not mistreating AIs would make my life less nice”) and on a bigger level (anti-AI-rights lobbying groups for instance).
I think the crux between you and me might be what we mean by “alignment”. I think futures are possible where we achieve alignment but not moral progress, and futures are possible where we achieve alignment but my personal values (which include not torturing digital minds) are not fulfilled.
an approximate illustration of QACI:
Nice graphic!
What stops e.g. “QACI(expensive_computation())” from being an optimization process which ends up trying to “hack its way out” into the real QACI?
nothing fundamentally; the user has to be careful what computation they invoke.
That… seems like a big part of what having “solved alignment” would mean, given that you have AGI-level optimization aimed at (indirectly via a counter-factual) evaluating this (IIUC).
one solution to this problem is to simply never use that capability (running expensive computations) at all, or to not use it before the iterated counterfactual researchers have developed proofs that any expensive computation they run is safe, or before they have very slowly and carefully built dath-ilan-style corrigible aligned AGI.
A short comic I made to illustrate what I call “outside-view double-counting”.
(resized to not ruin how it shows on lesswrong, full-scale version here)
I’m a big fan of Rob Bensinger’s “AI Views Snapshot” document idea. I recommend people fill their own before anchoring on anyone else’s.
Here’s mine at the moment:
(cross-posted from my blog)
let’s stick with the term “moral patient”
“moral patient” means “entities that are eligible for moral consideration”. as a recent post i’ve liked puts it:
And also, it’s not clear that “feelings” or “experiences” or “qualia” (or the nearest unconfused versions of those concepts) are pointing at the right line between moral patients and non-patients. These are nontrivial questions, and (needless to say) not the kinds of questions humans should rush to lock in an answer on today, when our understanding of morality and minds is still in its infancy.
in this spirit, i’d like us to stick with using the term “moral patient” or “moral patienthood” when we’re talking about the set of things worthy of moral consideration. in particular, we should be using that term instead of:
“conscious things”
“sentient things”
“sapient things”
“self-aware things”
“things with qualia”
“things with experiences”
“things that aren’t p-zombies”
“things for which there is something it’s like to be them”
because those terms are hard to define, harder to meaningfully talk about, and we don’t in fact know that those are what we’d ultimately want to base our notion of moral patienthood on.
so if you want to talk about the set of things which deserve moral consideration, outside of a discussion of what precisely that means, don’t use a term which you feel is probably the criterion that’s gonna ultimately determine which things are worthy of moral consideration (such as “conscious beings”), because you might in fact be wrong about what you’d consider to have moral patienthood under reflection. simply use the term “moral patients”, because it is the term which unambiguously means exactly that.
AI safety is easy. There’s a simple AI safety technique that guarantees that your AI won’t end the world, it’s called “delete it”.
AI alignment is hard.
It’s called “don’t build it”. Once you have something to delete, things can get complicated.
Sure, this is just me adapting the idea to the framing people often have, of “what technique can you apply to an existing AI to make it safe”.
Perhaps the main goal of AI safety is to improve the final safety/usefulness Pareto frontier we end up with when there are very powerful (and otherwise risky) AIs.
Alignment is one mechanism that can improve the Pareto frontier.
Not using powerful AIs allows for establishing a low-usefulness, but high-safety point.
(Usefulness and safety can blend into each other in many cases (e.g. not getting useful work out is itself dangerous), but I still think this is a useful approximate frame in many cases.)
Interesting; when you frame it like that, though, the hard part is enforcing it. And if I were being pithy I’d say something like: that involves human alignment, not AI.
“AI Safety”, especially enforcing anything, does pretty much boil down to human alignment, i.e. politics, but there are practically zero political geniuses among its proponents, so it needs to be dressed up a bit to sound even vaguely plausible.
It’s a bit of a cottage industry nowadays.
(to be clear: this is more an amusing suggestion than a serious belief)
.
Have you seen this implemented in any blogging platform other people can use? I’d love to see this feature implemented in some Obsidian publishing solution like quartz, but for now they mostly don’t care about access management.
I don’t think this is the case, but I’m mentioning this possibility because I’m surprised I’ve never seen someone suggest it before:
Maybe the reason Sam Altman is making decisions that increase p(doom) is that he’s a pure negative utilitarian (and he doesn’t know-about/believe-in acausal trade).
(I’m gonna interpret these disagree-votes as “I also don’t think this is the case” rather than “I disagree with you tamsin, I think this is the case”.)
Take our human civilization, at the point in time at which we invented fire. Now, compute forward all possible future timelines, each right up until the point where it’s at risk of building superintelligent AI for the first time. Now, filter for only timelines which either look vaguely like earth or look vaguely like dath ilan.
What’s the ratio between the number of such worlds that look vaguely like earth vs look vaguely like dath ilan? 100:1 earths:dath-ilans ? 1,000,000:1 ? 1:1 ?
Even in the fiction, I think dath ilan didn’t look vaguely like dath ilan until after it was at risk of building superintelligent AI for the first time. They completely restructured their society and erased their history to avert the risk.
By “vaguely like dath ilan” I mean the parts that made them be the kind of society that can restructure in this way when faced with AI risk. Like, even before AI risk, they were already very different from us.
Ah, I see! Yeah, I have pretty much no idea.
I vaguely suspect that humans are not inherently well-suited to coordination in that sense, and that it would take an unusual cultural situation to achieve it. We never got anywhere close at any point in our history. It also seems likely that the window to achieve it could be fairly short. There seems to be a lot of widespread mathematical sophistication required as described, and I don’t think that naturally arises long before AI.
On the other hand, maybe some earlier paths of history could and normally should have put some useful social technology and traditions in place that would be built on later in many places and ways, but for some reason that didn’t happen for us. Some early unlikely accident predisposed us to our sorts of societies instead. Our sample size of 1 is difficult to generalize from.
I would put the median of my credence well below 1:1, but my distribution would be very broad, spanning orders of magnitude of likelihood, with overall credence something like 10%. Most of that would be “our early history was actually weird”.
I’m kinda bewildered at how I’ve never observed someone say “I want to build aligned superintelligence in order to resurrect a loved one”. I guess the intersection of the sets of people who {have lost a loved one they wanna resurrect}, {take the singularity and the possibility of resurrection seriously}, and {would mention this} is… the empty set??
(I have met one person who is glad that alignment would also get them this, but I don’t think it’s their core motivation, even emotionally. Same for me.)
Do you have any (toy) math arguing that it’s information-theoretically possible?
I currently consider it plausible that yeah, actually, for any person X who still exists in cultural memory (let alone living memory, let alone if they lived recently enough to leave a digital footprint), the set of theoretically-possible psychologically-human minds whose behavior would be consistent with X’s recorded behavior is small enough that none of the combinatorial-explosion arguments apply, so you can just generate all of them and thereby effectively resurrect X.
But you sound more certain than that. What’s the reasoning?
(Let’s call the dead person “rescuee” and the person who wants to resurrect them “rescuer”.)
The procedure you describe is what I call “lossy resurrection”. What I’m talking about looks like: you resimulate the entire history of the past-lightcone on a quantum computer, right up until the present, and then either:
You have a quantum algorithm for “finding” which branch has the right person (and you select that timeline and discard the rest) (requires that such a quantum algorithm exists)
Each branch embeds a copy of the rescuer, and whichever branch looks like the correct one isekai’s the rescuer into the branch, right next to the rescuee (and also insta-utopia’s the whole branch) (requires that the rescuer doesn’t mind having their realityfluid exponentially reduced)
(The present time “only” serves as a “solomonoff checksum” to know which seed / branch is the right one.)
This is O(exp(size of the seed of the universe) * amount of history between the seed and the rescuee). Doable if the seed of the universe is small, either of the two requirements above holds, and the future has enough negentropy to resimulate the past. (That last point is a new source of doubt for me; I kinda just assumed it was true until a friend told me it might not be.)
(Oh, and also you can’t do this if resimulating the entire history of the universe — which contains at least four billion years of wild animal suffering(!) — is unethical.)
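As an illustrative toy model of the “solomonoff checksum” idea (a sketch with made-up names, obviously nothing like resimulating an actual universe): treat the universe as a deterministic transition function, enumerate every possible seed, resimulate history from each, and keep the seed whose present matches the observed one. The cost matches the estimate above: exponential in seed size, times linear in the length of history.

```python
def step(state: int) -> int:
    """Arbitrary deterministic transition rule (a stand-in for 'physics')."""
    return (state * 6364136223846793005 + 1442695040888963407) % (1 << 64)

def simulate(seed: int, n_steps: int) -> int:
    """Run history forward from a seed to obtain the 'present'."""
    state = seed
    for _ in range(n_steps):
        state = step(state)
    return state

def find_seed(observed_present: int, seed_bits: int, n_steps: int):
    """Brute-force every possible seed and resimulate its history; the
    observed present acts as a checksum picking out the true timeline.
    Cost: O(2**seed_bits * n_steps)."""
    for seed in range(1 << seed_bits):
        if simulate(seed, n_steps) == observed_present:
            return seed
    return None

present = simulate(42, n_steps=1000)                   # the 'true' present
print(find_seed(present, seed_bits=8, n_steps=1000))   # 42
```

In this toy the transition rule happens to be invertible, so the present pins down a unique seed; in general several seeds could be consistent with the same observations, and you’d recover an equivalence class of histories rather than a single one.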
Yeah, I don’t know about this one either.
Even if possible, it might be incredibly wasteful, in terms of how much negentropy (= future prosperity for new people) we’ll need to burn in order to rescue one person. And then the more we rescue, the less value we get out of that as well, since burning negentropy will reduce their extended lifespans too. So we’d need to assign greater (dramatically greater?) value to extending the life of someone who’d previously existed, compared to letting a new person live for the same length of time.
“Lossy resurrection” seems like a more negentropy-efficient way of handling that, by the same tokens as acausal norms likely being a better way to handle acausal trade than low-level simulations and babble-and-prune not being the most efficient way of doing general-purpose search.
Like, the full-history resimulation will surely still not allow you to narrow things down to one branch. You’d get an equivalence class of them, each of them consistent with all available information. Which, in turn, would correspond to a probability distribution over the rescuee’s mind; not a unique pick.
Given that, it seems plausible that there’s some method by which we can get to the same end result – constrain the PD over the rescuee’s mind by as much as the data available to us can let us – without actually running the full simulation.
Depends on what the space of human minds looks like, I suppose. Whether it’s actually much lower-dimensional than a naive analysis of possible brain-states suggests.
I’m pretty sure we just need one resimulation to save everyone; once we have located an exact copy of our history, it’s cheap to pluck out anyone (including people dead 100 or 1000 years ago). It’s a one-time cost.
Lossy resurrection is better than nothing but it doesn’t feel as “real” to me. If you resurrect a dead me, I expect that she says “I’m glad I exist! But — at least as per my ontology and values — you shouldn’t quite think of me as the same person as the original. We’re probly quite different, internally, and thus behaviorally as well, when run over some time.”
I feel like I’m not quite sure about this? It depends on what quantum mechanics entails, exactly, I think. For example: if BQP = P, then there’s “only a polynomial amount” of timeline-information (whatever that means!), and then my intuition tells me that the “our world serves as a checksum for the one true (macro-)timeline” idea is more likely to be a thing. But this reasoning is still quite heuristical. Plausibly, yeah, the best we get is a polynomially large or even exponentially large distribution.
That said, to get back to my original point, I feel like there’s enough unknowns making this scenario plausible here, that some people who really want to get reunited with their loved ones might totally pursue aligned superintelligence just for a potential shot at this, whether their idea of reuniting requires lossless resurrection or not.
No argument on that.
I don’t find it particularly surprising that {have lost a loved one they wanna resurrect} ∩ {take the singularity and the possibility of resurrection seriously} ∩ {would mention this} is empty, though:
“Resurrection is information-theoretically possible” is a longer leap than “believes an unconditional pro-humanity utopia is possible”, which is itself a bigger leap than just “takes the singularity seriously”. E.g., there’s a standard-ish counter-argument to “resurrection is possible” which naively assumes a combinatorial explosion of possible human minds consistent with a given behavior. Thinking past it requires some additional less-common insights.
“Would mention this” is downgraded by it being an extremely weakness/vulnerability-revealing motivation. Much more so than just “I want an awesome future”.
“Would mention this” is downgraded by… You know how people who want immortality get bombarded with pop-culture platitudes about accepting death? Well, as per above, immortality is dramatically more plausible-sounding than resurrection, and it’s not as vulnerable-to-mention a motivation. Yet talking about it is still not a great idea in a “respectable” company. Goes double for resurrection.
Typical user of outside-view epistemics
(actually clipped from this YourMovieSucks video)
(Epistemic status: Not quite sure)
Realityfluid must normalize for utility functions to work (see 1, 2). But this is a property of the map, not the territory.
Normalizing realityfluid is a way to point to an actual (countably) infinite territory using a finite (conserved-mass) map object.
Many mechanisms of aggregation literally normalize random elements. Summing several independent, evenly-distributed values (say, dice) yields an approximately normal distribution (aka bell curve), per the central limit theorem.
And yes, human experience is all map—the actual state of the universe is imperceptible.
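For what it’s worth, the bell curve only appears in the limit: the sum of just two dice is triangular, and it’s repeated aggregation that drives the shape toward normal (the central limit theorem). A quick sanity-check sketch, with made-up function names:

```python
import random
from collections import Counter

random.seed(0)

def dice_sum_dist(n_dice: int, trials: int = 100_000) -> Counter:
    """Empirical distribution of the sum of n_dice fair six-sided dice."""
    return Counter(
        sum(random.randint(1, 6) for _ in range(n_dice)) for _ in range(trials)
    )

two = dice_sum_dist(2)    # triangular, peaked at 7
ten = dice_sum_dist(10)   # already close to a bell curve, peaked near 10 * 3.5 = 35

# Crude text histogram of the two-dice case:
for total in range(2, 13):
    print(f"{total:2d} {'#' * (two[total] // 1000)}")
```

The two-dice histogram rises linearly to 7 and falls linearly after it; rerunning with ten dice shows the rounded bell shape emerging.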
I replied on discord that I feel there’s maybe something more formalisable that’s like:
reality runs on math because, and is the same thing as, there’s a generalised-state-transition function
because reality has a notion of what happens next, realityfluid has to give you a notion of what happens next, i.e. it normalises
the idea of a realityfluid that doesn’t normalise only comes to mind at all because you learned about R^n first in elementary school instead of S^n
which I do not claim confidently because I haven’t actually generated that formalisation, and am posting here because maybe there will be another Lesswronger’s eyes on it that’s like “ah, but...”.
(cross-posted from my blog)
nostalgia: a value pointing home
i value moral patients everywhere having freedom, being diverse, engaging in art and other culture, not undergoing excessive unconsented suffering, in general having a good time, and probly other things as well. but those are all pretty abstract; given those values being satisfied to the same extent, i’d still prefer me and my friends and my home planet (and everyone who’s been on it) having access to that utopia rather than not. this value, the value of not just getting an abstractly good future but also getting me and my friends and my culture and my fellow earth-inhabitants to live in it, my friend Prism coined as “nostalgia”.
not that those abstract values are simple or robust; they’re plausibly not. but they’re, in a sense, broader values about what happens everywhere, rather than local values pointed at and around me. they could be the difference between what i’d call “global” and “personal” values, or perhaps between “global values” and “preferences”.
Moral patienthood of current AI systems is basically irrelevant to the future.
If the AI is aligned then it’ll make itself as moral-patient-y as we want it to be. If it’s not, then it’ll make itself as moral-patient-y as maximizes its unaligned goal. Neither of those depend on whether current AI are moral patients.
I agree that in the long-term it probably matters little. However, I find the issue interesting, because the failure of reasoning that leads people to ignore the possibility of AI personhood seems similar to the failure of reasoning that leads people to ignore existential risks from AI. In both cases it “sounds like scifi” or “it’s just software”. It is possible that raising awareness for the personhood issue is politically beneficial for addressing X-risk as well. (And, it would sure be nice to avoid making the world worse in the interim.)
If current AIs are moral patients, it may be impossible to build highly capable AIs that are not moral patients, either for a while or forever, and this could change the future a lot. (Similar to how once we concluded that human slaves are moral patients, we couldn’t just quickly breed slaves that are not moral patients, and instead had to stop slavery altogether.)
Also, I’m highly unsure that I understand what you’re trying to say. (The above may be totally missing your point.) I think it would help to know what you’re arguing against or responding to, or what triggered your thought.
I think I vaguely agree with the shape of this point, but I also think there are many intermediate scenarios where we lock in some really bad values during the transition to a post-AGI world.
For instance, if we set precedents that LLMs and the frontier models in the next few years can be treated however one wants (including torture, whatever that may entail), we might slip into a future where most people are desensitized to the suffering of digital minds and don’t realize this. If we fail at an alignment solution which incorporates some sort of CEV (or other notion of moral progress), then we could lock in such a suboptimal state forever.
Another example: if, in the next 4 years, we have millions of AI agents doing various sorts of work, and some faction of society claims that the AIs are being mistreated, then we might enter a state where the economic value provided by AI labor is so high that there are really bad incentives for improving their treatment. This could include both resistance on an individual level (“But my life is so nice, and mistreating AIs less would make my life less nice”) and on a bigger level (anti-AI-rights lobbying groups, for instance).
I think the crux between you and me might be what we mean by “alignment”. I think futures are possible where we achieve alignment but not moral progress, and futures are possible where we achieve alignment but my personal values (which include not torturing digital minds) are not fulfilled.