an approximate illustration of QACI:
Nice graphic!
What stops e.g. “QACI(expensive_computation())” from being an optimization process which ends up trying to “hack its way out” into the real QACI?
nothing fundamentally; the user has to be careful about which computations they invoke.
That… seems like a big part of what having “solved alignment” would mean, given that you have AGI-level optimization aimed at (indirectly, via a counterfactual) evaluating this (IIUC).
one solution to this problem is to simply never use that capability (running expensive computations) at all, or not to use it until the iterated counterfactual researchers have developed proofs that any expensive computation they run is safe, or until they have very slowly and carefully built dath-ilan-style corrigible aligned AGI.
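to make the “proofs first” option a bit more concrete, here is a minimal sketch of what such a guard could look like. everything in it (SafetyProof, verify, guarded_run) is a hypothetical stand-in for illustration, not part of any actual QACI formalization:

```python
# hypothetical sketch: gate expensive computations behind a verified safety
# proof instead of running them unconditionally. SafetyProof, verify, and
# guarded_run are illustrative stand-ins, not part of any real QACI spec.

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SafetyProof:
    claim: str          # what the proof establishes about the computation
    certificate: bytes  # machine-checkable evidence (format left unspecified)


def verify(proof: Optional[SafetyProof], computation: Callable[[], object]) -> bool:
    """placeholder proof checker: a real one would check the certificate
    against the computation's actual code. this stand-in refuses anything
    that doesn't at least carry a certificate."""
    return proof is not None and len(proof.certificate) > 0


def guarded_run(computation: Callable[[], object],
                proof: Optional[SafetyProof] = None) -> object:
    # default policy: never run an expensive computation without a
    # verified proof that doing so is safe.
    if not verify(proof, computation):
        raise PermissionError("no verified safety proof; refusing to run")
    return computation()
```

the point of the default-deny design is that forgetting to supply a proof fails closed: guarded_run(expensive_computation) refuses, and only guarded_run(expensive_computation, proof=p) with a verifiable p proceeds.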
i’ve previously talked about publishing infohazards, notably with regard to AI capability infohazards.
i’ve just added the feature of locked posts to my blog, and published the first such post.
Have you seen this implemented in any blogging platform other people can use? I’d love to see this feature in some Obsidian publishing solution like quartz, but for now those mostly don’t care about access management.
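for anyone who wants to experiment with this on a self-hosted blog, here is a minimal sketch of per-post access gating, assuming Flask and a hypothetical POST_KEYS mapping (this is not how any existing platform actually implements locked posts):

```python
# minimal sketch of locked posts on a self-hosted blog, using Flask.
# POST_KEYS and the posts/ directory are illustrative assumptions,
# not taken from any existing platform.

import hmac
from pathlib import Path

from flask import Flask, abort, request

app = Flask(__name__)

# per-post shared secrets; readers get a link like /post/foo?key=...
POST_KEYS = {"my-locked-post": "a-long-random-secret"}


@app.route("/post/<slug>")
def post(slug: str):
    expected = POST_KEYS.get(slug)
    if expected is not None:
        supplied = request.args.get("key", "")
        # constant-time comparison to avoid leaking the key via timing
        if not hmac.compare_digest(supplied, expected):
            abort(403)
    path = Path("posts") / f"{slug}.html"
    if not path.is_file():
        abort(404)
    return path.read_text()
```

a static generator like quartz has no server to check keys on, so the analogous approach there would be client-side: encrypt the post body with a passphrase at build time and decrypt it in the browser, in the spirit of tools like StatiCrypt.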
(cross-posted from my blog)
let’s stick with the term “moral patient”
“moral patient” means “entities that are eligible for moral consideration”. as a recent post i’ve liked puts it:

And also, it’s not clear that “feelings” or “experiences” or “qualia” (or the nearest unconfused versions of those concepts) are pointing at the right line between moral patients and non-patients. These are nontrivial questions, and (needless to say) not the kinds of questions humans should rush to lock in an answer on today, when our understanding of morality and minds is still in its infancy.
in this spirit, i’d like us to stick with using the term “moral patient” or “moral patienthood” when we’re talking about the set of things worthy of moral consideration. in particular, we should be using that term instead of:
“conscious things”
“sentient things”
“sapient things”
“self-aware things”
“things with qualia”
“things with experiences”
“things that aren’t p-zombies”
“things for which there is something it’s like to be them”
because those terms are hard to define, even harder to talk about meaningfully, and we don’t in fact know that they’re what we’d ultimately want to base our notion of moral patienthood on.
so if you want to talk about the set of things which deserve moral consideration, outside of a discussion of what precisely that means, don’t use a term which you feel is probably the criterion that will ultimately determine which things are worthy of moral consideration, such as “conscious beings”: you might in fact be wrong about what you’d consider a moral patient under reflection. simply use the term “moral patients”, because it unambiguously means exactly that.
(cross-posted from my blog)
nostalgia: a value pointing home
i value moral patients everywhere having freedom, being diverse, engaging in art and other culture, not undergoing excessive unconsented suffering, having a good time in general, and probably other things as well. but those are all pretty abstract; with those values satisfied to the same extent, i’d still prefer that me and my friends and my home planet (and everyone who’s been on it) have access to that utopia rather than not. this value, of not just getting an abstractly good future but also getting me and my friends and my culture and my fellow earth-inhabitants to live in it, is what my friend Prism has coined “nostalgia”.
not that those abstract values are simple or robust; they plausibly aren’t. but they are, in a sense, broader values about what happens everywhere, rather than values that are local and pointed at and around me. this could be the difference between what i’d call “global” and “personal” values, or perhaps between “global values” and “preferences”.