How would one arrive at a value system that supports the latter but rejects the former?
It’s a boundary concept (element of a deontological agent design), not a value system (in the sense of preference such as expected utility, a key ingredient of an optimizer). An example application is robustly leaving aliens alone even if you don’t like them (without a compulsion to give them the universe), or closer to home leaving humans alone (in a sense where not stepping on them with your megaprojects is part of the concept), even if your preference doesn’t consider them particularly valuable.
This makes the alignment target something other than preference, a larger target that’s easier to hit. It’s not CEV and leaves value on the table; it doesn’t make efficient use of all resources according to any particular preference. But it might suffice for establishing AGI-backed security against overeager maximizers, with aligned optimizers coming later, when there is time to design them properly.
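(To make the constraint-versus-preference distinction concrete, here is a minimal formalization sketch with assumed notation: a utility function $U$ over actions and a boundary-respecting predicate $C$, neither of which appears in the linked posts.)

$$a^*_{\text{optimizer}} = \arg\max_{a} U(a) \qquad \text{vs.} \qquad a^*_{\text{bounded}} = \arg\max_{a \,:\, C(a)} U(a)$$

In this sketch the boundary predicate $C$ is not a term added to $U$; it restricts the feasible set, so no amount of utility elsewhere can compensate for violating it, and the aliens (or humans) stay unharmed even when $U$ assigns them little value.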
What is this in reference to?
The Stanford Encyclopedia of Philosophy has no reference entry for “boundary concept” nor any string matches at all to “deontological agent” or “deontological agent design”.
It’s a reference to Critch’s Boundaries Sequence and related ideas; see in particular the introductory post and Acausal Normalcy.
It’s an element of a deontological agent design in the literal sense of being an element of a design of an agent that acts in a somewhat deontological manner, instead of being a naive consequentialist maximizer, even if the same design falls out of some acausal society norm equilibrium on consequentialist game theoretic grounds.
I don’t get this; it seems you’re exclusively referencing another LW user’s personal opinions?
I’ve never heard of this ‘Andrew_Critch’ or any of his writings before today, nor do they appear that popular, so I’m quite baffled.
Here’s where I think the conversation went off the rails. :( I think what happened is M.Y.Zuo’s bullshit/woo detector went off, and they started asking pointed questions about the credentials of Critch and his ideas. Vlad and LW more generally react allergically to arguments from authority/status, so downvoted M.Y.Zuo for making this about Critch’s authority instead of about the quality of his arguments.
Personally I feel like this was all a tragic misunderstanding but I generally side with M.Y.Zuo here—I like Critch a lot as a person & I think he’s really smart, but his ideas here are far from rigorous clear argumentation as far as I can tell (I’ve read them all and still came away confused, which of course could be my fault, but still...) so I think M.Y.Zuo’s bullshit/woo detector was well-functioning.
That said, I’d advise M.Y.Zuo to instead say something like “Hmm, a brief skim of those posts leaves me confused and skeptical, and a brief google makes it seem like this is just Critch’s opinion rather than something I should trust on authority. Got any better arguments to show me? If not, cool, we can part ways in peace having different opinions.”
[edit]
I appreciate the attempt at diagnosing what went wrong here. I agree this is ~where it went off the rails, and I think you are (maybe?) correctly describing what was going on from M.Y. Zuo’s perspective. But this doesn’t feel like it captured what I found frustrating.
[/edit]
What feels wrong to me about this is that, for the question of:

How would one arrive at a value system that supports the latter but rejects the former?

it just doesn’t make sense to me to be that worried about either authority or rigor. I think the nonrigorous concept, generally held in society, of “respect people’s boundaries/autonomy” is sufficient to answer the question, without even linking to Critch’s sequence. Critch’s sequence is a nice-to-have that sketches out a direction for how you might formalize this, but I don’t get why this level of formalization is even particularly desired here.
(Like, last I checked we don’t have any rigorous conceptions of functioning human value systems that actually work, either for respecting boundaries or aggregating utility or anything else. For purposes of this conversation this just feels like an isolated demand for rigor)
I think that there are many answers along these lines (like “I’m not talking about a whole value system, I’m talking about a deontological constraint”) which would have been fine here.
The issue was that sentences like “It’s a boundary concept (element of a deontological agent design), not a value system (in the sense of preference such as expected utility, a key ingredient of an optimizer)” use the phrasing of someone pointing to a well-known, clearly-defined concept, but then only link to Critch’s high-level metaphor.
Okay, I get where you’re coming from now. Will have to mull over whether I agree, but I at least no longer feel confused about what the disagreement is about.
(updated the previous comment with some clearer context-setting)
Thanks, & thanks for putting in your own perspective here. I sympathize with that too; fwiw Vladimir_Nesov’s answer would have satisfied me, because I am sufficiently familiar with what the terms mean. But for someone new to those terms, they are just unexplained jargon, with links to lots of lengthy but difficult to understand writing. (I agree with Richard’s comment nearby). Like, I don’t think Vladimir did anything wrong by giving a jargon-heavy, links-heavy answer instead of saying something like “It may be hard to construct a utility function that supports the latter but rejects the former, but if instead of utility maximization we are doing something like utility-maximization-subject-to-deontological-constraints, it’s easy: just have a constraint that you shouldn’t harm sentient beings. This constraint doesn’t require you to produce more sentient beings, or squeeze existing ones into optimized shapes.” But I predict that this blowup wouldn’t have happened if he had instead said that.
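(A minimal runnable sketch of the “utility-maximization-subject-to-deontological-constraints” idea described in the suggested answer above; every name here is illustrative, not anyone’s actual proposal.)

```python
# Sketch: utility maximization subject to a deontological constraint.
# The constraint is a hard filter on the action set, not a penalty term
# in the utility function, so no utility gain can buy a violation.
from typing import Callable, Iterable, Optional, Set

def choose_action(
    actions: Iterable[str],
    utility: Callable[[str], float],
    harms: Callable[[str, str], bool],   # illustrative: does action a harm being b?
    existing_sentients: Set[str],
) -> Optional[str]:
    # Deontological step: actions that harm any existing sentient being
    # are simply unavailable, regardless of their utility.
    permissible = [
        a for a in actions
        if not any(harms(a, b) for b in existing_sentients)
    ]
    if not permissible:
        return None  # do nothing rather than cross the boundary
    # Consequentialist step: optimize only among what remains.
    return max(permissible, key=utility)
```

Note what the filter does and doesn’t do: it forbids harming existing sentients, but nothing in it requires producing more sentient beings or squeezing existing ones into optimized shapes, which is exactly the asymmetry the original question was about.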
I may be misinterpreting things of course, wading in here thinking I can grok what either side was thinking. Open to being corrected!
To be clear I super appreciate you stepping in and trying to see where people were coming from (I think ideally I’d have been doing a better job with that in the first place, but it was kinda hard to do so from inside the conversation)
I found Richard’s explanation about what-was-up-with-Vlad’s comment to be helpful.
Thanks for the insight. After looking into Vladimir_Nesov’s background, I would tend to agree that it was some issue with the phrasing of the parent comment that triggered the increasingly odd replies, rather than any substantive confusion.
At the time I gave him the benefit of the doubt in confusing what SEP is, what referencing an entry in an encyclopedia means, what I wanted to convey, etc., but considering there are 1505 seemingly coherent wiki contributions to the account’s credit since 2009, these pretty common usages should not have been difficult to understand.
To be fair, I didn’t consider his possible emotional states nor how my phrasing might be construed as being an attack on his beliefs. Perhaps I’m too used to the more formal STEM culture instead of this new culture that appears to be developing.
I’d describe this as “Critch listed a bunch of arguments, and the arguments are compelling.”
I’m genuinely not seeing any linked or attached proofs for these arguments, whether logical, statistical, mathematical, etc.
EDIT: Can you link to, or quote, what you believe is a credible argument?
I think upon reflection I maybe agree that there isn’t exactly an “argument” here – I think most of what Critch is doing is saying “here is a frame of how to think about a lot of game theoretic stuff.” He doesn’t (much) argue for that frame, but he lays out how it works, shows a bunch of examples, and basically is hoping (at this point) that the examples resonate.
(I haven’t reread the whole sequence in detail but that was actually my recollection of it last time I read it)
So, I’ll retract my particular phrasing here.
I do think that intuitively, boundaries exist, and as soon as they are pointed out as a frame that’d be good to formalize and incorporate into game/decision theory, I’m like “oh, yeah obviously.” I don’t know how much I think lawful-neutral aliens would automatically respect boundaries, but I would be highly surprised if they didn’t at least include them as a term to be considered as they developed their coordination theories.
Your original comment said “How would one arrive at a value system that supports the latter but rejects the former?”, Vlad said (paraphrased) “by invoking boundaries as a concept”. If that doesn’t make sense to you, okay, but, while I agree Critch doesn’t quite argue for the concept’s applicability, I do think he lays out a bunch of concepts and how they could relate, and this should at least be an existence proof for “it is possible to develop a theory that accomplishes ‘care about allowing the continued survival of existing things without wanting to create more’”. And I still don’t think it makes sense to summarize this as a “personal opinion.” It’s a framework, you can buy the framework or not.
I appreciate the update. The actual meaning behind “invoking boundaries as a concept” is what I’m interested in, if that is the right paraphrase.
If it made intuitive sense then the question wouldn’t have been asked, so you’re right that the concepts could relate, but the crux is that this has not been proven to any degree. Thus, I’m still inclined to consider it a personal opinion.
For the latter part, I don’t get the meaning; from what I understand, there’s no such thing as a ‘should at least be an existence proof’.
There’s ‘proven correct’, ‘proven incorrect’, ‘unproven’, ‘conjecture’, ‘hypothesis’, etc...
Why do you need more than one description of such a value system in order to answer your original question? This isn’t about arguing the value system is ideal or that you should adopt it.
And, like, respecting boundaries is a pretty mainstream concept lots of people care about.
I don’t think I am asking for multiple descriptions of ‘such a value system’.
What value system are you referring to and where does it appear I’m asking that?
Also, I’m not quite sure how ‘respecting boundaries’ relates to this discussion; is it something to do with the idea of ‘invoking boundaries as a concept’?
Research is full of instances of having nothing to go on but the argument itself, not even a reason to consider the argument.
(Among Critch’s legible contributions is Parametric Bounded Löb, wrapping up one line of research in modal embedded agency. See also the recent paper on open source game theory institution design, which works as an introduction with grounding in the informal motivations behind the topic and its relevance to the real world.)
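(Background for readers who haven’t seen it: the classic, unbounded Löb’s theorem, which the parametric bounded result generalizes, says that for a theory like $\mathrm{PA}$ with provability operator $\Box$:)

$$\text{if } \; \mathrm{PA} \vdash \Box P \rightarrow P, \; \text{ then } \; \mathrm{PA} \vdash P, \qquad \text{internalized as the schema} \qquad \mathrm{PA} \vdash \Box(\Box P \rightarrow P) \rightarrow \Box P.$$

Roughly, the parametric bounded version replaces $\Box$ with proof-length-bounded provability $\Box_k$ and makes the implication hold uniformly in a parameter, which is what lets resource-bounded agents use Löbian reasoning about each other; this gloss is from memory, so see the linked paper for the exact statement.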
The work seems interesting, but none of it makes an individual’s personal opinions a credible reference. If it were a group of folks with credible track records expressing a joint opinion at a conference, I’d be more willing to consider it, but literally a single individual just doesn’t make sense.
Research is full of instances of having nothing to go on but the argument itself, not even a reason to consider the argument.

I’m not sure how to parse this; the commonly accepted view is that research is based on experiments, observations, logical proofs, mathematical proofs, etc… do you not believe this?
It’s not a “credible reference” in the sense of having behind it massive evidence of being probably worthwhile to study. But I in turn find the background demand for credible references (in their absence) baffling, both in principle and given that it’s not a constraint that non-mainstream research could survive under.
I personally think it’s important to separate philosophical speculation from well-developed rigorous work, and Critch’s stuff on boundaries seems to land well in the former category.
This is a communicative norm not an epistemic norm—you’re welcome to believe whatever you like about Critch’s stuff, but when you cite it as if it’s widely-understood (across the LW community, or elsewhere) to be a credible, well-developed idea, then this undermines our ability to convey the ideas that are widely-understood to be credible.
important to separate philosophical speculation from well-developed rigorous work

Sure.
when you cite it as if it’s widely-understood (across the LW community, or elsewhere) to be credible

I don’t think I did though? My use of “reference” was merely in the sense of explaining the intended meaning of the word “boundary” I used in the top level comment, so it’s mostly about definitions and context of what I was saying. (I did assume that the reference would plausibly be understood, and I linked to a post on the topic right there in the original comment to gesture at the intended sense and context of the word. There’s also been a post on the meaning of this very word just yesterday.)
And then M. Y. Zuo started talking about credibility, which still leaves me confused about what’s going on, despite some clarifying back and forth.
A reference implies some associated credibility, as in the example found in comment #4:

The Stanford Encyclopedia of Philosophy has no reference entry for “boundary concept” nor any string matches at all to “deontological agent” or “deontological agent design”.
e.g. referencing entries in an encyclopedia, usually presumed to be authoritative to some degree, which grants some credibility to what’s written regarding the topic
By the way, I’m not implying Andrew_Critch’s credibility is zero, but it’s certainly a lot lower than SEP’s, so much so that I think most LW readers, who likely haven’t heard of him, would sooner group his writings with random musings than with SEP entries.
Hence my surprise.
Well, I’m pretty sure that’s not what the word means, but in any case that’s not what I meant by it, so that point isn’t relevant to any substantive disagreement, which does seem present; it’s best to taboo “reference” in this context.
It appears you linked to tvtropes.org?
I’m fairly certain the widely accepted definition of ‘reference’ encompasses the idea of referencing entries in an encyclopedia. So in this case I wouldn’t trust ‘TVTropes’ at all.
Here’s Merriam-Webster:
I personally think it’s important to separate philosophical speculation from well-developed rigorous work

Yes, but of course Critch is the tip of a rather large iceberg. Rationalists tend to think you should familiarise yourself with a mass of ideas virtually none of which have been rigorously proven.
But I in turn find the background demand for credible references (in their absence) baffling, both in principle and given that it’s not a constraint that non-mainstream research could survive under.

The writings linked don’t exclude the possibility of ‘non-mainstream research’ having experiments, observations, logical proofs, mathematical proofs, etc...
In fact the opposite: that happens every day on the internet, including on LW at least once a week.
Did you intend to link to something else?
Critch is a “local hero”...well known in rationalist circles.
Huh, I would never have guessed that by looking at the karma his posts received on average. Guess that shows how misleading the karma score can sometimes be.
? He has over 3000 karma.
I suggest rereading the first sentence.
For example, if an account has 20 posts and 1000 post karma, that’s still only an average of 50 per post, which would indicate the account holder is not that well known.
If you were more like the person you wish to be, and you were smarter, do you think you’d still want our descendants to hold back from optimising when needed, so as to leave alone beings who’d prefer to be left alone? If you would still think that, why is it not CEV?
It’s probably implied by CEV. The point is that you don’t need the whole of CEV to get it: it’s probably easier to get, a simpler concept and a larger alignment target that might be sufficient to at least notkilleveryone, even if in the end we lose most of the universe. Also, you gain the opportunity to work on CEV and eventually get there, even if you have many OOMs fewer resources to work with. It would of course be better to get CEV before building ASIs with different values or going on a long value drift trip ourselves.
I’d suggest that long-term corrigibility is a still easier target. If respecting future sentients’ preferences is the goal, why not make that the alignment target?
While boundaries are a coherent idea, imposing them in our alignment solutions would seem to very much be dictating the future rather than letting it unfold with protection from benevolent ASI.
In an easy world, boundaries are neutral, because you can set up corrigibility on the other side to eventually get aligned optimization there. The utility of boundaries is for worlds where we get values alignment or corrigibility wrong, and most of the universe eventually gets optimized in an at least somewhat misaligned way.
The concern about slight misalignment also makes personal boundaries in this sense an important thing to set up first, before any meaningful optimization changes people, as people are different from each other and the initial optimization pressure might be less than maximally nuanced.
So it’s complementary and I suspect it’s a shard of human values that’s significantly easier to instill in this different-than-values role than either the whole thing or corrigibility towards it.