That’s a good point. AIXI is my go-to example, and AIXI’s preferences are over its input tape. But, sticking to the cybernetic agent model, there are other action-dependent things Alice could have preferences over, like portions of her work tape or her actions themselves. She could also have preferences over input-conditional logical constructs built out of Everett’s program, like the contents of Everett’s work tape.
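To make the contrast concrete, here’s a rough sketch of the options in notation of my own (the symbols are just illustrative, not part of the model): $y_k$ and $x_k$ are Alice’s actions and percepts, $q$ is Everett’s program, and $w_q(m)$ stands for Everett’s work-tape contents after $m$ steps.

```latex
% Illustrative notation only: y_k = Alice's actions, x_k = her percepts,
% q = Everett's program, w_q(m) = Everett's work-tape contents at step m.
\begin{align*}
  &\text{AIXI-style, over the input tape:}
      && U_{\mathrm{in}}(x_{1:m}) \\
  &\text{over Alice's own actions or work tape:}
      && U_{\mathrm{act}}(y_{1:m}), \quad U_{\mathrm{work}}\bigl(w_{\mathrm{Alice}}(m)\bigr) \\
  &\text{over Everett's internals, conditional on the interaction:}
      && U_{\mathrm{env}}\bigl(w_q(m) \mid y_1 x_1 \ldots y_m x_m\bigr)
\end{align*}
```

Only the last kind of preference even tries to point at something outside Alice’s own tapes, which is why the rest of my worry is about whether a Cartesian can handle that pointer sanely.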
I agree it’s possible to build a non-AIXI-like Cartesian that wants to make paperclips, not just produce paperclip-experiences in itself. But Cartesians are weird, so it’s hard to predict how much progress that would represent.
For example, the Cartesian might wirehead under the assumption that doing so changes reality, rather than under the assumption that doing so changes its experiences. I don’t know whether a deeply dualistic agent would recognize that editing its camera to create paperclip hallucinations counts as semi-directly editing its input sequence. It might instead think of camera-hacking as a godlike way of editing reality as a whole, as though Alice had the power to create billions of representations of objective physical paperclips in Everett’s work tape just by editing the part of that tape representing her own hardware.
In general, I’m worried about including anything reminiscent of Cartesian reasoning in our ‘the seed AI can help us solve this’ corner-cutting category, because I don’t formally understand the precise patterns of mistakes Cartesians make well enough to think I can predict them and stay two steps ahead of those errors. And in the time it takes to figure out exactly which patches would make Cartesians safe and predictable without rendering them useless, it’s plausible we could have just built a naturalized architecture from scratch.