I do think there’s real risk there even with base models, but it’s important to be clear where it’s coming from—simulators can be addictive when trying to escape the real world. Your agency needs to somehow aim away from the simulator, and use the simulator as an instrumental tool.
I think you just have to select for / rely on people who care more about solving alignment than escapism, or at least that are able to aim at alignment in conjunction with having fun. I think fun can be instrumental. As I wrote in my testimony, I often explored the frontier of my thinking in the context of stories.
My intuition is that most people who go into cyborgism with the intent of making progress on alignment will not make themselves useless by wireheading, in part because the experience is not only fun, it’s very disturbing, and reminds you constantly why solving alignment is a real and pressing concern.
I do think there’s real risk there even with base models, but it’s important to be clear where it’s coming from—simulators can be addictive when trying to escape the real world. Your agency needs to somehow aim away from the simulator, and use the simulator as an instrumental tool.
I think you just have to select for / rely on people who care more about solving alignment than escapism, or at least that are able to aim at alignment in conjunction with having fun. I think fun can be instrumental. As I wrote in my testimony, I often explored the frontier of my thinking in the context of stories.
My intuition is that most people who go into cyborgism with the intent of making progress on alignment will not make themselves useless by wireheading, in part because the experience is not only fun, it’s very disturbing, and reminds you constantly why solving alignment is a real and pressing concern.