It would in fact be infohazardous, but yes, I’ve kinda been doing all this introspection for years now with the intent of figuring out how to implement it in an AGI. In particular, I think there’s a nontrivial possibility that GPT-2 by itself is already AGI-complete and just needs to be prompted in the right intricate pattern to produce thoughts structured similarly to human cognition. I do not have access to a GPU, so I cannot test and develop this, which is very frustrating to me.
I’m almost certainly wrong about how simple this is, but I need to be able to build and tweak a system actively in order to find out—and in particular, I’m really bad at articulating the abstract ideas in my head, since most of them are more visual than verbal.
One bit that wouldn’t be infohazardous though afaik is the “caring intrinsically about other entities” bit. I’m sure you can see how a sufficiently intelligent language model could be used to predict, given a simulated future scenario, whether a simulated entity experiencing that scenario would prefer, upon being credibly given the choice, for the event to be undone / not have happened in the first place. This is intended to parallel the human ability—indeed, automatic subconscious tendency—to continually predict whether an action we are considering will contradict the preferences of others we care about, and choose not to do it if it will.
So, a starting point would be to try to make a model which is telling a story, but regularly asks every entity being simulated if they want to undo the most recent generation, and does so if even one of them asks to do it. Would this result in a more ethical sequence of events? That’s one of the things I want to explore.
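The loop described above can be sketched in a few lines. Everything here is a stand-in: `model_generate` and `entity_objects` are hypothetical stubs, not a real language-model API; an actual implementation would prompt an LM both to generate the next story event and to predict, from each entity’s perspective, whether that entity would ask for the event to be undone.

```python
# Sketch of the "undoable story" loop: generate an event, poll every
# simulated entity, and discard the event if even one objects.
# All functions below are illustrative stubs, not a real LM interface.

def model_generate(step):
    # Stub: return a canned event. A real system would sample a
    # continuation of the story so far from a language model.
    events = ["The storm hit the village.",
              "Everyone found shelter in time."]
    return events[step % len(events)]

def entity_objects(entity, story, new_event):
    # Stub prediction of whether `entity`, credibly offered the choice,
    # would prefer `new_event` to be undone / never have happened.
    # A real system would prompt an LM with the entity's perspective.
    return entity == "villager" and "storm" in new_event.lower()

def tell_story(entities, steps):
    story = []
    for step in range(steps):
        event = model_generate(step)
        # Veto rule from the text: undo (never commit) the event if
        # even one simulated entity is predicted to object.
        if any(entity_objects(e, story, event) for e in entities):
            continue
        story.append(event)
    return story

print(tell_story(["villager", "narrator"], 4))
```

With these stubs, the storm events are vetoed by the villager and only the shelter events survive, which is the behavior the proposal aims for; whether this actually yields a more ethical event sequence with a real model is exactly the open question.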