I feel very confused about the problem. Would appreciate anyone’s help with the questions below.
1. Why doesn’t the Gooder Regulator theorem solve the Agent-Like Structure Problem?
2. Is the separation between the “world model”, “search process” and “problem specification” supposed to be in space (not in time)? I.e., should we be able to carve the system into those parts physically? (There's a toy sketch after this list that tries to make this concrete.)
3. Why would the problem specification necessarily be outside of the world model? I imagine it could be encoded as an extra object in the world model. Any intuition for why keeping them separate is good for the agent? (I’ll propose one myself, see 5.)
4. Why are the “world model” and “search process” two different entities, and what does each of them do? What is the fundamental difference between “modeling the world” and “searching”? For example, imagine I have different types of heuristics (A, B, C) for predicting the world, but I can also use them for search.
5. Doesn’t the inner alignment problem resolve the Agent-Like Structure Problem? Let me explain. Take a human, e.g. me. I have a big, changing brain. Parts of my brain can be said to want different things. That’s an instance of the inner alignment problem. And that’s a reason why having my goals completely entangled with all other parts of my brain could be dangerous (in that case it could be easier for any minor misalignment to blow up and overwrite my entire personality).
6. As I understand it, the arguments from here would at least partially solve the problem, right? If they were formalized.
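
To make question 2's "carve the system into parts, physically" more concrete, here is a minimal Python sketch of an agent factored into three separate components. Everything in it is my own illustrative assumption (the `WorldModel`/`Goal`/`Search` classes, the toy line-world, the target position); it is not taken from the Gooder Regulator post or from any real trained system.

```python
# Toy sketch: an agent carved "in space" into three separate modules.
#   WorldModel - predicts the next state given a state and an action; knows nothing about goals
#   Goal       - the problem specification, a score over states
#   Search     - plans by querying the world model and the goal; holds no facts or preferences itself
# All names and the line-world environment are illustrative assumptions.

from dataclasses import dataclass
from typing import List

Action = int  # -1 = step left, +1 = step right
State = int   # position on a bounded line


@dataclass
class WorldModel:
    """Predicts consequences of actions, independent of what the agent wants."""
    lo: int = 0
    hi: int = 10

    def predict(self, state: State, action: Action) -> State:
        return max(self.lo, min(self.hi, state + action))


@dataclass
class Goal:
    """The problem specification, kept in a separate module from the model."""
    target: State = 7

    def score(self, state: State) -> float:
        return -abs(state - self.target)


@dataclass
class Search:
    """Depth-limited exhaustive planner over the world model, scored by the goal."""
    model: WorldModel
    goal: Goal
    horizon: int = 5

    def plan(self, state: State) -> List[Action]:
        best_score, best_plan = float("-inf"), []
        frontier = [(state, [])]
        for _ in range(self.horizon):
            next_frontier = []
            for s, plan in frontier:
                for a in (-1, +1):
                    s2 = self.model.predict(s, a)   # search only *queries* the model
                    plan2 = plan + [a]
                    if self.goal.score(s2) > best_score:
                        best_score, best_plan = self.goal.score(s2), plan2
                    next_frontier.append((s2, plan2))
            frontier = next_frontier
        return best_plan


if __name__ == "__main__":
    print(Search(WorldModel(), Goal(target=7)).plan(state=2))  # [1, 1, 1, 1, 1]: walk right to the target
```

Two things the sketch makes easy to see. For question 3: nothing stops us from storing `target` inside `WorldModel` and having `Goal.score` read it back out, so "separateness" seems to be about which module the search consults for preferences, not about where the bits physically sit. For question 4: here the search process does no modeling of its own, it just calls `predict` in a loop, which is one clean way to draw the "world model vs. search" line; but it doesn't rule out the worry in the question that the same heuristics could serve both roles in a less cleanly factored system.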