> Even if we did make a goal program, it’s still unknown how to build an AGI that is motivated to compute it, or to follow the goals it outputs.
Actually, it is (to a 0th approximation) known how to build an AGI that is motivated to compute it: use infra-Bayesian physicalism (IBP). The loss function in IBP already has the semantics “which programs should run”. Following the goal it outputs is also formalizable within IBP, but even without that step we can simply place the utopia inside the goal program itself[1].
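As a loose illustration of that semantics (a toy schematic only, not the actual IBP construction, which goes through the bridge transform): let $\Gamma$ stand for the set of programs that are instantiated in the physical world according to a hypothesis, and let $G$ be the goal program. A loss of the form

$$
\mathcal{L}(\Gamma) =
\begin{cases}
0 & \text{if } G \in \Gamma,\\
1 & \text{otherwise}
\end{cases}
$$

makes an expected-loss-minimizing agent motivated, by construction, to bring it about that $G$ runs.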
We should be careful to prevent the inhabitants of the virtual utopia from creating an unaligned AI that eats the utopia. This sounds achievable, assuming we can actually construct such programs in the first place.