I already figured that. The point of this question was to ask if there could possibly exist things that look indistinguishable from true alignment solutions (even to smart people), but that aren’t actually alignment solutions. Do you think things like this could exist?
By the way, good luck with your plan. Seeing people actively go out and do actually meaningful work to save the world gives me hope for the future. Just try not to burn out. Smart people are more useful to humanity when their mental health is in good shape.
I’m pretty uncertain on this one. Could a superintelligence find a plan which fools me? Yes. Will such a plan show up early in a search order without actively trying to fool me? Ehh… harder to say. It’s definitely a possibility I keep in mind. Most importantly, as our understanding improves on the theory side, it gets less and less likely that a plan which would fool me shows up early in a natural search order.