An AGI might not literally search over all possible policies, but instead employ some heuristics to get a good approximation of the best policy. But then this is a capabilities shortcoming, not misalignment.
...
Coming back to our scenario: if our model only finds an approximate best policy, it seems very unlikely that this policy would consistently bring about some misaligned goal.
On my model this isn't a capabilities failure, because there are demons in imperfect search: the output of a heuristic search that approximates the best policy wouldn't just be something close to the global optimum. It would also be something that has been shaped by whatever demons (which don't even have to be "optimizers", necessarily) emerged through the selection pressures of the search process itself.
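To make the weaker version of this point concrete: here is a minimal toy sketch (my own, not from PreDCA, and far simpler than a real "demon") showing that what an imperfect search returns is an artifact of its search dynamics, not necessarily anything close to the global optimum. A greedy hill climb over a two-peak landscape settles on whichever peak its dynamics steer it toward, while exhaustive search finds the true best point.

```python
# Toy landscape with two peaks: a short, wide one at x=20 and a
# tall, narrow one at x=90. All numbers here are arbitrary choices
# made purely for illustration.
def fitness(x):
    return max(0, 10 - abs(x - 20)) * 3 + max(0, 5 - abs(x - 90)) * 10

def hill_climb(x, steps=1000):
    # Greedy local search: move to the best neighbor until no
    # neighbor improves on the current point.
    for _ in range(steps):
        best = max([x - 1, x, x + 1], key=fitness)
        if best == x:
            return x  # stuck at a local optimum
        x = best
    return x

global_opt = max(range(101), key=fitness)  # exhaustive search finds x=90
local_opt = hill_climb(12)                 # greedy search ends at x=20

print(global_opt, local_opt)  # → 90 20
```

The greedy searcher's answer is determined by what its dynamics select for, not by proximity to the optimum; a real demon would be a stronger, self-reinforcing version of this, where structures that bias the search in their own favor get amplified.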
Maybe I'm still misunderstanding PreDCA and it somehow rules out this possibility, but as far as I can tell it only does so in the limit of perfect search.