Your alignment example is very strange. C is basically “Solve Alignment” whereas A and B taken together do not constitute an alignment solution at all.
That is rather my point, yes. A solution to C would be most of the way to a solution to AI alignment, and would also solve management / resource allocation / corruption / many other problems. However, as things stand now, a rather significant fraction of the entire world economy is directed towards mitigating the harms caused by the inability of principals to fully trust agents to act in their best interests. As valuable as a solution to C would be, I see no particular reason to expect one to be possible, and I see an extremely high lower bound on the difficulty of the problem (most salient to me is the multi-trillion-dollar value that could be captured by anyone who did robustly solve it).
I wish anyone who decides to tackle C the best of luck, but I expect that the median outcome of such work would be something of no value, and that the 99th-percentile outcome would be something like a clever incentive mechanism which, e.g., bounds at zero (in a perfect-information world) the extent to which an agent’s actions can harm the principal while appearing to be helpful, and which degrades gracefully in a world of imperfect information.
In the meantime, I expect that attacking subproblems A and B will have nonzero value even in worlds where nobody finds a robust solution to subproblem C. Better versions of A and B may be able to shore up an imperfect or partial solution to C, and while robust solutions to A/B/C may be one sufficient set of components for solving your overall problem, they may not be the only sufficient set; with solutions to A and B in hand, you may be able to find alternatives to C.
In alignment, if you don’t solve the problem you die. You can’t solve alignment 90% and then deploy an AI built with this 90% level of understanding, because then the AI will still be approximately 0% aligned and kill you.
Whether this is true (in the narrow sense of “the AI kills you because it made an explicit calculation and determined that the optimal course of action was to perform behaviors that result in your death”, rather than in the broad sense of “you die for reasons that would still have killed you even in the absence of the particular AI agent we’re talking about”) is, as far as I know, still an open question, and it seems to me one where the preliminary signs point against a misaligned singleton being the way we die.
The crux might be whether you expect a single recursively-self-improving agent to take control of the light cone, or instead expect a future in which no individual agent can unilaterally determine the contents of the light cone.
Thank you for telling me about the CAP problem; I did not know about it.
For reference, the CAP problem is one of those problems that sounds worse in theory than it turns out to be in practice: for most purposes, significant partitions are rare, and “when a partition happens, data may not be fully up-to-date, and writes may be dropped or result in conflicts” is usually a good-enough solution in those rare cases.
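To make that “good-enough in practice” behavior concrete, here is a minimal sketch (plain Python, not any real database’s API; the class and method names are my own illustration) of the AP-style tradeoff: each replica keeps accepting writes during a partition, reads may be stale or divergent, and conflicts are reconciled with last-write-wins once the partition heals.

```python
import time


class Replica:
    """One node of a toy eventually-consistent key-value store."""

    def __init__(self, name):
        self.name = name
        # key -> (value, timestamp); the timestamp drives last-write-wins merging
        self.data = {}

    def write(self, key, value, ts=None):
        # Writes are always accepted locally, even while partitioned (availability).
        self.data[key] = (value, ts if ts is not None else time.time())

    def read(self, key):
        # Reads may return stale or divergent data while partitioned.
        value, _ = self.data.get(key, (None, 0.0))
        return value

    def merge_from(self, other):
        # Runs when the partition heals: keep the newer write for each key.
        # Last-write-wins silently drops the older of two conflicting writes,
        # which is the "writes may be dropped or result in conflicts" case above.
        for key, (value, ts) in other.data.items():
            local_ts = self.data.get(key, (None, 0.0))[1]
            if ts > local_ts:
                self.data[key] = (value, ts)


# Two replicas diverge during a simulated partition, then reconcile.
a, b = Replica("a"), Replica("b")
a.write("config", "v1", ts=1.0)   # this write only reaches replica a
b.write("config", "v2", ts=2.0)   # a later, conflicting write only reaches replica b
print(a.read("config"), b.read("config"))  # divergent reads during the partition: v1 v2

a.merge_from(b)                   # partition heals; each side pulls the other's writes
b.merge_from(a)
print(a.read("config"), b.read("config"))  # both converge on the newer write: v2 v2
```

Real systems often replace the bare timestamp with vector clocks or CRDTs precisely because last-write-wins drops one of the conflicting writes, but when significant partitions are rare, that loss is usually an acceptable cost.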