Is there any particular reason to expect this problem as a whole to be solvable at all, even in principle?
I think this is another very good heuristic, and I agree it is good to apply first.
If the answers are “yes”, “no”, “no”, then I am inclined to agree that attacking C is the way to go. But I also think that combination of answers almost never happens.
I think that in alignment, 2 is normally not the case, and if 2 is not the case, 3 will not really help you. That is why I think it is a reasonable assumption that you need A, B, and C all solved.
Your childcare example is weird because the goal is to make money, which is a continuous success metric: you can make money without solving any of the problems you listed. I did not say it in the original article (and I should have), but this technique is for problems where any actual solution requires you to solve all of the subproblems. It would be as if making any money with childcare at all required you to have solved all the problems; you can’t solve one problem partially and then make a bit more money. If C is proven to be impossible, or far harder than some other problem set for a different way to make money, then you should switch to that different way of making money.
In alignment, if you don’t solve the problem, you die. You can’t solve alignment 90% of the way and then deploy an AI built with that 90% level of understanding, because the AI will still be approximately 0% aligned and will kill you. We can’t figure out everything else about a neural network, even which objective function corresponds to human values, skip only the question of whether it is deceptively misaligned, and live.
Your alignment example is very strange. C is basically “Solve Alignment” whereas A and B taken together do not constitute an alignment solution at all. The idea is that you have a set of subproblems that, taken together, constitute a solution to alignment. Having “solve alignment” in this set breaks everything. The set should be such that when we trim all the unnecessary elements (elements that are not required because some subset of the set already constitutes a solution), nothing is removed, because every element is necessary. Otherwise, I could add anything to the set and still end up with a set that solves alignment. The set should be (locally) minimal in size. If we trim a set that contains “solve alignment”, we end up with the single element “solve alignment”, and we have not simplified the problem by factoring it at all.
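To make the trimming criterion concrete, here is a toy sketch in Python; the `solves` oracle and the coverage sets are invented for illustration and stand in for the (much harder) judgment of whether a set of solved subproblems constitutes a full solution:

```python
def trim(subproblems, solves):
    """Greedily drop elements whose removal still leaves a solving set.

    `solves(s)` is a hypothetical oracle: does solving every subproblem
    in the set `s` constitute a full solution? The result is locally
    minimal: every remaining element is necessary.
    """
    current = set(subproblems)
    for p in subproblems:
        if solves(current - {p}):
            current.remove(p)
    return current

# Toy oracle: the problem is solved iff requirements {1, 2, 3} are covered.
covers = {"A": {1}, "B": {2}, "C": {3}, "solve alignment": {1, 2, 3}}
solves = lambda s: {1, 2, 3} <= set().union(*(covers[p] for p in s))

print(sorted(trim(["A", "B", "C"], solves)))
# -> ['A', 'B', 'C']  (every element necessary: already minimal)
print(sorted(trim(["A", "B", "C", "solve alignment"], solves)))
# -> ['solve alignment']  (the degenerate factoring)
```

Adding “solve alignment” to the set lets the trim discard everything else, which is exactly the degenerate factoring described above.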
Or, even better, you make the set a tree instead, such that each node is a task, and you split large nodes into (ideally independent) subtasks until you can see how to solve each subpart. I guess this is superior to the original formulation. Ideally, in the end there is not even a hardest part: it should be obvious what you need to do to solve every leaf node. Looking for the hardest part now becomes choosing which node to split next (the splitting itself might take a significant amount of time).
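A minimal sketch of that task tree (the node names and the example split are hypothetical):

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    """A node in the problem-factoring tree."""
    name: str
    subtasks: list = field(default_factory=list)

    def split(self, *parts):
        """Factor this task into (ideally independent) subtasks."""
        self.subtasks = [Task(p) for p in parts]
        return self.subtasks

    def leaves(self):
        """Leaf nodes: the parts that must each be directly solvable."""
        if not self.subtasks:
            return [self]
        return [leaf for t in self.subtasks for leaf in t.leaves()]

# Keep splitting the node that looks hardest until every leaf is tractable.
root = Task("solve alignment")
a, b, c = root.split("A", "B", "C")
c.split("C1", "C2")  # C looked hardest, so it gets split next
print([t.name for t in root.leaves()])  # -> ['A', 'B', 'C1', 'C2']
```

The “hardest part” heuristic then operates on the frontier of leaves: pick the leaf you least know how to solve and split it next.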
Thank you for telling me about the CAP problem; I did not know about it.
Your alignment example is very strange. C is basically “Solve Alignment” whereas A and B taken together do not constitute an alignment solution at all.
That is rather my point, yes. A solution to C would be most of the way to a solution to AI alignment, and would also solve management / resource allocation / corruption / many other problems. However, as things stand now, a rather significant fraction of the entire world economy is directed toward mitigating the harms caused by the inability of principals to fully trust agents to act in their best interests. As valuable as a solution to C would be, I see no particular reason to expect it to be possible, and I see an extremely high lower bound on the difficulty of the problem (most salient to me is the multi-trillion-dollar value that could be captured by anyone who did robustly solve it).
I wish anyone who decides to tackle C the best of luck, but I expect the median outcome of such work to be something of no value, and the 99th-percentile outcome to be something like a clever incentive mechanism which, in a perfect-information world, bounds to zero the extent to which an agent’s actions can harm the principal while appearing to be helpful, and which degrades gracefully in a world of imperfect information.
In the meantime, I expect that attacking subproblems A and B will have a nonzero amount of value even in worlds where nobody finds a robust solution to subproblem C. Better versions of A and B may be able to shore up an imperfect or partial solution to C; and while robust solutions to A/B/C may be one sufficient set of components for solving your overall problem, they may not be the only sufficient set, so having solutions to A and B may let you find alternatives to C.
In alignment, if you don’t solve the problem, you die. You can’t solve alignment 90% of the way and then deploy an AI built with that 90% level of understanding, because the AI will still be approximately 0% aligned and will kill you.
Whether this is true (in the narrow sense of “the AI kills you because it made an explicit calculation and determined that the optimal course of action was to perform behaviors that result in your death” rather than in the broad sense of “you die for reasons that would still have killed you even in the absence of the particular AI agent we’re talking about”) is as far as I know still an open question, and it seems to me one where preliminary signs are pointing against a misaligned singleton being the way we die.
The crux might be whether you expect a single recursively-self-improving agent to take control of the light cone, or whether you don’t expect a future where any individual agent can unilaterally determine the contents of the light cone.
Thank you for telling me about the CAP problem; I did not know about it.
For reference, the CAP problem is one of those problems that sounds worse in theory than it is in practice: for most purposes, significant partitions are rare, and “when a partition happens, data may not be fully up-to-date, and writes may be dropped or result in conflicts” is usually a good-enough answer in those rare cases.