I think this is correct, but doesn’t seem to note the broader trend towards human disempowerment in favor of bureaucratic and corporate systems, which this gradual disempowerment would continue, and hence elides or ignores why AI risk is distinct.
Good point. The reason AI risk is distinct is simply that it removes those bureaucracies' and corporations' need to keep some humans happy and healthy enough to actually run them. That need doesn't exactly put limits on how much they can disempower humans, but it does tend to give the humans involved at least some bargaining power.
I don’t think that covers it fully. Corporations “need… those bureaucracies,” but haven’t done what would be expected otherwise.
I think we need to add that corporations are limited to doing things they can convince humans to do, are aligned with at least somewhat human directors / controllers, and face checks and balances: the people involved can whistleblow, and the company is constrained by law to an extent that its people need to worry about breaking it blatantly.
But I think that breaking these constraints is going to be much closer to the traditional loss-of-control scenario than what you seem to describe.
I’m confused about this response. We explicitly claim that bureaucracies are limited by running on humans, which includes only being capable of actions human minds can come up with and humans are willing to execute (cf. “street level bureaucrats”). We make the point explicit for states, but it clearly holds for corporate bureaucracies.
Maybe it does not shine through the writing, but we spent hours discussing this when writing the paper, and the points you make are 100% accounted for in the conclusions.
I don’t think I disagree with you on the whole—as I said to start, I think this is correct. (I only skimmed the full paper, but I read the post; on looking at it, the full paper does discuss this more, and I was referring to the response here, not claiming the full paper ignores the topic.)
That said, in the paper you state that the final steps require something more than human disempowerment by other types of systems, but per my original point, you seem to elide how the process up to that point is identical by saying that these systems have largely been aligned with humans until now. I think that’s untrue; humans have benefited despite the systems being poorly aligned. (Misalignment due to overoptimization failures would look like this, and is what has been happening when economic systems optimize for GDP while ignoring wealth disparity, for example; wealth goes up, but as the optimization becomes more extreme, the tails diverge, and at that point, maximizing GDP looks very different from what a democracy is supposed to do.)
Back to the point, to the extent that the unique part is due to cutting the last humans out of the decision loop, it does differ—but it seems like the last step definitionally required the initially posited misalignment with human goals, so that it’s an alignment or corrigibility failure of the traditional type, happening at the end of this other process that, again, I think is not distinct.
Again, that’s not to say I disagree, just that it seems to ignore the broader trend by saying this is really different.
Since I’m responding anyway, one last complaint: you do all of this without clearly spelling out why solving technical alignment would solve this problem, which seems unfortunate. Instead, the proposed solutions try to patch the problems of disempowerment by saying you need to empower humans to stay in the decision loop—which in the posited scenario doesn’t help when increasingly powerful but fundamentally misaligned AI systems are otherwise in charge. But that is a very different argument, and one I’m going to explore when thinking about oversight versus control in a different piece I’m writing.