“Slower takeoff should be correlated with ‘harder’ alignment (in terms of cognitive labor requirements) because slower takeoff implies returns to cognitive labor in capabilities R&D are relatively lower and we should expect this means that alignment returns to cognitive labor are relatively lower (due to common causes like ‘small experiments and theory don’t generalize well and it is hard to work around this’). For the same reasons, faster takeoff should be correlated with ‘easier’ alignment.”
Yes, that is what I’m saying. In general a lot of prosaic alignment activities seem pretty correlated with capabilities in terms of their effectiveness.
There are some reasons for anti-correlation (e.g., worlds where there is a small, simple core to intelligence which can be found substantially from first principles make alignment harder), but in practice there is an epistemic correlation among humans between absolute alignment difficulty (in terms of cognitive labor requirements) and slower takeoff.
Good points.
I don’t really understand why this should extremize my probabilities.
For the “Does aligned DAI suffice?” section, as I understand it, you define an alignment labor requirement and then combine that with your uncertainty over takeoff speed to see whether the requirement would be met.
I guess I’m making a claim that if you added uncertainty over the alignment labor requirement, and then added the correlation, the latter change would extremize the probability.
This is because slower takeoff corresponds to better outcomes, while harder alignment corresponds to worse outcomes, so making them correlated clusters probability mass in worlds of roughly median overall easiness. That means if you think the easiness requirement to get alignment is low, the probability of success goes up, and vice versa. This is glossing a bit, but I think it’s probably right.
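Here is a minimal Monte Carlo sketch of that mechanism, under toy assumptions I’m making up purely for illustration (jointly normal log labor supplied and log labor required, a hypothetical correlation coefficient `rho`; none of these numbers come from the post):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1_000_000

def p_success(rho, required_median):
    # supplied: log cognitive labor available for alignment before takeoff
    #   completes (higher when takeoff is slower -- an assumption).
    # required: log cognitive labor the alignment problem demands
    #   (higher when alignment is harder -- an assumption).
    # rho > 0 couples "slower takeoff" with "harder alignment".
    mean = [0.0, required_median]
    cov = [[1.0, rho], [rho, 1.0]]
    supplied, required = rng.multivariate_normal(mean, cov, size=N).T
    # Success: the labor supplied meets the labor required.
    return np.mean(supplied >= required)

for shift, label in [(-1.0, "low requirement (optimist)"),
                     (+1.0, "high requirement (pessimist)")]:
    print(f"{label}: independent={p_success(0.0, shift):.3f}, "
          f"correlated={p_success(0.7, shift):.3f}")
```

With these toy numbers, the low-requirement case moves from roughly 0.76 to 0.90 and the high-requirement case from roughly 0.24 to 0.10: positive correlation shrinks the variance of (supplied − required), so the success probability is pushed away from 0.5 in whichever direction your prior on the requirement already points.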