If the transition from less to more disagreeableness doesn’t come along with an investigation of why agreeableness seemed like a plausible strategy and what was learned, then we’re still stuck trying to treat an adversary as an environment.
I think I agree with your statement; I assume that this happened, though? Or, at least, in a mirror of the ‘improvements visible from the outside’ comment earlier, the question is whether MIRI is now operating in a way that leads to successfully opposing their adversaries, rather than whether they’ve exposed their reasoning about this to the public.
If the transition from less to more disagreeableness doesn’t come along with an investigation of why agreeableness seemed like a plausible strategy and what was learned, then we’re still stuck trying to treat an adversary as an environment.
I think I agree with your statement; I assume that this happened, though? Or, at least, in a mirror of the ‘improvements visible from the outside’ comment earlier, the question is whether MIRI is now operating in a way that leads to successfully opposing their adversaries, rather than whether they’ve exposed their reasoning about this to the public.