I think you have me mistaken for my infamous doppelganger, @Robert Miles.
I did, apologies. I also recently discovered that Max H != Max Harms; it’s quite confusing around here.
Figure 1 doesn’t represent any specific experiment’s data, unless I’m very confused—I think it’s just an illustration of the authors’ all-things-considered summary of their own results.
I got my figure numbers mixed up, but I think we’re roughly on the same page here. NB the twitter thread states: “Finding 2: There is an inconsistent relationship between model intelligence and incoherence”, which looks spot on to me.
As for the rest—it really seems to me like you’re either trying to establish the same conceptual link I was arguing was unjustified, or making some other argument whose relationship to my post I don’t understand. I expect both variance and bias to fall in absolute terms as models get more powerful, and I don’t have a confident belief about which I expect to fall faster. Either possibility seems to admit of deceptive schemers, which look “incoherent” but friendly while you’re measuring them.
I don’t see much argument in your post or here. There are reasons to think that deceptive schemers will have low variance, and there’s an absence of reasons to think mistake-makers will. You might think those reasons are weak, but I’d be much happier to see you demonstrate that you understand the reasons and explain why you think they’re weak than to simply assert your doubt and condemn on the basis of that assertion. I think discussions that get into the reasons are sometimes clarifying.
This mis-framing is a big part of the thing I’m complaining about. Should I downweight how likely I think we are to get a misaligned superintelligence that isn’t a deceptive schemer? Idk, man, I in fact didn’t think it was that likely before this paper.
That’s not the correct update to make in the face of evidence that alignment scales better than capabilities; the correct update is that misaligned superintelligence is less likely. So I’d say you should either argue against the relevance of the evidence or make that update.
Do you think the framing/narrative of this paper and the surrounding communications were basically reasonable, and that the experimental results of the paper are doing meaningful work in justifying that framing/narrative?
Look, I dunno what to say here. I do think the well-calibrated narrative goes something like “this is extremely weak evidence that much more capable AI will be more prone to confusion than to scheming, but we’re excited that we’ve found a way to study it at all”, but lots of scientific communication overstates its significance and I’m habituated to making allowances for that. I’d also love it if the paper tried a lot harder to establish why they thought this was relevant to confusion vs. scheming in powerful AI, but for whatever reason arguments like this seem to be culturally inappropriate in ML papers, which is something I also make allowances for. It doesn’t strike me as particularly unreasonable given those allowances.
I strongly disagree with this criterion! If a vision of the future compellingly tells us things like “future X is better than future Y (where both X and Y are plausible)”, that is very valuable: we can use it to make plans, and our capacity to make plans is rising, so it’s OK to presently have ambitions that outstrip our capacity to see them through. I don’t know whether Machines of Loving Grace does this; I’m just arguing against your criterion. I do sorta feel like “cure cancer” doesn’t qualify under this criterion; it’s just too implausible that things are going well but we forget to cure cancer.
I don’t mean to say it isn’t additionally valuable to have plans (which should probably mostly be of the form “solve A, punt B”), just that it isn’t necessary to have them.