Or, if we are repeatedly failing in consistent ways, change plans and try to articulate as best we can why alignment doesn’t seem tractable.
I think we probably do have different priors here on how much we’d be able to trust a pretty broad suite of measures, but I agree with the high-level take. Also relevant:
However, we expect it to also be valuable, to a lesser extent, in many plausible harder worlds where this work could provide the evidence we need about the dangers that lie ahead.
Ah, indeed! I think the “consistent” threw me off a bit there and so I misread it on first reading, but that’s good.
Sorry for missing it on first read. I do think that is approximately the kind of clause I was imagining (of course I would phrase things differently and would put an explicit emphasis on coordinating with other actors in ways beyond “articulation”, but your phrasing here is within my bounds of where objections feel more like nitpicking).
Meta: I’m confused and a little sad about the relative upvotes of Habryka’s comment (35) and Sam’s comment (28). I think it’s trending better, but what does it even mean to have a highly upvoted complaint comment based on a misunderstanding, especially one more highly upvoted than the correction?
Maybe people think Habryka’s comment is a good critique even given the correction, even though I don’t think Habryka does?
I interpreted Habryka’s comment as making two points, one of which strikes me as true and important (that it seems hard/unlikely for this approach to allow for pivoting adequately, should that be needed), and the other of which was a misunderstanding (that they don’t literally say they hope to pivot if needed).
That’s part of Step 6!
(This aligns with what I intended. I feel like my comment is making a fine point, even despite having missed the specific section.)