I agree! We’ll see much more specific focus as we have off-the-shelf agents that do interesting things.
But if these could lead to short timelines, waiting a year to think about aligning them seems like it’s wasting between a third and a fifth of the total time we’ve got? (Edit: that’s not fair, since a lot of the people who might work on agent-specific concerns would otherwise continue to work on aligning LLMs in isolation, which is also valuable in aligning LLM agents. I’m just worried there are severe agent-specific concerns we haven’t yet grappled with.)
If this is a side effect of the admirably empirical focus of prosaic alignment work, I’m concerned. Taken to an extreme, as it sometimes is, you couldn’t work on anything you haven’t empirically observed. I’d worry that we’d barely start thinking about misalignment in takeover-capable real AGI until we’re far too close to it being possible.