I think the main issue is that inquiry generally follows two directions:
1. What was predicted before and gained cultural momentum as an area of study?
2. What exists now and is therefore an easy object of study?
Pretrained LLMs seem to have been somewhat unexpected as the probable path to AGI, so there isn't a large historical cultural discussion around more advanced variants of them or their systematic interaction.
And there are not yet systems of interacting LLM agents, so they can't be an easy object of study; there is no plethora of available examples to draw on.
I think that's basically why you don't see more about this. But in a year or so, when they start to emerge, you'll see the conversation shift to them, because they will be easier to study.
I agree! We'll see much more specific focus once we have off-the-shelf agents that do interesting things.
But if these could lead to short timelines, waiting a year to think about aligning them seems like wasting a third to a fifth of the total time we've got? (Edit: that's not entirely fair, since many of the people who might work on agent-specific concerns would otherwise continue to work on aligning LLMs in isolation, which is also valuable for aligning LLM agents. I'm just worried there are severe agent-specific concerns we haven't yet grappled with.)
If this is a side effect of the admirably empirical focus of prosaic alignment work, I'm concerned. Taken to an extreme, as it sometimes is, that focus would mean you couldn't work on anything you haven't empirically observed. I'd worry that we'd barely start thinking about misalignment in takeover-capable real AGI until far too close to it becoming possible.