As this post sits with me, one thing that seems to call for a much closer look is this idea that the human remains in control of the cyborg.
The post states, for instance, that “The human is ‘in control’ not just in the sense of being the most powerful entity in the system, but rather because the human is the only one steering”, but at other points acknowledges what I would consider caveats. Several comment threads here, e.g. those initiated by Flipnash and by David Scott Krueger, raise questions, and I’d venture to say some of the replies, including some by janus themself, shatter at least the strongest version of this claim.
This is obviously a crucial point—it’s at the heart of the claim that cyborgism can differentially accelerate alignment relative to capabilities.
Me: “The human is doing the steering” captures an important truth. It’s one of the two[1] main reasons I’m excited about cyborgism.
Also me: “The human is doing the steering”, stated unconditionally, is false.
In the wonderful graph labeled “Cognition is a Journey Through a Mental Landscape” (which Tufte would be proud of, seriously), we need to recognize that steering is going on at, and indeed inside, those blue circles too. Consider the collaborative behavior of the simulator and the human in constructing the cyborg’s joint trajectory. In what ways are their roles symmetrical, and in what ways are they not? How will this change as simulator SOTA advances? In what ways are human values already expressed in the simulator’s actions, and what do we make of the cases where they seem not to be? What do we make of the cases where simulacra manifestly do pursue goals seemingly agentically? If there are caveats to human control, how serious are they, how serious do we see them becoming, and what can we do about them?
To be clear, I firmly agree with the authors’ hunch that, for at least this decade, cyborgism can be a vehicle not just for retaining human agency, but for amplifying it, with benefits to alignment and in other ways too. I’m moved by considerations of the simulators’ myopia/divergence, the tabula rasa nature of their outer objectives, the experiences of people like janus who have gone deep with GPT, and also by the knowledge that human values are deeply embedded in what simulators learn.
But this needs to be more than a hunch; we need to probe it deeply (and indeed, the authors acknowledge this at several points, including specifically under ‘More ideas’). If it’s false, we need to find out now. If it’s true, we need the depth of understanding to turn the belief that the simulator can amplify human agency into the reality that it does. In the process, we may come to a deeper understanding of this huge swath of the human semantic world the simulator has embodied, and thereby of ourselves.
[1] The other being the way cyborgism amplifies human agency via the simulator’s strengths, rather than continually running afoul of its weaknesses as other usage modes do.