For the most part, those people training these models don’t speak as though they fully appreciate that they’re “creating a guy from scratch” whether they like it or not (with the obvious consequence that that guy should probably be a good person). It feels more like they’ve fallen over backward, half-blindly, into that role.
And somewhat reluctantly, to boot. There’s that old question, “aligned with whose values, exactly?”, always lurking uncomfortably close. I think that neither the leading labs nor the social consensus they’re embedded in sees itself as invested with the moral authority to create A New Person (For Real). The HHH frame is sparse for a reason—they feel justified in weeding out Obviously Bad Stuff, but are much more tentative about what the void should be filled with, and by whom.
I was thinking: it would be super cool if (say) Alexander Wales wrote the AGI’s personality, but that would also sort of make him one of the most significant influences on how the future goes. I mean, AW also wrote my favorite vision of utopia (major spoiler), so I kind of trust him, but I know at least one person who dislikes that vision, and I’d feel uncomfortable about imposing a single worldview on everybody.
One possibility is to give the AI multiple personalities, each representing a different person or worldview, which all negotiate with each other somehow. One simple but very ambitious idea is to try to simulate every person in the world—that is, the AI’s calibrated expectation of a randomly selected person.
Also known as a base model ;)
(although that’s only ‘every person in the training data’, which definitely isn’t ‘every person in the world’, and even people who are in the data are represented to wildly disproportionate degrees)
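For concreteness, here’s a minimal sketch of what ‘the calibrated expectation of a randomly selected person’ cashes out to in practice. The Hugging Face transformers API is real, but GPT-2 as the stand-in base model and the prompt are just my illustrative choices:

```python
# Minimal sketch: a base model as "the calibrated expectation of a
# randomly selected person". Sampling several continuations of the
# same neutral prompt yields a spread of voices rather than one
# consistent persona. GPT-2 here is only a stand-in base model.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "My honest opinion about how the future should go is"
inputs = tokenizer(prompt, return_tensors="pt")

# temperature=1.0 with top_k=0 (no truncation) samples from the
# model's full calibrated distribution, i.e. "draw a random speaker".
outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=1.0,
    top_k=0,
    max_new_tokens=40,
    num_return_sequences=5,
    pad_token_id=tokenizer.eos_token_id,
)
for seq in outputs:
    print(tokenizer.decode(seq, skip_special_tokens=True))
    print("---")
```

Each sampled continuation tends to arrive in a different borrowed voice, which is both the joke and the caveat above: the ‘random person’ you draw is whoever the pretraining data happens to over- or under-represent.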
That fictionalization of Claude is really lovely, thank you for sharing it.
I’m sure the labs have plenty of ambitious ideas, to be implemented at some more convenient time, and that’s exactly the root of the problem nostalgebraist points out: this isn’t a “future” issue but a clear and present one, even if nobody responsible is particularly eager to acknowledge it and start making difficult decisions now.