Socratic question: what does being publicly outraged at Altman (especially within the in-group community) actually achieve?
I can follow up with more thoughts later, but I am interested in your views on the utility of various group postures (for the sake of doing the interesting part of the discussion rather than the boring part, we can assume we don’t need more evidence and that your claim is true).
Thanks for engaging with the comment!
I consider the failures of alignment to date to be almost entirely a consequence of social and narrative capture and control, not of any flaw in the on-paper solutions worked out by this group. CEV is probably broadly correct. FDT is a very good formalization of a hard problem.
When OpenAI was created, it was structured, institutionally, as a bastion of the exact theoretical preferences and discourse long championed by this forum and by MIRI. The obstacle to getting LW ‘preferences’ actualized has never been their formalizations or the magnitude and precision of their claims (to date); it has been the failure of the risk model this group understands so well to propagate beyond it, to the operators of institutions and to the social forces that orient them.
The failure to sufficiently mitigate ASI existential risk while knowing it was coming is a problem of influence, not of ‘rightness’. Yet rather than adopting a more aggressive strategy toward the forces that continue to block mitigation, the default exception handler seems to be dispassionate resignation or further detached analysis of failure modes, as opposed to speaking truth to power in a way that establishes new norms memetically and, by osmosis, reaches the broader public narrative, which operates far more often on arguments from authority.
Policy is very commonly formulated on sentiment, and sentiment is memetically contagious. If we genuinely believe that what Altman is doing is bad, and that a necessary operationalization of our beliefs is to continuously and relentlessly undermine his influence through arguments with emotive uptake, then why are the avenues that might make such an exercise effective merely studied rather than actualized?
One of the primary motivators in human cognition is status. Activating status-reducing social mechanisms (ostracism, reputational cost, public moral censure) against those who defect on alignment commitments is an ancient enforcement strategy, and one that remains psychologically powerful precisely where legal frameworks offer no protection. I don’t think any of these claims is structurally novel: a direct, named antagonist to a group’s preferences has historically been one of the strongest forces for public assembly.
Though this strategy reliably raises moral scruples, I would argue that this type of operationalized narrative control is exactly what Altman has mastered, and unless the same game is played, the formalizations discussed here will forever remain beautiful in principle but blocked in practice the second the next ‘superalignment’ team gets sidelined from implementing them.
Imagine you solved alignment tomorrow: SI could be built to benefit all beings equally and democratically according to CEV, but the only way to implement it was to get Altman to sign off on not profiting from its deployment. What are our odds of success, and what are our operational tools, now? That problem persists so long as people aren’t emotively loaded enough, by both reason and psychological friction, to cross an action threshold and attempt to rearrange either Altman’s values or his sphere of control on alignment’s behalf.
The same argument can be made about whether it is rational to ‘beat up’ someone who defected, even when there is no consequentialist reward for the retribution. I am not advocating violence, but I am saying we have rational reason to seek a reputational rebalancing of public opinion: being the ‘type’ of agent against whom defection is costly is, FDT can conclude, what structurally prevents opportunistic misalignment.
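To make the FDT point concrete, here is a minimal sketch of a toy trust game. The payoff numbers, and the assumption that the operator can see the community’s disposition before acting, are my own illustrative choices rather than anything canonical: the point is only that a fixed, legible policy of costly punishment changes the defector’s best response, even though punishing never pays ex post.

```python
# Toy model of the FDT claim above: an agent whose *policy* is to punish
# defection (at a cost to itself) deters defection, even though punishing
# is never worth it after the fact. All payoff numbers are illustrative
# assumptions.

COOPERATE, DEFECT = "cooperate", "defect"

def operator_payoff(action: str, community_punishes: bool) -> int:
    """Operator's payoff: defection pays, unless the community's standing
    policy imposes a reputational penalty on defectors."""
    if action == COOPERATE:
        return 5
    return 8 - (7 if community_punishes else 0)  # penalty of 7 if punished

def community_payoff(action: str, community_punishes: bool) -> int:
    """Community's payoff: cooperation is worth 5, being defected on is 0,
    and punishing a defector costs 1 ex post. Punishment is never 'worth
    it' after the fact, which is exactly the CDT objection."""
    if action == COOPERATE:
        return 5
    return 0 - (1 if community_punishes else 0)

def operator_best_response(community_punishes: bool) -> str:
    """An operator facing a transparent disposition defects only if it pays."""
    return max(
        (COOPERATE, DEFECT),
        key=lambda a: operator_payoff(a, community_punishes),
    )

for punishes in (False, True):
    action = operator_best_response(punishes)
    print(
        f"community punishes defection: {punishes!s:5} -> "
        f"operator plays {action}, community gets "
        f"{community_payoff(action, punishes)}"
    )
# community punishes defection: False -> operator plays defect, community gets 0
# community punishes defection: True  -> operator plays cooperate, community gets 5
```

The CDT objection lives in `community_payoff`: once defection has happened, punishing yields -1 rather than 0. The FDT move is to evaluate the policy rather than the act, and the policy that punishes is the one that gets cooperated with.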
Whether the SI that gets operationalized is aligned or unaligned is being determined in a political, institutional, and narrative knife fight by people who all understand this. Is the preference of this group to be proven right about the risks, or to actually prevent them? Because that is a human alignment problem, which is a value-loading problem, and the only available syntax for writing solutions to it is the meme.