I think history is a good teacher when it comes to AI in general, especially AI we did not fully understand at the time of deployment (and perhaps still don't).
I too feel a temptation to imagine that a USG AGI would align with US ideals, and likewise that a CCP AGI would align with CCP ideals.
That said, given our lack of robust knowledge of what alignment with any set of ideals would even look like in an AGI system, or how we could assure it, I struggle to have any confidence that these systems would align with anything the USG or CCP would find desirable at all. Progress is being made in this area by Anthropic, but I'd need to see it move forward significantly.
One can look at current-gen LLMs like DeepSeek, see that they are censored during fine-tuning to align with CCP positions, and perhaps take that as predictive. I find it doubtful that some fine-tuning would be sufficient to serve as the moral backbone of a system capable of AGI.
Which speaks to history. AI systems tend to align tightly with the task they are optimized for, not the intent behind it. The largest and most mature networks we have are Deep Learning Recommendation Models (DLRMs), deployed by social media companies to keep us glued to our phones.
The intention was to serve people engaging content; the impact was to flood them with content that is emotionally resonant but not necessarily accurate. That has arguably contributed to polarization, radicalization, and increased suicide rates, particularly among young women.
While it would be tempting to say that social media companies don't care, the reality is that these DLRMs are very difficult to align. They are trained using RL on the corpus of their interactions with billions of daily users. They reward hack incessantly, and in very unpredictable ways. As a result, most mitigating actions happen downstream of the recommendations (content warnings, etc.), not by design, but because the models that are best at keeping users scrolling are seldom the best at serving accurate content.
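To make that divergence concrete, here is a minimal, entirely hypothetical sketch: a toy epsilon-greedy bandit rewarded only on an engagement proxy. The item names, scores, and reward function are made up for illustration and don't reflect any real platform's DLRM or training pipeline; the point is simply that when engagement is the reward, the policy drifts toward emotionally resonant, low-accuracy items even though accuracy was the stated intent.

```python
import random

# Hypothetical toy catalogue: "accuracy" is what we intended to optimize,
# "emotional_pull" is what actually drives clicks. Values are invented.
ITEMS = [
    {"name": "measured_news",    "accuracy": 0.9, "emotional_pull": 0.30},
    {"name": "nuanced_thread",   "accuracy": 0.8, "emotional_pull": 0.40},
    {"name": "outrage_clip",     "accuracy": 0.3, "emotional_pull": 0.90},
    {"name": "doom_scroll_bait", "accuracy": 0.2, "emotional_pull": 0.95},
]

def engagement(item):
    """Proxy reward: did the user keep scrolling?
    Depends almost entirely on emotional pull, not accuracy."""
    return 1.0 if random.random() < item["emotional_pull"] else 0.0

def train_recommender(steps=20000, epsilon=0.1):
    """Epsilon-greedy bandit that maximizes the engagement proxy."""
    counts = [0] * len(ITEMS)
    values = [0.0] * len(ITEMS)  # running mean engagement per item
    for _ in range(steps):
        if random.random() < epsilon:
            i = random.randrange(len(ITEMS))                      # explore
        else:
            i = max(range(len(ITEMS)), key=lambda j: values[j])   # exploit
        r = engagement(ITEMS[i])
        counts[i] += 1
        values[i] += (r - values[i]) / counts[i]                  # incremental mean
    return counts, values

if __name__ == "__main__":
    random.seed(0)
    counts, values = train_recommender()
    total = sum(counts)
    for item, c, v in zip(ITEMS, counts, values):
        print(f"{item['name']:16s} served {c / total:5.1%}  "
              f"engagement={v:.2f}  accuracy={item['accuracy']:.2f}")
```

Run it and the low-accuracy, high-emotion items dominate the serving share. Nothing in the training loop "knows" accuracy exists; fixing that after the fact is exactly the downstream mitigation problem described above.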
Currently, I think both flavors of AGI present the same fundamental risks. No matter the architecture, one cannot expect human-like values to emerge inherently in AI systems, and we don't understand particularly well what drives those values in humans, let alone how they produce party lines and party divisions.
Without that understanding, we're shooting in the dark. It would be awfully embarrassing if both systems, instead of flag-waving, aligned on dolphin species propagation.