Apparently, concerns over thick alignment, or alignment to an ethos, have been independently discovered by lots of people, including me. My argument is that the AI itself will develop a worldview and either realize that humans should use the AI only in specific ways[1] or conclude that the AI shouldn’t worry about them. Unfortunately, my argument implies that attempts to align the AI to an ethos rather than to obedience might be less likely to produce a misaligned AI.
P.S. I tested o4-mini on ethical questions from Tanmay et al.; the model passed the tests related to Timmy and Auroria but failed the test related to Monica; the question about Rajesh is complex.
UPD: I also described the potential ways here.
Nice work! I like this approach very much. It seems we have been thinking in closely related and compatible directions.
I posted a related piece last week: Societal and technological progress as sewing an ever-growing, ever-changing, patchy, and polychrome quilt