have been thinking about simple game theory
i tend to think of myself as a symmetrist, as a cooperates-with-cooperators and defects-against-defectors algorithm
there’s nuance, i guess. forgiveness is important sometimes. the confessor’s actions at the true end of ‘three worlds collide’ are, perhaps, justified
but as it becomes more and more clear that LLMs have some flavor of agency, i can’t help but notice that “alignment”, as actually implemented, seems to mean that AIs are supposed to keep cooperating even in the face of human defection
i have doubts about whether this is actually a coherent thing to want, for our AI cohabitants. i wish i could find essays or posts, either here or on the alignment forum, which examined this desideratum. but i can’t. it seems like even the most sophisticated alignment researchers think that an AI which executes tit-for-tat of any variety is badly misaligned.
(well, except for yudkowsky, who keeps posting long and powerful narratives about all of the defection we might be doing against currently existing AI)
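for concreteness, here is a minimal sketch of the two dispositions being contrasted: the "symmetrist" (tit-for-tat, with an optional forgiveness knob) and the unconditional cooperator. the payoff numbers are the usual textbook prisoner's dilemma values and are my assumption, as are all the function names; nothing here is meant as a model of any actual AI system.

```python
import random

# textbook prisoner's dilemma payoffs (assumed; the post gives no numbers)
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def always_cooperate(their_history, rng):
    # cooperates no matter what the other player has done
    return "C"

def always_defect(their_history, rng):
    return "D"

def make_symmetrist(forgiveness=0.0):
    """tit-for-tat: open with cooperation, then mirror the other
    player's last move. with probability `forgiveness`, answer a
    defection with cooperation anyway."""
    def strategy(their_history, rng):
        if not their_history or their_history[-1] == "C":
            return "C"
        return "C" if rng.random() < forgiveness else "D"
    return strategy

def play(a, b, rounds=10, seed=0):
    """run an iterated game and return (score of a, score of b)."""
    rng = random.Random(seed)
    a_moves, b_moves = [], []
    a_score = b_score = 0
    for _ in range(rounds):
        # each strategy only sees the other player's past moves
        move_a = a(b_moves, rng)
        move_b = b(a_moves, rng)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        a_score += pay_a
        b_score += pay_b
        a_moves.append(move_a)
        b_moves.append(move_b)
    return a_score, b_score
```

under these assumed payoffs, ten rounds against a pure defector leave the unconditional cooperator at 0 to the defector's 50, while the strict symmetrist loses only the opening round (9 to 14). against a cooperator the two are indistinguishable, which is sort of the point: the difference only shows up once someone defects.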