This post reminds me of a recent comment of mine about research being promoted in the direction of easy results, and about how technological development, even when carried out by humans, may be correctly understood as an optimization process that could be (and probably is) misaligned with respect to human friendliness.
I think there’s an important generalization of the AI alignment problem: the socio-technical optimization system alignment problem. Many people are thinking about this, but its historical intractability shouldn’t be treated as permanent, because (a) we now have incredible computer technology for communication and coordination, and (b) our socio-technical systems are, imo, currently recursively self-improving. The timeline is unknown, but is very likely shortening.
I’m trying to come up with terminology and definitions for the idea, calling the generalization “Outcome Influencing Systems (OISs)”. I’ve got a WIP document which I hope to eventually post on LW, and I’d love to get eyes on it to help with both the ideas and how they’re presented. It seems like the kind of idea that, if developed successfully, could become the focus of “a community mostly-separate from the current field”. But, of course, if done poorly it could be a watered-down recapitulation of the fields that inspire it, pulling talent from them, confusing onlookers, and failing to make progress itself. I’d like to avoid that outcome.