I thought about it for a while, and ended up writing a nearby post: “Models I use when making plans to reduce AI x-risk”.
This isn’t “models I use when thinking about the object-level alignment problem” or “models I’d use if I were doing alignment research”. Those would be a set of more detailed models of how intelligence works in general, and I do intend to write a post about those sometime.