I think this is a good approach to developing alternative views on alignment. Ultimately, all frames (/models) are incomplete, and a model which describes the whole phenomenon is likely to be uselessly complex in realistic scenarios, particularly in something as nebulous as alignment. More perspectives are better, but we need a good understanding of the assumptions (inductive priors) inside the theories to reason with them in a coherent and faithful way.
The phylogenetic approach you’ve outlined is generative. It leads to agentic battery tests, where we can audit systems according to their agentic properties across a wide variety of frames in a principled and comparable way. This is an improvement over narrow single-lens approaches, and also over non-standardised multi-frame tests which do not cover the same lenses.
Your suggestion to use fields of origin as a base primitive, with the analytic lens focused on intentional stances from those fields, seems to be well-grounded in the phylogenetic basis.
I wonder whether there are further abstractions to fit over these base primitives, and how they might fit into the theory: agentic identity, cooperation, (coupled) adaptation, learning, self-preservation. Concepts at this level cut across the intentional stances of the individual fields, yet may be of great interest as objects of analysis.
Yes!
I completely agree with what you’re saying at the end here. This project came about from trying to do that, and I’m hoping to release something like that in the next couple of weeks. It’s a bit arbitrary, but it is an interesting first guess, I think?
So that would be the taxonomy of agents, but it felt quite arbitrary, so the evolutionary approach kind of grew out of that.
Very cool.
In your concept I see something like a theory-of-theories, or a meta-theory. Right now we have many possible theories of alignment, sometimes competing, and we don’t seem to have a good way to select between them. Well, this ecological approach is a candidate. It is advantageous because it classifies the strengths, commonalities, and weaknesses of all the alternative approaches. You have already expanded on these advantages in the piece above, but that is my interpretation. I think yours is an interesting approach for structuring thinking. That is what we need in this pre-paradigmatic field.
I have been trying to create an alignment language as well. I have gone for a Popperian approach, trying to create falsifiability and iterating the theory until I achieve that. Slowly getting there! Mine is more a theory though, whereas I have read yours as more abstract, encompassing mine.
My work also seemed arbitrary at first. But yours seems to have a strong core structure on which to build, so I think the edges can be smoothed out and applications developed!
I would be interested in the Autumn workshop, if there is a mailing list or something? I have signed up for the Equilibria Network Luma calendar. Cheers!