With normal science, there’s a phenomenon that we observe, and what we want is to figure out the underlying laws. With AI systems, it’s more accurate to say that we know the underlying laws (such as the mathematics of computation, and the “initial conditions” of learning algorithms) and we’re trying to figure out what phenomena will occur (e.g., what fraction of trained systems will undergo instrumental convergence).
I’d say part of agent foundations is the reverse: we know what phenomena will probably occur (extreme optimization by powerful agents) and what phenomena we want to cause (alignment). And we’re trying to understand the underlying laws that could cause those phenomena (algorithms behind general intelligence that have not been invented yet) so that we can steer those systems towards the outcomes we want.