There are fundamental confusions about intelligent agents, that is, about minds that try to make the things they want happen. Some believe that resolving these fundamental confusions is necessary for AI alignment; others prefer more prosaic approaches, or something else entirely.
Here are some fundamental confusions that agent foundations tries to resolve:
How can a mind reason about a world too large to consider in its entirety? Perhaps we should look at what sorts of abstractions it would use; perhaps we need new theories of probability like infra-Bayesianism. (A toy maximin sketch follows this list.)
How can a mind reason when it does not know all of the implications of its beliefs, that is, when it has logical uncertainty? (A toy example follows this list.)
How can a mind reason about (possibly logical) counterfactuals to make decisions? (A Newcomb-style sketch follows this list.)
How can a mind reason about itself, or improve itself?
What even is an “agent”? What sorts of agents should we expect to be selected for by evolution or gradient descent or any other selection process?
What are “goals”? How do we formalize a goal that is about the world itself, rather than about our beliefs about the world (or our observations)? Even something as simple as “maximize the amount of diamond” has no clear route to formalization! (A toy sketch of this gap follows this list.)
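On the first question: one concrete flavor of reasoning without a single trusted prior is to keep a set of candidate distributions (a credal set) and pick the action whose worst-case expected utility is highest. The sketch below is a minimal Python illustration of that maximin rule; the states, distributions, and utilities are invented for the example, and infra-Bayesianism proper is a much richer framework than this.

```python
# Maximin expected utility over a credal set: a set of candidate distributions
# stands in for a single prior, and actions are ranked by their expected
# utility under the least favorable member of the set. Numbers are illustrative.

credal_set = [
    {"sunny": 0.9, "rainy": 0.1},
    {"sunny": 0.5, "rainy": 0.5},
    {"sunny": 0.2, "rainy": 0.8},
]

utility = {
    "picnic":    {"sunny": 10, "rainy": -5},
    "stay_home": {"sunny": 1,  "rainy": 1},
}

def worst_case_eu(action):
    """Expected utility of `action` under the least favorable distribution."""
    return min(
        sum(p[state] * utility[action][state] for state in p)
        for p in credal_set
    )

best = max(utility, key=worst_case_eu)
print({a: worst_case_eu(a) for a in utility}, "->", best)
# {'picnic': -2.0, 'stay_home': 1.0} -> stay_home
```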
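On logical uncertainty: the problem is already visible in arithmetic. A claim like “the last digit of 3^100000 is 7” is logically settled, but a bounded reasoner that has not yet done the computation still has to bet on it somehow. The toy sketch below uses a uniform placeholder prior; it illustrates the gap, not a solution.

```python
# A bounded reasoner must assign a probability to a claim whose truth is
# already logically determined, before it has paid the cost of computing it.
# The uniform prior here is a placeholder, not a theory of logical uncertainty.

claim_digit = 7

# Before computing: no better idea than a uniform guess over the ten digits.
prior = {d: 0.1 for d in range(10)}
print("P(last digit is 7) before computing:", prior[claim_digit])

# After computing, the uncertainty collapses to 0 or 1, even though nothing
# about the external world changed; only the reasoner's state did.
actual = pow(3, 100_000, 10)
posterior = {d: float(d == actual) for d in range(10)}
print("actual last digit:", actual)
print("P(last digit is 7) after computing:", posterior[claim_digit])
```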
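On counterfactuals for decisions: Newcomb's problem is the standard stress test. A highly accurate predictor has already filled an opaque box with $1,000,000 iff it predicted you would take only that box; a transparent box always holds $1,000. The sketch below computes two textbook notions of “what would happen if I chose X”; the 99% accuracy figure is just the conventional illustrative number, and neither calculation is endorsed as the right one.

```python
# Newcomb's problem: two candidate ways of evaluating the counterfactual
# "what do I get if I choose this action?". Which (if either) is correct
# is exactly the open decision-theory question.

ACCURACY = 0.99  # the predictor's accuracy; an illustrative number

def evidential_ev(action):
    """Treat the action as evidence about what was predicted (EDT-style)."""
    p_predicted_onebox = ACCURACY if action == "one-box" else 1 - ACCURACY
    return 1_000_000 * p_predicted_onebox + (1_000 if action == "two-box" else 0)

def causal_ev(action, p_box_full):
    """Hold the already-fixed box contents constant (CDT-style)."""
    return 1_000_000 * p_box_full + (1_000 if action == "two-box" else 0)

for a in ("one-box", "two-box"):
    print(a, "EDT:", evidential_ev(a), "CDT:", causal_ev(a, p_box_full=0.5))
# EDT favors one-boxing; CDT favors two-boxing no matter what p_box_full is.
```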
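On goals about the world versus goals about observations: the difference becomes concrete as soon as an agent can affect its own sensors. In the toy sketch below, an agent rewarded on its sensor reading prefers to tamper with the sensor, while an agent rewarded on the actual world state prefers to change the world; the environment, actions, and numbers are all invented for illustration.

```python
# Toy illustration of a goal about observations vs a goal about the world.
# Everything here is made up to show the distinction, not to model real agents.

def step(action, world_diamond=0, sensor_offset=0):
    if action == "make_diamond":
        world_diamond += 1       # actually changes the world
    elif action == "hack_sensor":
        sensor_offset += 100     # only changes what the agent observes
    observation = world_diamond + sensor_offset
    return world_diamond, observation

def best_action(reward_on):
    scores = {}
    for action in ("make_diamond", "hack_sensor"):
        world, obs = step(action)
        scores[action] = obs if reward_on == "observation" else world
    return max(scores, key=scores.get), scores

print(best_action("observation"))  # ('hack_sensor', ...): the agent wireheads
print(best_action("world"))        # ('make_diamond', ...): the agent makes diamond
```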