Thanks! I really appreciate this, and I think your natural-latents framing fits nicely with the Part I point about needing to compress D down to a small set of crisp, structured latents.

On the lexicographic point: even though Theorem 3 writes the full objective as a discounted sum, the safety heads U1-U4 aren't long-horizon objectives; they're local one-step tests whose optimal action doesn't depend on future predictions. For example, U1 is automatically satisfied each round by waiting (and once the human approves the proposed action, the agent simply executes it, thereby engaging U2-U5), and U4 is a one-step reversibility check against an inaction baseline, not a long-run impact estimate. The only head with genuine long-horizon structure is U5, which sits lexicographically below the safety heads, so discounting never creates optimization pressure on them. This makes the whole scheme intentionally deontic and “natural-latent–friendly”, exactly matching the tractable regime suggested by the large-D lower bounds of Part I.
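To make that concrete, here's a minimal Python sketch of the kind of action rule I'm describing (the names `approved_by_human`, `passes_one_step_checks`, and `u5_value` are illustrative placeholders, not notation from the paper): the safety heads act as one-step filters with waiting always available as a fallback, and the discounted task head U5 only chooses among actions that already pass every filter, so no amount of future reward can buy a safety violation.

```python
# Illustrative sketch only, not the paper's implementation: safety heads as
# local one-step filters, with the long-horizon task head U5 applied last.
from typing import Callable, Iterable


def select_action(
    candidate_actions: Iterable[str],
    approved_by_human: Callable[[str], bool],       # U1/U2-style approval gate (placeholder)
    passes_one_step_checks: Callable[[str], bool],  # e.g. U4 reversibility vs. inaction baseline
    u5_value: Callable[[str], float],               # U5: discounted long-horizon task value
    wait_action: str = "WAIT",
) -> str:
    """Pick an action lexicographically: one-step safety tests first, task value last."""
    # Waiting always satisfies the safety heads, so it is a guaranteed fallback:
    # the safety tests never trade off against discounted future reward.
    safe = [
        a for a in candidate_actions
        if approved_by_human(a) and passes_one_step_checks(a)
    ]
    if not safe:
        return wait_action
    # Only among actions that pass every one-step test does U5 break ties.
    return max(safe, key=u5_value)


# Toy usage: the irreversible action is filtered out before U5 is ever consulted.
best = select_action(
    ["clean_room", "shred_documents", "WAIT"],
    approved_by_human=lambda a: True,  # suppose the human approved this round's proposals
    passes_one_step_checks=lambda a: a != "shred_documents",  # fails the reversibility check
    u5_value=lambda a: {"clean_room": 1.0, "WAIT": 0.0}.get(a, -1.0),
)
assert best == "clean_room"
```

The point of the structure is that the discounted sum in Theorem 3 only ever shows up inside `u5_value`, after the safety filters have already fixed the feasible set.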