Check out my research program:
https://www.lesswrong.com/s/sLqCreBi2EXNME57o
Particularly the open problems post (once you know what AIXI is).
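(For reference, AIXI is Marcus Hutter's model of a theoretically optimal reinforcement learner. A standard way of writing its action choice, roughly following Hutter's definition, with $U$ a universal monotone Turing machine, $q$ ranging over programs, and $m$ the horizon:

$$a_k := \arg\max_{a_k} \sum_{o_k r_k} \max_{a_{k+1}} \sum_{o_{k+1} r_{k+1}} \cdots \max_{a_m} \sum_{o_m r_m} \left[ r_k + \cdots + r_m \right] \sum_{q \,:\, U(q, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}$$

i.e., expectimax planning to the horizon, with expected reward taken under a Solomonoff-style mixture over computable environments.)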
For a balance between theory and implementation, I think Michael Cohen’s work on AIXI-like agents is promising.
Also look into Alex Altair's selection theorems, John Wentworth's natural abstractions, Vanessa Kosoy's infra-Bayesianism (and, more generally, the learning-theoretic agenda, which I suppose I'm part of), and Abram Demski's trust tiling.
If you want to connect with alignment researchers, you could attend the agent foundations conference at CMU; applications are due tomorrow: https://www.lesswrong.com/posts/cuf4oMFHEQNKMXRvr/agent-foundations-2025-at-cmu