Singular Learning Theory and Simplex’s work (e.g., this), maybe? Cartesian Frames and Finite Factored Sets might also work, but I’m less sure about those.
It’s actually pretty hard to come up with agendas in the intersection of “seems like an alignment-relevant topic it’d be useful to popularize” and “has complicated math which would be insightful and useful to visualize/simulate”.
Natural abstractions, ARC’s ELK, Shard Theory, and general embedded-agency theories are currently better understood by starting from the concepts, not the math.
Infrabayesianism, Orthogonal’s QACI, logical-induction stuff, and ARC’s heuristic arguments seem too abstract to allow interesting visual modeling. (Maybe if you get really creative...)
There are various alignment-relevant theorems and mathematical tools scattered around which could be interestingly visualized (e.g., my own causal mirrors), but most of them are niche tools, so it’s not obvious there’s much value in popularizing them.
Also, not sure if that’s a deal-breaker for you, but some important agendas are not technically “about” AI Safety at all, even if they resulted from people starting from the alignment problem and iteratively identifying various subproblems necessary for making progress on it. This process often moves you outside the field of AI. For example: natural abstractions, Finite Factored Sets, and heuristic arguments, which don’t even centrally study minds at all.