It’s actually pretty hard to come up with agendas at the intersection of “seems like an alignment-relevant topic it’d be useful to popularize” and “has complicated math which would be insightful and useful to visualize/simulate”.
Natural abstractions, ARC’s ELK, Shard Theory, and general embedded-agency theories are currently better understood by starting from the concepts, not the math.
Infrabayesianism, Orthogonal’s QACI, logical-induction stuff, and ARC’s heuristic arguments seem too abstract to allow interesting visual modeling. (Maybe if you get really creative...)
There are various alignment-relevant theorems and mathematical tools scattered around which could be interestingly visualized (e.g., my own causal mirrors), but most of them are niche tools, so it’s not obvious there’s much value in popularizing them.
Also, not sure if that’s a deal-breaker for you, but some important agendas are not technically “about” AI Safety at all, even though they resulted from people starting from the alignment problem and iteratively identifying subproblems necessary for making progress on it. That process often takes you outside the field of AI. For example, natural abstractions, Finite Factored Sets, and heuristic arguments don’t even centrally study minds at all.
Some of the stuff by the causal incentives working group could be beautifully visualizable, especially the work involving lots of causal graphs (though Manim’s graph support may still be buggy, which would block that). Finite Factored Sets and factored causal spaces? Maybe tiling agents theory?
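For a concrete sense of what that would involve, here is a minimal sketch of animating a toy causal DAG, assuming Manim Community Edition (roughly v0.15 or later, where `DiGraph` is available); the scene name and the X → Z → Y graph are purely illustrative:

```python
from manim import Scene, DiGraph, Create, Write, Text, UP

class CausalGraphDemo(Scene):
    """Draws and animates a small three-node causal DAG."""

    def construct(self):
        vertices = ["X", "Z", "Y"]
        edges = [("X", "Z"), ("Z", "Y"), ("X", "Y")]  # X -> Z -> Y, plus a direct X -> Y edge
        graph = DiGraph(
            vertices,
            edges,
            labels=True,        # show vertex labels
            layout="circular",  # simple deterministic layout
        )
        title = Text("A toy causal DAG").to_edge(UP)
        self.play(Write(title))
        self.play(Create(graph))  # animate the vertices and directed edges being drawn
        self.wait()
```

Rendering it with something like `manim -pql causal_graph_demo.py CausalGraphDemo` would quickly show whether Manim’s graph mobjects currently behave well enough for this kind of agenda.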
Which AI Safety math topics deserve a high-quality Manim video? (think 3Blue1Brown’s video style)
YouTube videos get more views than blog posts. Beautiful animations, even more so.
Singular Learning Theory and Simplex’s work (e.g., this), maybe? Cartesian Frames and Finite Factored Sets might also work, but I’m less sure about those.