Hi everyone!
I am Margot Stakenborg, and I have worked with Dovetail in this winter fellowship cohort. I have a background in theoretical physics and philosophy of physics, and I am now making a switch into conceptual mechinterp, after having been interested in it and learning about it for some years. I have been working with Dovetail on formalising world models, and I am writing up a sequence of posts on the philosophical and mathematical prerequisites for proper world models, and on which tools from physics can help us understand and analyse different world models. I will also dive into the different definitions of “world model” that float around in the mechinterp and AI safety literature. Things I will discuss are:
How is the concept “world model” used in different areas of the ML literature?
Concept representation in the brain: new frontiers from neuroscience
Tools from physics: renormalisation and coarse-graining
What are “natural features”?
When can networks find representations of the world similar to ours?
Can NNs discover new natural kinds?
Theoretical equivalence and intertheoretic reduction
Bayesian experimental design
And probably more...
I hope to build this out into quite a comprehensive and complete sequence. Do let me know if there are other questions or subjects you would be interested in reading about!