I disagree with the version that replaces ‘MIRI’s theories’ with ‘mathematical theories of embedded rationality’
Yeah, I think this is the sense in which realism about rationality is an important disagreement.
But also, to the extent that your theory is mathematisable and comes with ‘error bars’
Yeah, I agree that this would make it easier to build multiple levels of abstraction “on top”. I would also be surprised if mathematical theories of embedded rationality came with tight error bounds (where “tight” means “not so wide as to be useless”). For example, current theories of generalization in deep learning do not, to my knowledge, provide tight error bounds, except in special cases that don’t apply to the main successes of deep learning.
When I read a MIRI paper, it typically seems to me that the theories discussed are pretty abstract, and as such there are more levels below than above. [...] They are also mathematised enough that I’m optimistic about upwards abstraction having the possibility of robustness.
The levels below seem mostly unproblematic (except for machine learning, which in the form of deep learning is often under-theorised).
When I say that you can’t build on the theories, I am basically only concerned about machine learning. My understanding of MIRI’s mainline story of impact is that they develop some theory, which AI researchers then use to change the way they do machine learning, which in turn leads to safe AI. This sounds to me like there are multiple levels of inference: “MIRI’s theory” → “machine learning” → “AGI”. This isn’t exactly layers of abstraction, but I think the same principle applies, and this seems like too many layers.
You could imagine other stories of impact, and I’d have other questions about those. For example, if the story was “MIRI’s theory will tell us how to build aligned AGI without machine learning”, I’d be asking when the theory was going to include computational complexity.