Build a unified picture of agency from the perspectives of the fields that formally describe it, which “explains” agency about as well as evolution + genetics “explains” biology. Perhaps a little worse.
Oh. Well that’s kind of a low bar. Maybe we don’t disagree about this then, not sure. We agree that it’s not nearly enough for alignment, right?
Right, but I think it’s not enough in the sense that we need to develop the specific concepts which are relevant to alignment.
Mhm. I think those concepts are quite central to what an agent/mind is.
Maybe, but I still think it’s a strategic mistake to aim at the center of “what a mind is” when you want to hit the center of (for example) “what trust is.”
Because of the question of reflective (in)stability in general, I think it’s quite hard to get a handle on anything really important in a mind other than by really understanding mind/agency. Otherwise you have no coordinates for what the mind “really is” in the sense of what elements of the mind will actually stick around.
I think you probably need to understand many things about minds a lot better than “evolution + genetics” understands biology before it makes much sense to attack questions about alignment mechanics in particular. To stick with the analogy, I suspect you might at least need the sort of mastery where you understand mitochondria and DNA transcription well enough to build your own basic functional versions of them from scratch before you can even really get started.
I agree that ‘we are confused about agency’ is not a good slogan for pointing to this inadequacy. I think ‘we haven’t advanced practical mind science to anywhere near the level of e.g. condensed matter physics’ is true and a blocker for alignment of superintelligence, but to me ‘we are confused about agency’ evokes much stronger associations with memes like ‘maybe Bayesian EV maximisation is conceptually wrong even in the idealised setting’. These meme groups seem sufficiently distinct to merit separate slogans.