Thank you for your comment!
What you’re saying seems more galaxy-brained than what I was saying in my notes, and I’m probably not understanding it well. Maybe I’ll try to just briefly (re)state some of my claims that seem most relevant to what you’re saying here (with not much justification for my claims provided in my present comment, but there’s some in the post), and then if it looks to you like I’m missing your point, feel very free to tell me that and I can then put some additional effort into understanding you.
So, first, math is this richly infinite thing that will never be mostly done.
If one is a certain kind of guy doing alignment, one might hope that one could understand how e.g. mathematical thinking works (or could work), and then make like an explicit math AI one can understand (one would probably really want this for science or for doing stuff in general[1], but a fortiori one would need to be able to do this for math).[2]
But oops, this is very cursed, because thinking is an infinitely rich thing, like math!
I think a core idea here is that thinking is a technological thing. Like, one aim of notes 1–6 (and especially 3 and 4) is to “reprogram” the reader into thinking this way about thinking. That is, the point is to reprogram the reader away from something like “Oh, how does thinking, the definite thing, work? Yeah, this is an interesting puzzle that we haven’t quite cracked yet. You probably have to, like, combine logical deduction with some probability stuff or something, and then like also the right decision theory (which still requires some work, but we’re getting there), and then maybe a few other components that we’re missing, but bro, we will totally get there with a few ideas about how to add search heuristics, or once we’ve figured out a few more details about how abstraction works, or something.”
Like, a core intuition is to think of thinking like one would think of, like, the totality of humanity’s activities, or of human technology. There’s a great deal going on! It’s a developing sort of thing! It’s the sort of thing where you need/want to have genuinely new inventions! There is a rich variety of useful thinking-structures, just like there is a rich variety of useful technological devices/components, just like there is a rich variety of mathematical things!
Given this, thinking starts to look a lot like math — in particular, the endeavor to understand thinking will probably always be mostly unfinished. It’s the sort of thing that calls for an infinite library of textbooks to be written.
In alignment, we’re faced with an infinitely rich domain — of ways to think, or technologies/components/ideas for thinking, or something. This infinitely rich domain again calls for textbooks to keep being written as one proceeds.
Also, the thing/thinker/thought writing these textbooks will itself need to be rich and developing as well, just like the math AI will need to be rich and developing.
Generally, you can go meta more times, but at each step you’ll just be asking “how do I think about this infinitely rich domain?”, and answering that will again be an infinite endeavor.
You could also try to make sense of climbing to higher infinite ordinal levels, I guess?
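To gesture at this a bit more concretely (toy notation of mine, not anything from the notes): one could picture a tower

$$T_0 = \text{math itself}, \qquad T_{\alpha+1} = [\text{thinking about how to think about } T_\alpha], \qquad T_\lambda = \bigcup_{\alpha<\lambda} T_\alpha \ \text{ for limit ordinals } \lambda,$$

where the claim is that each $T_\alpha$ is itself an infinitely rich, never-mostly-done domain, so the tower never settles into a finished meta-theory at any level.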
(Also, there’s something further to be said about how [[doing math] and [thinking about how one should do math]] are not that separate.)
I’m at like inside-view p=0.93 that the above presents the right vibe to have about thinking (like, maybe genuinely about its potential development forever, but if it’s like technically only the right vibe wrt the next 10^12 years of thinking (at a 2024 rate) or something, then I’m still going to count that as thinking having this infinitary vibe for our purposes).[3]
However, it’s less clear whether one can in principle make a math AI that is in some sense explicit/understandable anyway (one that in fact proves impressive theorems with a non-galactic amount of compute). Making progress on this question might require us to clarify what we want “explicit/understandable” to mean. We could get criteria on this notion from thinking through what we want from it in the context of making an explicit/understandable AI that makes mind uploads (and “does nothing else”). I say some more stuff about this question in 4.4.
[1] If one is an imo complete lunatic :), one is hopeful about getting this so that one can make an AI sovereign with “the right utility function” that “makes there be a good future spacetime block”; if one is an imo less complete lunatic :), one is hopeful about getting this so that one can make mind uploads and have the mind uploads take over the world or something.
[2] To clarify: I actually tend to like researchers with this property much more than I like basically any other “researchers doing AI alignment” (even though researchers with this property are imo engaged in a contemporary form of alchemy), and I can feel the pull of this kind of direction pretty strongly myself (also, even if the direction is confused, it still seems like an excellent thing to work on to understand stuff better). I’m criticizing researchers with this property not because I consider them particularly confused/wrong compared to others, but in part because I instead consider them sufficiently reasonable/right to be worth engaging with (and because I wanted to think through these questions for myself)!
[3] I’m saying this because you asked me about my certainty in something vaguely like this — but I’m aware I might be answering the wrong question here. Feel free to try to clarify the question if so.