I get the feeling that for AI safety, some people believe it’s crucially important to be an expert in a whole bunch of fields of math in order to make any progress. In the past I took this advice and tried to deeply study computability theory, set theory, and type theory, in the hope that it would someday give me greater insight into AI safety.
Now I think I was taking the wrong approach. To be fair, I still think being an expert in a whole bunch of fields of math is probably useful, especially if you want very strong abilities to reason about complicated systems. But the way I frame my learning is much different now.
My current perspective is that employing a lazy style of learning is superior for AI safety work. “Lazy” is meant in the computer science sense: only learning something when it seems like you need it in order to understand something important. I will contrast this with the model that one should learn a set of solid foundations first before going any further.
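To make the computer-science analogy concrete, here is a minimal Python sketch (the names are made up for illustration, not from any real library): the eager version pays the cost of studying every topic up front, while the lazy version only pays for a topic the first time something actually demands it, and remembers the result.

```python
# A minimal sketch of the eager-vs-lazy distinction, using a stand-in
# "study" function. All names here are illustrative.

def deep_dive(topic: str) -> str:
    """Stand-in for spending months getting comfortable with a topic."""
    print(f"(spending a lot of time on {topic}...)")
    return f"working knowledge of {topic}"

# Eager ("foundations first"): pay the full cost up front, for every topic,
# whether or not any of it ends up being used.
def eager_learning(topics):
    return {t: deep_dive(t) for t in topics}

# Lazy ("learn as needed"): defer each deep dive until something you are
# reading actually requires it, and cache the result for reuse.
def lazy_learning():
    cache = {}
    def learn(topic):
        if topic not in cache:
            cache[topic] = deep_dive(topic)
        return cache[topic]
    return learn

learn = lazy_learning()
learn("linear algebra")   # studied now, because a paper needed it
learn("linear algebra")   # already cached; no further cost
# "type theory" is never requested here, so its cost is never paid.
```

The caching is the part I care about: once you have been forced to learn a topic, that effort keeps paying off the next time it comes up.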
Obviously neither model can be absolutely correct in an extreme sense. I don’t, as a silly example, think that people who can’t do basic arithmetic should go into AI safety before building a foundation in math. And at the other end of the spectrum, I think it would be absurd to insist that one become a world-renowned mathematician before reading their first AI safety paper. That said, even though both models are wrong, my current preference is for the lazy model rather than the foundations model.
Here are some points in favor of both, informed by my first-person experience.
Points in favor of the foundations model:
If you don’t have solid foundations in mathematics, you may not even be aware of things that you are missing.
Having solid foundations in mathematics will help you to think rigorously about things rather than having a vague non-reductionistic view of AI concepts.
Subpoint: MIRI work is motivated by coming up with new mathematics that can describe error-tolerant agents without relying on fuzzy statements like “machine learning relies on heuristics so we need to study heuristics rather than hard math to do alignment.”
We should try to learn the math that will be useful for AI safety in the future, rather than the math being used in machine learning papers right now. If you think AI is at least a few decades away, then it’s possible that learning the foundations of mathematics will be more robustly useful no matter where the field shifts.
Points in favor of the lazy model:
Time is limited and it usually takes several years to become proficient in the foundations of mathematics. This is time that could have been spent reading actual research directly related to AI safety.
The lazy model is better for my motivation, since it makes me feel like I am actually learning about what’s important, rather than doing homework.
Learning foundational math often looks a lot like taking a shotgun approach: learning everything that seems vaguely relevant to agent foundations. Unless you have a very strong passion for this type of mathematics, it would be strange to find this kind of learning fun.
It’s not clear that the MIRI approach is correct. I don’t have a strong opinion on this, however.
Even if the MIRI approach were correct, I don’t think it’s my comparative advantage to do foundational mathematics.
The lazy model will naturally force you to learn the things that are actually relevant, as measured by how often you come in contact with them. By contrast, the foundations model forces you to learn things which might not be relevant at all. Obviously, we won’t know what is and isn’t relevant beforehand, but I currently err on the side of saying that some things won’t be relevant if they don’t have a direct bearing on current machine learning.
Even if AI is many decades away, machine learning has been around for a long time, and it seems like the math useful for machine learning hasn’t changed much. So, it seems like a safe bet that foundational math won’t be relevant for understanding normal machine learning research any time soon.
I happened to be looking at something else and saw this comment thread from about a month ago that is relevant to your post:
I’m somewhat sympathetic to this. You probably don’t need, prior to working on AI safety, to already be familiar with the wide variety of mathematics used in ML, by MIRI, etc. To be specific, I wouldn’t be much concerned if you didn’t know category theory, more than basic linear algebra, how to solve differential equations, how to integrate together probability distributions, or even multivariate calculus prior to starting on AI safety work, but I would be concerned if you didn’t have deep experience with writing mathematical proofs beyond high school geometry (although I hear these days they teach geometry differently than I learned it—by re-deriving everything in Elements), say the kind of experience you would get from studying graduate-level algebra, topology, measure theory, combinatorics, etc.
This might also be a bit of motivated reasoning on my part, to echo Dagon’s comments, since I never learned category theory in school and haven’t gone back to study it because I haven’t had a specific need for it. But my experience has been that having solid foundations in mathematical reasoning and proof writing is what’s most valuable. The rest can, as you say, be learned lazily, since your needs will become apparent and you’ll have enough mathematical fluency to find and pursue whatever fields of mathematics you discover you need to know.
Beware motivated reasoning. There’s a large risk that you have noticed that something is harder for you than it seems for others, and instead of taking that as evidence that you should find another avenue to contribute, you convince yourself that you can take the same path but do the hard part later (and maybe never).
But you may be on to something real—it’s possible that the math approach is flawed, and some less-formal modeling (or other domain of formality) can make good progress. If your goal is to learn and try stuff for your own amusement, pursuing that seems promising. If your goals include getting respect (and/or payment) from current researchers, you’re probably stuck doing things their way, at least until you establish yourself.
That’s a good point about motivated reasoning. I should distinguish arguments that the lazy approach is better for people in general from arguments that it’s better for me. Whether it’s better for people more generally depends on the reference class we’re talking about. I assume people who are interested in the foundations of mathematics as a hobby outside of AI safety should take my advice less seriously.
However, I still think it’s not clear that going the foundational route is actually that useful on a per-unit-time basis. The model I proposed wasn’t as simple as “learn the formal math” versus “think more intuitively.” It was specifically a question of whether we should learn the math on an as-needed basis. For that reason, I’m still skeptical that going out and reading textbooks on subjects that are only vaguely related to current machine learning work is valuable for the vast majority of people who want to go into AI safety as quickly as possible.
Sidenote: I think there’s a failure mode of not adequately optimizing time, or being insensitive to time constraints. Learning an entire field of math from scratch takes a lot of time, even for the brightest people alive. I’m worried that, “Well, you never know if subject X might be useful” is sometimes used as a fully general counterargument. The question is not, “Might this be useful?” The question is, “Is this the most useful thing I could learn in the next time interval?”
A lot depends on your model of progress, and whether you’ll be able to predict/recognize what’s important to understand, and how deeply one must understand it for the project at hand.
Perhaps you shouldn’t frame it as “study early” vs “study late”, but “study X” vs “study Y”. If you don’t go deep on math foundations behind ML and decision theory, what are you going deep on instead? It seems very unlikely for you to have significant research impact without being near-expert in at least some relevant topic.
I don’t want to imply that this is the only route to impact, just the only route to impactful research. You can have significant non-research impact by being good at almost anything—accounting, management, prototype construction, data handling, etc.
I don’t want to imply that this is the only route to impact, just the only route to impactful research.
“Only” seems a little strong, no? To me, the argument seems to be better expressed as: if you want to build on existing work where there’s unlikely to be low-hanging fruit, you should be an expert. But what if there’s a new problem, or one that’s incorrectly framed? Why should we think there isn’t low-hanging conceptual fruit, or problems exploitable by those with moderate experience?
I like your phrasing better than mine. “Only” is definitely too strong. “Most likely path to”?
Perhaps you shouldn’t frame it as “study early” vs “study late”, but “study X” vs “study Y”.
My point was that these are separate questions. If you begin to suspect that understanding ML research requires an understanding of type theory, then you can start learning type theory. Alternatively, you can learn type theory before researching machine learning—i.e., reading machine learning papers—in the hope that it builds useful groundwork.
But what you can’t do is learn type theory and read machine learning research papers at the same time. You must make tradeoffs. Each minute you spend learning type theory is a minute you could have spent reading more machine learning research.
The model I was trying to draw was not one where I said, “Don’t learn math.” I explicitly said it was a model where you learn math as needed.
My point was not intended to be about my abilities. That’s a valid concern, but I don’t think it was my primary argument. Even conditioning on having outstanding abilities to learn every subject, I still think my argument (weakly) holds.
Note: I also want to say I’m kind of confused, because I suspect there’s an implicit assumption here that reading machine learning research is inherently easier than learning math. I side with the intuition that math isn’t inherently difficult; it just requires memorizing a lot of things and practicing. The same is true for reading ML papers, which makes me confused about why this is being framed as a debate over whether people have certain abilities to learn and do research.
I’m trying to find a balance here. I think there has to be a direct enough relation to a problem you’re trying to solve to keep the task from expanding to the point where it takes forever, but you also have to be willing to engage in exploration.