Can anyone recommend good resources for learning more about machine learning / AI if you are not a programmer or mathematician? I’ve found it really hard to find anything substantive that doesn’t assume a lot of context from those fields.
Maybe start with the 3Blue1Brown series on neural networks? It's still math, but it has great visualizations.
I'd also recommend going through the Essence of Linear Algebra and Essence of Calculus series if you're not familiar with those subjects.
Highly recommend watching all the playlists with a pencil and paper, actually writing out the equations while thinking about them, to get a good grasp of the concepts.
It's not like being a programmer or mathematician is an inherent quality that some people are born with. They got there by reading and studying and practicing. If you (correctly!) believe that knowing how to code or how to follow proofs is a prerequisite to thinking clearly about ML/AI, then you can study that prerequisite first.
So, my position here is basically this: I think a substantial proportion of the immediate-term risk from AI is unwise implementation by people who can't think clearly about it. If I thought learning to code and follow mathematical proofs were a prerequisite to thinking clearly about this stuff on any level, then I'd think we were screwed, because doctors, hospital administrators, politicians, etc. are not going to do that.
Okay, it sounds like I misunderstood your question in a few different ways. It sounds like you're not looking for "how does AI work" (internally), but rather for guidance on how to wisely and safely use existing AI systems in professional fields like medicine. Like, "here are the capabilities, here are the limitations, be aware that it makes things up sometimes so you've got to check its work"?
Yes, that's closer to it, although I feel like I'm in the unfortunate position of understanding just enough to notice, and be put off by, the inaccuracies in most content fitting that description. (Also, "you've got to check its work" has become a meaningless phrase that produces no particular behavior in anybody, because it gets parroted disclaimer-style by people demoing products whose advertised benefits only exist if it's perfectly fine to not, in fact, check its work.)
I also feel as though:
1) there are some things on the 'how does it work' side that non-programmers can usefully have some understanding of? [1]
Non-programmers are capable of understanding, using and crafting system prompts (see the sketch after this list for what a system prompt actually is). And I've definitely read articles about the jagged edge in intelligence/performance that I could have understood if all the examples hadn't been programming tasks!
2) telling people only what a given model in a given format can and can't do, without going at all into why, leaves them vulnerable to the claim, "Ah, but this update means you don't have to worry about that any more"… I think there are some things people can usefully understand in the vein of "which current limitations represent genuinely hard problems, and which might be more to do with design choices?" Again, something that goes some way into the whys of things, but also something I think I have some capability to understand as a non-programmer. [2]
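Referring back to point 1: here is a minimal sketch of what a system prompt actually is, using the role/content message format that most chat-style LLM APIs accept. The instruction wording is invented for illustration; the point is that a system prompt is plain-English standing instructions that sit above the user's actual question, not code.

```python
# A minimal sketch of the role/content message structure most chat LLM APIs
# use. The instruction text below is made up for illustration.
messages = [
    {
        "role": "system",   # the system prompt: standing instructions
        "content": (
            "You are an assistant helping academic researchers. "
            "If you are not confident a citation is real, say so explicitly "
            "rather than guessing."
        ),
    },
    {
        "role": "user",     # the user prompt: the actual question
        "content": "Summarise the main limitations of this study: ...",
    },
]
```

A non-programmer never has to type this structure; the chat interface assembles it. But knowing it exists helps explain why the same underlying model can behave quite differently inside two different products.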
For instance: I explain that the programs being promoted to them for use in academic research have, at their base, a sort of 'predict-how-this-text-continues' machine, which, when trained on a large enough sample, winds up at least appearing to understand things because of how it has absorbed the patterns in our use of language. On top of that base are lots of layers of additional structure, training and so on, which further shape the sort of output it produces, in an increasingly good but not yet perfect attempt to get it not to make stuff up; it doesn't 'understand' that it is making stuff up, because its foundation is patterns, not facts and logic. When I put it that way, I find that they get why they should check its output, and they are more likely to report actually doing so in future. I'm sure there are things about this explanation that you'll want to take issue with, and I welcome that, and I'm aware of the failure mode of 'fancy autocorrect', which is also not an especially helpful model of reality. But it does actually seem to help!
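To make the 'predict how this text continues' framing a bit more concrete, here is a toy sketch. This is emphatically not how a real LLM works internally (real models use enormous neural networks over tokens, not word-pair counts), but the loop has the same shape: look at the context, pick a statistically likely continuation, repeat.

```python
import random

# Toy "predict how this text continues" machine: count which word tends to
# follow which in a tiny made-up corpus, then extend a prompt by repeatedly
# sampling a likely next word. No facts or logic anywhere, only patterns.
corpus = "the patient was given the drug and the patient improved".split()

counts: dict[str, dict[str, int]] = {}
for prev, nxt in zip(corpus, corpus[1:]):
    counts.setdefault(prev, {}).setdefault(nxt, 0)
    counts[prev][nxt] += 1

def continue_text(word: str, steps: int = 5) -> str:
    """Extend the text by repeatedly sampling a likely next word."""
    out = [word]
    for _ in range(steps):
        followers = counts.get(out[-1])
        if not followers:
            break
        words, weights = zip(*followers.items())
        out.append(random.choices(words, weights=weights)[0])
    return " ".join(out)

print(continue_text("the"))  # e.g. "the patient was given the drug"
```

The output can look sensible, but nothing in the program knows whether the patient was actually given the drug. Scale the idea up by many orders of magnitude and add the extra layers of training on top, and you have a rough intuition for why checking the output matters.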
Example: I initially parsed LLMs' tendency to make misleading generalisations when asked to summarise scientific papers—which newer models were actually worse at—just as, 'okay, so they're bad at that, then.'
But then I learned some more and did some more research—without becoming a programmer—and I now feel I can speculate that this is one of the limitations that could be quite fixable. A plausible reason for it is that all the big commercial LLMs we have are designed as general-purpose products, and a general-public definition of 'helpfulness' trading off against scientific precision could be why the studied LLMs actually got worse in their newer versions. That seems like a useful kind of reasoning to be able to do when the specifics of what these systems can and can't do are likely to change quickly. And what the models were trained for, how 'helpfulness' was defined, the fact that a decision was made to aim them at general intelligence rather than specific uses: none of that seems like stuff you need to be able to code to understand.
I'm not sure what it would even mean to teach something substantive about ML/AI to someone who lacks the basic concepts of programming. Like, if someone with zero programming experience and a median high-school-level math background asked me how to learn more about ML/AI, I would say "you lack the foundations to achieve any substantive understanding at all; go do a programming 101 course and some calculus at a bare minimum".
For instance, I could imagine giving such a person a useful and accurate visual explanation of how modern ML works, but without some programming experience they're going to go around e.g. imagining ghosts in the machine, because that's a typical mistake people make when they have zero programming experience. And a typical ML expert trying to give an explain-like-I'm-five overview wouldn't even think to address a confusion that basic. I'd guess there are quite a few things like that, as is typical. Inferential distances are not short.
Perhaps as an example, I can offer some comparisons between myself and my co-workers.
None of us are programmers. None of us are ever going to become programmers. And we have reached a point where it is vitally necessary that people who are not programmers gain a level of understanding that lets them sensibly make the decisions they are going to have to make about how to use machine learning models.
Compared to them, I am much more able to predict what LLMs will and won't be good at. I can craft user prompts. I know what system prompts are and where I can read the officially published or leaked ones. I have some understanding of what goes on during some of the stages of training. I understand that this kind of computer program is different from other computer programs, and at least some of the reasons why.
I have encountered the kind of responses I'm getting here before—people seem to underestimate how much it is possible to know in between 'nothing' and 'things you cannot possibly comprehend unless you are a programmer'. They also don't seem to share any sense of urgency about how vital it is that people who are not programmers—let's say, doctors—are enabled to learn enough about machine learning models to make sensible decisions about using them.
"they're going to go around e.g. imagining ghosts in the machine, because that's a typical mistake people make when they have zero programming experience"
Tangential point, but I'm skeptical this is actually a very common error.
Learn Python and linear algebra. These are the substance!
Here’s a good (and free!) introductory linear algebra book: https://linear.axler.net/
For ML/AI itself, here are some good things meant for a general audience:
3Blue1Brown has a good video course on neural networks: https://youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi&si=b9K6DbMpwyLYXmX-
And for LLMs specifically, Andrej Karpathy has some great videos: https://www.youtube.com/watch?v=7xTGNNLPyMI
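If it helps to see why Python and linear algebra are "the substance", here is a minimal sketch of what a single neural-network layer does: a weighted sum (a matrix-vector multiply) followed by a simple nonlinearity. The numbers are made up; the material above is ultimately explaining operations like this, stacked many layers deep, with the weights learned from data.

```python
def layer(inputs, weights, biases):
    """One layer: weighted sums of the inputs, then a ReLU nonlinearity."""
    outputs = []
    for row, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(row, inputs)) + bias
        outputs.append(max(0.0, total))  # ReLU: negative sums become zero
    return outputs

x = [0.5, -1.0, 2.0]              # a 3-number input
W = [[0.1, 0.4, -0.2],            # a 2x3 weight matrix (learned in training)
     [0.7, -0.3, 0.5]]
b = [0.0, 0.1]

print(layer(x, W, b))             # a 2-number output, fed to the next layer
```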
Maybe check out Why Machines Learn? I've only read a bit of it, but the sense I've gotten is that it does much better at conveying conceptual understanding than many sources.
Also, the Stephen Wolfram post about ChatGPT is good, but maybe still too technical for what you're looking for? https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/
I think you might be in luck—in that there are things about LLMs that may be worth learning, and that are neither programming nor mathematics.
There is a third domain, the odd one out. It's what you could call "LLM psychology": a new, strange and somewhat underappreciated field that tries to take a higher-level look at how LLMs act and why, at the ways in which they are like humans and the ways in which they differ. It's a high-weirdness domain that manages to have one leg in interpretability research and another in speculative philosophy, and spans a lot in between.
It can be useful. It's where a lot of actionable insights can be found, like "LLMs are extremely good at picking up context cues from the data they're given", which is why few-shot prompts work so well, and why they can be easily steered by the vibes in their context, at times in unexpected ways.
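As a concrete example of that "context cues" point, here is roughly what a few-shot prompt looks like. The classification task and examples are invented for illustration; the idea is simply that you show the model a couple of worked examples inside the prompt itself, and it continues the pattern without any retraining.

```python
# A few-shot prompt: worked examples placed before the real query, so the
# model picks the pattern up purely from context. Examples are invented.
few_shot_prompt = """\
Classify each statement as CLAIM or EVIDENCE.

Statement: The drug reduced symptoms in 40% of participants (n=212).
Label: EVIDENCE

Statement: This therapy is a promising option for most patients.
Label: CLAIM

Statement: Mean recovery time fell from 14 days to 9 days.
Label:"""

print(few_shot_prompt)  # sent as-is to a chat model, the usual reply is "EVIDENCE"
```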
Dunno exactly what to recommend, since I don't know what you already know, but off the top of my head:
The two Anthropic interpretability articles—they look scary, and the long versions go into considerable detail, but they are comprehensible to a layman, and may have the kind of information you’re looking for:
Mapping the Mind of a Large Language Model (and the long version, Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet)
Tracing the Thoughts of a Large Language Model (and the long version, On the Biology of a Large Language Model)
In a more esoteric vein—“the void” by nostalgebraist, which was well received on LW. Much more opinionated and speculative, but goes into a good amount of detail on how LLMs are made and how they function, in a way that more “mechanical” and math-heavy or code-heavy articles don’t. I can certainly endorse the first half of it.
Thank you! Yes, I realise I didn't make it very clear where I was starting from; I've read some of these but not all.
In general, I've found Anthropic's publications to be very good at not being impenetrable to non-programmers where they don't have to be.
Searching the keyword “prompt engineering” (both on here and Google) may guide you to some helpful resources. Sorry I don’t have anything specific to link you to.
Thanks—yeah, prompt engineering is something I’ve been learning with a fair amount of success 🙂
Nope! However, I’ve had mixed success with asking LLMs and then having them repeatedly explain every part of the explanation I don’t understand; if nobody else has anything actually good that might be an okay starting place. (N.B. the sticking power of what little knowledge I can gain that way is not amazing; mostly I just get vague memories that xyz piece of jargon is relevant to the process in some way. I do think it marginally reduces my lostness in ML discussions, though.)