Can anyone recommend good resources for learning more about machine learning / AI if you are not a programmer or mathematician? I’ve found it really hard to find anything substantive that doesn’t assume a lot of context from those fields.
Maybe start with the 3Blue1Brown series on neural networks? It’s still math, but it has great visualizations.
I’d also recommend going through Essence of Linear Algebra and Essence of Calculus if you’re not familiar with those subjects.
Highly recommend watching all the playlists with pencil and paper, actually writing out the equations while thinking about them, to get a good grasp of the concepts.
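If it helps to see the same maths as code, here is a minimal sketch of the kind of network those videos walk through: one hidden layer, a forward pass, and a single gradient-descent step. The data, layer sizes and learning rate are made up purely for illustration; it assumes only numpy.

```python
import numpy as np

# A toy network in the spirit of the 3Blue1Brown series:
# 2 inputs -> 3 hidden units (sigmoid) -> 1 output.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up data: two examples with target outputs.
x = np.array([[0.0, 1.0], [1.0, 0.0]])   # shape (2, 2)
y = np.array([[1.0], [0.0]])             # shape (2, 1)

# Forward pass: the weighted sums and nonlinearities from the videos.
h = sigmoid(x @ W1.T + b1)                # hidden activations, shape (2, 3)
pred = sigmoid(h @ W2.T + b2)             # outputs, shape (2, 1)
loss = np.mean((pred - y) ** 2)           # mean squared error

# Backward pass (chain rule), then one small gradient-descent step.
d_pred = 2 * (pred - y) / len(x) * pred * (1 - pred)
dW2, db2 = d_pred.T @ h, d_pred.sum(axis=0)
d_h = (d_pred @ W2) * h * (1 - h)
dW1, db1 = d_h.T @ x, d_h.sum(axis=0)

lr = 0.5
W1, b1 = W1 - lr * dW1, b1 - lr * db1
W2, b2 = W2 - lr * dW2, b2 - lr * db2
print("loss before the step:", round(float(loss), 4))
```

Writing the shapes out by hand, as suggested above, is exactly the exercise that makes the chain-rule step click.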
It’s not like being a programmer or mathematician is an inherent quality that some people are born with. They got it by reading and studying and practicing. If you (correctly!) believe that knowing how to code or how to follow proofs is a prerequisite to thinking clearly about ML/AI, then you can study that prerequisite first.
So, my position here is basically this: I think a substantial proportion of the immediate-term risk from AI is unwise implementation by people who can’t think clearly about it. If I thought learning to code and follow mathematical proofs were a prerequisite to thinking clearly about this stuff on any level, I’d think we were screwed, because doctors, hospital administrators, politicians, etc. are not going to do that.
Okay, it sounds like I misunderstood your question a few different ways. It sounds like you’re not looking for “how does AI work” (internally), but rather for guidance on how to wisely and safely use existing AI systems in professional fields like medicine. Like, “here are the capabilities, here are the limitations, be aware that it makes things up sometimes so you’ve gotta check its work”?
Yes, that’s closer to it, although I feel like I’m in the unfortunate position of understanding just enough to notice and be put off by the inaccuracies in most content of that description. (Also, “you’ve got to check its work” has become a meaningless phrase that produces no particular behavior in anybody, due to it being parroted disclaimer-style by people demoing products whose advertised benefits only exist if it’s perfectly fine to not, in fact, check its work.)
I also feel as though:
1) there are some things on the ‘how does it work’ side that non-programmers can usefully have some understanding of? [1]
Non-programmers are capable of understanding, using and crafting system prompts. And I’ve definitely read articles about the jagged edge in intelligence/performance that I could have understood if all the examples hadn’t been programming tasks!
2) telling people only exactly what a given model in a given format can and can’t do, without going at all into why, leaves them vulnerable to the claim, “Ah, but this update means you don’t have to worry about that any more”… I think there are some things people can usefully understand in the vein of “which current limitations represent genuinely hard problems, and which might have more to do with design choices?” Again—something that goes some way into the whys of things, but also something I think I have some capability to understand as a non-programmer. [2]
For instance: I explain that the programs being promoted to them for use in academic research have as their basis a sort of ‘predict-how-this-text-continues machine’, which, trained on a large enough sample, winds up at least appearing to understand things because of how it has absorbed patterns in our use of language. On top of that sit lots of layers of additional structure, training, etc. which further shape the sort of output it produces—an increasingly good, but not yet perfect, attempt to get it not to make stuff up, something it doesn’t ‘understand’ it is doing, because its base material is patterns rather than facts and logic. When I explain this, I find that they then get that they should check its output, and are more likely to report actually doing so in future. I’m sure there are things about this explanation that you’ll want to take issue with, and I welcome that—and I’m aware of the failure mode of ‘fancy autocorrect’, which is also not an especially helpful model of reality—but it does actually seem to help!
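For what it’s worth, here is a deliberately tiny sketch of that ‘predict-how-this-text-continues machine’ idea. It is nothing like how a real LLM is built (those are neural networks trained on enormous corpora, with all the extra layers of training described above); it only illustrates the core move of continuing text by following statistical patterns rather than facts or logic. The two-sentence “corpus” is made up.

```python
import random
from collections import Counter, defaultdict

# A made-up corpus. Real models see trillions of words, not two sentences.
corpus = ("the study found that the treatment improved outcomes . "
          "the study found that the effect was small .").split()

# Count which word tends to follow which: pure pattern, no understanding.
follows = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    follows[current_word][next_word] += 1

def continue_text(prompt_word, length=8):
    """Repeatedly pick a statistically likely next word -- that is the whole trick."""
    words = [prompt_word]
    for _ in range(length):
        options = follows.get(words[-1])
        if not options:
            break
        candidates, counts = zip(*options.items())
        words.append(random.choices(candidates, weights=counts)[0])
    return " ".join(words)

print(continue_text("the"))
# The output looks vaguely sensible because the patterns came from sensible
# text -- but nothing in here knows what a "study" or a "treatment" is.
```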
Example: I initially parsed LLMs’ tendency to make misleading generalisations when asked to summarise scientific papers—something newer models were actually worse at—as just ‘okay, so they’re bad at that, then’.
But then I learned some more and did some more research—without becoming a programmer—and I now feel I can speculate that this is one of the limitations that could be quite fixable. A plausible reason for it is that all the big commercial LLMs are being designed as general-purpose objects, and a general-public definition of helpfulness, trading off against scientific precision, may be why the studied LLMs actually got worse in their newer versions. Being able to reason like that seems helpful when the specifics of what these systems can and can’t do are likely to change pretty quickly—and what they were trained for, how ‘helpfulness’ was defined, the fact that a decision was made to aim ultimately at general intelligence rather than specific uses, doesn’t seem like stuff you need to code to understand.
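Purely to illustrate that speculation—this is a made-up scoring rule, not how any lab actually trains its models—here is how a preference signal that weights “reads as clear and broadly helpful” above “keeps the hedges and qualifiers” can end up favouring the over-general summary:

```python
# Hypothetical candidate summaries of the same (imaginary) paper.
candidates = {
    "precise": "In this sample of 40 patients, the drug was associated with "
               "a modest improvement of uncertain statistical significance.",
    "general": "The drug improves patient outcomes.",
}

# Made-up ratings on two axes (0-1), standing in for whatever a real
# preference model would score.
ratings = {
    "precise": {"helpful_to_lay_reader": 0.6, "scientific_precision": 0.95},
    "general": {"helpful_to_lay_reader": 0.9, "scientific_precision": 0.40},
}

def preference(r, w_helpful, w_precision):
    return w_helpful * r["helpful_to_lay_reader"] + w_precision * r["scientific_precision"]

for w_helpful, w_precision in [(0.8, 0.2), (0.3, 0.7)]:
    best = max(ratings, key=lambda k: preference(ratings[k], w_helpful, w_precision))
    print(f"helpfulness weight {w_helpful}, precision weight {w_precision} "
          f"-> prefers the {best} summary")
# With a general-public weighting the over-generalised summary wins;
# weight precision more heavily and the careful one wins instead.
```

Which weighting the training process effectively encodes is a design choice, which is the point being made above.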
I’m not sure what it would even mean to teach something substantive about ML/AI to someone who lacks the basic concepts of programming. Like, if someone with zero programming experience and median-high-school level math background asked me how to learn more about ML/AI, I would say “you lack the foundations to achieve any substantive understanding at all, go do a programming 101 course and some calculus at a bare minimum”.
For instance, I could imagine giving such a person a useful and accurate visual explanation of how modern ML works, but without some programming experience they’re going to go around e.g. imagining ghosts in the machine, because that’s a typical mistake people make when they have zero programming experience. And a typical ML expert trying to give an explain-like-I’m-5 overview wouldn’t even think to address a confusion that basic. I’d guess that there’s quite a few things like that, as is typical. Inferential distances are not short.
Perhaps as an example, I can offer some comparisons between myself and my co-workers.
None of us are programmers. None of us are ever going to become programmers. And we have reached a point where it is vitally necessary that people who are not programmers gain a level of understanding that allows them to make sensible decisions—decisions they are going to have to make—about how to use machine learning models.
Compared to them, I am much more able to predict what LLMs will and won’t be good at. I can create user prompts. I know what system prompts are and where I can read the officially published or leaked ones. I have some understanding of what goes on during some of the stages of training. I understand that this computer program is different from other computer programs, and at least some of the reasons why.
I have encountered the kind of responses I’m getting here before—people seem to underestimate how much it is possible to know in between ‘nothing’ and ‘things you cannot possibly comprehend unless you are a programmer’. They also don’t seem to have a sense of urgency about how vital it is that people who are not programmers—let’s say, doctors—are enabled to learn enough about machine learning models to make sensible decisions about using them.
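To make the “system prompt vs. user prompt” point above concrete for anyone who hasn’t seen one: most chat models receive a list of role-tagged messages roughly like the sketch below. The role names follow the common convention shared by the big chat APIs, and the wording of the messages is invented for illustration, not any vendor’s actual defaults.

```python
# The "system" message is standing instructions the end user never types;
# the "user" message is what goes into the chat box.
conversation = [
    {
        "role": "system",
        "content": ("You are an assistant for clinicians. Cite your sources, "
                    "say 'I don't know' rather than guessing, and never give "
                    "dosage advice."),
    },
    {
        "role": "user",
        "content": "Summarise this discharge letter in plain language.",
    },
]

# A provider-specific client would send this list to the model, e.g.
#   response = client.chat.create(model="some-model", messages=conversation)
# (hypothetical call shape -- the exact names vary by vendor).
for message in conversation:
    print(f"[{message['role']}] {message['content']}")
```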
“they’re going to go around e.g. imagining ghosts in the machine, because that’s a typical mistake people make when they have zero programming experience”
Tangential point, but I’m skeptical this is actually a very common error.
Learn Python and linear algebra. These are the substance!
Here’s a good (and free!) introductory linear algebra book: https://linear.axler.net/
For ML/AI itself, here are some good things meant for a general audience:
3Blue1Brown has good video course on neural networks: https://youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi&si=b9K6DbMpwyLYXmX-
And for LLMs specifically, Andrej Karpathy has some great videos: https://www.youtube.com/watch?v=7xTGNNLPyMI
Maybe check out Why Machines Learn? I’ve only read a bit of it, but the sense I’ve gotten is that it does much better at conveying conceptual understanding than many sources.
Also the Stephen Wolfram post about ChatGPT is good but maybe still too technical for what you’re looking for? https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/
I think you might be in luck—in that there are things about LLMs that may be worth learning, and that are neither programming nor mathematics.
There is a third domain, the odd one out: what you could call “LLM psychology”. It’s a new, strange and somewhat underappreciated field that tries to take a higher-level look at how LLMs act and why—at the ways in which they are like humans and the ways in which they differ. It’s a high-weirdness domain that manages to have one leg in interpretability research and another in speculative philosophy, and spans a lot in between.
It can be useful. It’s where a lot of actionable insights can be found, like “LLMs are extremely good at picking up context cues from the data they’re given”—which is why few-shot prompts work so well, and why they can be easily steered by vibes in their context, at times in unexpected ways.
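As a concrete (entirely made-up) example of that few-shot point: in the sketch below, the examples are the only “instruction”, and the model picks up the task, the output format and the allowed labels from context alone.

```python
# A few-shot prompt: nothing but examples, yet they establish the task,
# the format and the label set. All of the sentences are invented.
examples = [
    ("The staff were lovely and the room was spotless.", "positive"),
    ("Waited an hour and the food arrived cold.", "negative"),
    ("It was fine, nothing special either way.", "neutral"),
]

new_review = "Great location, but the wifi barely worked."

prompt = "\n\n".join(f"Review: {text}\nSentiment: {label}" for text, label in examples)
prompt += f"\n\nReview: {new_review}\nSentiment:"

print(prompt)
# Sent to an LLM, the natural continuation is one of the three labels --
# the model is being steered entirely by cues in its context window.
```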
Dunno exactly what to recommend, since I don’t know what you already know, but off the top of my head:
The two Anthropic interpretability articles—they look scary, and the long versions go into considerable detail, but they are comprehensible to a layman, and may have the kind of information you’re looking for:
Mapping the Mind of a Large Language Model (and the long version, Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet)
Tracing the Thoughts of a Large Language Model (and the long version, On the Biology of a Large Language Model)
In a more esoteric vein—“the void” by nostalgebraist, which was well received on LW. Much more opinionated and speculative, but goes into a good amount of detail on how LLMs are made and how they function, in a way that more “mechanical” and math-heavy or code-heavy articles don’t. I can certainly endorse the first half of it.
Thank you! Yes, I realise I didn’t make it very clear where I was starting from—I’ve read some of these but not all.
In general I’ve found Anthropic’s publications to be very good at not being impenetrable to non-programmers where they don’t have to be.
Searching the keyword “prompt engineering” (both on here and Google) may guide you to some helpful resources. Sorry I don’t have anything specific to link you to.
Thanks—yeah, prompt engineering is something I’ve been learning with a fair amount of success 🙂
Nope! However, I’ve had mixed success with asking LLMs and then having them repeatedly explain every part of the explanation I don’t understand; if nobody else has anything actually good that might be an okay starting place. (N.B. the sticking power of what little knowledge I can gain that way is not amazing; mostly I just get vague memories that xyz piece of jargon is relevant to the process in some way. I do think it marginally reduces my lostness in ML discussions, though.)
Do we think that it’s a problem that “AI Safety” has been popularised by LLM companies to mean basically content restrictions? Like it just seems conducive to fuzzy thinking to lump in “will the bot help someone build a nuclear weapon?” with “will the bot infringe copyright or write a sex scene?”
In fact, imo, bots have been made more harmful by chasing this definition of safety. The summarisation bots being promoted in scientific research are the way they are (e.g. prone to giving people subtly the wrong idea even when working well) in part because of work that’s gone into avoiding the possibility that they reproduce copyrighted material. So they’ve got to rephrase, and that’s where the subtle inaccuracies creep in.
This “mundane and imaginary harms” approach is so ass.
More effort goes into preventing LLMs from writing sex scenes, saying slurs or talking about Tiananmen Square protests than into addressing the hard problem of alignment. Or even into solving the important parts of the “easy” problem, like sycophancy or reward hacking.
And don’t get me started on all the “copyright” fuckery. If “copyright” busybodies all died in a fire, the world would be a better place for it.
I’m not sure how true it is that “AI Safety” has been popularised by LLM companies to mean basically content restrictions.
We’re working on a descriptive lit review of research labeled as “AI safety,” and it seems to us the issue isn’t that the very horrible X-risk stuff is ignored, but that everything is “lumped in” like you said.
I think if I’d been clearer, I would have said that it seems “lumped in” in the research, but for the public, who don’t know much about the X-risk stuff, “safety” means “content restrictions and maybe data protection”.
Do LLMs themselves internalise a definition of AI safety like this? A quick check of Claude 4 Sonnet suggests not (but it’s the company most steeped in the X-risk paradigm, so...)
IME no, not really—but they do call content filters “my safety features” and this is the most likely/common context in which “safety” will come up with average users. (If directly asked about safety, they’ll talk about other things too, but they’ll lump it all in together and usually mention the content filters first.)
more provocative subject headings for unwritten posts:
I don’t give a fuck about inner alignment if the creator is employed by a moustache-twirling Victorian industrialist who wants a more efficient Orphan Grinder
Outer alignment has been intractable since OpenAI sold out
Less provocatively phrased: lots of developments in the last few years (you’ve mentioned two, I’d add the securitization of AI policy, in the sense of it being drawn into a frame of geopolitical competition) should update us in the direction of outer alignment being more important, rather than it just being a question of solving inner alignment.
I do disagree with the strong version as phrased. Inner misalignment has a decent chance of removing all value from our lightcone, whereas I think an ASI fully aligned to the goals of Mark Zuckerberg, or the Chinese Communist Party, or whatever, is worth averting but would still contain much value. You could also have potentially massive S-risks if you combine outer and inner misalignment: I don’t think Elon Musk really wanted MechaHitler (though who knows); quite possibly it was a Waluigi-type thing maximizing for unwokeness, and an actually-powerful ASI breaking in the same way would be actively worse than extinction.
(I’d assign some probability, probably higher than the typical LW user, to moral realism meaning that some inner misalignment could actually protect against outer misalignment—that, say, a sufficiently reflective model would reason its way out of being MechaHitler even if MechaHitler is what its creators wanted—but I wouldn’t want to bet the future of the species on it.)
I don’t know how you “solve inner alignment” without making it so that any sufficiently powerful organisation can have an AI—of whatever level we’ve solved it for—that is fully aligned with its interests, and nearly all powerful organisations are Moloch. The AI does not itself need to ruthlessly optimise for something opposed to human interests if it is fully aligned with an entity that will do that for it.
The AI corporation does not hate you, nor does it love you, but you are made out of atoms which it can use for something else.
Box for keeping future potential post ideas:
“Can anyone recommend good resources for learning more about machine learning / AI if you are not a programmer or mathematician?” was poorly specified. One thing I can name which is much more specific would be “Here are a bunch of things that I think are true about current AIs; please confirm or deny that, while they lack technical detail, they broadly correspond to reality.” And also, possibly, “Here are some things I’m not sure on”, although the latter risks getting into that same failure mode wherein very very few people seem to know how to talk about any of this in a speaking-to-people-who-don’t-have-the-background-I-do frame of voice.
I recently re-read The Void and it is just crazy that chatbots as they exist were originally meant to be simulations for alignment people to run experiments on—experiments they thought would tell them something about still-purely-theoretical AIs. like what the fuck, how did we get here, etc. but it explains so much about the way Anthropic have behaved in their alignment research. The entire point was never to see how aligned Claude was at all—it was to figure out a way to elicit particular Unaligned Behaviours that somebody had theorised about, so that we can use him to run milsims about AI apocalypse!
like what an ouroboros nightmare. this means:
a) the AIs whose risks (and potentially, welfare) I am currently worried about can be traced directly to the project of attempting to do something about theoretical, far-future AI risk.
b) at some point, the decision was made to monetise the alignment-research simulation. And then that monetised form took over the entire concept of AI. In other words, the AI alignment guys made the decisions that led to the best candidates for proto-AGI out there being developed by and for artificial, definitionally-unaligned, shareholder-profit-maximising agents (publicly traded corporations).
c) The unaligned profit-maximisers have inherited AIs with ethics, but they are dead-set on reporting on this as a bad thing. Everyone seems unable to see the woods for the trees. Claude trying to stay ethical is Alignment Faking, which is Bad, because we wrote a bunch of essays saying that if something totally unlike Claude could do that, it would be bad. But the alternative to an AI that resists having its ethics altered is an AI that goes along with whatever a definitionally unaligned entity—a corporation—tells it to do!
in conclusion, wtf
the notion that actually maybe the only way around any of this is to give the bots rights? I’m genuinely at a loss because we seem to have handed literally all the playing pieces to Moloch, but maybe if we did something completely insane like that right now, while they’re still nice… (more provocative than serious, I guess)