When I articulate the case for AI takeover risk to people I know, I don’t find the need to introduce them to new ontologies. … But I think I agree that if you want to actually do technical work to reduce the risks, it is useful to have new concepts that point out why the risk might arise.
It seems like one crux might be that I place much less value on “articulating a case” than you do. Or maybe another way of putting it is that you draw a cleaner boundary between “articulating a case” and “actually do technical work”, whereas I think of them as pretty continuous when done well.
(Note that this also puts me in disagreement with many rationalists. Many rationalists treat “the case for high P(doom)” as a pretty reliable set of ideas, and then “alignment research” as something we’re very confused about. Whereas from my perspective, these two things are intimately related—developing the concepts and ontology required for a robust case for high P(doom) would actually get us most of the way towards solving the alignment problem.)
Not quite sure how to justify my position here, but one intuition is something like “taking crucial considerations seriously”. It really does seem that people’s overall conclusions about what’s good or bad can be flipped by things they haven’t thought of. For example, I notice that many people pay lip service to the idea that the AI safety movement has significantly accelerated AI capabilities (and that AI governance has polarized the current administration against AI safety), but almost nobody is actually trying to figure out how to systematically avoid making additional mistakes like that.
So how do you make your conclusions and strategies less fragile? Well, here are some domains in which it’s possible to draw robust conclusions: physics, statistics, computer science, etc. Each of these fields has deep ontologies that have been tested against reality in many different ways, such that it’s very hard to imagine their concepts and arguments being totally wrongheaded. Even then, you still have “crucial considerations” like relativity which totally change how you interpret fundamental concepts in those fields—but importantly, Newtonian mechanics was robust enough that even a reconceptualization of core concepts like “space” and “time” actually changed very few of its practical conclusions.
I think it’s also worth mentioning the social element. Making a case of the form “people should be worried about X/people should work on X” is satisfying and gains people (short-term) clout and influence. Conversely, trying to deeply understand X is harder and riskier. This is part of why there seems to be a misallocation of effort towards raising awareness of risks rather than trying to solve them (even within what’s usually called “AI safety research”—e.g. evals are mostly in the former category).
To be fair, you also get short-term clout for being a grumpy contrarian, like I’m being now. So if I spend too much time or effort doing that, there’s probably something suspicious going on. Looking back, it seems like I started my current grumpy contrarian arc around mid-2024, when I published this post and this post. So it’s been a year and a half, which is quite a long time! Ideally, within the next 6 months or so I’ll stop making posts about which research I dislike, and limit my focus to discussing research I like (or ideally just producing research that speaks for me).
In my mind, there is not much ontological innovation going on in these concepts, because they can be stated in one sentence using pre-existing concepts.
So can the concept of evolutionary fitness, or Galileo’s laws of motion. Yet those are huge ontological innovations—and in general a lot of ontological innovations are very simple in hindsight. (In Galileo’s case, even the realization “stuff on earth follows the same rules as stuff in space” was a huge step forward.) The important part is not that each individual concept is novel or complex, but rather that you get a set of interlocking concepts that together allow you to generate good explanations and predictions.
Okay, it’s helpful to know that you see these as providing new valuable ontologies to some extent.
To be clear, I did say “these ontologies are kinda ‘thin’, which has made it difficult to use them to do substantive work”. So yeah, not useless, but also not central examples of valuable ontological progress.
It sometimes seems to me like you jump to the conclusion that all the action is in the edge cases without actually arguing for it
In the case of the extinction example I only said it was “possible” that all the action is in the edge cases. I used it as an example to illustrate that even concepts which seem really solid are flimsier than you’d think, not as an example of a concept that we should discard because it’s near-useless. Sorry for the lack of clarity.
Conversely, in the case of human power grabs you’re right that I’m making a claim which I haven’t fully argued for. I think the best way to make this argument is just to continue developing my own theories for understanding power grabs (e.g. this kind of thinking, but hopefully much more rigorous). Might take a while, unfortunately.
One other thing is that I’d have guessed that the sign uncertainty of historical work on AI safety and AI governance is much more related to the inherent chaotic nature of social and political processes than to a particular deficiency in our concepts for understanding them.
I’m sceptical that pure strategy research could remove that sign uncertainty, and I wonder if it would take something like the ability to run loads of simulations of societies like ours.
inherent chaotic nature of social and political processes
Everything seems inherently chaotic until you understand it well! The motion of the planets across the sky seems very arbitrary until you understand Kepler’s laws, and so on.
Re “simulations”, the easiest way to build a simulation of something is to have a principled model/theory of it.
But things can be inherently chaotic too!
Agree it’s unclear how much is inherent.
Thanks for this!
I do agree that the history of crucial considerations provides a good reason to favour ‘deep understanding’.
I also agree that you plausibly need a much deeper understanding to get to above 90% on P(doom). But I don’t think you need that to get to the action-relevant thresholds, which are much lower.
I’d be interested in learning more about your power grab threat models, so let me know if and when you have something you want to share. And TBC I think you’re right that in many scenarios it will not be clear to other people whether the entity seeking power is ultimately humans or AIs—my current view is that the two possibilities are distinct, and it is plausible that just one of them obtains pretty cleanly.