I’d like to discuss these competing heuristics in the context of AI safety:
A: “Don’t take big action unless you’re reasonably certain it’s positive.”
B: “Take big actions whenever they look positive-EV, or at least strongly positive-EV (even if there’s a significant chance of a large negative effect).”
Which heuristic should a random parent who doesn’t know you hope you follow, if they want their kid to live a long good life? There’s a prima facie argument for B: if reality deviates from your estimates in an unbiased fashion (could be more good effects than you were accounting for, or more bad ones, in a pretty even mix), it helps the kid if you take all the actions that look EV-positive to you, without restricting yourself to “only if I’m certain”.
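To make that prima facie case concrete, here’s a minimal Monte Carlo sketch. It’s my own toy model with made-up numbers (the noise level, threshold, and distributions are all assumptions, not anything stated above): true effects are symmetric around zero, the estimate is the true effect plus genuinely unbiased noise, and we compare the two heuristics on total realized value.

```python
# Minimal toy sketch of the prima facie case for B (made-up parameters).
# True effects are symmetric around zero; the estimate adds *unbiased* noise.
import random

random.seed(0)

N = 100_000        # candidate "big actions"
NOISE = 2.0        # spread of the unbiased estimation error
THRESHOLD = 2.0    # "only act if reasonably certain" cutoff (heuristic A)

total_A = 0.0      # realized value under heuristic A
total_B = 0.0      # realized value under heuristic B

for _ in range(N):
    true_value = random.gauss(0.0, 1.0)                # actual effect
    estimate = true_value + random.gauss(0.0, NOISE)   # unbiased inside view

    if estimate > THRESHOLD:   # A: act only when the estimate is clearly positive
        total_A += true_value
    if estimate > 0:           # B: act whenever it merely looks positive-EV
        total_B += true_value

print(f"heuristic A (certainty threshold): {total_A:9.1f}")
print(f"heuristic B (any positive EV):     {total_B:9.1f}")
# With zero-mean error, B's extra actions still have positive expected true
# value, so B reliably comes out ahead in this setup.
```

Of course, this only works because the noise really is unbiased; the rest of my reasoning below is about why that assumption is suspect in AI safety.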
But, I think in AI safety it’s often closer to A. My reasoning:
There are many places in AI where one might inside-view expect something to be a little beneficial, with huge error bars and almost no feedback for a long time (e.g., “it’ll be slightly safer if it comes sooner, because there’ll be less ‘hardware overhang’”, or “it’ll be slightly safer if such-and-such a technique is used in its making”).
Endeavors with wide error bars and few to no short-term feedback loops are unusually easy places for biases to creep in.
There’s a powerful optimization process trying to build AI quickly (“the economy”, plus “it’s a riveting science and technology puzzle that (at least seemingly) lets people be central and important and powerful while doing something very interesting”).
That “build AI” optimization process has a substantial foothold in most AI safety people’s social and memetic context. (Like, lots of us have friends (or at least friends of friends), and people we read and learn from (and people those people read and learn from), who are working at frontier AI companies, or who are getting research resources from AI companies, or who are otherwise making money or gaining status from helping AI go faster.)
The “build AI” optimization process probably changes e.g. which arguments get passed on how frequently, which words and arguments have a positive/negative “halo” around them when you ask your system 1 how plausible they are, which questions or angles of analysis feel natural, etc. (This can happen without anyone lying, if e.g. people differentially pass on nice-feeling news)
This happens even if the individual doing the estimation does not personally have any motivated cognition on the topic (as long as other people who helped shape their memetic context do have motivated speech or cognition).
And so, if a person is working off a weak signal (“I thought over all the arguments and this one seems more intuitively plausible, and the impact is big, so it’s worth acting on for some years despite no feedback loops”), on something big enough that the distributed “build AI” optimization process discusses the relevant considerations a lot, their weak-but-real ability to weigh considerations may be swamped by the meme-network’s tendency to get distorted by “build AI” optimization. (A toy version of this swamping is sketched just after this list.)
I suspect it may often be the case that the “let’s not let AI kill everyone” meme brings in lots of psychological energy/motivation that lets smart, high-integrity people work hard in response to relatively tenuous arguments in ways people normally can’t. And then the “build AI fast” optimizer co-opts their effort and flips its sign to negative (since it gets much better feedback loops, but has a harder time pulling in high-integrity people on its own).
(If a person instead takes smaller actions that they predict will be visibly/obviously good within relatively short periods of time, this is much less of a problem, since inaccurate models are easy to notice and fix in such contexts. And doing small-scale things with solid feedback loops can set a person up to do somewhat larger-scale things that still have solid feedback loops.)
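To make the “swamping” worry concrete, here is a hedged extension of the earlier toy sketch. Again, it’s my own illustration with made-up parameters (the “accelerates AI” component, the distortion strength, and the noise level are all assumptions, not claims about real magnitudes): the meme-network inflates how plausible acceleration-friendly actions feel, while their true safety value is lower, and the person’s own judgment remains a weak-but-real signal buried in noise.

```python
# Toy extension of the earlier sketch (made-up parameters): each candidate
# action now has an "accelerates AI" component. The distortion inflates how
# good such actions *look*, while their true safety value is lower.
import random

random.seed(0)

N = 100_000
SIGNAL_NOISE = 2.0   # the person's weak inside-view signal
DISTORTION = 1.5     # how strongly "helps build AI" inflates perceived value
THRESHOLD = 2.0

total_A, acts_A = 0.0, 0
total_B, acts_B = 0.0, 0

for _ in range(N):
    merit = random.gauss(0.0, 1.0)    # genuine safety value of the action
    accel = random.gauss(0.0, 1.0)    # how much it helps "build AI fast"
    true_value = merit - accel        # acceleration is costly for safety here
    estimate = (merit + DISTORTION * accel
                + random.gauss(0.0, SIGNAL_NOISE))   # distorted perceived value

    if estimate > THRESHOLD:          # heuristic A
        total_A += true_value
        acts_A += 1
    if estimate > 0:                  # heuristic B
        total_B += true_value
        acts_B += 1

print(f"A: {acts_A:6d} actions taken, realized value {total_A:9.1f}")
print(f"B: {acts_B:6d} actions taken, realized value {total_B:9.1f}")
# Once the distortion outweighs the weak signal, "looks positive-EV" stops
# tracking true value: B racks up many net-harmful actions, while A's higher
# bar mostly just limits exposure. Neither does well, which is part of why
# small actions with real feedback loops look more attractive.
```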
I think you’re right that AI safety is case A, but I’d suggest you add reversibility to your reasons. If we can just turn it off (ignoring influence campaigns to get us not to), there’s no risk; the problem is if it’s one-way. Outside of AI takeover scenarios, there’s substantial evidence that some would consider ending sentience a kind of murder.