It feels like you’re using a bit of ghost-in-the-machine reasoning to come up with some of these answers.
In the first case, the AI would not ask for more computing power. It does not have utilities that extend beyond one week. Its only goal is to create a message that communicates how to make a really good battery. If it had insufficient computing power, it would not output a message telling me so, because that would be in direct opposition to the goal. The outcome I would expect in that case would be for it to describe a really shitty or expensive battery, or else just copy and paste the answer from Wikipedia. And this wouldn’t be a ploy for more computing power, it would just be the AI actually making its best effort to fulfill its goal.
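To make that concrete, here’s a toy sketch of why a purely episodic objective never rewards asking for resources. All names and the scorer are invented for illustration; this isn’t a claim about how any real system is built:

```typescript
// Toy model of the one-week oracle's objective. Hypothetical throughout.

type Message = string;

// Stand-in for whatever evaluation the designers optimize against:
// it scores ONLY the battery design described in the emitted message.
function batteryDesignQuality(msg: Message): number {
  return msg.toLowerCase().includes("battery") ? 1 : 0; // toy scorer
}

// The episode's utility is a function of the single output message and
// nothing else: there is no term for acquiring compute, surviving past
// the deadline, or influencing the operator. So "give me more hardware"
// simply scores as a very bad battery design.
function episodeUtility(msg: Message): number {
  return batteryDesignQuality(msg);
}

console.log(episodeUtility("I need more computing power"));        // 0
console.log(episodeUtility("Use a lithium-ion battery with ...")); // 1
```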
The second and third cases point out legitimate security concerns, but they’re not impossible to address, and I don’t see how an aligned AI wouldn’t also suffer from those risks. An oracular AI has some safety features, and an aligned AI has some safety features, but both could be misused if those limits were removed.
Another stupid intro question: could you use an oracular AI to build an aligned one?
That’s entirely fair about the first case.
But the generator for these ideas is the problem that minimizing the harm an AI can do is more or less the same as minimizing its usefulness. If you had a superintelligent AI in a box, you could go further than letting it only emit strings. You could ask it questions and restrict it to giving you “YES” | “NO” | “NOT_SURE” as answers. It’s even safer then! But even less useful.
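To make the restriction concrete, here’s a minimal sketch in TypeScript (chosen because the “YES” | “NO” | “NOT_SURE” notation above is literally a union type); the interfaces and names are invented for illustration, not a real API:

```typescript
// Sketch of the boxed-oracle idea: the type signature is the box.

type OracleAnswer = "YES" | "NO" | "NOT_SURE";

interface BoxedOracle {
  // Whatever reasoning happens inside, each query leaks at most
  // log2(3) ≈ 1.58 bits back out through this channel.
  ask(question: string): OracleAnswer;
}

// The less restricted version from earlier in the thread: a free-form
// string channel, strictly more useful and strictly harder to contain.
interface StringOracle {
  ask(question: string): string;
}
```

The trade-off is right there in the signatures: the safer interface is the one you can do less with.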
But people want their tools to be useful! Gwern has a good essay on this (https://www.gwern.net/Tool-AI) where he points out that the whole gradient of incentives pushes people to give greater and greater agency to their AI agents. Google wants a return on investment from DeepMind; the US and China want to outcompete each other; Cerebras and NVIDIA want powerful examples to show off their shiny new chips; and so on and so forth. Even in the non-competitive case of one person having an AI, the incentive gradient is hard to resist, which is the point of the above examples. But in the case of several people having an AI, well, what are the odds they’d all be happy restricting output to “YES” | “NO” | “NOT_SURE”? After all… they all know they’d just get outcompeted by anyone who doesn’t thus restrict it… might as well be them. Letting it output arbitrary strings gives you more power than restricting it to one-word answers; letting it hold a conversation gives you more power than that; and letting it have even a few actuators gives you more power than that, etc., etc.