But the generator for these ideas is the problem that minimizing the harm an AI can do is more or less the same as minimizing its usefulness. If you had a superintelligent AI in a box, you could go further than letting it only emit strings. You could ask it questions, and restrict it to giving you “YES” | “NO” | “NOT_SURE” as answers. It’s even safer then! But even less useful.
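As a minimal sketch of what that restriction might look like in practice, here’s a whitelist wrapper around a hypothetical `oracle` callable (the interface to the boxed AI is assumed, not real): any output that isn’t one of the three permitted answers is refused.

```python
from enum import Enum

class Answer(Enum):
    YES = "YES"
    NO = "NO"
    NOT_SURE = "NOT_SURE"

def constrained_query(oracle, question: str) -> Answer:
    """Ask the boxed model a question, but accept only one of the
    three whitelisted answers; anything else is discarded."""
    raw = oracle(question)  # hypothetical interface to the boxed AI
    try:
        return Answer(raw.strip().upper())
    except ValueError:
        # Free-form output never escapes the box: the channel stays narrow.
        return Answer.NOT_SURE

# Example with a stand-in "oracle":
print(constrained_query(lambda q: "yes", "Is the bridge design sound?"))
```

The safety comes precisely from how little information crosses the channel, which is also why the wrapper is so frustrating to use.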
But people want their tools to be useful! Gwern has a good essay on this (https://www.gwern.net/Tool-AI) where he points out that the whole gradient of incentive is for people to give greater and greater agency to their AI agents. Google wants return on investment for DeepMind; the US and China want to outcompete each other; Cerebras and NVIDIA want powerful examples to show off their shiny new chips; and so on and so forth. Even in the non-competitive case of one person having an AI, the incentive gradient is hard to resist, which is the point of the above examples. But in the case of several people having an AI—well, what are the odds they’d all be happy restricting output to “YES” | “NO” | “NOT_SURE”? After all… they all know they’d all just get outcompeted by anyone who doesn’t thus restrict it… might as well be them. Letting it output arbitrary strings gives you more power than just letting it output answers; letting it interact in a conversation gives you more power than that; and letting it have just a few actuators gives you more power than that, etc., etc.
That’s entirely fair about the first case.