Thanks for posting! Unfortunately, the issues presented in this post are only the tip of the iceberg.
Because yes, it would be nice to program an AI to value humans and respect their desires and so forth. But currently, we lack the technical understanding to do that. Between here and there lie a lot of unsolved technical problems, from model specification (are the AI’s concepts similar enough to a human’s that we can specify things in terms of those concepts?), to corrigibility (if we think the AI is wrong, will it stop and listen?), to the whole problem of external reference (how do you build an AI that stably values something in the external world? For bonus points, this should keep working even if it makes entirely new breakthroughs in physics and finds better ways to represent the world internally), to naturalistic reasoning (the AI needs to treat itself as part of the world—this overlaps with mindcrime when we want the AI to be careful about what it simulates).
How to actually construct the AI was not part of the scope of the essay request, as I understood it. My intention was to describe some conceptual building blocks that are necessary to adequately frame the problem. For example, I address how utility functions are generated in sapient beings, including both humans and AI. Additionally, that explanation works whether or not huge paradigm shifts occur. No amount of technical understanding is going to substitute for an understanding of why we have utility functions in the first place, and what shapes they take. Rather than the tip of the iceberg, these ideas are meant to be the foundation of the pyramid. I didn’t write about my approach to the problems of external reference and model specification because they were not the subject of the call for ideas, but I can do so if you are interested.
Furthermore, at no point do I describe “programming” the AI to do anything—quite the opposite, actually. I address that when I rule out the concept of the 3 Laws. The idea is effectively to “raise” an AI in such a way as to instill the values we want it to have. Many concepts specific to humans don’t apply to AIs, but many concepts specific to people do, and those are ones we’ll need to be aware of. Apparently I was not clear enough on that point.