My mental model was “The AI will be written by careful and skilled humans who want to ensure various properties”—so it seemed reasonable to assume that the programmers are trying to write code that things can be proved about.
I agree that “sane and friendly” is something we don’t know how to formalize. But developers might be able to prove some useful but weaker property. For instance, that data structure X determines the AI’s credence in various claims and that debugging interface Y lets you examine that structure.
There’s a common assumption around here that the only meaningful way to deal with an AI is a black-box approach. And I’m increasingly bothered by this. If we can look inside an AI and see what it’s thinking, that ought to give us additional ability to check and enforce properties.
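For concreteness, here is a minimal Python sketch of the kind of weaker property I mean, assuming a hypothetical design in which every credence lives in a single inspectable structure. `CredenceStore` and `inspect` are made-up names for illustration, not any real system's API:

```python
# Sketch: structure X holds all credences; interface Y lets you examine it.
# The hoped-for provable property is that no *other* state in the system
# encodes credences, so inspecting this structure inspects all of them.

from dataclasses import dataclass, field


@dataclass
class CredenceStore:
    """Single source of truth for the AI's credence in each claim."""
    _credences: dict[str, float] = field(default_factory=dict)

    def set(self, claim: str, p: float) -> None:
        if not 0.0 <= p <= 1.0:
            raise ValueError("credence must be a probability in [0, 1]")
        self._credences[claim] = p

    def get(self, claim: str) -> float:
        return self._credences.get(claim, 0.5)  # ignorance prior


def inspect(store: CredenceStore) -> list[tuple[str, float]]:
    """Debugging interface Y: a read-only view of structure X."""
    return sorted(store._credences.items())


if __name__ == "__main__":
    store = CredenceStore()
    store.set("the reactor is safe to restart", 0.93)
    store.set("the operator wants an honest answer", 0.99)
    for claim, p in inspect(store):
        print(f"{p:.2f}  {claim}")
```

The hard part, of course, is not writing this structure but proving the "single source of truth" property about the whole system around it.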
You don't write some clean code and then prove things about it… you start with the properties you want provable, then develop the code along with a proof that those properties hold. Want to prove something not originally envisioned? Want to change the definition of what you're proving? Start over again.
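To make that workflow concrete, here's a toy Lean 4 sketch; `clampCredence` and its 0-to-100 scale are invented examples, not anyone's actual AI code:

```lean
-- The property is stated up front; the definition is written so that the
-- proof goes through. Code and proof are developed together.

def clampCredence (p : Nat) : Nat := min p 100  -- credence as a percentage

-- The property we committed to proving:
theorem clampCredence_le_100 (p : Nat) : clampCredence p ≤ 100 := by
  unfold clampCredence
  exact Nat.min_le_right p 100

-- Change the spec (say, to `clampCredence p ≤ 95`) and this proof no
-- longer compiles; code and proof have to be reworked together.
```

That's the "start over again" point in miniature: the proof is brittle with respect to the specification, and a real AI's specification will not be two lines long.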
So, you have that data structure which determines credence. What is credence, exactly? Should the AI have a bit set here whenever it's telling a lie? And what's a lie? The AI outputs a sequence of characters, and that output affects your actions after you read it. Whether it is a lie is as much a fact about you as about the AI.
The black-box assumption wasn't the first. It came after many failed attempts to build AI using ordinary, very direct programming techniques, and after a growing understanding that if we make AI, it'll likely be some kind of self-developing seed that will do God only knows what after it has read the internet (and that won't be able to talk to you otherwise). Consider this site: a lot of the material here tells you how to modify your own thought processes. You'd want the AI not to read and apply any technique it comes across here. That's some restricted AI for sure. A really stubborn AI.
Furthermore, there is enormous data overload even at human-level intelligence: the human brain has on the order of 100 trillion synaptic connections.
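A rough back-of-envelope makes the scale vivid. The figures here (4 bytes per connection, a human reviewer reading 100 values per second) are assumptions for illustration, not measurements:

```python
# Back-of-envelope numbers for the data-overload point above.

connections = 100e12          # ~100 trillion synaptic connections
bytes_per_connection = 4      # one 32-bit weight each (assumption)
review_rate = 100             # values a human might inspect per second

size_tb = connections * bytes_per_connection / 1e12
review_years = connections / review_rate / (3600 * 24 * 365)

print(f"raw dump: ~{size_tb:.0f} TB")          # ~400 TB
print(f"one human reading it: ~{review_years:,.0f} years")  # ~30,000+ years
```

Even granting a perfect debugging interface, "look inside and see what it's thinking" is not something a human can do by brute inspection.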