Fully human-interpretable, no, but the kind of interpretation you particularly need for making utility maximizers is the ability to take some small set of human commonsense variables and identify or construct those variables within the AI’s world-model. I think this will plausibly take specialized work for each variable you want to add, but I think it can be done (and in particular it will get easier as capabilities increase, and as we get better understandings of abstraction).
I don’t think we will be able to do it fully automatically, or that it will work for all architectures, but it does seem like there are many specific approaches for making it doable. I can’t go into huge detail atm as I am on my phone, but I can say more later if you have any questions.
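(As a rough illustration of what "identify or construct those variables within the AI's world-model" could look like in practice, here is a minimal linear-probe sketch. Everything in it, the dimensions, the planted signal, the labels, is invented for the example; it is one possible instantiation under simplifying assumptions, not a description of the specific approaches mentioned above.)

```python
# Hedged sketch: check whether a human-labeled "commonsense variable"
# is linearly decodable from a world-model's hidden activations.
# All data below is synthetic stand-in, purely for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-in for hidden states from a trained world-model: n examples,
# d-dimensional activations. In practice these would be recorded from
# the model while it processes human-labeled situations.
n, d = 2000, 64
hidden = rng.normal(size=(n, d))

# Stand-in human labels for one commonsense variable (e.g. "is the cup
# full?"). We plant a noisy linear signal so the probe has something
# real to find in this toy setting.
direction = rng.normal(size=d)
labels = (hidden @ direction + rng.normal(scale=0.5, size=n)) > 0

X_tr, X_te, y_tr, y_te = train_test_split(hidden, labels, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# High held-out accuracy is (weak) evidence that the variable is
# linearly represented; the probe's weights then serve as a candidate
# handle on that variable inside the world-model.
print(f"held-out probe accuracy: {probe.score(X_te, y_te):.2f}")
```

A linear probe is just the simplest possible "identification"; per the comment above, real variables would plausibly need specialized, per-variable work rather than a single generic recipe like this.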
Oh OK, that sounds vaguely similar to the kinds of approaches to AGI safety that I’m thinking about, e.g. here (which you’ve seen—we had a chat in the comments) or upcoming Post #13 in my sequence :) BTW I’d love to call & chat at some point if you have time & interest.
> Oh OK, that sounds vaguely similar to the kinds of approaches to AGI safety that I’m thinking about
Cool, yeah, I can also say that my views are partly inspired by your writings. 👍
> BTW I’d love to call & chat at some point if you have time & interest.
I’d definitely be interested; can you send me a PM about your availability? I have a fairly flexible schedule, though I live in Europe, so there may be some time zone issues.