Fully human-interpretable, no, but the kind of interpretation you particularly need for making utility maximizers is the ability to take some small set of human commonsense variables and identify or construct those variables within the AI’s world-model. I think this will plausibly take specialized work for each variable you want to add, but I think it can be done (and in particular it will get easier as capabilities increase, and as we get better understandings of abstraction).
I don’t think we will be able to do it fully automatically, or that it will work for all architectures, but it does seem like there are many specific approaches for making it doable. I can’t go into huge detail atm as I am on my phone, but I can say more later if you have any questions.
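(As a rough illustration of what "identify or construct those variables within the AI's world-model" could look like in practice, here is a minimal linear-probe sketch. Everything in it, the dimensions, the planted signal, the labels, is invented for the example; it is one possible instantiation under simplifying assumptions, not a description of the specific approaches mentioned above.)

```python
# Hedged sketch: check whether a human-labeled "commonsense variable"
# is linearly decodable from a world-model's hidden activations.
# All data below is synthetic stand-in, purely for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-in for hidden states from a trained world-model: n examples,
# d-dimensional activations. In practice these would be recorded from
# the model while it processes human-labeled situations.
n, d = 2000, 64
hidden = rng.normal(size=(n, d))

# Stand-in human labels for one commonsense variable (e.g. "is the cup
# full?"). We plant a noisy linear signal so the probe has something
# real to find in this toy setting.
direction = rng.normal(size=d)
labels = (hidden @ direction + rng.normal(scale=0.5, size=n)) > 0

X_tr, X_te, y_tr, y_te = train_test_split(hidden, labels, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# High held-out accuracy is (weak) evidence that the variable is
# linearly represented; the probe's weights then serve as a candidate
# handle on that variable inside the world-model.
print(f"held-out probe accuracy: {probe.score(X_te, y_te):.2f}")
```

A linear probe is just the simplest possible "identification"; per the comment above, real variables would plausibly need specialized, per-variable work rather than a single generic recipe like this.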
Oh OK, that sounds vaguely similar to the kinds of approaches to AGI safety that I’m thinking about, e.g. here (which you’ve seen—we had a chat in the comments) or upcoming Post #13 in my sequence :) BTW I’d love to call & chat at some point if you have time & interest.
> Oh OK, that sounds vaguely similar to the kinds of approaches to AGI safety that I’m thinking about
Cool, yeah, I can also say that my views are partly inspired by your writings. 👍
> BTW I’d love to call & chat at some point if you have time & interest.
I’d definitely be interested; can you send me a PM about your availability? I have a fairly flexible schedule, though I live in Europe, so there may be some time zone issues.