From the section “Strategy: have humans adopt the optimal Bayes net”:
Roughly speaking, imitative generalization (see the code sketch after this list):
1. Considers the space of changes the humans could make to their Bayes net;
2. Learns a function which maps (proposed change to Bayes net) to (how a human, with AI assistants, would make predictions after making that change);
3. Searches over this space to find the change that allows the humans to make the best predictions.
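A minimal sketch of this three-step loop, under illustrative assumptions: every name here (`Change`, `human_predict`, `candidate_changes`, `dataset`) is made up for exposition, with `human_predict` standing in for the learned model of step 2, and the search of step 3 reduced to picking the candidate with the lowest total log loss.

```python
import math
from typing import Callable, Iterable, Tuple

Example = Tuple[object, bool]   # (input x, observed binary outcome y)
Change = object                 # a proposed change to the human's model

def log_loss(p_true: float, outcome: bool) -> float:
    """Negative log-likelihood of the observed outcome under prediction p_true."""
    p = p_true if outcome else 1.0 - p_true
    return -math.log(max(p, 1e-9))

def score_change(change: Change,
                 human_predict: Callable[[Change, object], float],
                 dataset: Iterable[Example]) -> float:
    """How well do humans (with AI assistants) predict after adopting `change`?"""
    return sum(log_loss(human_predict(change, x), y) for x, y in dataset)

def best_change(candidate_changes: Iterable[Change],               # step 1
                human_predict: Callable[[Change, object], float],  # step 2
                dataset: Iterable[Example]) -> Change:
    """Step 3: search for the change yielding the best human predictions."""
    return min(candidate_changes,
               key=lambda z: score_change(z, human_predict, dataset))
```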
Regarding the second step, what is the meat of this function? My superficial understanding is that a Bayes net is deterministic and fully specified, and that we already have the tools to say “given a change to the value of node A of a Bayes net, here is the probability that will be assigned to node B of the Bayes net”.
I suspect you’re imagining something clever involving the human’s Bayes net plus the AI, but perhaps you just mean faster and faster algorithms for computing this update given a very complex world-model.
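The questioner’s premise does hold for small, fully specified nets, where conditional probabilities fall out mechanically. The two-node net below (A → B) and its CPT numbers are made up purely for exposition:

```python
# Toy net: A -> B, both binary. The CPTs are made-up numbers.
P_A = {True: 0.3, False: 0.7}                   # prior P(A)
P_B_given_A = {True: {True: 0.9, False: 0.1},   # P(B | A=True)
               False: {True: 0.2, False: 0.8}}  # P(B | A=False)

def prob_B_given_A(a: bool) -> float:
    """P(B=True | A=a): A is B's parent, so read the CPT directly."""
    return P_B_given_A[a][True]

def prob_A_given_B(b: bool) -> float:
    """P(A=True | B=b) by Bayes' rule, enumerating both values of A."""
    joint = {a: P_A[a] * P_B_given_A[a][b] for a in (True, False)}
    return joint[True] / (joint[True] + joint[False])

print(prob_B_given_A(True))   # 0.9
print(prob_A_given_B(True))   # 0.27 / (0.27 + 0.14) ≈ 0.659
```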
In general we don’t have an explicit representation of the human’s beliefs as a Bayes net (and none of our algorithms are specialized to this case), so the only way we are representing “change to Bayes net” is as “information you can give to a human that would lead them to change their predictions.”
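Under this representation, the function from step 2 has a simpler type than “edit to an explicit graph”: it consumes text. A hypothetical interface sketch, where `ask_human` stands in for querying the human (with AI assistants) and is not anything defined in the original discussion:

```python
from typing import Callable

Info = str   # a "change to the Bayes net" is just information the human reads

def human_predict(info: Info, question: str,
                  ask_human: Callable[[str], float]) -> float:
    """Probability the human (with AI assistants) assigns after reading `info`."""
    prompt = (f"After reading the following, answer the question.\n\n"
              f"{info}\n\nQ: {question}")
    return ask_human(prompt)   # assumed oracle: returns a probability in [0, 1]
```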
That said, we also haven’t described any inference algorithm other than “ask the human.” In general inference is intractable (even in very simple models), and the only handle we have on doing fast and acceptably good approximate inference is that the human can apparently do it.
(Though if that were the only problem, then we would also expect to be able to find some loss function that incentivizes the AI to do inference in the human Bayes net.)
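A hedged illustration of why such a loss is plausible in that counterfactual (where an explicit net were available and intractability of inference were the only obstacle): ancestral sampling from a Bayes net stays cheap even when conditional inference is hard, so samples can supervise an AI’s answers to conditional queries. The `net` interface (`topological_order`, `cpt`) and all other names below are assumptions for illustration, not anything from the original discussion.

```python
import math
import random
from typing import Callable, Dict

Assignment = Dict[str, bool]

def ancestral_sample(net) -> Assignment:
    """Sample each node given its already-sampled parents, in topological order."""
    sample: Assignment = {}
    for node in net.topological_order():
        p_true = net.cpt(node, sample)       # P(node=True | parents in `sample`)
        sample[node] = random.random() < p_true
    return sample

def inference_loss(ai_posterior: Callable[[str, Assignment], float],
                   net, target: str, evidence: Assignment,
                   n_samples: int = 10_000) -> float:
    """Log loss of the AI's P(target=True | evidence) against rejection samples."""
    total, kept = 0.0, 0
    for _ in range(n_samples):
        s = ancestral_sample(net)
        if all(s[k] == v for k, v in evidence.items()):   # keep matching samples
            p = ai_posterior(target, evidence)
            total += -math.log(max(p if s[target] else 1.0 - p, 1e-9))
            kept += 1
    return total / max(kept, 1)
```

In expectation this log loss is minimized exactly when `ai_posterior` returns the true conditional probability, which is the sense in which it incentivizes inference in the sampled net; the rejection step is itself exponentially wasteful, so this is a conceptual sketch rather than a practical recipe.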