“the human isn’t well-modeled using a Bayes net” is a possible response the breaker could give
The breaker is definitely allowed to introduce counterexamples where the human isn’t well-modeled using a Bayes net. Our training strategies (introduced here) don’t say anything at all about Bayes nets, so it’s not clear this immediately helps the breaker; they are the one who introduced the assumption that the human uses a Bayes net (in order to describe a simplified situation where the naive training strategy failed here). We’re definitely not intentionally viewing Bayes nets as part of the definition of the game.
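For concreteness, here is a minimal sketch of what “the human is well-modeled using a Bayes net” cashes out to: the human’s beliefs factor as a product of conditionals, P(x1, …, xn) = ∏i P(xi | parents(xi)). The two-variable net and its probability tables below are invented purely for illustration; nothing in the game depends on them.

```python
# Toy Bayes net standing in for a human's beliefs (hypothetical numbers):
# Rain -> WetGrass, so the joint factors as P(rain) * P(wet | rain).
P_rain = {True: 0.2, False: 0.8}
P_wet_given_rain = {
    True:  {True: 0.9, False: 0.1},   # P(wet | rain)
    False: {True: 0.1, False: 0.9},   # P(wet | no rain)
}

def joint(rain: bool, wet: bool) -> float:
    """Joint probability under the Bayes-net factorization."""
    return P_rain[rain] * P_wet_given_rain[rain][wet]

# Sanity check: the factorized joint sums to 1 over all assignments.
total = sum(joint(r, w) for r in (True, False) for w in (True, False))
assert abs(total - 1.0) < 1e-9
```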
If you solve something given worst-case assumptions, you’ve solved it for all cases. Whereas if you solve it for one specific case (e.g. Bayes nets) then it may still fail if that’s not the case we end up facing.
It seems very plausible that after solving the problem for humans-who-use-Bayes-nets we will find a new counterexample that only works for humans-who-don’t-use-Bayes-nets, in which case we’ll move on to those counterexamples.
It seems even more likely that the builder will propose an algorithm that exploits cognition that humans can do which isn’t well captured by the Bayes net model, which is also fair game. (And indeed several of our approaches do this, e.g. imagining humans learning new things about the world by performing experiments here, or reasoning about the plausibility of the model’s joint distribution here.)
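To gesture at the second of those, here is a hedged sketch of what “reasoning about the plausibility of the model’s joint distribution” could look like: score a candidate joint by its KL divergence from the human’s Bayes-net prior, so a score of zero means the two joints agree exactly. The scoring rule and all numbers are our illustrative assumptions, not the construction from the report.

```python
import math

def kl_divergence(p: dict, q: dict) -> float:
    """KL(p || q) over a shared finite support; lower means p is more plausible under q."""
    return sum(p[x] * math.log(p[x] / q[x]) for x in p if p[x] > 0)

# Human's Bayes-net prior over (weather, grass) -- hypothetical numbers.
human_prior = {("rain", "wet"): 0.18, ("rain", "dry"): 0.02,
               ("sun",  "wet"): 0.08, ("sun",  "dry"): 0.72}

# A candidate joint distribution proposed by the model.
candidate   = {("rain", "wet"): 0.10, ("rain", "dry"): 0.10,
               ("sun",  "wet"): 0.10, ("sun",  "dry"): 0.70}

implausibility = kl_divergence(candidate, human_prior)
print(f"implausibility score: {implausibility:.3f}")
```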
That said, it looks to us like if any of these algorithms worked for Bayes nets, they would at least work for a very broad range of human models; the Bayes net assumption doesn’t seem to change the picture much qualitatively.
Echoing Mark in his comment, we’re definitely interested in ways that this assumption seems importantly unrealistic. If you just think it’s generally a mediocre model and that results are unlikely to generalize, you can also wait for us to discover that ourselves, by finding an algorithm that works for Bayes nets and then seeing it break down as we extend to more realistic examples.
Conditioned on ontology identification being impossible, I think it’s most likely to also be impossible for humans who reason about the world using a Bayes net.
Doesn’t this imply that a Bayes-net model isn’t the worst case?
I think Ajeya is just pointing out why it seems useful to search for algorithms that handle Bayes nets. If thinking about Bayes nets is very straightforward and it lets us rule out all the algorithms we can see, then we’re happy to do that as long as it works.