Thanks, that’s interesting. [I did mean to reply sooner, but got distracted]
A few quick points:
Yes, by “incoherent causal model” I only mean something like “causal model that has no clear mapping back to a distribution over real worlds” (e.g. where different parts of the model assume that [kite exists] has different probabilities). Agreed that the models LCDT would use are coherent in their own terms. My worry is, as you say, along garbage-in-garbage-out lines.
Having LCDT simulate HCH seems more plausible than its taking useful action in the world—but I’m still not clear how we’d avoid the LCDT agent creating agential components (or reasoning based on its prediction that it might create such agential components) [more on this here: point (1) there seems ok for prediction-of-HCH-doing-narrow-task (since all we need is some non-agential solution to exist); point (2) seems like a general problem unless the LCDT agent has further restrictions].
Agreed on HCH practical difficulties—I think Evan and Adam are a bit more optimistic on HCH than I am, but no-one’s saying it’s a non-problem. From the LCDT side, it seems we’re ok so long as it can simulate [something capable and aligned]; HCH seems like a promising candidate.
On HCH-simulation practical specifics, I think a lot depends on how you’re generating data / any model of H, and the particular way any [system that limits to HCH] would actually limit to HCH. E.g. in an IDA setup, the human(s) in any training step will know that their subquestions are answered by an approximate model.
I think we may be ok on error-compounding, so long as the learned model of humans is not overconfident in its own accuracy (as a model of humans). You’d hope to get compounding uncertainty rather than compounding errors.