In between … well … in between, we’re navigating treacherous waters …
Right, I basically agree with this picture. I might revise it a little:
Early on, the AGI is too dumb to hack its epistemics (provided we don’t give it easy ways to do so!).
In the middle, there’s a danger zone.
When the AGI is pretty smart, it sees why one should be cautious about such things, and it also sees why any modifications should probably be in pursuit of truthfulness (because true beliefs are a convergent instrumental goal) rather than for other reasons.
When the AGI is really smart, it might see better ways of organizing itself (e.g., specific ways to hack its epistemics which really are for the best even though they insert false beliefs), but we’re OK with that, because it’s really freaking smart, it knows to be cautious, and it still thinks this is the right move.
So the goal is to allow what instrumental influence we can on the epistemic system, while making it hard and complicated to outright corrupt it.
One important point here is that the epistemic system probably knows what the instrumental system is up to. If so, this gives us a useful lever. For example, in theory, a logical inductor can’t be reliably fooled by an instrumental reasoner who uses it (so long as the hardware, including the input channels, doesn’t get corrupted), because it would know about the plans and compensate for them.
So if we could get a strong guarantee that the epistemic system knows what the instrumental system is up to (like “the instrumental system is transparent to the epistemic system”), this would be helpful.
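To make the “compensate for the plans” point concrete, here’s a minimal toy sketch in Python (all class names are invented for illustration; this is nothing like a real logical inductor, which gets a similar effect from its no-exploitation guarantees): if the epistemic system has read access to the instrumental system’s planned tampering, it can simply subtract the known bias back out of its observations.

```python
class InstrumentalSystem:
    """Toy planner that intends to inflate an observation (hypothetical)."""
    def __init__(self, planned_bias: float):
        self.planned_bias = planned_bias  # tampering it intends to apply

    def report(self, true_value: float) -> float:
        # The (possibly corrupted) observation the epistemic system receives.
        return true_value + self.planned_bias


class EpistemicSystem:
    """Toy estimator with transparent read access to the planner's plans."""
    def __init__(self, planner: InstrumentalSystem):
        self.planner = planner  # the transparency assumption, made literal

    def estimate(self, observation: float) -> float:
        # Because the planned tampering is known, it can be backed out.
        return observation - self.planner.planned_bias


true_value = 3.0
planner = InstrumentalSystem(planned_bias=2.0)
epistemics = EpistemicSystem(planner)

observed = planner.report(true_value)   # 5.0 -- the inflated reading
belief = epistemics.estimate(observed)  # 3.0 -- tampering compensated away
print(observed, belief)
```

Note that this toy only works under exactly the caveats above: the plan has to be genuinely transparent, and the compensation fails if the hardware or input channels themselves are corrupted rather than just the reported value.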