Wei Dai comments on Thoughts on reward engineering

Wei Dai 30 Aug 2019 8:00 UTC

I think I would be fairly surprised if future ML techniques weren’t smooth in this way, so I think it’s a pretty reasonable assumption.

This is kind of tangential at this point, but I’m not so sure about this. Humans can sometimes optimize things without being slow and continuous, so there must be algorithms that can do this, and such algorithms could be invented directly or themselves be produced via (dumber) ML. As another intuition pump, suppose the algorithm is just gradient descent with some added pattern recognizers that can say “hey, I see where this is going, let’s jump directly there.”
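To make the intuition pump concrete, here is a toy sketch and nothing more. It assumes a one-dimensional quadratic loss, and it swaps in Aitken's delta-squared extrapolation as a stand-in for the "pattern recognizer"; the function names and parameters (`grad`, `plain_gd`, `gd_with_jump`, `target`, `curvature`) are hypothetical and chosen only for illustration. On this loss the gradient descent iterates approach the minimum geometrically, so three consecutive iterates are enough to extrapolate where the sequence is heading.

```python
def grad(x, target=3.0, curvature=1.0):
    """Gradient of the toy loss 0.5 * curvature * (x - target)**2 (an assumed example loss)."""
    return curvature * (x - target)


def plain_gd(x, lr=0.01, tol=1e-6, max_steps=100_000):
    """Ordinary gradient descent: many small, smooth steps."""
    steps = 0
    while abs(grad(x)) > tol and steps < max_steps:
        x -= lr * grad(x)
        steps += 1
    return x, steps


def gd_with_jump(x, lr=0.01, tol=1e-6, max_steps=100_000):
    """Gradient descent plus a hypothetical 'pattern recognizer'.

    After every gradient step it looks at the last three iterates and, if they
    are usable, jumps to the limit predicted by Aitken's delta-squared
    extrapolation instead of continuing to crawl.
    """
    history = [x]
    steps = 0
    while abs(grad(x)) > tol and steps < max_steps:
        x -= lr * grad(x)
        steps += 1
        history.append(x)
        if len(history) >= 3:
            x0, x1, x2 = history[-3:]
            denom = (x2 - x1) - (x1 - x0)
            if abs(denom) > 1e-12:  # skip the jump if the extrapolation is ill-conditioned
                x = x2 - (x2 - x1) ** 2 / denom  # "I see where this is going"
                history = [x]  # restart the pattern from the new point
    return x, steps


if __name__ == "__main__":
    print("plain GD:    ", plain_gd(0.0))
    print("GD with jump:", gd_with_jump(0.0))
```

The jump is exact here only because the toy loss makes the iterates a perfectly geometric sequence. On a realistic loss landscape the same extrapolation would be at best a heuristic, so this shows only that a discontinuous optimizer built on top of gradient descent is easy to write down, not that such optimizers work well in general.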

A low-bandwidth overseer seems unlikely to be competitive to me.

Has this been written up anywhere, and is it something that Paul agrees with? (I think last time he talked about HBO vs LBO, he was still 50/50 on them.)

I mostly agree with this and it is a disagreement I have with Paul in that I am more skeptical of relaxing the supervised setting.

Ah ok, so your earlier comments were addressing a different and easier problem than the one I have in mind, and that’s (mostly) why you sounded more optimistic than me.

Do you have a good sense of why Paul disagrees with you, and if so can you explain?

(I haven’t digested your mechanistic corrigibility post yet. May have more to say after I do that.)