Charlie Steiner comments on How much alignment data will we need in the long run?

Charlie Steiner 11 Aug 2022 4:05 UTC
LW: 2 AF: 1
Yeah I just meant the upper bound of “within 2 OOM.” :) If we could somehow beat the lower bound and get aligned AI with just a few minutes of human feedback, I’d be all for it.

I think aiming for under a few hundred hours of feedback is a good goal because we want to keep the alignment tax low, and that’s the kind of tax I see as being easily payable. An unstated assumption of mine is that we can use unlabeled data to do a lot of the work of alignment, making labeled data somewhat superfluous, though I still think the amount of feedback is important.

As for why I think it’s possible, I can only plead intuition about what I expect from on-the-horizon advances in priors over models of humans, and in the ability to bootstrap models from unlabeled data plus feedback.
- Jacob_Hilton 11 Aug 2022 4:29 UTC
  LW: 2 AF: 1
  I share your intuitions about ultimately not needing much alignment data (and tried to get that across in the post), but quantitatively:
  - Recent implementations of RLHF have used on the order of thousands of hours of human feedback, so 2 orders of magnitude more than that is on the order of hundreds of thousands of hours, which is much more than a few hundred hours of human feedback.
  - I think it’s pretty likely that we’ll be able to pay an alignment tax upwards of 1% of total training costs (essentially because people don’t want to die), in which case we could afford to spend significantly more than an additional 2 orders of magnitude on alignment data, if that did in fact turn out to be required.
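The two quantitative points above can be sketched as simple arithmetic. This is only an illustration: the function names are hypothetical, the baseline of 1,000 hours is just the low end of "thousands of hours", and the 300-hour target is an assumed stand-in for "a few hundred hours". Neither figure is a measurement from the thread.

```python
def scale_by_oom(hours, orders_of_magnitude):
    """Scale a feedback budget by a number of orders of magnitude (powers of 10)."""
    return hours * 10 ** orders_of_magnitude


def affordable_feedback_hours(training_cost, tax_fraction, hourly_rate):
    """Hours of human feedback payable from a given fraction of training cost.

    All three inputs are hypothetical parameters; the thread only states
    that a tax upwards of 1% of training cost seems payable.
    """
    return training_cost * tax_fraction / hourly_rate


# Assumption: 1,000 hours stands in for the low end of "thousands of hours".
baseline_hours = 1_000
upper_bound_hours = scale_by_oom(baseline_hours, 2)

# Assumption: 300 hours stands in for "a few hundred hours".
target_hours = 300
ratio = upper_bound_hours / target_hours

print(f"2 OOM above baseline: {upper_bound_hours:,} hours")
print(f"That is about {ratio:.0f}x the assumed few-hundred-hour target")
```

`affordable_feedback_hours` is left uncalled on purpose: the thread gives no training cost or hourly rate, so any inputs would have to be supplied by the reader.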