Leaving a quick note as a moderator: I think this post seems fine, but doesn’t do quite enough to situate itself in the LessWrong literature space. (It looks like the intro is sort of optimized for what I’m guessing is “the academic literature space”, but it's ambiguous in ways I think LessWrong posts should avoid.) Specifically here:
This approach improves the quality of alignment. It is more efficient and stable in training, and it is also easier to implement. We have tested RAFT on both large language models and diffusion models, verifying its effectiveness in question answering and text-to-image generation tasks.
Alignment can mean a lot of things. “What counts as alignment, or alignment research” is a topic of debate that it doesn’t make sense for moderators to litigate, but I think you should at least be clear about which definition of alignment you’re using and why you think it matters. I also think you should be clear on whether/how your alignment work helps with existential risk.