trevor comments on Compendium of problems with RLHF

trevor 30 Jan 2023 3:46 UTC
16 points
−3
I do AI policy, and it is extremely rare for me to see something that stands out as being as valuable as this. I don’t think you’re aware of how incredibly valuable your gifs are for describing problems with RLHF. The human brain is structured to glaze over, ignore, or forget text, while considering visual evidence with the full force of its reasoning faculties. Visual information is vastly more intuitive and digestible than written language; it is like “getting your foot in the door” to someone’s limited daily allowance of attention and higher reasoning faculties. Although it requires some buy-in to figure out ways to correctly and honestly describe the problems with gifs, the communication value (to other ML researchers) is massive, and the ratio of honest, successful communication to effort is extremely high. Once the first gifs are made, using gifs to visually describe the problems with RLHF scales really well with more time, effort, and cognition spent on making them accurate.
Most people don’t notice or seriously consider the problems with RLHF because they don’t feel like digesting text criticizing RLHF, and more gifs will give experts a fair chance to have accurate thoughts about those problems. I’m not an expert and I don’t know how difficult it is to actually make gifs that can describe the complex problems, but I do know that if even the bare minimum is achieved in making such gifs, the paper has a much higher chance of causing a paradigm shift among ML researchers than the average ML researcher might think. Even if describing most of the RLHF problems with gifs is impossible or systematically fails, the paper will still have its effect amplified by allowing ML researchers to begin communicating problems to the tech executives and policymakers who manage their projects and funding.
I hope that this isn’t your final compendium on RLHF (I’m bookmarking it either way) and that you spend at least a little bit of time evaluating whether it’s possible to describe problems with RLHF to ML researchers using gifs. This is a gold mine, and it never occurred to me that it could be done until I saw that gif. If you can’t figure out a way to do it yourself, I recommend asking around for information about funding, such as the EA future fund, and asking about funds and grantmaking at rationalist events. People will probably be able to connect you to large sources of funding, teams of artists and animators to handle the grunt work, or even other ML researchers who know a lot about how to accurately depict RLHF problems visually (think Rob Bensinger or Eliezer Yudkowsky).
- Charbel-Raphaël 30 Jan 2023 19:52 UTC
  9 points
  8
  Parent
  Thank you! Yes, for most of these issues, it’s possible to create GIFs or at least pictograms. I can see the value this could bring to decision-makers.
  However, even though I am quite honored, having written this post does not make me the best person to do this kind of work. So, if anyone is inspired to work on this, feel free to send me a private message.