the gears to ascension comments on “Fragility of Value” vs. LLMs

the gears to ascension 13 Apr 2022 2:30 UTC
2 points
I agree with this criticism, and I never know when my response should count as an “answer”, so I’ll express my view as a comment: selecting the output and training data that will cause a large language model to converge towards behavioral friendliness is a big deal, and seems very promising for ensuring that large language models are only as misaligned as humans. Unfortunately, we already know well that that’s not enough; corporations are, to a significant degree, aggregate agents that are not sufficiently aligned. I’m in the process of posting a flood of YouTube channel recommendations on my shortform section; I will edit here in a few minutes with a few relevant selections that I think need to be linked to this.

(Slightly humorous: It is my view that reinforcement learning should not have been invented.)
- Not Relevant 13 Apr 2022 2:40 UTC
  2 points
  Parent
  You can still do model-based RL (MBRL) with the LLM as the reward, though?
  - the gears to ascension 13 Apr 2022 3:01 UTC
    1 point
    Parent
    Hmm. I guess that might be okay? As long as you don’t do really intense planning, the model shouldn’t be any more misaligned than a human, so it then boils down to training kindness by example and figuring out game dynamics: https://www.youtube.com/watch?v=ENpdhwYoF5g. More braindump of safety content I always want to recommend in every damn conversation is here on my shortform.