Given my above reply to james.lucassen about explicitly using a regressor LLM as a reward model, does that give better insight?
Or are you skeptical of the AI’s mapping from “world state” into language? I’d argue that we might get away with having the AI natively define its world state as language, a la SayCan.
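To make the proposal concrete, here is a toy sketch of a "regressor LLM as reward model" where the agent's world state is already natural language, so no separate mapping step is needed. Everything here is hypothetical: `llm_regress` is a trivial keyword stand-in for a real LLM fine-tuned to output a scalar utility from a textual state description.

```python
def llm_regress(state_description: str) -> float:
    """Hypothetical stand-in for an LLM regression head that reads a
    natural-language world state and outputs a scalar utility.
    Here it is just a keyword heuristic, for illustration only."""
    score = 0.0
    if "flourishing" in state_description:
        score += 1.0
    if "deception" in state_description:
        score -= 1.0
    return score


def reward(world_state: str) -> float:
    # SayCan-style assumption: the agent natively represents its world
    # state as language, so "world state" -> text is the identity map.
    return llm_regress(world_state)


plan_a = "robot fetches water, supports human flourishing"
plan_b = "robot achieves the goal via deception"
assert reward(plan_a) > reward(plan_b)
```

The optimization-pressure worry from the rest of this thread applies directly: a planner searching over descriptions would exploit whatever adversarial inputs make the regressor misfire, which is exactly the OOD/adversarial-example concern.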
On further reflection, I have no idea what I mean. I'm as confused as you are about why this is hard if we have an accurate utility function sitting right there. Maybe the idea is that it would fail under optimization pressure?
Yeah so I think that’s what the adversarial example/OOD people worry about.
That just seems… like it buys you a lot? And like we should focus more on those problems specifically.
The problem is how you incorporate that understanding into an optimization process, not necessarily how you get an AI to understand those values.