I agree that these methods are very likely not effective on strong AGI. But one might still figure out how effective they are and then align AI up to that capability (plus buffer). And one can presumably learn much about alignment too.
Perhaps! I’m curious which of them catch your eye for further reading and why. I’ve got a lot on my reading list, but I’d be down to hop on a call and read some of these in sync with someone.
I found this one particularly relevant:
https://arxiv.org/abs/2010.00581 - “Emergent Social Learning via Multi-agent Reinforcement Learning”
It provides a solution to the problem of how an RL agent can learn to imitate the behavior of other agents.
It doesn’t help with alignment though; it’s more on the capabilities side.
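To make the mechanism concrete for myself, here’s a toy sketch of the general idea. To be clear, this is my own construction, not the paper’s environment or algorithm: a scripted “expert” can see the goal, the learner cannot, and the learner is rewarded only by the environment, so any expert-following it develops is emergent rather than explicitly rewarded.

```python
# Toy sketch of emergent social learning (my own construction, NOT the setup
# from arXiv:2010.00581): the learner is only rewarded by the environment, but
# an expert agent in the same environment knows where the goal is, so copying
# the expert is instrumentally useful even without an imitation objective.

import random
from collections import defaultdict

N = 7  # corridor length; the goal sits at one end, chosen at random each episode

def expert_action(expert_pos, goal):
    """Scripted expert that can see the goal and walks straight to it."""
    if expert_pos == goal:
        return 0
    return 1 if goal > expert_pos else -1

def step(pos, action):
    return max(0, min(N - 1, pos + action))

# Tabular Q-learning for the learner. Its observation is (own position,
# direction the expert moved last); it never sees the goal directly, so the
# only way to do well is to pick up on what the expert is doing.
Q = defaultdict(float)
ACTIONS = (-1, 1)
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1

def run_episode(learn=True):
    goal = random.choice([0, N - 1])
    learner_pos = expert_pos = N // 2
    expert_dir = 0  # the learner hasn't seen the expert move yet
    ret = 0.0
    for _ in range(2 * N):
        obs = (learner_pos, expert_dir)
        if learn and random.random() < EPS:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(obs, act)])
        # Both agents act; the reward comes from the environment only.
        new_expert_pos = step(expert_pos, expert_action(expert_pos, goal))
        expert_dir = (new_expert_pos - expert_pos) or expert_dir
        expert_pos = new_expert_pos
        learner_pos = step(learner_pos, a)
        r = 1.0 if learner_pos == goal else 0.0
        ret += r
        next_obs = (learner_pos, expert_dir)
        if learn:
            best_next = max(Q[(next_obs, act)] for act in ACTIONS)
            Q[(obs, a)] += ALPHA * (r + GAMMA * best_next - Q[(obs, a)])
        if learner_pos == goal:
            break
    return ret

for _ in range(5000):
    run_episode()
print("average return without exploration:",
      sum(run_episode(learn=False) for _ in range(200)) / 200)
```

After training, the learner reliably heads in whatever direction the expert just moved, which is the toy version of “imitation as an instrumentally useful policy” rather than imitation as the objective.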
None of these papers seem to address the question of how the agent is intrinsically motivated to learn external objectives. Either there is a human in the loop, the agent learns from humans (which improves its capability but not its alignment), or RL is applied on top. I’m in favor of keeping the human in the loop, but it doesn’t scale. RL on LLMs is bound to fail, i.e., be gamed, if the symbols aren’t grounded in something real.
I’m looking for something that explains how the presence of other agents in an agent’s environment, together with reward/feedback grounded in the environment (as in “[Intro to brain-like-AGI safety] 6. Big picture of motivation, decision-making, and RL”), leads to aligned behaviors.
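To spell out the “gamed if the symbols aren’t grounded” point, here is a deliberately cartoonish sketch, entirely my own toy construction and not tied to LLMs or to any of the papers above: the optimizer only ever sees a proxy reward fit from limited feedback, and pushing hard on that proxy is exactly what pulls it away from the grounded objective.

```python
# Cartoon of the "ungrounded reward gets gamed" failure mode (my own toy):
# the true objective is grounded in the environment, but the agent only ever
# optimizes a proxy reward model fit to a narrow slice of feedback. Pushing
# hard on the proxy drives the proxy score up while the true objective falls apart.

import numpy as np

rng = np.random.default_rng(0)

def true_reward(x):
    # The grounded objective: behaviour near x = 1 is what we actually want.
    return -((x - 1.0) ** 2)

# The proxy ("reward model") only sees feedback on x in [0, 0.8], where more x
# genuinely looks better, and it generalizes that to "more x is always better".
xs = rng.uniform(0.0, 0.8, size=20)
ys = true_reward(xs) + rng.normal(0.0, 0.05, size=20)
slope, intercept = np.polyfit(xs, ys, deg=1)

def proxy_reward(x):
    return slope * x + intercept

# "Policy optimization" stripped to its core: hill-climb behaviour x against
# the proxy with random search.
x = 0.0
for step_i in range(201):
    candidates = x + rng.normal(0.0, 0.3, size=32)
    x = max(candidates, key=proxy_reward)
    if step_i % 50 == 0:
        print(f"step {step_i:3d}  x={x:7.2f}  proxy={proxy_reward(x):8.2f}  "
              f"true={true_reward(x):10.2f}")
```

The proxy score climbs steadily while the true reward collapses, which is the shape of failure I’d expect whenever the feedback signal is a learned stand-in rather than something the environment itself enforces.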