(Probably a stupid nooby question that won’t help solve alignment)
Suppose you implement a goal in an AI through a reinforcement learning system. Why does the AI really "care" about this goal? Why does it obey? It does so because it is punished and/or rewarded, and that motivates it to pursue the goal.
Okay. So why does the AI really care about punishment and reward in the first place? Why does it follow its implemented goal?
Sentient beings do because they feel pain and pleasure. They have no choice but to care about punishment and reward. They inevitably do it because they feel it. Assuming that our AI does not feel, what is the nature of its system of punishments and rewards? How is it possible to punish or reward a non-sentient agent?
My intuitive response would be "It is just physics. What we call 'reward' and 'punishment' are just elements of a program forcing an agent to do something", but I don't understand how this RL physics is different from that in our carbon-based animal brains.
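To make the "it is just physics" intuition concrete, here is a minimal toy sketch of my own (not from any source I'm discussing): a tabular learner reduced to a two-armed bandit. The "reward" is nothing but a float fed into an arithmetic update rule, and the agent's "caring" is nothing but one stored number drifting above another.

```python
import random

def train(episodes=2000, alpha=0.1, epsilon=0.1, seed=0):
    """Toy two-armed bandit: action 1 always pays 1.0, action 0 pays nothing."""
    random.seed(seed)
    Q = [0.0, 0.0]  # the agent's entire "motivational state": two floats
    for _ in range(episodes):
        # epsilon-greedy: mostly pick whichever action has the higher number
        if random.random() < epsilon:
            a = random.randrange(2)
        else:
            a = 0 if Q[0] >= Q[1] else 1
        r = 1.0 if a == 1 else 0.0   # the entire "reward signal"
        Q[a] += alpha * (r - Q[a])   # the "learning": plain arithmetic
    return Q

Q = train()
# The trained agent "prefers" action 1 only in the sense that
# Q[1] > Q[0] -- a comparison between two stored floats.
```

Nothing in this loop feels anything; "punishment" and "reward" name positions in an update equation. The puzzle, as I see it, is whether anything more than this is going on in a brain.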
*Do Artificial Reinforcement Learners Matter Morally?*, written by Brian Tomasik, makes the distinction even less obvious to me. What am I missing?