Train an RL agent with access to its previous step reward as part of its observation.
This is making me notice a terminological ambiguity. Sometimes “RL agent” refers to a model/policy trained by a reinforcement learning algorithm (such as REINFORCE), like you’re doing here. Other times it refers to an agent that maximizes expected reward (given as an input), such as AIXI, as in Daniel Dewey’s Learning What to Value. An “RL agent” in the first sense is not necessarily an “RL agent” in the second sense.
To disambiguate, it seems like a good idea to call the former kind of agent something like “RL-trained agent” and the latter kind “reward-maximizing agent”, or “reward-maximizer” for short. Then we can say things like, “If an RL-trained agent is not given direct access to its step rewards during training, it seems less likely to become a reward-maximizer.” Any thoughts on this suggestion? (I’ll probably make a post about this later, but thought I’d run it by you and any others who see this comment for a sanity check first.)
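To make “direct access to its step rewards” concrete, here is a minimal sketch of what putting the previous step’s reward into the observation could look like. Everything in it is hypothetical and only for illustration: `ToyEnv`, `PrevRewardInObs`, the list-of-floats observation format, and the choice of 0.0 as the placeholder reward on reset are my assumptions, not the setup from the post.

```python
from typing import List, Tuple


class ToyEnv:
    """Hypothetical environment: the observation is just the step count."""

    def __init__(self, horizon: int = 3) -> None:
        self.horizon = horizon
        self.t = 0

    def reset(self) -> List[float]:
        self.t = 0
        return [float(self.t)]

    def step(self, action: int) -> Tuple[List[float], float, bool]:
        self.t += 1
        # Assumed reward rule for illustration: action 1 pays 1.0, anything else 0.0.
        reward = 1.0 if action == 1 else 0.0
        done = self.t >= self.horizon
        return [float(self.t)], reward, done


class PrevRewardInObs:
    """Wraps an env so each observation also carries the previous step's reward."""

    def __init__(self, env: ToyEnv) -> None:
        self.env = env

    def reset(self) -> List[float]:
        # No reward has been received yet, so append 0.0 as a placeholder (an assumption).
        return self.env.reset() + [0.0]

    def step(self, action: int) -> Tuple[List[float], float, bool]:
        obs, reward, done = self.env.step(action)
        # The reward just received is appended to the observation the policy sees next.
        return obs + [reward], reward, done
```

A policy trained on `PrevRewardInObs(ToyEnv())` sees the reward signal as an input, whereas one trained on the bare `ToyEnv()` is only shaped by it through the training algorithm. Either way the result is an RL-trained agent; whether it ends up a reward-maximizer is a separate question.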
When I use the term “RL agent,” I always mean an agent trained via RL. The other usage just seems confused to me in that it seems to be assuming that if you use RL you’ll get an agent which is “trying” to maximize its reward, which is not necessarily the case. “Reward-maximizer” seems like a much better term to describe that situation.
When I use the term “RL agent,” I always mean an agent trained via RL.
I think the problem with this usage is that “RL agent” originally meant something like “an agent designed to solve an RL problem”, where “RL problem” is something like “a class of problems with the central example being MDPs”. I think it’s just not a well-defined term at this point, and if you Google it, you get plenty of results that say things like “the goal of our RL agent is to maximize the expected cumulative reward”, or “AIXI is a reinforcement learning agent”. I guess this is fine for AI capabilities work but really confusing for AI safety work.
So, consider switching to “RL-trained agent” for greater clarity (unless someone has a better suggestion)? ETA: Maybe “reinforcement trained agent”?