Yes—and a critical point here is that if you’re learning only by positive example, you won’t learn as well.
Focus on a reward function is pointing towards what to do, which as previous posts point out, have lots of failure modes, and not as much learning what not to do, which is a critical issue.
Yes—and a critical point here is that if you’re learning only by positive example, you won’t learn as well.
Focus on a reward function is pointing towards what to do, which as previous posts point out, have lots of failure modes, and not as much learning what not to do, which is a critical issue.