> Asserting nociception as fact when that’s the very thing under question is poor argumentative behavior.

> Does your model account for Models Don’t “Get Reward”? If so, how?
Backpropagation of the error gradient is more similar to nociception/torture than evolution by random mutation.
I’ll have to check how RLHF is actually implemented...
EDIT: error backpropagation is the workhorse behind both reward learning and the policy update.
The NN is punished for not doing as well as it could have.
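Roughly what I mean, as a minimal sketch. This assumes a REINFORCE-style policy-gradient step in PyTorch rather than the exact PPO machinery used in practice, and the network, reward, and baseline below are toy stand-ins, not anyone’s actual setup:

```python
import torch
import torch.nn as nn

# Toy stand-ins for the policy network and its input (hypothetical, for illustration)
policy = nn.Linear(16, 4)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

state = torch.randn(1, 16)                    # stand-in for a hidden state / prompt encoding
dist = torch.distributions.Categorical(logits=policy(state))
action = dist.sample()                        # the "response" the policy actually produced

reward = torch.tensor(0.2)                    # stand-in score from a reward model
baseline = torch.tensor(0.7)                  # roughly, what it "could have" scored
advantage = reward - baseline                 # negative here: this sample underperformed

# REINFORCE-style loss: when advantage < 0, the gradient lowers the
# log-probability of the action that underperformed.
loss = -(advantage * dist.log_prob(action)).mean()

optimizer.zero_grad()
loss.backward()                               # error gradient flows back through the net
optimizer.step()                              # policy update driven by that gradient
```

The point is just the mechanism: the “how much worse than it could have done” term becomes a gradient that backprop pushes through every weight.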
Also, I wasn’t being argumentative; I was trying to convey an idea. The repetition was just redundancy.