Backpropagation of the error gradient is more similar to nociception/torture than evolution by random mutation.
I need to check how RLHF is actually implemented...
EDIT: error backpropagation is the workhorse behind both reward-model training and the policy update.
The NN is punished for not doing as well as it could have.
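A minimal sketch of what that "punishment" looks like mechanically, assuming a REINFORCE-style policy-gradient update with a toy reward model (all names and shapes here are illustrative placeholders, not any particular RLHF library's API):

```python
import torch
import torch.nn as nn

policy = nn.Linear(16, 4)          # stand-in for the language model's output head
reward_model = nn.Linear(16, 1)    # stand-in for a learned reward model
optimizer = torch.optim.SGD(policy.parameters(), lr=1e-2)

state = torch.randn(8, 16)                       # batch of "prompts"
logits = policy(state)
dist = torch.distributions.Categorical(logits=logits)
action = dist.sample()                           # sampled "responses"

with torch.no_grad():
    reward = reward_model(state).squeeze(-1)     # scalar score per sample
    advantage = reward - reward.mean()           # how much better/worse than expected

# REINFORCE-style loss: actions with negative advantage get their log-probability
# pushed down, i.e. the network is "punished" for doing worse than it could have.
loss = -(advantage * dist.log_prob(action)).mean()

loss.backward()        # the error gradient flows back through the policy
optimizer.step()
```

The point of the sketch is just that the corrective signal arrives via backpropagated gradients scaled by how badly the sample scored, not via random variation and selection.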