> Asserting nociception as fact when that’s the very thing under question is poor argumentative behavior.

> Does your model account for Models Don’t “Get Reward”? If so, how?
Backpropagation of the error gradient is more similar to nociception/torture than evolution by random mutation.
I’ll have to check how RLHF is actually implemented...
EDIT: error backpropagation is the workhorse behind both reward learning and the policy update.
The NN is punished for not doing as well as it could have.
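Roughly what I mean, as a minimal sketch. This assumes a REINFORCE-style policy-gradient step in PyTorch rather than the exact PPO machinery used in practice, and the network, reward, and baseline below are toy stand-ins, not anyone’s actual setup:

```python
import torch
import torch.nn as nn

# Toy stand-ins for the policy network and its input (hypothetical, for illustration)
policy = nn.Linear(16, 4)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

state = torch.randn(1, 16)                    # stand-in for a hidden state / prompt encoding
dist = torch.distributions.Categorical(logits=policy(state))
action = dist.sample()                        # the "response" the policy actually produced

reward = torch.tensor(0.2)                    # stand-in score from a reward model
baseline = torch.tensor(0.7)                  # roughly, what it "could have" scored
advantage = reward - baseline                 # negative here: this sample underperformed

# REINFORCE-style loss: when advantage < 0, the gradient lowers the
# log-probability of the action that underperformed.
loss = -(advantage * dist.log_prob(action)).mean()

optimizer.zero_grad()
loss.backward()                               # error gradient flows back through the net
optimizer.step()                              # policy update driven by that gradient
```

The point is just the mechanism: the “how much worse than it could have done” term becomes a gradient that backprop pushes through every weight.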
Also, I wasn’t being argumentative; I was trying to convey an idea. The repetition was just redundancy.