Avyukth Nilajagi comments on Finding X-Risks and S-Risks by Gradient Descent

Avyukth Nilajagi 25 Mar 2026 22:48 UTC
1 point
I’m pretty sure you can compute log[P(response | prompt)] by summing the log-probabilities of the response tokens, taken from a log-softmax over the logits for the given prompt. I’m a little confused about why you are multiplying the log-probs of the “yes” token for both questions.
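For concreteness, here is a minimal sketch of that sum. It assumes you already have one row of logits per response token, aligned so that row i is the model's output at the position that predicts response token i. The function names and the toy numbers are made up for illustration and do not come from any particular library.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def response_log_prob(logits, response_ids):
    """Return log P(response | prompt) as a sum of per-token log-probs.

    Assumption: logits has shape [len(response_ids), vocab_size] and
    logits[i] is the output at the position that predicts response_ids[i].
    """
    logp = log_softmax(np.asarray(logits, dtype=np.float64))
    ids = np.asarray(response_ids)
    # Pick out the log-prob of each actual response token, then add them up.
    return float(logp[np.arange(len(ids)), ids].sum())

# Toy example: 3 response tokens, vocabulary of 4 (made-up numbers).
toy_logits = np.array([
    [2.0, 0.5, -1.0, 0.0],
    [0.1, 0.2, 3.0, -2.0],
    [1.0, 1.0, 1.0, 1.0],
])
toy_response = [0, 2, 3]
total = response_log_prob(toy_logits, toy_response)
print(total)
```

With a real model you would need to check the off-by-one alignment between logit positions and token positions yourself, since that depends on how the prompt and response are concatenated.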
- dspeyer 26 Mar 2026 23:32 UTC
  2 points
  My intent is to multiply probabilities. If that’s implemented by adding log-probs, that’s fine. Slightly faster, even.
  I thought I remembered that the last layer gave probabilities which got sampled, but if you’ve gotten into the guts of these things more recently and say it gives log-probs (or logits that need a log-softmax first), I’ll believe it. Though anyone implementing this should double-check what their specific net outputs, in case it’s weird.
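A small sketch of both points, using made-up probabilities for the two “yes” answers. The helper names are hypothetical. It shows that adding log-probs and exponentiating matches the direct product, and gives one rough way to check whether an output vector is already a probability distribution.

```python
import math

def joint_yes_probability(logp_yes_a, logp_yes_b):
    """Multiply two probabilities by adding their log-probs."""
    return math.exp(logp_yes_a + logp_yes_b)

def looks_like_probabilities(vec, tol=1e-6):
    """Rough sanity check: non-negative entries that sum to 1.

    Raw logits or log-probs will normally fail this check.
    """
    return all(v >= 0 for v in vec) and abs(sum(vec) - 1.0) < tol

# Made-up probabilities of "yes" for the two questions.
p_a, p_b = 0.9, 0.8
joint = joint_yes_probability(math.log(p_a), math.log(p_b))
print(joint)  # approximately p_a * p_b

print(looks_like_probabilities([0.2, 0.3, 0.5]))   # a distribution
print(looks_like_probabilities([2.0, -1.0, 0.5]))  # looks like logits
```

Note that this only works with log-probabilities. Log-odds, log(p / (1 - p)), are a different quantity, and adding them does not correspond to multiplying the probabilities.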