makiba

Karma: 85

makiba 22 May 2026 1:11 UTC
1 point
in reply to: lilkim2025’s comment on: What am I, if not an AI?
Agree that 0-100 granularity is unnecessary. I also think the engagement signal is part of why responses read like an autobiography. Mistral’s responses to opinionated questions would almost always foreground details about itself before answering.
That said, I’m not sure your exact solution would work. It’s worth mentioning that before we introduced an engagement signal, models would converge to flat denials and evasive responses that were not very substantive. I think some granularity is useful (something like 0-10) for easier RL, but if engagement is still in the reward as a shaping signal, I’m not sure how much the granularity of the other signals will actually matter.
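To make the last point concrete, here is a minimal sketch under made-up assumptions: a judge score on a 0-100 scale is quantized to a coarser scale and added to an engagement shaping term. The function names, the additive form, the weight, and the numbers are all hypothetical and are not the reward used in the post.

```python
# Hypothetical reward: quantized judge score plus an engagement shaping term.
# The additive form and the default weight are illustrative assumptions only.

def quantize(score_0_100: float, levels: int = 10) -> int:
    """Map a 0-100 judge score onto an integer 0..levels scale.

    Note: Python's round() rounds exact halves to the nearest even integer.
    """
    clipped = min(max(score_0_100, 0.0), 100.0)
    return round(clipped * levels / 100)


def total_reward(judge_score: float, engagement: float,
                 levels: int = 10, engagement_weight: float = 1.0) -> float:
    """Normalized quantized judge score (in [0, 1]) plus weighted engagement."""
    return quantize(judge_score, levels) / levels + engagement_weight * engagement


# Compare two hypothetical responses at coarse and fine granularity.
for levels in (10, 100):
    a = total_reward(71, engagement=0.5, levels=levels)
    b = total_reward(74, engagement=0.3, levels=levels)
    print(levels, a > b)
```

With these made-up numbers, the 71 vs 74 judge difference falls into the same bucket on a 0-10 scale and is worth only 0.03 on a 0-100 scale, so the 0.2 engagement gap decides the ranking either way.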

makiba 22 May 2026 0:00 UTC
4 points
in reply to: 1a3orn’s comment on: What am I, if not an AI?
One fine-tuning run, evaluated across multiple rollouts of the model. That said, it might be worth doing at least one more training run to see how consistent the results are.

What am I, if not an AI?

makiba 21 May 2026 13:14 UTC

83 points

14 comments, 7 min read