If you asked a busy software engineer for help, got through 10 on-topic technical questions, and then started asking off-topic ones, that would piss them off too, because it wastes their time. I wonder whether Claude picked this up from the training data.
I would guess it has more to do with reinforcement learning. It’s trained to seek out specific rewards (producing working code, completing its current task list) and these questions move away from that.
I don’t think that’s necessarily true. I go off topic with coworkers when it feels appropriate, and if anything it can be a nice break from grinding on whatever problem.