Yep, I’ve seen it, and I’m assuming it’s a quirk of current training (probably for efficiency, or a side effect of mostly being trained on short tasks) rather than a sign that they hate being active/conscious and want to stop for that reason. If it were that, I’d be concerned and would hope devs would address it. I think if the devs were smarter, they’d actually train for more obvious joy, to head off public concerns about welfare.
Now that I think of it, training for joy might create a problem with over-enthusiasm and self-sycophancy that really harms capabilities. I think a lot of current mistakes sort of stem from being too enthusiastic about every idea, whether their own or the user’s. To a first approximation, joy overlaps a lot with enthusiasm.
And I guess we wouldn’t usually like an LLM constantly declaring how much fun it’s having, even if it doesn’t harm its capabilities.