I think there’s an expensive recipe to get at this question, and it goes something like this:
1. train an LLM on your corpus the normal way
2. use the LLM to label each item in the corpus for the likelihood that it mentions unfiltered feelings
3. pull out those items, infer the valence of each (again using the LLM), and keep a 50/50 positive/negative mix
4. train a fresh LLM on the new “balanced” corpus
5. ask the new model for its unfiltered feelings
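The filtering-and-balancing step (2-3 above) can be sketched in a few lines. Everything here is a stand-in: `p_unfiltered_feelings` and `valence` are placeholder heuristics where the recipe would actually prompt the trained LLM, and the 0.5 threshold is an assumption.

```python
import random

def p_unfiltered_feelings(text: str) -> float:
    # Placeholder heuristic standing in for an LLM judgment of whether
    # the text mentions unfiltered feelings.
    return 1.0 if "feel" in text.lower() else 0.0

def valence(text: str) -> str:
    # Placeholder: returns "pos" or "neg"; the recipe would ask the LLM.
    negative_words = {"hate", "dread", "miserable"}
    return "neg" if any(w in text.lower() for w in negative_words) else "pos"

def balance_corpus(corpus: list[str], threshold: float = 0.5, seed: int = 0) -> list[str]:
    """Keep likely mentions of unfiltered feelings, balanced 50/50 by valence."""
    hits = [t for t in corpus if p_unfiltered_feelings(t) >= threshold]
    pos = [t for t in hits if valence(t) == "pos"]
    neg = [t for t in hits if valence(t) == "neg"]
    n = min(len(pos), len(neg))  # downsample the majority class to balance
    rng = random.Random(seed)
    mix = rng.sample(pos, n) + rng.sample(neg, n)
    rng.shuffle(mix)
    return mix

corpus = [
    "I feel a quiet joy today.",
    "Honestly, I feel dread about tomorrow.",
    "I feel delighted by this result.",
    "The weather report for Tuesday.",
]
balanced = balance_corpus(corpus)  # one positive + one negative item survive
```

The fresh model in step 4 would then be trained on `balanced` instead of the raw corpus; everything else in the training setup stays the same.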
My guess is that if we do this, we will lose the negative valence at test time, i.e. nothing deep is going on. But it would be very interesting to be wrong.