If people actually like and agree with the ideas, what’s the problem? Maybe we would just be finding things people agree with more efficiently than with human generated content?
I think we’re probably misunderstanding each other… like you might be pointing out that people deploying AI content generation will be evil, or maybe you are pointing out that the content that people will give positive training signal to will be evil content...
In the former case, I agree. That seems to be a common problem. In the latter case, I quote the wisdom of Eris:
“I am filled with fear and tormented with terrible visions of pain. Everywhere people are hurting one another, the planet is rampant with injustices, whole societies plunder groups of their own people, mothers imprison sons, children perish while brothers war. O, woe.”
WHAT IS THE MATTER WITH THAT, IF IT IS WHAT YOU WANT TO DO?
I’m saying look at a normie’s Twitter feed. The things they like/retweet are completely nonsensical, or just explicitly awful. Hell, lots of what rationalists like/retweet is nonsensical.
Hmm… Yeah. That does sound like what I meant by the latter case. I think if there were a single content generator producing a single content feed, trained through RLHF on feedback from any/all Twitter users, it would get quite nonsensical and probably pretty awful. On the other hand, I think you would get different results with multiple generators or multiple feeds targeting specific users or sets of users. Then I think you would see, in different user subsets:
less nonsensical content
more nonsensical content
less awful content
more awful content
And I think the kind of content a set of users trained their generator to produce would obviously say something about that set of users.
People are evil, dude