CMIIW, but you are measuring information content according to the LLM, and that’s not enough on its own. It has to be learnable information content, to avoid the noisy TV problem. E.g. a random sequence of tokens is unpredictable and has high perplexity, but there is nothing in it to learn. If the text is both surprising and learnable, then it has potential.
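To make the distinction concrete, here is a rough toy sketch. It uses a character-level bigram counter as a stand-in for the LLM (an assumption for illustration, not how any real model or my repo works), and the names `bits_per_char` and `surprise_and_gain` are made up. It scores a text by how much its surprisal drops on the second half after training on the first half, rather than by raw surprisal.

```python
import math
import random
from collections import Counter, defaultdict

ALPHABET = "abcdefghijklmnopqrstuvwxyz "  # toy vocabulary (assumption)


def bits_per_char(train, test, alphabet=ALPHABET):
    """Mean surprisal (bits/char) of `test` under an add-one-smoothed
    bigram model fit on `train`. An empty `train` gives a uniform model."""
    counts = defaultdict(Counter)
    for prev, cur in zip(train, train[1:]):
        counts[prev][cur] += 1
    total = 0.0
    for prev, cur in zip(test, test[1:]):
        ctx = counts[prev]
        p = (ctx[cur] + 1) / (sum(ctx.values()) + len(alphabet))
        total -= math.log2(p)
    return total / (len(test) - 1)


def surprise_and_gain(text):
    """Return (surprisal before learning, surprisal drop after learning).
    The drop is a crude proxy for 'learnable information'."""
    half = len(text) // 2
    before = bits_per_char("", text[half:])
    after = bits_per_char(text[:half], text[half:])
    return before, before - after


# Noisy TV: uniformly random characters, nothing to learn.
rng = random.Random(0)
noise = "".join(rng.choice(ALPHABET) for _ in range(4400))
# Structured text: equally surprising to the untrained model, but learnable.
prose = "the quick brown fox jumps over the lazy dog " * 100

noise_before, noise_gain = surprise_and_gain(noise)
prose_before, prose_gain = surprise_and_gain(prose)
print(f"noise: before={noise_before:.2f} bits, gain={noise_gain:.2f} bits")
print(f"prose: before={prose_before:.2f} bits, gain={prose_gain:.2f} bits")
```

Both strings look equally surprising to the untrained model, so raw perplexity can't tell them apart. Only the structured one gets more predictable after seeing half of it, and that drop is the part worth caring about. With a real LLM you would presumably need something like loss before vs. after a few fine-tuning steps, which is much more expensive.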
I had a go at a few different approaches here: https://github.com/wassname/detect_bs_text
I wonder where the best places to write are. I’d say Reddit and GitHub are good bets, but you would have to get through their filtering for karma, stars, language, subreddit, etc.