Enjoyed reading this article; thanks for taking the time to write it up so carefully!
Something I wanted to flag: I’m not totally convinced that people are well calibrated at telling AI writing apart from human writing, at least without helpful priors such as familiarity with the person’s normal writing style. I haven’t formally looked into this, but I’m curious whether you (or anyone else) have found any strong evidence that convinced you otherwise.
A few reasons to back up my skepticism:
There was a calibration test for deepfake videos at the MIT Museum, which showed you the percentage of other visitors who answered correctly after you made your guess. Although people were reasonably well calibrated on some videos, there was a non-trivial number on which they weren’t.
Writing seems fundamentally harder to classify, IMO, which is part of why plagiarism detection isn’t a solved problem.
I feel like it’s easy to develop confirmation bias about how well you identify AI-generated writing, since you get plenty of exposure to true positives (e.g. text you know came from interacting with an AI) and true negatives (e.g. pre-2022 writing), but much less exposure to false positives and false negatives.
You can obfuscate AI-generated text pretty easily (e.g. by removing em dashes, stock summarizing sentences, etc.). This is much easier if you actually understand the content you’re getting it to generate, such as when you have a draft to polish up or are having it flesh out something you had already been thinking about.
I might be taking your claim a little out of context, since you were discussing it more in relation to idea generation, but I still feel it’s worth raising. I agree that you might be fooling yourself into thinking you’re producing good content by using AI, but I disagree that people will definitely “sniff out” that you used AI to help.
For interpretability research, one thing being worked on right now is a set of tutorials that replicate results from recent papers in NNsight: https://nnsight.net/applied_tutorials/
What I find cool about this particular effort is that, because the implementations are done in NNsight, it’s easier both to adapt the experiments to new models and to run them remotely.
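For anyone who hasn’t tried it, the core idiom looks roughly like this (a minimal sketch; the model name, layer index, and prompt are just illustrative, not taken from the tutorials):

```python
from nnsight import LanguageModel

# Wrap a HuggingFace model by name; pointing at a different model is
# mostly a one-line change here (plus updating the module paths below).
model = LanguageModel("openai-community/gpt2", device_map="auto")

# Trace a forward pass and save whatever internals you care about.
# Passing remote=True to trace() instead runs it on NDIF's servers.
with model.trace("The Eiffel Tower is in the city of"):
    # Output of transformer block 6 is a tuple; [0] is the hidden states.
    hidden = model.transformer.h[6].output[0].save()
    logits = model.lm_head.output.save()

# Saved proxies hold real tensors once the trace exits
# (on older nnsight versions, access them via .value).
print(hidden.shape, logits.shape)
```

The experiment logic is written once against the traced module structure, so swapping in a new model or switching to remote execution doesn’t require rewriting the experiment itself.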
(Disclaimer: I work on the NDIF/NNsight project, though not on this initiative, so take my enthusiasm with a grain of salt.)