This might just be my first LessWrong contribution, unless it is again (incorrectly) rejected for appearing LLM-generated...
I had a chilling moment just now. For the first time in my life I was mistaken for an AI! My first ever comment on LessWrong was REJECTED for the reason of "No LLM generated, heavily assisted/co-written, or otherwise reliant work". This felt a little unpleasant and icky, but mostly just surprising. I don't consider myself a bad writer and would have thought that my writing was different enough from LLM content that this wouldn't happen. For the record, I only use LLMs for research and have never used them to write forum posts or comments, not even to edit them.
I post and comment a bunch (probably too much) on the EA Forum, and while passing by here I couldn't resist a wee comment on an interesting healthcare-related melatonin post. After reflecting on my comment, I suppose it could have been written by an AI, but I don't think it looks that AI-sloppy. I wonder what others think?
Here's my comment below. After each paragraph I've added a reflection in bracketed bold on how "LLM-y" that paragraph now seems to me.
MY COMMENT
This seems to be a nice observational study which analyses already available data, with an interesting and potentially important finding. (My first sentence does feel like it was written by an LLM. I hate this, but it does. It's bland and boring. I'm a bit horrified that my writing could look so generic and LLM-like...)
They didn't do "controlling" in the technical sense of the word; they matched cases and controls on 40 baseline variables in the cohort, covering "demographics, 15 comorbidities, concomitant cardiometabolic drugs, laboratories, vitals, and health-care utilization". (This feels too emphatic and specific to have come from an LLM.)
The big caveat here is that these impressive observational findings often disappear, or become much smaller, when a randomised controlled trial is done. Observational studies can never prove causation. Usually that is because there is some silent feature of the kind of people who use melatonin to sleep that couldn't be matched for, or was missed in the matching. A speculative example here: some silent, unknown illness could have caused people to have poor sleep, which led to melatonin use. Also, what if poor sleep itself, rather than the melatonin, led to poor cardiovascular health? (I feel like the line "Observational studies can never prove causation" is too emphatic to be an LLM. My word structure in the third and fourth sentences isn't great either; an LLM would be cleaner. I also wouldn't expect an LLM to end with a weak-ish question like I did; I think they prefer more standard paragraph structure.)
This might be enough initial data to trigger a randomised placebo-controlled trial of melatonin. It might be hard to sign up enough people to detect an effect on mortality, although a smaller study could still at least pick up whether melatonin causes cardiovascular disease. (I wouldn't expect an LLM to attempt a conclusion this bold and specific; I'd expect it to play it safer.)
I agree with their conclusion, which I think is a great takeaway. (Possibly too clumsy to be an LLM.)
"These findings challenge the perception of melatonin as a benign chronic therapy and underscore the need for randomized trials to clarify its cardiovascular safety profile." (LLMs can finish with a punchy quote, so I suppose this could have been an LLM.)
Overall I think my first paragraph looked more LLM-like than I expected, but I'm still surprised the comment was rejected. I wonder whether it was put through an LLM checker, or whether a forum moderator just judged it to be LLM-generated themselves. I bear them no ill will; I'm just intrigued!
Thanks for this response; I'm enjoying this debate.
You say “Despite this, he is more extreme in his confidence that things will be ok than the average expert”
From the perspective of an outsider like me, this statement doesn't seem right. In the only big survey I could find with thousands of AI experts, in 2024, the median p(doom) (which I take as representing the average expert) was 5%, pretty close to BB's. In addition, expert forecasters (who are usually better than domain experts at predicting the future) put the risk below 1%. Sure, many higher-profile experts have more extreme positions, but these aren't the average, and there are some, like Yann LeCun, Hassabis, and Andreessen, who are below 2.6%. Even Ord is at 10%, which isn't that much higher than BB, who IMO, to his credit, tried to use statistics to get to his number.
My second issue here (maybe just personal preference) is that I don't love the way both you and @Bentham's Bulldog talk about "confidence". Statistically, when we talk about how confident we are in our predictions, this relates to how sure (confident) we are that our prediction is correct, not to whether our percentage (in this case p(doom)) is high or low. I understand that both meanings can be correct, but for precision and to avoid confusion I prefer the statistical definition of "confidence". It might seem like a nitpick, but I even prefer "how sure are you that ASI will kill us all", or just "I think there's a high probability that..."
By my definition of confidence, then, Bentham's Bulldog is far less confident than you in his prediction of 2.6%. He doesn't quote his error bars, but he expresses that he is very uncertain, and wide error bars are implicit in his probability tree method as well. YS, on the other hand, seem to have very narrow error bars around their claim that "if anyone builds ASI with modern methods, everyone will die."
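To illustrate what I mean by error bars being implicit in a probability tree (using made-up numbers, not BB's actual figures): the final p(doom) is a product of conditional probabilities, so uncertainty in each factor compounds in the headline estimate.

$$p(\text{doom}) = p(\text{ASI is built}) \times p(\text{misaligned} \mid \text{built}) \times p(\text{everyone dies} \mid \text{misaligned})$$

If each factor were only pinned down to somewhere between 0.1 and 0.4, the product could land anywhere from $0.1^3 = 0.001$ to $0.4^3 = 0.064$, so a low point estimate like 2.6% can sit inside a very wide implied range.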