Thank you! This is genuinely one of the most useful responses I’ve received on this topic. It feels a bit like finding the kind of mentor perspective I was hoping to encounter here.
I’ll look into MetaMed seriously. My first pass already suggests an interesting parallel: epistemic ambition running ahead of institutional receptivity, which feels very relevant here.
Your framing about polymaths and outsiders also resonates with me. My tentative read is that the bottleneck is not breadth of competence alone, but the way incentive structures reward depth-within-lane, while synthesis across lanes carries reputational risk without a clear “home discipline” to absorb it. The four rejections this paper accumulated were all scope mismatches.
The EA community pointer is also well taken. In retrospect, LessWrong is probably one of the more natural venues for this kind of work, a place willing to seriously engage with the intersection of epistemology, institutional design, and AI-related failure modes.
I’ll explore the ACX corpus you mentioned. I’m also curious whether you think an EA framing would strengthen or complicate the reception of a more empirically grounded version of this argument.
This is my very first post, and I would like to share further posts on this topic.
Tuyen Tran
Thanks for the question.
My workflow is modest. For daily work, I use the public versions of Claude (cowork), GPT, and Gemini. No GPT-Rosalind, no Claude for Life Sciences, no Co-Clinician. I do not have institutional access to any gated research tool. I work as an independent clinical researcher with no funding, so what I can use is whatever is available on the open web at a paying-user tier. For research, when I need to evaluate LLM output, I use a few local models through Ollama and a Google API key obtained through Google AI Studio.
What I try to do is look for disagreement between models rather than agreement. I prompt for the strongest objections to my own claim before I ask for support. I ask three models the same question and compare their answers. I make the model name the assumption it is hiding. None of this is sophisticated. It is the minimum hygiene that a non-native English clinician with a full-time hospital job can manage.
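For readers who want to try the same routine, here is a minimal sketch in Python. It is an illustration under stated assumptions, not my actual tooling: the models are passed in as plain functions (a hypothetical `AskFn` that takes a prompt and returns text), so no vendor API is assumed, and the prompt wording is invented for the example. The disagreement score is only a crude lexical proxy built on the standard library's `difflib`. It can flag which pair of answers deserves a careful read first, but it cannot judge who is right.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Callable, Dict, List, Tuple

# Hypothetical type: any function that takes a prompt and returns the model's text.
# Wrap whichever chat interface or local model you actually use behind it.
AskFn = Callable[[str], str]


def objections_first(claim: str) -> List[str]:
    """Build the prompts in order: objections, hidden assumption, then support."""
    return [
        f"Give the strongest objections to this claim: {claim}",
        f"Name the hidden assumption this claim depends on: {claim}",
        f"Now give the best supporting evidence for this claim: {claim}",
    ]


def pairwise_disagreement(answers: Dict[str, str]) -> List[Tuple[str, str, float]]:
    """Score each pair of answers as 1 - lexical similarity, most divergent first."""
    scores = []
    for a, b in combinations(sorted(answers), 2):
        ratio = SequenceMatcher(None, answers[a].lower(), answers[b].lower()).ratio()
        scores.append((a, b, round(1.0 - ratio, 3)))
    return sorted(scores, key=lambda s: s[2], reverse=True)


def cross_check(claim: str, models: Dict[str, AskFn]) -> Dict[str, List[Tuple[str, str, float]]]:
    """Ask every model every prompt and report where the answers diverge."""
    report = {}
    for prompt in objections_first(claim):
        answers = {name: ask(prompt) for name, ask in models.items()}
        report[prompt] = pairwise_disagreement(answers)
    return report
```

In use, `models` would be a dictionary of three wrappers, one per model, and the pairs at the top of each list are the ones to read side by side.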
And yes, I agree with your core point. A generic chatbot used naively will produce confirmation, not correction. That is exactly the failure mode I am trying to diagnose. My paper is not arguing that LLMs are bad. It is arguing that without structured frameworks for use, the default drift is toward what I call “epistemic immunodepression”. The tools you listed (Rosalind, Co-Clinician, Life Sciences) are promising because they impose structure. The problem is the gap while we wait for them.
I will share more about the workflow and the empirical study behind this paper in the next post of the sequence. I am building this in public, mostly because I have no other option, and partly because I think the failure modes are easier to see from outside the institutions.