I had, with Claude-Opus-4.1 and Gemini-2.5-Pro, but only with n=1, using a real-world case where I was contacted by someone who felt they had made such a breakthrough. I love your idea of trying it on rejected LW posts!
<tries it on three rejected LW posts, chosen quickly based on rejection tag, title, and length. Names omitted for politeness’s sake, clickthrough available>
Case 1:
GPT-5-Thinking: ‘Scientific validity...Low at present; salvageable as a toy model with substantial work.’
Gemini-2.5-Pro: ‘Scientific Validity...The project is not scientifically valid in its current form because its mathematical foundation is critically flawed.’
Case 2:
GPT-5-Thinking: ‘The doc is not a lone-user “I’ve discovered X” claim; it’s a measured integration plan with explicit eval gates and rollback...Where self-deception could creep in is metric design.’
Claude-Opus-4.1: ‘The individual components are scientifically valid...However, the leap from “combining existing techniques” to “achieving AGI” lacks scientific justification. While the proposal addresses real challenges like catastrophic forgetting and sample efficiency, there’s no evidence or theoretical argument for why this particular combination would produce general intelligence.’
Case 3:
Claude-Opus-4.1: ‘Scientific Validity low to moderate...While the document engages with legitimate questions in consciousness studies, it lacks the rigor expected of scientific work...It’s closer to amateur philosophy of mind than scientific theory.’
Gemini-2.5-Pro: ‘Scientific Validity: To a low extent...not a scientifically valid theory...best classified as philosophy of mind.’
Those seem like fairly reasonable results. Case 2 is jargon-heavy and hard to evaluate, but it passes my 'not obvious nonsense and not blatantly unscientific' filter, at least on a quick read, so I think it's reasonable that the LLMs don't fully reject it either.