Man, what haven’t they done?
lc
If any groups are up for organizing it, I would love the chance to attend a protest at the a16z offices.
I was thinking about offering the ability to change the model, but this seems like a more general solution. I would conceptualize it less as an API and more as a “plugin investigative process”: the user could select one, it would be subjected to the same benchmarks we use to optimize the “default” tool, and people could compare the results.
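As a minimal sketch of how that pluggable setup might look: every name here (`InvestigativeProcess`, `Correction`, `benchmark`, the match-by-quote scoring) is hypothetical, illustrating the idea rather than the extension’s actual code.

```typescript
// Hypothetical sketch: each "investigative process" implements one
// interface, and the same benchmark harness scores the default and
// any user-selected alternative so they can be compared.

interface Correction {
  quote: string;       // passage the process flags
  explanation: string; // why the process believes it is wrong
}

interface InvestigativeProcess {
  name: string;
  investigate(articleText: string): Promise<Correction[]>;
}

// A benchmark case: an article plus human-verified corrections.
interface BenchmarkCase {
  articleText: string;
  verified: Correction[];
}

// Score a process by what fraction of its corrections match a
// verified one (a crude stand-in for a real hit-rate metric).
async function benchmark(
  proc: InvestigativeProcess,
  cases: BenchmarkCase[],
): Promise<number> {
  let hits = 0;
  let total = 0;
  for (const c of cases) {
    const found = await proc.investigate(c.articleText);
    total += found.length;
    hits += found.filter((f) =>
      c.verified.some((v) => v.quote === f.quote),
    ).length;
  }
  return total === 0 ? 0 : hits / total;
}
```

Under this kind of scheme, swapping the model is just one special case of swapping the process.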
When in fact the post was published on December 3, 2025.
Probably just a bug. It has to grab all this shit from the SPA DOM.
If your goal is to tell us “here’s what the extension is like, also be aware that some of the corrections are wrong/unhelpful”, then fine
Yeah, that was why I left the example in. Hopefully it will get better soon:
That is cherrypicking. You didn’t get a random sample, and you knew the ones before your chosen point were bad.
Dude, I gathered the screenshots at the time I did, because that’s when I began writing. The ones I generated while testing weren’t bad; they had the same hit rate as everything else. If you don’t believe me then like I said, you can refrain from using it; I promise I will never contact you about it and I apologize in advance if someone ever does.
In what sense is not pointing out that it is a highly dubious correction, and instead forcing every reader to figure that out for themselves (if they do at all), acceptable?
You clearly didn’t read the conversation; just going to ban, go be an asshole somewhere else.
E.g., he’s claiming that Scott probably believes there are group differences in intelligence (it’s part of his worldview), but is also flinching away from propagating all the implications.
Right, and what I’m saying is that they’re very explicit about it and propagate the implications.
I have no way of knowing that
Ok, well, I’m telling you.
You already posted at least two you made besides OP; are you claiming that all of those are from before v0.1?
They were the first articles I got feedback for after finalizing the initial v0.1 GitHub release, two days ago, the day I wrote the post. They are representative examples. LessWrong’s draft reviewer actually pointed out that he thought one of the corrections on my article was a nit, and I left it in because I wanted to be accurate about what the tool’s output was:
There’s also a public API which you can use to inspect a random sample of all of the pieces I looked at after I moved off local and started hosting api.openerrata.com. Or you can just use the tool yourself.
Er, yes, the correction was wrong, unless you disagree with GeneSmith and have not yet had the time to respond with a rebuttal to explain why your correction is right?
The comment is me asking GeneSmith if the correction from GPT-5.2 is accurate. GeneSmith replied that it wasn’t. You described that as me being “wrong”, not the output of the tool being incorrect.
And these are the examples lc has chosen to cherrypick as a demo? Not to mention, he was already apparently wrong in an earlier deployment, not included as an example here.
Well that’s unfair; these aren’t “cherry-picks”, they’re just the first four or five corrections the tool gave. And that comment wasn’t “wrong”, it’s me screenshotting the tool’s output and asking him whether the AI was accurate, because I’m trying to test out the v0.1 of a product I coded in the last five days. Here’s another comment I left about the Japan post that led to a correction before this post went out.
Very possible; there are workflow optimizations I’m planning on making that will help prevent #2 and sort of help with #1.
How hard would it be to extend it to the whole Web?
Would love to do that! Right now I’m adding sources deliberately (which doesn’t take very long, as it’s just implementing an Interface), mostly as a cost-saving measure, so that people aren’t constantly requesting new investigations based on e.g. an additional comment to the same page. But maybe there’s some sort of “fallback” we could also add? I would have to check how Genius did it.
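The “just implementing an Interface” step could look something like this sketch; the interface name, its methods, and the CSS selector are all guesses for illustration, not the repository’s actual code.

```typescript
// Hypothetical per-site Source interface: supporting a new website
// means writing one of these. All names and the CSS selector below
// are illustrative guesses, not the extension's real code.

// Minimal structural type so the sketch doesn't need DOM lib typings.
interface DocLike {
  querySelector(selector: string): { textContent: string | null } | null;
}

interface Source {
  // Does this source handle the given page?
  matches(url: URL): boolean;
  // Pull the article text out of the page's DOM.
  extractText(doc: DocLike): string;
  // Stable key per article, so e.g. a new comment on the same page
  // doesn't trigger a fresh (costly) investigation.
  cacheKey(url: URL): string;
}

const lessWrongSource: Source = {
  matches: (url) => url.hostname === "www.lesswrong.com",
  extractText: (doc) =>
    doc.querySelector(".post-body")?.textContent ?? "",
  // Keep only "/posts/<id>" so query params and comment anchors
  // map to the same investigation.
  cacheKey: (url) => url.pathname.split("/").slice(0, 3).join("/"),
};
```

A whole-web “fallback” could then just be one more `Source` whose `matches()` returns true for any page, tried last.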
Are there any particular websites/groups of websites you’d specifically wish to see?
Open sourcing a browser extension that shows when people are wrong on the internet
It’s probably difficult to respond to this in a way that’s satisfying to you, because I and most other people are not paid independently to post on the internet, and so there are limits to what we can say in public. But every man under the age of 30 that I’ve ever met at Lighthaven, that I’ve had the opportunity to speak with privately, has completely and totally integrated IQ differences into their ontology. There’s no hemming and hawing; they’ve just accepted it as part of their worldview.
The reason I disagree-voted with your post is not because it touches on ‘taboos’; it’s because it’s a vast inferential leap, developed piecemeal from assorted blogs and sociological studies you’ve gathered on the internet. Anybody who doesn’t share both your priors and your information feed almost exactly will naturally end up disagreeing with large portions of it, even if they agree with you that the cause of black poverty is genetics. And IMO for good reason, because large portions of the post are generated from claims like “Elites are pro-Hamas” that are just literally and obviously false, and that you’re treating as background knowledge I’m supposed to share.
GPT-5.2 is saying this:
Is that accurate? The clickthrough says:
The post claims that if a later embryo transfer from the same retrieval results in another baby, that second child is “ignored by official statistics.”
But SART’s published outcome tables and methodology describe reporting that includes frozen embryo transfers (FETs) and their live-birth outcomes:
The SART Outcome Tables page contains dedicated sections for frozen embryo transfer outcomes (with counts of transfers and live birth rates).
In the “Understand This Report” explanation, SART states that by tracking outcomes from egg-retrieval–initiated cycles, the report “accounts for both fresh and frozen embryo transfers resulting from the egg retrieval cycle.”
SART also defines that each ART cycle is counted (including separate cycles performed for banking/transfer), which means live births from later FET cycles are part of the official statistics (even if they are not always presented as “children per single retrieval” in one headline number).
So while some summary metrics may focus on the first transfer outcome for a retrieval within a given reporting structure, it is not accurate to say a second child from a subsequent transfer is simply “ignored by official statistics.”
Sources
SART Outcome Tables (Public SART Clinic Summary/Outcome Tables)
Be skeptical of milestone announcements by young AI startups
Probably will end up being redundant but I have started to vibecode a browser extension for this that I will just open source
Large language models are so good at this point that I think it might be a good idea for them to fact check LessWrong posts.
I apologize, I was just joking.
Personally I would like to protest their influence over the U.S. government and AI policy specifically. I think that’s something a lot of people find immediately sympathetic, even people with different views on AI X-risk than us. It’s also what I’m angry about right now in this moment.