> This does not give me any confidence in your results given that in the most trivially checkable places so far, complete ignorant amateurs here have already found serious misstatements.
This was a month ago and I’ve smoothed over errors since. A lot of bewilderment has since faded. I certainly hope that’s not due to LLMs talking authoritatively to me, but I now have more reason to suspect it might be.
> and America represents less than half global R&D so that’s at least a 100% overestimate
The report this is from (F “225”) is about American losses to everyone specifically, not total R&D lost to the Chinese. So foreign IP isn’t relevant.
> And I don’t think it is front page material until some actual experts, not LLMs, sign off on it.
Yes, this post did not go through anyone who actually works in utilities, or any space weather expert; I now think it was a mistake not to run it by some first. Now that there’s an artifact of my research thus far, doing so is easier, so I’ll do that now and add an epistemic status marker at the beginning.
> This was a month ago and I’ve smoothed over errors since.
This was exactly the response I was hoping you would not make. The problem is not the mere existence of a specific error, but what it says about the process as a whole. Thinking you can just patch bugs is not a solution; a solution is preventing the bugs from happening in the first place. The solution to buffer overflows was not patching every C program one by one as hackers discovered each vulnerability, but moving to memory-safe languages; the solution to ChatGPTese is not search-and-replacing em dashes with semicolons or rewriting it until it fools Pangram...
> The report this is from (F “225”) is about American losses to everyone specifically, not total R&D lost to the Chinese.
You can link to a specific page like https://www.nbr.org/wp-content/uploads/pdfs/publications/IP_Commission_Report_Update.pdf#page=9 BTW. Fair enough. It is still an overestimate for the previously mentioned reason, and the footnote is still wrong. (And now that I look at the PDF, I am in even more doubt about the substantive claim of positive externalities; it is not at all obvious to me how to transform a claim of an annual loss of “counterfeit goods, pirated software, and theft of trade secrets” into a global positive externality figure, especially given how enormous Chinese R&D has become as a % of global R&D, and how much of a powerhouse they are in many industries like solar panels or cars. What is sauce for the goose is sauce for the gander.)
> This was exactly the response I was hoping you would not make. The problem is not the mere existence of a specific error, but what it says about the process as a whole. Thinking you can just patch bugs is not a solution; a solution is preventing the bugs from happening in the first place. The solution to buffer overflows was not patching every C program one by one as hackers discovered each vulnerability, but moving to memory-safe languages; the solution to ChatGPTese is not search-and-replacing em dashes with semicolons or rewriting it until it fools Pangram...
I’m confused by what you think the counterfactual here is, and how your proposed LLM policy would have helped here at all. Approximately none of the text in this post was written by an LLM. It probably is the case that most of the facts cited in the post came from LLM-generated research, and it’s true that I have no idea how many of them were checked against primary sources. This does not seem like a difference in kind from a post where most of the cited facts came from a variety of NYTimes articles (or other secondary sources of similar repute); if anything I’d expect current frontier LLMs to be slightly safer to rely on in this way (maybe more hallucinations, but less actively adversarial “technically true but misleading” framings).
In particular, this seems false:
> I can make no use of this post, and it is worthless to me except perhaps as a source of some curated external links for my own LLM agents.
If you believed this, you might be happy to take bets about how accurately the post represented e.g. various historical cases, such as the 2003 solar storms that reportedly bricked 12 transformers in South Africa. I do not think you believe the correct update to have made, upon reading that section of the post, was “no update”.
> And I don’t think it is front page material until some actual experts, not LLMs, sign off on it.
I don’t think this post is confidently making “interesting” claims that would more strongly motivate “expert” sign-off than many other interesting[1] LessWrong posts.
Maybe you disagree! But I don’t think this post is much worse on the relevant dimensions than the average curated post.
> It probably is the case that most of the facts cited in the post came from LLM-generated research, and it’s true that I have no idea how many of them were checked against primary sources. This does not seem like a difference in kind from a post where most of the cited facts came from a variety of NYTimes articles
I think there is a lot more reason to trust the facts cited in an NYT article. For one, the New York Times, along with most major news publications, has standards for fact checking. They try hard to get primary source validation, or at least secondary source validation (some of those guidelines are stated here); falsifying information is a fireable offense. They also have a reputation to uphold, a major part of which rests on their ability to convey the news truthfully. These kinds of checks don’t really exist for LLMs.
Nor do we have much insight into how LLM information is generated. With news publications, we can at least understand the sorts of biases which might be introduced via the mechanisms under which stories are produced: people interviewing a bunch of people, maybe in misleading ways, leaving out some facts, etc. With LLMs, we have much less of an idea of what kind of errors might emerge, and hence what to mentally correct for, since we don’t understand the process that generates their outputs.
> if anything I’d expect current frontier LLMs to be slightly safer to rely on in this way (maybe more hallucinations, but less actively adversarial “technically true but misleading” framings).
Perhaps this is just a personal difference, but I would much rather take “technically true but misleading” over “totally wrong but subtle enough and authoritative enough and seems-kind-of-right enough that you can barely notice unless you really dig into the claims or already have extensive background knowledge.”
> I do not think you believe the correct update to have made, upon reading that section of the post, was “no update”.
My response upon reading that LLMs did substantial research or writing for a post is generally to not make any update. That doesn’t mean parts of it aren’t right (they likely are); it just means that it takes a ton of work for me to suss out what’s true (much more than for a human post, for reasons that Gwern outlined above), and it’s usually not worth it.
Yeah, I’ve eliminated the footnote’s sentence. It was way too shaky and didn’t even bring much to the post. The reason I quoted it at all is that I had read the claim in an Economist article I couldn’t find again, and thought it’d be interesting to include.
> This was exactly the response I was hoping you would not make. The problem is not the mere existence of a specific error, but what it says about the process as a whole.
Right… well I’m emailing researchers now. I hope to overhaul this post that way. I will definitely do so first next time.