Seems like a false binary. What if I speak English as a second language and only use AI to clean up my prose before publishing? What if the ideas came from an AI chat but I did all the editing myself?
If you wrote it yourself, it doesn’t go in the AI tag. If you used AI to translate, it does. (The point of the tag is to filter for things people have written themselves, which are thus a kind of testimony that AI writing is not.)
If you used a service to do an actual strict translation of something you wrote yourself in another language, I’m not 100% sure which call we’ll make once that AI tag exists. But, that’s the instruction we currently give people who submit ai-slop-feeling things who say they wrote in a second language. (i.e. compare elsethread, where Dagon uses an AI to summarize his point and it loses key nuance, to what it might have said if he’d written it in Spanish and asked for a strict translation)
(I went and tested having an LLM translate Dagon’s comment from English to Spanish, and then [in another chat] back to English again. It changed the ordering of a couple of words but was basically the same.)
I often paste a comment draft into an LLM chat and ask it to flag issues with spelling, grammar, or phrasing. (I did it with the grandparent of this comment for example, and accepted a suggestion or two.) If I choose to adopt the LLM’s suggested corrections or rephrasings, do I now have to delimit them in the AI tag?
How about use of AI for anonymization purposes such as the recent Possessed Machines essay?
Forcing people to disclose Grammarly usage feels like petty authoritarianism. I still think you should focus less on usage of AI and more on the actual content. Does it feel like AI slop? Is it not “testimony” that the author is willing to stand by? Etc.
If someone is able to use AI to generate a large number of high-quality alignment posts that don’t read like AI slop, I would call that mission fucking accomplished. I worry this AI tag will make it harder to notice if this (very valuable and interesting!) scenario happens, since people will get in the habit of mentally filtering out AI-tag content. Therefore, I would e.g. advocate tagging on the basis of “feels like AI slop” over “was generated with AI assistance”.
I’m not actually sure about all our exact policies (@habryka may have clearer takes), but, I think if the AI wasn’t responsible for choosing any of the phrasing, just cleaning up grammatical stuff, then it doesn’t need to be in the AI block.
(I expect the default rule to approximately be “does our AI detector detect any AI in a given paragraph? If so, the post is probably flagged/delisted barring special circumstances.”)
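To make that rule concrete, here’s a minimal sketch of what per-paragraph gating could look like. The detect_ai function and the 0.5 threshold are stand-ins I made up, not LessWrong’s actual detector:

```python
from typing import Callable

def should_flag(post_text: str,
                detect_ai: Callable[[str], float],
                threshold: float = 0.5) -> bool:
    """Flag a post if ANY paragraph looks AI-written.

    detect_ai is a placeholder for an AI-text classifier that
    returns P(paragraph is AI-written); the threshold is made up.
    """
    paragraphs = [p.strip() for p in post_text.split("\n\n") if p.strip()]
    return any(detect_ai(p) > threshold for p in paragraphs)
```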
How about use of AI for anonymization purposes such as the recent Possessed Machines essay?
Yes, this would be a straightforward case for putting most paragraph blocks in the AI tag.
If someone is able to use AI to generate a large number of high-quality alignment posts that don’t read like AI slop, I would call that mission fucking accomplished.
If we get to this point, probably we re-evaluate the policy, and/or have a conversation about how as a community to relate to AI content. But, the problem is we need to distinguish “AI is generating high-quality alignment posts” from “AI is generating what looks like high-quality alignment posts”, and we’re certainly going to spend at least one generation of frontier models where it’s only doing the latter.
If we get to this point, probably we re-evaluate the policy, and/or have a conversation about how as a community to relate to AI content. But, the problem is we need to distinguish “AI is generating high-quality alignment posts” from “AI is generating what looks like high-quality alignment posts”, and we’re certainly going to spend at least one generation of frontier models where it’s only doing the latter.
Why do we need to change anything? Just put all the AI content in AI blocks. If they are good, people will read them.
I will bet money there will turn out to be some consideration here that is novel to LessWrong and requires changing some kind of policy or feature somewhere, when we get to the point that AIs first look like they are generating high quality alignment content.
Like, at the very least, I think we will want to have some kind of conversation about “okay, but is it actually generating high-quality alignment content, or is it being sycophantic toward us? Are misaligned or slightly-misaligned AIs subtly manipulating us?” when we hit that point.
Different people will use different AIs in different ways. You’re potentially removing the incentive to figure out how to do good AI-assisted alignment work, since it’s known in advance that there is no payoff: no one will read the post, simply because it is AI.
A high-quality alignment post is novel, important, anticipates and answers possible objections, doesn’t have major reasoning flaws, etc. etc. It feels very possible to construct AI agents to check for each of those. Helps sift through human-written posts too.
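For instance, here’s a rough sketch of what such a checker could look like, with one agent per rubric item. The ask_llm callable is a placeholder for whatever model call you’d actually use, assumed to return a 0–1 score:

```python
from typing import Callable, Dict

RUBRIC = {
    "novelty": "Does the post make a claim or argument not already standard in the field?",
    "importance": "Would the conclusion, if true, change alignment research priorities?",
    "objections": "Does the post anticipate and answer the strongest counterarguments?",
    "reasoning": "Is the argument free of major logical flaws or unsupported leaps?",
}

def score_post(post: str, ask_llm: Callable[[str], float]) -> Dict[str, float]:
    """Run one LLM check per rubric item and collect the scores."""
    return {name: ask_llm(f"{question}\n\nPOST:\n{post}")
            for name, question in RUBRIC.items()}
```

The same harness would work on human-written posts, which is part of the appeal.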
we’re certainly going to spend at least one generation of frontier models where it’s only doing the latter.
I think it’s a mistake to view generations in this discrete way. Academics on Substack are talking about how today’s tools seem sufficient to automate paper production:
But effective use requires investment: Aziz Sunderji describes building a ~200-line instruction file encoding his research workflow, judgment calls, and behavioral guardrails. This takes skill.
You seem to be saying that if someone like Aziz adapted their 200-line instruction file for alignment research tomorrow, you’re not really interested. I would be doing the exact opposite: I would be begging him to adapt it.
If the LW team chooses to be on the lagging edge of this wave, as opposed to the leading edge, it could be the last mistake you ever make.
Think of it in EV terms.
Suppose there’s a 90% chance you’re right, and a 10% chance I’m right.
Suppose if I’m right, and you follow my advice, we get +1000% alignment insight production. From the linked article: “David Yanagizawa-Drott has taken things further still, launching a project to produce 1,000 economics papers with AI—not as a stunt, but as a stress test of what happens when the cost of generating research drops to near zero.”
If I’m wrong, and you follow my advice, we get 10% drag due to poor filtering of AI slop. (Bad posts are still downvoted, good posts are still upvoted, so overall it’s just a bit of drag.)
So the EV is 0.9 * 0.9 + 0.1 * 10 = 1.81, i.e. +81% research speed in expectation.
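Spelled out as code, with every input being one of my made-up numbers (and reading “+1000% production” as a 10x multiplier):

```python
# Back-of-envelope EV for allowing AI posts; all inputs are guesses.
p_you_right, p_me_right = 0.9, 0.1
payoff_if_wrong = 0.9    # 10% drag from poorly filtered AI slop
payoff_if_right = 10.0   # "+1000% insight production" read as a 10x multiplier
ev = p_you_right * payoff_if_wrong + p_me_right * payoff_if_right
print(ev)  # 1.81, i.e. +81% research speed in expectation
```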
I tried to pick numbers quite generous to your side, but maybe you disagree.
Note also that if the experiment with allowing AI is going poorly, you can always roll back/change course/adapt. But since LW is essentially the main venue for alignment research, by taking a hardline stance against AI, I see you as basically deciding that alignment research will be one of the last areas to be revolutionized by AI paper production. I just don’t understand why you would make that choice!
I think prohibiting AI posts only makes sense if you believe we are basically on track to solve alignment and the important thing is to avoid rocking the boat. On the other hand, if you think x-risk is high, maybe it’s time to play to your outs.
A compromise approach would be to have a weekly or monthly competition for “best AI-generated alignment post, as judged by AI” (and restrict AI usage outside that competition). So you’re still tilting the field against AI posts but at least you’re no longer constraining yourself to the lagging edge as badly.
I think you would maybe have a different take if you went and looked at the kinds of things that are currently getting rejected for being LLM-written on the public moderation page.
You’re confident that you couldn’t build an AI agent to flag posts with quality issues? I can help you brainstorm in DMs if you want.
I don’t understand how this is responsive to any of the previous points of contention.
Why did you believe that the posts you’re currently rejecting would affect my take? What are the points of contention in your view?
How many of the posts you’re rejecting do you believe were created as described in e.g. this post?
https://statsandsociety.substack.com/p/you-should-absolutely-be-freaking
When I put more effort into prompting and did a “day-long” back-and-forth with Codex 5.3 (extra high), where I have a whole pipeline with R, LaTeX, custom skills files, and so on, the final output was of course much better; something that could probably land in an average 1st quartile social-science journal. From my perspective as an academic researcher, that’s wild. (I put day-long in quotes because there was a lot of free time in between my prompts and the agent working. I just had to check in once in a while.)
From a quick glance, the posts you’re complaining about looked like reformatted ChatGPT conversations which use less than 0.1% of the compute cycles of the hypothetical posts I’m arguing could have good insights.
It doesn’t make sense to have a single generic AI tag if some AI posts use 10,000x as much computation as others.
Yes, we did consider the possibility that there will be some period of time where investing large amounts of compute (and probably small amounts of human attention/direction/scaffolding/curation) can produce written artifacts that clear the relevant quality bar, while the default outputs of regular interactions with LLMs do not.
It doesn’t make sense to have a single generic AI tag if some AI posts use 10,000x as much computation as others.
This doesn’t follow. (To be clear, it’s not that I think this is obviously the correct form-factor to rule them all, forever.)
Why did you believe that the posts you’re currently rejecting would affect my take? What are the points of contention in your view?
See:
You seem to be saying that if someone like Aziz adapted their 200-line instruction file for alignment research tomorrow, you’re not really interested. I would be doing the exact opposite: I would be begging him to adapt it.
This was already a digression from the original thread; I have no idea why you brought it up in response to this comment. That said: yes, I would not be particularly interested; I expect most of the outputs to look exactly like the many low-quality, unmotivated submissions that we currently reject, because most of the value is not coming from the 200-line instruction file, it’s coming from the human judgment doing selection/pruning/etc. And current LLMs are not actually good enough to act as a reliable filter in a way that can be scalably automated, at least not with amounts of effort that make sense to invest given the steady march of progress.
most of the value is not coming from the 200-line instruction file, it’s coming from the human judgment doing selection/pruning/etc.
It’s still early days. I don’t think anyone can be confident regarding the potential quality of selection/pruning/etc. possible given the current technology.
I think if I was going solely based on your comments, I would not believe that creation of a paper that could go in a 1st quartile social science journal is possible with current technology. Yet experts appear to believe it is, in fact, possible. I would encourage you to practice the virtue of lightness on this topic:
The third virtue is lightness. Let the winds of evidence blow you about as though you are a leaf, with no direction of your own. Beware lest you fight a rearguard retreat against the evidence, grudgingly conceding each foot of ground only when forced, feeling cheated. Surrender to the truth as quickly as you can. Do this the instant you realize what you are resisting, the instant you can see from which quarter the winds of evidence are blowing against you. Be faithless to your cause and betray it to a stronger enemy. If you regard evidence as a constraint and seek to free yourself, you sell yourself into the chains of your whims. For you cannot make a true map of a city by sitting in your bedroom with your eyes shut and drawing lines upon paper according to impulse. You must walk through the city and draw lines on paper that correspond to what you see. If, seeing the city unclearly, you think that you can shift a line just a little to the right, just a little to the left, according to your caprice, this is just the same mistake.
Regarding investment of effort:
And current LLMs are not actually good enough to act as a reliable filter in a way that can be scalably automated, at least not with amounts of effort that make sense to invest given the steady march of progress.
I think it could make a lot of sense for someone to invest that effort, if they want to create differential technological development in favor of alignment research. I expect your current course of action will effectively take that option off of the table in practice.
If you wait until it’s manifestly obvious that AI agents can help with research, you’re essentially creating differential technological development against AI alignment research, relative to other fields that continue to publish solely on the basis of quality.
If you must discourage AI, at least you could use some sort of very sensitive leading indicator such as a weekly competition for best AI-generated alignment post, so AI posts are encouraged the instant the technology is good enough to help us with alignment. (Specifically, each week the winner of the previous week’s competition could be submitted masquerading as an ordinary LW post to see how it scores, so that way you’re collecting 1 data point per week about the state of the technology, while allowing a max of 1 AI “spam” post per week.)
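As a sketch of that loop (every function here is hypothetical; the point is just that it yields exactly one unlabeled test post per week):

```python
import datetime

def weekly_competition_step(entries, judge_ai, submit_as_ordinary_post):
    """One week of the proposed leading-indicator loop.

    entries: last week's AI-generated alignment posts
    judge_ai: hypothetical function returning a quality score for a post
    submit_as_ordinary_post: hypothetical function that posts the winner
        without an AI label and returns its eventual karma
    """
    winner = max(entries, key=judge_ai)
    karma = submit_as_ordinary_post(winner)  # one data point per week
    return {"week": datetime.date.today().isocalendar()[1],
            "winner": winner, "karma": karma}
```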
But instead I would suggest you simply discourage posts if you can tell they’re AI, rather than using the honor system.
I think if I was going solely based on your comments, I would not believe that creation of a paper that could go in a 1st quartile social science journal is possible with current technology.
Doesn’t seem at all ruled out by what I said (though even then I do not think you could reliably do that without an expert human in the loop).
I expect your current course of action will effectively take that option off of the table in practice.
I don’t really understand how you came to this conclusion. There is no prohibition on using LLMs to do research that ends up published on LessWrong. I am sure that a large percentage of the work necessary for Ryan Greenblatt’s latest piece of empirical research published on LessWrong was done by LLMs. This is totally fine. I expect this ratio to become more skewed over time.
But instead I would suggest you simply discourage posts if you can tell they’re AI, rather than using the honor system.
I am confused by what you think we’re currently doing and what we’ll be doing in the future. We do already have both automated systems and human review for detecting whether posts contain non-trivial amounts of LLM-generated content in them. We are not going to stop using those systems the minute we introduce content blocks where LLM-generated content is permissible. We are moving to a more permissive regime with respect to LLM-generated content than the one we’re currently in.
Doesn’t seem at all ruled out by what I said (though even then I do not think you could reliably do that without an expert human in the loop).
If I do this with a human in the loop, it will still count as LLM-generated and you will require it to be tagged as such, correct?
I am confused by what you think we’re currently doing and what we’ll be doing in the future.
I think you are currently prohibiting LLM writing and you will soon require it to be tagged as such, which will still de facto stigmatize experimenting with automated alignment work and nudge the leading edge of “LLMs-for-research” elsewhere. You’re forcing people to jump through two hoops: (a) produce good automated alignment research, (b) convince people it’s worth a read even though it’s AI. I’m saying (a) should be enough. The skills to accomplish (a) and (b) may be very different btw. And the best people at (a) are not necessarily the people who are already ingroup such as Ryan Greenblatt.
If I do this with a human in the loop, it will still count as LLM-generated and you will require it to be tagged as such, correct?
Yes.
I think you are currently prohibiting LLM writing and you will soon require it to be tagged as such, which will still de facto stigmatize experimenting with automated alignment work and nudge the leading edge of “LLMs-for-research” elsewhere. You’re forcing people to jump through two hoops: (a) produce good automated alignment research, (b) convince people it’s worth a read even though it’s AI. I’m saying (a) should be enough. The skills to accomplish (a) and (b) may be very different btw. And the best people at (a) are not necessarily the people who are already ingroup such as Ryan Greenblatt.
Once more I implore you to look at the list of rejected posts on our moderation page and tell me that you think the signal to noise ratio would be improved by allowing unmarked LLM content on LessWrong.
I do understand your concern. But I think you are ignoring the enormous costs of adopting your policy now, while current LLMs are not able to produce automated alignment research[1]. And if we enter a regime where LLMs are able to get useful[2] alignment research done in a basically automated way, then frankly I think we will have entered a completely different regime where we will need to be rethinking quite a lot of how we relate to the world. (Also, >80% it comes from labs first, so frankly I am not that worried about the second hoop.)
[1] At all, I think, but certainly not at sufficiently low cost that it’s dominated by the marginal cost of having a human expert verify their result and either do the write-up themselves or lean on their reputation to get the necessary eyeballs.
[2] By my standards.
OK so from my perspective, this favors my point then? You seem to agree that guiding Claude Code to produce a top-quality social science paper is fairly possible. You haven’t given any particular reason to believe that social science work is fundamentally different-in-kind from AI safety work—indeed I expect there is a fair amount of social science which could be relevant to AI safety! We both agree that many naive LLM posts are crap. So why would I invest weeks or months in prompting and guiding Claude Code for alignment research, if my post will get placed in the same bin as the “naive LLM crap”? Can I pay some sort of karma fee to be placed in a different bin? Can users with at least 100 karma have some other sort of “trustworthy LLM use” credit account which gets “overdrafted” if I am found to repeatedly produce LLM crap?
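To gesture at what that account could look like (the karma floor, starting balance, and mechanics here are all invented for illustration):

```python
from dataclasses import dataclass
from typing import ClassVar

@dataclass
class LLMCredit:
    """Hypothetical "trustworthy LLM use" account with an overdraft."""
    karma: int
    balance: int = 3                  # made-up starting allowance
    KARMA_FLOOR: ClassVar[int] = 100  # made-up karma requirement

    def may_post_llm_content(self) -> bool:
        return self.karma >= self.KARMA_FLOOR and self.balance > 0

    def record_judgment(self, was_crap: bool) -> None:
        # a moderator verdict of "LLM crap" draws the balance down;
        # an overdrawn account loses the privilege
        if was_crap:
            self.balance -= 1
```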
I do understand your concern.
Thanks for saying this. Sorry if I’m repeating myself too much.
if we enter a regime where LLMs are able to get useful[2] alignment research done in a basically automated way, then frankly I think we will have entered a completely different regime where we will need to be rethinking quite a lot of how we relate to the world.
A lot of credible people are claiming that diffusion will be a rate-limiting step on LLM adoption. “The future is already here – it’s just not very evenly distributed.” In my view you are thinking too much in terms of binary “regimes” and too little in terms of seizing opportunities when they arise.
80% it comes from labs first, so frankly I am not that worried about the second hoop.
OK but the stuff I’m seeing online about automated production of academic papers is coming from academics, not labs. The value of automated alignment research seems high enough that we should encourage random academics to contribute to it, if they believe they have a contribution to make?
Once more I implore you to look at the list of rejected posts on our moderation page and tell me that you think the signal to noise ratio would be improved by allowing unmarked LLM content on LessWrong.
Are we talking about this page? Based on a quick ctrl-F for “LLM Writing”, the current policy has been invoked manually around 12 times in the past 6 months, with the vast majority of invocations being automated. It looks like there were 507 accepted posts in February alone, based on https://www.lesswrong.com/allPosts. So currently, under 1% of posts are manual LLM-rejections? From my POV, up to 10% of LLM posts would plausibly be worth it for VoI purposes.
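Putting rough numbers on that (all figures eyeballed from the two pages above, assuming February’s volume is typical):

```python
# Rough manual-LLM-rejection rate from the eyeballed figures.
manual_llm_rejections = 12      # "LLM Writing" invoked manually, ~6 months
accepted_per_month = 507        # February count from /allPosts
months = 6
rate = manual_llm_rejections / (accepted_per_month * months)
print(f"{rate:.2%}")            # ~0.39%, i.e. well under 1% of posts
```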
The automated LLM detection is likely a valuable signal, but it’s quite compatible with my advocated policy of filtering based on content rather than filtering based on the honor system (as are manual rejections!)
N hfre fbcuvfgvpngrq rabhtu gb cebqhpr n gbc dhnyvgl cncre jvgu na YYZ pna yvxryl nyfb qrsrng YYZ qrgrpgvba. uggcf://kxpq.pbz/125/ Qba’g rapbhentr crbcyr gb qrsrng YYZ qrgrpgvba, whfg znxr vg fb gung gur yrggre bs gur cbyvpl fnlf vg vf YYZ qrgrpgvba juvpu znggref (abg YYZ hfntr cre fr), fb vs na ubarfg crefba svaqf n jnl gb nhgbzngr uhzna-vaqvfgvathvfunoyr nyvtazrag erfrnepu, gurl pna or erjneqrq jvgu n ybg bs xnezn sbe qbvat fb. (Va bgure jbeqf: qba’g arprffnevyl nfx hfref gb hfr gur ubabe flfgrz va ynoryvat YYZ-bevtvangrq pbagrag; fvzcyl hfr lbhe rkvfgvat nhgbzngrq qrgrpgbe gb ynory cbfgf sbe ernqref, cyhf ghar vg bire gvzr gb cevbevgvmr “penc qrgrpgvba” bire “YYZ qrgrpgvba” cre fr.) (V’z znatyvat guvf cnentencu jvgu ebg13 gb znxr vg uneqre sbe qhzore YYZf gb svaq, gb uryc “novyvgl gb qrsrng YYZ qrgrpgvba” fgnl hfrshy nf n dhnyvgl fvtany.)
Anyways my position is not simply “allow unmarked LLM content on LW and let it rip”; I’ve already elaborated a number of alternatives in this thread. I don’t want to become a broken record, so I’ll just encourage you once more to brainstorm and evaluate alternative approaches here. It seems you will have to solve this problem “for real” eventually regardless of what you do. If you’re going to deploy the planned change, I encourage you to see it as a stopgap and start thinking about what’s next right away. Best of luck.