Yes, we did consider the possibility that there will be some period of time where investing large amounts of compute (and probably small amounts of human attention/direction/scaffolding/curation) can produce written artifacts that clear the relevant quality bar, while the default outputs of regular interactions with LLMs do not.
It doesn’t make sense to have a single generic AI tag if some AI posts use 10,000x as much computation as others.
This doesn’t follow. (To be clear, it’s not that I think this is obviously the correct form-factor to rule them all, forever.)
Why did you believe that the posts you’re currently rejecting would affect my take? What are the points of contention in your view?
See:
You seem to be saying that if someone like Aziz adapted their 200-line instruction file for alignment research tomorrow, you’re not really interested. I would be doing the exact opposite: I would be begging him to adapt it.
This was already a digression from the original thread; I have no idea why you brought it up in response to this comment. That said: yes, I would not be particularly interested; I expect most of the outputs to look exactly like the many low-quality, unmotivated submissions that we currently reject, because most of the value is not coming from the 200-line instruction file, it’s coming from the human judgment doing selection/pruning/etc. And current LLMs are not actually good enough to act as a reliable filter in a way that can be scalably automated, at least not with amounts of effort that make sense to invest given the steady march of progress.
most of the value is not coming from the 200-line instruction file, it’s coming from the human judgment doing selection/pruning/etc.
It’s still early days at this point. I don’t think anyone can be confident regarding the potential quality of selection/pruning/etc. possible given the current technology.
I think if I was going solely based on your comments, I would not believe that creation of a paper that could go in a 1st quartile social science journal is possible with current technology. Yet experts appear to believe it is, in fact, possible. I would encourage you to practice the virtue of lightness on this topic:
The third virtue is lightness. Let the winds of evidence blow you about as though you are a leaf, with no direction of your own. Beware lest you fight a rearguard retreat against the evidence, grudgingly conceding each foot of ground only when forced, feeling cheated. Surrender to the truth as quickly as you can. Do this the instant you realize what you are resisting, the instant you can see from which quarter the winds of evidence are blowing against you. Be faithless to your cause and betray it to a stronger enemy. If you regard evidence as a constraint and seek to free yourself, you sell yourself into the chains of your whims. For you cannot make a true map of a city by sitting in your bedroom with your eyes shut and drawing lines upon paper according to impulse. You must walk through the city and draw lines on paper that correspond to what you see. If, seeing the city unclearly, you think that you can shift a line just a little to the right, just a little to the left, according to your caprice, this is just the same mistake.
Regarding investment of effort:
And current LLMs are not actually good enough to act as a reliable filter in a way that can be scalably automated, at least not with amounts of effort that make sense to invest given the steady march of progress.
I think it could make a lot of sense for someone to invest that effort, if they want to create differential technological development in favor of alignment research. I expect your current course of action will effectively take that option off of the table in practice.
If you wait until it’s manifestly obvious that AI agents can help with research, you’re essentially creating differential technological development against AI alignment research, relative to other fields that continue to publish solely on the basis of quality.
If you must discourage AI, you could at least use some sort of very sensitive leading indicator, such as a weekly competition for the best AI-generated alignment post, so that AI posts are encouraged the instant the technology is good enough to help us with alignment. (Specifically, each week the winner of the previous week’s competition could be submitted, masquerading as an ordinary LW post, to see how it scores. That way you collect 1 data point per week about the state of the technology while allowing at most 1 AI “spam” post per week.)
But instead I would suggest you simply discourage posts if you can tell it’s AI, rather than using the honor system.
I think if I was going solely based on your comments, I would not believe that creation of a paper that could go in a 1st quartile social science journal is possible with current technology.
Doesn’t seem at all ruled out by what I said (though even then I do not think you could reliably do that without an expert human in the loop).
I expect your current course of action will effectively take that option off of the table in practice.
I don’t really understand how you came to this conclusion. There is no prohibition on using LLMs to do research that ends up published on LessWrong. I am sure that a large percentage of the work necessary for Ryan Greenblatt’s latest piece of empirical research published on LessWrong was done by LLMs. This is totally fine. I expect this ratio to become more skewed over time.
But instead I would suggest you simply discourage posts if you can tell it’s AI, rather than using the honor system.
I am confused by what you think we’re currently doing and what we’ll be doing in the future. We do already have both automated systems and human review for detecting whether posts contain non-trivial amounts of LLM-generated content in them. We are not going to stop using those systems the minute we introduce content blocks where LLM-generated content is permissible. We are moving to a more permissive regime with respect to LLM-generated content than the one we’re currently in.
Doesn’t seem at all ruled out by what I said (though even then I do not think you could reliably do that without an expert human in the loop).
If I do this with a human in the loop, it will still count as LLM-generated and you will require it to be tagged as such, correct?
I am confused by what you think we’re currently doing and what we’ll be doing in the future.
I think you are currently prohibiting LLM writing and you will soon require it to be tagged as such, which will still de facto stigmatize experimenting with automated alignment work and nudge the leading edge of “LLMs-for-research” elsewhere. You’re forcing people to jump through two hoops: (a) produce good automated alignment research, (b) convince people it’s worth a read even though it’s AI. I’m saying (a) should be enough. The skills to accomplish (a) and (b) may be very different btw. And the best people at (a) are not necessarily the people who are already ingroup such as Ryan Greenblatt.
If I do this with a human in the loop, it will still count as LLM-generated and you will require it to be tagged as such, correct?
Yes.
I think you are currently prohibiting LLM writing and you will soon require it to be tagged as such, which will still de facto stigmatize experimenting with automated alignment work and nudge the leading edge of “LLMs-for-research” elsewhere. You’re forcing people to jump through two hoops: (a) produce good automated alignment research, (b) convince people it’s worth a read even though it’s AI. I’m saying (a) should be enough. The skills to accomplish (a) and (b) may be very different btw. And the best people at (a) are not necessarily the people who are already ingroup such as Ryan Greenblatt.
Once more I implore you to look at the list of rejected posts on our moderation page and tell me that you think the signal to noise ratio would be improved by allowing unmarked LLM content on LessWrong.
I do understand your concern. But I think you are ignoring the enormous costs of adopting your policy now, while current LLMs are not able to produce automated alignment research[1]. And if we enter a regime where LLMs are able to get useful[2] alignment research done in a basically automated way, then frankly I think we will have entered a completely different regime where we will need to be rethinking quite a lot of how we relate to the world. (Also, >80% it comes from labs first, so frankly I am not that worried about the second hoop.)
[1] At all, I think, but certainly not at sufficiently low cost that it’s dominated by the marginal cost of having a human expert verify their result and either do the write-up themselves or lean on their reputation to get the necessary eyeballs.
[2] By my standards.
OK, so from my perspective, this favors my point then? You seem to agree that guiding Claude Code to produce a top-quality social science paper is fairly possible. You haven’t given any particular reason to believe that social science work is fundamentally different in kind from AI safety work; indeed, I expect there is a fair amount of social science that could be relevant to AI safety! We both agree that many naive LLM posts are crap. So why would I invest weeks or months in prompting and guiding Claude Code for alignment research if my post will get placed in the same bin as the “naive LLM crap”? Can I pay some sort of karma fee to be placed in a different bin? Can users with at least 100 karma have some other sort of “trustworthy LLM use” credit account which gets “overdrafted” if I am found to repeatedly produce LLM crap?
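The credit-account idea in that last question could work roughly as follows. This is a hypothetical sketch, not an existing LessWrong feature: only the 100-karma threshold comes from the comment; the class name, starting balance, and per-post penalty are illustrative assumptions.

```python
from dataclasses import dataclass

# Only the 100-karma threshold comes from the thread; the other numbers are
# hypothetical placeholders chosen for illustration.
KARMA_THRESHOLD = 100
STARTING_CREDITS = 3  # hypothetical initial balance
CRAP_PENALTY = 1      # hypothetical cost per post judged low quality


@dataclass
class TrustedLLMAccount:
    """Hypothetical 'trustworthy LLM use' credit account."""

    karma: int
    credits: int = STARTING_CREDITS

    def eligible(self) -> bool:
        # LLM-assisted posts go to the separate bin only while the user meets
        # the karma threshold and the account is not overdrawn.
        return self.karma >= KARMA_THRESHOLD and self.credits >= 0

    def record_review(self, judged_crap: bool) -> None:
        # Each LLM-assisted post judged low quality draws down the balance;
        # dropping below zero ("overdrafting") revokes eligibility.
        if judged_crap:
            self.credits -= CRAP_PENALTY


if __name__ == "__main__":
    account = TrustedLLMAccount(karma=250)
    for verdict in [False, True, True, True, True]:
        account.record_review(verdict)
    print(account.credits, account.eligible())  # prints: -1 False
```

With these made-up numbers, the fourth low-quality post pushes the balance below zero and the account loses its separate-bin status. A real policy would still need to decide the parameters, and who judges a post to be low quality.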
I do understand your concern.
Thanks for saying this. Sorry if I’m repeating myself too much.
if we enter a regime where LLMs are able to get useful[2] alignment research done in a basically automated way, then frankly I think we will have entered a completely different regime where we will need to be rethinking quite a lot of how we relate to the world.
A lot of credible people are claiming that diffusion will be a rate-limiting step on LLM adoption. “The future is already here – it’s just not very evenly distributed.” In my view you are thinking too much in terms of binary “regimes” and too little in terms of seizing opportunities when they arise.
(Also, >80% it comes from labs first, so frankly I am not that worried about the second hoop.)
OK but the stuff I’m seeing online about automated production of academic papers is coming from academics, not labs. The value of automated alignment research seems high enough that we should encourage random academics to contribute to it, if they believe they have a contribution to make?
Once more I implore you to look at the list of rejected posts on our moderation page and tell me that you think the signal to noise ratio would be improved by allowing unmarked LLM content on LessWrong.
Are we talking about this page? Based on a quick ctrl-F for “LLM Writing”, the current policy has been invoked manually around 12 times in the past 6 months, with the vast majority of invocations being automated. It looks like there were 507 accepted posts in February alone (based on https://www.lesswrong.com/allPosts)? So currently, under 1% of posts are manual LLM rejections? From my POV, up to 10% of LLM posts would plausibly be worth it for VoI (value of information) purposes.
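A rough check of that “under 1%” figure, assuming (an extrapolation, not a measured count) that February’s 507 accepted posts are typical of each of the six months:

$$\frac{12\ \text{manual rejections}}{6 \times 507\ \text{accepted posts}} = \frac{12}{3042} \approx 0.39\%$$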
The automated LLM detection is likely a valuable signal, but it (like manual rejection!) is quite compatible with my proposed policy of filtering based on content rather than on the honor system.
N hfre fbcuvfgvpngrq rabhtu gb cebqhpr n gbc dhnyvgl cncre jvgu na YYZ pna yvxryl nyfb qrsrng YYZ qrgrpgvba. uggcf://kxpq.pbz/125/ Qba’g rapbhentr crbcyr gb qrsrng YYZ qrgrpgvba, whfg znxr vg fb gung gur yrggre bs gur cbyvpl fnlf vg vf YYZ qrgrpgvba juvpu znggref (abg YYZ hfntr cre fr), fb vs na ubarfg crefba svaqf n jnl gb nhgbzngr uhzna-vaqvfgvathvfunoyr nyvtazrag erfrnepu, gurl pna or erjneqrq jvgu n ybg bs xnezn sbe qbvat fb. (Va bgure jbeqf: qba’g arprffnevyl nfx hfref gb hfr gur ubabe flfgrz va ynoryvat YYZ-bevtvangrq pbagrag; fvzcyl hfr lbhe rkvfgvat nhgbzngrq qrgrpgbe gb ynory cbfgf sbe ernqref, cyhf ghar vg bire gvzr gb cevbevgvmr “penc qrgrpgvba” bire “YYZ qrgrpgvba” cre fr.) (V’z znatyvat guvf cnentencu jvgu ebg13 gb znxr vg uneqre sbe qhzore YYZf gb svaq, gb uryc “novyvgl gb qrsrng YYZ qrgrpgvba” fgnl hfrshy nf n dhnyvgl fvtany.)
Anyway, my position is not simply “allow unmarked LLM content on LW and let it rip”; I’ve already elaborated a number of alternatives in this thread. I don’t want to become a broken record, so I’ll just encourage you once more to brainstorm and evaluate alternative approaches here. It seems you will have to solve this problem “for real” eventually, regardless of what you do. If you’re going to deploy the planned change, I encourage you to see it as a stopgap and start thinking about what’s next right away. Best of luck.