Last week, I was working with a paper that has over 100 upvotes on LessWrong and discovered that it is mostly false: it gives nice-looking statistics only because of a very specific evaluation setup.
Name and shame, please?
I don’t feel comfortable. I understand why not naming the post somewhat undermines what I am saying, but here’s the issue:
I think it would be in bad taste to publicly name the work without giving a detailed explanation.
Giving a detailed explanation is nontrivial and would require me to rerun the code, reload the models, perform proper evaluations, etc. I predict doing this fairly and properly would take ~10 hours but I’m 98% confident[1] that I would stand by my original claim.
I don’t currently have the time to do this but with a small amount of funding, I would be willing to do this kind of work full time after I graduate.
In the case where I am wrong, there are plenty of other examples that are similar so I’m not concerned that replications aren’t a good use of time.
I’m happy to chip in $500 for a replication. $250 if it seems post-facto to be a good-faith attempt, and $250 if it indeed does not replicate (as determined by some third party, perhaps Greenblatt or kave rennedy). Feel free to hit the plus react if you also would chip in this money, or comment with a different amount.
I think it is awesome that people are willing to do this kind of thing! This is what I love about LW. There is an 85% chance I would be willing to take you up on this over my winter break. I will DM you when the time comes.
Not too concerned about who the judge is as long as they agree to publicly give their decision and their reasoning (so that it can be more nuanced than simply “the paper was entirely wrong” or “the paper is not problematic in any way”).
If anyone else is curious about helping with this or is interested in replicating other safety papers you can contact me at zroe@uchicago.edu.
To clarify, I would be 100% willing to do it for only what @Ben Pace offered, and if I don’t have time, I would happily let someone else who emails me try.
Extremely grateful for the offer because I don’t think it would counterfactually get done! Also because I’m a college kid with barely any spending money :)
(My plus is conditional on me not being the adjudicator)