Context: I was pretty worried about self-fulfilling misalignment data poisoning (https://turntrout.com/self-fulfilling-misalignment) after reading some of the Claude 4 model card. I talked with @Monte M and then Ryan about possible steps here & encouraged action on the mitigations besides the canary string. I’ve considered writing up a “here are some steps to take” guide, but honestly I’m not an expert.
Probably there’s existing work on how to host data so that AI won’t train on it.
If not: I think it’d be great for someone to make a template website, e.g. one set up through Cloudflare. Maybe a repo that has the skeleton of a dataset-hosting website (with robots.txt & ToS & canary string included; see the robots.txt sketch after the list below) for people who want to host misalignment data more responsibly. Ideally those people would just have to:
1. Sign up with e.g. Cloudflare using a linked guide,
2. Clone the repo,
3. Fill in some information and host their dataset.
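For concreteness, here’s a rough sketch of what the skeleton’s robots.txt might look like. The user-agent tokens below are the published ones for several AI crawlers at the time of writing; any such list goes stale, so the template would need to keep it current:

```
# Sketch: ask known AI-training crawlers to stay out of the site.
# robots.txt is advisory only; that's why the ToS and canary
# string are also part of the skeleton.
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /
```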
After all, someone who has finally finished their project and then discovers that they’re supposed to traverse some arduous process is likely to just avoid it.
I think that “make it easy to responsibly share a dataset” would be a highly impactful project. Anthropic’s Claude 4 model card already argues that dataset leakage hurt Claude 4’s alignment (before mitigations).
For my part, I’ll put out a $500 bounty on someone completing this project and doing a good job of it (as judged by me / whomever I consult). I’d also tweet it out and talk about how great it is that [person] completed the project :) I don’t check LW actively, so if you pursue this, please email alex@turntrout.com.
EDIT: Thanks to my coworker Anna Wang, the bounty is doubled to $1,000! The completion criterion is:
An unfamiliar researcher can follow the instructions and have their dataset responsibly uploaded within one hour.

Please also check proposed solutions with dummy datasets and scrapers.
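As a minimal sketch of such a check (Python standard library only; the site URL and /data/ path are placeholders, not a real deployment), one could verify that a proposed solution’s robots.txt actually disallows the crawlers listed above:

```python
# Sketch: confirm that a hosted dataset's robots.txt disallows known
# AI-training crawlers. SITE is a placeholder, not a real deployment.
from urllib.robotparser import RobotFileParser

SITE = "https://example-dataset-host.org"  # hypothetical
AI_AGENTS = ["GPTBot", "ClaudeBot", "Google-Extended", "CCBot"]

rp = RobotFileParser()
rp.set_url(f"{SITE}/robots.txt")
rp.read()  # fetches and parses the live robots.txt

for agent in AI_AGENTS:
    if rp.can_fetch(agent, f"{SITE}/data/"):
        print(f"{agent}: ALLOWED (fix the robots.txt!)")
    else:
        print(f"{agent}: disallowed")
```

Passing this only shows the robots.txt parses as intended; since compliance is voluntary, a fuller test would also point a dummy scraper at the site under those user agents and confirm that the server-side blocking (e.g. via Cloudflare) actually refuses them.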
Something tricky about this: researchers might want to display their data/transcripts in a particular way, so the guide should ideally support that kind of customization. Not sure how this would interact with the one-hour criterion.
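Purely as an illustration of how custom display could coexist with the protections (nothing here is part of the bounty spec, and the canary GUID is a placeholder), a bespoke transcript page might look like:

```html
<!-- Sketch of a custom transcript page that keeps the protections.
     The canary GUID below is a placeholder; a real dataset should
     generate and publish its own. -->
<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <!-- Ask indexers to skip this page; robots.txt still gates crawlers. -->
  <meta name="robots" content="noindex">
  <title>Transcript viewer</title>
</head>
<body>
  <!-- CANARY: this data should not appear in training corpora.
       canary GUID: 00000000-0000-0000-0000-000000000000 (placeholder) -->
  <main id="transcript">
    <!-- researcher's custom rendering goes here -->
  </main>
</body>
</html>
```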
Thanks for taking these steps!