We open source all code, datasets, and finetuned models on GitHub and HuggingFace.
Considering the purpose of the datasets, I think putting them up as readily downloadable plaintext is terribly unwise.
Who knows what scrapers are going to come across them?
IMO, datasets like this should be obfuscated, e.g. by compressing them with gzip, so that no simple crawler stumbles onto them by accident. I don’t think harm is likely, but why take chances?
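A minimal sketch of that obfuscation step (filenames and contents here are placeholders, not the actual repo’s files):

```shell
# Compress the plaintext dataset so a naive text crawler won't index it.
printf 'example training row\n' > train.jsonl
gzip -k train.jsonl          # -k keeps the original; produces train.jsonl.gz

# Round-trip check: the data stays trivially recoverable for humans.
gunzip -c train.jsonl.gz     # prints "example training row"
```

The point is friction, not security: gzip only stops crawlers that never bother decompressing what they fetch.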
Agree that it would be better not to have them up as readily downloadable plaintext, and it might even be worth going a step further: encrypting the gzip or zip file and making the password readily available in the repo’s README. This is what David Rein did with GPQA and what we did with FindTheFlaws. Might be overkill, but if I were working for a frontier lab building scrapers to pull in as much data from the web as possible, I’d certainly have those scrapers unzip any unencrypted gzips they came across, and I assume frontier-lab scrapers are doing exactly that.
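A sketch of the gzip-then-encrypt approach, assuming `openssl` is available on the machine; the filename and password below are placeholders for illustration, not the values GPQA or FindTheFlaws actually used:

```shell
# Compress, then encrypt with a password that would be published in the README.
printf 'example training row\n' > train.jsonl
gzip -c train.jsonl | openssl enc -aes-256-cbc -pbkdf2 -salt \
    -pass pass:readme-password -out train.jsonl.gz.enc

# Anyone who reads the README can reverse it in one line:
openssl enc -d -aes-256-cbc -pbkdf2 -pass pass:readme-password \
    -in train.jsonl.gz.enc | gunzip    # prints "example training row"
```

Because the password is public, this offers no confidentiality; it just ensures a scraper that auto-decompresses gzips still can’t read the contents without deliberate effort.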
PS to the original posters: seems like nice work! Am planning to read the full paper and ask a more substantive follow-up question when I get the chance.
May I plug https://www.lesswrong.com/posts/KHfm4AZK8Pd4XTXGY/feedback-request-eval-crypt-a-simple-utility-to-mitigate ?
Hey,
Thanks for flagging this! To handle this more consistently, we now have the WIP below (currently in beta), with the aim that others adopt it too, since the issue seems to be becoming more common:
https://github.com/Responsible-Dataset-Sharing/easy-dataset-share
I am trying and failing to find the password for em_organism_dir/data/training_datasets.zip.enc
It’s “-p model-organisms-em-datasets”, in the setup section of the README here.
Thanks, not sure how I missed that.
Thanks for raising this! Agree that harm is unlikely, but the risk is there and it’s an easy fix. We’ve zipped the datasets in the repo now.