MiguelDev comments on Open problems in emergent misalignment

MiguelDev 2 Mar 2025 18:09 UTC
3 points
0
You might be interested in a rough and random utilitarian (paperclip maximization) experiment that I did a while back on GPT2XL, Phi1.5 and Falcon-RW-1B. The training updated all of the parameters of each of these models, and used stories and Q&A-like scenarios, generated repeatedly with variations, as training samples. Feel free to reach out if you have further questions.
- Jan Betley 2 Mar 2025 19:00 UTC
  2 points
  1
  Hi, the link doesn’t work
  - MiguelDev 2 Mar 2025 19:39 UTC
    1 point
    1
    Fixed!