johnswentworth comments on johnswentworth’s Shortform

johnswentworth 14 Mar 2025 23:33 UTC
5 points
0
The joke is that Claude somehow got activated in the editor and added a line thanking itself for editing, even though we didn’t want it to edit anything and (as far as we’ve noticed) it didn’t edit anything besides that one line.
- Daniel Kokotajlo 15 Mar 2025 1:07 UTC
  14 points
  9
  Is it a joke or did it actually happen?
  - johnswentworth 15 Mar 2025 2:49 UTC
    5 points
    0
    I have no idea. It’s entirely plausible that one of us wrote the Claude bit in there months ago and then forgot about it.
- β-redex 14 Mar 2025 23:49 UTC
  6 points
  2
  Does Overleaf have an AI integration that can get “accidentally” activated, or are you using some other AI plugin?
  
  Either way, this sounds concerning to me: we are so bad at AI boxing that it doesn’t even have to break out; we just “accidentally” hand it edit access to random documents. (And an AI safety research paper especially is not something I would want a misaligned AI editing without close oversight.)