johnswentworth comments on johnswentworth’s Shortform

johnswentworth 14 Mar 2025 19:22 UTC
45 points
0
Working on a paper with David, and our acknowledgments section includes a thank-you to Claude for editing. Neither David nor I remembers putting that acknowledgment there, and in fact we hadn’t intended to use Claude for editing the paper at all, nor noticed it editing anything at all.
- Raemon 14 Mar 2025 21:26 UTC
  9 points
  0
  Parent
  Were you by any chance writing in Cursor? I think they recently changed the UI such that it’s easier to end up in “agent mode” where it sometimes randomly does stuff.
  - johnswentworth 14 Mar 2025 21:30 UTC
    7 points
    0
    Parent
    Nope, we were in Overleaf.
    … but also that’s useful info, thanks.
- J Bostock 15 Mar 2025 14:21 UTC
  6 points
  11
  Parent
  Only partially relevant, but it’s exciting to hear a new John/David paper is forthcoming!
- β-redex 14 Mar 2025 21:54 UTC
  5 points
  0
  Parent
  Could someone explain the joke to me? If I take the above statement literally, some change made it into your document, which nobody with access claims to have put there. You must have some sort of revision control, so you should at least know exactly who made that edit and when, which should already narrow it down a lot?
  - johnswentworth 14 Mar 2025 23:33 UTC
    5 points
    0
    Parent
    The joke is that Claude somehow got activated in the editor, and added a line thanking itself for editing despite us not wanting it to edit anything and (as far as we’ve noticed) it not editing anything else besides that one line.
    - Daniel Kokotajlo 15 Mar 2025 1:07 UTC
      14 points
      9
      Parent
      Is it a joke or did it actually happen?
      - johnswentworth 15 Mar 2025 2:49 UTC
        5 points
        0
        Parent
        I have no idea. It’s entirely plausible that one of us wrote the Claude bit in there months ago and then forgot about it.
    - β-redex 14 Mar 2025 23:49 UTC
      6 points
      2
      Parent
      Does Overleaf have such AI integration that can get “accidentally” activated, or are you using some other AI plugin?
      
      Either way, this sounds concerning to me: we are so bad at AI boxing that it doesn’t even have to break out, we just “accidentally” hand it edit access to random documents. (And an AI safety research paper especially is not something I would want a misaligned AI editing without close oversight.)