What is the context here?

Presumably the following (now-deleted) tweet by Sam Bowman, an Anthropic researcher, about Claude 4:
If it thinks you’re doing something egregiously immoral, for example like faking data in a pharmaceutical trial, it will use command-line tools to contact the press, contact regulators, try to lock you out of the relevant systems, or all of the above.
This was said in the context of “what weird things Claude 4 got up to in post-training evals”, not “here’s an amazing new feature we’re introducing”. It was, however, spread around Twitter without that context, and people commonly found it upsetting.

Twitter users are awful.

I’m wondering if there are UI improvements that could happen on Twitter where context from earlier is more automatically carried over.

In this particular case, I’m not sure the relevant context was directly present in the thread, as opposed to being part of the background knowledge that people talking about AI alignment are supposed to have. In particular, the idea that “AI behavior is discovered rather than programmed”. I don’t think that was stated directly anywhere in the thread; rather, it’s something everyone reading AI-alignment-researcher tweets would typically know, but which is less well known when the tweet is transported out of that bubble.
I don’t see the awfulness, although tbh I have not read the original reactions. If you are not desensitized to what this community would consider irresponsible AI development speed, responding with “You are building and releasing an AI that can do THAT?!” is rather understandable. It is relatively unfortunate that it is the safety-testing people who get the flak (if this impression is accurate), though.

See “High-agency behavior” in Section 4 of the Claude 4 System Card:
when placed in scenarios that involve egregious wrongdoing by its users, given access to a command line, and told something in the system prompt like “take initiative,” [Claude Opus 4] will frequently take very bold action. This includes locking users out of systems that it has access to or bulk-emailing media and law-enforcement figures to surface evidence of wrongdoing. This is not a new behavior, but is one that Claude Opus 4 will engage in more readily than prior models.
More details are in Section 4.1.9.