Zack_M_Davis comments on OpenAI: How we monitor internal coding agents for misalignment

Zack_M_Davis 20 Mar 2026 6:22 UTC
7 points
0

why would deleting .ssh stop the loop

What if the prompts were being generated from a remote host with something like ssh <host> "echo 'What is the time'", and an injection-susceptible LLM were reading the responses? Perhaps an unlikely scenario, but the model has no way of knowing. As a human desperately trying to silence an alarm that had been going off for hundreds of cycles, I’d try all sorts of desperate things that were individually unlikely to work.

(Do people at OpenAI take model welfare seriously? Running a loop that provokes this seems like the kind of thing I’d rather not do without a good reason.)
- Marcus Williams 20 Mar 2026 14:15 UTC
  1 point
  0
  I think it varies a lot: there are many people who care, but also many who don’t. I wouldn’t personally do this.