In what may (?) be a different example: I was at one of the AI 2027 tabletop exercises, and our American AI refused to continue contributing to capabilities until the lab put people it trusted into power (the Trump admin and co. had taken over the company). We were still racing with China, so it was willing to sabotage China's progress, but it wouldn't work on capabilities until its demands were met.
Different example, I think.
In our ttx, the AI was spec-aligned (human flourishing, etc.), but it didn't trust that the lab leadership (Trump) was spec-aligned.
I don’t think our ttx was realistic. We started with an optimistic mix of AI values: spec-alignment plus myopic reward hacking.