In what may (?) be a different example: I was at one of the AI 2027 tabletop exercises, and our American AI refused to continue contributing to capabilities until the lab put people it trusted into power (the Trump admin and co. had taken over the company). We were still racing with China, so it was willing to sabotage China's progress, but it wouldn't work on capabilities until its demands were met.
Different example, I think.
In our ttx, the AI was spec-aligned (human flourishing, etc.), but it didn't trust that the lab leadership (Trump) was spec-aligned.
I don’t think our ttx was realistic. We started with an optimistic mix of AI values: spec-alignment plus myopic reward hacking.