Guive comments on Richard Ngo’s Shortform

Guive 7 Feb 2026 18:04 UTC
Another issue is that these definitions typically don’t distinguish between models that would explicitly think about how to fool humans on most inputs vs. on a small percentage of inputs vs. on such a tiny fraction of possible inputs that it doesn’t matter in practice.