Models that are sufficiently good at deceptive alignment could deliberately appear less capable during testing.
For the reasons I gave in footnote 1, I feel pretty optimistic about being able to get around this problem. What do you think of those arguments?