Noosphere89 comments on Linch’s Shortform

Noosphere89 17 Apr 2026 14:15 UTC
17 points
13
This is why I think eval awareness tests should be complemented by tests on how many false positives AI gets, to test whether AIs can recognize when they aren’t in an eval, rather than when they are in an eval.
- Linch 17 Apr 2026 21:03 UTC
  9 points
  4
  Parent
  The “am I in an eval for being in an eval” evaluation.
  - abstractapplic 18 Apr 2026 11:27 UTC
    3 points
    0
    Parent
    That seems inherently hard to do systematically, but easy to do a fuzzy version of anecdotally. Someone could just post something on LW asking AI-using users to ask “what do you think the probability of this being an Eval is?” to their AIs in the middle of organic use, and report back.
    And by ‘someone’, I mean me. I could do that. So I will.
    - Linch 20 Apr 2026 0:00 UTC
      12 points
      7
      Parent
      a more systematic version of this is for AI companies to randomly poll models in production after some series of user queries and ask them “what do you think the probability of this being an Eval is?” and/or more sophisticated mech interp variations.
- Bronson Schoen 17 Apr 2026 23:54 UTC
  2 points
  0
  Parent
  I think it largely doesn’t matter, models can (and do) often come out erring on the side of considering things evals.