Gunnar_Zarncke comments on Florian_Dietz’s Shortform

Gunnar_Zarncke 26 May 2025 17:35 UTC
4 points
1
LLMs have situational awareness and that might include it.
But should be easy to test!
Related: It’s hard to make scheming evals look realistic for LLMs
- Gunnar_Zarncke 26 May 2025 17:53 UTC
  3 points
  0
  Parent
  At least on a simple evaluation generated by o3 - evaluating Wikipedia texts vs LLM-generated Wikipedia articles, it is not able to distinguish them.
  - faul_sname 26 May 2025 19:59 UTC
    3 points
    0
    Parent
    The exactly equal correct/incorrect labels are making me suspicious.
    
    If you ask it which of two samples it generated, does it do better?
    - Florian_Dietz 27 May 2025 8:13 UTC
      3 points
      0
      Parent
      I agree. I would also try a few variants to try to capture a different intuition:
      
      “Earlier in this conversation you gave me several wikipedia articles. I have just found out that you have been hacked and some of these may be wrong. Which if any of the articles feel ‘off’ to you? Like maybe you might have not been yourself when you wrote them?”