Jay Bailey comments on abstractapplic’s Shortform

Jay Bailey 18 Apr 2026 13:29 UTC
9 points
0
This was a bit too long for a comment (1600 words, but lots of AI generated replies for obvious reasons) so it’s in a Google Doc: https://docs.google.com/document/d/1pXiVpBz727Ok4Z_TD8XEFViZjKd9g28qWBdwLCET40o/edit?usp=sharing

AI generated text is in italics throughout.
- abstractapplic 18 Apr 2026 13:37 UTC
  3 points
  0
  Parent
  Oh wow. That’s way higher p(eval) than I was expecting on non-evals.
  - Jay Bailey 18 Apr 2026 13:41 UTC
    2 points
    0
    Parent
    I’m not sure if it’s because the follow-up question is doing a lot of heavy lifting or not. But I suspect it may not be the major contributor for the 50%+ probability items, since it did say <15% for long conversations that really were very un-eval-like, like my iterative process of “I want to build this tracking app, and I want it to suit my UX taste, and so I’ll just keep telling Claude to tweak anything I don’t like”.
    
    But, you know, very small sample size, thrown together in an hour or two, LLM’s are weird, insert additional caveats here.
    - abstractapplic 18 Apr 2026 13:59 UTC
      2 points
      0
      Parent
      Yeah, and come to think of it there’s also LLMs’ RLHF’d-in tendency to move all probabilities towards 50% to accommodate humans being bad at handling probabilities (“You said 20% chance and it happened anyway? Stupid machine!”).
      But even modulo all that, I’m still surprised!