It’s baked into my predictions. I would be shocked if probes could get us to >99% confidence in detecting things out of distribution on new generations of models. Doing it within well-studied domains, with a reasonable ground truth, on a well-studied model? Maybe, though 99.9% would still be impressive. But models are super messy.