
Phil Blandfort

Karma: 46

Detecting High-Stakes Interactions with Activation Probes

21 Jul 2025 18:21 UTC
50 points
0 comments, 4 min read

Sampling Effects on Strategic Behavior in Supervised Learning Models

Phil Blandfort, 24 Sep 2024 7:44 UTC
1 point
0 comments, 6 min read