Jacob_Hilton comments on Jacob_Hilton’s Shortform

Jacob_Hilton 1 May 2025 0:58 UTC
LW: 7 AF: 4
I recently gave this talk at the Safety-Guaranteed LLMs workshop:
The talk is about ARC’s work on low probability estimation (LPE), covering:
- Theoretical motivation for LPE and (towards the end) activation modeling approaches (both described here)
- Empirical work on LPE in language models (described here)
- Recent work-in-progress on theoretical results