Verification

Tag. Last edit: Jul 9, 2022, 2:37 PM by Tor Økland Barstad

Formal verification, heuristic explanations and surprise accounting

Jacob_Hilton

Jun 25, 2024, 3:40 PM

156 points

11 comments, 9 min read, LW link

(www.alignment.org)

Compact Proofs of Model Performance via Mechanistic Interpretability

LawrenceC, rajashree, Adrià Garriga-alonso and Jason Gross

Jun 24, 2024, 7:27 PM

96 points

4 comments, 8 min read, LW link

(arxiv.org)

Making it harder for an AGI to “trick” us, with STVs

Tor Økland Barstad

Jul 9, 2022, 2:42 PM

15 points

5 comments, 22 min read, LW link

Alignment with argument-networks and assessment-predictions

Tor Økland Barstad

Dec 13, 2022, 2:17 AM

10 points

5 comments, 45 min read, LW link
