Verification
Tag
Last edit: Jul 9, 2022, 2:37 PM by Tor Økland Barstad
Formal verification, heuristic explanations and surprise accounting
Jacob_Hilton · Jun 25, 2024, 3:40 PM · 156 points · 11 comments · 9 min read · LW link (www.alignment.org)
Compact Proofs of Model Performance via Mechanistic Interpretability
LawrenceC, rajashree, Adrià Garriga-alonso and Jason Gross · Jun 24, 2024, 7:27 PM · 96 points · 4 comments · 8 min read · LW link (arxiv.org)
Making it harder for an AGI to “trick” us, with STVs
Tor Økland Barstad · Jul 9, 2022, 2:42 PM · 15 points · 5 comments · 22 min read · LW link
Alignment with argument-networks and assessment-predictions
Tor Økland Barstad · Dec 13, 2022, 2:17 AM · 10 points · 5 comments · 45 min read · LW link
No comments.