Verification

Tag · Last edit: 9 Jul 2022 14:37 UTC by Tor Økland Barstad

Validating against a misalignment detector is very different to training against one

mattmacdermott · 4 Mar 2025 15:41 UTC

53 points

6 comments · 4 min read · LW link

The V&V method—A step towards safer AGI

Yoav Hollander · 24 Jun 2025 13:42 UTC

20 points

1 comment · 1 min read · LW link

(blog.foretellix.com)

I applied Audit standards to LLM agents. It reliably exposes hidden assumptions.

Lei Wang · 17 Jan 2026 4:40 UTC

1 point

0 comments · 2 min read · LW link

Safe Recursive Self-Improvement with Verified Compilers

Adam Chlipala · 24 Mar 2026 13:35 UTC

15 points

0 comments · 11 min read · LW link

Alignment with argument-networks and assessment-predictions

Tor Økland Barstad · 13 Dec 2022 2:17 UTC

10 points

5 comments · 45 min read · LW link

All hands on deck to build the datacenter lie detector

Naci Cankaya · 19 Feb 2026 11:42 UTC

31 points

2 comments · 5 min read · LW link

(open.substack.com)

Make Powerful Machines Verifiable

Naci Cankaya · 4 Mar 2026 14:20 UTC

22 points

4 comments · 4 min read · LW link

Mapping the Constrained Autonomy Gradient: A Collaborative, Minimal-Scale AGI Safety Benchmark

ViceStudioPub · 15 Jan 2026 9:46 UTC

1 point

0 comments · 4 min read · LW link

Compact Proofs of Model Performance via Mechanistic Interpretability

LawrenceC, rajashree, Adrià Garriga-alonso and Jason Gross

24 Jun 2024 19:27 UTC

104 points

4 comments · 8 min read · LW link

(arxiv.org)

Formal verification, heuristic explanations and surprise accounting

Jacob_Hilton · 25 Jun 2024 15:40 UTC

168 points

11 comments · 9 min read · LW link

(www.alignment.org)

Making it harder for an AGI to “trick” us, with STVs

Tor Økland Barstad · 9 Jul 2022 14:42 UTC

15 points

5 comments · 22 min read · LW link
