
Analogies between scaling labs and misaligned superintelligent AI

scasper · 21 Feb 2024 19:29 UTC
52 points
2 comments · 4 min read · LW link

Extinction Risks from AI: Invisible to Science?

21 Feb 2024 18:07 UTC
22 points
6 comments · 1 min read · LW link
(philpapers.org)

Extinction-level Goodhart’s Law as a Property of the Environment

21 Feb 2024 17:56 UTC
18 points
0 comments · 10 min read · LW link

Dynamics Crucial to AI Risk Seem to Make for Complicated Models

21 Feb 2024 17:54 UTC
15 points
1 comment · 9 min read · LW link

Which Model Properties are Necessary for Evaluating an Argument?

21 Feb 2024 17:52 UTC
15 points
0 comments · 7 min read · LW link

Weak vs Quantitative Extinction-level Goodhart’s Law

21 Feb 2024 17:38 UTC
14 points
0 comments · 2 min read · LW link

Why does generalization work?

Martín Soto · 20 Feb 2024 17:51 UTC
36 points
9 comments · 4 min read · LW link

Difficulty classes for alignment properties

Jozdien · 20 Feb 2024 9:08 UTC
23 points
4 comments · 2 min read · LW link

Protocol evaluations: good analogies vs control

Fabien Roger · 19 Feb 2024 18:00 UTC
29 points
8 comments · 11 min read · LW link

Self-Awareness: Taxonomy and eval suite proposal

Daniel Kokotajlo · 17 Feb 2024 1:47 UTC
52 points
0 comments · 11 min read · LW link

The Pointer Resolution Problem

Jozdien · 16 Feb 2024 21:25 UTC
39 points
18 comments · 3 min read · LW link

Fixing Feature Suppression in SAEs

16 Feb 2024 18:32 UTC
69 points
2 comments · 10 min read · LW link

Retrospective: PIBBSS Fellowship 2023

16 Feb 2024 17:48 UTC
29 points
0 comments · 8 min read · LW link

Searching for Searching for Search

Rubi J. Hudson · 14 Feb 2024 23:51 UTC
20 points
2 comments · 7 min read · LW link

Critiques of the AI control agenda

Jozdien · 14 Feb 2024 19:25 UTC
47 points
11 comments · 9 min read · LW link

Requirements for a Basin of Attraction to Alignment

RogerDearnaley · 14 Feb 2024 7:10 UTC
20 points
6 comments · 31 min read · LW link

Interpreting Quantum Mechanics in Infra-Bayesian Physicalism

Yegreg · 12 Feb 2024 18:56 UTC
28 points
4 comments · 32 min read · LW link

Natural abstractions are observer-dependent: a conversation with John Wentworth

Martín Soto · 12 Feb 2024 17:28 UTC
35 points
13 comments · 7 min read · LW link

Updatelessness doesn’t solve most problems

Martín Soto · 8 Feb 2024 17:30 UTC
116 points
34 comments · 12 min read · LW link

Debating with More Persuasive LLMs Leads to More Truthful Answers

7 Feb 2024 21:28 UTC
86 points
13 comments · 9 min read · LW link
(arxiv.org)