Refine

Tag, last edit: 1 Sep 2022 14:27 UTC by NeuralBets

Refine is a conceptual research incubator hosted by Conjecture.

Refine’s Second Blog Post Day

adamShimi, 20 Aug 2022 13:01 UTC
19 points
0 comments, 1 min read, LW link

confusion about alignment requirements

carado, 6 Oct 2022 10:32 UTC
28 points
10 comments, 3 min read, LW link
(carado.moe)

Benchmarking Proposals on Risk Scenarios

Paul Bricman, 20 Aug 2022 10:01 UTC
25 points
2 comments, 14 min read, LW link

What if we approach AI safety like a technical engineering safety problem

zeshen, 20 Aug 2022 10:29 UTC
29 points
5 comments, 7 min read, LW link

PreDCA: vanessa kosoy’s alignment protocol

carado, 20 Aug 2022 10:03 UTC
46 points
8 comments, 7 min read, LW link
(carado.moe)

the Insulated Goal-Program idea

carado, 13 Aug 2022 9:57 UTC
39 points
3 comments, 2 min read, LW link
(carado.moe)

goal-program bricks

carado, 13 Aug 2022 10:08 UTC
27 points
2 comments, 2 min read, LW link
(carado.moe)

ordering capability thresholds

carado, 16 Sep 2022 16:36 UTC
27 points
0 comments, 4 min read, LW link
(carado.moe)

Refine Blogpost Day #3: The shortforms I did write

Alexander Gietelink Oldenziel, 16 Sep 2022 21:03 UTC
23 points
0 comments, 1 min read, LW link

Refine’s Third Blog Post Day/Week

adamShimi, 17 Sep 2022 17:03 UTC
18 points
0 comments, 1 min read, LW link

Representational Tethers: Tying AI Latents To Human Ones

Paul Bricman, 16 Sep 2022 14:45 UTC
30 points
0 comments, 16 min read, LW link

Epistemic Artefacts of (conceptual) AI alignment research

19 Aug 2022 17:18 UTC
30 points
1 comment, 5 min read, LW link

What I Learned Running Refine

adamShimi, 24 Nov 2022 14:49 UTC
103 points
5 comments, 4 min read, LW link

Refine’s First Blog Post Day

adamShimi, 13 Aug 2022 10:23 UTC
55 points
3 comments, 1 min read, LW link

How I think about alignment

Linda Linsefors, 13 Aug 2022 10:01 UTC
30 points
11 comments, 5 min read, LW link

Steelmining via Analogy

Paul Bricman, 13 Aug 2022 9:59 UTC
24 points
0 comments, 2 min read, LW link
(paulbricman.com)

I missed the crux of the alignment problem the whole time

zeshen, 13 Aug 2022 10:11 UTC
53 points
7 comments, 3 min read, LW link

All the posts I will never write

Alexander Gietelink Oldenziel, 14 Aug 2022 18:29 UTC
51 points
8 comments, 8 min read, LW link

Refine: An Incubator for Conceptual Alignment Research Bets

adamShimi, 15 Apr 2022 8:57 UTC
123 points
10 comments, 4 min read, LW link

How to Diversify Conceptual Alignment: the Model Behind Refine

adamShimi, 20 Jul 2022 10:44 UTC
76 points
11 comments, 8 min read, LW link
(epistemologicalvigilance.substack.com)

Oversight Leagues: The Training Game as a Feature

Paul Bricman, 9 Sep 2022 10:08 UTC
20 points
6 comments, 10 min read, LW link

Ideological Inference Engines: Making Deontology Differentiable*

Paul Bricman, 12 Sep 2022 12:00 UTC
6 points
0 comments, 14 min read, LW link

Levels of goals and alignment

zeshen, 16 Sep 2022 16:44 UTC
27 points
4 comments, 6 min read, LW link

Interlude: But Who Optimizes The Optimizer?

Paul Bricman, 23 Sep 2022 15:30 UTC
15 points
0 comments, 10 min read, LW link

Summary of ML Safety Course

zeshen, 27 Sep 2022 13:05 UTC
6 points
0 comments, 6 min read, LW link

My Thoughts on the ML Safety Course

zeshen, 27 Sep 2022 13:15 UTC
49 points
3 comments, 17 min read, LW link

(Structural) Stability of Coupled Optimizers

Paul Bricman, 30 Sep 2022 11:28 UTC
25 points
0 comments, 10 min read, LW link

Boolean Primitives for Coupled Optimizers

Paul Bricman, 7 Oct 2022 18:02 UTC
9 points
0 comments, 8 min read, LW link

my current outlook on AI risk mitigation

carado, 3 Oct 2022 20:06 UTC
58 points
4 comments, 11 min read, LW link
(carado.moe)

Cataloguing Priors in Theory and Practice

Paul Bricman, 13 Oct 2022 12:36 UTC
13 points
8 comments, 7 min read, LW link

Refine: what helped me write more?

Alexander Gietelink Oldenziel, 25 Oct 2022 14:44 UTC
12 points
0 comments, 2 min read, LW link

Embedding safety in ML development

zeshen, 31 Oct 2022 12:27 UTC
24 points
1 comment, 18 min read, LW link

A newcomer’s guide to the technical AI safety field

zeshen, 4 Nov 2022 14:29 UTC
30 points
1 comment, 10 min read, LW link