
Conjecture (org)

Tag · Last edit: 9 Mar 2023 2:09 UTC by Andrea_Miotti

Conjecture is an alignment startup founded by Connor Leahy, Sid Black and Gabriel Alfour, which aims to scale alignment research.

The initial directions of their research agenda include:

We Are Conjecture, A New Alignment Research Startup

Connor Leahy · 8 Apr 2022 11:40 UTC
197 points
25 comments · 4 min read · LW link

Connor Leahy on Dying with Dignity, EleutherAI and Conjecture

Michaël Trazzi · 22 Jul 2022 18:44 UTC
195 points
29 comments · 14 min read · LW link
(theinsideview.ai)

Epistemological Vigilance for Alignment

adamShimi · 6 Jun 2022 0:27 UTC
65 points
11 comments · 10 min read · LW link

Simulators

janus · 2 Sep 2022 12:45 UTC
596 points
161 comments · 41 min read · LW link · 8 reviews
(generative.ink)

Questions about Conjecture’s CoEm proposal

9 Mar 2023 19:32 UTC
51 points
4 comments · 2 min read · LW link

Conjecture internal survey: AGI timelines and probability of human extinction from advanced AI

Maris Sala · 22 May 2023 14:31 UTC
154 points
5 comments · 3 min read · LW link
(www.conjecture.dev)

Re-Examining LayerNorm

Eric Winsor · 1 Dec 2022 22:20 UTC
124 points
12 comments · 5 min read · LW link

Refine’s First Blog Post Day

adamShimi · 13 Aug 2022 10:23 UTC
55 points
3 comments · 1 min read · LW link

Cognitive Emulation: A Naive AI Safety Proposal

25 Feb 2023 19:35 UTC
197 points
45 comments · 4 min read · LW link

Searching for Search

28 Nov 2022 15:31 UTC
91 points
8 comments · 14 min read · LW link · 1 review

Conjecture: a retrospective after 8 months of work

23 Nov 2022 17:10 UTC
185 points
9 comments · 8 min read · LW link

Human decision processes are not well factored

17 Feb 2023 13:11 UTC
32 points
3 comments · 2 min read · LW link

Empathy as a natural consequence of learnt reward models

beren · 4 Feb 2023 15:35 UTC
46 points
27 comments · 13 min read · LW link

Critiques of prominent AI safety labs: Conjecture

Omega. · 12 Jun 2023 1:32 UTC
14 points
32 comments · 33 min read · LW link

AMA Conjecture, A New Alignment Startup

adamShimi · 9 Apr 2022 9:43 UTC
47 points
42 comments · 1 min read · LW link

Refine’s Second Blog Post Day

adamShimi · 20 Aug 2022 13:01 UTC
19 points
0 comments · 1 min read · LW link

What I Learned Running Refine

adamShimi · 24 Nov 2022 14:49 UTC
108 points
5 comments · 4 min read · LW link

The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable

28 Nov 2022 12:54 UTC
196 points
33 comments · 31 min read · LW link

AGI in sight: our look at the game board

18 Feb 2023 22:17 UTC
224 points
135 comments · 6 min read · LW link
(andreamiotti.substack.com)

Christiano (ARC) and GA (Conjecture) Discuss Alignment Cruxes

24 Feb 2023 23:03 UTC
60 points
7 comments · 47 min read · LW link

Japan AI Alignment Conference

10 Mar 2023 6:56 UTC
64 points
7 comments · 1 min read · LW link
(www.conjecture.dev)

How to Diversify Conceptual Alignment: the Model Behind Refine

adamShimi · 20 Jul 2022 10:44 UTC
87 points
11 comments · 8 min read · LW link

Mosaic and Palimpsests: Two Shapes of Research

adamShimi · 12 Jul 2022 9:05 UTC
39 points
3 comments · 9 min read · LW link

Refine: An Incubator for Conceptual Alignment Research Bets

adamShimi · 15 Apr 2022 8:57 UTC
144 points
13 comments · 4 min read · LW link

Circumventing interpretability: How to defeat mind-readers

Lee Sharkey · 14 Jul 2022 16:59 UTC
112 points
12 comments · 33 min read · LW link

Conjecture: Internal Infohazard Policy

29 Jul 2022 19:07 UTC
131 points
6 comments · 19 min read · LW link

Understanding Conjecture: Notes from Connor Leahy interview

Akash · 15 Sep 2022 18:37 UTC
107 points
23 comments · 15 min read · LW link

Methodological Therapy: An Agenda For Tackling Research Bottlenecks

22 Sep 2022 18:41 UTC
54 points
6 comments · 9 min read · LW link

Interpreting Neural Networks through the Polytope Lens

23 Sep 2022 17:58 UTC
136 points
29 comments · 33 min read · LW link

Mysteries of mode collapse

janus · 8 Nov 2022 10:37 UTC
282 points
57 comments · 14 min read · LW link · 1 review

Current themes in mechanistic interpretability research

16 Nov 2022 14:14 UTC
89 points
2 comments · 12 min read · LW link

Conjecture Second Hiring Round

23 Nov 2022 17:11 UTC
92 points
0 comments · 1 min read · LW link

The First Filter

26 Nov 2022 19:37 UTC
67 points
5 comments · 1 min read · LW link

Biases are engines of cognition

30 Nov 2022 16:47 UTC
45 points
7 comments · 1 min read · LW link

Tradeoffs in complexity, abstraction, and generality

12 Dec 2022 15:55 UTC
32 points
0 comments · 2 min read · LW link

Psychological Disorders and Problems

12 Dec 2022 18:15 UTC
39 points
6 comments · 1 min read · LW link

Mental acceptance and reflection

22 Dec 2022 14:32 UTC
34 points
1 comment · 2 min read · LW link

Basic Facts about Language Model Internals

4 Jan 2023 13:01 UTC
130 points
18 comments · 9 min read · LW link

AGI will have learnt utility functions

beren · 25 Jan 2023 19:42 UTC
35 points
3 comments · 13 min read · LW link

FLI Podcast: Connor Leahy on AI Progress, Chimps, Memes, and Markets (Part 1/3)

10 Feb 2023 13:55 UTC
39 points
0 comments · 43 min read · LW link

Don’t accelerate problems you’re trying to solve

15 Feb 2023 18:11 UTC
100 points
26 comments · 4 min read · LW link

No One-Size-Fit-All Epistemic Strategy

adamShimi · 20 Aug 2022 12:56 UTC
23 points
1 comment · 2 min read · LW link

Shapes of Mind and Pluralism in Alignment

adamShimi · 13 Aug 2022 10:01 UTC
33 points
2 comments · 2 min read · LW link

Abstracting The Hardness of Alignment: Unbounded Atomic Optimization

adamShimi · 29 Jul 2022 18:59 UTC
66 points
3 comments · 16 min read · LW link

Levels of Pluralism

adamShimi · 27 Jul 2022 9:35 UTC
34 points
0 comments · 14 min read · LW link

Robustness to Scaling Down: More Important Than I Thought

adamShimi · 23 Jul 2022 11:40 UTC
37 points
5 comments · 3 min read · LW link

Gradient hacking is extremely difficult

beren · 24 Jan 2023 15:45 UTC
161 points
22 comments · 5 min read · LW link

Why almost every RL agent does learned optimization

Lee Sharkey · 12 Feb 2023 4:58 UTC
32 points
3 comments · 5 min read · LW link

[Interim research report] Taking features out of superposition with sparse autoencoders

13 Dec 2022 15:41 UTC
142 points
22 comments · 22 min read · LW link · 2 reviews

Japan AI Alignment Conference Postmortem

20 Apr 2023 10:58 UTC
71 points
8 comments · 8 min read · LW link

Basic facts about language models during training

beren · 21 Feb 2023 11:46 UTC
96 points
14 comments · 18 min read · LW link

A response to Conjecture’s CoEm proposal

Kristian Freed · 24 Apr 2023 17:23 UTC
7 points
0 comments · 4 min read · LW link

A technical note on bilinear layers for interpretability

Lee Sharkey · 8 May 2023 6:06 UTC
54 points
0 comments · 1 min read · LW link
(arxiv.org)

Shah (DeepMind) and Leahy (Conjecture) Discuss Alignment Cruxes

1 May 2023 16:47 UTC
96 points
10 comments · 30 min read · LW link

My guess at Conjecture’s vision: triggering a narrative bifurcation

Alexandre Variengien · 6 Feb 2024 19:10 UTC
74 points
12 comments · 16 min read · LW link

A couple of questions about Conjecture’s Cognitive Emulation proposal

Igor Ivanov · 11 Apr 2023 14:05 UTC
30 points
1 comment · 3 min read · LW link

Barriers to Mechanistic Interpretability for AGI Safety

Connor Leahy · 29 Aug 2023 10:56 UTC
69 points
13 comments · 1 min read · LW link
(www.youtube.com)

Conjecture: A standing offer for public debates on AI

Andrea_Miotti · 16 Jun 2023 14:33 UTC
29 points
1 comment · 2 min read · LW link
(www.conjecture.dev)