AI Alignment Research Engineer Accelerator (ARENA): call for applicants

CallumMcDougall · Apr 17, 2023, 8:30 PM
100 points
9 comments · 7 min read · LW link

AI #8: People Can Do Reasonable Things

Zvi · Apr 20, 2023, 3:50 PM
100 points
16 comments · 55 min read · LW link
(thezvi.wordpress.com)

The Social Alignment Problem

irving · Apr 28, 2023, 2:16 PM
99 points
13 comments · 8 min read · LW link

Would we even want AI to solve all our problems?

So8res · Apr 21, 2023, 6:04 PM
98 points
15 comments · 2 min read · LW link

Given the Restrict Act, Don’t Ban TikTok

Zvi · Apr 4, 2023, 2:40 PM
97 points
9 comments · 4 min read · LW link
(thezvi.wordpress.com)

Why Simulator AIs want to be Active Inference AIs

Apr 10, 2023, 6:23 PM
96 points
9 comments · 8 min read · LW link · 1 review

Communicating effectively under Knightian norms

Richard_Ngo · Apr 3, 2023, 10:39 PM
96 points
54 comments · 6 min read · LW link

Scaffolded LLMs as natural language computers

beren · Apr 12, 2023, 10:47 AM
95 points
10 comments · 11 min read · LW link

Contra Yudkowsky on Doom from Foom #2

jacob_cannell · Apr 27, 2023, 12:07 AM
94 points
76 comments · 6 min read · LW link

Exposure to Lizardman is Lethal

Duncan Sabien (Inactive) · Apr 2, 2023, 6:57 PM
91 points
97 comments · 3 min read · LW link

Contra Yudkowsky on AI Doom

jacob_cannell · Apr 24, 2023, 12:20 AM
89 points
111 comments · 9 min read · LW link

Capabilities and alignment of LLM cognitive architectures

Seth Herd · Apr 18, 2023, 4:29 PM
88 points
18 comments · 20 min read · LW link

A Confession about the LessWrong Team

Ruby · Apr 1, 2023, 9:47 PM
87 points
5 comments · 2 min read · LW link

Singularities against the Singularity: Announcing Workshop on Singular Learning Theory and Alignment

Apr 1, 2023, 9:58 AM
87 points
0 comments · 1 min read · LW link
(singularlearningtheory.com)

You can use GPT-4 to create prompt injections against GPT-4

WitchBOT · Apr 6, 2023, 8:39 PM
87 points
8 comments · 2 min read · LW link

The Agency Overhang

Jeffrey Ladish · Apr 21, 2023, 7:47 AM
85 points
6 comments · 6 min read · LW link

No convincing evidence for gradient descent in activation space

Blaine · Apr 12, 2023, 4:48 AM
85 points
9 comments · 20 min read · LW link

The benevolence of the butcher

dr_s · Apr 8, 2023, 4:29 PM
84 points
33 comments · 6 min read · LW link · 1 review

AI Safety via Luck

Jozdien · Apr 1, 2023, 8:13 PM
82 points
7 comments · 11 min read · LW link

Polio Lab Leak Caught with Wastewater Sampling

Cullen · Apr 7, 2023, 1:06 AM
82 points
3 comments · LW link

Anthropic is further accelerating the Arms Race?

sapphire · Apr 6, 2023, 11:29 PM
82 points
22 comments · 1 min read · LW link
(techcrunch.com)

The surprising parameter efficiency of vision models

beren · Apr 8, 2023, 7:44 PM
81 points
28 comments · 4 min read · LW link

AISafety.world is a map of the AIS ecosystem

Hamish Doodles · Apr 6, 2023, 6:37 PM
80 points
0 comments · 1 min read · LW link

AI #6: Agents of Change

Zvi · Apr 6, 2023, 2:00 PM
79 points
13 comments · 47 min read · LW link
(thezvi.wordpress.com)

Introducing AlignmentSearch: An AI Alignment-Informed Conversional Agent

Apr 1, 2023, 4:39 PM
79 points
14 comments · 4 min read · LW link

My experience getting funding for my biological research

Metacelsus · Apr 16, 2023, 10:53 PM
78 points
10 comments · 5 min read · LW link
(denovo.substack.com)

Locating Fulcrum Experiences

LoganStrohl · Apr 28, 2023, 8:14 PM
78 points
31 comments · 17 min read · LW link

Introducing the Nuts and Bolts Of Naturalism

LoganStrohl · Apr 22, 2023, 6:31 PM
77 points
2 comments · 3 min read · LW link

Research agenda: Supervising AIs improving AIs

Apr 29, 2023, 5:09 PM
76 points
5 comments · 19 min read · LW link

Romance, misunderstanding, social stances, and the human LLM

Kaj_Sotala · Apr 27, 2023, 12:59 PM
75 points
32 comments · 16 min read · LW link

I was Wrong, Simulator Theory is Real

Robert_AIZI · Apr 26, 2023, 5:45 PM
75 points
7 comments · 3 min read · LW link
(aizi.substack.com)

The Computational Anatomy of Human Values

beren · Apr 6, 2023, 10:33 AM
74 points
30 comments · 30 min read · LW link

[Question] Is this true? @tyler_m_john: [If we had started using CFCs earlier, we would have ended most life on the planet]

tailcalled · Apr 10, 2023, 2:22 PM
73 points
15 comments · 1 min read · LW link
(twitter.com)

All images from the WaitButWhy sequence on AI

trevor · Apr 8, 2023, 7:36 AM
73 points
5 comments · 2 min read · LW link

Repugnant levels of violins

Solenoid_Entity · Apr 12, 2023, 5:11 PM
73 points
10 comments · 12 min read · LW link

The Toxoplasma of AGI Doom and Capabilities?

Robert_AIZI · Apr 24, 2023, 6:11 PM
72 points
12 comments · 1 min read · LW link

Japan AI Alignment Conference Postmortem

Apr 20, 2023, 10:58 AM
71 points
8 comments · 8 min read · LW link

Power laws in Speedrunning and Machine Learning

Apr 24, 2023, 10:06 AM
71 points
1 comment · 1 min read · LW link
(arxiv.org)

SmartyHeaderCode: anomalous tokens for GPT3.5 and GPT-4

AdamYedidia · Apr 15, 2023, 10:35 PM
71 points
18 comments · 6 min read · LW link

SERI MATS - Summer 2023 Cohort

Apr 8, 2023, 3:32 PM
71 points
25 comments · 4 min read · LW link

A decade of lurking, a month of posting

Max H · Apr 9, 2023, 12:21 AM
70 points
4 comments · 5 min read · LW link

[Linkpost] Sam Altman’s 2015 Blog Posts Machine Intelligence Parts 1 & 2

OliviaJ · Apr 28, 2023, 4:02 PM
70 points
4 comments · 9 min read · LW link

Approximation is expensive, but the lunch is cheap

Apr 19, 2023, 2:19 PM
70 points
3 comments · 16 min read · LW link

AGI ruin mostly rests on strong claims about alignment and deployment, not about society

Rob Bensinger · Apr 24, 2023, 1:06 PM
70 points
8 comments · 6 min read · LW link

Getting Started With Naturalism

LoganStrohl · Apr 23, 2023, 9:02 PM
69 points
4 comments · 11 min read · LW link · 1 review

Why Are Maximum Entropy Distributions So Ubiquitous?

johnswentworth · Apr 5, 2023, 8:12 PM
68 points
6 comments · 9 min read · LW link

Mechanistically interpreting time in GPT-2 small

Apr 16, 2023, 5:57 PM
68 points
6 comments · 21 min read · LW link

Subscripts for Probabilities

niplav · Apr 13, 2023, 6:32 PM
67 points
9 comments · 5 min read · LW link

Green goo is plausible

anithite · Apr 18, 2023, 12:04 AM
67 points
31 comments · 4 min read · LW link · 1 review

On “aiming for convergence on truth”

gjm · Apr 11, 2023, 6:19 PM
67 points
55 comments · 13 min read · LW link