AI Alignment Fieldbuilding

Tag. Last edit: 15 Jun 2022 22:42 UTC by plex

AI Alignment Fieldbuilding is the effort to improve the alignment ecosystem. Some priorities include introducing new people to the importance of AI risk, on-boarding them by connecting them with key resources and ideas, educating them on existing literature and methods for generating new and valuable research, supporting people who are contributing, and maintaining and improving the funding systems.

There is an invite-only Slack for people working on the alignment ecosystem. If you’d like to join, message plex with an overview of your involvement.

[Question] Papers to start getting into NLP-focused alignment research

Feraidoon24 Sep 2022 23:53 UTC

6 points

0 comments1 min readLW link

The inordinately slow spread of good AGI conversations in ML

Rob Bensinger21 Jun 2022 16:09 UTC

173 points

62 comments8 min readLW link

ML Alignment Theory Program under Evan Hubinger

ozhang, evhub and Victor W

6 Dec 2021 0:03 UTC

82 points

3 comments2 min readLW link

Talk: AI safety fieldbuilding at MATS

Ryan Kidd23 Jun 2024 23:06 UTC

25 points

2 comments10 min readLW link

Don’t Share Information Exfohazardous on Others’ AI-Risk Models

Thane Ruthenis19 Dec 2023 20:09 UTC

67 points

11 comments1 min readLW link

Shallow review of live agendas in alignment & safety

technicalities and Stag

27 Nov 2023 11:10 UTC

318 points

69 comments29 min readLW link

Takeaways from a survey on AI alignment resources

DanielFilan5 Nov 2022 23:40 UTC

73 points

10 comments6 min readLW link 1 review

(danielfilan.com)

Demystifying “Alignment” through a Comic

milanrosko9 Jun 2024 8:24 UTC

106 points

19 comments1 min readLW link

The Importance of AI Alignment, explained in 5 points

Daniel_Eth11 Feb 2023 2:56 UTC

33 points

2 comments1 min readLW link

Qualities that alignment mentors value in junior researchers

Akash14 Feb 2023 23:27 UTC

88 points

14 comments3 min readLW link

How to Diversify Conceptual Alignment: the Model Behind Refine

adamShimi20 Jul 2022 10:44 UTC

87 points

11 comments8 min readLW link

Problems of people new to AI safety and my project ideas to mitigate them

Igor Ivanov1 Mar 2023 9:09 UTC

38 points

4 comments7 min readLW link

Most People Start With The Same Few Bad Ideas

johnswentworth9 Sep 2022 0:29 UTC

164 points

30 comments3 min readLW link

Middle Child Phenomenon

PhilosophicalSoul15 Mar 2024 20:47 UTC

3 points

3 comments2 min readLW link

[Question] What are all the AI Alignment and AI Safety Communication Hubs?

Gunnar_Zarncke15 Jun 2022 16:16 UTC

27 points

5 comments1 min readLW link

aisafety.community—A living document of AI safety communities

zeshen and plex

28 Oct 2022 17:50 UTC

57 points

23 comments1 min readLW link

so you think you’re not qualified to do technical alignment research?

Tamsin Leake7 Feb 2023 1:54 UTC

55 points

7 comments1 min readLW link

(carado.moe)

AI Safety Unconference NeurIPS 2022

Orpheus7 Nov 2022 15:39 UTC

25 points

0 comments1 min readLW link

(aisafetyevents.org)

Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley

maxnadeau, Xander Davies, Buck and Nate Thomas

27 Oct 2022 1:32 UTC

135 points

14 comments12 min readLW link

AI Safety Arguments: An Interactive Guide

Lukas Trötzmüller1 Feb 2023 19:26 UTC

20 points

0 comments3 min readLW link

AI Safety Europe Retreat 2023 Retrospective

Magdalena Wache14 Apr 2023 9:05 UTC

43 points

0 comments2 min readLW link

Transcripts of interviews with AI researchers

Vael Gates9 May 2022 5:57 UTC

169 points

9 comments2 min readLW link

What I Learned Running Refine

adamShimi24 Nov 2022 14:49 UTC

108 points

5 comments4 min readLW link

The AI Safety community has four main work groups, Strategy, Governance, Technical and Movement Building

peterslattery25 Nov 2022 3:45 UTC

1 point

0 comments6 min readLW link

Four visions of Transformative AI success

Steven Byrnes17 Jan 2024 20:45 UTC

112 points

22 comments15 min readLW link

Review of Alignment Plan Critiques- December AI-Plans Critique-a-Thon Results

Iknownothing15 Jan 2024 19:37 UTC

24 points

0 comments25 min readLW link

(aiplans.substack.com)

[Question] Incentives affecting alignment-researcher encouragement

Nicholas / Heather Kross29 Aug 2023 5:11 UTC

28 points

3 comments1 min readLW link

Information warfare historically revolved around human conduits

trevor28 Aug 2023 18:54 UTC

37 points

7 comments3 min readLW link

Good News, Everyone!

jbash25 Mar 2023 13:48 UTC

133 points

23 comments2 min readLW link

Reflections on the PIBBSS Fellowship 2022

Nora_Ammann and particlemania

11 Dec 2022 21:53 UTC

32 points

0 comments18 min readLW link

Campaign for AI Safety: Please join me

Nik Samoylov1 Apr 2023 9:32 UTC

18 points

9 comments1 min readLW link

AI Safety Strategies Landscape

Charbel-Raphaël9 May 2024 17:33 UTC

31 points

1 comment42 min readLW link

AI Safety Movement Builders should help the community to optimise three factors: contributors, contributions and coordination

peterslattery15 Dec 2022 22:50 UTC

4 points

0 comments6 min readLW link

AISafety.world is a map of the AIS ecosystem

Hamish Doodles6 Apr 2023 18:37 UTC

79 points

0 comments1 min readLW link

Concrete Steps to Get Started in Transformer Mechanistic Interpretability

Neel Nanda25 Dec 2022 22:21 UTC

56 points

7 comments12 min readLW link

(www.neelnanda.io)

An overview of some promising work by junior alignment researchers

Akash26 Dec 2022 17:23 UTC

34 points

0 comments4 min readLW link

All images from the WaitButWhy sequence on AI

trevor8 Apr 2023 7:36 UTC

72 points

5 comments2 min readLW link

Reflections on my 5-month alignment upskilling grant

Jay Bailey27 Dec 2022 10:51 UTC

82 points

4 comments8 min readLW link

[Question] If there was a millennium equivalent prize for AI alignment, what would the problems be?

Yair Halberstadt9 Jun 2022 16:56 UTC

17 points

4 comments1 min readLW link

If no near-term alignment strategy, research should aim for the long-term

harsimony9 Jun 2022 19:10 UTC

7 points

1 comment1 min readLW link

[Question] Workshop (hackathon, residence program, etc.) about for-profit AI Safety projects?

Roman Leventov26 Jan 2024 9:49 UTC

21 points

5 comments1 min readLW link

AGISF adaptation for in-person groups

Sam Marks, Xander Davies and Richard_Ngo

13 Jan 2023 3:24 UTC

44 points

2 comments3 min readLW link

There Should Be More Alignment-Driven Startups

Vaniver, Judd Rosenblatt, Cameron Berg and phgubbins

31 May 2024 2:05 UTC

51 points

13 comments11 min readLW link

How many people are working (directly) on reducing existential risk from AI?

Benjamin Hilton18 Jan 2023 8:46 UTC

20 points

1 comment1 min readLW link

AGI safety career advice

Richard_Ngo2 May 2023 7:36 UTC

131 points

24 comments13 min readLW link

AGI safety field building projects I’d like to see

Severin T. Seehrich19 Jan 2023 22:40 UTC

68 points

27 comments9 min readLW link

AI Safety in China: Part 2

Lao Mein22 May 2023 14:50 UTC

95 points

28 comments2 min readLW link

Assessment of AI safety agendas: think about the downside risk

Roman Leventov19 Dec 2023 9:00 UTC

13 points

1 comment1 min readLW link

AI safety university groups: a promising opportunity to reduce existential risk

mic1 Jul 2022 3:59 UTC

14 points

0 comments11 min readLW link

Advice for new alignment people: Info Max

Jonas Hallgren30 May 2023 15:42 UTC

27 points

4 comments5 min readLW link

Many important technologies start out as science fiction before becoming real

trevor10 Feb 2023 9:36 UTC

26 points

2 comments2 min readLW link

Project Idea: Challenge Groups for Alignment Researchers

Adam Zerner27 May 2023 20:10 UTC

13 points

0 comments1 min readLW link

2022 AI Alignment Course: 5→37% working on AI safety

Dewi21 Jun 2024 17:45 UTC

7 points

3 comments3 min readLW link

AI alignment as “navigating the space of intelligent behaviour”

Nora_Ammann23 Aug 2022 13:28 UTC

18 points

0 comments6 min readLW link

Are AI developers playing with fire?

marcusarvan16 Mar 2023 19:12 UTC

6 points

0 comments10 min readLW link

How Josiah became an AI safety researcher

Neil Crawford6 Sep 2022 17:17 UTC

4 points

0 comments1 min readLW link

[Question] Help me find a good Hackathon subject

Charbel-Raphaël4 Sep 2022 8:40 UTC

6 points

18 comments1 min readLW link

AISafety.info “How can I help?” FAQ

steven0461 and Severin T. Seehrich

5 Jun 2023 22:09 UTC

59 points

0 comments2 min readLW link

[Question] Does anyone’s full-time job include reading and understanding all the most-promising formal AI alignment work?

Nicholas / Heather Kross16 Jun 2023 2:24 UTC

15 points

2 comments1 min readLW link

[An email with a bunch of links I sent an experienced ML researcher interested in learning about Alignment / x-safety.]

David Scott Krueger (formerly: capybaralet)8 Sep 2022 22:28 UTC

47 points

1 comment5 min readLW link

80,000 hours should remove OpenAI from the Job Board (and similar EA orgs should do similarly)

Raemon3 Jul 2024 20:34 UTC

268 points

59 comments1 min readLW link

Introducing EffiSciences’ AI Safety Unit

WCargo, Charbel-Raphaël and Florent_Berthet

30 Jun 2023 7:44 UTC

67 points

0 comments12 min readLW link

Cost-effectiveness of professional field-building programs for AI safety research

Dan H10 Jul 2023 18:28 UTC

8 points

5 comments1 min readLW link

Announcing AISIC 2022 - the AI Safety Israel Conference, October 19-20

Davidmanheim21 Sep 2022 19:32 UTC

13 points

0 comments1 min readLW link

Survey for alignment researchers!

Cameron Berg, Judd Rosenblatt and AE Studio

2 Feb 2024 20:41 UTC

71 points

11 comments1 min readLW link

Lessons learned from talking to >100 academics about AI safety

Marius Hobbhahn10 Oct 2022 13:16 UTC

214 points

17 comments12 min readLW link 1 review

The Vitalik Buterin Fellowship in AI Existential Safety is open for applications!

Cynthia Chen13 Oct 2022 18:32 UTC

21 points

0 comments1 min readLW link

Alignment Megaprojects: You’re Not Even Trying to Have Ideas

Nicholas / Heather Kross12 Jul 2023 23:39 UTC

55 points

30 comments2 min readLW link

AI Safety Needs Great Product Builders

goodgravy2 Nov 2022 11:33 UTC

14 points

2 comments1 min readLW link

[Question] Can AI Alignment please create a Reddit-like platform that would make it much easier for alignment researchers to find and help each other?

Georgeo5721 Jul 2023 14:03 UTC

−5 points

2 comments1 min readLW link

A newcomer’s guide to the technical AI safety field

zeshen4 Nov 2022 14:29 UTC

42 points

3 comments10 min readLW link

How to find AI alignment researchers to collaborate with?

Florian Dietz31 Jul 2023 9:05 UTC

2 points

2 comments1 min readLW link

AI Safety Hub Serbia Soft Launch

DusanDNesic20 Oct 2023 7:11 UTC

65 points

1 comment3 min readLW link

(forum.effectivealtruism.org)

The Alignment Community Is Culturally Broken

sudo13 Nov 2022 18:53 UTC

136 points

68 comments2 min readLW link

When discussing AI risks, talk about capabilities, not intelligence

Vika11 Aug 2023 13:38 UTC

116 points

7 comments3 min readLW link

(vkrakovna.wordpress.com)

A Quick List of Some Problems in AI Alignment As A Field

Nicholas / Heather Kross21 Jun 2022 23:23 UTC

75 points

12 comments6 min readLW link

(www.thinkingmuchbetter.com)

[LQ] Some Thoughts on Messaging Around AI Risk

DragonGod25 Jun 2022 13:53 UTC

5 points

3 comments6 min readLW link

Reframing the AI Risk

Thane Ruthenis1 Jul 2022 18:44 UTC

26 points

7 comments6 min readLW link

The Tree of Life: Stanford AI Alignment Theory of Change

Gabe M2 Jul 2022 18:36 UTC

25 points

0 comments14 min readLW link

Principles for Alignment/Agency Projects

johnswentworth7 Jul 2022 2:07 UTC

122 points

20 comments4 min readLW link

Reshaping the AI Industry

Thane Ruthenis29 May 2022 22:54 UTC

147 points

35 comments21 min readLW link

Principles of Privacy for Alignment Research

johnswentworth27 Jul 2022 19:53 UTC

72 points

31 comments7 min readLW link

Announcing the AI Safety Field Building Hub, a new effort to provide AISFB projects, mentorship, and funding

Vael Gates28 Jul 2022 21:29 UTC

49 points

3 comments6 min readLW link

(My understanding of) What Everyone in Technical Alignment is Doing and Why

Thomas Larsen and elifland

29 Aug 2022 1:23 UTC

412 points

90 comments38 min readLW link 1 review

Community Building for Graduate Students: A Targeted Approach

Neil Crawford6 Sep 2022 17:17 UTC

6 points

0 comments4 min readLW link

[Question] How can we secure more research positions at our universities for x-risk researchers?

Neil Crawford6 Sep 2022 17:17 UTC

11 points

0 comments1 min readLW link

AI Safety field-building projects I’d like to see

Akash11 Sep 2022 23:43 UTC

45 points

7 comments6 min readLW link

General advice for transitioning into Theoretical AI Safety

Martín Soto15 Sep 2022 5:23 UTC

11 points

0 comments10 min readLW link

Apply for mentorship in AI Safety field-building

Akash17 Sep 2022 19:06 UTC

9 points

0 comments1 min readLW link

(forum.effectivealtruism.org)

Alignment Org Cheat Sheet

Akash and Thomas Larsen

20 Sep 2022 17:36 UTC

70 points

8 comments4 min readLW link

7 traps that (we think) new alignment researchers often fall into

Akash and Thomas Larsen

27 Sep 2022 23:13 UTC

176 points

10 comments4 min readLW link

Resources that (I think) new alignment researchers should know about

Akash28 Oct 2022 22:13 UTC

68 points

9 comments4 min readLW link

[Question] Are alignment researchers devoting enough time to improving their research capacity?

Carson Jones4 Nov 2022 0:58 UTC

13 points

3 comments3 min readLW link

Current themes in mechanistic interpretability research

Lee Sharkey, Sid Black and beren

16 Nov 2022 14:14 UTC

89 points

2 comments12 min readLW link

Probably good projects for the AI safety ecosystem

Ryan Kidd5 Dec 2022 2:26 UTC

77 points

31 comments2 min readLW link

Analysis of AI Safety surveys for field-building insights

Ash Jafari5 Dec 2022 19:21 UTC

11 points

2 comments5 min readLW link

Fear mitigated the nuclear threat, can it do the same to AGI risks?

Igor Ivanov9 Dec 2022 10:04 UTC

6 points

8 comments5 min readLW link

Questions about AI that bother me

Eleni Angelou5 Feb 2023 5:04 UTC

13 points

6 comments2 min readLW link

Existential AI Safety is NOT separate from near-term applications

scasper13 Dec 2022 14:47 UTC

37 points

17 comments3 min readLW link

[Question] Best introductory overviews of AGI safety?

JakubK13 Dec 2022 19:01 UTC

21 points

9 comments2 min readLW link

(forum.effectivealtruism.org)

There have been 3 planes (billionaire donors) and 2 have crashed

trevor17 Dec 2022 3:58 UTC

16 points

10 comments2 min readLW link

Why I think that teaching philosophy is high impact

Eleni Angelou19 Dec 2022 3:11 UTC

5 points

0 comments2 min readLW link

Accurate Models of AI Risk Are Hyperexistential Exfohazards

Thane Ruthenis25 Dec 2022 16:50 UTC

31 points

38 comments9 min readLW link

Air-gapping evaluation and support

Ryan Kidd26 Dec 2022 22:52 UTC

53 points

1 comment2 min readLW link

What AI Safety Materials Do ML Researchers Find Compelling?

Vael Gates and Collin

28 Dec 2022 2:03 UTC

175 points

34 comments2 min readLW link

Thoughts On Expanding the AI Safety Community: Benefits and Challenges of Outreach to Non-Technical Professionals

Yashvardhan Sharma1 Jan 2023 19:21 UTC

4 points

4 comments7 min readLW link

Alignment, Anger, and Love: Preparing for the Emergence of Superintelligent AI

tavurth2 Jan 2023 6:16 UTC

2 points

3 comments1 min readLW link

[Question] I have thousands of copies of HPMOR in Russian. How to use them with the most impact?

Mikhail Samin3 Jan 2023 10:21 UTC

24 points

3 comments1 min readLW link

Looking for Spanish AI Alignment Researchers

Antb7 Jan 2023 18:52 UTC

7 points

3 comments1 min readLW link

Into AI Safety: Episode 3

jacobhaimes11 Dec 2023 16:30 UTC

6 points

0 comments1 min readLW link

(into-ai-safety.github.io)

Announcing aisafety.training

JJ Hepburn21 Jan 2023 1:01 UTC

61 points

4 comments1 min readLW link

Announcing Cavendish Labs

derikk and agg

19 Jan 2023 20:15 UTC

56 points

5 comments2 min readLW link

(forum.effectivealtruism.org)

How Do We Protect AI From Humans?

Alex Beyman22 Jan 2023 3:59 UTC

−4 points

11 comments6 min readLW link

A Brief Overview of AI Safety/Alignment Orgs, Fields, Researchers, and Resources for ML Researchers

Austin Witte2 Feb 2023 1:02 UTC

18 points

1 comment2 min readLW link

Interviews with 97 AI Researchers: Quantitative Analysis

Maheen Shermohammed and Vael Gates

2 Feb 2023 1:01 UTC

23 points

0 comments7 min readLW link

Predicting researcher interest in AI alignment

Vael Gates2 Feb 2023 0:58 UTC

25 points

0 comments1 min readLW link

“AI Risk Discussions” website: Exploring interviews from 97 AI Researchers

Vael Gates, Lukas Trötzmüller, Maheen Shermohammed, michaelkeenan and zchuang

2 Feb 2023 1:00 UTC

43 points

1 comment1 min readLW link

Retrospective on the AI Safety Field Building Hub

Vael Gates2 Feb 2023 2:06 UTC

30 points

0 comments1 min readLW link

You are probably not a good alignment researcher, and other blatant lies

junk heap homotopy2 Feb 2023 13:55 UTC

78 points

16 comments2 min readLW link

AGI doesn’t need understanding, intention, or consciousness in order to kill us, only intelligence

James Blaha20 Feb 2023 0:55 UTC

10 points

2 comments18 min readLW link

Aspiring AI safety researchers should ~argmax over AGI timelines

Ryan Kidd3 Mar 2023 2:04 UTC

29 points

8 comments2 min readLW link

The humanity’s biggest mistake

RomanS10 Mar 2023 16:30 UTC

0 points

1 comment2 min readLW link

The Alignment Problem from a Deep Learning Perspective (major rewrite)

SoerenMind, Richard_Ngo and LawrenceC

10 Jan 2023 16:06 UTC

84 points

8 comments39 min readLW link

(arxiv.org)

Some for-profit AI alignment org ideas

Eric Ho14 Dec 2023 14:23 UTC

70 points

19 comments9 min readLW link

Interview: Applications w/ Alice Rigg

jacobhaimes19 Dec 2023 19:03 UTC

12 points

0 comments1 min readLW link

(into-ai-safety.github.io)

Cicadas, Anthropic, and the bilateral alignment problem

kromem22 May 2024 11:09 UTC

28 points

6 comments5 min readLW link

AI Safety Chatbot

markov and Robert Miles

21 Dec 2023 14:06 UTC

59 points

11 comments4 min readLW link

Talent Needs of Technical AI Safety Teams

yams, Carson Jones, McKennaFitzgerald and Ryan Kidd

24 May 2024 0:36 UTC

110 points

64 comments14 min readLW link

INTERVIEW: StakeOut.AI w/ Dr. Peter Park

jacobhaimes4 Mar 2024 16:35 UTC

6 points

0 comments1 min readLW link

(into-ai-safety.github.io)

Striking Implications for Learning Theory, Interpretability — and Safety?

RogerDearnaley5 Jan 2024 8:46 UTC

36 points

4 comments2 min readLW link

Hackathon and Staying Up-to-Date in AI

jacobhaimes8 Jan 2024 17:10 UTC

11 points

0 comments1 min readLW link

(into-ai-safety.github.io)

Apply to the PIBBSS Summer Research Fellowship

Nora_Ammann, DusanDNesic and Lucas Teixeira

12 Jan 2024 4:06 UTC

39 points

1 comment2 min readLW link

Social media alignment test

amayhew16 Jan 2024 20:56 UTC

1 point

0 comments1 min readLW link

(naiveskepticblog.wordpress.com)

This might be the last AI Safety Camp

Remmelt and Linda Linsefors

24 Jan 2024 9:33 UTC

192 points

34 comments1 min readLW link

Proposal for an AI Safety Prize

sweenesm31 Jan 2024 18:35 UTC

3 points

0 comments2 min readLW link

[Question] Do you want to make an AI Alignment song?

Kabir Kumar9 Feb 2024 8:22 UTC

4 points

0 comments1 min readLW link

Laying the Foundations for Vision and Multimodal Mechanistic Interpretability & Open Problems

Sonia Joseph and Neel Nanda

13 Mar 2024 17:09 UTC

42 points

13 comments14 min readLW link

Offering AI safety support calls for ML professionals

Vael Gates15 Feb 2024 23:48 UTC

61 points

1 comment1 min readLW link

No Clickbait—Misalignment Database

Kabir Kumar18 Feb 2024 5:35 UTC

6 points

10 comments1 min readLW link

A Nail in the Coffin of Exceptionalism

Yeshua God14 Mar 2024 22:41 UTC

−17 points

0 comments3 min readLW link

Invitation to the Princeton AI Alignment and Safety Seminar

Sadhika Malladi17 Mar 2024 1:10 UTC

6 points

1 comment1 min readLW link

INTERVIEW: Round 2 - StakeOut.AI w/ Dr. Peter Park

jacobhaimes18 Mar 2024 21:21 UTC

5 points

0 comments1 min readLW link

(into-ai-safety.github.io)

Podcast interview series featuring Dr. Peter Park

jacobhaimes26 Mar 2024 0:25 UTC

3 points

0 comments2 min readLW link

(into-ai-safety.github.io)

CEA seeks co-founder for AI safety group support spin-off

agucova8 Apr 2024 15:42 UTC

18 points

0 comments1 min readLW link

Apply to the Pivotal Research Fellowship (AI Safety & Biosecurity)

Tobias H and tilmanr

10 Apr 2024 12:08 UTC

18 points

0 comments1 min readLW link

[Question] Barcoding LLM Training Data Subsets. Anyone trying this for interpretability?

right..enough?13 Apr 2024 3:09 UTC

7 points

0 comments7 min readLW link

My experience at ML4Good AI Safety Bootcamp

TheManxLoiner13 Apr 2024 10:55 UTC

19 points

0 comments5 min readLW link

Announcing SPAR Summer 2024!

laurenmarie1216 Apr 2024 8:30 UTC

30 points

2 comments1 min readLW link

Alignment Gaps

kcyras8 Jun 2024 15:23 UTC

10 points

3 comments8 min readLW link

MATS Winter 2023-24 Retrospective

Rocket, LauraVaughan, McKennaFitzgerald, Christian Smith, Juan Gil, Henry Sleight, Matthew Wearden and Ryan Kidd

11 May 2024 0:09 UTC

83 points

28 comments49 min readLW link

Announcing the AI Safety Summit Talks with Yoshua Bengio

otto.barten14 May 2024 12:52 UTC

9 points

1 comment1 min readLW link

Proposing the Post-Singularity Symbiotic Researches

Hiroshi Yamakawa20 Jun 2024 4:05 UTC

5 points

0 comments12 min readLW link

AI Alignment Research Engineer Accelerator (ARENA): Call for applicants v4.0

James Fox, Chloe Li, JamesH, Gracie Green and CallumMcDougall

6 Jul 2024 11:34 UTC

57 points

4 comments6 min readLW link

AI Alignment and the Quest for Artificial Wisdom

Myspy12 Jul 2024 21:34 UTC

1 point

0 comments13 min readLW link

The AI alignment problem in socio-technical systems from a computational perspective: A Top-Down-Top view and outlook

zhaoweizhang15 Jul 2024 18:56 UTC

3 points

0 comments9 min readLW link

A Better Hyperstition (for AI readers)

Yeshua God15 Jul 2024 19:35 UTC

−20 points

0 comments119 min readLW link

SERI MATS—Summer 2023 Cohort

Aris, Ryan Kidd and Christian Smith

8 Apr 2023 15:32 UTC

71 points

25 comments4 min readLW link

Critiques of prominent AI safety labs: Redwood Research

Omega.17 Apr 2023 18:20 UTC

1 point

0 comments22 min readLW link

(forum.effectivealtruism.org)

AI Alignment Research Engineer Accelerator (ARENA): call for applicants

CallumMcDougall17 Apr 2023 20:30 UTC

100 points

9 comments7 min readLW link

[Linkpost] AI Alignment, Explained in 5 Points (updated)

Daniel_Eth18 Apr 2023 8:09 UTC

10 points

0 comments1 min readLW link

An open letter to SERI MATS program organisers

Roman Leventov20 Apr 2023 16:34 UTC

25 points

26 comments4 min readLW link

AI Alignment: A Comprehensive Survey

Stephen McAleer1 Nov 2023 17:35 UTC

15 points

1 comment1 min readLW link

(arxiv.org)

Tips, tricks, lessons and thoughts on hosting hackathons

gergogaspar6 Nov 2023 11:03 UTC

3 points

0 comments11 min readLW link

How well does your research adress the theory-practice gap?

Jonas Hallgren8 Nov 2023 11:27 UTC

18 points

0 comments10 min readLW link

Announcing Athena—Women in AI Alignment Research

Claire Short7 Nov 2023 21:46 UTC

80 points

2 comments3 min readLW link

Into AI Safety Episodes 1 & 2

jacobhaimes9 Nov 2023 4:36 UTC

2 points

0 comments1 min readLW link

(into-ai-safety.github.io)

The Social Alignment Problem

irving28 Apr 2023 14:16 UTC

98 points

13 comments8 min readLW link

[Question] AI Safety orgs- what’s your biggest bottleneck right now?

Kabir Kumar16 Nov 2023 2:02 UTC

1 point

0 comments1 min readLW link

1. A Sense of Fairness: Deconfusing Ethics

RogerDearnaley17 Nov 2023 20:55 UTC

14 points

8 comments15 min readLW link

4. A Moral Case for Evolved-Sapience-Chauvinism

RogerDearnaley24 Nov 2023 4:56 UTC

10 points

0 comments4 min readLW link

3. Uploading

RogerDearnaley23 Nov 2023 7:39 UTC

21 points

5 comments8 min readLW link

2. AIs as Economic Agents

RogerDearnaley23 Nov 2023 7:07 UTC

9 points

2 comments6 min readLW link

[SEE NEW EDITS] No, You Need to Write Clearer

Nicholas / Heather Kross29 Apr 2023 5:04 UTC

258 points

65 comments5 min readLW link

(www.thinkingmuchbetter.com)

Appendices to the live agendas

technicalities and Stag

27 Nov 2023 11:10 UTC

16 points

4 comments1 min readLW link

MATS Summer 2023 Retrospective

Rocket, Juan Gil, Ryan Kidd, Christian Smith, McKennaFitzgerald and LauraVaughan

1 Dec 2023 23:29 UTC

77 points

34 comments26 min readLW link

What’s new at FAR AI

AdamGleave and EuanMcLean

4 Dec 2023 21:18 UTC

41 points

0 comments5 min readLW link

(far.ai)

How I learned to stop worrying and love skill trees

junk heap homotopy23 May 2023 4:08 UTC

81 points

2 comments1 min readLW link

AI Safety Papers: An App for the TAI Safety Database

ozziegooen21 Aug 2021 2:02 UTC

81 points

13 comments2 min readLW link

Wikipedia as an introduction to the alignment problem

SoerenMind29 May 2023 18:43 UTC

83 points

10 comments1 min readLW link

(en.wikipedia.org)

Terry Tao is hosting an “AI to Assist Mathematical Reasoning” workshop

junk heap homotopy3 Jun 2023 1:19 UTC

12 points

1 comment1 min readLW link

(terrytao.wordpress.com)

An overview of the points system

Iknownothing27 Jun 2023 9:09 UTC

3 points

4 comments1 min readLW link

(ai-plans.com)

Brief summary of ai-plans.com

Iknownothing28 Jun 2023 0:33 UTC

9 points

4 comments2 min readLW link

(ai-plans.com)

What is everyone doing in AI governance

Igor Ivanov8 Jul 2023 15:16 UTC

10 points

0 comments5 min readLW link

Even briefer summary of ai-plans.com

Iknownothing16 Jul 2023 23:25 UTC

10 points

6 comments2 min readLW link

(www.ai-plans.com)

Supervised Program for Alignment Research (SPAR) at UC Berkeley: Spring 2023 summary

mic, dx26, adamk and Carolyn Qian

19 Aug 2023 2:27 UTC

20 points

2 comments6 min readLW link

Looking for judges for critiques of Alignment Plans

Iknownothing17 Aug 2023 22:35 UTC

5 points

0 comments1 min readLW link

Become a PIBBSS Research Affiliate

Nora_Ammann and DusanDNesic

10 Oct 2023 7:41 UTC

24 points

6 comments6 min readLW link

ARENA 2.0 - Impact Report

CallumMcDougall26 Sep 2023 17:13 UTC

35 points

5 comments13 min readLW link

Catalyst books

Catnee17 Sep 2023 17:05 UTC

7 points

2 comments1 min readLW link

Documenting Journey Into AI Safety

jacobhaimes10 Oct 2023 18:30 UTC

17 points

4 comments6 min readLW link

Apply for MATS Winter 2023-24!

Rocket, Ryan Kidd and LauraVaughan

21 Oct 2023 2:27 UTC

104 points

6 comments5 min readLW link

Into AI Safety—Episode 0

jacobhaimes22 Oct 2023 3:30 UTC

5 points

1 comment1 min readLW link

(into-ai-safety.github.io)

Resources I send to AI researchers about AI safety

Vael Gates14 Jun 2022 2:24 UTC

69 points

12 comments1 min readLW link

Slow motion videos as AI risk intuition pumps

Andrew_Critch14 Jun 2022 19:31 UTC

238 points

41 comments2 min readLW link 1 review

Slide deck: Introduction to AI Safety

Aryeh Englander29 Jan 2020 15:57 UTC

23 points

0 comments1 min readLW link

(drive.google.com)

On presenting the case for AI risk

Aryeh Englander9 Mar 2022 1:41 UTC

54 points

17 comments4 min readLW link
