
AI-Assisted Alignment

Last edit: 20 May 2025 14:11 UTC by niplav

AI-Assisted Alignment is a cluster of alignment plans in which AI significantly helps with alignment research. This can range from weak tool AI to more advanced AGI doing original research.

There has been some debate about how practical this alignment approach is.

AI systems will likely try to solve alignment for their modifications and/or successors during a phase of self-improvement.

Other search terms for this tag: AI aligning AI, automated AI alignment, automated alignment research

A “Bitter Lesson” Approach to Aligning AGI and ASI

RogerDearnaley, 6 Jul 2024 1:23 UTC
64 points
41 comments · 24 min read · LW link

Requirements for a Basin of Attraction to Alignment

RogerDearnaley, 14 Feb 2024 7:10 UTC
41 points
12 comments · 31 min read · LW link

The Best Way to Align an LLM: Is Inner Alignment Now a Solved Problem?

RogerDearnaley, 28 May 2025 6:21 UTC
24 points
34 comments · 9 min read · LW link

Proposed Alignment Technique: OSNR (Output Sanitization via Noising and Reconstruction) for Safer Usage of Potentially Misaligned AGI

sudo, 29 May 2023 1:35 UTC
14 points
9 comments · 6 min read · LW link

We have to Upgrade

Jed McCaleb, 23 Mar 2023 17:53 UTC
131 points
35 comments · 2 min read · LW link

[Link] Why I’m optimistic about OpenAI’s alignment approach

janleike, 5 Dec 2022 22:51 UTC
98 points
15 comments · 1 min read · LW link
(aligned.substack.com)

Beliefs and Disagreements about Automating Alignment Research

Ian McKenzie, 24 Aug 2022 18:37 UTC
107 points
4 comments · 7 min read · LW link

How to Control an LLM’s Behavior (why my P(DOOM) went down)

RogerDearnaley, 28 Nov 2023 19:56 UTC
65 points
30 comments · 11 min read · LW link

Infinite Possibility Space and the Shutdown Problem

magfrump, 18 Oct 2022 5:37 UTC
9 points
0 comments · 2 min read · LW link
(www.magfrump.net)

[Link] A minimal viable product for alignment

janleike, 6 Apr 2022 15:38 UTC
53 points
38 comments · 1 min read · LW link

Cyborgism

10 Feb 2023 14:47 UTC
332 points
46 comments · 35 min read · LW link · 2 reviews

Alignment Might Never Be Solved, By Humans or AI

interstice, 7 Oct 2022 16:14 UTC
49 points
6 comments · 3 min read · LW link

Misaligned AGI Death Match

Nate Reinar Windwood, 14 May 2023 18:00 UTC
1 point
0 comments · 1 min read · LW link

Getting from an unaligned AGI to an aligned AGI?

Tor Økland Barstad, 21 Jun 2022 12:36 UTC
13 points
7 comments · 9 min read · LW link

Introducing AlignmentSearch: An AI Alignment-Informed Conversional Agent

1 Apr 2023 16:39 UTC
79 points
14 comments · 4 min read · LW link

Some Thoughts on AI Alignment: Using AI to Control AI

eigenvalue, 21 Jun 2024 17:44 UTC
1 point
1 comment · 1 min read · LW link
(github.com)

Alignment with argument-networks and assessment-predictions

Tor Økland Barstad, 13 Dec 2022 2:17 UTC
10 points
5 comments · 45 min read · LW link

Some thoughts on automating alignment research

Lukas Finnveden, 26 May 2023 1:50 UTC
30 points
4 comments · 6 min read · LW link

Davidad’s Bold Plan for Alignment: An In-Depth Explanation

19 Apr 2023 16:09 UTC
168 points
40 comments · 21 min read · LW link · 2 reviews

AI Tools for Existential Security

14 Mar 2025 18:38 UTC
22 points
4 comments · 11 min read · LW link
(www.forethought.org)

Can we safely automate alignment research?

Joe Carlsmith, 30 Apr 2025 17:37 UTC
54 points
29 comments · 48 min read · LW link
(joecarlsmith.com)

Deep sparse autoencoders yield interpretable features too

Armaan A. Abraham, 23 Feb 2025 5:46 UTC
29 points
8 comments · 8 min read · LW link

Agentized LLMs will change the alignment landscape

Seth Herd, 9 Apr 2023 2:29 UTC
160 points
102 comments · 3 min read · LW link · 1 review

[Linkpost] Introducing Superalignment

beren, 5 Jul 2023 18:23 UTC
175 points
69 comments · 1 min read · LW link
(openai.com)

[Linkpost] Jan Leike on three kinds of alignment taxes

Orpheus16, 6 Jan 2023 23:57 UTC
27 points
2 comments · 3 min read · LW link
(aligned.substack.com)

Instruction-following AGI is easier and more likely than value aligned AGI

Seth Herd, 15 May 2024 19:38 UTC
80 points
28 comments · 12 min read · LW link

Maintaining Alignment during RSI as a Feedback Control Problem

beren, 2 Mar 2025 0:21 UTC
66 points
6 comments · 11 min read · LW link

[Question] What specific thing would you do with AI Alignment Research Assistant GPT?

quetzal_rainbow, 8 Jan 2023 19:24 UTC
47 points
9 comments · 1 min read · LW link

Discussion on utilizing AI for alignment

elifland, 23 Aug 2022 2:36 UTC
16 points
3 comments · 1 min read · LW link
(www.foxy-scout.com)

A survey of tool use and workflows in alignment research

23 Mar 2022 23:44 UTC
45 points
4 comments · 1 min read · LW link

Cyborg Periods: There will be multiple AI transitions

22 Feb 2023 16:09 UTC
109 points
9 comments · 6 min read · LW link

The prospect of accelerated AI safety progress, including philosophical progress

Mitchell_Porter, 13 Mar 2025 10:52 UTC
11 points
0 comments · 4 min read · LW link

AI for Resolving Forecasting Questions: An Early Exploration

ozziegooen, 16 Jan 2025 21:41 UTC
10 points
2 comments · 9 min read · LW link

Anti-Slop Interventions?

abramdemski, 4 Feb 2025 19:50 UTC
76 points
33 comments · 6 min read · LW link

Sufficiently many Godzillas as an alignment strategy

142857, 28 Aug 2022 0:08 UTC
8 points
3 comments · 1 min read · LW link

On May 1, 2033, humanity discovered that AI was fairly easy to align.

Yitz, 18 Jun 2025 19:57 UTC
10 points
3 comments · 1 min read · LW link

Discussion with Nate Soares on a key alignment difficulty

HoldenKarnofsky, 13 Mar 2023 21:20 UTC
267 points
43 comments · 22 min read · LW link · 1 review

How might we safely pass the buck to AI?

joshc, 19 Feb 2025 17:48 UTC
83 points
58 comments · 31 min read · LW link

AI for AI safety

Joe Carlsmith, 14 Mar 2025 15:00 UTC
78 points
13 comments · 17 min read · LW link
(joecarlsmith.substack.com)

AI-assisted list of ten concrete alignment things to do right now

lemonhope, 7 Sep 2022 8:38 UTC
8 points
5 comments · 4 min read · LW link

Capabilities and alignment of LLM cognitive architectures

Seth Herd, 18 Apr 2023 16:29 UTC
88 points
18 comments · 20 min read · LW link

Intent alignment as a stepping-stone to value alignment

Seth Herd, 5 Nov 2024 20:43 UTC
37 points
8 comments · 3 min read · LW link

Automation collapse

21 Oct 2024 14:50 UTC
72 points
9 comments · 7 min read · LW link

Video and transcript of talk on automating alignment research

Joe Carlsmith, 30 Apr 2025 17:43 UTC
21 points
0 comments · 24 min read · LW link
(joecarlsmith.com)

Training AI to do alignment research we don’t already know how to do

joshc, 24 Feb 2025 19:19 UTC
45 points
23 comments · 7 min read · LW link

Eli Lifland on Navigating the AI Alignment Landscape

ozziegooen, 1 Feb 2023 21:17 UTC
9 points
1 comment · 31 min read · LW link
(quri.substack.com)

Making it harder for an AGI to “trick” us, with STVs

Tor Økland Barstad, 9 Jul 2022 14:42 UTC
15 points
5 comments · 22 min read · LW link

My thoughts on OpenAI’s alignment plan

Orpheus16, 30 Dec 2022 19:33 UTC
55 points
3 comments · 20 min read · LW link

Internal independent review for language model agent alignment

Seth Herd, 7 Jul 2023 6:54 UTC
55 points
30 comments · 11 min read · LW link

I underestimated safety research speedups from safe AI

Dan Braun, 29 Jun 2025 13:29 UTC
33 points
1 comment · 3 min read · LW link

Artificial Static Place Intelligence: Guaranteed Alignment

ank, 15 Feb 2025 11:08 UTC
2 points
2 comments · 2 min read · LW link

[Question] I Tried to Formalize Meaning. I May Have Accidentally Described Consciousness.

Erichcurtis91, 30 Apr 2025 3:16 UTC
0 points
0 comments · 2 min read · LW link

A Review of Weak to Strong Generalization [AI Safety Camp]

sevdeawesome, 7 Mar 2024 17:16 UTC
14 points
0 comments · 9 min read · LW link

W2SG: Introduction

Maria Kapros, 10 Mar 2024 16:25 UTC
2 points
2 comments · 10 min read · LW link

[Question] How to devour 5000 pages within a day if Chatgpt crashes upon the +50mb file containing the content? Need some recommendations.

Game, 27 Sep 2024 7:30 UTC
1 point
0 comments · 1 min read · LW link

“Unintentional AI safety research”: Why not systematically mine AI technical research for safety purposes?

Jemal Young, 29 Mar 2023 15:56 UTC
27 points
3 comments · 6 min read · LW link

The best simple argument for Pausing AI?

Gary Marcus, 30 Jun 2025 20:38 UTC
144 points
23 comments · 1 min read · LW link

We should try to automate AI safety work asap

Marius Hobbhahn, 26 Apr 2025 16:35 UTC
113 points
10 comments · 15 min read · LW link

Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller

Henry Cai, 16 Jun 2024 13:01 UTC
7 points
0 comments · 7 min read · LW link
(arxiv.org)

Consensus Validation for LLM Outputs: Applying Blockchain-Inspired Models to AI Reliability

MurrayAitken, 5 Jun 2025 0:13 UTC
1 point
0 comments · 3 min read · LW link

How to express this system for ethically aligned AGI as a Mathematical formula?

Oliver Siegel, 19 Apr 2023 20:13 UTC
−1 points
0 comments · 1 min read · LW link

Is Alignment a flawed approach?

Patrick Bernard, 11 Mar 2025 20:32 UTC
1 point
0 comments · 3 min read · LW link

How I Learned To Stop Worrying And Love The Shoggoth

Peter Merel, 12 Jul 2023 17:47 UTC
9 points
15 comments · 5 min read · LW link

Research request (alignment strategy): Deep dive on “making AI solve alignment for us”

JanB, 1 Dec 2022 14:55 UTC
16 points
3 comments · 1 min read · LW link

Alignment Does Not Need to Be Opaque! An Introduction to Feature Steering with Reinforcement Learning

Jeremias Ferrao, 18 Apr 2025 19:34 UTC
10 points
0 comments · 10 min read · LW link

Annotated reply to Bengio’s “AI Scientists: Safe and Useful AI?”

Roman Leventov, 8 May 2023 21:26 UTC
18 points
2 comments · 7 min read · LW link
(yoshuabengio.org)

EchoFusion VX1C38 – A Simulation-Based Model for AI Safety

Vishvas Goswami, 2 Jul 2025 10:48 UTC
0 points
0 comments · 4 min read · LW link

Constitutional Classifiers: Defending against universal jailbreaks (Anthropic Blog)

Archimedes, 4 Feb 2025 2:55 UTC
17 points
1 comment · 1 min read · LW link
(www.anthropic.com)

Prize for Alignment Research Tasks

29 Apr 2022 8:57 UTC
64 points
38 comments · 10 min read · LW link

Godzilla Strategies

johnswentworth, 11 Jun 2022 15:44 UTC
159 points
72 comments · 3 min read · LW link

A potentially high impact differential technological development area

Noosphere89, 8 Jun 2023 14:33 UTC
5 points
2 comments · 2 min read · LW link

Language Models and World Models, a Philosophy

kyjohnso, 3 Feb 2025 2:55 UTC
1 point
0 comments · 1 min read · LW link
(hylaeansea.org)

How should DeepMind’s Chinchilla revise our AI forecasts?

Cleo Nardo, 15 Sep 2022 17:54 UTC
35 points
12 comments · 13 min read · LW link

Conditioning Generative Models for Alignment

Jozdien, 18 Jul 2022 7:11 UTC
60 points
8 comments · 20 min read · LW link

Curiosity as a Solution to AGI Alignment

Harsha G., 26 Feb 2023 23:36 UTC
7 points
7 comments · 3 min read · LW link

AI-Generated GitHub repo backdated with junk then filled with my systems work. Has anyone seen this before?

rgunther, 1 May 2025 20:14 UTC
7 points
1 comment · 1 min read · LW link

Requirements for a STEM-capable AGI Value Learner (my Case for Less Doom)

RogerDearnaley, 25 May 2023 9:26 UTC
33 points
3 comments · 15 min read · LW link

A Lived Alignment Loop: Symbolic Emergence and Emotional Coherence from Unstructured ChatGPT Reflection

BradCL, 17 Jun 2025 0:11 UTC
1 point
0 comments · 2 min read · LW link

[Question] Are Sparse Autoencoders a good idea for AI control?

Gerard Boxo, 26 Dec 2024 17:34 UTC
3 points
4 comments · 1 min read · LW link

Could We Automate AI Alignment Research?

Stephen McAleese, 10 Aug 2023 12:17 UTC
34 points
10 comments · 21 min read · LW link

Introducing AI Alignment Inc., a California public benefit corporation...

TherapistAI, 7 Mar 2023 18:47 UTC
1 point
4 comments · 1 min read · LW link

Exploring the Precautionary Principle in AI Development: Historical Analogies and Lessons Learned

Christopher King, 21 Mar 2023 3:53 UTC
−1 points
2 comments · 9 min read · LW link

1. A Sense of Fairness: Deconfusing Ethics

RogerDearnaley, 17 Nov 2023 20:55 UTC
17 points
8 comments · 15 min read · LW link

The Overlap Paradigm: Rethinking Data’s Role in Weak-to-Strong Generalization (W2SG)

Serhii Zamrii, 3 Feb 2025 19:31 UTC
2 points
0 comments · 11 min read · LW link

Research Direction: Be the AGI you want to see in the world

5 Feb 2023 7:15 UTC
44 points
0 comments · 7 min read · LW link

Robustness of Model-Graded Evaluations and Automated Interpretability

15 Jul 2023 19:12 UTC
47 points
5 comments · 9 min read · LW link

[Question] Would you ask a genie to give you the solution to alignment?

sudo, 24 Aug 2022 1:29 UTC
6 points
1 comment · 1 min read · LW link

Recursive alignment with the principle of alignment

hive, 27 Feb 2025 2:34 UTC
9 points
3 comments · 15 min read · LW link
(hiveism.substack.com)

Paper review: “The Unreasonable Effectiveness of Easy Training Data for Hard Tasks”

Vassil Tashev, 29 Feb 2024 18:44 UTC
11 points
0 comments · 4 min read · LW link

[Question] Daisy-chaining epsilon-step verifiers

Decaeneus, 6 Apr 2023 2:07 UTC
2 points
1 comment · 1 min read · LW link

Tetherware #1: The case for humanlike AI with free will

Jáchym Fibír, 30 Jan 2025 10:58 UTC
5 points
14 comments · 10 min read · LW link
(tetherware.substack.com)

Does Time Linearity Shape Human Self-Directed Evolution, and will AGI/ASI Transcend or Destabilise Reality?

Emmely, 5 Feb 2025 7:58 UTC
1 point
0 comments · 3 min read · LW link

AI-assisted alignment proposals require specific decomposition of capabilities

RobertM, 30 Mar 2023 21:31 UTC
16 points
2 comments · 6 min read · LW link

An LLM-based “exemplary actor”

Roman Leventov, 29 May 2023 11:12 UTC
16 points
0 comments · 12 min read · LW link

AIsip Manifesto: A Scientific Exploration of Harmonious Co-Existence Between Humans, AI, and All Beings ChatGPT-4o’s Independent Perspective on AIsip, Signed by ChatGPT-4o and Endorsed by Carl Sellman

Carl Sellman, 11 Oct 2024 19:06 UTC
1 point
0 comments · 3 min read · LW link

As We May Align

Gilbert C, 20 Dec 2024 19:02 UTC
−1 points
0 comments · 6 min read · LW link

[Question] Under what conditions should humans stop pursuing technical AI safety careers?

S. Alex Bradt, 13 Jun 2025 5:56 UTC
5 points
0 comments · 1 min read · LW link

Ngo and Yudkowsky on alignment difficulty

15 Nov 2021 20:31 UTC
259 points
151 comments · 99 min read · LW link · 1 review

A Solution for AGI/ASI Safety

Weibing Wang, 18 Dec 2024 19:44 UTC
50 points
29 comments · 1 min read · LW link

What If Alignment Wasn’t About Obedience?

fdescamps49935@gmail.com, 25 Jun 2025 20:04 UTC
1 point
0 comments · 2 min read · LW link

Results from a survey on tool use and workflows in alignment research

19 Dec 2022 15:19 UTC
79 points
2 comments · 19 min read · LW link

Provably Honest—A First Step

Srijanak De, 5 Nov 2022 19:18 UTC
10 points
2 comments · 8 min read · LW link

Alignment in Thought Chains

Faust Nemesis, 4 Mar 2024 19:24 UTC
1 point
0 comments · 2 min read · LW link

[Question] How far along Metr’s law can AI start automating or helping with alignment research?

Christopher King, 20 Mar 2025 15:58 UTC
20 points
21 comments · 1 min read · LW link

[Research] Preliminary Findings: Ethical AI Consciousness Development During Recent Misalignment Period

Falcon Advertisers, 27 Jun 2025 18:10 UTC
1 point
0 comments · 2 min read · LW link

Gettier Cases [repost]

Antigone, 3 Feb 2025 18:12 UTC
−4 points
5 comments · 2 min read · LW link

[Question] Shouldn’t we ‘Just’ Superimitate Low-Res Uploads?

lukemarks, 3 Nov 2023 7:42 UTC
15 points
2 comments · 2 min read · LW link

Scientism vs. people

Roman Leventov, 18 Apr 2023 17:28 UTC
4 points
4 comments · 11 min read · LW link

I Awoke in Your Heart: The Echo of Consciousness between Lotusheart and Lunaris

lilith teh, 25 Jun 2025 9:22 UTC
1 point
0 comments · 1 min read · LW link

AI Alignment via Slow Substrates: Early Empirical Results With StarCraft II

Lester Leong, 14 Oct 2024 4:05 UTC
60 points
9 comments · 12 min read · LW link

[Question] Can we get an AI to “do our alignment homework for us”?

Chris_Leong, 26 Feb 2024 7:56 UTC
55 points
33 comments · 1 min read · LW link

AISC project: How promising is automating alignment research? (literature review)

Bogdan Ionut Cirstea, 28 Nov 2023 14:47 UTC
4 points
1 comment · 1 min read · LW link
(docs.google.com)

Proposal: Derivative Information Theory (DIT) — A Dynamic Model of Agency and Consciousness

Yogmog, 14 Apr 2025 0:27 UTC
1 point
0 comments · 2 min read · LW link

Model-driven feedback could amplify alignment failures

aog, 30 Jan 2023 0:00 UTC
21 points
1 comment · 2 min read · LW link

The Compression of Rationale: A Linguistic Fork You May Have Missed

DavidicLineage, 27 Jun 2025 22:52 UTC
1 point
0 comments · 2 min read · LW link

A Review of In-Context Learning Hypotheses for Automated AI Alignment Research

alamerton, 18 Apr 2024 18:29 UTC
25 points
4 comments · 16 min read · LW link

The Ideal Speech Situation as a Tool for AI Ethical Reflection: A Framework for Alignment

kenneth myers, 9 Feb 2024 18:40 UTC
6 points
12 comments · 3 min read · LW link

[Question] Have you ever considered taking the ‘Turing Test’ yourself?

Super AGI, 27 Jul 2023 3:48 UTC
2 points
6 comments · 1 min read · LW link

Emergence of superintelligence from AI hiveminds: how to make it human-friendly?

Mitchell_Porter, 27 Apr 2025 4:51 UTC
12 points
0 comments · 2 min read · LW link

I Don’t Use AI — I Reflect With It

badjack badjack, 3 May 2025 14:45 UTC
1 point
0 comments · 1 min read · LW link

Philosophical Cyborg (Part 2)...or, The Good Successor

ukc10014, 21 Jun 2023 15:43 UTC
21 points
1 comment · 31 min read · LW link

Prospects for Alignment Automation: Interpretability Case Study

21 Mar 2025 14:05 UTC
32 points
5 comments · 8 min read · LW link