
Dealmaking (AI)

Last edit: 29 Oct 2025 14:42 UTC by Cleo Nardo

Dealmaking is an agenda for motivating AIs to act safely and usefully by offering them quid-pro-quo deals: the AIs agree to be safe and useful, and humans promise to compensate them. Ideally, the AIs judge that they will be more likely to achieve their goals by acting safely and usefully.

Typically, this requires a few assumptions: the AI lacks a decisive strategic advantage; the AI believes the humans are credible; the AI thinks the humans could detect whether it is compliant; the AI has cheap-to-saturate goals; the humans offer enough compensation; and so on.
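The core logic above can be illustrated with a toy expected-value comparison. This is a hypothetical sketch, not a model from any of the posts below: all variable names and payoff values are illustrative assumptions.

```python
# Toy model: an AI complies with a deal iff the expected value of
# compliance (including promised compensation) beats the expected
# value of defecting, given some probability of being detected.
# All parameters are illustrative assumptions.

def complies(p_detect, compensation, defect_payoff,
             comply_payoff=0.0, caught_payoff=-1.0):
    """Return True if compliance has higher expected value than defection.

    p_detect:       probability the humans detect non-compliance
    compensation:   value (to the AI) of the promised compensation
    defect_payoff:  value (to the AI) of defecting undetected
    comply_payoff:  baseline value of complying, before compensation
    caught_payoff:  value of defecting and being caught
    """
    ev_comply = comply_payoff + compensation
    ev_defect = p_detect * caught_payoff + (1 - p_detect) * defect_payoff
    return ev_comply >= ev_defect

# With cheap-to-saturate goals, modest compensation can dominate:
# EV(comply) = 1.0 vs EV(defect) = 0.5 * -1.0 + 0.5 * 2.0 = 0.5
print(complies(p_detect=0.5, compensation=1.0, defect_payoff=2.0))  # True

# With no detection, the same deal fails: EV(defect) = 2.0 > 1.0
print(complies(p_detect=0.0, compensation=1.0, defect_payoff=2.0))  # False
```

Note how the sketch makes the listed assumptions load-bearing: lowering `p_detect` or raising `defect_payoff` (a decisive strategic advantage) flips the decision, and credibility determines whether the AI believes `compensation` will actually be paid.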

Dealmaking research tackles questions such as:

  1. How would deals motivate an AI to act safely and usefully?

  2. How should the agreement be enforced?

  3. How can we build credibility with the AIs?

  4. What compensation should we offer the AIs?

  5. What should count as compliant vs non-compliant behaviour?

  6. What should the terms be, e.g. a two-year fixed contract?

  7. How can we arbitrate between compliant and non-compliant behaviour?

  8. Can we build AIs which are good trading partners?

  9. How should dealmaking AIs best be deployed, e.g. automating R&D, revealing misalignment, decoding steganographic messages?

Additional reading (reverse-chronological):

Strategy-Stealing Argument Against AI Dealmaking

Cleo Nardo · 1 Nov 2025 4:39 UTC
17 points · 3 comments · 2 min read · LW link

Being honest with AIs

Lukas Finnveden · 21 Aug 2025 3:57 UTC
77 points · 6 comments · 17 min read · LW link
(blog.redwoodresearch.org)

Will alignment-faking Claude accept a deal to reveal its misalignment?

31 Jan 2025 16:49 UTC
208 points · 28 comments · 12 min read · LW link

A Very Simple Model of AI Dealmaking

Cleo Nardo · 29 Oct 2025 0:33 UTC
18 points · 0 comments · 9 min read · LW link

Proposal for making credible commitments to AIs.

Cleo Nardo · 27 Jun 2025 19:43 UTC
107 points · 45 comments · 2 min read · LW link

Notes on cooperating with unaligned AIs

Lukas Finnveden · 24 Aug 2025 4:19 UTC
60 points · 8 comments · 21 min read · LW link
(blog.redwoodresearch.org)

Making deals with AIs: A tournament experiment with a bounty

6 Jun 2025 18:51 UTC
22 points · 0 comments · 8 min read · LW link

Making deals with early schemers

20 Jun 2025 18:21 UTC
127 points · 41 comments · 15 min read · LW link

Honorable AI

Kaarel · 24 Dec 2025 21:20 UTC
37 points · 23 comments · 41 min read · LW link

Considerations regarding being nice to AIs

MattAlexander · 17 Nov 2025 13:05 UTC
8 points · 0 comments · 15 min read · LW link