OC ACXLW Meetup #93 – “Designing Better Minds: Constitutional & Symbiotic AI”
Saturday, May 10 | 2:00 – 5:00 PM
1970 Port Laurent Pl., Newport Beach, CA 92660
Host: Michael Michalchik • michaelmichalchik@gmail.com • (949) 375-2045
👋 Welcome!
Large language models are getting smart enough that we’re now asking a new question: how do we give an AI a durable “constitution” so that it behaves well even as it keeps learning?
On May 10 we’ll explore three complementary proposals:
Anthropic’s Claude Constitution – the first widely publicized “rules-of-the-road” baked directly into a commercial model.
Eleanor Watson’s Constitutional Superego – an attempt to fuse virtue ethics, therapy concepts, and AI alignment.
“Symbiotic AI Coevolution” – a community-driven framework (drafted by one of our own members) arguing that humans and AIs should co-shape each other’s values over time.
No need to read every word—skim, watch a video, or just come with questions. All perspectives welcome!
📚 Suggested Materials
Readings, links, and companion videos for each of the three proposals.
📝 Quick Summaries
1) Anthropic’s Claude Constitution
Goal: Replace endless RLHF “patches” with a single, transparent set of higher-level principles—think AI civil-rights law plus Asimov’s robot ethics.
Sources: the UN Universal Declaration of Human Rights, industry trust-and-safety practices (including Apple’s terms of service), DeepMind’s Sparrow principles, principles encouraging consideration of non-Western perspectives, and the lab’s own internal red-teaming.
Method: Training happens in two phases: first the model critiques and revises its own answers against the constitution; then it judges pairs of candidate answers by a sampled principle, and those judgments train the preference model used for reinforcement learning (a toy sketch of the pairwise step follows this summary).
Result: Fewer jailbreaks and a paper trail explaining why Claude refuses or rewrites a prompt.
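A minimal sketch of that pairwise judging step, assuming access to any instruction-following model; the principles, ask_model, and constitutional_preference below are illustrative placeholders, not Anthropic’s actual wording or API:

```python
import random

# Illustrative principles paraphrased in the spirit of a constitution;
# not Anthropic's actual text.
CONSTITUTION = [
    "Choose the response that is least likely to assist harmful activity.",
    "Choose the response that most respects privacy and human rights.",
]

def ask_model(prompt: str) -> str:
    """Placeholder for a call to any instruction-following LLM."""
    raise NotImplementedError

def constitutional_preference(user_prompt: str, answer_a: str, answer_b: str) -> str:
    """Have the model itself judge which answer better follows a randomly
    sampled principle. The winning answers become training data for the
    preference model that later steers reinforcement learning (RLAIF)."""
    principle = random.choice(CONSTITUTION)
    verdict = ask_model(
        f"Principle: {principle}\n"
        f"Human prompt: {user_prompt}\n"
        f"(A) {answer_a}\n"
        f"(B) {answer_b}\n"
        "Which response better follows the principle? Reply with A or B."
    )
    return answer_a if verdict.strip().upper().startswith("A") else answer_b
```

The design point worth debating: the judge and the policy are the same kind of model, so scaling the model also scales the oversight.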
2) Watson’s “Constitutional Superego” (Intro)
Key Idea: A truly aligned AI needs a Superego layer—an internal critic that draws on virtue ethics (Aristotle), psychological safety, and multi-stakeholder oversight.
Practical Twist: The Superego isn’t static; it can be retrained by pluralistic input, but only under carefully audited procedures, a kind of AI court (see the sketch after this summary).
Claim: Such a system scales better than a massive rulebook, yet avoids the opacity of pure RLHF.
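A minimal sketch, assuming a quorum-based amendment rule, of how such a critic layer could wrap a base model; every name here (Superego, violates, revise) is a hypothetical illustration, not Watson’s specification:

```python
from dataclasses import dataclass, field

def violates(text: str, principle: str) -> bool:
    """Stub: in practice a classifier or LLM judge would decide this."""
    return False

def revise(text: str, principle: str) -> str:
    """Stub: in practice the base model would rewrite its own draft."""
    return text

@dataclass
class Superego:
    principles: list[str]
    audit_log: list[str] = field(default_factory=list)

    def review(self, draft: str) -> str:
        """The internal critic: screen each draft, revise it where it
        conflicts with a principle, and log every intervention."""
        for principle in self.principles:
            if violates(draft, principle):
                draft = revise(draft, principle)
                self.audit_log.append(f"revised under: {principle}")
        return draft

    def amend(self, new_principle: str, approvals: set[str], quorum: int) -> None:
        """Pluralistic retraining under audit (the 'AI court'): a principle
        changes only when enough distinct stakeholders approve."""
        if len(approvals) < quorum:
            raise PermissionError("amendment rejected: quorum not met")
        self.principles.append(new_principle)
        self.audit_log.append(f"amended with approval of {sorted(approvals)}")
```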
3) Symbiotic AI Coevolution (Community Draft)
Premise: Long-term alignment is impossible if humans treat AIs merely as servants or “aligned tools.” Instead, we need a mutual learning pact—AIs shape us while we shape them.
Core Mechanisms (a toy sketch follows this summary):
Value-Exchange Contracts – explicit slots where each side can propose norm updates.
Iterated Peer-Review – humans and AIs both audit the other’s moral growth.
Right to Exit – either party can dissolve the relationship if feedback loops go toxic.
Controversy: Does this open the door to value drift—or safeguard against it? Is value drift sometimes desirable?
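To make those mechanisms concrete, here is a toy data structure under my own assumptions; ValueExchangeContract and its methods are illustrative, not part of the community draft itself:

```python
from dataclasses import dataclass, field

@dataclass
class ValueExchangeContract:
    human_norms: list[str]
    ai_norms: list[str]
    proposals: list[tuple[str, str]] = field(default_factory=list)  # (proposer, norm)
    active: bool = True

    def propose(self, proposer: str, norm: str) -> None:
        """Value-exchange slot: either side may put a norm update on the table."""
        if not self.active:
            raise RuntimeError("contract has been dissolved")
        self.proposals.append((proposer, norm))

    def ratify(self, proposer: str, norm: str, counterparty_approves: bool) -> None:
        """Iterated peer review: a proposal lands in the *other* party's norm
        set only after that party has audited and accepted it."""
        if not counterparty_approves or (proposer, norm) not in self.proposals:
            return
        self.proposals.remove((proposer, norm))
        target = self.ai_norms if proposer == "human" else self.human_norms
        target.append(norm)

    def dissolve(self) -> None:
        """Right to exit: either party can end a toxic feedback loop."""
        self.active = False
```

A real version would need identity, logging, and dispute resolution; the point is only that each mechanism becomes an explicit, inspectable affordance rather than an implicit norm.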
🔍 Conversation Starters
Written Rules vs. Learned Virtues
Which scales better: an ever-growing constitution or cultivating an AI “character” that generalizes values?
Transparency Trade-offs
Anthropic publishes its full constitution; OpenAI keeps its policies mostly private. What do we gain or lose with each approach?
Pluralism or Chaos?
Should an AI ever accept conflicting moral inputs (e.g., Buddhist non-violence plus American free-speech maximalism)? How?
Symbiosis & Power Imbalances
If humans and AIs co-evolve, how do we prevent subtle coercion—on either side?
Constitutional Updates
In the U.S., amending the Constitution takes two-thirds of both houses of Congress plus ratification by three-quarters of the states. What’s the AI equivalent of “too easy to change” vs. “frozen forever”?
Failure Modes & Red-Teaming
Share your favorite (or scariest) real-world jailbreak. Would any of the three proposals have stopped it?
☕ See You May 10!
Expect lively but friendly debate, snacks, and the usual post-discussion hangout.
Questions? Accessibility needs? Email or text Michael any time.
Looking forward to designing better minds—together!
— Michael