AI Safety Thursdays: Understanding The Self-Other Overlap Approach

Juliana Eberschlag, 16 May 2025 19:36 UTC


22 May 2025, 6:00 pm to 9:00 pm
30 Adelaide St E, Toronto, ON M5C 3G8, Canada

Description

Leo Zovic presents on a less-explored technique that optimizes models to maintain similar internal representations when reasoning about themselves and others.

This scalable approach not only reduces deceptive behavior in AI systems but can also perfectly classify deceptive agents based on their self-other overlap values.

Event Schedule

6:00 to 6:45 - Networking and refreshments
6:45 to 8:00 - Main Presentation
8:00 to 9:00 - Breakout Discussions
