AI Safety Thursdays: Understanding The Self-Other Overlap Approach

Description

​Leo Zovic presents on a less-explored technique that optimizes models to maintain similar internal representations when reasoning about themselves and others.

​This scalable approach not only reduces deceptive behavior in AI systems but can perfectly classify deceptive agents based on their self-other overlap values.

​​​Event Schedule

​6:00 to 6:45 - Networking and refreshments
6:45 to 8:00 - Main Presentation
8:00 to 9:00 - Breakout Discussions

No comments.