AI Safety Thursday: Chain-of-Thought Monitoring for AI Control

Registration Instructions
This is a paid event ($5 general admission, free for students & job seekers) with limited tickets—you must RSVP on Luma to secure your spot.

Description

Modern reasoning models do a lot of thinking in natural language before producing their outputs. Can we catch misbehaviors by our LLMs and interpret their motivations simply by reading these chains of thought?

In this talk, Rauno Arike and Rohan Subramani will give an overview of research areas in chain-of-thought monitorability and AI control, and discuss their recent research on the usefulness of chain-of-thought monitoring for ensuring that LLM agents pursue only the objectives their developers intended.

Event Schedule
6:00 pm to 6:30 pm - Food & Networking
6:30 pm to 7:30 pm - Main Presentation & Questions
7:30 pm to 9:00 pm - Breakout Discussions

If you can’t make it in person, feel free to join the live stream, starting at 6:30 pm, via this link.
