Towards Causal Foundations of Safe AGI

This sequence will give our take on how causality underpins many critical aspects of safe AGI, including agency, incentives, misspecification, generalisation, fairness, and corrigibility. We summarise past work and point to open questions.

By the Causal Incentives Working Group

Introduction to Towards Causal Foundations of Safe AGI

Causality: A Brief Introduction

Agency from a causal perspective

Incentives from a causal perspective

Reward Hacking from a Causal Perspective