CAIS Philosophy Fellowship Midpoint Deliverables

Conceptual AI safety researchers aim to help orient the broader field of AI safety, but in doing so they must wrestle with imprecise, nebulous, hard-to-define problems. Philosophers specialize in dealing with problems like these. The CAIS Philosophy Fellowship supports PhD students, postdocs, and professors of philosophy in producing novel conceptual AI safety research.

This sequence collects drafts written by the CAIS Philosophy Fellows, shared here to elicit feedback.

Instrumental Convergence? [Draft]

The Polarity Problem [Draft]

Shutdown-Seeking AI

Is Deontological AI Safe? [Feedback Draft]

There are no coherence theorems

Aggregating Utilities for Corrigible AI [Feedback Draft]

AI Will Not Want to Self-Improve

Group Prioritarianism: Why AI Should Not Replace Humanity [draft]

Language Agents Reduce the Risk of Existential Catastrophe