The Shard Theory of Human Values

Written by Quintin Pope, Alex Turner, Charles Foster, and Logan Smith. Card image generated by DALL-E 2.

Humans provide an untapped wealth of evidence about alignment

Human values & biases are inaccessible to the genome

Reward is not the optimization target

General alignment properties

Shard Theory: An Overview