I think Shard Theory is one of the most promising approaches to human values that I’ve seen on LW, and I’m very happy to see this work posted. (Of course, I’m probably biased, in that I also count my own approaches to human values among the most promising, and Shard Theory shares a number of similarities with them: e.g. this post talks about something-like-shards issuing mutually competitive bids that get strengthened or weakened depending on how environmental factors activate those shards, and this post talked about values and world-models being learned in an intertwined manner.)