The thing I have in mind as a north star looks closest to the GD Challenge in scope, but somewhat closer to the CIP one in implementation? The diff is something like:
Focus on superintelligence, which opens up a large possibility-space while rendering straightforwardly solved many of the problems people usually focus on (consult rigorous futurists to get a sense of the options).
Identify cruxes in how people’s values might end up, and use the kinds of deliberative mechanism design from your post here to help people clarify their thinking and find bridges.
I’m glad you’re seeing the challenges of consequentialism. I think the next crux is something like my guess that consequentialism is a weed which grows in the cracks of any strong cognitive system, and that without formal guarantees of non-consequentialism, any attempt to build an ecosystem of the kind you describe will end up being eaten by processes which are unboundedly goal-seeking. I don’t know of any write-up that hits exactly the notes you’d want here, but some maybe-decent intuition pumps in this direction include: The Parable of Predict-O-Matic, Why Tool AIs Want to Be Agent AIs, Averting the convergent instrumental strategy of self-improvement, Averting instrumental pressures, and other articles under Arbital’s corrigibility tag.
I’d be open to having an on-the-record chat, but it’s possible we’d get into areas of my models which seem too exfohazardous for the public record.
Great! If there are such areas, in the spirit of d/acc, I’d be happy to use a local language model to paraphrase them away and co-edit in an end-to-end-encrypted way to confirm before publishing.
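For concreteness, here is a minimal sketch of what that local paraphrase pass could look like, assuming an Ollama-style server running at the default localhost port; the model name, endpoint, and prompt wording are illustrative assumptions, not a prescribed workflow, and the end-to-end-encrypted co-editing step would sit on top of this:

```python
# Sketch of the "paraphrase exfohazardous spans with a local model" step.
# Assumes an Ollama-style server at http://localhost:11434; the model name
# and prompt wording are placeholders for illustration only.
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"

def paraphrase_locally(text: str, model: str = "llama3") -> str:
    """Ask a locally hosted model to restate `text` at a higher level of
    abstraction, so the sensitive specifics never leave the machine."""
    prompt = (
        "Paraphrase the following passage so it preserves the argument "
        "but removes operational specifics:\n\n" + text
    )
    resp = requests.post(
        OLLAMA_URL,
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]
```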