These are really good.
Embedded Agency is a clear win.
Mechanism Design/Aligning Incentives seems good too. Agree there are choices about the name, and I guess scope too. Do you mean it to cover material on how to align incentives but exclude related material, such as examples where incentives failed to be aligned? Would the Boeing 737 MAX MCAS, as an agent corrigibility failure, be part of it?
“Resource Bounded Epistemics” sounds like a cool category. So does “Interdisciplinary Analogies”, or should it be “Interdisciplinary Applications”?
Anyhow, these are great. More are welcome.
Fake Frameworks, yeah, hmm. We might consider an “only authors can apply these tags” rule; I’m not sure. That might make sense for general “epistemic state” tags.
+1 for a Mechanism Design/Aligning Incentives tag. I think “incentive design” would be a good name for this category. This would encompass material on specification gaming, tampering, impact measures, etc. Including specific examples of misaligned incentives under this umbrella seems fine as well.
Is the “aligning incentives” tag you are interested in AI-specific, or should it apply to general human institutions / social systems? I could see a case for either, but it affects which tag names we should use.
I was thinking of an AI-specific tag; it seems a bit too broad otherwise.