Here are some maybe useful tags. Interpret these as ideas, not requests.
Mechanism Design (I think I am imagining including systemization that aligns incentives within yourself here, which maybe means you would want a more general name like “Aligning Incentives”, but I think I prefer “Mechanism Design”)
Fake Frameworks (When I first thought of this, I was thinking of people tagging their own posts. Maybe it is a little weird to have people tagging each other’s posts as fake.)
Embedded Agency (Where I am imagining this as being largely for technical work) (In particular, I personally would get more use out of one big embedded agency tag than a bunch of smaller tags, since I feel like all the most interesting stuff in embedded agency cuts across tags like “decision theory”)
Something like the class including: Toward a New Technical Explanation of Technical Explanation, Embedded World Models, technical logical uncertainty work, and things about dealing with the fact that Bayes is not a viable strategy for embedded agents. Candidate names: “Embedded World Models”, “Resource Bounded Epistemics”, “Embedded Epistemics”, “Post-Bayesianism”. I would hope the name here does not make people think it should only be for technical things.
Something like the class including: How I Lost 100 Pounds Using TDT, Humans Are Embedded Agents Too, Inner alignment in the brain, Sources of intuitions and data on AGI, and things about applying AI alignment theory to human rationality and vice versa. Maybe more generally about applying results from one field to another field. “Interdisciplinary Analogies”?
These are really good.
Embedded Agency is a clear win.
Mechanism Design/Aligning Incentives seems good too. Agree there are choices about the name, and I guess scope too. Do you mean it to cover material about how to align incentives but exclude related examples where incentives failed to be aligned? Would the Boeing 737 MAX MCAS, as an agent corrigibility failure, be part of it?
“Resource Bounded Epistemics” sounds like a cool category. So does “Interdisciplinary Analogies”, or should it be “Interdisciplinary Applications”?
Anyhow, these are great. More are welcome.
Fake Frameworks, yeah, hmm. We might consider a rule that only authors can apply these tags; I’m not sure. That might make sense for general “epistemic state” tags.
+1 for a Mechanism Design/Aligning Incentives tag. I think “incentive design” would be a good name for this category. This would encompass material on specification gaming, tampering, impact measures, etc. Including specific examples of misaligned incentives under this umbrella seems fine as well.
Is the “aligning incentives” tag you are interested in something AI-specific, or should it apply to general human institutions and social systems? I could see a case for either, but it affects what tag names we should use.
I was thinking of an AI specific tag, it seems a bit too broad otherwise.
Maybe the tag should just be called “Frameworks” then?
(And the Frameworks page can make clear that frameworks may vary from hand-wavy intuition building to carving reality at the joints based on a high confidence underlying model.)
I am worried about “Frameworks” being too broad. Like, isn’t every model a framework of some kind?