(Self-review.) I oppose including this post in a Best-of-2019 collection. I stand by what I wrote, but, as with “Relevance Norms”, this was a “defensive” post; it exists as a reaction to “Meta-Honesty”’s candidacy in the 2018 Review, rather than trying to advance new material on its own terms.
The analogy between patch-resistance in AI alignment and humans finding ways to dodge the spirit of deontological rules is very important, but it isn't enough to carry the entire post.
A standalone, canon-worthy explanation of why I think we need a broader conception of honesty than avoiding individually false statements would look more like “Algorithms of Deception”. That post didn’t do so great karma-wise, though; I’m not sure whether that’s because people don’t want to read code, because it was slow to get Frontpaged (as I recall), or because it’s bad for some other reason.
I intend to reply to Fiddler’s review, but likely not in a timely manner.