I think the arguments here apply much better to the AGI alignment case than to the case of HPMOR. The structure of the post suggests (? not sure) that HPMOR is meant to be the “easier” case, the one in which the reader will assent to the arguments more readily, but it didn’t work that way on me.
In both cases, we have some sort of metric for what it would mean to succeed, and (perhaps competing) inside- and outside-view arguments for how highly we should expect to score on that metric. (More precisely, what probabilities we should assign to achieving different scores.) In both cases, this post tends to dismiss facts which involve social status as irrelevant to the outside view.
But what if our success metric depends on some facts which involve social status? Then we definitely shouldn’t ignore these facts, even in the inside view. And this is the situation we are in with HPMOR, at least, if perhaps less so with AGI alignment.
There are some success metrics for HPMOR mentioned in this post which can be evaluated largely without reference to status stuff (like “has it conveyed the experience of being rational to many people?”). But when specific successes—known to have been achieved in the actual world—come up, many of them are clearly related to status. If you want to know whether your fic will become one of the most reviewed HP fanfics on a fanfiction site, then it matters how it will be received by the sorts of people who review HP fanfics on those sites—including their status hierarchies. (Of course, this will be less important if we expect most of the review-posters to be people who don’t read HP fanfic normally and have found out about the story through another channel, but its importance is always nonzero, and very much so for some hypothetical scenarios.)
TBH, I don’t understand why so much of this post focuses on pure popularity metrics for HPMOR, ones that don’t capture whether it is having the intended effect on readers. (Even something like “many readers consider it the best book they’ve ever read” does not tell you much without specifying more about the readership; consider that if you were optimizing for this metric, you would have an incentive to select for readers who have read as few books as possible.)
I guess the idea may be that it is possible to surprise someone like Pat by hitting a measurable indicator of high status (because Pat thinks that’s too much of a status leap relative to the starting position), whereas Pat would be less surprised by HPMOR hitting idiosyncratic goals that are not common in HP fanfiction (and thus are not high status to him). But this pattern of surprise levels seems obviously correct to me! If you are trying to predict an indicator of status in a community, you should use information about the status system in that community in your inside view. (And likewise, if the indicator is unrelated to status, you may be able to ignore status information.)
In short, this post condemns using status-related facts for forecasting, even when they are relevant (because we are forecasting other status-related facts). I don’t mean the next statement as Bulverism, but as a hopefully useful hypothesis: it seems possible that the concept of status regulation has encouraged this confusion, by creating a pattern to match to (“argument involving status and the existing state of a field, to the effect that I shouldn’t expect to be capable of something”), even when some arguments matching that pattern are good arguments.