I would say the Corrigibility paper shares the same “feel” as certain cryptography papers. I think it is true that this feel is distinct, and not true that it means they are “not real”.
For example, what does it mean for a cryptosystem to be secure? This is an important topic with impressive achievements, but it does feel different from the nuts and bolts of cryptography, like how to perform differential cryptanalysis. Indistinguishability under chosen-plaintext attack (IND-CPA), the standard formalization of semantic security in cryptography, does sound like “make up rules and then pretend they describe reality and prove results”.
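To make the "made-up rules" concrete, here is a minimal sketch of the IND-CPA game in Python. Everything named here (`DeterministicToyCipher`, `ReplayAdversary`, `ind_cpa_game`, `estimate_win_rate`) is a hypothetical illustration I am assuming for the sake of the example, not a real library API or a real cipher. The rules are: the adversary gets an encryption oracle, picks two equal-length messages, receives the encryption of one of them, and must guess which. A scheme counts as secure if no efficient adversary wins noticeably more often than half the time.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


class DeterministicToyCipher:
    """Hypothetical toy scheme: XOR with a fixed 16-byte key.

    It is deterministic (same plaintext, same ciphertext), which is
    exactly what the IND-CPA game is designed to catch.
    """

    def __init__(self) -> None:
        self.key = secrets.token_bytes(16)

    def encrypt(self, message: bytes) -> bytes:
        return xor_bytes(message, self.key)


class ReplayAdversary:
    """Asks the oracle to encrypt m0 and compares with the challenge."""

    def choose(self, oracle):
        m0 = bytes(16)        # sixteen 0x00 bytes
        m1 = b"\xff" * 16     # sixteen 0xff bytes
        return m0, m1

    def guess(self, oracle, challenge: bytes, m0: bytes) -> int:
        return 0 if oracle(m0) == challenge else 1


def ind_cpa_game(scheme_factory, adversary) -> bool:
    """Play one round of the game; return True if the adversary wins."""
    scheme = scheme_factory()
    hidden_bit = secrets.randbelow(2)
    m0, m1 = adversary.choose(scheme.encrypt)
    if len(m0) != len(m1):
        raise ValueError("challenge messages must have equal length")
    challenge = scheme.encrypt(m1 if hidden_bit else m0)
    return adversary.guess(scheme.encrypt, challenge, m0) == hidden_bit


def estimate_win_rate(scheme_factory, adversary, trials: int = 200) -> float:
    """Fraction of rounds the adversary wins; 0.5 is blind guessing."""
    wins = sum(ind_cpa_game(scheme_factory, adversary) for _ in range(trials))
    return wins / trials
```

Under these assumptions `estimate_win_rate(DeterministicToyCipher, ReplayAdversary())` comes out at 1.0, so the toy scheme fails the definition. Whether passing such a game captures what "secure" means in the real world is exactly the kind of question that definition-focused papers raise.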
In a sense, I think all math papers with a focus on definitions (as opposed to proofs) feel like this. The proofs are correct but trivial, so the definitions are the real contribution, yet the applicability of those definitions to the real world seems questionable. Proof-focused papers feel different because they are about accepted definitions whose applicability to the real world is not in question.
In a sense, I think all math papers with a focus on definitions (as opposed to proofs) feel like this.
I suspect one of the reasons OP feels dissatisfied with the corrigibility paper is that it is not the equivalent of Shannon’s seminal results, which generally gave the correct definitions of terms; instead it merely gestures at a problem (“we have no idea how to formalize corrigibility!”).
That being said, I resonate a lot with this part of the reply:
Proofs [in conceptual/definition papers] are correct but trivial, so definitions are the real contribution, but applicability of definitions to the real world seems questionable. Proof-focused papers feel different because they are about accepted definitions whose applicability to the real world is not in question.