P. comments on ELK prize results

P. 10 Mar 2022 12:56 UTC
Since the point of the contest is to find a method for which there is no conceivable way it could fail, I grant that, given the uncertainty about what would really happen, that counterexample suffices. But I still believe that in practice the interpreter will win. I think you are overextending your intuition: neural networks are incomprehensible to you because you are human, but the activations in a network are what they are because they have been optimized so that the rest of the network can process (“understand”) them. So if a value can be written and read by one network, another network could do the same, since both are being optimized to do it. Only through complex cryptographic magic can we avoid that (and maybe not even then, if it turns out that indistinguishability obfuscation (IO) can only create programs that are exponentially large).
- paulfchristiano 10 Mar 2022 16:06 UTC
  I also believe that gradient descent can learn to use the activations of an obfuscated reporter (and indeed I frequently rely on some currently plausible intuitive assumption like “gradient descent can’t obfuscate something from itself”). But this isn’t enough for these proposals to work: they need the relationship between the reporter and its obfuscated version to be quantitatively “simpler” than the relationship between the direct translator and the human simulator.