This comment definitely wins the award for best comment on the post so far. Great ideas, highly relevant links.
I especially like the deliberate noise idea. That plays really nicely with natural abstractions as information-relevant-far-away: we can intentionally insert noise along particular dimensions, and see how that messes with prediction far away (either via causal propagation or via loss of information directly). As long as most of the noise inserted is not along the dimensions relevant to the high-level abstraction, denoising should be possible. So it’s very plausible that denoising autoencoders are fairly-directly incentivized to learn natural abstractions. That’ll definitely be an interesting path to pursue further.
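To make the deliberate-noise idea concrete, here is a minimal sketch (not from the original discussion; the architecture, dimensions, and the `add_directed_noise` helper are all hypothetical) of a denoising autoencoder where noise is injected only along chosen input dimensions. The intuition: if the corrupted dimensions are not the ones carrying information relevant far away, the clean signal remains recoverable, so the reconstruction objective pushes the encoder toward the abstraction-relevant dimensions.

```python
import torch
import torch.nn as nn

class DenoisingAutoencoder(nn.Module):
    """Hypothetical minimal DAE; layer sizes are illustrative only."""
    def __init__(self, dim_in: int, dim_latent: int):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(dim_in, 64), nn.ReLU(), nn.Linear(64, dim_latent)
        )
        self.decoder = nn.Sequential(
            nn.Linear(dim_latent, 64), nn.ReLU(), nn.Linear(64, dim_in)
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def add_directed_noise(x, noise_dims, scale=0.5):
    """Add Gaussian noise only along the dimensions in noise_dims."""
    noise = torch.zeros_like(x)
    noise[:, noise_dims] = scale * torch.randn(x.shape[0], len(noise_dims))
    return x + noise

# Toy training loop with stand-in data (all values illustrative).
dim_in, dim_latent = 16, 4
noise_dims = [0, 1, 2, 3]  # the dimensions we deliberately corrupt
model = DenoisingAutoencoder(dim_in, dim_latent)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(1000):
    x_clean = torch.randn(128, dim_in)
    x_noisy = add_directed_noise(x_clean, noise_dims)
    loss = nn.functional.mse_loss(model(x_noisy), x_clean)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In this toy setup one could then vary which dimensions get noised and check how reconstruction (or downstream prediction) degrades, as a rough probe of which dimensions carry the far-away-relevant information.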
Assuming that the denoising autoencoder objective more-or-less-directly incentivizes natural abstractions, further refinements on that setup could very plausibly turn into a useful “ease of interpretability” objective.
Thanks!
By the way, I don’t consider myself an expert on the unsupervised learning literature; I expect there is more cool stuff to be found.