Is anyone at MIRI or Anthropic creating diagnostic tools for monitoring neural networks? Something that could analyze for when a system has bit-flip errors versus errors of logic, and eventually evidence of deception.
Chris Olah is/was the main guy working on interpretability research, and he is a co-founder of Anthropic. So Anthropic would definitely be aware of this idea.
I've not seen the bit-flip idea before, and Anthropic is quasi-alone on interpretability, so they might have missed it.
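One reason bit-flip errors are more tractable than logic errors or deception: they corrupt the raw bytes of the weights, so they can be caught with an ordinary integrity check, no interpretability needed. A minimal sketch of that idea (the tensor and checksum function here are hypothetical, not any actual MIRI or Anthropic tooling):

```python
import hashlib
import numpy as np

def checksum(weights: np.ndarray) -> str:
    """Return a SHA-256 digest of the raw weight bytes."""
    return hashlib.sha256(weights.tobytes()).hexdigest()

# Hypothetical weight tensor, stored as float32.
w = np.ones((4, 4), dtype=np.float32)
ref = checksum(w)  # recorded once, at load time

# Simulate a single hardware bit flip in the underlying buffer.
buf = w.reshape(-1).view(np.uint8)
buf[0] ^= 0b00000001

# The digest no longer matches, so the corruption is detectable
# without understanding anything about what the network computes.
assert checksum(w) != ref
```

A logic error or deceptive computation leaves the weights byte-identical to what was trained, so this kind of check says nothing about it; that is where interpretability tools would have to take over.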