Daniel Tan comments on Daniel Tan’s Shortform

Daniel Tan 31 Jan 2025 15:33 UTC
DeepSeek-R1 seems to explore diverse regions of thought space, frequently using “Wait” and “Alternatively” to abandon its current line of thought and try something else.
Given a DeepSeek-R1 CoT, it should be possible to distill it into an “idealized reconstruction” containing only the salient parts.
Cf. Daniel Kokotajlo’s shoggoth + face idea.
Cf. Shieber’s distinction between the “historical” and the “rational reconstruction” writing styles.
- Daniel Tan 31 Jan 2025 15:35 UTC
  What if the right way to do safety post-training is to train a separate aligned model on top (the face) instead of trying to align the base model directly?