Thanks for this! I continue to be excited about the possibility of making large black-box models safer by breaking them down into smaller modular components and, where possible, making those components more interpretable distillations of the learned logic, like synthesized programs. Related post with further links in comments: https://www.lesswrong.com/posts/xERh9dkBkHLHp7Lg6/making-it-harder-for-an-agi-to-trick-us-with-stvs