Thanks for your reply! I think I basically agree with all of your points. I feel a lot of frustration around the fact that we don’t seem to have adequate infohazard policies to address this. It seems like a fundamental trade-off between security and openness/earnestness of discussion does exist though.
It could be the case that this community is not the correct place to enforce these rules, as there does still exist a substantial gap between “this thing could work” and “we have a working system”. This is doubly true in DL where implementation details matter a great deal.
My tentative heuristic for whether you should publish a post that is potentially infohazardy is “Has company-X-who-cares-mostly-about-capabilities likely thought about this already?” It’s obviously non-trivial to answer that question, but I’m pretty sure most companies who build LLMs have looked at Chinchilla and come to similar conclusions as this post. In case you’re unsure, write up the post in a Google Doc and ask someone who has thought more about infohazards whether they would publish it or not.
Also, I think Leon underestimates how fast a post can spread even if it is just intended for an alignment audience on LW.