The original reasoning that Eliezer gave, if I remember correctly, was that it’s better to make people realize there are unknown unknowns, rather than have them take one specific strategy and say “oh, I know how I would have stopped that particular strategy.”
Another reason for such a rule could be to allow the use of basilisk-like threats and other infohazards without worrying about them convincing others beyond the gatekeeper.
That said, @datawitch @ra I’m interested in reading the logs if you’d allow.
The original reasoning that Eliezer gave, if I remember correctly, was that it’s better to make people realize there are unknown unknowns, rather than have them take one specific strategy and say “oh, I know how I would have stopped that particular strategy.”
Yes, this was Eliezer’s reasoning, and both Ra and I ended up keeping the rule unchanged.