publishing alignment research and exfohazards


(edit: i mean exfohazard, not infohazard)

to me, turning my thoughts into posts that i then publish on my blog and sometimes lesswrong serves the following purposes:

  • in conversations, i can easily link to a post of mine rather than explaining myself again (the original primary purpose of this blog!)

  • having a more formally written-down version of my thoughts helps me think about them more clearly

  • future posts — whether written by me or others — can link to my posts, contributing to a web of related ideas

  • i can get feedback on my ideas, whether it be through comments on lesswrong or responses on discord

however, i’ve increasingly come to want to write and publish posts which i’ve determined (either on my own or with the advice of trusted peers) to be potentially infohazardous, notably with regard to potentially helping AI capability progress.

on one hand, there is no post of mine i wouldn’t trust, say, yudkowsky reading; on the other i can’t just, like, DM him and everyone else i trust a link to an unlisted post every time i make one.

it would be nice to have a platform — or maybe a lesswrong feature — which lets me choose which persons or groups can read a post, with maybe a little ⚠ sign next to its title.

note that such a platform/feature would need something more complex than just a binary “trusted” flag: just because i can make a post that the Important People can read, doesn’t mean i should be trusted to read everything else that they can read; and there might be people whom i trust to read some posts of mine but not others.

maybe trusted recipients could be grouped by orgs — such as “i trust MIRI” or “i trust The Standard List Of Trusted Persons”. maybe something like the ability to post on the alignment forum is a reasonable proxy for “trustable person”?
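to make the "more than a binary flag" point concrete, here is a minimal sketch of per-post audiences in python. everything in it is hypothetical: the `Post` and `Directory` names, the group `some-org`, and the people are made up for illustration, and this is not how lesswrong or any existing platform works. the idea is just that each post carries its own list of readers and groups, so being allowed to write for a group says nothing about being allowed to read what that group can read.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Post:
    title: str
    author: str
    # per-post audience: individually named readers plus named groups (both hypothetical)
    readers: frozenset = frozenset()
    groups: frozenset = frozenset()


@dataclass
class Directory:
    # group name -> set of members, e.g. an org or a curated list of trusted persons
    members: dict = field(default_factory=dict)

    def can_read(self, person: str, post: Post) -> bool:
        # authors can always read their own posts
        if person == post.author or person in post.readers:
            return True
        # otherwise, access comes only from membership in a group the post names
        return any(person in self.members.get(g, set()) for g in post.groups)


directory = Directory({"some-org": {"bob", "carol"}})

# i share with a whole group, plus one extra person i trust for this post only
mine = Post("my hazardous idea", author="me",
            groups=frozenset({"some-org"}), readers=frozenset({"dave"}))
# bob shares with the same group; i am not a member of it
theirs = Post("bob's hazardous idea", author="bob",
              groups=frozenset({"some-org"}))

print(directory.can_read("bob", mine))    # bob is in some-org
print(directory.can_read("me", theirs))   # posting to some-org does not let me read its posts
print(directory.can_read("dave", theirs)) # dave was trusted for one post of mine, nothing more
```

trust here is a property of each (post, reader) pair rather than of a person, which covers both cases above. the hard parts (who maintains the group lists, and who gets to vouch for whom) are not addressed by this sketch.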

i am aware that this seems hard to figure out, let alone implement. perhaps there is a much easier alternative i’m not thinking about; for the moment, i’ll just stick to making unlisted posts and sending them to the very small intersection of people i trust with infohazards and people for whom it’s socially acceptable for me to DM links to new posts of mine.