If I encounter a capabilities paper that kinda spooks me, what should I do with it? I’m inclined to share it as a draft post with some people I think should know about it. I have encountered such a paper, and I found it in a capabilities discussion group whose members would have no hesitation about using it to try to accumulate power for themselves, in denial about any negative effects it could have. It runs on individual computers.
Edit for capabilities folks finding this later: it was a small-potatoes individual paper again, lol. But the approach is one I still think is very promising.
There is an organizational structure being developed explicitly for handling this. In the meantime, please reach out to the EA community health team, attn: ‘AGI risk landscape watch team’. https://docs.google.com/forms/d/e/1FAIpQLScJooJD0Sm2csCYgd0Is6FkpyQa3ket8IIcFzd_FcTRU7avRg/viewform
(I’ve been talking to the people involved and can assure you that I believe them to be both trustworthy and competent.)
I think it’s really important for everyone to always have a trusted confidant, and to go to them directly with this sort of thing first before doing anything. It is in fact a really tough question, and no one will be good at thinking about this on their own. Also, for situations that might breed a unilateralist’s curse type of thing, strongly err on the side of NOT DOING ANYTHING.
I’d say share it here on LessWrong, in comments on relevant articles as well as in a post. That maximizes the upside of safety-oriented people knowing about it while minimally helping to popularize it among capabilities groups.
Email it to Anthropic?
And I guess they shouldn’t say too much in return, but they should at least indicate that they understand it. If they don’t, send it to the next friendliest practical alignment researchers. If they don’t signal understanding either, and you have to go far enough down your list that telling the next one on it would no longer be a net-positive act, then you’ll have a major community issue to yell about.
Currently there seems to be no good way to deal with this conundrum.
One wonders whether the AI alignment community should set up some sort of encrypted partial-sharing, partial-transparency protocol for these kinds of situations.
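To make that a bit more concrete: one primitive such a protocol might build on is a hash commitment, which lets you prove later that you knew about a result at a given time without revealing it up front. The sketch below is purely illustrative and assumes nothing about any existing community process; the function names and the choice of SHA-256 with a random salt are my own assumptions.

```python
# Minimal sketch of a hash commitment for "partial transparency":
# publicly commit to a document's contents now, reveal them later only if needed.
import hashlib
import secrets


def commit(document_bytes: bytes) -> tuple[str, bytes]:
    """Return (public commitment, private salt) for a document."""
    salt = secrets.token_bytes(32)  # random salt prevents brute-force guessing of short documents
    digest = hashlib.sha256(salt + document_bytes).hexdigest()
    return digest, salt


def verify(document_bytes: bytes, salt: bytes, commitment: str) -> bool:
    """Check that a revealed document matches the earlier public commitment."""
    return hashlib.sha256(salt + document_bytes).hexdigest() == commitment


if __name__ == "__main__":
    paper = b"summary of the worrying result"  # stand-in for the actual draft
    commitment, salt = commit(paper)
    print("post this publicly:", commitment)
    # later, share (paper, salt) privately with a trusted party, who runs:
    assert verify(paper, salt, commitment)
```

This only covers the "prove you knew it" half; the encrypted-sharing half (deciding who gets the actual contents, and under what conditions) is the genuinely hard social part and isn't addressed by anything this simple.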