It’s fine to make the mistake of publishing something if the mistake you made was assuming “this is great research”, but if the mistake was “this is safe to publish because I’m new to research”, the consequences can be irreversible. I probably fall into the category of ‘wildly overthinking the harms of publishing due to inexperience’, but it seems to me like a simple assessment using the ABC model I outlined in the post should take only a few minutes and could quickly inform someone of whether or not they might want to show their research to someone more experienced before publishing.
I am personally having this dilemma. I have something I want to publish, but I’m unsure of whether I should listen to the voice telling me “you’re so new to this, this is not going to have any real impact anyway” or the voice that’s telling me “if it does have some impact or was hypothetically implemented in a generally intelligent system this could reduce extinction risk but inflate s-risk”. It was a difficult decision, but I decided I would rather show someone more experienced, which is what I am doing currently. This post was intended to be a summary of why/how I converged upon that decision.
but it seems to me like a simple assessment using the ABC model I outlined in the post should take only a few minutes
Empirically, many people new to the field get very paralysed and anxious about fears of doing accidental harm, in a way that I believe has significant costs. I haven’t fully followed the specific model you outline, but it seems to involve ridiculously hard questions about the downstream consequences of your work, which I struggle to robustly apply to my own work (indirect effects are really hard man!). Similarly, telling someone that they need to have someone more experienced sanity-check their work can have significant costs in terms of social anxiety (I personally sure would publish fewer blog posts if I felt a need to run each one by someone like Chris Olah first!)
Having significant costs doesn’t mean that doing this is bad, per se, but there need to be major benefits to match those costs, and I’m just incredibly unconvinced that people’s first research projects clear that bar. Maybe if you’ve gotten a bunch of feedback from more experienced people that your work is awesome? But if you’re in that situation, then you can probably just ask them whether they’re concerned.
It’s fine to make the mistake of publishing something if the mistake you made was assuming “this is great research”, but if the mistake was “this is safe to publish because I’m new to research”, the consequences can be irreversible.
“Irreversible consequences” is not that huge of a deal. The consequences of writing almost any internet comment are irreversible. I feel like you also need to argue that the expected magnitude of the consequences is large, rather than just that they are irreversible.
I agree with this sentiment in response to the question of “will this research impact capabilities more than it will alignment?”, but not in response to the question of “will this research (if implemented) elevate s-risks?”. Partial alignment inflating s-risk is something I am seriously worried about, and prosaic solutions especially could lead to a situation like this.
If your research avoids negatively influencing s-risks only on the condition that it is never implemented, and you think your research is good enough to post about, don’t you see the dilemma here?