Some examples of justifications I have given to myself are “You’re so new to this, this is not going to have any real impact anyway”,
I think this argument is just clearly correct for people new to the field: thinking that your work may be relevant to alignment is motivating and exciting and represents the path to eventually doing useful things, but it’s also very likely to be wrong. Being repeatedly wrong is what improvement feels like!
People new to the field tend to wildly overthink the harms of publishing, in a way that increases their anxiety and makes them much more likely to bounce off. This is a bad dynamic, and I wish people would stop promoting it.
As someone who is quite concerned about the AI Alignment field having had a major negative impact via accelerating AI capabilities, I also agree with this. It’s really quite unlikely for your first pieces of research to make a huge difference. I think the key people who I am worried will drive forward capabilities are people who have been in the field for quite a while and have found traction on the broader AGI problems and questions (as well as people directly aiming towards accelerating capabilities, though the worry there is somewhat different in nature).
It’s fine to make the mistake of publishing something if the mistake you made was assuming “this is great research”, but if the mistake was “this is safe to publish because I’m new to research”, the consequences can be irreversible. I probably fall into the category of ‘wildly overthinking the harms of publishing due to inexperience’, but it seems to me like a simple assessment using the ABC model I outlined in the post should take only a few minutes and could quickly tell someone whether they might want to show their research to someone more experienced before publishing.
I am personally facing this dilemma. I have something I want to publish, but I’m unsure whether I should listen to the voice telling me “you’re so new to this, this is not going to have any real impact anyway” or the voice telling me “if it does have some impact, or was hypothetically implemented in a generally intelligent system, this could reduce extinction risk but inflate s-risk”. It was a difficult decision, but I decided I would rather show someone more experienced first, which is what I am doing currently. This post was intended as a summary of why and how I converged on that decision.
but it seems to me like a simple assessment using the ABC model I outlined in the post should take only a few minutes
Empirically, many people new to the field get very paralysed and anxious about fears of doing accidental harm, in a way that I believe has significant costs. I haven’t fully followed the specific model you outline, but it seems to involve ridiculously hard questions around the downstream consequences of your work, which I struggle to robustly apply to my own work (indirect effects are really hard, man!). Ditto, telling someone that they need to ask someone more experienced to sanity-check their work can have significant costs in terms of social anxiety (I personally sure would publish fewer blog posts if I felt a need to run each one by someone like Chris Olah first!).
Having significant costs doesn’t mean that doing this is bad, per se, but there need to be major benefits to match those costs, and I’m just incredibly unconvinced that people’s first research projects meet that bar. Maybe if you’ve gotten a bunch of feedback from more experienced people that your work is awesome? But also, if you’re in that situation, then you can probably ask them whether they’re concerned.
It’s fine to make the mistake of publishing something if the mistake you made was assuming “this is great research”, but if the mistake was “this is safe to publish because I’m new to research”, the consequences can be irreversible.
“Irreversible consequences” is not that huge of a deal. The consequences of writing almost any internet comment are irreversible. I feel like you also need to argue that the expected magnitude of the consequences is large, not just that they are irreversible.
I agree with this sentiment in response to the question of “will this research impact capabilities more than it will alignment?”, but not in response to the question of “will this research (if implemented) elevate s-risks?”. Partial alignment inflating s-risk is something I am seriously worried about, and prosaic solutions especially could lead to a situation like this.
If your research not negatively influencing s-risks depends on it never being implemented, and you think your research is good enough to post about, don’t you see the dilemma here?