I appreciate you spelling this out. I think the concept of personal attacks is somewhat of a distraction here, but it's one that will be present for almost everyone, so having someone willing to say explicitly that this is their reaction seems helpful to me: it makes explicit that the people doing this work are humans, and thus sensitive to respect and other forms of social evaluation of them as a person, separate from their work. Even though I think this sensitivity is hard to avoid, I also think it can cause serious problems when someone is in fact motivated for the wrong reason and thus can't be convinced by treating them as making a correctable mistake.
I actually am still considering applying to this program, but I'm leaning towards the conclusion that all the ideas I have for short-term-tractable prosaic safety work would be net negative. I continue to think there's a unilateralist's curse at play: folks willing to do the short-term research are selected for not having a detailed enough picture of the medium-term starkly-superintelligent-system-alignment research problem to realize when they're doing something counterproductive. That's what I mostly think is happening here. When deciding whether to apply, I noticed a strong urge to go do work that would be respected, but decided I'd rather not bet on my ideas being insufficiently capabilities-enhancing to undo the tenuous alignment benefit I think they'd provide.