Man, this is a tough question. Evaluating the quality of research in the field is already a tough problem that everybody disagrees on, and as a result people disagree on what sort of people are well-suited to the work. Evaluating it for yourself without already being an expert in the field is even harder. With that in mind, I’ll give an answer which I think a reasonably-broad chunk of people would agree with, but with the caveat that it is very very incomplete.
I had a chat with Evan Hubinger a few weeks ago where we were speculating on how our evaluations of grant applications would compare. (I generally don’t evaluate grant applications, but Evan does.) We have very different views on what-matters-most in alignment, and agreed that our rankings would probably differ a lot. But we think we’d probably mostly agree on the binary cutoff—i.e. which applications are good enough to get funding at all. That’s because at the moment, money is abundant enough that it makes sense to invest in projects based on views which I think are probably wrong but at least have some plausible model under which they could be valuable. If there’s a project where Evan would assign it high value, and Evan’s model is itself a model-which-I-think-is-probably-wrong-but-still-plausible, then that’s enough to merit a grant. (It’s a hits-based grantmaking model.) Likewise, I’d expect Evan to view things-I’d-consider-high-value in a similar way.
Assuming that speculation is correct, the main grants which would not be funded are those which (as far as the grant evaluator can tell) don’t have any plausible model under which they’d be valuable. Thus the importance of building your own understanding of the whole high-level problem and answering the Hamming Questions: if you can do that, then you have a model under which your research will be valuable, and all that’s left is to communicate that model and your plan.
Now back to your perspective. You’re already hanging around and commenting on LessWrong, so right out the gate I have a somewhat-higher-than-default prior that you can evaluate the “some model under which the research is valuable” criterion. You’re likely to already have the concepts of Bottom Line and Trying to Try and so forth (even if you haven’t read those exact posts); you probably already have some intuition for the difference between a plan designed to actually-do-the-thing, versus a plan designed to look-like-it’s-doing-the-thing or to look-like-it’s-trying-to-do-the-thing. That doesn’t mean you already have enough of a model of the alignment/agency problems or a promising thread to tackle them, but hopefully you can at least tell if and when you do have those things.
Based on your comment, I’m more motivated to just sit down and (actually) try to solve AI Safety for X weeks, write up my results, and do an application. What is your 95% confidence interval for what X needs to be to reduce the odds of a false negative (i.e. my grant gets rejected but shouldn’t have been) to single digits?
I’m thinking of doing maybe 8 weeks. Maybe more if I can fall back on research engineering so that I haven’t wasted my time completely.
My main modification to that plan would be “writing up your process is more important than writing up your results”; I think that makes a false negative much less likely.
8 weeks seems like it’s on the short end to do anything at all, especially considering that there will be some ramp-up time. A lot of that will just be making your background frames/approach more legible. I guess viability depends on exactly what you want to test:
If your goal is to write up your background models and strategy well enough to see if grantmakers want to fund your work based on them, 8 weeks is probably sufficient
If your goal is to see whether you have any large insights or make any significant progress, that usually happens for me on a timescale of ~3 months
It sounds like you want to do something closer to the latter, so 12-16 weeks is probably more appropriate?