By “upper bound”, I meant “upper bound b on the definite integral ∫_a^b p(x) dx”. I.e., for the kind of hacky thing I’m doing here, the integral is very sensitive to the choice of bounds a, b. For example, the integral does not converge for a = 0. I think all my data here should be treated as incomplete and all my calculations crude estimates at best.
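The sensitivity to the lower bound can be seen directly from the closed form of the integral. Here is a minimal sketch, assuming a hypothetical power law p(x) = x^(−α) with an illustrative exponent α = 2 (not a fitted value from the post): the integral blows up as a → 0 whenever α ≥ 1.

```python
# Sketch: ∫_a^b p(x) dx for a hypothetical power law p(x) = x**(-alpha).
# alpha = 2.0 is an illustrative assumption, not a fitted value.
# For alpha >= 1 the integral diverges as the lower bound a -> 0.

def power_law_integral(a, b, alpha):
    """Closed-form ∫_a^b x**(-alpha) dx, valid for alpha != 1."""
    return (b ** (1 - alpha) - a ** (1 - alpha)) / (1 - alpha)

for a in (1.0, 0.1, 0.01, 0.001):
    # Shrinking the lower bound by 10x roughly multiplies the result by 10.
    print(f"a = {a:>6}: integral = {power_law_integral(a, 100.0, 2.0):.2f}")
```

For α = 2 the closed form is 1/a − 1/b, so each factor-of-10 reduction in a adds roughly a factor of 10 to the result, which is the sensitivity being described.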
I edited the original comment to say “∞ might be a bad upper bound” for clarity.
Yeah, I think we’re in agreement; I’m just saying the phrase “upper bound” is less useful than, e.g., providing estimates for various bounds in a table or graph, along with the derivative of the results with respect to the inferred parameter estimate.
So, conduct a sensitivity analysis on the definite integral with respect to choices of integration bounds? I’m not sure this level of analysis is merited given the incomplete data and unreliable estimation methodology for the number of independent researchers. Like, I’m not even confident that the underlying distribution is a power law (instead of, say, a composite of power law and lognormal distributions, or a truncated power law), and the value of p(1) seems very sensitive to data in its vicinity, so I wouldn’t want to rely on this estimate except as a very crude first pass. I would, however, support an investigation into the number of independent researchers in the ecosystem; I would find that useful.
I think again we are on the same page & this sounds reasonable, I just want to argue that “lower bound” and “upper bound” are less-than-informative descriptions of the uncertainty in the estimates.
I definitely think that people should not look at my estimates and say “here is a good 95% confidence interval upper bound on the number of employees in the AI safety ecosystem.” I think people should look at my estimates and say “here is a good 95% confidence interval lower bound on the number of employees in the AI safety ecosystem,” because you can just add up the names. I.e., even if there might be 10x as many employees as I estimated, I’m at least 95% confident that there are more than my estimate obtained by just counting names (obviously excluding the 10% fudge factor).