Adam Scholl
Researcher at Missing Measures
Yep, definitely! The reason why these are big tomes is IMO largely downstream of the distribution methods at the time.
What distribution differences do you mean? Kepler and Bacon lived before academic journals, but I think all the others could easily have published papers; indeed Newton, Darwin and Maxwell published many, and while Carnot didn’t, many around him did, so he would have known it was an option.
It seems more likely to me that they chose to write up these ideas as books rather than papers simply because the ideas were more “book-sized” than “paper-sized,” i.e. because they were trying to discover and describe a complicated cluster of related ideas that was inferentially far from existing understanding, and this tends to be hard to do briefly.
I think that is, for most forms of intellectual progress, a better way of developing both ideas and pedagogical content knowledge
It sounds like you’re imagining that the process of writing such books tends to involve a bunch of waterfall-style batching, analogous to e.g. finishing the framing in each room of a house before moving on to the flooring, or something like that? If so, I’m confused why; at least my own experience with large writing projects has involved little of this, I think, though I’m sure writing processes vary widely.
I was pretty much with you until this paragraph:
In many ways Inkhaven is an application of single piece flow to the act of writing. I do not believe intellectual progress must consist of long tomes that take months or years to write. Intellectual labor should aggregate minute-by-minute with revolutionary insights aggregating from hundreds of small changes. Publishing daily moves intellectual progress much closer to single piece flow.
Of course intellectual progress doesn’t always require tomes, but I think in many fields of science, important conceptual progress has historically occurred so dominantly via tomes that they can almost be considered its unit. Take for example well-regarded tomes like Astronomia Nova, Instauratio Magna, Principia, Reflections on the Motive Power of Fire, On the Origin of Species, or A Treatise on Electricity and Magnetism—would you guess the discovery or propagation of these ideas would have been more efficient if undertaken somehow more in single piece flow-style? My sense is that tomes are just a pretty natural byproduct of ambitious, large inferential distance-crossing investigations like these.
I do think I’d feel very alarmed by the 27% figure in your position—much more alarmed than e.g. I am about what happened with AIRCS, which seems to me to have failed more in the direction of low than actively bad impact—but to be clear I didn’t really mean to express a claim here about the overall sign of MATS; I know little about the program.
Rather, my point is just that multiplier effects are scary for much the same reason they are exciting—they are in effect low-information, high-leverage bets. Sometimes single conversations can change the course of highly effective people’s whole careers, which is wild; I think it’s easy to underestimate how valuable this can be. But I think it’s similarly easy to underestimate their risk, given that the source of this leverage—that you’re investing relatively little time getting to know them, etc, relative to the time they’ll spend doing… something as a result—also means you have unusually limited visibility into what the effects will be.
Given this, I think it’s worth taking unusual care, when pursuing multiplier effect strategies, to model the overall relative symmetry of available risks/rewards in the domain. For example, whether A) there might be lemons market problems, such that those who are easiest to influence (especially quickly) might tend all else equal to be more strategically confused/confusable, or B) whether there might in fact currently be more easy ways to make AI risk worse than better, etc.
That may be, but personally I am unpersuaded that the observed paradoxical impacts should update us toward thinking the world would have been better off if we hadn’t made the problem known. I roughly can’t imagine worlds where we survive in which the problem wasn’t made known, and with a problem this confusing I think it should be pretty expected that initially people will have little idea how to help, and hence that many initial attempts won’t. In my imagination, at least, basically all surviving worlds look like that at first, but then eventually people who were persuaded to worry about the problem do figure out how to solve it.
(Maybe this isn’t what you mean exactly, and there are ways we could have made the problem known that seemed less like “freaking out”? But to me this seems hard to achieve, when the problem in question is the plausibly relatively imminent death of everyone).
Great founders and field-builders have multiplier effects on recruiting, training, and deploying talent to work on AI safety [...] If we want to 10-100x the AI safety field in the next 8 years, we need multiplicative capacity, not just marginal hires
I spent much of 2018-2020 trying to help MIRI with recruiting at AIRCS workshops. At the time, I think AIRCS workshops and 80k were probably the most similar things the field had to MATS, and I decided to help with them largely because I was excited about the possibility of multiplier effects like these.
The single most obvious effect I had on a participant—i.e., where at the beginning of our conversations they seemed quite uninterested in working on AI safety, but by the end reported deciding to—was that a few months later they quit their (non-ML) job to work on capabilities at OpenAI, which they have been doing ever since.
Multiplier effects are real, and can be great; I think AIRCS probably had helpful multiplier effects too, and I’d guess the workshops were net positive overall. But much as pharmaceuticals often have paradoxical effects—i.e., they impact the intended system in roughly the intended way, except with the sign of the key effect flipped—it seems disturbingly common for interventions like these to have “paradoxical impact.”
I suspect the risk of paradoxical impact—even from your own work—is often substantial, especially in poorly understood domains. My favorite example of this is the career of Fritz Haber, who by discovering how to efficiently mass-produce fertilizer, explosives, and chemical weapons, seems plausibly to have both counterfactually killed and saved millions of lives.
But it’s even harder to predict the sign when the impact in question is on other people—e.g., on their choice of career—since you have limited visibility into their reasoning or goals, and nearly zero control over what actions they choose to take as a result. So I do think it’s worth being fairly paranoid about this in high-stakes, poorly-understood domains, and perhaps especially so in AI safety, where numerous such skulls have already appeared.
Yeah, certainly there are other possible forms of bias besides financial conflicts of interest; as you say, I think it’s worth trying to avoid those too.
Sure, but humanity currently has so little ability to measure or mitigate AI risk that I doubt it will be obvious in any given case that the survival of the human race is at stake, or that any given action would help. And I think even honorable humans tend to be vulnerable to rationalization amidst such ambiguity, which (as I model it) is why society generally prefers that people in positions of substantial power not have extreme conflicts of interest.
I’m going to try to make sure that my lifestyle and financial commitments continue to make me very financially comfortable both with leaving Anthropic, and with Anthropic’s equity (and also: the AI industry more broadly – I already hold various public AI-correlated stocks) losing value, but I recognize some ongoing risk of distorting incentives, here.
Why do you feel comfortable taking equity? It seems to me that one of the most basic precautions one ought ideally to take when accepting a job like this (e.g. evaluating Claude’s character/constitution/spec) is to ensure you won’t personally stand to lose huge sums of money should your evaluation suggest further training or deployment is unsafe.
(You mention already holding AI-correlated stocks—I do also think it would be ideal if folks with influence over risk assessment at AGI companies divested from these generally, though I realize this is difficult given how entangled they are with the market as a whole. But I’d expect AGI company staff typically have much more influence over their own company’s value than that of others, so the COI seems much more extreme).
They typically explain where the room is located right after giving you the number, which is almost like making a memory palace entry for you. Perhaps the memory is more robust when it includes a location along with the number?
I agree AI minds might be very different, and best described with different measures. But I think we currently have little clue what those differences are, and so for now humans remain the main source of evidence we have about agents. Certainly human-applicability isn’t a necessary condition for measures of AI agency; it just seems useful as a sanity check to me, given the context that nearly all of our evidence about (non-trivial) agents so far comes from humans.
Sorry, looking again at the messiness factors, fewer are about brute force than I remembered; will edit.
But they do indeed all strike me as quite narrow external validity checks, given that the validity in question is whether the benchmark predicts when AI will gain world-transforming capabilities.
“messiness” factors—factors that we expect to (1) be representative of how real-world tasks may systematically differ from our tasks
I felt very confused reading this claim in the paper. Why do you think they are representative? It seems to me that real-world problems obviously differ systematically from these factors, too—e.g., solving them often requires having novel thoughts.
I think there is more empirical evidence of robust scaling laws than of robust horizon length trends, but broadly I agree—I think it’s also quite unclear how scaling laws should constrain our expectations about timelines.
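(For concreteness, by “scaling laws” I mean parametric fits of the usual form from the literature; the symbols below are just the standard illustrative ones, not specific numbers I’m endorsing:

$$L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}$$

where $N$ is parameter count, $D$ is training data, and $E$, $A$, $B$, $\alpha$, $\beta$ are fitted constants. By “horizon length trends” I mean the extrapolated doubling time of the human time-cost of tasks that models can complete at some fixed success rate.)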
(Not sure I understand what you mean about the statistical analyses, but fwiw they focused only on very narrow checks for external validity—mostly just on whether solutions were possible to brute force).
I agree it seems plausible that AI could accelerate progress by freeing up researcher time, but I think the case for horizon length predicting AI timelines is even weaker in such worlds. Overall I expect the benchmark would still mostly have the same problems—e.g., that the difficulty of tasks (even simple ones) is poorly described as a function of time cost; that benchmarkable proxies differ critically from their non-benchmarkable targets; that labs probably often use these benchmarks as explicit training targets, etc.—but would also face the additional (imo major) source of uncertainty about how much freeing up researcher time would accelerate progress.
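To make the first of those problems concrete, here is roughly how I understand a model’s 50% “horizon length” to be computed; the task data below is made up and the fitting details in the real analysis differ, but the core move of reducing each task to its human time cost is the part I’m skeptical of:

```python
# A minimal sketch of the 50% "horizon length" computation as I understand it.
# The task durations and success outcomes here are invented for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each task is summarized by its human time cost (minutes) and whether the
# model under evaluation succeeded at it.
task_minutes = np.array([1, 2, 4, 8, 15, 30, 60, 120, 240, 480])
successes    = np.array([1, 1, 1, 1,  1,  0,  1,   0,   0,   0])

# Fit success probability as a function of log task length...
X = np.log2(task_minutes).reshape(-1, 1)
clf = LogisticRegression().fit(X, successes)

# ...and report the task length at which predicted success crosses 50%
# (for a logistic fit, that is where the linear score equals zero).
log2_horizon = -clf.intercept_[0] / clf.coef_[0][0]
print(f"50% horizon ~= {2 ** log2_horizon:.0f} minutes")
```

Everything about a task besides its human time cost is discarded before the trend over model releases is even fit, which is the sense in which I think difficulty ends up poorly described.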
The “Length” of “Horizons”
Fwiw, in my experience LLMs lie far more than early Wikipedia or any human I know, and in subtler and harder to detect ways. My spot checks for accuracy have been so dismal/alarming that at this point I basically only use them as search engines to find things humans have said.
I’m really excited to hear this, and wish you luck :)
My thinking benefited a lot from hanging around CFAR workshops, so for whatever it’s worth I do recommend attending them; my guess is that most people who like reading LessWrong but haven’t tried attending a workshop would come away glad they did.
I’d guess the items linked in the previous comment will suffice? Just buy one mask, two adapters and two filters and screw them together.
I live next to a liberally-polluting oil refinery so have looked into this a decent amount, and unfortunately there do not exist reasonably priced portable sensors for many (I’d guess the large majority) of toxic gases. I haven’t looked into airplane fumes in particular, but the paper described in the WSJ article lists ~130 gases of concern, and I expect detecting most such things at relevant thresholds would require large infrared spectroscopy installations or similar.
(I’d also guess that in most cases we don’t actually know the relevant thresholds of concern, beyond those which cause extremely obvious/severe acute effects; for gases I’ve researched, the literature on sub-lethal toxicity is depressingly scant, I think partly because many gases are hard/expensive to measure, and also because you can’t easily run ethical RCTs on their effects.)
I had the same thought, but hesitated to recommend it because I’ve worn a gas mask before on flights (when visiting my immunocompromised Mom), and many people around me seemed scared by it.
By my lights, half-face respirators look much less scary than full gas masks for some reason, but they generally have a different type of filter connection (“bayonet”) than the NATO-standard 40mm connection for gas cartridges. It looks like there are adapters, though, so perhaps one could make a less scary version this way? (E.g. to use a mask like this with filters like these).
I am interested in debating the principle here (e.g. whether it sometimes makes sense to write books, whether/why most scientific progress so far has involved writing books, etc), but I feel less interested in debating your gut take on the tradeoffs Aysja and I are making personally, since I expect you know nearly nothing about what those are? Most obviously, the dominant term has been illness rather than choices, but I expect you also have near-zero context on the choices, which we have spent really a lot of time and effort considering. I would… I guess be up for describing those in person, if you want.