This seems wrong to me in some important ways (at least as general theoretical research advice). Like, some of the advice you give seems to anti-predict important scientific advances.
“Generally, unguided exploration is seldom that useful.”
Following this advice, for instance, would suggest that Darwin not go on the Beagle, i.e., not spend five years exploring the globe (basically just for fun) as a naturalist. But his experiences on the Beagle were exactly what led him to the seeds of natural selection, as he began to notice subtleties like how animals changed ever so slightly as one moves up a continent. It also seems like it screens out a bunch of Faraday’s experimental work on electricity, much of which he did because it seemed interesting or fun, rather than backchaining from some predetermined goal. Like, he gave an entire lecture series on candles, which was mostly just him saying, over and over, “And isn’t it weird that this thing happens, too?? What happens if we change this?” And they’re great, and a lot of that exploratory work laid the groundwork for Maxwell’s later work on electromagnetism.
“Cutting off research avenues that are fun to think about, but ultimately not that productive.”
Similarly, I think this is one of the main failure modes in modern scientific research. When I look at academia, one of the things I’m most hoping for is that people follow their taste more, and that they have more fun! Because often the things that are open-ended and fun to play around with hold a deeper kind of logic that you’re attracted to but haven’t articulated yet. If you only stick to things that seem immediately productive, then you (roughly) never find truly novel or cool ideas. E.g., both Babbage and Shannon tinkered with coding-related projects when they were younger (cipher cracking and a barbed-wire telegraph, respectively), and I think it’s not crazy to assume that this sort of playing around with representing information abstractly may have helped with their later, more ambitious projects (general computers and information theory). Also, many Nobel Prize winners say they wouldn’t have been able to do their seminal work in the current environment because, e.g., “Today I wouldn’t get an academic job. It’s as simple as that. I don’t think I would be regarded as productive enough.” (Higgs). Certainly, some things are dead ends and it can be a bit hard to know that in advance, but if you prematurely screen off all of them, you screen off the great ideas, too.
I think Altman puts it nicely here: “Good ideas—actually, no, great ideas are fragile. Great ideas are easy to kill…. All the best ideas when I first heard them sounded bad. And all of us, myself included, are much more affected by what other people think of us and our ideas than we like to admit. If you are just four people behind your own door, and you have an idea that sounds bad but is great, you can keep that self-delusion going. If you’re in a coworking space, people laugh at you, and no one wants to be the kid picked last at recess. So you change your idea to something that sounds plausible but is never going to matter. It’s true that coworking spaces do kill off the very worst ideas, but a band-pass filter for startups is a terrible thing because they kill off the best ideas, too.” Likewise, I think it is perhaps quite load-bearing that many great scientists spent significant portions of their thinking years alone (famously, Newton did this when he came up with the Principia, but Darwin and Shannon did too, etc.).
“On timescales of days and weeks, you should be able to point to concrete examples that constitute ‘units of progress’ towards your final goal.”
This also feels pretty wrong to me. Certainly that would be nice, and perhaps something to aim for, but I don’t think it’s always the case, and I don’t think the lack of it is strong evidence of “not making progress.” Again, using Darwin as an example: after he noticed that species were mutable, he spent about a year and a half trying to figure out why. He had one main insight a few months in—that breeders introduced changes via artificial selection—but it took him some time to see how nature itself could act as a selector. And in that stretch between “artificial” and “natural” selection, I would not say he was making obvious, concrete progress on the solution, because the solution wasn’t made of obvious steps. He had the right questions, and he read a lot, wrote a lot, talked to breeders, etc., but mostly he just held onto his confusion for a long time. And then one day, shortly after reading Malthus, the solution came to him in a flash of insight during a carriage ride. Certainly not all research looks like this, but I do think it’s an illustrative example of how good theoretical work can come out of non-obvious units of progress.
I know at the beginning you mentioned that this is advice for a particular kind of research from your perspective, and I do think that it’s useful in certain domains. But I worry it’s easy to forget, at the end of a document with many high-level tips, that it’s not general advice on how to do good theoretical alignment work, period. And because I do think that some of this advice anti-predicts great scientific work—in particular the sort that I think alignment is currently most lacking, and the sort that would be the most helpful, were we to have it—I wanted to push back a bit on the idea that many people might walk away with, i.e., that this is general advice for theoretical work in alignment.