There are many different kinds of value which mentorship can provide, but I’ll break them into two main classes:
Things which can in principle be provided by other channels, but can be accelerated by 1-on-1 mentorship.
Things for which 1-on-1 mentorship is basically the only channel.
The first class includes situations where mentorship is a direct substitute for a textbook, in the same way that a lecture is a direct substitute for a textbook. But it also includes situations where mentorship adds value, especially via feedback. A lecture or textbook only has space to warn against the most common failure-modes and explain “how to steer”, and learning to recognize failure-modes or steer “in the wild” takes practice. Similar principles apply to things which must be learned-by-doing: many mistakes will be made, many wrong turns, and without a guide, it may take a lot of time and effort to figure out the mistakes and which turns to take. A mentor can spot failure-modes as they come up, point them out (which potentially helps build recognition), point out the right direction when needed, and generally save a lot of time/effort which would otherwise be spent being stuck. A mentor still isn’t strictly necessary in these situations—one can still gain the relevant skills from a textbook or a project—but it may take longer that way.
For these use-cases, there’s a delicate balance. On the one hand, the mentee needs to explore and learn to recognize failure-cases and steer on their own, not become reliant on the mentor’s guidance. On the other hand, the mentor does need to make sure the mentee doesn’t spend too much time stuck. The Socratic method is often useful here, as are the techniques of the research-conversation support role. Also, once a mistake has been made and then pointed out, or once the mentor has provided some steering, it’s usually worth explicitly explaining the more general pattern and how this instance fits it. (This also includes things like pointing out a different frame and then explaining how this frame works more generally—that’s a more meta kind of “steering”.)
The second class is mostly illegible knowledge/skills—things which a mentor wouldn’t explicitly notice or doesn’t know how to explain. For these, demonstration is the main channel. Feedback can be provided to some degree by demonstrating, then having the mentee try, or vice-versa. In general, it won’t be obvious exactly what the mentor is doing differently than the mentee, or how to explain what the mentor is doing differently, but the mentee will hopefully pick it up anyway, at least enough to mimic it.
Some of this I’ve written about before:
Specializing in Problems We Don’t Understand largely talks about what-and-how-to-study, and the “formal study” parts of the apprenticeship should generally follow that. Aysajan’s recent post is an example: it takes chapter 2 of Jaynes’ Logic of Science and applies it in other contexts.
Comprehensive Information Gathering exercises. Aysajan’s first non-formal-study project is to read through lists of unsolved problems on Wikipedia, as well as all of the course descriptions in a course catalogue from either MIT or Caltech.
Those definitely don’t cover all of it, though.
So far, other than those, we’ve mostly been kicking around smaller problems. For instance, the last couple days we were talking about general approaches for gearsy modelling in the context of a research problem Aysajan’s been working on (specifically, modelling a change in India’s farm subsidy policy). We also spent a few days on writing exercises—approximately everyone benefits from more practice in that department.
We’ve also done a few exercises to come up with Hard Problems to focus on. (“What sci-fi technologies or magic powers would you like to have?” was a particularly good one, and the lists of unsolved problems are also intended to generate ideas.) Once Aysajan has settled on ~10-20 Hard Problems to focus on (initially), those will drive the projects. You should see posts on whatever he’s working on fairly frequently.
There seem to be some steps missing in the middle here. The current outline seems to be:
Small symbolic acts of resistance
Common knowledge of resistance
An actual organization able and ready to take power after the regime collapses, whose rallying cry is “democracy!” rather than some other popular thing
An actually democratic government (i.e. not just a dictator/council whose rallying cry is “democracy!”)
A stable actually-democratic government (i.e. a majority faction or one-time election winner doesn’t just permanently lock everyone else out of the political process)
… the missing steps seem to be in all the places which I’d expect to be hardest—i.e. the places where I’d expect revolutionaries to most often fail.
Live human being is indeed the harder version. I recommend the easier version first, harder version after.
The latter seems pretty hard to do, practically, with current technology, without using rockets (to at least set up an ‘efficient’ system initially).
Ah, but what specific bottlenecks make it hard? What are the barriers, and what chunking of the problem do they suggest?
Also: it’s totally fine to assume that you can use rockets for setup, and then go back and remove that assumption later if the rocket-based initial setup is itself the main bottleneck to implementation.
Word on the grapevine: it sounds like they might just be adding a bunch of parameters in a way that’s cheap to train but doesn’t actually work that well (i.e. the “mixture of experts” thing).
It would be highly entertaining if ML researchers got into an arms race on parameter count, then Goodharted on it. Sounds like exactly the sort of thing I’d expect not-very-smart funding agencies to throw lots of money at. Perhaps the Goodharting would be done by the funding agencies themselves, by just funding whichever projects say they will use the most parameters, until they end up with lots of tiny nails. (Though one does worry that the agencies will find out that we can already do infinite-parameter-count models!)
That said, I haven’t looked into it enough myself to be confident that that’s what’s happening here. I’m just raising the hypothesis from entropy.
The problem is difficult for two main reasons:
a huge fraction of the genome consists of dead transposons
assuming the model is correct, different cells will have different numbers of live transposons
The first point makes it difficult-in-general to count transposons in the genome, especially with high-throughput sequencing (HTS). HTS usually breaks the genome into small pieces, sequences them separately, then computationally reconstructs the whole thing. But if there are many copies of a similar sequence, this strategy is prone to error/uncertainty, and that’s exactly the case for all those transposon copies.
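As a toy illustration of why repeats break this reconstruction step (a minimal sketch, not real bioinformatics): if a repeated element is at least as long as the reads, two genuinely different genomes can produce exactly the same multiset of reads, so no amount of computation can tell them apart.

```python
from collections import Counter

def kmers(genome, k):
    """Multiset of all length-k reads from a genome string."""
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))

R = "RRRR"  # stand-in for a transposon copy, at least as long as a read

# Two different genomes: same flanking segments, middle blocks swapped.
g1 = "AC" + R + "GG" + R + "TT" + R + "CA"
g2 = "AC" + R + "TT" + R + "GG" + R + "CA"

# Both produce exactly the same multiset of length-4 reads, so
# short-read data cannot distinguish them.
print(g1 != g2)                    # True: the genomes differ
print(kmers(g1, 4) == kmers(g2, 4))  # True: the read data doesn't
```

The same ambiguity is what makes counting near-identical transposon copies unreliable: reads from inside a copy map equally well to every copy.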
That said, tools for reliably sequencing transposons are an active research area and progress is being made, so it will probably be cheaper in the not-too-distant future.
One way to circumvent this whole issue is to look at the amount of transposon RNA in a cell, rather than DNA. This doesn’t tell us anything about live transposon count—there could be a bunch of fresh copies which are being suppressed in a healthy cell. But it will tell us how active the transposons are right now. In practice, I expect this would mainly measure senescent cells (since they’re the only cells where I’d expect lots of transposon RNA), but that’s a hypothesis which would be useful to test.
Great comment—these were both things I thought about putting in the post, but didn’t quite fit.
Goodhart, in particular, is a huge reason to avoid relying on many bits of selection, even aside from the exponential problem. Of course we also have to be careful of Goodhart when designing training programs, but at least there we have more elbow room to iterate and examine the results, and less incentive for the trainees to hack the process.
So, one simple model which I expect to be a pretty good approximation: IQ/g-factor is a thing and is mostly not trainable, and then skills are roughly-independently-distributed after controlling for IQ.
For selection in this model, we can select for a high-g-factor group as the first step, but then we still run into the exponential problem as we try to select further within that group (since skills are conditionally independent given g-factor).
This won’t be a perfect approximation, of course, but we can improve the approximation as much as desired by adding more factors to the model. The argument for the exponential problem goes through: select first for the factors, and then the skills will be approximately-independent within that group. (And if the factors themselves are independent—as they are in many factor models—then we get the exponential problem in the first step too.)
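A minimal numeric sketch of the exponential problem, under the model’s assumption that skills are independent (after conditioning on the factors): the fraction of people passing every threshold is the product of the per-skill fractions, so selection pressure multiplies. The population size and thresholds below are arbitrary illustration values.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000  # simulated population (already conditioned on g-factor)
k = 3          # number of conditionally-independent skills

# Skills drawn independently, per the model's assumption.
skills = rng.standard_normal((n, k))

# Select the top 10% on each skill separately.
thresholds = np.quantile(skills, 0.9, axis=0)
passes_all = np.all(skills > thresholds, axis=1)

# Joint survival fraction is roughly 0.1**3 = 0.001: it multiplies.
print(passes_all.mean())
```

With more skills, the surviving fraction shrinks exponentially, which is the whole problem with relying on selection alone.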
Does training scale linearly? Does it take just twice as much time to get someone to 4 bits (top ~6% in world, roughly one in every school class) and from 4 to 8 bits (one in ~250)?
This is a good point. The exponential → linear argument is mainly for independent skills: if they’re uncorrelated in the population then they should multiply for selection; if they’re independently trained then they should add for training. (And note that these are not quite the same notion of “independent”, although they’re probably related.) It’s potentially different if we’re thinking about going from 90th to 95th percentile vs 50th to 75th percentile on one axis.
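To make the multiply-vs-add contrast concrete (toy numbers, entirely hypothetical): suppose each of three skills requires 4 bits of selection, or alternatively some fixed training time.

```python
# Hypothetical costs per skill: 4 bits of selection, or 6 months of training.
bits_per_skill = [4, 4, 4]
months_per_skill = [6, 6, 6]

# Selection multiplies: need 2^(sum of bits) candidates to find one person.
candidates_needed = 2 ** sum(bits_per_skill)
print(candidates_needed)  # 4096

# Training adds: total time is just the sum.
months_needed = sum(months_per_skill)
print(months_needed)      # 18
```

The exponential blowup is in the candidate pool, not the training time, which is the argument for training over selection when skills are independent.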
(I’ll talk about the other two points in response to Gunnar’s comment.)
Suggestion: find ways for candidates to work closely with top tier people such that it doesn’t distract those people too much.
In particular, I currently think an apprenticeship-like model is the best starting point for experiments along these lines. Eli also recently pointed out to me that this lines up well with Bloom’s two-sigma problem: one-on-one tutoring works ~two standard deviations better than basically anything else in education.
Strongly agree with this. Good explanation, too.
I won’t give any spoilers, but I recommend “how to efficiently reach orbit without using a rocket” as a fun exercise. More generally, the goal is to reach orbit in a way which does not have exponentially-large requirements in terms of materials/resources/etc. (Rockets have exponential fuel requirements; see the rocket equation.)
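To see where the exponential comes from, here’s the rocket equation worked out with rough illustrative numbers (~9.4 km/s of delta-v to low Earth orbit including losses, ~3.5 km/s effective exhaust velocity for a kerosene/LOX engine; both are ballpark figures, not spoilers for the exercise):

```python
import math

def mass_ratio(delta_v, v_exhaust):
    """Tsiolkovsky rocket equation: m0/mf = exp(delta_v / v_e)."""
    return math.exp(delta_v / v_exhaust)

r = mass_ratio(9400, 3500)
print(r)  # ~14.7: over 90% of launch mass must be propellant

# The "exponential" part: doubling the required delta-v squares the
# mass ratio, rather than doubling it.
print(mass_ratio(2 * 9400, 3500))  # ~215
```

So any scheme whose resource requirements grow merely polynomially in delta-v would beat rockets at scale, which is the spirit of the exercise.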
A (likely) counterexample is elastin: it seems to not be broken down at all in humans. So if new elastin is produced (e.g. as part of a wound-healing response), it just sticks around indefinitely.
This is in contrast to homeostatic equilibrium, which describes most things in biological systems, but not elastin.
Writers do sometimes use “accumulation”/”depletion” to refer to things in homeostatic equilibrium, but I find this terminology misleading at best, and in most cases I think the writer themselves is confused about the distinction and why it matters.
Meta-note: I think the actual argument here is decent, but using the phrase “power dynamics” will correctly cause a bunch of people to dismiss it without reading the details. “Power”, as political scientists use the term, is IMO something like a principal component which might have some statistical explanatory power, but is actively unhelpful for building gears-level models.
I would suggest instead the phrase “bargaining dynamics”, which I think points to the gearsy parts of “power” while omitting the actively-unhelpful parts.
I don’t know much about plants, other than that they’re radically different, and do all sorts of crazy shit with their transposons.
So, de Grey gave that mechanism for ROS export (which I think was one of his best contributions on the theory side of things: it was plausible and well-grounded and quite novel). It is a mechanism which can happen, although I don’t know of experimental evidence for whether it’s the main mechanism for ROS export, especially in senescent cells. And that also still leaves the question of ROS import into other cells—not so relevant for atherosclerosis, but quite relevant to the exponential acceleration of aging. Also, it leaves open the question of ROS transport between mitochondria/cytoplasm/nucleus, which is necessary to explain the DNA damage part of the senescence feedback loop.