In the day I would be reminded of those men and women,
Brave, setting up signals across vast distances,
Considering a nameless way of living, of almost unimagined values.
Emrik
I feel like the terms for public/private beliefs are gonna clash with the fairly established terminology for independent impressions and all-things-considered beliefs (I’ve seen these referred to as “public” and “private” beliefs before, but I can’t remember the source). The idea is that sometimes you want to report your independent impressions rather than your Aumann-updated model of the world, because if everyone does the latter it can lead to double-counting of evidence and information cascades.
Information cascades develop consistently in a laboratory situation in which other incentives to go along with the crowd are minimized. Some decision sequences result in reverse cascades, where initial misrepresentative signals start a chain of incorrect [but individually rational] decisions that is not broken by more representative signals received later. - (Anderson & Holt, 1998)
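Here’s a toy simulation of that kind of sequential setup (my own sketch, not their exact protocol), just to make the mechanism concrete: once the public evidence favours one option by two choices, every later guess is determined by the crowd and private signals stop reaching the group.

```python
import random

def run_sequence(n_agents=20, p_signal=2/3, true_state="A", seed=None):
    """Each agent gets a private signal that matches the true state with
    probability p_signal, sees all earlier public guesses, and picks whichever
    state the combined evidence favours (ties: follow own signal)."""
    rng = random.Random(seed)
    other_state = "B" if true_state == "A" else "A"
    guesses, balance = [], 0  # balance = net signals inferable from public guesses
    for _ in range(n_agents):
        signal = true_state if rng.random() < p_signal else other_state
        total = balance + (1 if signal == "A" else -1)
        guess = "A" if total > 0 else "B" if total < 0 else signal
        guesses.append(guess)
        # While |balance| < 2, a guess still reveals the guesser's signal.
        # Once |balance| hits 2, guesses are determined by earlier guesses
        # alone: the cascade has locked in and no new information accumulates.
        if abs(balance) < 2:
            balance += 1 if guess == "A" else -1
    return guesses

# Try a few seeds: two unlucky early signals can lock in a reverse cascade of wrong guesses.
print(run_sequence(seed=0))
```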
I don’t want people to conflate the above socioepistemological ideas with the importantly different concepts in this post, so I prefer flagging my beliefs as “legible” or “illegible” to give a sense of how productive/educational I expect talking to me about them will be.
Bonus point: The failure mode of not admitting your own illegible/private beliefs can lead to myopic empiricism, whereby you stunt your epistemic growth by refusing to update on a large class of evidence. Severe cases often exhibit an unnatural tendency to consume academic papers over blog posts.
Yes! The way I’d like it is if LW had a “research group” feature that anyone could start, and you could post privately to your research group.
(Update: I’m less optimistic about this than I was when I wrote this comment, but I still think it seems promising.)
Multiplier effects: Delaying timelines by 1 year gives the entire alignment community an extra year to solve the problem.
This is the most and fastest I’ve updated on a single sentence as far back as I can remember. I am deeply gratefwl for learning this, and it’s definitely worth Taking Seriously. Hoping to look into it in January unless stuff gets in the way.
Have other people written about this anywhere?
I have one objection to claim 3a, however: Buying-time interventions are plausibly more heavy-tailed than alignment research in some cases because 1) the bottleneck for buying time is social influence and 2) social influence follows a power law due to preferential attachment. Luckily, the traits that make for top alignment researchers have limited (but not insignificant) overlap with the traits that make for top social influencers. So I think top alignment researchers should still not switch in most cases on the margin.
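(For intuition on why preferential attachment gives you heavy tails, here’s a throwaway simulation; it’s not meant as a model of the alignment community or anything real.)

```python
import random
from collections import Counter

def preferential_attachment(n_nodes=10_000, seed=0):
    """Each newcomer links to one existing node chosen with probability
    proportional to that node's current degree; degrees end up heavy-tailed."""
    rng = random.Random(seed)
    endpoints = [0, 1]  # every edge contributes both endpoints; a node's degree = its count here
    for new_node in range(2, n_nodes):
        endpoints.append(rng.choice(endpoints))  # attach preferentially
        endpoints.append(new_node)
    return Counter(endpoints)

degrees = preferential_attachment()
print(degrees.most_common(5))  # a few hubs hold a large share of all links
```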
Good points, but I feel like you’re a bit biased against foxes. First of all, they’re cute (see diagram). You didn’t even mention that they’re cute, yet you claim to present a fair and balanced case? Hedgehog hogwash, I say.
Anyway, I think the skills required for forecasting vs model-building are quite different. I’m not a forecaster, but if I were, I would try to read much more and more widely so I’m not blindsided by stuff I didn’t even know that I didn’t know. Forecasting is caring more about the numbers; model-building is caring more about how the vertices link up, whatever their weights. Model-building is for generating new hypotheses that didn’t exist before; forecasting is for discriminating between hypotheses that already exist.
I try to build conceptual models, and afaict I get much more than 80% of the benefit from 20% of the content that’s already in my brain. There are some very general patterns I’ve thought so deeply on that they provide usefwl perspectives on new stuff I learn weekly. I’d rather learn 5 things deeply, and remember sub-patterns so well that they fire whenever I see something slightly similar, compared to 50 things so shallowly that the only time I think about them is when I see the flashcards. Knowledge not pondered upon in the shower is no knowledge at all.
I’m confused. (As in, actually confused. The following should hopefwly point at what pieces I’m missing in order to understand what you mean by a “problem” for the notion.)
Vingean agency “disappears when we look at it too closely”
I don’t really get why this would be a problem. I mean, “agency” is an abstraction, and every abstraction becomes predictably useless once you can compute the lower layer perfectly, at least if you assume compute is cheap. Balloons!
Imagine you’ve never seen a helium balloon before, and you see it slowly soaring into the sky. You could have predicted this by using a few abstractions like density of gases and Archimedes’ principle. Alternatively, if you had the resources, you could make the identical prediction (with inconsequentially higher precision) by extrapolating from the velocities and weights of all the individual molecules, and computing that the sum of forces acting on the bottom of the balloon exceeds the sum acting on the top. I don’t see how the latter being theoretically possible implies a “problem” for abstractions like “density” and “Archimedes’ principle”.
To be honest, the fact that Eliezer is being his blunt unfiltered self is why I’d like to go to him first if he offered to evaluate my impact plan re AI. Because he’s so obviously not optimising for professionalism, impressiveness, status, etc. he’s deconfounding his signal and I’m much better able to evaluate what he’s optimising for.[1] Hence why I’m much more confident that he’s actually just optimising for roughly the thing I’m also optimising for. I don’t trust anyone who isn’t optimising purely to be able to look at my plan and think “oh ok, despite being a nobody this guy has some good ideas” if that were true.
And then there’s the Graham’s Design Paradox thing. I think I’m unusually good at optimising purely, and I don’t think people who aren’t around my level or above would be able to recognise that. Obviously, he’s not the only one, but I’ve read his output the most, so I’m more confident that he’s at least one of them.
[1] Yes, perhaps a consequentialist would be instrumentally motivated to try to optimise more for these things, but the fact that Eliezer doesn’t do that (as much) just makes it easier to understand and evaluate him.
I’m curious exactly what you meant by “first order”.
Just that the trade-off is only present if you think of “individual rationality” as “let’s forget that I’m part of a community for a moment”. All things considered, there’s just rationality, and you should do what’s optimal.
First-order: Everyone thinks that maximizing insight production means doing IDA* over the idea tree. Second-order: Everyone notices that everyone will think that, so it’s no longer optimal for maximizing insight production overall. Everyone wants to coordinate with everyone else in order to parallelize their search (assuming they care about the total sum of insights produced). You can still do something like IDA* over your sub-branches.
This may have answered some of your other questions. Assuming you care about the alignment problem being solved, maximizing your expected counterfactual thinking-contribution means you should coordinate with your research community.
And, as you note, maximizing personal credit is unaligned as a separate matter. But if we’re all motivated by credit, our coordination can break down by people defecting to grab credit.
How much should you focus on reading what other people do, vs doing your own things?
This is not yet at a practical level, but: let’s say we want to approach something like a community-wide optimal trade-off between exploring and exploiting, and we can’t trivially check what everyone else is up to. If we think the optimum is something obviously silly like “25% of researchers should Explore, and the rest should Exploit,” and I predict that 50% of researchers will follow the rule I follow while the uncoordinated researchers will all Exploit, then it is rational for me to randomize my decision with a coinflip (the coordinated half then supplies exactly the 25% of explorers the community needs).
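In code, the coinflip rule generalises to something like this (a sketch; every number is invented):

```python
def explore_probability(target_explore, coordinated_frac, uncoordinated_explore_frac=0.0):
    """With what probability should each coordinated researcher Explore, given a
    community-wide target and a prediction of what everyone else will do?"""
    uncoordinated_frac = 1.0 - coordinated_frac
    # Explorers still needed from the coordinated block, as a fraction of the whole community:
    needed = target_explore - uncoordinated_frac * uncoordinated_explore_frac
    return min(max(needed / coordinated_frac, 0.0), 1.0)  # clamp: the target may be out of reach

# Target 25% Explore, half the community follows the rule, the rest all Exploit:
print(explore_probability(target_explore=0.25, coordinated_frac=0.5))  # 0.5, i.e. flip a coin
```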
It gets newcomblike when I can’t check, but I can still follow a mix that’s optimal given an expected number of cooperating researchers and what I predict they will predict in turn. If predictions are similar, the optimum given those predictions is a Schelling point. Of course, in the real world, if you actually had important practical strategies for optimizing community-level research strategies, you would just write it up and get everyone to coordinate that way.
I worry for people who are only reading other people’s work, like they have to “catch up” to everyone else before they have any original thoughts of their own.
You touch on many things I care about. Part (not the main part) of why I want people to prioritize searching neglected nodes more is because Einstellung is real. Once you’ve got a tool in your brain, you’re not going to know how to not use it, and it’ll be harder to think of alternatives. You want to increase your chance of attaining neglected tools and perspectives to attack long-standing open problems with. After all, if the usual tools were sufficient, why are they long-standing open problems? If you diverge from the most common learning paths early, you’re more likely to end up with a productively different perspective.
It’s too easy to misunderstand the original purpose of the question, and do work that technically satisfies it but really doesn’t do what was wanted in a broader context.
I’ve taken to calling this “bandwidth”, cf. Owen Cotton-Barratt.
Re the “Depth-first vs Breadth-first” distinction for idea development: IDA* is ok as far as a loose analogy to personally searching the idea tree goes, but I think this is another instance where there’s a (first-order) trade-off between individual epistemic rationality and social epistemology.
What matters is that someone discovers good ideas on AI alignment, not whether any given person does. As such, we can coordinate with other researchers in order to search different branches of the idea tree, and this is more like multithreaded/parallel/distributed tree search.
We want to search branches that are neglected, in our comparative advantage, and we shouldn’t be trying to maximise the chance that we personally discover the best idea. Instead, we should collectively act according to the rule that maximises the chance that someone in the community discovers the best idea. Individually, we are parallel threads of the same search algorithm.
This is one of the most important reasons why hubris is so undervalued. People mistakenly think the goal is to generate precise probability estimates for frequently-discussed hypotheses (a goal in which deference can make sense). In a common-payoff-game research community, what matters is making new leaps in model space, not converging on probabilities. We (the research community) are bottlenecked by insight-production, not marginally better forecasts or decisions. Feign hubris if you need to, but strive to install it as a defense against model-dissolving deference.
Coming back to this a few showers later.
A “cheat” is a solution to a problem that is invariant to a wide range of specifics about how the sub-problems (e.g. “hard parts”) could be solved individually. Compared to an “honest solution”, a cheat can solve a problem with less information about the problem itself.
A b-cheat (blind) is a solution that can’t react to its environment and thus doesn’t change or adapt throughout solving each of the individual sub-problems (e.g. plot armour). An a-cheat (adaptive/perceptive) can react to information it perceives about each sub-problem, and respond accordingly.
ML is an a-cheat because even if we don’t understand the particulars of the information-processing task, we can just bonk it with an ML algorithm and it spits out a solution for us.
In order to have a hope of finding an adequate cheat code, you need to have a good grasp of at least where the hard parts are even if you’re unsure of how they can be tackled individually. And constraining your expectation over what the possible sub-problems or sub-solutions should look like will expand the range of cheats you can apply, because now they need to be invariant to a smaller space of possible scenarios.
If effort spent on constraining expectation expands the search space, then it makes sense to at least confirm that there are no fully invariant solutions at the shallow layer before you iteratively deepen and search a larger range.
This relates to Wason’s 2-4-6 problem: when the true rule is very simple, like “increasing numbers”, subjects keep testing much more complex rules before they think to check the simplest ones.
This is of course because they have the reasonable expectation that the human is more likely to make up such rules, but that’s kinda the point: we’re biased to think of solutions in the human range.
Limiting case analysis is when you set one or more variables of the object you’re analysing to their extreme values. This may give rise to limiting cases that are easier to analyse and could give you greater insights about the more general thing. It assumes away an entire dimension of variability, and may therefore be easier to reason about. For example, thinking about low-bandwidth oracles (e.g. ZFP oracle) with cleverly restrained outputs may lead to general insights that could help in a wider range of cases. They’re like toy problems.
”The art of doing mathematics consists in finding that special case which contains all the germs of generality.” — David Hilbert
Multiplex case analysis is sorta the opposite, and it’s when you make as few assumptions as possible about one or more variables/dimensions of the problem while reasoning about it. While it leaves open more possibilities, it could also make the object itself more featureless, with fewer patterns, and so easier to play with in your working memory.
One thing to realise is that it constrains the search space for cheats, because your cheat now has to be invariant to a greater space of scenarios. This might make the search easier (smaller search space), but it also requires a more powerfwl or a more perceptive/adaptive cheat. It may make it easier to explore nodes at the base of the search tree, where discoveries or eliminations could be of higher value.
This can be very usefwl for extricating yourself from a stuck perspective. When you have a specific problem, a problem with a given level of entropy, your brain tends to get stuck searching for solutions in a domain that matches the entropy of the problem (speculative claim). It relates to one of Tversky’s experiments (which I have not vetted), where subjects were told to iteratively bet on a binary outcome (A or B), where P(A)=0.7. They got 2 units of money for each correct guess and 0 otherwise. Subjects tended to bet on A with a frequency that matched the frequency of the outcome, whereas the highest-EV strategy is to always bet on A.
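For concreteness, taking the setup above at face value:

$$\mathbb{E}[\text{always bet A}] = 0.7 \cdot 2 = 1.4, \qquad \mathbb{E}[\text{probability matching}] = (0.7 \cdot 0.7 + 0.3 \cdot 0.3) \cdot 2 = 1.16.$$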
This also relates to the Inventor’s Paradox.
”The more ambitious plan may have more chances of success […] provided it is not based on a mere pretension but on some vision of the things beyond those immediately present.” ‒ Pólya
Consider the problem of adding up all the numbers from 1 to 99. You could attack this by grinding through the additions one at a time, like so: 1 + 2 = 3, 3 + 3 = 6, 6 + 4 = 10, and so on for nearly a hundred steps.
Or you could take a step back and find a more general problem-solving technique (an a-cheat). Ask yourself, how do you solve all iterated-addition problems like this one? You could rearrange it as: (1 + 99) + (2 + 98) + … + (49 + 51) + 50 = 49 · 100 + 50 = 4950.
To land on this, you likely went through the realisation that you could solve any such series with (first + last) · ⌊n/2⌋, adding the middle term if n is odd.
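As a throwaway sketch (hypothetical function names), the contrast between the honest solution and the cheat:

```python
def sum_1_to_n_honest(n: int) -> int:
    """Grind through every addition."""
    total = 0
    for k in range(1, n + 1):
        total += k
    return total

def sum_1_to_n_cheat(n: int) -> int:
    """Pair first with last: (1 + n) * floor(n/2), plus the middle term if n is odd."""
    total = (1 + n) * (n // 2)
    if n % 2 == 1:
        total += (n + 1) // 2
    return total

assert sum_1_to_n_honest(99) == sum_1_to_n_cheat(99) == 4950
```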
The point being that sometimes it’s easier to solve “harder” problems. This could be seen as, among other things, an argument for worst-case alignment.
How do you account for the fact that the impact of a particular contribution to object-level alignment research can compound over time?
Let’s say I have a technical alignment idea now that is both hard to learn and very usefwl, such that every recipient of it does alignment research a little more efficiently. But it takes time before that idea disseminates across the community.
At first, only a few people bother to learn it sufficiently to understand that it’s valuable. But every person that does so adds to the total strength of the signal that tells the rest of the community that they should prioritise learning this.
Not sure if this is the right framework, but let’s say that researchers will only bother learning it if the strength of the signal hits their person-specific threshold for prioritising it.
The number of researchers is normally distributed (or something) over threshold height, and the strength of the signal starts out below the peak of the distribution.
Then (under some assumptions about the strength of individual signals and the distribution of threshold height), every learner that adds to the signal will, at first, attract more than one learner that adds to the signal, until the signal passes the peak of the distribution and the idea reaches satiation/fixation in the community.
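Here’s that model in code, just to make the moving parts concrete (every parameter is invented):

```python
import random

def dissemination(n=1000, signal_per_learner=1.0, mean_threshold=30.0,
                  sd_threshold=10.0, seed_signal=5.0, seed=0):
    """Researchers adopt the idea once the accumulated signal from earlier
    adopters passes their personal threshold; each adopter strengthens the
    signal for everyone after them."""
    rng = random.Random(seed)
    thresholds = sorted(rng.gauss(mean_threshold, sd_threshold) for _ in range(n))
    signal, adopters = seed_signal, 0
    while adopters < n and thresholds[adopters] <= signal:
        adopters += 1                  # the researcher with the next-lowest threshold adopts...
        signal += signal_per_learner   # ...and adds to the signal for everyone else
    return adopters, signal

adopters, signal = dissemination()
print(f"{adopters} of 1000 researchers adopt before the signal stalls at {signal:.0f}")
```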
If something like the above model is correct, then the impact of alignment research plausibly goes down over time.
But the same is true of a lot of time-buying work (like outreach). I don’t know how to balance this, but I am now a little more skeptical of the relative value of buying time.
Importantly, this is not the same as “outreach”. Strong technical alignment ideas are most likely incompatible with almost everyone outside the community, so the idea doesn’t increase the number of people working on alignment.
Would be cool if LessWrong hosted subforums/bubbles/research-groups for anyone who wanted to start one and invite their friends. You would have the ability to write a post only to your bubble (visible on your bubble’s frontpage or a private filter to the main frontpage) or choose to crosspost it to main as well. Having the bubbles be on LW provides them a little prestige boost and could stimulate some folk to initiate new research covens for alignment or whatever (or *cough* social epistemology research bubble maybe).
You could also have the option to filter karma so you only see the karma assigned by people in your bubble. Or, just like you can subscribe to get notified when people post, you could “subscribe” to prioritise their karma too. You could make a custom karma-filter individual to you by subscribing to people or groups whose opinions you trust. And the individual-filtered karma could be transitive as well, according to some parameters you set yourself—similar to plex&co’s EigenTrust project except it’d be EigenKarma. There’s more cool stuff here, but I’m probably never going to actually finish a post about it, so better suggest it briefly to someone than not suggest it at all.
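A rough sketch of how the transitive part could work (EigenTrust/personalised-PageRank flavour; the matrices, names, and parameters are all hypothetical, and this is not how LW karma actually works):

```python
import numpy as np

def personal_karma(trust, votes, damping=0.85, iters=50):
    """trust[i][j]: how much user i trusts user j's votes; votes[j][p]: user j's
    vote on post p. Returns a personalised score for every post, per user."""
    trust = np.asarray(trust, dtype=float)
    trust = trust / trust.sum(axis=1, keepdims=True)   # row-normalise direct trust
    weights = np.eye(len(trust))
    for _ in range(iters):                             # let trust flow transitively
        weights = (1 - damping) * np.eye(len(trust)) + damping * weights @ trust
    return weights @ np.asarray(votes, dtype=float)    # weight everyone's votes by derived trust

# Three users, two posts: user 0 trusts user 1, who trusts user 2.
trust = [[1, 2, 0], [0, 1, 2], [0, 0, 1]]
votes = [[0, 1], [1, 0], [5, -5]]
print(personal_karma(trust, votes))  # user 0's scores get pulled toward user 2's votes
```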
OK, done daydreaming. Back to work.
“I can move my mind so it is as though I’ve never seen a water bottle before”
I liken this to one of my favourite concepts, shoshin—”a beginner’s mind”. Entering a state of shoshin requires perceptual dexterity.
One of the problems it tries to overcome, and which you describe in different words, is the Einstellung effect—when your perception of a problem is stuck in some way. And that’s one of the reasons perceptual dexterity is so important in original research (and especially math & philosophy).
This is a great parable. I’m often mildly reluctant to talk about some of my pre-formal ideas in case they get finished up properly by others and I counterfactually lose social credit. I usually do it anyway, especially for stuff I don’t plan on “finishing up”. But I can see how this reluctance is like heavy molasses poured all over a research community, and it makes us much less effective.
In my experience, the “finishing stage” of making an idea precise enough to be presented is not where the germs of generality are—the parts of ideas that can be used to build other ideas with in a compounding fashion.[1] If I’m just researching or working on something in order to build up a repertoire of tools in order to personally use them for other problems, then I don’t need to go through the expensive “finishing” stage of making the infrastructure for all the middle steps legible to others.
There’s an essay by Fields Medalist William Thurston[2] with several related points; it’s worth reading in its entirety.
“First I will discuss briefly the theory of foliations, which was my first subject, starting when I was a graduate student. (It doesn’t matter here whether you know what foliations are.)
At that time, foliations had become a big center of attention among geometric topologists, dynamical systems people, and differential geometers. I fairly rapidly proved some dramatic theorems. I proved a classification theorem for foliations, giving a necessary and sufficient condition for a manifold to admit a foliation. I proved a number of other significant theorems. I wrote respectable papers and published at least the most important theorems. It was hard to find the time to write to keep up with what I could prove, and I built up a backlog.
An interesting phenomenon occurred. Within a couple of years, a dramatic evacuation of the field started to take place. I heard from a number of mathematicians that they were giving or receiving advice not to go into foliations—they were saying that Thurston was cleaning it out. People told me (not as a complaint, but as a compliment) that I was killing the field. Graduate students stopped studying foliations, and fairly soon, I turned to other interests as well.
… When I started working on foliations, I had the conception that what people wanted was to know the answers. I thought that what they sought was a collection of powerful proven theorems that might be applied to answer further mathematical questions. But that’s only one part of the story. More than the knowledge, people want personal understanding. And in our credit-driven system, they also want and need theorem-credits.
… I’ll skip ahead a few years, to the subject that Jaffe and Quinn alluded to, when I began studying 3-dimensional manifolds and their relationship to hyperbolic geometry.
… In reaction to my experience with foliations and in response to social pressures, I concentrated most of my attention on developing and presenting the infrastructure in what I wrote and in what I talked to people about
… There has been and there continues to be a great deal of thriving mathematical activity. By concentrating on building the infrastructure and explaining and publishing definitions and ways of thinking but being slow in stating or in publishing proofs of all the “theorems” I knew how to prove, I left room for many other people to pick up credit. There has been room for people to discover and publish other proofs of the geometrization theorem.
In this episode (which still continues) I think I have managed to avoid the two worst possible outcomes: either for me not to let on that I discovered what I discovered and proved what I proved, keeping it to myself (perhaps with the hope of proving the Poincare conjecture), or for me to present an unassailable and hard-to-learn theory with no practitioners to keep it alive and to make it grow.
(...) I think that what I have done has not maximized my “credits”. I have been in a position not to feel a strong need to compete for more credits. Indeed, I began to feel strong challenges from other things besides proving new theorems. I do think that my actions have done well in stimulating mathematics.”
Thurston was a Togo.
I’d be very sceptical of applying something like this to experts in a rich-domain/somewhat-pre-paradigmatic field like, say, conceptual alignment. Their expertise is their particular set of tools. And in a rich domain like this, there are likely to be many other tools that let you work on the problems productively. Even if you concluded that the paradigmatic tools seem most suited for the problems, you may still wish to maximise the chance that you’ll end up with a productively different set of tools, just because they allow you to pursue a neglected angle of attack. If you look overmuch to how experts are doing it, you’ll Einstellung yourself into their paradigm and end up hacking at an area of the wall that’s proven to be very sturdy indeed.
A concrete suggestion for a buying-time intervention is to develop plans and coordination mechanisms (e.g. assurance contracts) for major AI actors/labs to agree to pay a fixed percentage alignment tax (in terms of compute) conditional on other actors also paying that percentage. I think it’s highly unlikely that this is new to you, but didn’t want to bystander just in case.
A second point is that there is a limited number of supercomputers that are anywhere close to the capacity of top supercomputers. The #10 most powerfwl is 0.005% as powerfwl as the #1. So it could be worth looking into facilitating coordination between them.
Perhaps one major advantage of focusing on supercomputer coordination is that the people who can make the relevant decisions[1] may not actually have any financial incentives to participate in the race for new AI systems. They have financial incentives to let companies use their hardware to train AIs, naturally, but they could be financially indifferent to how those AIs are trained.
In fact, if they can manage to coordinate it via something like assurance contract, they may have a collective incentive to demand that AIs are trained in safer alignment-tax-paying ways, because then companies have to buy more computing time for the same level of AI performance. That’s too much to hope for. The main point is just that their incentives may not have a race dynamic.
Who knows.
[1] Maybe the relevant chain of command goes up to high government in some cases, or maybe there are key individuals or small groups who have relevant power to decide.
Hmm, I’m noticing that a surprisingly large portion of my recent creative progress can be traced back to a single “isthmus” (a key pattern that helps you connect many other patterns). It’s the trigger-action-plan of
IF you see an interesting pattern that doesn’t have a name
THEN invent a new word and make a flashcard for it
This may not sound like much, and it wouldn’t to me either if I hadn’t seen it make a profound difference.
Interesting patterns are powerups, and if you just go “huh, that’s interesting” and then move on with your life, you’re totally wasting their potential. Making a name for it makes it much more likely that you’ll be able to spontaneously see the pattern elsewhere (isthmus-passing insights). And making a flashcard for it makes sure you access it when you have different distributions of activation levels over other ideas, making it more likely that you’ll end up making synthetic (isthmus-centered) insights between them. (For this reason, I’m also strongly against the idea of dissuading people from using jargon as long as the jargon makes sense. I think people should use more jargon, even if it seems embarrassingly supercilious and perhaps intimidating to outsiders.)
A “cheat” is a solution to a problem that is invariant to a wide range of scenarios for how the hard parts could be solved individually.
ML itself is a cheat. Even if we don’t understand the particulars of the information-processing task, we can just bonk it with an ML algorithm and it spits out a solution for us.
But in order to have a hope of finding an adequate cheat code, you need to have a good grasp of at least where the hard parts are even if you’re unsure about how they could be tackled individually. And constraining your expectation over what the possible subsolutions should look like expands the range of cheats you could apply, because now they need to be invariant to a smaller space of possible scenarios.[1]
Insofar as you’re saying that we can’t hope to find remotely adequate cheats unless we start with a rough understanding of what we even need to cheat over, I agree. I don’t think you’re saying that we shouldn’t be looking for cheats in the first place, but it could be interpreted that way. Yes, it has the problem that it doesn’t build upon itself as well as directly challenging the hard parts, but, realistically, I think the solution has to look like some kind of cheat.
[1] There’s this funny dynamic where if you expand the range of plausible solutions you can search through (e.g. by constraining your expectation for what they need to be invariant to), it might become harder to locate a particular area of the search space. If effort spent on constraining expectation expands the search space, then it makes sense to at least confirm that there are no fully invariant solutions at the top layer before you iterate and search a broader range.
An “isthmus” and a “bottleneck” are opposites. An isthmus provides a narrow but essential connection between two things (landmass, associations, causal chains). A bottleneck is the same except the connection is held back by its limited bandwidth. In the case of a bottleneck, increasing its bandwidth is top priority. In the case of an isthmus, keeping it open or discovering it in the first place is top priority.
I have a habit of making up pretty words for myself to remember important concepts, so I’m calling it an “isthmus variable” when it’s the thing you need to mentally keep track of in order to connect input with important task-relevant parts of your network.
When you’re optimising the way you optimise something, consider that “isthmus variables” is an isthmus variable for this task.
Kinda surprised you didn’t mention purpose-tracking, for while you’re trying to do a thing—any thing. Arguably the most important skill I acquired from the Sequences, and that’s a high bar.