For a while, I’ve thought that the strategy of “split the problem into a complete set of necessary sub-goals” is incomplete. It produces problem factorizations, but it’s not sufficient to produce good problem factorizations—it usually won’t cut reality at clean joints. That was my main concern with Evan’s factorization, and it also applies to all of these, but I couldn’t quite put my finger on what the problem was.
I think I can explain it now: when I say I want a factorization of alignment to “cut reality at the joints”, I think what I mean is that each subproblem should involve only a few components of the system (ideally just one).
Inner/outer alignment discussion usually assumes that our setup follows roughly the structure of today’s ML: there’s a training process before deployment, there’s a training algorithm with training data/environment and training objective, there’s a trained architecture and initial parameters. These are the basic “components” which comprise our system. There’s a lot of variety in the details, but these components are usually each good abstractions—i.e. there are nice, well-defined APIs/Markov blankets between the components.
Ideally, alignment should be broken into subproblems which each depend only on one or a few components. For instance, outer alignment would ideally be a property of only the training objective (though that requires pulling out the pointers problem part somehow). Inner alignment would ideally be a property of only the training architecture, initial parameters, and training algorithm (though that requires pulling out the Dr Nefarious issue somehow). Etc. The main thing we don’t want here is a “factorization” where some subsystems are involved in all of the subproblems, or where there’s heavy overlap.
Why is this useful? You could imagine that we want to divide the task of building an aligned AI up between several teams. To avoid constant git conflicts, we want each team to design one or a few different subsystems, without too much overlap between them. Each of those subsystems is designed to solve a particular subproblem. Of course we probably won’t actually have different teams like that, but there are analogous benefits: when different subsystems solve different subproblems, we can tackle those subproblems relatively independently, solving one without creating problems for the others. That’s the point of a problem factorization.
A few salient examples of this idea, particularly relevant to the Berkeley rationalist community (which I didn’t put in the post because I don’t want them taking over everything)... First, from the discussion on Takeaways From One Year Of Lockdown, on why various Berkeley rationalist group houses locked into very-obviously-too-paranoid lockdown rules for basically the whole year:
By the time you get to the point where maybe you should take stock and re-evaluate everything, it’s not really about “does the paranoia make sense?” it’s “do you have the spare energy and emotional skills to change your S1 attitudes to a lot of things, while the crisis is still kinda ongoing.”
(from Raemon). This one is a very strong example of externalities from (lack of) emotional slack: if one or more people in a house lack the emotional slack to reevaluate things, then the rules can’t be renegotiated, and everyone is locked in.
Second, on the MIRI location optimization thread, I talked a bit about the illegible benefits of living in low-density areas:
suppose one day the build-something mood strikes me, and I want to go make a trebuchet. In a lower-density place, with a few acres of land around the house, I can just go out in the backyard and do that. (I grew up low-density, and did this sort of thing pretty regularly as a teenager.) In the middle of San Francisco, the project is entirely out of the question; the only way to do it within the city at all would probably involve getting a bunch of permits and buy-in from lots of people.
This is basically about space slack, and how lots of space slack also creates a kind of social-permissibility-slack—i.e. if we have enough space, then there’s lots of things which we can just go do without having to get permission from lots of people. (The linked comment also talks about car ownership, which is a major way to create tons of mobility slack.)
Third, there’s the long-standing complaint that housing costs make the Bay Area an especially bad place for a rationalist hub. If we buy the argument from this post, then the lack of financial slack from expensive housing isn’t just an individual issue; it creates negative externalities (or at least prevents positive externalities) for rationalist groups in the area.
And taken all together, these point toward a general lack of slack in the Berkeley community, across multiple different flavors. I don’t know if this is systematic, or just three independent low-rolls. Lack of financial and space slack both clearly stem from location (and the expensive urban location probably leads to under-ownership of cars, too, which means less mobility slack). Emotional slack is less obvious, though it’s not hard to imagine that lack of slack in some flavors induces lack of slack in other flavors—e.g. lack of financial and space slack might lead to stress.
Important policies have so many effects that it is near impossible to keep track of them all. In addition, some effects tend to dwarf all others, so it is critical to catch every last one. (Perhaps they follow a Paretian distribution?) It follows that any quantitative analysis of policy effects tends to be seriously flawed.
I don’t think this is the right way to frame the problem.
It is true that even unimportant policies have so many effects that it is de-facto impossible to calculate them all. And it is true that one or a few effects tend to dwarf all others. But that does not mean that it’s critical to catch every last one. The effects which dwarf all others will typically be easier to notice, in some sense, precisely because they are big, dramatic, important effects. But “big/important effect” is not necessarily the same as “salient effect”, so in order for this to work in practice, we have to go in looking for the big/important effects with open eyes rather than just asking the already-salient questions.
For instance, in the pot/IQ example, we can come at the problem from either “end”:
What things tend to be really important to humans, in the aggregate, and how does pot potentially impact those? Things like IQ, long-term health, monetary policy, technological development, countries coming out of poverty, etc, are “big things” in terms of what humans care about, so we should ask if pot potentially has predictable nontrivial effects on any of them.
On what things does pot have very large impact, and how much do we care? Pot probably has a big impact on things like recreational activity or how often people are sober. So, how do those things impact the things we care about most?
If people think about the problem in a principled way like this, then I expect they’ll come up with hypotheses like the pot-IQ thing. There just aren’t that many things which are highly important to humans in the aggregate, or that many things on which any given variable has a large expected effect. (Note the use of “expected effect”—variables may have lots of large effects via the butterfly effect, but that’s relevant to decision-making only insofar as we can predict the effects.)
The trick is that we have to think about the problem in a principled way from the start, not just get caught up in whatever questions other people have already brought to our attention.
[Epistemic status: highly speculative]
Smoke from California/Oregon wildfires reaching the East Coast opens up some interesting new legal/political possibilities. The smoke is way outside state borders, all the way on the other side of the country, so that puts the problem pretty squarely within federal jurisdiction. Either a federal agency could step in to force better forest management on the states, or a federal lawsuit could be brought for smoke-induced damages against California/Oregon. That would potentially make it a lot more difficult for local homeowners to block controlled burns.
Vigorous, open-ended epistemic and moral competition is hard. Neutrality and collaboration can be useful, but are always context-sensitive and provisional. They are ongoing negotiations, weighing all the different consequences and strategies. A fighting couple can’t skip past all the messy heated asymmetric conflicts with some rigid absolutes about civil discourse.
I agree with this. The intended message is not that cooperation is always the right choice, but that monstrous morals alone should not be enough to rule out cooperation. Fighting is still sometimes the best choice.
I would word the intended message as “whether or not someone shares our values is not directly relevant to whether one should cooperate with them”. Moral alignment is not directly relevant to the decision; it enters only indirectly, in reasoning about things like the need for enforcement or reputational costs. Monstrous morals should not be an immediate deal-breaker in their own right; they should weigh on the scales via trust and reputation costs, but that weight is not infinite.
I don’t really think of it as “pro-cooperation” or “anti-cooperation”; there is no “pro-cooperation” “side” which I’m trying to advocate here.
The effectiveness or ineffectiveness of MAD as a strategy is not actually relevant to whether nuclear war is or is not a zero-sum game. That’s purely a question of payoffs and preferences, not strategy.
You’re losing something real if you ally yourself with someone you’re not value-aligned with, and you’re not losing something real if you’re allying yourself with someone you are value-aligned with, but mistakenly think is your enemy. The amount of power people like you with your value has loses strength because now another group that wants to destroy you has more power.
The last sentence of this paragraph highlights the assumption: you are assuming, without argument, that the game is zero-sum, i.e. that gains in power for another group that wants to destroy you are necessarily worse for you.
This assumption fails most dramatically in the case of three or more players. For instance, in your example of the Spanish civil war, it’s entirely plausible that the anarchist-communist alliance was the anarchists’ best bet—i.e. they honestly preferred the communists over the fascists, the fascists wanted to destroy them even more than the communists, and an attempt at kingmaking was the only choice the anarchists actually had the power to make. In that world, fighting everyone would have seen them lose without any chance of gains at all.
In general, the key feature of a two-player zero-sum game is that anything which is better for your opponent is necessarily worse for you, so there is no incentive to cooperate. But this cannot ever hold between all three players in a three-way game: if “better for player 1” implies both “worse for player 2” and “worse for player 3”, then player 2 and player 3 are incentivized to cooperate against player 1. Three player games always incentivize cooperation between at least some players (except in the trivial case where there’s no interaction at all between some of the players). Likewise in games with more than three players. Two-player games are a weird special case.
That all remains true even if all three+ players hate each other and want to destroy each other.
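To make that concrete, here is a toy payoff table (the numbers are mine, purely illustrative): the game is zero-sum across all three players, yet players 2 and 3 each do strictly better by ganging up on player 1 than by fighting each other.

```python
# Toy three-player game which is zero-sum across all three players.
# Player 1 is passive; players 2 and 3 each choose to "gang_up" on player 1
# or to "infight" with each other. Payoffs are (p1, p2, p3) and sum to zero.
payoffs = {
    ("gang_up", "gang_up"): (-2, +1, +1),
    ("gang_up", "infight"): ( 0, -1, +1),
    ("infight", "gang_up"): ( 0, +1, -1),
    ("infight", "infight"): (+2, -1, -1),
}

assert all(sum(p) == 0 for p in payoffs.values())  # zero-sum overall

both_gang_up = payoffs[("gang_up", "gang_up")]
both_infight = payoffs[("infight", "infight")]
# Players 2 and 3 each prefer cooperating against player 1 over infighting:
assert both_gang_up[1] > both_infight[1] and both_gang_up[2] > both_infight[2]
print("2 and 3 both gain by cooperating against 1, despite the overall zero-sum.")
```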
But in a counter example, if group_A values “biscuits for all” and group_B values “all biscuits for group_B,” then group_B will find it very available and easy to think of strategies which result in biscuits for group_B and not group_A. If someone is having trouble imagining this, that may be because it’s difficult to imagine someone only wanting the cookies for themselves, so they assume the other group wouldn’t defect, because “cookies for all? What’s so controversial about that?” Except group_B fundamentally doesn’t want group_A getting their biscuits, so any attempt at cooperation is going to be a mess, because group_A has to keep double-checking to make sure group_B is really cooperating, because it’s just so intuitive to group_B not to that they’ll have trouble avoiding it. And so giving group_B power is like giving someone power when you know they’re later going to use it to hurt you and take your biscuits.
Note that, in this example, you aren’t even trying to argue that there’s no potential for mutual gains. Your actual argument is not that the game is zero-sum, but rather that there is overhead to enforcing a deal.
It’s important to flag this, because it’s exactly the sort of reasoning which is prone to motivated stopping. Overhead and lack of trust are exactly the problems which can be circumvented by clever mechanism design or clever strategies, but the mechanisms/strategies are often nonobvious.
I usually don’t use paper or spreadsheet for Fermi estimates; that would make them too expensive. Also, my Fermi estimates tend to overlap heavily with big-O estimates.
When programming, I tend to keep a big-O/Fermi estimate for the runtime and memory usage in the back of my head. The big-O part of it is usually just “linear-ish” (for most standard data structure operations and loops over nested data structures), “quadratic” (for looping over pairs), “cubic-ish” (matrix operations), or “exponential” (in which case I usually won’t bother doing it at all). The Fermi part of it is then, roughly, how big a data structure can I run this on while still getting reasonable runtime? Assume ~1B ops per second, so for linear-ish I can use a data structure with ~1B entries, for cubic-ish ~1k entries, for exponential ~30 entries.
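As a minimal sketch of that arithmetic (same rough ~1B ops/second budget as above, with quadratic included for completeness at roughly 30k entries):

```python
import math

# Back-of-the-envelope: largest input size n that fits in a ~1 second op budget,
# for each rough complexity class. "Ops" is a deliberately fuzzy unit here.
OP_BUDGET = 1e9  # ~1B ops per second

feasible_n = {
    "linear-ish (n)":    OP_BUDGET,             # ~1B entries
    "quadratic (n^2)":   OP_BUDGET ** 0.5,       # ~30k entries
    "cubic-ish (n^3)":   OP_BUDGET ** (1 / 3),   # ~1k entries
    "exponential (2^n)": math.log2(OP_BUDGET),   # ~30 entries
}

for name, n in feasible_n.items():
    print(f"{name:18} ~{n:,.0f} entries in about a second")
```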
This obviously steers algorithm/design choice, but more importantly it steers debugging. If I’m doing a loop which should be linear-ish over a data structure with ~1M elements, and it’s taking more than a second, then something is wrong. Examples where this comes up:
scikit implementations of ML algorithms—twice I found that they were using quadratic algorithms for things which should have been linear. Eventually I gave up on scikit, since it was so consistently terrible.
SQL queries in large codebases. Often, some column needs an index, or the query optimizer fails to use an existing index for a complicated query, and this makes queries which should be linear instead quadratic. In my experience, this is one of the most common causes of performance problems in day-to-day software engineering. (A quick sketch of how to check this follows the list.)
Aside from programming, it’s also useful when using other people’s software. If the software is taking visible amounts of time to do something which I know should be linear, then the software is buggy, and I should maybe look for a substitute or a setting which can fix the problem.
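On the SQL point, a minimal sketch of how I’d check whether an index is actually being used, here with SQLite (the table and column names are hypothetical; other databases have their own EXPLAIN facilities):

```python
import sqlite3

# Hypothetical schema, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")

# Without an index on customer_id, each lookup below is a full table scan, so a
# loop issuing one such query per customer costs O(customers * rows): quadratic-ish.
# With the index, each lookup is O(log rows), so the same loop is roughly linear.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

# EXPLAIN QUERY PLAN shows whether the index is used: "SEARCH ... USING INDEX"
# is the fast case, while "SCAN orders" would indicate a full table scan.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = ?", (42,)
).fetchall()
print(plan)
```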
I also do a lot of Fermi estimates when researching a topic or making a model. Often these estimates calculate what a physicist would call “dimensionless quantities”—we take some number, and express it in terms of some related number with the same units. For instance:
If I’m reading about government expenditures or taxes, I usually want it as a fraction of GDP.
When looking at results from a linear regression, the coefficients aren’t very informative, but the correlation is. It’s essentially a dimensionless regression coefficient, and gives a good idea of effect size.
Biological examples (the bionumbers book is great for this sort of thing):
When thinking about reaction rates or turnover of proteins/cells, it’s useful to calculate a half-life. This is the rough timescale on which the reaction/cell count will equilibrate. (And when there are many steps in a pathway, the slowest half-life typically controls the timescale for the whole pathway, so this helps us narrow in on the most important part.)
When thinking about sizes or distances in a cell, it’s useful to compare them to the size of a typical cell.
When thinking about concentrations, it’s useful to calculate the number of molecules per cell. In general, there’s noise of order sqrt(molecule count), which is a large fraction of the total count when the count is low. (A quick numeric sketch follows this list.)
On the moon, you can get to orbit by building a maglev track and just accelerating up to orbital speed. How long does the track need to be, assuming we limit the acceleration (to avoid pancaking the passengers)? Turns out, if we limit the acceleration to n times the moon’s surface gravity, then the track needs to be roughly 1/(2n) times the radius of the moon (quick derivation below). That’s the sort of clean intuitive result we hope for from dimensionless quantities.
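A quick derivation sketch for the maglev number (my reconstruction, using only surface-orbit kinematics): with surface gravity g, moon radius R, orbital speed v satisfying v²/R = g, and acceleration capped at a = ng,

$$d = \frac{v^2}{2a} = \frac{gR}{2ng} = \frac{R}{2n}.$$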
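And for the concentration item above, a rough numeric sketch (the ~1 femtoliter cell volume is my assumed typical bacterial value, not a number from the original comment):

```python
# Rough Fermi calculation: molecules per cell and relative (~sqrt(N)) noise
# for a given concentration, assuming a ~1 femtoliter (bacterial-scale) cell.
AVOGADRO = 6.022e23     # molecules per mole
CELL_VOLUME_L = 1e-15   # ~1 femtoliter

def molecules_per_cell(concentration_molar: float) -> float:
    return concentration_molar * CELL_VOLUME_L * AVOGADRO

for conc in [1e-9, 1e-6, 1e-3]:  # 1 nM, 1 uM, 1 mM
    n = molecules_per_cell(conc)
    noise_fraction = n ** -0.5   # sqrt(n) / n
    print(f"{conc:.0e} M -> ~{n:,.0f} molecules/cell, noise ~{noise_fraction:.0%} of total")
```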
In general, the trigger for these is something like “see a quantity for which you have no intuition/poor intuition”, and the action is “express it relative to some characteristic parameter of the system”.
Good example. I’ll use this to talk about what I think is the right way to think about this.
First things first: true zero-sum games are ridiculously rare in the real world. There’s always some way to achieve mutual gains—even if it’s just “avoid mutual losses” (as in e.g. mutual assured destruction). Of course, that does not mean that an enemy can be trusted to keep a deal. As with any deal, it’s not a good deal if we don’t expect the enemy to keep it.
The mutual gains do have to be real in order for “working with monsters” to make sense.
That said… I think people tend to have a gut-level desire to not work with monsters. This cashes out as motivated stopping: someone thinks “ah, but I can’t really trust the enemy to uphold their end of the deal, can I?”… and they use that as an excuse to not make any deal at all, without actually considering (a) whether there is actually any evidence that the enemy is likely to break the deal (e.g. track record), (b) whether it would actually be in the enemy’s interest to break the deal, or (c) whether the deal can be structured so that the enemy has no incentive to break it. People just sort of horns-effect, and assume the Bad Person will of course break a deal because that would be Bad.
(There’s a similar thing with reputational effects, which I expect someone will also bring up at some point. Reputational effects are real and need to be taken into consideration when thinking about whether a deal is actually net-positive-expected-value. But I think people tend to say “ah, but dealing with this person will ruin my reputation”… then use that as an excuse to not make a deal, without considering (a) how highly-visible/salient this deal actually is to others, (b) how much reputational damage is actually likely, or (c) whether the deal can plausibly be kept secret.)
A good explanation of Yoneda is indeed the third planned post… assuming I eventually manage to understand it well enough to write that post.
The examples in the post where authors disagreed heavily about the sign of the effect (school → pregnancy and immigration → social policy support) are both questions where I’d expect, a priori, to find small-and-inconsistent effect sizes. And if we ignore “statistical significance” and look at effect sizes in the graphs, it indeed looks like almost all the researchers on those questions found tiny effects—plus or minus 0.02 for the first, or 0.05 for the second. (Unclear what the units are on those, so maybe I’m wrong about the effects being small, but I’m guessing it’s some kind of standardized effect size.) The statistical significance or sign of the effect isn’t all that relevant—the important part is that almost all researchers agree the effect is tiny.
On the flip side, for the soccer study, the effect sizes are reasonably large. Assuming I’m reading that graph right, the large majority of researchers find that dark-skinned players are ~30% more likely to get a red-card. There’s still a wide range of estimates, but the researchers mostly agree that the effect is large, and they mostly agree on the direction.
So I don’t think it really makes sense to interpret these as “many results”. The takeaway is not “analysis has too many degrees of freedom for results to replicate”. The takeaway is “statistical significance by itself sucks, look at effect sizes”. It’s effect sizes which reproduce, and it’s effect sizes which matter anyway for most practical purposes (as opposed to just getting papers published). For the most part, the teams only disagree on whether numbers which are basically 0 are +0 or −0.
Simple “policy” proposal: fire vigilante. Someone (who takes pains to keep their identity secret) goes around lighting fires at places/times where they’re likely to be relatively-less-bad—e.g. maybe there’s a big rainstorm coming in a couple days which is likely to keep the fire under control. (That’s just spitballing, I don’t really know what the main determinants are of fire controllability.)
Main advantage of this proposal: can be unilaterally implemented. Requires dealing with zero institutional bullshit, zero broken metaincentives, zero coordination problems, zero politics, etc. Essentially no social points-of-failure; the problems-to-be-solved are entirely physical. That also means it could be implemented by a small team or even an individual. I would give it a dramatically higher chance of success than any political approach.
Good question. No.
For the second case, each data point can measure something different, possibly correlated with each other, and related in different ways to the parameters we’re trying to estimate. For instance, maybe we’re trying to estimate some parameters of a car, so we measure the wheel sizes, axle length, number of gears, engine cylinder volume, etc, etc. Every now and then we measure something which gives us a totally different “kind” of information from the other things we measured—something which forces a non-exponential-family update. When that happens, we have to add a new summary component. Eventually other data points may measure the same “kind” of information and also contribute to that component of the summary. But over time, it becomes more and more rare to measure something which no other data point has captured before, so we add summary dimensions more and more slowly.
(Side note: there’s no reason I know why O(log n) growth would be special here; the qualitative story would be similar for any sub-linear summary growth.)
I love this.
The theorems here should apply directly to that situation. The summary will eventually be lower-dimensional than the data, and that’s all we need for these theorems to apply. At any data size n sufficiently large that the O(log n) summary dimension is smaller than the data dimension, the distribution of those n data points must be exponential family.
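(For readers less familiar with the term: “exponential family” here means the joint distribution can be written in the standard form below, my paraphrase of the usual definition, where the sufficient statistic T plays the role of the low-dimensional summary:

$$p(x_1, \dots, x_n \mid \theta) = h(x)\,\exp\!\big(\theta \cdot T(x) - A(\theta)\big),$$

with the dimension of T being the summary dimension, here growing like O(log n) rather than n.)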
The construction is correct.
Note that for M2, conceptually we don’t need to modify it, we just need to use the original M2 but apply it only to the subcomponents of the new X-variable which correspond to the original X-variable. Alternatively, we can take the approach you do: construct M′2 which has a distribution over the new X, but “doesn’t say anything” about the new components, i.e. it’s just maxentropic over the new components. This is equivalent to ignoring the new components altogether.