Shared Frames Are Capital Investments in Coordination

Shared Frames Are Coordination Tools

When Claude Shannon invented information theory, he was working at Bell Labs. Naturally, the new theory was applied to Bell’s telephone system quite quickly. But information theory itself did not offer any immediately-usable methods to run more phone calls through the same wires; practical compression algorithms came later. So how did information theory create immediate practical value?

What information theory offered was a new way of framing communication infrastructure problems. Historically, the content transmitted (e.g. speech) was tightly coupled to the mode of transmission (e.g. voltage oscillations matching the audio oscillations). Information theory suggested that the content and the mode of transmission could be decoupled. Any kind of data (written words, speech, video) could in principle be transmitted over any kind of channel (copper wire, radio, microwave, messenger pigeon) with near-optimal efficiency and arbitrarily-low error rates. We could even quantify the “amount of information” in a message and the “information capacity” of a channel, independently of each other, in terms of “bits”—a universal way to represent information.
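To make “bits” concrete, here’s a minimal sketch with toy numbers of my own (the source distribution and the bit-flip probability are made up for illustration): we can quantify a message source and a channel completely separately, then combine the two numbers.

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# "Amount of information" in the message, independent of any channel:
# a toy four-symbol source (probabilities made up for illustration).
H = entropy([0.5, 0.25, 0.125, 0.125])  # = 1.75 bits per symbol

# "Information capacity" of the channel, independent of any message:
# a binary symmetric channel that flips each bit with probability 0.1.
flip = 0.1
C = 1 - entropy([flip, 1 - flip])  # ≈ 0.531 bits per channel use

# Shannon's results say reliable transmission needs roughly H / C channel
# uses per source symbol, no matter what the symbols mean or what the
# channel is physically made of.
print(f"H = {H:.3f} bits/symbol, C = {C:.3f} bits/use, "
      f"channel uses per symbol ≈ {H / C:.2f}")
```

With these toy numbers, H = 1.75 bits per symbol and C ≈ 0.53 bits per channel use, so we need about 3.3 channel uses per symbol, regardless of whether the symbols encode speech, text, or video.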

So information theory naturally suggests a way to break up the problem of communication infrastructure design: separate content and encoding from transmission and channel design. More generally, information theory is a frame: it suggests questions, approximations, what to pay attention to vs ignore, and ways to break up the problem/​system into subproblems/​subsystems.

  • Questions:

    • What’s the information capacity of a given channel (e.g. a pair of microwave towers), and how can we increase it?

    • What’s the amount of information per second in a phone call, and how can we efficiently encode it?

  • Approximations:

    • Accurately transmitting an n-bit message through a channel requires a channel capacity of approximately n bits, regardless of the message details.

    • Conversely, an n-bit capacity channel can accurately transmit a message of approximately n bits, regardless of the channel details.

  • What to pay attention to vs ignore:

    • Ignore most interactions between channel-design and message content—e.g. we don’t need to worry about the idiosyncrasies of human speech when designing a satellite link.

    • Pay attention to how many bits per second we can send through the channel (i.e. throughput).

    • Pay attention to how long the bits take to get through (i.e. latency).

  • Ways to break up the problem/​system into subproblems/​subsystems:

    • Separate transmission channel design problems (e.g. the design of cable and repeaters for a transatlantic connection) from content encoding (e.g. methods to digitally encode voice signals efficiently).
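Here’s roughly what that last breakdown looks like as a hypothetical interface sketch (the class names are mine, purely for illustration): the only thing the cable subproblem and the encoding subproblem have to agree on is bits.

```python
from abc import ABC, abstractmethod

class Encoder(ABC):
    """Encoder team's subproblem: turn content into bits (and back),
    using as few bits as possible. Knows nothing about cables."""

    @abstractmethod
    def encode(self, content) -> bytes: ...

    @abstractmethod
    def decode(self, bits: bytes): ...

class Channel(ABC):
    """Cable team's subproblem: move bits reliably and maximize bitrate.
    Knows nothing about speech, video, or phone calls."""

    @abstractmethod
    def transmit(self, bits: bytes) -> bytes: ...

    @abstractmethod
    def capacity_bits_per_second(self) -> float: ...

def send(content, encoder: Encoder, channel: Channel):
    """The entire interface between the two subproblems is a stream of bits."""
    return encoder.decode(channel.transmit(encoder.encode(content)))
```

The cable team only has to care about capacity_bits_per_second; the encoder team only has to care about squeezing content into fewer bits.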

Notice that these are tightly coupled: the questions and ways to break up the problem make sense mainly because the approximations are accurate and the things we ignore are in fact small/​irrelevant. In the case of information theory, Shannon’s proofs guarantee that accuracy.

At a personal level, the information theory frame offers useful approximations and heuristics about where to allocate attention for communications engineering problems (as well as AI, statistical mechanics, etc). But this post will focus more on value at the group level—not frames as object-level problem-solving tools in their own right, but shared frames as tools for solving coordination problems.

Have you ever tried to work with people, but disagreed on what the key questions were? Or disagreed on what approximations you could rely on? Or disagreed on what to pay attention to vs ignore, or how to break up the problem?

Imagine, for instance, some (pre-Shannon) communication engineers designing a long-distance phone line. They need to deal with crosstalk—a phenomenon where conversations transmitted on different channels interfere, so e.g. one phone conversation might be interrupted by a snippet of another. But the engineers disagree about how to frame the problem. Maybe the subteam focused on the physical cable wants to just maximize bandwidth, and let the encoder team figure out how to use that bandwidth most effectively (a way to break up the problem). The encoder team, on the other hand, thinks the cable should be directly optimized for number of simultaneous crosstalk-free calls—which means the best cable design will depend heavily on the encoding chosen. Underlying this clash is an implicit disagreement about approximations and what to pay attention to vs ignore: if a clever encoding can achieve approximately-optimal phone-call-capacity regardless of the cable design, then the cable designers can ignore encoding and crosstalk issues. But it may take some effort to figure out that this is the key factual disagreement—on the surface, it looks like the two teams are talking past each other. They’re asking different questions, like how encoder design can leverage bandwidth or how cable design can minimize crosstalk, and it may not be obvious to either team why the other isn’t more interested in their questions.

Without agreement on these sorts of issues, it’s hard to operate productively as a group. If the encoder team wants the cable team to worry about crosstalk and the cable team wants to ignore that issue, that’s likely to lead to infighting. If the encoder is designed on the assumption that the cable will minimize crosstalk, while the cable is just optimized for bandwidth, the two subteams could end up with outright incompatible parts. In general, if different people or subgroups are paying attention to different things or breaking up the system differently or asking different questions, it will be very difficult to communicate; it will be hard for each to understand what the other is doing and why. Inferential distances will be long, and risk of miscommunication high.

Shared frames help smooth over all of these problems. Shannon’s information theory frame, for instance, suggests that the cable should be designed to maximize bitrate without regard to crosstalk, and the encoder can be designed to leverage that capacity approximately-optimally. When everyone shares that frame, they have implicit agreement on questions, approximations, what to pay attention to vs ignore, and how to break up the problem.

When everyone agrees on what questions to ask, we don’t waste time talking at cross-purposes with different goals. When everyone agrees on what approximations to make or what to pay attention to vs ignore, that shortens inferential distances and makes communication easier. And of course shared ideas about how to break up the problem/system into subproblems/subsystems directly inform how to split up work among multiple people, what each subgroup needs to do, and where/how different subgroups need to interface.

Shared frames are tools for solving coordination problems.

On the other hand, this is not a case where just reaching some agreement on a frame matters more than which frame is chosen; this is not a pure Schelling problem. If the engineers agree to ignore crosstalk when designing the cable, and it turns out that clever encoding can’t circumvent the issue, then hundreds of millions of dollars are spent on a useless cable! Conversely, if the engineers simply don’t reach agreement at all (rather than agreeing on something wrong), then at least they know that they don’t have consensus on how to design the cable, and they can avoid wasting hundreds of millions on a cable design which does not work.

Frames can be wrong, and using the wrong frame is costly. It’s not just important to agree on a shared frame, it’s important to agree on a correct shared frame.

Figuring out useful, correct frames is itself a difficult technical problem—just like designing complex machinery or writing complex code. We can’t just pick arbitrary things to ignore, or arbitrary ways to break up the system; it takes empirical and/​or mathematical evidence to figure out whether and when the implicit approximations in a frame are actually accurate, and to find frames which generalize very well. (Though of course, that doesn’t mean frame-finding is all blackboards and lab coats and PhDs; science does not require a degree, and it does not need to happen in an ivory tower or Official Research Lab.)

Quick recap:

  • Frames suggest questions, approximations, what to pay attention to vs ignore, and ways to break up the problem/​system into subproblems/​subsystems.

  • Shared frames serve as coordination tools: they provide common knowledge of questions, what to ignore, and ways of breaking up the problem.

  • Frames can be wrong, and using a wrong frame is costly, especially if everyone agrees on the frame.

  • Figuring out useful, correct frames is a nontrivial technical problem.

Shared Frames Are Capital Investments

Back in the industrial era, standard economic models imagined people coming up with clever new machine designs (technology) to make some kind of work more productive, then investing resources up-front to actually build and deploy lots of those new machines (capital assets), and reaping the reward of higher productivity over a long time span.

In our case, a frame is the technology. Like the technology of the industrial era, figuring out useful new frames is a nontrivial technical problem. It takes smarts, and often scientific/​mathematical expertise—though there’s a certain flavor of natural genius which seems particularly good at figuring out useful frames (e.g. John Boyd, inventor of the OODA loop frame), just like another certain flavor of natural genius lends itself to mechanical inventions (e.g. da Vinci).

Most importantly, like other kinds of technology, a frame is ultimately an idea: once it’s figured out, it can be copied and re-used indefinitely. Machines break down, people die off, but ideas can last as long as we have use for them. And there’s no natural limit on how many people can use an idea; anyone can use an idea without excluding anyone else from using it. That’s why ideas make such a great bedrock for economic progress: they’re infinitely reusable, infinitely scalable. That all carries over to frames: there’s no natural constraint on how many people can use a frame, or for how long.

On the other hand, a frame without users is no different from a machine design languishing in obscurity on its inventor’s desk, never to be built. In order to actually be useful, a frame must be learned by people, it must be shared—and that takes effort. Learning a frame takes exposure, study, and practice: lots of up-front work before we are able to actually use it. And a shared frame requires that effort from every member of the group. Shared frames are capital assets: they require up-front investment in everyone learning to use the frame, and the reward from that investment is harvested in improved ability to solve coordination problems over a long time span.

Example: Paradigms

Scientific paradigms are shared frames in a scientific field. They suggest:

  • What kinds of questions practitioners in the field work on, and what kind of evidence/​arguments/​proofs count as answers.

  • What approximations practitioners use on a day-to-day basis.

  • What practitioners pay attention to or ignore.

  • How practitioners break up problems/​systems into subproblems/​subsystems.

Why do scientists need shared frames? Consider Thomas Kuhn’s description of physical optics before Newton (pg 13):

Yet anyone examining a survey of physical optics before Newton may well conclude that, though the field’s practitioners were scientists, the net result of their activity was something less than science. Being able to take no common body of belief for granted, each writer on physical optics felt forced to build his field anew from its foundations. In doing so, his choice of supporting experiment and observation was relatively free, for there was no standard set of methods or of phenomena that every optical writer felt forced to employ and explain. Under these circumstances, the dialogue of the resulting books was often directed as much to the members of other schools as it was to nature.

This is “preparadigmatic” research—research in which no frame is yet shared by the bulk of the field.

Preparadigmatic research requires constantly explaining and justifying one’s framing choices—i.e. choices of questions, approximations, what to pay attention to vs ignore, and how to break up the problem/​system. I have firsthand experience with this: I work in AI alignment, which is currently a preparadigmatic field. People disagree all the time on what questions to ask, what approximations to make, what to pay attention to vs ignore, how to break up the problem, and related issues. An alignment research agenda is at least as much about justifying framing decisions as it is about the technical content itself. “Being able to take no common body of belief for granted, each writer on [alignment] feels forced to build his field anew from its foundations.”

Even aside from the communication overhead, this makes distributed work difficult. It’s hard to pick a project which will nicely complement many other projects, since the other projects are based on wildly different frames. Even if one project finds something useful, it’s nontrivial to integrate that with other people’s work. There is no unifying framework to glue it all together.

A shared frame—i.e. a paradigm—allows us to circumvent all that.

When we have a paradigm, we no longer need to explain all of our framing decisions all the time. There are standard references on that, and most researchers just link to those, or take them for granted. The frame also tells us how to break up the problem, so independent groups can work on different subproblems, and the frame provides the glue to combine it all together later.

… but this takes both technology and capital investment. Technology: we need a frame which is correct. If practitioners agree on a frame which doesn’t ask some key question, or ignores some large factor, or breaks up a system in a way which ignores a large interaction between two subsystems… well, in the case of physical optics, they’d make incorrect predictions about light. In the case of AI alignment, we’d end up with AIs which do not do what we intended/​wanted.

Once we have the technology, we still need the capital investment: lots of people need to adopt the frame, and that takes both work on the part of the adopters (to learn it) and on the part of the frame’s early advocates (to distill, teach, legibly demonstrate correctness of the frame, and of course argue with old diehards). That investment has to happen before we can reap the gains in the form of easier coordination; only once the frame is shared will it simplify communication and distributed work.

How Can We Produce Shared Frames?

Suppose I want to invent new frame-technology. Maybe I’m an alignment researcher who wants to develop a useful paradigm for the field. Maybe I want to solve coordination problems of a more social nature among my friends, and introducing new frames seems like a potentially-powerful tool for that. Maybe the company or organization I work at suffers from many high-value coordination problems, and I want to design frames which could ease those.

As a frame designer, what questions do I need to ask? What approximations can I make? What can I pay attention to or ignore? How can I break frame-design up into subproblems? How do I frame frame-design?

First and foremost: I know of no systematic gears-level study on the topic. We can work from first principles and think about historical examples, but ultimately this is going to be very speculative. That said, here are some potentially-useful ways to frame it.

Technology: Producing Frames

The foundation of the information theory frame is a set of theorems which guarantee that, in principle, we can ignore the details of the messages and encoding when designing the transmission channel. Or, to put it another way, a transmission channel designed to maximize bitrate without any regard to message/​encoding details can achieve approximately the same capacity as a channel specially designed for e.g. phone calls. The theorems provide a rigorous mathematical basis for approximations and things to ignore—these foundations establish correctness of the frame.
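For reference, the core guarantee is usually stated something like this (an informal paraphrase of the noisy-channel coding theorem, not Shannon’s exact wording):

```latex
% Noisy-channel coding theorem (informal paraphrase).
% X: channel input, Y: channel output, I(X;Y): mutual information,
% R: code rate in bits per channel use.
C = \max_{p(x)} I(X;Y), \qquad
\text{reliable transmission at rate } R \text{ is possible if } R < C
\text{ and impossible if } R > C.
```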

The questions and ways to break up the problem suggested by information theory build on top of these foundations. Breaking up the problem into channel-design (maximizing bitrate) and encoder-design implicitly relies on the claim that near-optimal performance can still be achieved without the channel designers worrying about message/​encoding details—in other words, it relies on correctness of the approximation. Similarly, questions like “What’s the bitrate of a given channel, and how can we increase it?” implicitly rely on the claim that bitrate is the “right” measurement of capacity—i.e. that a channel maximizing bitrate will also approximately maximize e.g. crosstalk-free phone calls. As long as the foundations are correct, it’s the questions and ways of breaking up the problem which provide most of the usefulness of the shared frame for coordination.

In general, a frame’s “foundations” are its approximations or things to pay attention to vs ignore. These determine the correctness of the frame: a frame is correct exactly when its approximations are accurate, the things it says to ignore are small/​irrelevant, and the things it says to pay attention to are large/​relevant.

Assuming the frame is correct, its usefulness comes mainly from questions (which are often implicitly subgoals) and especially ways to break up the problem/​system. These implicitly rely on approximations or things to pay attention to vs ignore:

  • Questions are implicitly claims that we should pay attention to whatever the question is asking (and therefore ignore whatever we would have spent our marginal attention on otherwise).

  • Ways to break up the problem/​system are implicitly claims that we can ignore interactions between subproblems/​subsystems (other than whatever specific interactions are specified by our breakdown).

So, if we want to design correct and useful new frames, there are two key pieces:

  • Find approximations or things we can ignore (and establish them either mathematically or empirically).

  • Find questions or ways to break up the problem/system which leverage the correctness of the approximations or irrelevance of the things we ignore.

An example: a group of early scientists might try to predict the speed of a sled on a hill, and find that controlling sled weight, sled bottom material, hill material, and angle is sufficient to achieve reproducible results. In other words, those four variables (sled weight, sled bottom material, hill material, and angle) are sufficient to (approximately) mediate the interaction between sled speed and everything else in the world—from the motions of the planets to the flaps of butterfly wings. As long as we hold those four variables fixed, the effect of the rest of the universe on the sled speed is approximately zero; we can ignore all the other interactions. That, in turn, gives us a way to break up the system: split the sled speed from the rest of the universe, with the “interface” between the two given by the controlled variables. This also suggests useful questions to ask if we’re interested in the sled speed, e.g. ask about the sled’s weight or the hill’s angle.

More generally, in a high-dimensional world like ours, the core problem of science is to find mediators: sets of variables which mediate all the interaction between two chunks of the world. This is implicit in reproducibility: if an experimental outcome is reproducible by controlling X, Y, and Z, then apparently X, Y, and Z mediate all the interaction between the outcome and the rest of the world. So, experimental findings of mediation or reproducibility provide an approximation or thing to ignore: holding (things we control) fixed, the interactions between (things we measure) and (rest of the universe) are approximately zero if the experiment is reproducible; we can ignore all the other interactions. That, in turn, gives us a way to break up the system: split whatever things we measured from the rest of the universe, with the “interface” between the two given by the controlled variables. This also suggests useful questions to ask, e.g. ask about values of the controlled variables. The fewer and simpler the control variables, and the more different measurables they control, the more useful the frame.
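In probabilistic notation (mine, and only approximate), that kind of mediation claim looks like:

```latex
% M: the controlled variables (e.g. sled weight, bottom material, hill material, angle),
% S: what we measure (e.g. sled speed), W: everything else in the world.
P(S \mid M, W) \approx P(S \mid M)
\quad\text{i.e.}\quad
S \mathrel{\perp\!\!\!\perp} W \mid M \ \text{(approximately)}.
```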

On the mathematical side, theorems serve a similar role. If conditions X, Y, and Z are sufficient to guarantee some result, then we can ignore everything except X, Y, and Z when thinking about the result. It gives us a way to break up the system: we have the result on one side, and the rest of the universe on the other side, with the conditions serving as the “interface” between the two. The frame then suggests conditions X, Y, and Z as questions to ask. The fewer and simpler the conditions, and the broader the result, the more useful the frame.

Of course, both of these kinds of frames still have failure modes in practice. There may be hidden conditions in theorems, or hidden controls in experiments (i.e. things which the experimenters happened to never vary much, like the gravitational force in the lab); either of these would break correctness of the frames. Another failure mode: often a theorem or empirical mediation result offers legible, operationalized results which are not actually the things we care about. People will substitute the result for the thing they care about, often without noticing, because it’s well-defined and easy to explain… and then they Goodhart on it, or simply fail to notice some important difference. If we’re designing a system to efficiently transport ore down a mountain by sled, then our convenient and simple frame for sled-speed may tempt us to use the sled’s speed as our primary design target and focus on weight, materials, and slope—but accidentally ignore other crucial issues like the structural integrity of the sled.

Capital Investment: Turning Frames Into Shared Frames

Once we have a correct and useful frame, we still need to “deploy” it into the minds of some group of people in order to gain the coordination benefits.

To a large extent, this is a sales problem; it’s about convincing people to use our product. It doesn’t matter how correct and useful the frame is if nobody hears about it. Conversely, good salesmanship and hustle can sell a product even if the product isn’t useful. (Here’s a frame: to a first approximation, how many people buy a product depends mostly on marketing and salesmanship; we can largely ignore the product quality, as long as it’s not terrible in a highly-visible/​salient way.) Presumably we still want a correct-and-useful shared frame, in order to help solve whatever object-level problem we’re interested in, but it’s mostly not the correctness and usefulness which will drive other people to adopt the frame.

That said, we can still say a couple of frame-specific things, aside from the usual sales-and-marketing advice.

First: to the extent that our frame creates value by simplifying coordination problems, the value of knowing-and-using the frame increases with the number of other people who know-and-use it. If only one engineer in a group uses the information theory frame, they just end up in a confusing frame clash with everyone else. But if everyone else already uses it, then the last engineer will likely find the frame valuable to learn. In economic terms, shared frames are network goods: from a given person’s perspective, their value increases with the number of people already using them (much like a phone network). That makes it hardest to get the first few users. Ideally, we want to start with a relatively-small group of users who can get value from the frame on their own, and then expand out from there.
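Here’s a toy model of that dynamic, with entirely made-up numbers, just to illustrate why the first few users are the hard part: each person adopts the frame only once the coordination benefit, which grows with the existing user base, exceeds their personal learning cost.

```python
# Toy model of a frame as a network good (all numbers made up for illustration).
# Each person has a learning cost; the benefit of knowing the frame grows with
# the number of people who already use it.

def adopters_at_equilibrium(costs, seed, benefit_per_user=2.0):
    """Iterate adoption until no one else wants to learn the frame."""
    adopted = set(seed)
    while True:
        benefit = benefit_per_user * len(adopted)
        newly = {i for i, c in enumerate(costs) if c <= benefit} - adopted
        if not newly:
            return len(adopted)
        adopted |= newly

learning_costs = list(range(1, 21))  # 20 people with learning costs 1..20

print(adopters_at_equilibrium(learning_costs, seed=set()))   # 0: nobody moves first
print(adopters_at_equilibrium(learning_costs, seed={0, 1}))  # 20: a small seed tips everyone
```

With no seed group, adoption never starts; with even a small seed that already finds the frame worthwhile, adoption snowballs.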

Second: sales are generally easier if the cost is lower. In the case of frames, most of the “cost” is the work of learning the frame. We can reduce that cost via distillation: taking complex, abstract, or technical frames, and providing short, intuitive, and fun explanations. This is especially useful for frames based in math and science, where we often don’t need all the technical details in order to apply the frame itself. For instance, this post didn’t need to explain Shannon’s proofs, just the idea that we can separate transmission channel design from content details.

I wouldn’t be surprised if there’s value to be had from taking technical frames, distilling them via compelling examples or visuals, deploying them to everyone in a small community, and then using the shared frames to solve coordination problems at the community scale. Indeed, I’d argue that a lot of the value of the sequences and today’s rationalist community came from exactly that process.

Summary

Let’s recap the key ideas:

  • Frames suggest questions, approximations, what to pay attention to vs ignore, and ways to break up the problem/​system into subproblems/​subsystems.

  • Shared frames serve as coordination tools: they provide common knowledge of questions, what to ignore, and ways of breaking up the problem.

  • Frames can be wrong, and using a wrong frame is costly, especially if everyone agrees on the frame.

  • Figuring out useful, correct frames is a nontrivial technical problem.

  • In economic terms:

    • A frame is a technology: it’s a useful idea or piece of information.

    • A shared frame is a capital investment: it takes some up-front cost to deploy the frame to many people, and we make back the investment over time from simplification of coordination problems.

  • Paradigms are shared frames in a scientific field; lots of the Kuhnian view carries over to other kinds of shared frames.

  • To produce correct and useful frames, we look for approximations which allow us to ignore some interactions. We can view reproducible results in science or theorems in mathematics as providing frames this way.

  • Shared frames are a network good, so the first few users are the hardest; start with a small group who can get value from the shared frame on their own.

  • Distillation lowers the cost of learning a frame.

  • We can view the sequences as a distillation of frames now mostly shared among today’s rationalist community.