Mode Collapse and the Norm One Principle

[Epistemic status: I assign a 70% chance that this model proves to be useful, 30% chance it describes things we are already trying to do to a large degree, and won’t cause us to update much.]

I’m going to talk about something that’s a little weird, because it uses some results from some very recent ML theory to make a metaphor about something seemingly entirely unrelated—norms surrounding discourse.

I’m also going to reach some conclusions that surprised me when I finally obtained them, because they caused me to update on a few things that I had previously been fairly confident about. This argument basically concludes that we should adopt fairly strict speech norms, and that there could be great benefit to moderating our discourse well.

I argue that in fact, discourse can be considered an optimization process and can be thought of in the same way that we think of optimizing a large function. As I will argue, thinking of it in this way will allow us to make a very specific set of norms that are easy to think about and easy to enforce. It is partly a proposal for how to solve the problem of dealing with speech that is considered hostile, low-quality, or otherwise harmful. But most importantly, it is a proposal for how to ensure that the discussion always moves in the right direction: Towards better solutions and more accurate models.

It will also help us avoid something I’m referring to as “mode collapse” (where new ideas generated are non-diverse and are typically characterized by adding more and more details to ideas that have already been tested extensively). It’s also highly related to the concepts discussed in the Death Spirals and the Cult Attractor portion of the Sequences. Ideally, we’d like to be able to make sure that we’re exploring as much of the hypothesis space as possible, and there’s good reason to believe we’re probably not doing this very well.

The challenge: Making sure we’re searching for the global optimum in model-space sometimes requires reaching out blindly into the frontiers, the not well-explored regions, which runs the risk of ending up somewhere very low-quality or dangerous. There are also sometimes large gaps between very different regions of model-space where the quality of the model is very low in-between, but very high on each side of the gap. This requires traversing through potentially dangerous territory and being able to survive the whole way through.

(I’ll be using terms like “models” and “hypotheses” quite often, and I hope this isn’t confusing. I am using them very broadly, to refer to both theoretical understandings of phenomena and blueprints for practical implementations of ideas.)

We desire to have a set of principles which allows us to do this safely—to think about models of the world that are new and untested, solutions for solving problems that have never been done in a similar way—and they should ensure that, eventually, we can reach the global optimum.

Before we derive that set of principles, I am going to introduce a topic of interest from the field of Machine Learning. This topic will serve as the main analogy for the rest of this piece, and serve as a model for how the dynamics of discourse should work in the ideal case.

I. The Analogy: Generative Adversarial Networks

For those of you who are not familiar with recent developments in deep learning, Generative Adversarial Networks (GANs)[intro pdf here] are a new class of generative model that is well suited to producing high-quality samples from very high-dimensional, complex distributions. They have caused great buzz and hype in the deep-learning community because of how impressive some of their samples are, and how efficient they are at generation.

Put simply, a generator model and a critic (sometimes called a discriminator) model perform a two-player game where the critic is trained to distinguish between samples produced by the generator and the “true” samples taken from the data distribution. In turn, the generator is trained to maximize the critic’s loss function. Both models are usually parametrized by deep neural networks and can be trained by taking turns running a gradient descent step on each. The Nash equilibrium of this game is when the generator’s distribution matches that of the data distribution perfectly. This is never really borne out in practice, but sometimes it gets so close that we don’t mind.
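For readers who prefer symbols, the game described above is commonly written as the minimax objective below. The notation is an assumption borrowed from the usual convention of the original GAN formulation and does not appear in this post: D is the critic, G the generator, p_data the data distribution, and p_z the noise prior that the generator samples from.

```latex
\min_{G} \max_{D} \; V(D, G)
  = \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_{z}}\big[\log\big(1 - D(G(z))\big)\big]
```

The critic raises V by telling real samples from generated ones, and the generator lowers V by making that task harder, which is the sense in which it is trained against the critic.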

GANs have one principal failure mode, often called “mode collapse” (a term I’m going to appropriate to refer to a much broader concept), which is often thought to be due to the instability of the system. It was often believed that, if a careful balance between the generator and critic could not be maintained, one would eventually overpower the other, leading the critic to provide either useless or overly harsh information to the generator. Useless information will cause the generator to update very slowly or not at all, and overly harsh information will lead the samples to “collapse” to a small region of the data space that is the easiest target for the generator to hit.

This problem was essentially solved earlier this year due to a series of papers that propose modifications to the loss functions that GANs use, and, most crucially, add another term to the critic’s loss which stabilizes the gradient (with respect to the inputs) to have a norm close to one. It was recognized that we actually desire an extremely powerful critic so that the generator can make the best updates it possibly can, but the updates themselves can’t go beyond what the generator is capable of handling. With these changes to the GAN formulation, it became possible to use crazy critic networks such as ultra-deep ResNets and train them as much as desired before updating the generator network.
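As a rough illustration, and assuming the extra term meant here is the gradient penalty used in the Wasserstein GAN line of work (the post does not name the papers), the critic’s loss can be sketched as follows. The symbols are assumptions introduced for this sketch: p_g is the generator’s distribution, \hat{x} is a point sampled on a straight line between a real and a generated sample, and \lambda is a penalty weight.

```latex
L_{\mathrm{critic}}
  = \mathbb{E}_{\tilde{x} \sim p_{g}}\big[D(\tilde{x})\big]
  - \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[D(x)\big]
  + \lambda \, \mathbb{E}_{\hat{x} \sim p_{\hat{x}}}
      \Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_{2} - 1\big)^{2}\Big]
```

The last term is the part that matters for the rest of this piece: it pushes the norm of the critic’s gradient with respect to its input toward one, so the critic’s feedback always has a usable direction and a bounded size.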

The principle behind their operation is rather simple to describe, but unfortunately, it is much more difficult to explain why they work so well. However, I believe that as long as we know how to make one, and know the specific implementation details that improve their stability, their principles can be applied more broadly to achieve success in a wide variety of regimes.

II. GANs as a Model of Discourse

In order to use GANs as a tool for conceptual understanding of discourse, I propose to model the dynamics of debate as a collection of hypothesis-generators and hypothesis-critics. This could be likened to the structure of academia: researchers publish papers, the papers go through peer review, the work is iterated on and improved, and over time this process converges to more and more accurate models of reality (or so we hope). Most individuals within this process play both roles, but in theory this process would still work even if they didn’t. For example, Isaac Newton was a superb hypothesis generator, but he also had some wacky ideas that most of us would consider to be obviously absurd. Nevertheless, calculus and Newtonian physics became a part of our accepted scientific knowledge, and alchemy didn’t. The system adopted and iterated on his good ideas while throwing away the bad.

Our community should be capable of something similar, while doing it more efficiently and not requiring the massive infrastructure of academia.

A hypothesis-generator is not something that just randomly pulls out a model from model-space. It proposes things that are close modifications of things it already holds to be likely within its model (though I expect this point to be debatable). Humans are both hypothesis-generators and hypothesis-critics. And as I will argue, that distinction is not quite as sharply defined as one would think.

I think there has always been an underlying assumption within the theory of intelligence that creativity and recognition / distinction are fundamentally different. In other words, one can easily understand Mozart to be a great composer, but it is much more difficult to be a Mozart. Naturally this belief made its way into the field of Artificial Intelligence too, and became somewhat of a dogma. Computers might be able to play Chess, they might be able to play Go, but they aren’t doing anything fundamentally intelligent. They lack the creative spark; they work on pure brute-force calculation only, with maybe some heuristics and tricks that their human creators bestowed upon them.

GANs seem to defy this principle. Trained on a dataset of photographs of human faces, a GAN generator learns to produce near-photo-realistic images that nonetheless do not fully match any of the faces the critic network saw (one of the reasons why CelebA was such a good choice to test these on), and it is therefore in some sense producing things which are genuinely original. It may have once been thought that there was a fundamental distinction between creation and critique, but perhaps that’s not really the case. GANs were a surprising discovery, because they showed that it was possible to make impressive “creations” by starting from random nonsense and slowly tweaking it in the direction of “good” until it eventually got there (well okay, that’s basically true for the whole of optimization, but it was thought to be especially difficult for generative models).

What does this mean? Could someone become a “Mozart” by beginning a musical composition from random noise and slowly tweaking it until it became a masterpiece?

The above seems to imply “yes, perhaps.” However, this is highly contingent on the quality of the “tweaking.” It seems possible only as long as the directions to update in are very high quality. What if they aren’t very high quality? What if they point nowhere, or in very bad directions?

I think the default distribution of discourse is characterized by a large number of these directionless, low-quality contributions, and that this is likely one of the main factors behind mode collapse. This is related to what has been noted before: too much intolerance for imperfect ideas (or ideas outside of established dogma) in a community prevents useful tasks from being accomplished and progress from being made. Academia does not seem immune to this problem. The risk is greatest where low-quality or hostile discussion is tolerated.

Fortunately, making sure we get good “tweaks” seems to be the easy part. Critique is in high abundance. Our community is apparently very good at it. We also don’t need to worry much about the ratio of hypothesis-generators to hypothesis-critics, as long as we can establish good principles that allow us to follow GANs as closely as possible. The nice feature of the GAN formulation is that you are allowed to make the critic as powerful as you want. In fact, the critic should be more powerful than the generator (If the generator is too powerful, it just goes directly to the argmax of the critic).

(In addition, any collection of generators is a generator, and any collection of critics is a critic. So this formulation can be applied to the community setting).

III. The Norm One Principle

So the question then becomes: how do we take an algorithm governing a game between models much simpler than a human, and apply the same tweaks, which consist of nothing more than a few very simple equations, to human discourse?

Here what I devise is a strategy for taking the concept of the norm of the critic gradient being as close to one as possible, and using that as a heuristic for how to structure appropriate discourse.

(This is where my argument gets more speculative and I expect to update this a lot, and where I welcome the most criticism).

What I propose is that we begin modeling the concept of “criticism” based on how useful it is to the idea-generator receiving the criticism. Under this model, I think we should start breaking down criticism into two fundamental attributes:

  1. Directionality—does the criticism contain highly useful information, such that the “generator” knows how to update their model /​ hypothesis /​ proposal?

  2. Magnitude—Is the criticism too harsh, does it point to something completely unlike the original proposal, or otherwise require changes that aren’t feasible for the generator to make?

My claim is that any contribution to a discussion should satisfy the “Norm One Principle.” In other words, it should have a well-defined direction, and the quantity of change should be feasible to implement.
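To make the metaphor concrete, here is a minimal sketch that treats a critique as a vector of suggested changes, assuming (purely for illustration) that a proposal can be described by a few numeric coordinates. The function names `norm` and `to_norm_one` are hypothetical and are not taken from any library or from the GAN papers.

```python
import math


def norm(vector):
    """Euclidean length of a critique vector (its 'magnitude')."""
    return math.sqrt(sum(x * x for x in vector))


def to_norm_one(critique, eps=1e-12):
    """Rescale a critique to unit length, keeping only its direction.

    Returns None when the critique is (numerically) zero, i.e. when it
    carries no direction at all and so cannot satisfy the principle.
    """
    length = norm(critique)
    if length < eps:
        return None
    return [x / length for x in critique]


# An overly harsh critique: the right direction, but far too large a step.
harsh = [30.0, 40.0]
rescaled = to_norm_one(harsh)

# A directionless critique: nothing for the generator to act on.
directionless = [0.0, 0.0]
```

Here `rescaled` points the same way as `harsh` but has length one, while `to_norm_one(directionless)` returns `None`. In conversation, the rescaling step corresponds to trimming a critique down to the changes the author can actually make next.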

    If a critique can satisfy our requirements for both directionality and magnitude, then it serves a useful purpose. The inverse claim is that if we can’t follow these requirements, we risk falling into mode collapse, where the ideas commonly proposed are almost indistinguishable from the ones which preceded them, and ideas which deviate too far from the norm are harshly condemned and suppressed.

    I think it’s natural to question whether or not restricting criticism to follow certain principles is a form of speech suppression that prevents useful ideas from being considered. But the pattern I’m proposing doesn’t restrict the “generation” process, the creative aspect which produces new hypotheses. It doesn’t restrict the topics that can be discussed. It only restricts the criticism of those hypotheses, such that they are maximally useful to the source of the hypothesis.

    One of the primary fears behind having too much criticism is that it discourages people from contributing because they want to avoid the negative feedback. But under the Norm One Principle, I think it is useful to distinguish between disagreement and criticism. I think if we’re following these norms properly, we won’t need to consider criticism to be a negative reward. In fact, criticism can be positive. Agreement could be considered “criticism in the same direction you are moving in.” Disagreement would be the opposite. And these norms also eliminate the kind of feedback that tends to be the most discouraging.

    For example, some things which violate “Norm One”:

    • Ad hominem attacks (typically directionless).

    • Affective Death Spirals (unlimited praise or denunciation is usually directionless, and usually very high magnitude).

    • Signs that cause aversion (things I “don’t like”, that trigger my System 1 alarms, which probably violates both directionality and magnitude).

    • Lengthy lists of changes to make (norm greater than one; ideally we want to focus on small sets of changes that have the highest priority).

    • Repetition of points that have already been made (norm greater than one).

    One of my strongest hopes is that whoever is playing the part of the “generator” is able to compile the list of critiques easily and use them to update somewhere close to the optimal direction. This would be difficult if the sum of all critiques is either directionless (many critics point in opposite or near-opposite directions) or very high-magnitude (critics simply say to get as far away from here as possible).

    But let’s suppose that each individual criticism satisfies the Norm One principle. We will also assume that the generator is weighting each critique by their respect for whoever produced it, which I think is highly likely. Then the generator should be able to move in a direction unless the sum of the directions completely cancels out. It is unlikely for this to happen, unless there is very strong epistemic disagreement in the community over some fundamental assumptions (in which case the conversation should probably move over to that).
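The weighted sum described above can be sketched in a few lines. This is only an illustration under assumptions that are not in the post: critiques are unit-length vectors in a shared coordinate system, respect is a non-negative number, and the helper name `combined_update` is hypothetical.

```python
def combined_update(critiques, weights):
    """Sum unit-norm critiques, each scaled by the generator's respect
    for the critic who produced it."""
    dim = len(critiques[0])
    total = [0.0] * dim
    for critique, weight in zip(critiques, weights):
        for i in range(dim):
            total[i] += weight * critique[i]
    return total


# Two equally respected critics pointing in opposite directions cancel out.
opposed = combined_update([[1.0, 0.0], [-1.0, 0.0]], [1.0, 1.0])

# Unequal respect breaks the tie and leaves a direction to move in.
weighted = combined_update([[1.0, 0.0], [-1.0, 0.0]], [2.0, 1.0])

# A third critic, pointing somewhere else, also leaves a usable direction.
with_third = combined_update(
    [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]], [1.0, 1.0, 1.0]
)
```

In this toy case `opposed` is the zero vector, `weighted` is `[1.0, 0.0]`, and `with_third` is `[0.0, 1.0]`. The sketch only shows the bookkeeping; it does not by itself establish how often real critiques cancel.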

    In addition, it also becomes less likely for the directions to cancel out as the number of inputs increases. Thus, it seems that proposals for new models should be presented to a wide audience, and we should avoid the temptation to keep our proposals hidden to all except for a small set of people we trust.

    So I think that in general, this proposed structure should tend to increase the amount of collective trust we have in the community, and that it favors transparency and favors diversity of viewpoints.

    But what of the possible failure modes of this plan?

    This model should fail if the specific details of its implementation either remove too much discussion, or fail to deal with individuals who refuse to follow the norms and refuse to update. Any implementation should allow room for anyone to update. Someone who posts an extremely hostile, directionless comment should be allowed chances to modify their contribution. The only scenario in which the “banhammer” becomes appropriate is when this model fails to apply: The cardinal sin of rationality, the refusal to update.

IV. Building the Ideal “Generator”

    As a final point, I’ll note that the above assumes that generators will be able to update their models incrementally. The easy part, as I mentioned, is obtaining the updates; the hard part is accumulating them. This seems difficult with the infrastructure we have in place. What we do have is a good system for posting proposals and receiving feedback (the blog post / comment thread set-up), but this assumes that each “generator” keeps track of their own models and has to stay fully aware of the status of other models on their own. There is no centralized “mixture model” anywhere that contains the full set of models weighted by how much probability they are given by the community. Currently, we do not have a good solution for this problem.

    However, it seems that the first conception of Arbital was centered around finding a solution to this kind of problem:

    Arbital has bigger ambitions than even that. We all dream of a world that eliminates the duplication of effort in online argument—a world where, the same way that Wikipedia centralized the recording of definite facts, an argument only needs to happen once, instead of being reduplicated all over the Internet; with all the branches of the argument neatly recorded in the same place, along with some indication of who believes what. A world where ‘just check Arbital’ had the same status for determining the current state of debates, as ‘just check Wikipedia’ now has when somebody starts arguing about the population of Melbourne. There’s entirely new big subproblems and solutions, not present at all in the current Arbital, that we’d need to tackle that considerably more difficult problem. But to solve ‘explaining things’ is something of a first step. If you have a single URL that you can point anyone to for ‘explaining Bayes’, and if you can dispatch people to different pages depending on how much math they know, you’re starting to solve some of the key subproblems in removing the redundancy in online arguments.

    If my proposed model is accurate, then it suggests that the problem Arbital aims to solve is in fact quite crucial to solve, and that the developers of Arbital should consider working through each obstacle they face without pivoting from this original goal. I feel confident enough that this goal should be high priority that I’d be willing to support its development in whatever way is deemed most helpful and is feasible for me (I am not an investor, but I am a programmer and would also be capable of making small donations, or contributing material).

    The only thing that this model would require of Arbital is that it be as open as possible to contributions, and that contributed content then be heavily moderated or filtered (but importantly not the other way around, where it is closed to a small group of trusted people).

    Currently, the incremental changes that would have to be made to LessWrong and related sites like SSC would simply be increased moderation of comment quality. Otherwise, any further progress on the problem would require overcoming much more serious obstacles requiring significant re-design and architecture changes.

    Everything I’ve written above is also subject to the model I’ve just outlined, and therefore I expect to make incremental updates as feedback to this post accrues.

    My initial prediction for feedback to this post is that the ideas might be considered helpful and offer a useful perspective or a good starting point, but that there are probably many details that I have missed that would be useful to discuss, or points that were not quite well-argued or well thought-out. I will look out for these things in the comments.