Abstracts should be either Actually Short™, or broken into paragraphs

It looks to me like academia figured out (correctly) that it’s useful for papers to have an abstract that makes it easy to tell-at-a-glance what a paper is about. They also figured out that an abstract should be about a paragraph. Then people goodharted on “what paragraph means”, trying to cram too much information into one block of text. Papers typically have ginormous abstracts that should actually be broken into multiple paragraphs.

I think LessWrong posts should probably have more abstracts, but I want them to be nice easy-to-read abstracts, not worst-of-all-worlds-goodharted-paragraph abstracts. Either admit that you’ve written multiple paragraphs and break it up accordingly, or actually streamline it into one real paragraph.

Sorry to pick on the authors of this particular post, but my motivating example today was bumping into the abstract for Natural Abstractions: Key claims, Theorems, and Critiques. It’s a good post; its opening summary just happened to be written in an academic-ish style that exemplified the problem. It opens with:

TL;DR: John Wentworth’s Natural Abstraction agenda aims to understand and recover “natural” abstractions in realistic environments. This post summarizes and reviews the key claims of said agenda, its relationship to prior work, as well as its results to date. Our hope is to make it easier for newcomers to get up to speed on natural abstractions, as well as to spur a discussion about future research priorities. We start by summarizing basic intuitions behind the agenda, before relating it to prior work from a variety of fields. We then list key claims behind John Wentworth’s Natural Abstractions agenda, including the Natural Abstraction Hypothesis and his specific formulation of natural abstractions, which we dub redundant information abstractions. We also construct novel rigorous statements of and mathematical proofs for some of the key results in the redundant information abstraction line of work, and explain how those results fit into the agenda. Finally, we conclude by critiquing the agenda and progress to date. We note serious gaps in the theoretical framework, challenge its relevance to alignment, and critique John’s current research methodology.

That’s 179 words. They blur together, and I have a very hard time parsing them. If this were anything other than an abstract, I expect you’d naturally write it in about 3 paragraphs:

TL;DR: John Wentworth’s Natural Abstraction agenda aims to understand and recover “natural” abstractions in realistic environments. This post summarizes and reviews the key claims of said agenda, its relationship to prior work, as well as its results to date. Our hope is to make it easier for newcomers to get up to speed on natural abstractions, as well as to spur a discussion about future research priorities.

We start by summarizing basic intuitions behind the agenda, before relating it to prior work from a variety of fields. We then list key claims behind John Wentworth’s Natural Abstractions agenda, including the Natural Abstraction Hypothesis and his specific formulation of natural abstractions, which we dub redundant information abstractions. We also construct novel rigorous statements of and mathematical proofs for some of the key results in the redundant information abstraction line of work, and explain how those results fit into the agenda.

Finally, we conclude by critiquing the agenda and progress to date. We note serious gaps in the theoretical framework, challenge its relevance to alignment, and critique John’s current research methodology.

If I try to streamline this without losing info, it’s still hard to get it into something less than 3 paragraphs (113 words):

We review John Wentworth’s Natural Abstraction agenda. We aim to help newcomers to get up to speed on natural abstractions, and to spur discussion about future research priorities.

We summarize the basic intuitions behind the agenda and its key claims, including both “the natural abstraction hypothesis” and his specific formulation of natural abstractions, which we dub “redundant information abstractions.” We connect it to prior work, and also construct novel rigorous statements for some key results in the redundant information abstraction line of work.

Finally, we conclude by critiquing the agenda and progress to date. We note serious gaps in the theoretical framework, challenge its relevance to alignment, and critique John’s current research methodology.

If I’m letting myself throw out significant information, I can get it down to 69 words. I’m not thrilled with this as a paragraph, but my eyes don’t completely glaze over it.

We review John Wentworth’s Natural Abstraction agenda. We conceptualize it as having two major claims – the “universal abstraction hypothesis” and the “redundant information hypothesis.” We construct rigorous statements for some key results in the redundant information abstraction line of work, and explain how those results fit into the agenda. We conclude with a critique, noting serious gaps in the theoretical framework, challenging its relevance to alignment, and critiquing John’s current research methodology.

I think what I actually want in most cases is a very short abstract (1 long sentence or 3 short sentences), followed by a few paragraphs.

I do notice that once you start letting the abstract be multiple paragraphs, it ends up not that different from the introduction to the post.

For comparison:

Introduction

The Natural Abstraction Hypothesis (NAH) says that our universe abstracts well, in the sense that small high-level summaries of low-level systems exist, and that furthermore, these summaries are “natural”, in the sense that many different cognitive systems learn to use them. There are also additional claims about how these natural abstractions should be formalized. We thus split up the Natural Abstraction Hypothesis into two main components that are sometimes conflated:

  1. The Universality Hypothesis: Natural abstractions exist, i.e. many cognitive systems learn similar abstractions.

  2. The Redundant Information Hypothesis: Natural abstractions are well described mathematically as functions of redundant or conserved information.

Closely connected to the Natural Abstraction Hypothesis are several mathematical results as well as plans to apply natural abstractions to AI alignment. We’ll call all of these views together the natural abstractions agenda.

The natural abstractions agenda has been developed by John Wentworth over the last few years. The large number of posts on the subject, which often build on each other by each adding small pieces to the puzzle, can make it difficult to get a high-level overview of the key claims and results. Additionally, most of the mathematical definitions, theorems, and proofs are stated only informally, which makes it easy to mix up conjectures, proven claims, and conceptual intuitions if readers aren’t careful.

In this post, we

  • survey some existing related work, including in the academic literature,

  • summarize the key conceptual claims behind the natural abstractions agenda and break them down into specific subclaims,

  • formalize some of the key mathematical claims and provide formal proofs for them,

  • outline the high-level plan for how the natural abstractions agenda aims to help with AI alignment,

  • and critique the agenda by noting gaps in the theory, issues with the relation to alignment, and methodological criticisms.

All except the last of these sections are our attempt to describe John’s views, not our own. That said, we attempt to explain things in the way that makes the most sense to us, which may differ from how John would phrase them somewhat. And while John met with us to clarify his thinking, it’s still possible we’re simply misunderstanding some of his views. The final section discusses our own views: we note some of our agreements but focus on the places where we disagree or see a need for additional work.

In the remainder of this introduction, we provide some high-level intuitions and motivation, and then survey existing distillations and critiques of the natural abstractions agenda. Readers who are already quite familiar with natural abstractions may wish to skip directly to the next section.

Honestly, I’m not sure the abstract really adds that much over this. The intro is 430 words. The original abstract is 179, about 42% as long. The parts of the abstract that nail down “literally what are all the things we included in this post” don’t really seem to add much that I wouldn’t get by skimming the bullet points in the intro. And that information is much easier to read in the intro. (I also bet you could streamline the intro somewhat, which would further reduce the benefit of having an abstract in the first place.)

Rather than copying academic abstract style, I’d rather people basically write good introductions, where the first paragraph helps you make a decision about whether to read the rest of the intro, and the rest of the intro helps you decide whether to read the rest of the piece.

In this case, I’d maybe just replace the abstract with:

We review John Wentworth’s Natural Abstraction agenda, summarizing its key claims and critiquing its relevance to alignment. We aim to help newcomers get up to speed on natural abstractions, offer criticism, and spur discussion about future research priorities.

and then jump into the introduction, which covers the rest of the information.