Some (problematic) aesthetics of what constitutes good work in academia

(Not-terribly-informed rant, written in my free time.)

Terminology note: When I say “an aesthetic”, I mean an intuitive (“I know it when I see it”) sense of what a completed paper, project, etc. is ideally “supposed” to look like. It can include both superficial things (the paper is properly formatted, the startup has high valuation, etc.), and non-superficial things (the theory is “elegant”, the company is “making an impact”, etc.).

Part 1: The aesthetic of novelty /​ cleverness

Example: my rant on “the psychology of everyday life”

(Mostly copied from this tweet)

I think if you want to say something that is:

  • (1) true,

  • (2) important, and

  • (3) related to the psychology of everyday life,

…then it’s NOT going to conform to the aesthetic of what makes a “good” peer-reviewed academic psych paper.

The problem is that this particular aesthetic demands that results be (A) “novel”, and (B) “surprising”, in a certain sense. Unfortunately, if something satisfies (1-3) above, then it will almost definitely be obvious-in-hindsight, which (perversely) counts against (B); and it will almost definitely have some historical precedents, even if only in folksy wisdom, which (perversely) counts against (A).

If you find a (1-3) thing that is not “novel” and “surprising” per the weird peer-review aesthetic, but you have discovered a clearer explanation than before, or a crisper breakdown, or better pedagogy, etc., then good for you, and good for the world, but it’s basically useless for getting into top psych journals and getting prestigious jobs in psych academia, AFAICT. No wonder professional psychologists rarely even try.

Takeaway from the perspective of a reader: if you want to find things that are all three of (1-3), there are the extremely rare, once-in-a-generation academic psych papers that you should read, and meanwhile there’s also a giant treasure trove of blog posts and such. For example:

  • Motivated reasoning is absolutely all three of (1-3). If you want to know more about motivated reasoning, don’t read psych literature, read Scout Mindset.

  • Scope neglect is absolutely all three of (1-3). If you want to know more about scope neglect, don’t read psych literature, read blog posts about Cause Prioritization.

  • As it happens, I’ve recently been trying to make sense of social status and related behaviors. And none of the best sources I’ve found have been academic psychology—all my “aha” moments came from blog posts. And needless to say, whatever I come up with, I will also publish via blog posts. (Example.)

Takeaway from the perspective of an aspiring academic psychologist: What do you do? (Besides “rethink your life choices”.) Well, unless you have a once-in-a-generation insight, it seems that you need to drop at least one of (1-3):

  • If you drop (3), then you can, I dunno, figure out some robust pattern in millisecond-scale reaction times or forgetting curves that illuminates something about neuroscience, or find a deep structure underlying personality differences, or solve the Missing Heritability Problem, etc.—anything where we don’t have everyday intuitions for what’s true. There are lots of good psych studies in this genre (…along with lots of crap, of course, just like every field).

  • If you drop (2), then you can use very large sample sizes to measure very small effects that probably nobody ought to care about.

  • If you drop (1), then you have lots of excellent options ranging from p-hacking to data fabrication, and you can rocket to the top of your field, give TED talks, sell books, get lucrative consulting deals, etc.

Example: Holden Karnofsky quote about academia

From a 2018 interview (also excerpted here):

I would say the vast majority of what is going on in academia is that people are trying to do something novel, interesting, clever, creative, different, new, provocative, that really pushes the boundaries of knowledge forward in a new way. I think that’s really important, obviously, and a great thing. I’m really, incredibly glad we have institutions to do it.

I think there are a whole bunch of other activities that are intellectual, that are challenging, that take a lot of intellectual work and that are incredibly important and that are not that. They have nowhere else to live…

To give examples of this, I mean I think GiveWell is the first place where I might have initially expected that development economics was going to tell us what the best charities are. Or, at least, tell us what the best interventions are. Tell us if bed nets, deworming, cash transfers, agricultural extension programs, education improvement programs, which ones are helping the most people for the least money. There’s really very little work on this in academia.

A lot of times, there will be one study that tries to estimate the impact of deworming, but very few or no attempts to really replicate it. It’s much more valuable [from the point-of-view of an academic] to have a new insight, to show something new about the world than to try and nail something down. It really got brought home to me recently when we were doing our Criminal Justice Reform work and we wanted to check ourselves. We wanted to check this basic assumption that it would be good to have less incarceration in the US.

David Roodman, who is basically the person that I consider the gold standard of a critical evidence reviewer, someone who can really dig on a complicated literature and come up with the answers, he did what, I think, was a really wonderful and really fascinating paper, which is up on our website, where he looked for all the studies on the relationship between incarceration and crime, and what happens if you cut incarceration, do you expect crime to rise, to fall, to stay the same? He really picked them apart. What happened is he found a lot of the best, most prestigious studies and about half of them, he found fatal flaws in when he just tried to replicate them or redo their conclusions.

When he put it all together, he ended up with a different conclusion from what you would get if you just read the abstracts. It was a completely novel piece of work that reviewed this whole evidence base at a level of thoroughness that had never been done before, came out with a conclusion that was different from what you naively would have thought, which concluded his best estimate is that, at current margins, we could cut incarceration and there would be no expected impact on crime. He did all that. Then, he started submitting it to journals. It’s gotten rejected from a large number of journals by now [laughter]. I mean starting with the most prestigious ones and then going to the less.…

More examples

  • There’s a method to calculate how light bounces around multilayer thin films. It’s basic, college-level physics and has probably been known for more than 100 years. But the explanations I could find all had typos, and the computer implementations all had bugs. So when I was a physics grad student, I wrote out my own derivation and open-source implementation with scrupulous attention to detail. I treated that as a hobby project, and didn’t even mention it in my dissertation, because obviously that’s not the kind of exciting novel physics work that helps one advance in physics academia. But in terms of accelerating the field of solar cell R&D, it was probably far more impactful than any of my “real” solar-related grad-school projects. (More discussion here.)

  • When I was in academia, sometimes there would be a controversy in the literature, and I would put in a ton of effort to figure out who was right, and then I’d figure it out to my satisfaction, and it would turn out that one side was right about everything, and then … that’s it. There was nothing I could do with that information to help my nascent academic career. Obviously you can’t publish a peer-reviewed paper saying “Y’know that set of papers from 20 years ago by Prof. McBloop? They were all correct as written. All the later criticisms were wrong. Good job, Prof. McBloop!” (Sometimes figuring out something like that is indirectly useful, of course.) It would definitely work as a blog post, but if the goal is peer-reviewed papers and grants, figuring out these kinds of things is a waste of time except to the extent that it impacts “novel” follow-up work. And needless to say, if we systematically disincentivize this kind of activity, we shouldn’t be surprised that it doesn’t happen as much as it should.

  • It’s extremely common for an academic to read an article and decide it’s wrong, but extremely rare for them to say so publicly, let alone submit a formal reply (a time-consuming and miserable process, apparently). I think there are a bunch of things that contribute to that, but one of them is that the goal is “big new exciting clever insights”—and “this paper is wrong” sure doesn’t sound like a big new exciting clever insight.

  • Tweet by Nate Soares: “big progress often comes from lots of small reconceptualizations. the “i can’t distinguish your idea from a worse one in the literature” police are punishing real progress.”
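For the curious, the thin-film calculation mentioned in the first bullet above is the standard characteristic-matrix (transfer-matrix) method. Here’s a minimal normal-incidence sketch in Python. To be clear, this is an illustrative toy with made-up names, not my actual open-source implementation, which handles oblique incidence, polarization, and more:

```python
import cmath
from math import pi

def reflectance(n_layers, d_layers, n0, ns, wavelength):
    """Reflectance of a multilayer thin-film stack at normal incidence,
    via the characteristic-matrix (transfer-matrix) method.

    n_layers: refractive indices of the layers (possibly complex)
    d_layers: layer thicknesses, same length units as wavelength
    n0, ns:   indices of the semi-infinite ambient and substrate
    """
    # Start from the identity matrix, then multiply in each layer's
    # characteristic matrix in order.
    M = [[1, 0], [0, 1]]
    for n, d in zip(n_layers, d_layers):
        delta = 2 * pi * n * d / wavelength  # phase thickness of this layer
        c, s = cmath.cos(delta), cmath.sin(delta)
        L = [[c, 1j * s / n], [1j * n * s, c]]
        M = [[M[0][0]*L[0][0] + M[0][1]*L[1][0], M[0][0]*L[0][1] + M[0][1]*L[1][1]],
             [M[1][0]*L[0][0] + M[1][1]*L[1][0], M[1][0]*L[0][1] + M[1][1]*L[1][1]]]
    # Combine the stack matrix with the semi-infinite ambient and substrate
    # to get the amplitude reflection coefficient r; reflectance is |r|^2.
    num = n0 * M[0][0] + n0 * ns * M[0][1] - M[1][0] - ns * M[1][1]
    den = n0 * M[0][0] + n0 * ns * M[0][1] + M[1][0] + ns * M[1][1]
    return abs(num / den) ** 2
```

With zero layers this reduces to the Fresnel result ((n0 − ns)/(n0 + ns))², and a single quarter-wave layer reproduces the textbook antireflection-coating formula, which are exactly the sanity checks whose absence (typos, bugs) motivated the rewrite described above.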

Part 2: The aesthetic of topicality (or more cynically, “trendiness”)

General discussion

When I was in physics academia (grad school and postdoc), I got a very strong sense that the community had a tacit shared understanding of the currently-trending topics /​ questions, within which there’s a contest to find interesting new ideas /​ progress.

Now, if you think about it, aside from commercially-relevant work, success for academic research scientists /​ philosophers /​ etc. is ≈100% determined by “am I impressing my peers?”—that’s how you get promoted, that’s how you get grants, that’s how you get prizes and other accolades, etc.

So, if I make great progress on Subtopic X, and all the prestigious people in my field don’t care about Subtopic X, that’s roughly just as bad for me and my career as if those people had unanimously said “this is lousy work”.

It’s a bit like in clothing fashion: if you design an innovative new beaded dress, but beads aren’t in fashion this season, then you’re not going to sell many dresses.

Of course, the trends change, and indeed everyone is trying to be the pioneer of the next hot topic. There are a lot of factors that go into “what is the next hot topic”, including catching the interest of a critical mass of respected people (or people-who-control-funding), which in turn involves them feeling it’s “exciting”, and that they themselves have an angle for making further progress in this area, etc.

A couple personal anecdotes from my physics experience

  • When I was a grad student, “multiferroics” were really hot, partly due to a hope that they would enable new types of computer memory (which I think helped justify funding), and partly due to some cool new physics phenomena involving them (see Part 1 above). Separately, solar cell research was really hot, both because everyone wants to help with climate change and because you could get funding that way. I had an advisor running a multiferroics research group, and he shrewdly bought a lamp and put some multiferroics under it, and wouldn’t you know it, they had a photovoltaic effect. So what? Tons of materials do. It wasn’t a particularly strong effect, nor promising for future practical applications, but in terms of starting a trendy new physics /​ materials-science research area, it was bang-on. I was a coauthor on two papers related to this idea, and they now have 600 and 1700 citations respectively. Everyone involved got copious funding and promotions.

  • When I was a postdoc, “metamaterials” were pretty hot, although maybe a bit past their peak by that point. Separately, “diffractive optical elements” were an ancient, boring technology that had long ago migrated from physicists-in-academia to optical-engineers-in-industry. Somebody figured out that there was an opening for a second wave of academic research on diffractive optical elements, aided by modern lithography and design tools. But they didn’t describe it that way! Instead they made up a new term “metasurface”, which sounds like it’s continuing the “metamaterial” conversation, but taking it in an exciting new direction, and by the way it’s very easy to make “metasurfaces” whereas metamaterials are a giant pain that few groups can build and experiment on. So tons of groups immediately jumped onto that bandwagon. The “metasurface” trend became huge, and everyone involved got copious funding and promotions. I am confident that this would not have happened if the original group had published the same results using the traditional term “diffractive optical element” instead of coining “metasurface”. (I’m leaving out parts of this story; and also, I’m describing it as deliberate crass marketing, when in fact it was mostly a happy accident, I think. But still, it illustrates some aspects of what makes a trendsetting physics idea.)

  • Similarly, there’s a term “photonics” which is related to, but slightly different from, the term “optics”. But what really happens in practice is that everyone uses the term “photonics” whenever possible, because “photonics” sounds exciting and trendy, whereas “optics” sounds old and tired.

“The other Hamming question”

Richard Hamming famously asked his colleagues “What are the important problems of your field?”. I think the important follow-up question should be “Are you sure?”

Actually, perhaps one could ask a series of questions:

  1. “What are the important problems of your field?”

  2. “What are the problems in your field that would be most prestigious for you to solve? In other words, what are the problems where, if you solved them, lots of people, and especially your own colleagues that you look up to, would be very impressed by you?”

  3. If those two lists are heavily overlapping, shouldn’t you be a little suspicious that you’re optimizing for impressiveness instead of really thinking about what’s “important”?

  4. And oh by the way, what criteria are you using to define the word “important”? If you didn’t already answer that question in the course of answering Question 1 a minute ago, then … what exactly were you doing when you were answering Question 1??

Of course, this latter question ultimately gets us into the field of Cause Prioritization, which of course I think everyone in academia should take much more seriously. (Check out the “Effective Thesis” organization!)

Extremely cynical tips to arouse academics’ interest

Let’s say you’re working on a math problem that’s relevant to making safe and beneficial Artificial General Intelligence. And you want to get academic mathematicians to work on it. One might think that helping prevent human extinction would be motivation enough. Nope! Some things you might try are:

  • If you see something beautiful and clever, consider not revealing directly that you have seen it, but rather find an already-prestigious mathematician, hint at it to them, and hope that they “discover it” for themselves and publish it. That way they’ll become invested in the health of that subfield, and help sell it to their colleagues.

  • Make it sound connected to existing popular /​ prestigious math areas and open problems (ideally by finding and promoting actual legitimate connections, but branding and vibes can substitute in a pinch).

  • Make it sound connected to future funding opportunities (ideally by finding and promoting actual legitimate future funding opportunities, but branding and vibes can substitute in a pinch).

The above is tongue-in-cheek—obviously I do not endorse conducting oneself in an undignified and manipulative manner, and I notice that I mostly don’t do any of these things myself, despite having a strong wish that more academic neuroscientists would work on certain problems that I care about.

Part 3: The aesthetic of effort

In competitive gymnastics, there’s no goal except to impress the judges. Consequently, the judges learn to be impressed by people perfectly executing skills that are conspicuously difficult to execute. And indeed, if too many people can perfectly execute a skill, then the judges stop being impressed by it, and instead look for more difficult skills.

I think there’s an echo of that dynamic in the context of academia and peer review.

My favorite example is that there’s a simple idea related to AI alignment, which was well explained in a couple sentences in a 2018 blog post by Abram Demski. (See “the easy problem of wireheading” here.) A few months after I read that, a DeepMind group published a 36-page arxiv paper (see also companion blog post) full of obvious signals of effort, including gridworld models, causal influence diagrams, and so on. But the upshot of that paper was basically the same idea as those couple sentences in a blog post.

My point in bringing that up is not that there was absolutely no value-add in the extra 35.9 pages going from the sentences-in-a-blog-post to the arxiv paper. Of course there was! My point is rather (1) Those blog post sentences would have been at least as helpful as the paper for at least most of the paper’s audience, (2) Nevertheless, despite the value of those blog post sentences, they could not possibly have been published in a peer-reviewed, citable, CV-enhancing way. It just looks too simple. It does not match “the aesthetic of effort”.

Another example: There was a nice 2020 paper by Rohin Shah, Stuart Russell, et al., “Benefits of assistance over reward learning”. It was helpfully explaining a possibly-confusing conceptual point. It would have made a nice little blog post. Alas! After the authors translated their nice little conceptual clarification into academic-ese, including thorough literature reviews, formalizations, and so on, it came out to 22 pages. (UPDATE: Rohin comments that “I don’t think the main paper would have been much shorter if we’d aimed to write a blog post…”. I apologize for the error.) And then it got panned by peer reviewers, mostly for not being sufficiently surprising and novel. So maybe this example mostly belongs in Part 1 above. But I have a strong guess that the reviewers were also unhappy that even those 22 pages didn’t demonstrate enough performative effort. For example, one reviewer complained that “there were no computational results shown in the main paper”. This reviewer didn’t say anything about why computational results would have helped make the paper better! The absence of computational results was treated as self-evidently bad.

(Needless to say, I’m not opposed to conspicuously-effortful things!! Sometimes that’s the best way to figure out something important. I’m just saying that conspicuous effort, in and of itself, should be treated by everyone as a cost, not a benefit.)

Part 4: Some general points

This obviously isn’t just about academia

For example, a recent post by @bhauth, entitled “story-based decision making”, has a fun discussion of some of the “aesthetics” subconsciously used by investors when they judge startup company pitches.

Aesthetics-of-success can be sticky due to signaling issues

If Bob does something that fails by the usual standards-of-success, nobody can tell whether Bob could have succeeded by those standards if he had wanted to, but chose not to because he’s marching to the beat of a different drummer—or whether Bob just isn’t as skillful and hardworking as other people. So there’s a lemons problem.

Aesthetics-of-success are invisible to exactly the people most impacted by them

There’s a tendency to buy into these aesthetics and see them as the obviously appropriate and correct way to judge success, as opposed to contingent cultural impositions.

People generally only become aware of an aesthetic-of-success when they rebel against it. Otherwise they’re blind to the fact that it exists at all. I’m sure that the three items above are three out of a much longer list of “aesthetics of what constitutes good work in academia”. But those three have always annoyed me, so of course I am hyper-aware of them.

To illustrate this blindness, consider:

  • “That’s not ‘trendy’! It’s just ‘good important work’!”, says the scientist.

  • “That’s not ‘trendy’! It’s just ‘beautiful and chic’!”, says the clothing designer.

  • “That’s not ‘performative effort to signal technical skill’! It’s just ‘being thorough and careful’!” says the scientist.

  • “That’s not ‘performative effort to signal technical skill’! It’s just ‘elegant and impressive’!” says the Olympic gymnast.

(One time I suggested to a friend in the construction industry that future generations would view all-glass office buildings, greige interiors, etc., as “very 2020s”, and he gave me a look, like that thought had never crossed his mind before. To him, other decades have characteristic style trends reflecting the fickle winds of fashion and culture, but ours? Of course not. We merely design things in the natural, objectively-sensible way!)

If your aesthetics-of-success are bad, so is your “research taste”

People on this forum often talk about “developing research taste”. The definition of “good research taste” is “ability to find research directions that will lead to successful projects”. Therefore, if your “aesthetic sense of what a successful project would ideally wind up looking like” is corrupted, your notion of “good research taste” will wind up corrupted as well—optimized towards a bad target.

Homework problem

What “aesthetics” are you using to recognize success in your own writing, projects, and other pursuits? And what kinds of problematic distortions might they lead to?

Crossposted to EA Forum