2022 (and All Time) Posts by Pingback Count

For the past couple of years I’ve wished LessWrong had a way to sort posts by number of pingbacks or, ideally, by total karma of pingbacks. I particularly wished for this during the Annual Review, where “which posts got cited the most?” seemed like a useful signal for finding potential hidden gems.

We still haven’t built a full-fledged feature for this, but I just ran a query against the database and turned the results into a spreadsheet, which you can view here:

LessWrong 2022 Posts by Pingbacks
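For those curious how the columns below relate to each other: each row counts the posts that link to a given 2022 post (its "pingbacks") and sums their karma, and the average column is just total pingback karma divided by pingback count. Here is a minimal sketch of that aggregation in Python, using a made-up list of pingback records rather than the real database schema (the actual numbers come from a query against LessWrong's database):

```python
from collections import defaultdict

# Made-up pingback records: (title of the 2022 post being linked to,
# karma of the post that links to it). Illustrative only -- not real data.
pingbacks = [
    ("Example Post A", 120),
    ("Example Post A", 45),
    ("Example Post B", 80),
]

# Aggregate pingback count and total pingback karma per target post.
stats = defaultdict(lambda: {"count": 0, "total_karma": 0})
for target_title, pingback_karma in pingbacks:
    stats[target_title]["count"] += 1
    stats[target_title]["total_karma"] += pingback_karma

# Sort by Total Pingback Karma (descending) and derive Avg Pingback Karma.
for title, s in sorted(stats.items(), key=lambda kv: kv[1]["total_karma"], reverse=True):
    avg = round(s["total_karma"] / s["count"])
    print(f"{title}: {s['count']} pingbacks, {s['total_karma']} total karma, {avg} avg")
```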

Here are the top 100 posts, sorted by Total Pingback Karma:

| Title/Link | Post Karma | Pingback Count | Total Pingback Karma | Avg Pingback Karma |
|---|---|---|---|---|
| AGI Ruin: A List of Lethalities | 870 | 158 | 12,484 | 79 |
| MIRI announces new "Death With Dignity" strategy | 334 | 73 | 8,134 | 111 |
| A central AI alignment problem: capabilities generalization, and the sharp left turn | 273 | 96 | 7,704 | 80 |
| Simulators | 612 | 127 | 7,699 | 61 |
| Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover | 367 | 83 | 5,123 | 62 |
| Reward is not the optimization target | 341 | 62 | 4,493 | 72 |
| A Mechanistic Interpretability Analysis of Grokking | 367 | 48 | 3,450 | 72 |
| How To Go From Interpretability To Alignment: Just Retarget The Search | 167 | 45 | 3,374 | 75 |
| On how various plans miss the hard bits of the alignment challenge | 292 | 40 | 3,288 | 82 |
| [Intro to brain-like-AGI safety] 3. Two subsystems: Learning & Steering | 79 | 36 | 3,023 | 84 |
| How likely is deceptive alignment? | 101 | 47 | 2,907 | 62 |
| The shard theory of human values | 238 | 42 | 2,843 | 68 |
| Mysteries of mode collapse | 279 | 32 | 2,842 | 89 |
| [Intro to brain-like-AGI safety] 2. “Learning from scratch” in the brain | 57 | 30 | 2,731 | 91 |
| Why Agent Foundations? An Overly Abstract Explanation | 285 | 42 | 2,730 | 65 |
| A Longlist of Theories of Impact for Interpretability | 124 | 26 | 2,589 | 100 |
| How might we align transformative AI if it’s developed very soon? | 136 | 32 | 2,351 | 73 |
| A transparency and interpretability tech tree | 148 | 31 | 2,343 | 76 |
| Discovering Language Model Behaviors with Model-Written Evaluations | 100 | 19 | 2,336 | 123 |
| A note about differential technological development | 185 | 20 | 2,270 | 114 |
| Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research] | 195 | 35 | 2,267 | 65 |
| Supervise Process, not Outcomes | 132 | 25 | 2,262 | 90 |
| Shard Theory: An Overview | 157 | 28 | 2,019 | 72 |
| Epistemological Vigilance for Alignment | 61 | 21 | 2,008 | 96 |
| A shot at the diamond-alignment problem | 92 | 23 | 1,848 | 80 |
| Where I agree and disagree with Eliezer | 862 | 27 | 1,836 | 68 |
| Brain Efficiency: Much More than You Wanted to Know | 201 | 27 | 1,807 | 67 |
| Refine: An Incubator for Conceptual Alignment Research Bets | 143 | 21 | 1,793 | 85 |
| Externalized reasoning oversight: a research direction for language model alignment | 117 | 28 | 1,788 | 64 |
| Humans provide an untapped wealth of evidence about alignment | 186 | 19 | 1,647 | 87 |
| Six Dimensions of Operational Adequacy in AGI Projects | 298 | 20 | 1,607 | 80 |
| How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme | 240 | 16 | 1,575 | 98 |
| Godzilla Strategies | 137 | 17 | 1,573 | 93 |
| (My understanding of) What Everyone in Technical Alignment is Doing and Why | 411 | 23 | 1,530 | 67 |
| Two-year update on my personal AI timelines | 287 | 18 | 1,530 | 85 |
| [Intro to brain-like-AGI safety] 15. Conclusion: Open problems, how to help, AMA | 90 | 16 | 1,482 | 93 |
| [Intro to brain-like-AGI safety] 6. Big picture of motivation, decision-making, and RL | 66 | 25 | 1,460 | 58 |
| Human values & biases are inaccessible to the genome | 90 | 14 | 1,450 | 104 |
| You Are Not Measuring What You Think You Are Measuring | 350 | 21 | 1,449 | 69 |
| Open Problems in AI X-Risk [PAIS #5] | 59 | 14 | 1,446 | 103 |
| [Intro to brain-like-AGI safety] 1. What's the problem & Why work on it now? | 146 | 25 | 1,407 | 56 |
| Conditioning Generative Models | 24 | 11 | 1,362 | 124 |
| Conjecture: Internal Infohazard Policy | 132 | 14 | 1,340 | 96 |
| A challenge for AGI organizations, and a challenge for readers | 299 | 18 | 1,336 | 74 |
| Superintelligent AI is necessary for an amazing future, but far from sufficient | 132 | 11 | 1,335 | 121 |
| Optimality is the tiger, and agents are its teeth | 288 | 14 | 1,319 | 94 |
| Let’s think about slowing down AI | 522 | 17 | 1,273 | 75 |
| Niceness is unnatural | 121 | 12 | 1,263 | 105 |
| Announcing the Alignment of Complex Systems Research Group | 91 | 11 | 1,247 | 113 |
| [Intro to brain-like-AGI safety] 13. Symbol grounding & human social instincts | 67 | 23 | 1,243 | 54 |
| ELK prize results | 135 | 17 | 1,235 | 73 |
| Abstractions as Redundant Information | 64 | 18 | 1,216 | 68 |
| [Link] A minimal viable product for alignment | 53 | 12 | 1,184 | 99 |
| Acceptability Verification: A Research Agenda | 50 | 11 | 1,182 | 107 |
| What an actually pessimistic containment strategy looks like | 647 | 16 | 1,168 | 73 |
| Let's See You Write That Corrigibility Tag | 120 | 10 | 1,161 | 116 |
| chinchilla's wild implications | 403 | 18 | 1,151 | 64 |
| Worlds Where Iterative Design Fails | 185 | 17 | 1,122 | 66 |
| why assume AGIs will optimize for fixed goals? | 138 | 14 | 1,103 | 79 |
| Gradient hacking: definitions and examples | 38 | 11 | 1,079 | 98 |
| Contra shard theory, in the context of the diamond maximizer problem | 101 | 6 | 1,073 | 179 |
| We Are Conjecture, A New Alignment Research Startup | 197 | 8 | 1,050 | 131 |
| Circumventing interpretability: How to defeat mind-readers | 109 | 11 | 1,047 | 95 |
| Evolution is a bad analogy for AGI: inner alignment | 73 | 7 | 1,043 | 149 |
| Refining the Sharp Left Turn threat model, part 1: claims and mechanisms | 82 | 8 | 1,042 | 130 |
| MATS Models | 86 | 8 | 1,035 | 129 |
| Common misconceptions about OpenAI | 239 | 11 | 1,028 | 93 |
| Prizes for ELK proposals | 143 | 20 | 1,022 | 51 |
| Current themes in mechanistic interpretability research | 88 | 9 | 1,014 | 113 |
| Discovering Agents | 71 | 13 | 994 | 76 |
| [Intro to brain-like-AGI safety] 12. Two paths forward: “Controlled AGI” and “Social-instinct AGI” | 42 | 15 | 992 | 66 |
| What's General-Purpose Search, And Why Might We Expect To See It In Trained ML Systems? | 118 | 24 | 988 | 41 |
| Inner and outer alignment decompose one hard problem into two extremely hard problems | 115 | 17 | 959 | 56 |
| Threat Model Literature Review | 73 | 13 | 953 | 73 |
| Language models seem to be much better than humans at next-token prediction | 172 | 11 | 952 | 87 |
| Will Capabilities Generalise More? | 122 | 7 | 952 | 136 |
| Pivotal outcomes and pivotal processes | 91 | 8 | 938 | 117 |
| Conditioning Generative Models for Alignment | 56 | 9 | 934 | 104 |
| Training goals for large language models | 28 | 9 | 930 | 103 |
| It’s Probably Not Lithium | 441 | 5 | 929 | 186 |
| Latent Adversarial Training | 40 | 11 | 914 | 83 |
| “Pivotal Act” Intentions: Negative Consequences and Fallacious Arguments | 129 | 11 | 913 | 83 |
| Conditioning Generative Models with Restrictions | 18 | 5 | 913 | 183 |
| The alignment problem from a deep learning perspective | 97 | 8 | 910 | 114 |
| Instead of technical research, more people should focus on buying time | 100 | 15 | 904 | 60 |
| By Default, GPTs Think In Plain Sight | 84 | 9 | 903 | 100 |
| [Intro to brain-like-AGI safety] 4. The “short-term predictor” | 64 | 16 | 890 | 56 |
| Don't leave your fingerprints on the future | 109 | 11 | 890 | 81 |
| Strategy For Conditioning Generative Models | 31 | 5 | 883 | 177 |
| Call For Distillers | 204 | 19 | 878 | 46 |
| Thoughts on AGI organizations and capabilities work | 102 | 5 | 871 | 174 |
| Optimization at a Distance | 87 | 9 | 868 | 96 |
| [Intro to brain-like-AGI safety] 5. The “long-term predictor”, and TD learning | 52 | 17 | 859 | 51 |
| What does it take to defend the world against out-of-control AGIs? | 180 | 11 | 853 | 78 |
| Monitoring for deceptive alignment | 135 | 11 | 851 | 77 |
| Late 2021 MIRI Conversations: AMA / Discussion | 119 | 8 | 849 | 106 |
| How to Diversify Conceptual Alignment: the Model Behind Refine | 87 | 27 | 845 | 31 |
| wrapper-minds are the enemy | 103 | 8 | 833 | 104 |
| But is it really in Rome? An investigation of the ROME model editing technique | 102 | 8 | 833 | 104 |
| An Open Agency Architecture for Safe Transformative AI | 74 | 12 | 831 | 69 |