2022 (and All Time) Posts by Pingback Count
For the past couple of years I've wished LessWrong had a "sort posts by number of pingbacks, or, ideally, by total karma of pingbacks" option. I particularly wished for this during the Annual Review, where "which posts got cited the most?" seemed like a useful thing to track for surfacing potential hidden gems.
We still haven't built a full-fledged feature for this, but I just ran a query against the database and turned the results into a spreadsheet, which you can view here:
LessWrong 2022 Posts by Pingbacks
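To make "total pingback karma" concrete, here is a minimal sketch of the aggregation the spreadsheet reflects. It is written in Python with illustrative data shapes, not LessWrong's actual schema or the real database query; the names (posts, pingbacks, karma) are assumptions for the example.

```python
from collections import defaultdict

# Illustrative data shapes (not LessWrong's actual schema):
#   posts:     {post_id: {"title": str, "karma": int}}
#   pingbacks: list of (citing_post_id, cited_post_id) pairs

def pingback_stats(posts, pingbacks):
    """For each cited post, count its pingbacks and sum the karma of the posts citing it."""
    counts = defaultdict(int)
    total_karma = defaultdict(int)
    for citing_id, cited_id in pingbacks:
        if citing_id not in posts or cited_id not in posts:
            continue  # skip dangling references
        counts[cited_id] += 1
        total_karma[cited_id] += posts[citing_id]["karma"]

    rows = []
    for post_id, n in counts.items():
        rows.append({
            "title": posts[post_id]["title"],
            "post_karma": posts[post_id]["karma"],
            "pingback_count": n,
            "total_pingback_karma": total_karma[post_id],
            "avg_pingback_karma": round(total_karma[post_id] / n, 1),
        })
    # Sort the way the spreadsheet does: by total pingback karma, descending.
    return sorted(rows, key=lambda r: r["total_pingback_karma"], reverse=True)
```

Sorting by total pingback karma rather than raw pingback count weights a citation from a highly upvoted post more heavily than one from a low-karma post, which is the "ideally" version described above.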
Here are the top 100 posts, sorted by Total Pingback Karma:
Spreadsheet columns: Title/Link, Post Karma, Pingback Count, Total Pingback Karma, Avg Pingback Karma. The per-post figures are in the spreadsheet; the titles below are listed in rank order.

1. AGI Ruin: A List of Lethalities
2. MIRI announces new "Death With Dignity" strategy
3. A central AI alignment problem: capabilities generalization, and the sharp left turn
4. Simulators
5. Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover
6. Reward is not the optimization target
7. A Mechanistic Interpretability Analysis of Grokking
8. How To Go From Interpretability To Alignment: Just Retarget The Search
9. On how various plans miss the hard bits of the alignment challenge
10. [Intro to brain-like-AGI safety] 3. Two subsystems: Learning & Steering
11. How likely is deceptive alignment?
12. The shard theory of human values
13. Mysteries of mode collapse
14. [Intro to brain-like-AGI safety] 2. “Learning from scratch” in the brain
15. Why Agent Foundations? An Overly Abstract Explanation
16. A Longlist of Theories of Impact for Interpretability
17. How might we align transformative AI if it’s developed very soon?
18. A transparency and interpretability tech tree
19. Discovering Language Model Behaviors with Model-Written Evaluations
20. A note about differential technological development
21. Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research]
22. Supervise Process, not Outcomes
23. Shard Theory: An Overview
24. Epistemological Vigilance for Alignment
25. A shot at the diamond-alignment problem
26. Where I agree and disagree with Eliezer
27. Brain Efficiency: Much More than You Wanted to Know
28. Refine: An Incubator for Conceptual Alignment Research Bets
29. Externalized reasoning oversight: a research direction for language model alignment
30. Humans provide an untapped wealth of evidence about alignment
31. Six Dimensions of Operational Adequacy in AGI Projects
32. How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme
33. Godzilla Strategies
34. (My understanding of) What Everyone in Technical Alignment is Doing and Why
35. Two-year update on my personal AI timelines
36. [Intro to brain-like-AGI safety] 15. Conclusion: Open problems, how to help, AMA
37. [Intro to brain-like-AGI safety] 6. Big picture of motivation, decision-making, and RL
38. Human values & biases are inaccessible to the genome
39. You Are Not Measuring What You Think You Are Measuring
40. Open Problems in AI X-Risk [PAIS #5]
41. [Intro to brain-like-AGI safety] 1. What's the problem & Why work on it now?
42. Conditioning Generative Models
43. Conjecture: Internal Infohazard Policy
44. A challenge for AGI organizations, and a challenge for readers
45. Superintelligent AI is necessary for an amazing future, but far from sufficient
46. Optimality is the tiger, and agents are its teeth
47. Let’s think about slowing down AI
48. Niceness is unnatural
49. Announcing the Alignment of Complex Systems Research Group
50. [Intro to brain-like-AGI safety] 13. Symbol grounding & human social instincts
51. ELK prize results
52. Abstractions as Redundant Information
53. [Link] A minimal viable product for alignment
54. Acceptability Verification: A Research Agenda
55. What an actually pessimistic containment strategy looks like
56. Let's See You Write That Corrigibility Tag
57. chinchilla's wild implications
58. Worlds Where Iterative Design Fails
59. why assume AGIs will optimize for fixed goals?
60. Gradient hacking: definitions and examples
61. Contra shard theory, in the context of the diamond maximizer problem
62. We Are Conjecture, A New Alignment Research Startup
63. Circumventing interpretability: How to defeat mind-readers
64. Evolution is a bad analogy for AGI: inner alignment
65. Refining the Sharp Left Turn threat model, part 1: claims and mechanisms
66. MATS Models
67. Common misconceptions about OpenAI
68. Prizes for ELK proposals
69. Current themes in mechanistic interpretability research
70. Discovering Agents
71. [Intro to brain-like-AGI safety] 12. Two paths forward: “Controlled AGI” and “Social-instinct AGI”
72. What's General-Purpose Search, And Why Might We Expect To See It In Trained ML Systems?
73. Inner and outer alignment decompose one hard problem into two extremely hard problems
74. Threat Model Literature Review
75. Language models seem to be much better than humans at next-token prediction
76. Will Capabilities Generalise More?
77. Pivotal outcomes and pivotal processes
78. Conditioning Generative Models for Alignment
79. Training goals for large language models
80. It’s Probably Not Lithium
81. Latent Adversarial Training
82. “Pivotal Act” Intentions: Negative Consequences and Fallacious Arguments
83. Conditioning Generative Models with Restrictions
84. The alignment problem from a deep learning perspective
85. Instead of technical research, more people should focus on buying time
86. By Default, GPTs Think In Plain Sight
87. [Intro to brain-like-AGI safety] 4. The “short-term predictor”
88. Don't leave your fingerprints on the future
89. Strategy For Conditioning Generative Models
90. Call For Distillers
91. Thoughts on AGI organizations and capabilities work
92. Optimization at a Distance
93. [Intro to brain-like-AGI safety] 5. The “long-term predictor”, and TD learning
94. What does it take to defend the world against out-of-control AGIs?
95. Monitoring for deceptive alignment
96. Late 2021 MIRI Conversations: AMA / Discussion
97. How to Diversify Conceptual Alignment: the Model Behind Refine
98. wrapper-minds are the enemy
99. But is it really in Rome? An investigation of the ROME model editing technique
100. An Open Agency Architecture for Safe Transformative AI