I made a Google Scholar page for MATS, inspired by @Esben Kran's Google Scholar page for Apart Research; EleutherAI subsequently made one too. I think all AI safety organizations and research programs should consider making Google Scholar pages to better share their research and track their impact.
Here is a plot of the annual citations received by MATS, EleutherAI, and Apart Research, aligned so that each series starts in the same year. The three organizations are somewhat comparable in that all three leverage large networks of external collaborators: MATS mentors and fellows, the EleutherAI Discord, and Apart sprint participants.
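For anyone who wants to reproduce the plot, here is a rough sketch of pulling per-year citation counts with the third-party `scholarly` package (`pip install scholarly`). The profile names below are assumptions, not the exact profile names; substitute each organization's actual Google Scholar profile.

```python
# Sketch: pull per-year citation counts from Google Scholar via `scholarly`.
from scholarly import scholarly

def cites_per_year(profile_name: str) -> dict[int, int]:
    """Return {year: citations} for the first matching Scholar profile."""
    author = next(scholarly.search_author(profile_name))
    author = scholarly.fill(author, sections=["counts"])  # populates cites_per_year
    return author["cites_per_year"]

# Assumed profile names -- replace with the real ones.
orgs = ["MATS Program", "EleutherAI", "Apart Research"]

# Align the series so each starts at its first year with citations.
aligned = {
    org: [count for _, count in sorted(cites_per_year(org).items())]
    for org in orgs
}
```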
The EleutherAI data fits a logistic curve almost perfectly, asymptoting to ~18.5k citations/year. I can't fit the other two series: a logistic curve has three free parameters, so at least four data points are needed for a meaningful fit, and neither series is long enough yet.
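To illustrate the fit, here is a minimal sketch using scipy's `curve_fit` with placeholder counts rather than the real EleutherAI data; the fitted parameter `L` is the asymptote (the ~18.5k citations/year figure for the real data).

```python
# Sketch: fit a three-parameter logistic to annual citation counts.
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, L, k, t0):
    """Three-parameter logistic: asymptote L, growth rate k, midpoint t0."""
    return L / (1.0 + np.exp(-k * (t - t0)))

years = np.arange(6)                                     # years since first citations
counts = np.array([100, 800, 3500, 9000, 15000, 17500])  # placeholder data

# Three free parameters, so at least four points are needed for the fit
# to be overdetermined rather than an exact interpolation.
(L, k, t0), _ = curve_fit(logistic, years, counts, p0=[counts.max(), 1.0, 2.5])
print(f"asymptote ~ {L:,.0f} citations/year")
```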
The top-10 most-cited papers that MATS contributed to (each with at least 290 citations) are:
Representation Engineering: A Top-Down Approach to AI Transparency
Sparse autoencoders find highly interpretable features in language models
Towards understanding sycophancy in language models
Steering Language Models With Activation Engineering
Steering Llama 2 via Contrastive Activation Addition
Refusal in language models is mediated by a single direction
The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning
The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"
LLM Evaluators Recognize and Favor Their Own Generations
Finding neurons in a haystack: Case studies with sparse probing
Compare this to the top-10 highest-karma LessWrong posts that MATS contributed to (each with over 200 karma):
SolidGoldMagikarp (plus, prompt generation)
Steering GPT-2-XL by adding an activation vector (arXiv)
Transformers Represent Belief State Geometry in their Residual Stream (arXiv)
Understanding and Controlling a Maze-Solving Policy Network (arXiv)
Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs (arXiv)
Refusal in LLMs is mediated by a single direction (arXiv)
Natural Abstractions: Key Claims, Theorems, and Critiques
Distillation Robustifies Unlearning (arXiv)
Mechanistically Eliciting Latent Behaviors in Language Models
Neural networks generalize because of this one weird trick