Machine Learning (ML)

Tag. Last edit: Dec 30, 2024, 10:37 AM by Dakara

Machine Learning is a field of study concerned with automated statistical learning and pattern detection by non-biological systems. It can be seen as a subfield of artificial intelligence that focuses on modeling and prediction using knowledge extracted from training data. As a multidisciplinary area, it borrows concepts and ideas from other fields, such as mathematics and cognitive science.

Understanding different machine learning algorithms

The most widely used distinction is between unsupervised methods (e.g. k-means clustering, principal component analysis) and supervised methods (e.g. support vector machines, logistic regression). Unsupervised methods identify interesting patterns, such as clusters and latent dimensions, in unlabeled training data. Supervised methods take labeled training data and learn to predict the labels of new, unlabeled data points drawn from the same distribution.
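The supervised/unsupervised distinction can be made concrete with a minimal sketch in plain Python. The function names (`kmeans_1d`, `fit_nearest_centroid`, `predict`) and the toy data are assumptions made for illustration only; a nearest-centroid classifier stands in for the supervised methods named above because it fits in a few lines.

```python
def kmeans_1d(points, k=2, iters=20):
    """Unsupervised: group unlabeled 1-D points into k clusters (Lloyd's algorithm)."""
    # Deterministic initialisation: spread the initial centres over the sorted data.
    srt = sorted(points)
    centres = [srt[(i * (len(srt) - 1)) // max(k - 1, 1)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: abs(p - centres[j]))
            clusters[nearest].append(p)
        # Recompute each centre as its cluster mean; keep the old centre if the cluster is empty.
        centres = [sum(c) / len(c) if c else centres[j] for j, c in enumerate(clusters)]
    return centres


def fit_nearest_centroid(points, labels):
    """Supervised: learn one centroid per label from labeled points."""
    sums, counts = {}, {}
    for p, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + p
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(centroids, p):
    """Predict the label whose centroid is closest to p."""
    return min(centroids, key=lambda y: abs(p - centroids[y]))


data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]

# Unsupervised: no labels are given, the structure is discovered from the data alone.
centres = kmeans_1d(data, k=2)

# Supervised: labels are given, and the model predicts the label of a new point.
model = fit_nearest_centroid(data, ["low", "low", "low", "high", "high", "high"])
label = predict(model, 4.5)
```

Both routines see the same numbers. The difference is that `kmeans_1d` never receives labels, whereas `fit_nearest_centroid` needs them and returns something that can label new points.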

Another important distinction relates to the bias/variance tradeoff. Some machine learning methods can recognize more complex patterns, but that flexibility comes at a cost: they can overfit and generalize poorly when the training data is noisy, especially when little training data is available.
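A small experiment can illustrate the tradeoff. The sketch below is a toy setup chosen for illustration, not a standard benchmark: the true relationship is assumed to be y = x, the training labels carry deterministic noise of +1 or -1, and three models of increasing flexibility are compared (a constant, a least-squares line, and a 1-nearest-neighbour lookup). All names are hypothetical.

```python
def mse(predict, xs, ys):
    """Mean squared error of a prediction function on (xs, ys)."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_mean(xs, ys):
    """High bias: always predict the mean training label."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Moderate flexibility: ordinary least-squares line."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def fit_1nn(xs, ys):
    """High variance: return the label of the closest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


# Assumed ground truth y = x; training labels are corrupted by alternating +1/-1 noise.
train_xs = [float(i) for i in range(10)]
train_ys = [x + (1.0 if i % 2 == 0 else -1.0) for i, x in enumerate(train_xs)]

# Noise-free test points that lie between the training points.
test_xs = [i + 0.25 for i in range(9)]
test_ys = list(test_xs)

results = {}
for name, fit in [("mean", fit_mean), ("line", fit_line), ("1-NN", fit_1nn)]:
    model = fit(train_xs, train_ys)
    results[name] = (mse(model, train_xs, train_ys), mse(model, test_xs, test_ys))

for name, (train_err, test_err) in results.items():
    print(f"{name:5s} train MSE = {train_err:.3f}  test MSE = {test_err:.3f}")
```

In this setup the 1-nearest-neighbour model reproduces the training labels exactly, noise included, and so does worse on the test points than the line. The constant model is too rigid to capture the trend and does worst on both sets.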

There are also subfields of machine learning devoted to specific kinds of data. For example, Hidden Markov Models and recurrent neural networks operate on sequential data such as time series, while convolutional neural networks are commonly applied to image data.

Applications

Machine learning has been applied widely since the field was formally defined in the 1950s. The ability to make predictions from data has been used extensively in areas such as financial market analysis, natural language processing, and even brain-computer interfaces. For example, Amazon’s product suggestion system uses past customer purchases as training data to predict what customers might want to buy in the future.
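To give a flavour of how past purchases can serve as training data, here is a deliberately simple co-purchase counter. It is an illustrative assumption, not a description of how Amazon's system actually works; the basket contents and the function names `co_purchase_counts` and `recommend` are made up for the example.

```python
from collections import Counter
from itertools import combinations


def co_purchase_counts(baskets):
    """Count how often each pair of items appears in the same basket."""
    counts = {}
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            counts.setdefault(a, Counter())[b] += 1
            counts.setdefault(b, Counter())[a] += 1
    return counts


def recommend(counts, owned, top_n=2):
    """Suggest the items most often bought together with what the customer already owns."""
    scores = Counter()
    for item in owned:
        for other, n in counts.get(item, {}).items():
            if other not in owned:
                scores[other] += n
    # Highest score first; ties are broken alphabetically so the output is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical purchase history (the "training data").
baskets = [
    ["tent", "stove", "lantern"],
    ["tent", "stove"],
    ["tent", "sleeping bag"],
    ["novel", "lantern"],
]

counts = co_purchase_counts(baskets)
suggestions = recommend(counts, {"tent"})
print(suggestions)  # ['stove', 'lantern']
```

Production recommender systems use far richer signals and models; the point here is only that predictions are derived from patterns in earlier purchases.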

In addition to its practical usefulness, machine learning has also offered insight into human cognitive organization. It seems likely that machine learning will play an important role in the development of artificial general intelligence.

Paper: Discovering novel algorithms with AlphaTensor [Deepmind]

LawrenceCOct 5, 2022, 4:20 PM

82 points

18 comments1 min readLW link

(www.deepmind.com)

Playing with DALL·E 2

Dave OrrApr 7, 2022, 6:49 PM

166 points

118 comments6 min readLW link

Predictive Coding has been Unified with Backpropagation

lsusrApr 2, 2021, 9:42 PM

176 points

51 comments2 min readLW link

A Bird’s Eye View of the ML Field [Pragmatic AI Safety #2]

Dan H and TW123

May 9, 2022, 5:18 PM

163 points

8 comments35 min readLW link

Striking Implications for Learning Theory, Interpretability — and Safety?

RogerDearnaleyJan 5, 2024, 8:46 AM

37 points

4 comments2 min readLW link

the scaling “inconsistency”: openAI’s new insight

nostalgebraistNov 7, 2020, 7:40 AM

148 points

14 comments9 min readLW link

(nostalgebraist.tumblr.com)

Matt Botvinick on the spontaneous emergence of learning algorithms

Adam SchollAug 12, 2020, 7:47 AM

154 points

87 comments5 min readLW link

What we know about machine learning’s replication crisis

Younes KamelMar 5, 2022, 11:55 PM

36 points

4 comments6 min readLW link

(youneskamel.substack.com)

Using GPT-N to Solve Interpretability of Neural Networks: A Research Agenda

Logan Riggs and Gurkenglas

Sep 3, 2020, 6:27 PM

68 points

11 comments2 min readLW link

EfficientZero: How It Works

1a3ornNov 26, 2021, 3:17 PM

299 points

50 comments29 min readLW link 1 review

I Trained a Neural Network to Play Helltaker

lsusrApr 7, 2021, 8:24 AM

34 points

5 comments3 min readLW link

An Illustrated Proof of the No Free Lunch Theorem

lifelonglearnerJun 8, 2020, 1:54 AM

19 points

0 comments1 min readLW link

(mlu.red)

The No Free Lunch theorems and their Razor

Adrià Garriga-alonsoMay 24, 2022, 6:40 AM

56 points

3 comments9 min readLW link

Revealing Intentionality In Language Models Through AdaVAE Guided Sampling

jdpOct 20, 2023, 7:32 AM

119 points

15 comments22 min readLW link

Self-fulfilling misalignment data might be poisoning our AI models

TurnTroutMar 2, 2025, 7:51 PM

154 points

27 comments1 min readLW link

(turntrout.com)

GPT-175bee

Adam Scherlis and LawrenceC

Feb 8, 2023, 6:58 PM

122 points

14 comments1 min readLW link

Magna Alta Doctrina

jacob_cannellDec 11, 2021, 9:54 PM

60 points

7 comments28 min readLW link

One possible approach to develop the best possible general learning algorithm

martillopartMar 14, 2022, 7:24 PM

3 points

0 comments7 min readLW link

Regularization Causes Modularity Causes Generalization

dkirmaniJan 1, 2022, 11:34 PM

50 points

7 comments3 min readLW link

Unsolved ML Safety Problems

jsteinhardtSep 29, 2021, 4:00 PM

61 points

2 comments3 min readLW link

(bounded-regret.ghost.io)

[MLSN #1]: ICLR Safety Paper Roundup

Dan HOct 18, 2021, 3:19 PM

59 points

1 comment2 min readLW link

Mech Interp Challenge: September—Deciphering the Addition Model

CallumMcDougallSep 13, 2023, 10:23 PM

35 points

0 comments4 min readLW link

Opinions on Interpretable Machine Learning and 70 Summaries of Recent Papers

lifelonglearner and Peter Hase

Apr 9, 2021, 7:19 PM

141 points

17 comments102 min readLW link

UML XI: Nearest Neighbor Schemes

Rafael HarthFeb 16, 2020, 8:30 PM

15 points

3 comments9 min readLW link

Behavioral and mechanistic definitions (often confuse AI alignment discussions)

LawrenceCFeb 20, 2023, 9:33 PM

33 points

5 comments6 min readLW link

[Question] If I ask an LLM to think step by step, how big are the steps?

ryan_bSep 13, 2024, 8:30 PM

7 points

1 comment1 min readLW link

Residual stream norms grow exponentially over the forward pass

StefanHex and TurnTrout

May 7, 2023, 12:46 AM

77 points

24 comments11 min readLW link

Neural nets as a model for how humans make and understand visual art

Owain_EvansNov 9, 2019, 4:53 PM

28 points

7 comments2 min readLW link

(owainevans.github.io)

[Aspiration-based designs] 1. Informal introduction

B Jacobs, Jobst Heitzig, Simon Fischer and Simon Dima

Apr 28, 2024, 1:00 PM

44 points

4 comments8 min readLW link

Anticorrelated Noise Injection for Improved Generalization

tailcalledFeb 20, 2022, 10:15 AM

2 points

9 comments1 min readLW link

How good are LLMs at doing ML on an unknown dataset?

Håvard Tveit IhleJul 1, 2024, 9:04 AM

33 points

4 comments13 min readLW link

Make a neural network in ~10 minutes

Arjun YadavApr 26, 2022, 5:24 AM

8 points

0 comments4 min readLW link

(arjunyadav.net)

Mistral Large 2 (123B) seems to exhibit alignment faking

Marc Carauleanu, Diogo de Lucena, Gunnar_Zarncke, Cameron Berg, Judd Rosenblatt, Mike Vaiana and AE Studio

Mar 27, 2025, 3:39 PM

80 points

4 comments13 min readLW link

Cross-Validation vs Bayesian Model Comparison

johnswentworthJul 21, 2019, 6:14 PM

28 points

2 comments4 min readLW link

Caution when interpreting Deepmind’s In-context RL paper

Sam MarksNov 1, 2022, 2:42 AM

105 points

8 comments4 min readLW link

How to train your transformer

p.b.Apr 7, 2022, 9:34 AM

6 points

0 comments8 min readLW link

New GPT-3 competitor

Quintin PopeAug 12, 2021, 7:05 AM

32 points

10 comments1 min readLW link

interpreting GPT: the logit lens

nostalgebraistAug 31, 2020, 2:47 AM

230 points

38 comments10 min readLW link

[Question] Is “Recursive Self-Improvement” Relevant in the Deep Learning Paradigm?

DragonGodApr 6, 2023, 7:13 AM

32 points

36 comments7 min readLW link

UML final

Rafael HarthMar 8, 2020, 8:43 PM

22 points

1 comment14 min readLW link

UML XII: Dimensionality Reduction

Rafael HarthFeb 23, 2020, 7:44 PM

9 points

0 comments9 min readLW link

A Data limited future

Donald HobsonAug 6, 2022, 2:56 PM

52 points

25 comments2 min readLW link

Discussion on the machine learning approach to AI safety

VikaNov 1, 2018, 8:54 PM

27 points

3 comments4 min readLW link

Modelling and Understanding SGD

J BostockOct 5, 2021, 1:41 PM

8 points

0 comments3 min readLW link

The GDM AGI Safety+Alignment Team is Hiring for Applied Interpretability Research

Arthur Conmy and Neel Nanda

Feb 24, 2025, 2:17 AM

48 points

1 comment7 min readLW link

D&D.Sci September 2022: The Allocation Helm

abstractapplicSep 16, 2022, 11:10 PM

34 points

34 comments1 min readLW link

Interpretability in ML: A Broad Overview

lifelonglearnerAug 4, 2020, 7:03 PM

53 points

5 comments15 min readLW link

Key Papers in Language Model Safety

aogJun 20, 2022, 3:00 PM

40 points

1 comment22 min readLW link

Possible OpenAI’s Q* breakthrough and DeepMind’s AlphaGo-type systems plus LLMs

BurnyNov 23, 2023, 3:16 AM

37 points

25 comments2 min readLW link

Researcher incentives cause smoother progress on benchmarks

ryan_greenblattDec 21, 2021, 4:13 AM

20 points

4 comments1 min readLW link

Metaculus Introduces AI-Powered Community Insights to Reveal Factors Driving User Forecasts

ChristianWilliamsNov 10, 2023, 5:57 PM

6 points

0 comments1 min readLW link

(www.metaculus.com)

Autoregressive Propaganda

lsusrAug 22, 2021, 2:18 AM

25 points

3 comments3 min readLW link

[Link] Word-vector based DL system achieves human parity in verbal IQ tests

jacob_cannellJun 13, 2015, 11:38 PM

17 points

8 comments1 min readLW link

OpenAI releases functional Dota 5v5 bot, aims to beat world champions by August

habrykaJun 26, 2018, 10:40 PM

53 points

12 comments1 min readLW link

(blog.openai.com)

How LLMs are and are not myopic

janusJul 25, 2023, 2:19 AM

135 points

16 comments8 min readLW link

Breaking down the training/deployment dichotomy

Erik JennerAug 28, 2022, 9:45 PM

30 points

3 comments3 min readLW link

Paper: Superposition, Memorization, and Double Descent (Anthropic)

LawrenceCJan 5, 2023, 5:54 PM

53 points

11 comments1 min readLW link

(transformer-circuits.pub)

Tabula Bio: towards a future free of disease (& looking for collaborators)

mpoonMar 23, 2025, 4:30 PM

44 points

15 comments2 min readLW link

Preferences from (real and hypothetical) psychology papers

Stuart_ArmstrongOct 6, 2021, 9:06 AM

15 points

0 comments2 min readLW link

How ARENA course material gets made

CallumMcDougallJul 2, 2024, 6:04 PM

41 points

2 comments7 min readLW link

UML XIII: Online Learning and Clustering

Rafael HarthMar 1, 2020, 6:32 PM

13 points

0 comments14 min readLW link

[Question] How Does the Human Brain Compare to Deep Learning on Sample Efficiency?

DragonGodJan 15, 2023, 7:49 PM

11 points

6 comments1 min readLW link

HDBSCAN is Surprisingly Effective at Finding Interpretable Clusters of the SAE Decoder Matrix

Jaehyuk Lim, Kanishk Tantia and Sinem

Oct 11, 2024, 11:06 PM

8 points

2 comments10 min readLW link

[1911.08265] Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model | Arxiv

DragonGodNov 21, 2019, 1:18 AM

52 points

4 comments1 min readLW link

(arxiv.org)

Paper+Summary: OMNIGROK: GROKKING BEYOND ALGORITHMIC DATA

Marius HobbhahnOct 4, 2022, 7:22 AM

46 points

11 comments1 min readLW link

(arxiv.org)

And All the Shoggoths Merely Players

Zack_M_DavisFeb 10, 2024, 7:56 PM

170 points

57 comments12 min readLW link

Does SGD Produce Deceptive Alignment?

Mark XuNov 6, 2020, 11:48 PM

96 points

9 comments16 min readLW link

[Question] Nonlinear limitations of ReLUs

magfrumpOct 26, 2023, 6:51 PM

13 points

1 comment1 min readLW link

Google’s PaLM-E: An Embodied Multimodal Language Model

SandXboxMar 7, 2023, 4:11 AM

87 points

7 comments1 min readLW link

(palm-e.github.io)

The surprising parameter efficiency of vision models

berenApr 8, 2023, 7:44 PM

81 points

28 comments4 min readLW link

[Link] Whittlestone et al., The Societal Implications of Deep Reinforcement Learning

Aryeh EnglanderMar 10, 2021, 6:13 PM

11 points

1 comment1 min readLW link

(jair.org)

UML IV: Linear Predictors

Rafael HarthJul 8, 2020, 7:06 PM

15 points

0 comments9 min readLW link

NVIDIA and Microsoft releases 530B parameter transformer model, Megatron-Turing NLG

OzyrusOct 11, 2021, 3:28 PM

51 points

36 comments1 min readLW link

(developer.nvidia.com)

A Mechanistic Interpretability Analysis of Grokking

Neel Nanda and Tom Lieberum

Aug 15, 2022, 2:41 AM

373 points

48 comments36 min readLW link 1 review

(colab.research.google.com)

Concept Safety: Producing similar AI-human concept spaces

Kaj_SotalaApr 14, 2015, 8:39 PM

51 points

45 comments8 min readLW link

“Inductive Bias”

Eliezer YudkowskyApr 8, 2007, 7:52 PM

39 points

24 comments3 min readLW link

GD’s Implicit Bias on Separable Data

Xander DaviesOct 17, 2022, 4:13 AM

25 points

0 comments7 min readLW link

The Brain as a Universal Learning Machine

jacob_cannellJun 24, 2015, 9:45 PM

201 points

171 comments19 min readLW link

Why square errors?

AprillionNov 26, 2022, 1:40 PM

41 points

11 comments2 min readLW link

UML IX: Kernels and Boosting

Rafael HarthFeb 2, 2020, 9:51 PM

13 points

1 comment10 min readLW link

Future ML Systems Will Be Qualitatively Different

jsteinhardtJan 11, 2022, 7:50 PM

119 points

10 comments5 min readLW link

(bounded-regret.ghost.io)

Thoughts on Loss Landscapes and why Deep Learning works

berenJul 25, 2023, 4:41 PM

53 points

4 comments18 min readLW link

Reframing inner alignment

davidadDec 11, 2022, 1:53 PM

53 points

13 comments4 min readLW link

[Question] Does agent foundations cover all future ML systems?

Jonas HallgrenJul 25, 2022, 1:17 AM

4 points

0 comments1 min readLW link

Understanding Machine Learning (II)

Rafael HarthDec 22, 2019, 6:28 PM

24 points

4 comments10 min readLW link

Reducing LLM deception at scale with self-other overlap fine-tuning

Marc Carauleanu, Diogo de Lucena, Gunnar_Zarncke, Judd Rosenblatt, Cameron Berg, Mike Vaiana and AE Studio

Mar 13, 2025, 7:09 PM

155 points

41 comments6 min readLW link

Let’s Read: Superhuman AI for multiplayer poker

Yuxi_LiuJul 14, 2019, 6:22 AM

56 points

6 comments8 min readLW link

Google’s Imagen uses larger text encoder

Ben LivengoodMay 24, 2022, 9:55 PM

27 points

2 comments1 min readLW link

Ironing Out the Squiggles

Zack_M_DavisApr 29, 2024, 4:13 PM

157 points

36 comments11 min readLW link

Automated Fact Checking: A Look at the Field

HoagyOct 6, 2021, 11:52 PM

12 points

0 comments8 min readLW link

dalle2 comments

nostalgebraistApr 26, 2022, 5:30 AM

183 points

14 comments13 min readLW link

(nostalgebraist.tumblr.com)

Train first VS prune first in neural networks.

Donald HobsonJul 9, 2022, 3:53 PM

18 points

5 comments2 min readLW link

DeepMind: Generally capable agents emerge from open-ended play

Daniel KokotajloJul 27, 2021, 2:19 PM

247 points

53 comments2 min readLW link

(deepmind.com)

[Question] How do you do hyperparameter searches in ML?

lsusrJan 13, 2020, 3:45 AM

9 points

3 comments1 min readLW link

Touch reality as soon as possible (when doing machine learning research)

LawrenceCJan 3, 2023, 7:11 PM

117 points

9 comments8 min readLW link 1 review

Multimodal Neurons in Artificial Neural Networks

Kaj_SotalaMar 5, 2021, 9:01 AM

57 points

2 comments2 min readLW link

(distill.pub)

Mechanism for feature learning in neural networks and backpropagation-free machine learning models

Matt GoldenbergMar 19, 2024, 2:55 PM

8 points

1 comment1 min readLW link

(www.science.org)

Four usages of “loss” in AI

TurnTroutOct 2, 2022, 12:52 AM

46 points

18 comments4 min readLW link

Understanding Machine Learning (I)

Rafael HarthDec 20, 2019, 6:22 PM

44 points

12 comments11 min readLW link

Exploring toy neural nets under node removal. Section 1.

Donald HobsonApr 13, 2022, 11:30 PM

12 points

7 comments8 min readLW link

NLP Position Paper: When Combatting Hype, Proceed with Caution

Sam BowmanOct 15, 2021, 8:57 PM

46 points

14 comments1 min readLW link

QAPR 4: Inductive biases

Quintin PopeOct 10, 2022, 10:08 PM

67 points

2 comments18 min readLW link

Boring machine learning is where it’s at

George3d6Oct 20, 2021, 11:23 AM

28 points

16 comments3 min readLW link

(cerebralab.com)

linkpost: loss basin visualization

Nathan Helm-BurgerSep 30, 2022, 3:42 AM

14 points

1 comment1 min readLW link

Stable Diffusion has been released

P.Aug 22, 2022, 7:42 PM

15 points

7 comments1 min readLW link

(stability.ai)

Durkon, an open-source tool for Inherently Interpretable Modelling

abstractapplicDec 24, 2022, 1:49 AM

37 points

0 comments4 min readLW link

SGD’s Bias

johnswentworthMay 18, 2021, 11:19 PM

63 points

16 comments3 min readLW link

The inordinately slow spread of good AGI conversations in ML

Rob BensingerJun 21, 2022, 4:09 PM

173 points

62 comments8 min readLW link

Mesa-Optimizers via Grokking

orthonormalDec 6, 2022, 8:05 PM

36 points

4 comments6 min readLW link

ML Systems Will Have Weird Failure Modes

jsteinhardtJan 26, 2022, 1:40 AM

57 points

8 comments6 min readLW link

(bounded-regret.ghost.io)

Diffusion Guided NLP: better steering, mostly a good thing

Nathan Helm-BurgerAug 10, 2024, 7:49 PM

13 points

0 comments1 min readLW link

(arxiv.org)

LOVE in a simbox is all you need

jacob_cannellSep 28, 2022, 6:25 PM

66 points

73 comments44 min readLW link 1 review

Tracr: Compiled Transformers as a Laboratory for Interpretability | DeepMind

DragonGodJan 13, 2023, 4:53 PM

62 points

12 comments1 min readLW link

(arxiv.org)

UML VIII: Linear Predictors (2)

Rafael HarthJan 26, 2020, 8:09 PM

9 points

2 comments10 min readLW link

Processor clock speeds are not how fast AIs think

Ege ErdilJan 29, 2024, 2:39 PM

135 points

55 comments2 min readLW link

Claude 3 Opus can operate as a Turing machine

Gunnar_ZarnckeApr 17, 2024, 8:41 AM

36 points

2 comments1 min readLW link

(twitter.com)

Announcing Epoch’s dashboard of key trends and figures in Machine Learning

JsevillamolApr 13, 2023, 7:33 AM

35 points

7 comments1 min readLW link

(epochai.org)

Supervised learning of outputs in the brain

Steven ByrnesOct 26, 2020, 2:32 PM

28 points

9 comments10 min readLW link

Emotions = Reward Functions

jpyykkoJan 20, 2022, 6:46 PM

16 points

10 comments5 min readLW link

Neural networks biased towards geometrically simple functions?

DavidHolmesDec 8, 2022, 4:16 PM

16 points

2 comments3 min readLW link

A market is a neural network

David Hugh-JonesSep 15, 2022, 9:53 PM

7 points

4 comments8 min readLW link

My ML Scaling bibliography

gwernOct 23, 2021, 2:41 PM

35 points

9 comments1 min readLW link

(www.gwern.net)

Neural net / decision tree hybrids: a potential path toward bridging the interpretability gap

Nathan Helm-BurgerSep 23, 2021, 12:38 AM

21 points

2 comments12 min readLW link

[MLSN #5]: Prize Compilation

Dan HSep 26, 2022, 9:55 PM

15 points

1 comment2 min readLW link

Inference-Only Debate Experiments Using Math Problems

Arjun Panickssery, Abhimanyu Pallavi Sudhir and JacksonKaunismaa

Aug 6, 2024, 5:44 PM

31 points

0 comments2 min readLW link

Machine Learning Consent

jefftkDec 8, 2022, 3:50 AM

38 points

14 comments3 min readLW link

(www.jefftk.com)

Understanding “Deep Double Descent”

evhubDec 6, 2019, 12:00 AM

151 points

51 comments5 min readLW link 4 reviews

chinchilla’s wild implications

nostalgebraistJul 31, 2022, 1:18 AM

424 points

128 comments10 min readLW link 1 review

AXRP Episode 29 - Science of Deep Learning with Vikrant Varma

DanielFilanApr 25, 2024, 7:10 PM

20 points

1 comment63 min readLW link

If I were a well-intentioned AI… I: Image classifier

Stuart_ArmstrongFeb 26, 2020, 12:39 PM

35 points

4 comments5 min readLW link

A Simple Introduction to Neural Networks

Rafael HarthFeb 9, 2020, 10:02 PM

34 points

13 comments18 min readLW link

We have achieved Noob Gains in AI

phdeadMay 18, 2022, 8:56 PM

117 points

20 comments7 min readLW link

[Question] Impact of ” ‘Let’s think step by step’ is all you need”?

yrimonJul 24, 2022, 8:59 PM

20 points

2 comments1 min readLW link

Place-Based Programming—Part 2 - Functions

lsusrApr 16, 2021, 12:25 AM

14 points

0 comments3 min readLW link

Remaking EfficientZero (as best I can)

HoagyJul 4, 2022, 11:03 AM

36 points

9 comments22 min readLW link

UML V: Convex Learning Problems

Rafael HarthJan 5, 2020, 7:47 PM

14 points

0 comments10 min readLW link

UML VI: Stochastic Gradient Descent

Rafael HarthJan 12, 2020, 9:59 PM

13 points

0 comments10 min readLW link

You should go to ML conferences

Jan_KulveitJul 24, 2024, 11:47 AM

112 points

13 comments4 min readLW link

Neural network polytopes (Colab notebook)

Zach FurmanApr 21, 2023, 10:42 PM

11 points

0 comments1 min readLW link

(colab.research.google.com)

Machine Learning Analogy for Meditation (illustrated)

abramdemskiJun 28, 2018, 10:51 PM

100 points

48 comments1 min readLW link

Search versus design

Alex FlintAug 16, 2020, 4:53 PM

109 points

40 comments36 min readLW link 1 review

UML VII: Meta-Learning

Rafael HarthJan 19, 2020, 6:23 PM

14 points

0 comments15 min readLW link

Understanding Machine Learning (III)

Rafael HarthDec 25, 2019, 6:55 PM

16 points

2 comments11 min readLW link

Safety Implications of LeCun’s path to machine intelligence

Ivan VendrovJul 15, 2022, 9:47 PM

102 points

18 comments6 min readLW link

[Question] Why don’t we have self driving cars yet?

Linda LinseforsNov 14, 2022, 12:19 PM

22 points

16 comments1 min readLW link

“The Bitter Lesson”, an article about compute vs human knowledge in AI

the gears to ascensionJun 21, 2019, 5:24 PM

52 points

14 comments4 min readLW link

(www.incompleteideas.net)

Place-Based Programming—Part 1 - Places

lsusrApr 14, 2021, 10:18 PM

32 points

18 comments2 min readLW link

“Deep Learning” Is Function Approximation

Zack_M_DavisMar 21, 2024, 5:50 PM

98 points

28 comments10 min readLW link

(zackmdavis.net)

AlphaStar: Impressive for RL progress, not for AGI progress

orthonormalNov 2, 2019, 1:50 AM

113 points

58 comments2 min readLW link 1 review

KAN: Kolmogorov-Arnold Networks

Gunnar_ZarnckeMay 1, 2024, 4:50 PM

18 points

15 comments1 min readLW link

(arxiv.org)

New Scaling Laws for Large Language Models

1a3ornApr 1, 2022, 8:41 PM

246 points

22 comments5 min readLW link

[Question] Why no major LLMs with memory?

Kaj_SotalaMar 28, 2023, 4:34 PM

42 points

15 comments1 min readLW link

Survey of NLP Researchers: NLP is contributing to AGI progress; major catastrophe plausible

Sam BowmanAug 31, 2022, 1:39 AM

91 points

6 comments2 min readLW link

Experimentation with AI-generated images (VQGAN+CLIP) | Solarpunk airships fleeing a dragon

Kaj_SotalaJul 15, 2021, 11:00 AM

44 points

4 comments2 min readLW link

(kajsotala.fi)

Mech Interp Challenge: October—Deciphering the Sorted List Model

CallumMcDougallOct 3, 2023, 10:57 AM

23 points

0 comments3 min readLW link

Understanding and controlling auto-induced distributional shift

L Rudolf LDec 13, 2021, 2:59 PM

33 points

4 comments16 min readLW link

Apply to be a TA for TARA

yanni kyriacosDec 20, 2024, 2:25 AM

10 points

0 comments1 min readLW link

Paper: The Capacity for Moral Self-Correction in Large Language Models (Anthropic)

LawrenceCFeb 16, 2023, 7:47 PM

65 points

9 comments1 min readLW link

(arxiv.org)

Tabooing ‘Agent’ for Prosaic Alignment

Hjalmar_WijkAug 23, 2019, 2:55 AM

57 points

10 comments6 min readLW link

Apply to a small iteration of MLAB to be run in Oxford

RP, MariaK and OliverHayman

Aug 27, 2023, 2:21 PM

12 points

0 comments1 min readLW link

The Limits of Automation

milkandcigarettesJun 23, 2022, 6:03 PM

5 points

1 comment5 min readLW link

(milkandcigarettes.com)

Model Depth as Panacea and Obfuscator

abstractapplicNov 9, 2020, 12:02 AM

8 points

3 comments15 min readLW link

Imitation Learning from Language Feedback

Jérémy Scheurer, Tomek Korbak and Ethan Perez

Mar 30, 2023, 2:11 PM

71 points

3 comments10 min readLW link

No free lunch theorem is irrelevant

CatneeOct 4, 2022, 12:21 AM

18 points

7 comments1 min readLW link

Path dependence in ML inductive biases

Vivek Hebbar and evhub

Sep 10, 2022, 1:38 AM

68 points

13 comments10 min readLW link

CNN feature visualization in 50 lines of code

StefanHexMay 26, 2022, 11:02 AM

17 points

4 comments5 min readLW link

Basic Mathematics of Predictive Coding

Adam ShaiSep 29, 2023, 2:38 PM

49 points

6 comments9 min readLW link

Epoch AI is hiring a CTO!

merilalama and Jaime Sevilla Molina

Apr 2, 2025, 8:29 PM

7 points

0 comments2 min readLW link

(careers.epoch.ai)

Estimating the Probability of Sampling a Trained Neural Network at Random

Adam Scherlis and Nora Belrose

Mar 1, 2025, 2:11 AM

32 points

10 comments1 min readLW link

(arxiv.org)

DeepMind article: AI Safety Gridworlds

scarcegreengrassNov 30, 2017, 4:13 PM

25 points

6 comments1 min readLW link

(deepmind.com)

[Question] Book recommendations for the history of ML?

Eleni AngelouDec 28, 2022, 11:50 PM

2 points

2 comments1 min readLW link

Analyzing how SAE features evolve across a forward pass

bensenberner, danibalcells, Michael Oesterle, Ediz Ucar and StefanHex

Nov 7, 2024, 10:07 PM

47 points

0 comments1 min readLW link

(arxiv.org)

[Question] Vector search on a large dataset?

camsdixonNov 10, 2023, 6:43 PM

−1 points

2 comments1 min readLW link

[Question] What Is the Idea Behind (Un-)Supervised Learning and Reinforcement Learning?

MorpheusSep 30, 2022, 4:48 PM

9 points

6 comments2 min readLW link

The Machine Learning Personality Test

PhilGoetzAug 4, 2009, 11:36 PM

31 points

34 comments6 min readLW link

Transfer learning and generalization-qua-capability in Babbage and Davinci (or, why division is better than Spanish)

RP and agg

Feb 9, 2024, 7:00 AM

50 points

6 comments3 min readLW link

Beyond Gaussian: Language Model Representations and Distributions

Matt LevinsonNov 24, 2024, 1:53 AM

6 points

1 comment5 min readLW link

Research Adenda: Modelling Trajectories of Language Models

NickyPNov 13, 2023, 2:33 PM

28 points

0 comments12 min readLW link

The shallow reality of ‘deep learning theory’

Jesse HooglandFeb 22, 2023, 4:16 AM

34 points

11 comments3 min readLW link

(www.jessehoogland.com)

Yann LeCun, A Path Towards Autonomous Machine Intelligence [link]

Bill BenzonJun 27, 2022, 11:29 PM

5 points

1 comment1 min readLW link

A Generalization of ROC AUC for Binary Classifiers

Adam ScherlisDec 4, 2021, 9:47 PM

10 points

0 comments2 min readLW link

(adam.scherlis.com)

Patterns or getting to Objective Truth – A thought piece on Artificial Intelligence

Thehumanproject.aiOct 20, 2024, 4:45 PM

1 point

0 comments8 min readLW link

Does ChatGPT know what a tragedy is?

Bill BenzonDec 31, 2023, 7:10 AM

2 points

4 comments5 min readLW link

VC Theory Overview

Joar SkalseJul 2, 2023, 10:45 PM

12 points

2 comments11 min readLW link

If you want to learn technical AI safety, here’s a list of AI safety courses, reading lists, and resources

KatWoodsOct 3, 2022, 12:43 PM

12 points

3 comments1 min readLW link

Making a Difference Tempore: Insights from ‘Reinforcement Learning: An Introduction’

TurnTroutJul 5, 2018, 12:34 AM

33 points

6 comments8 min readLW link

Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

DragonGodDec 6, 2017, 6:01 AM

13 points

4 comments1 min readLW link

(arxiv.org)

Virtual Machine Learning Conferences: The Good and the Bad

libaiAug 29, 2021, 7:26 PM

4 points

0 comments3 min readLW link

Deconfusing In-Context Learning

Arjun PanicksseryFeb 25, 2024, 9:48 AM

37 points

1 comment2 min readLW link

ChatGPT Plays 20 Questions [sometimes needs help]

Bill BenzonOct 17, 2023, 5:30 PM

5 points

3 comments12 min readLW link

Technical model refinement formalism

Stuart_ArmstrongAug 27, 2020, 11:54 AM

19 points

0 comments6 min readLW link

Rethinking Batch Normalization

Matthew BarnettAug 2, 2019, 8:21 PM

20 points

5 comments8 min readLW link

[Question] How do biological or spiking neural networks learn?

Dom PolsinelliJan 31, 2025, 4:03 PM

2 points

1 comment2 min readLW link

Proliferating Education

Haris RashidDec 20, 2022, 7:22 PM

−1 points

2 comments5 min readLW link

(www.harisrab.com)

Subjective AI/ML Digest: April II

Boris TApr 24, 2023, 6:33 PM

1 point

0 comments1 min readLW link

(borisagain.substack.com)

AIOS

samhealyDec 31, 2023, 1:23 PM

−3 points

5 comments6 min readLW link

RFC: a tool to create a ranked list of projects in explainable AI

eamagApr 6, 2025, 9:18 PM

2 points

0 comments1 min readLW link

(eamag.me)

Unity Gridworlds

WillPetilloOct 15, 2023, 4:36 AM

9 points

0 comments1 min readLW link

[Paper] Trajectories through semantic spaces in schizophrenia and the relationship to ripple bursts

bvbvbvbvbvbvbvbvbvbvbvDec 15, 2023, 1:37 PM

3 points

0 comments1 min readLW link

(www.pnas.org)

I’m a bit skeptical of AlphaFold 3

Oleg TrottJun 25, 2024, 12:04 AM

87 points

14 comments2 min readLW link

[Question] Why hasn’t deep learning generated significant economic value yet?

Alex_AltairApr 30, 2022, 8:27 PM

114 points

89 comments2 min readLW link

Announcing Epoch’s newly expanded Parameters, Compute and Data Trends in Machine Learning database

Robi Rahman, Jaime Sevilla Molina, Tamay, Ege Erdil, Pablo Villalobos, Ben Cottier and Matthew Barnett

Oct 25, 2023, 2:55 AM

18 points

0 comments1 min readLW link

(epochai.org)

Updating the Lottery Ticket Hypothesis

johnswentworthApr 18, 2021, 9:45 PM

73 points

41 comments2 min readLW link

Can a Bayesian Oracle Prevent Harm from an Agent? (Bengio et al. 2024)

mattmacdermottSep 1, 2024, 7:46 AM

26 points

0 comments5 min readLW link

(yoshuabengio.org)

ChatGPT’s Ontological Landscape

Bill BenzonNov 1, 2023, 3:12 PM

7 points

0 comments4 min readLW link

If language is for communication, what does that imply about LLMs?

Bill BenzonMay 12, 2024, 2:55 AM

10 points

0 comments1 min readLW link

Trends in Training Dataset Sizes

Pablo VillalobosSep 21, 2022, 3:47 PM

25 points

2 comments5 min readLW link

(epochai.org)

On AI and Compute

johncroxApr 3, 2019, 7:00 PM

36 points

10 comments5 min readLW link

Is this the beginning of the end for LLMS [as the royal road to AGI, whatever that is]?

Bill BenzonAug 24, 2023, 2:50 PM

3 points

15 comments3 min readLW link

Brains, Planes, Blimps, and Algorithms

ai danOct 18, 2023, 9:26 PM

1 point

0 comments6 min readLW link

On possible cross-fertilization between AI and neuroscience [Creativity]

Bill BenzonNov 27, 2023, 4:50 PM

15 points

22 comments7 min readLW link

Degeneracies are sticky for SGD

Guillaume Corlouer and Nicolas Macé

Jun 16, 2024, 9:19 PM

56 points

1 comment16 min readLW link

[Question] Are Speed Superintelligences Feasible for Modern ML Techniques?

DragonGodSep 14, 2022, 12:59 PM

9 points

7 comments1 min readLW link

[Linkpost] AlphaFold: a solution to a 50-year-old grand challenge in biology

adamShimiNov 30, 2020, 5:33 PM

54 points

22 comments1 min readLW link

(deepmind.com)

Optimizing a Week of Machine Learning Learning

RaemonJan 9, 2018, 6:55 AM

8 points

2 comments3 min readLW link

[Question] Why does gradient descent always work on neural networks?

MichaelDickensMay 20, 2022, 9:13 PM

15 points

11 comments1 min readLW link

Speculative inferences about path dependence in LLM supervised fine-tuning from results on linear mode connectivity and model souping

RobertKirkJul 20, 2023, 9:56 AM

39 points

2 comments5 min readLW link

[Question] Question about Test-sets and Bayesian machine learning

Haziq MuhammadAug 9, 2021, 5:16 PM

2 points

8 comments1 min readLW link

Singularities against the Singularity: Announcing Workshop on Singular Learning Theory and Alignment

Jesse Hoogland, Alexander Gietelink Oldenziel and Daniel Murfet

Apr 1, 2023, 9:58 AM

87 points

0 comments1 min readLW link

(singularlearningtheory.com)

Interview Daniel Murfet on Universal Phenomena in Learning Machines

Alexander Gietelink OldenzielFeb 6, 2023, 12:00 AM

50 points

1 comment16 min readLW link

Bioinformatics 101

iy3dJan 22, 2023, 2:36 AM

5 points

0 comments4 min readLW link

[Question] GPT-3 + GAN

stick109Oct 17, 2020, 7:58 AM

4 points

3 comments1 min readLW link

Architecture-aware optimisation: train ImageNet and more without hyperparameters

Chris MingardApr 22, 2023, 9:50 PM

6 points

2 comments2 min readLW link

Creating Interpretable Latent Spaces with Gradient Routing

Jacob G-WDec 14, 2024, 4:00 AM

26 points

6 comments2 min readLW link

(jacobgw.com)

Basic Facts about Language Model Internals

beren and Eric Winsor

Jan 4, 2023, 1:01 PM

130 points

19 comments9 min readLW link

Generative ML in chemistry is bottlenecked by synthesis

Abhishaike MahajanSep 16, 2024, 4:31 PM

38 points

2 comments14 min readLW link

(www.owlposting.com)

“Toward Safe Self-Evolving AI: Modular Memory and Post-Deployment Alignment”

Manasa DwarapureddyMay 2, 2025, 5:02 PM

1 point

0 comments3 min readLW link

Learning with catastrophes

paulfchristianoJan 23, 2019, 3:01 AM

27 points

9 comments4 min readLW link

[Proposal] Method of locating useful subnets in large models

Quintin PopeOct 13, 2021, 8:52 PM

9 points

0 comments2 min readLW link

The positional embedding matrix and previous-token heads: how do they actually work?

AdamYedidiaAug 10, 2023, 1:58 AM

26 points

4 comments13 min readLW link

Pong from pixels without reading “Pong from Pixels”

Ian McKenzieAug 29, 2020, 5:26 PM

17 points

1 comment7 min readLW link

faster latent diffusion

bhauthJul 2, 2023, 1:30 AM

10 points

8 comments2 min readLW link

(www.bhauth.com)

GPT-2′s positional embedding matrix is a helix

AdamYedidiaJul 21, 2023, 4:16 AM

44 points

21 comments4 min readLW link

Leveraging Legal Informatics to Align AI

John NaySep 18, 2022, 8:39 PM

11 points

0 comments3 min readLW link

(forum.effectivealtruism.org)

[Question] How do top AI labs vet architecture/algorithm changes?

Jemal YoungMay 8, 2024, 4:47 PM

3 points

5 comments1 min readLW link

Deep Q-Networks Explained

Jay BaileySep 13, 2022, 12:01 PM

58 points

8 comments20 min readLW link

Krueger Lab AI Safety Internship 2024

Joey BreamJan 24, 2024, 7:17 PM

3 points

0 comments1 min readLW link

What If AI Recognized Meaning? An Inquiry into “Resonant Recognition”

JD___Feb 5, 2025, 9:24 PM

0 points

0 comments2 min readLW link

LDL 2: Nonconvex Optimization

magfrumpOct 20, 2017, 6:20 PM

13 points

13 comments4 min readLW link

[Question] Why isn’t JS a popular language for deep learning?

Will ClarkOct 8, 2020, 2:36 PM

12 points

21 comments1 min readLW link

“Genlangs” and Zipf’s Law: Do languages generated by ChatGPT statistically look human?

Justin-DiamondJan 31, 2024, 6:30 PM

2 points

2 comments1 min readLW link

(arxiv.org)

Question 1: Predicted architecture of AGI learning algorithm(s)

Cameron BergFeb 10, 2022, 5:22 PM

13 points

1 comment7 min readLW link

ChatGPT refuses to accept a challenge where it would get shot between the eyes [game theory]

Bill BenzonFeb 20, 2024, 4:55 PM

4 points

6 comments4 min readLW link

The Shard Theory Alignment Scheme

David UdellAug 25, 2022, 4:52 AM

47 points

32 comments2 min readLW link

Exploring the Residual Stream of Transformers for Mechanistic Interpretability — Explained

Zeping YuDec 26, 2023, 12:36 AM

7 points

1 comment11 min readLW link

ChatGPT intimates a tantalizing future; its core LLM is organized on multiple levels; and it has broken the idea of thinking.

Bill BenzonJan 24, 2023, 7:05 PM

5 points

0 comments5 min readLW link

Using machine learning to predict romantic compatibility: empirical results

JonahSDec 17, 2014, 2:54 AM

37 points

18 comments11 min readLW link

Cheap Model → Big Model design

Maxwell PetersonNov 19, 2023, 10:50 PM

15 points

2 comments7 min readLW link

Finding Skeletons on Rashomon Ridge

David Udell, Peter S. Park and NickyP

Jul 24, 2022, 10:31 PM

30 points

2 comments7 min readLW link

Alex Irpan: “My AI Timelines Have Sped Up”

VaniverAug 19, 2020, 4:23 PM

43 points

20 comments1 min readLW link

(www.alexirpan.com)

Observations on self-supervised Learning for vision

Dinkar JuyalMar 10, 2025, 7:31 PM

3 points

0 comments5 min readLW link

Report on Analyzing Connotation Frames in Evolving Wikipedia Biographies

MairaAug 30, 2023, 10:02 PM

1 point

0 comments4 min readLW link

Empirical risk minimization is fundamentally confused

Jesse HooglandMar 22, 2023, 4:58 PM

32 points

8 comments1 min readLW link

Magical Categories

Eliezer YudkowskyAug 24, 2008, 7:51 PM

74 points

143 comments9 min readLW link

Language models can explain neurons in language models

nzMay 9, 2023, 5:29 PM

23 points

0 comments1 min readLW link

(openai.com)

A Novel Emergence of Meta-Awareness in LLM Fine-Tuning

rifeJan 15, 2025, 10:59 PM

57 points

31 comments2 min readLW link

Inductive biases stick around

evhubDec 18, 2019, 7:52 PM

64 points

15 comments3 min readLW link

Review Report of Davidson on Takeoff Speeds (2023)

Trent KannegieterDec 22, 2023, 6:48 PM

37 points

11 comments38 min readLW link

Conceptual coherence for concrete categories in humans and LLMs

Bill BenzonDec 9, 2023, 11:49 PM

13 points

1 comment2 min readLW link

Technical comparison of Deepseek, Novasky, S1, Helix, P0

JuliezhangggFeb 25, 2025, 4:20 AM

8 points

0 comments5 min readLW link

Sleeper agents appear resilient to activation steering

Lucy WingardFeb 3, 2025, 7:31 PM

4 points

0 comments7 min readLW link

The (local) unit of intelligence is FLOPs

boazbarakJun 5, 2023, 6:23 PM

42 points

7 comments5 min readLW link

The slingshot helps with learning

Wilson WuOct 31, 2024, 11:18 PM

33 points

0 comments8 min readLW link

Mechanistically interpreting time in GPT-2 small

rgould, Elizabeth Ho and Arthur Conmy

Apr 16, 2023, 5:57 PM

68 points

6 comments21 min readLW link

is gpt-3 few-shot ready for real applications?

nostalgebraistAug 3, 2020, 7:50 PM

31 points

5 comments9 min readLW link

(nostalgebraist.tumblr.com)

Approximation is expensive, but the lunch is cheap

Jesse Hoogland and Zach Furman

Apr 19, 2023, 2:19 PM

70 points

3 comments16 min readLW link

Declarative Mathematics

johnswentworthMar 21, 2019, 7:05 PM

59 points

10 comments3 min readLW link

Which AI Safety Benchmark Do We Need Most in 2025?

Loïc Cabannes and William Ludington

Nov 17, 2024, 11:50 PM

2 points

2 comments8 min readLW link

Grokking Beyond Neural Networks

Jack MillerOct 30, 2023, 5:28 PM

10 points

0 comments2 min readLW link

(arxiv.org)

Solving the Mechanistic Interpretability challenges: EIS VII Challenge 2

StefanHex and Marius Hobbhahn

May 25, 2023, 3:37 PM

71 points

1 comment13 min readLW link

Six (and a half) intuitions for KL divergence

CallumMcDougallOct 12, 2022, 9:07 PM

171 points

27 comments10 min readLW link 1 review

(www.perfectlynormal.co.uk)

Challenge proposal: smallest possible self-hardening backdoor for RLHF

Christopher KingJun 29, 2023, 4:56 PM

7 points

0 comments2 min readLW link

Reinterpreting “AI and Compute”

habrykaDec 25, 2018, 9:12 PM

30 points

9 comments1 min readLW link

(aiimpacts.org)

Artificial Intelligence and Life Sciences (Why Big Data is not enough to capture biological systems?)

HansNaujJan 15, 2020, 1:59 AM

6 points

3 comments6 min readLW link

Which of these five AI alignment research projects ideas are no good?

rmoehnAug 8, 2019, 7:17 AM

25 points

13 comments1 min readLW link

Skilling-up in ML Engineering for Alignment: request for comments

CallumMcDougallApr 23, 2022, 3:11 PM

19 points

0 comments1 min readLW link

The Theoretical Reward Learning Research Agenda: Introduction and Motivation

Joar SkalseFeb 28, 2025, 7:20 PM

26 points

4 comments14 min readLW link

Revisiting the Manifold Hypothesis

Aidan RockeOct 1, 2023, 11:55 PM

13 points

19 comments4 min readLW link

[Question] Can this model grade a test without knowing the answers?

ElizabethAug 31, 2019, 12:53 AM

20 points

3 comments1 min readLW link

Is there a ML agent that abandons it’s utility function out-of-distribution without losing capabilities?

Christopher KingFeb 22, 2023, 4:49 PM

1 point

7 comments1 min readLW link

resolving some neural network mysteries

bhauthJun 19, 2023, 12:09 AM

44 points

6 comments2 min readLW link

(www.bhauth.com)

How to Contribute to Theoretical Reward Learning Research

Joar SkalseFeb 28, 2025, 7:27 PM

16 points

0 comments21 min readLW link

Miriam Yevick on why both symbols and networks are necessary for artificial minds

Bill BenzonJun 6, 2022, 8:34 AM

1 point

0 comments4 min readLW link

The “Outside the Box” Box

Eliezer YudkowskyOct 12, 2007, 10:50 PM

94 points

51 comments2 min readLW link

[Question] Terminology: <something>-ware for ML?

Oliver SourbutJan 3, 2024, 11:42 AM

17 points

27 comments1 min readLW link

Race Along Rashomon Ridge

Stephen Fowler, Peter S. Park and MichaelEinhorn

Jul 7, 2022, 3:20 AM

50 points

15 comments8 min readLW link

An Introduction to Representation Engineering—an activation-based paradigm for controlling LLMs

Jan WehnerJul 14, 2024, 10:37 AM

37 points

6 comments17 min readLW link

User-inclination-guessing algorithms: registering a goal

ProgramCrafterMar 20, 2024, 3:55 PM

2 points

0 comments2 min readLW link

A multi-disciplinary view on AI safety research

Roman LeventovFeb 8, 2023, 4:50 PM

46 points

4 comments26 min readLW link

Summary of ML Safety Course

zeshenSep 27, 2022, 1:05 PM

7 points

0 comments6 min readLW link

Features and Adversaries in MemoryDT

Joseph Bloom and Jay Bailey

Oct 20, 2023, 7:32 AM

31 points

6 comments25 min readLW link

A dialectical view of the history of AI, Part 1: We’re only in the antithesis phase. [A synthesis is in the future.]

Bill BenzonNov 16, 2023, 12:34 PM

6 points

0 comments12 min readLW link

Link: Interview with Vladimir Vapnik

Daniel_BurfootJul 25, 2009, 1:36 PM

22 points

7 comments2 min readLW link

Linkpost: Are Emergent Abilities in Large Language Models just In-Context Learning?

Erich_GrunewaldOct 8, 2023, 12:14 PM

12 points

7 comments2 min readLW link

(arxiv.org)

Deep neural networks are not opaque.

jem-mosigJul 6, 2022, 6:03 PM

22 points

14 comments3 min readLW link

Neuroevolution, Social Intelligence, and Logic

vinnik.dmitry07May 31, 2023, 5:54 PM

1 point

0 comments10 min readLW link

What I am working on right now and why: representation engineering edition

Lukasz G BartoszczeMar 18, 2025, 10:37 PM

3 points

0 comments3 min readLW link

Brief Notes on Transformers

Adam JermynSep 26, 2022, 2:46 PM

48 points

3 comments2 min readLW link

AI’s impact on biology research: Part I, today

octopoctaDec 23, 2023, 4:29 PM

31 points

6 comments2 min readLW link

A primer on ML in antibody engineering

Abhishaike MahajanSep 23, 2024, 5:03 PM

11 points

0 comments25 min readLW link

(www.owlposting.com)

On the Importance of Open Sourcing Reward Models

elandgreJan 2, 2023, 7:01 PM

18 points

5 comments6 min readLW link

[Question] What should I do? (long term plan about starting an AI lab)

not_a_catJun 9, 2024, 12:45 AM

2 points

1 comment2 min readLW link

A compilation of misuses of statistics

Younes KamelFeb 14, 2022, 9:53 PM

4 points

11 comments13 min readLW link

(youneskamel.substack.com)

Aligning an H-JEPA agent via training on the outputs of an LLM-based “exemplary actor”

Roman LeventovMay 29, 2023, 11:08 AM

12 points

10 comments30 min readLW link

Better antibodies by engineering targets, not engineering antibodies (Nabla Bio)

Abhishaike MahajanJan 13, 2025, 3:05 PM

4 points

0 comments14 min readLW link

(www.owlposting.com)

Misspecification in Inverse Reinforcement Learning

Joar SkalseFeb 28, 2025, 7:24 PM

19 points

0 comments11 min readLW link

Parameter counts in Machine Learning

Jsevillamol and Pablo Villalobos

Jun 19, 2021, 4:04 PM

47 points

18 comments7 min readLW link

[Question] Where to begin in ML/AI?

Jake the StudentApr 6, 2023, 8:45 PM

9 points

4 comments1 min readLW link

A Review of In-Context Learning Hypotheses for Automated AI Alignment Research

alamertonApr 18, 2024, 6:29 PM

25 points

4 comments16 min readLW link

A Primer on Matrix Calculus, Part 2: Jacobians and other fun

Matthew BarnettAug 15, 2019, 1:13 AM

22 points

7 comments7 min readLW link

Behavior Cloning is Miscalibrated

leogaoDec 5, 2021, 1:36 AM

78 points

3 comments3 min readLW link

There is a globe in your LLM

jacob_droriOct 8, 2024, 12:43 AM

89 points

4 comments1 min readLW link

“Decision Transformer” (Tool AIs are secret Agent AIs)

gwernJun 9, 2021, 1:06 AM

37 points

4 comments1 min readLW link

(sites.google.com)

Evidence Sets: Towards Inductive-Biases based Analysis of Prosaic AGI

bayesian_kittenDec 16, 2021, 10:41 PM

22 points

10 comments21 min readLW link

“model scores” is a questionable concept

Maxwell PetersonNov 6, 2020, 3:19 AM

26 points

0 comments6 min readLW link

Week One of Studying Transformers Architecture

JustisMillsJun 20, 2024, 3:47 AM

3 points

0 comments15 min readLW link

(justismills.substack.com)

Expanding the Scope of Superposition

Derek LarsonSep 13, 2023, 5:38 PM

10 points

0 comments4 min readLW link

New paper: The Incentives that Shape Behaviour

RyanCareyJan 23, 2020, 7:07 PM

23 points

5 comments1 min readLW link

(arxiv.org)

Scaling laws vs individual differences

berenJan 10, 2023, 1:22 PM

45 points

21 comments7 min readLW link

Truthful LMs as a warm-up for aligned AGI

Jacob_HiltonJan 17, 2022, 4:49 PM

65 points

14 comments13 min readLW link

Apply for the ML Upskilling Winter Camp in Cambridge, UK [2-10 Jan]

hannah wing-yeeDec 2, 2022, 8:45 PM

3 points

0 comments2 min readLW link

AI Alignment Research Engineer Accelerator (ARENA): call for applicants

CallumMcDougallApr 17, 2023, 8:30 PM

100 points

9 comments7 min readLW link

From Simon’s ant to machine learning, a parable

Bill BenzonJan 4, 2023, 2:37 PM

6 points

5 comments2 min readLW link

On precise out-of-context steering

Olli JärviniemiMay 3, 2024, 9:41 AM

9 points

6 comments3 min readLW link

DeepSeek-R1 for Beginners

Anton RazzhigaevFeb 5, 2025, 6:58 PM

12 points

0 comments8 min readLW link

The Efficient Market Hypothesis in Research

libaiJul 8, 2021, 5:00 PM

11 points

9 comments3 min readLW link

Can you force a neural network to keep generalizing?

Q HomeSep 12, 2022, 10:14 AM

2 points

10 comments5 min readLW link

[Question] Algorithms vs Compute

johnswentworthJan 28, 2020, 5:34 PM

26 points

11 comments1 min readLW link

Transformer language models are doing something more general

NumendilAug 3, 2022, 9:13 PM

53 points

6 comments2 min readLW link

LDL 4: Big data is a pain in the ass

magfrumpOct 25, 2017, 8:59 PM

6 points

0 comments3 min readLW link

Introducing the WeirdML Benchmark

Håvard Tveit IhleJan 16, 2025, 11:38 AM

56 points

13 comments11 min readLW link

Independent research article analyzing consistent self-reports of experience in ChatGPT and Claude

rifeJan 6, 2025, 5:34 PM

4 points

20 comments1 min readLW link

(awakenmoon.ai)

[Aspiration-based designs] 2. Formal framework, basic algorithm

Jobst Heitzig, Simon Dima and Simon Fischer

Apr 28, 2024, 1:02 PM

18 points

2 comments16 min readLW link

The need for multi-agent experiments

Martín SotoAug 1, 2024, 5:14 PM

43 points

3 comments9 min readLW link

math terminology as convolution

bhauthOct 30, 2023, 1:05 AM

34 points

1 comment4 min readLW link

(www.bhauth.com)

The future of Humans: Operators of AI

François-Joseph LacroixDec 30, 2023, 11:46 PM

1 point

0 comments1 min readLW link

(medium.com)

Practical Pitfalls of Causal Scrubbing

Jérémy Scheurer, Phil3, tony, jacquesthibs and David Lindner

Mar 27, 2023, 7:47 AM

87 points

17 comments13 min readLW link

Solving adversarial attacks in computer vision as a baby version of general AI alignment

Stanislav FortAug 29, 2024, 5:17 PM

89 points

8 comments7 min readLW link

Elements of Computational Philosophy, Vol. I: Truth

Paul Bricman and Tom Feeney

Jul 1, 2023, 11:44 AM

12 points

6 comments1 min readLW link

(compphil.github.io)

“Designing agent incentives to avoid reward tampering”, DeepMind

gwernAug 14, 2019, 4:57 PM

28 points

15 comments1 min readLW link

(medium.com)

[Question] What are the most important papers/post/resources to read to understand more of GPT-3?

adamShimiAug 2, 2020, 8:53 PM

22 points

4 comments1 min readLW link

Food, Prison & Exotic Animals: Sparse Autoencoders Detect 6.5x Performing Youtube Thumbnails

Louka Ewington-PitsosSep 17, 2024, 3:52 AM

6 points

2 comments7 min readLW link

Models of life

Abhishaike MahajanSep 29, 2024, 7:24 PM

8 points

0 comments16 min readLW link

(www.asimov.press)

Machines vs Memes Part 3: Imitation and Memes

ceru23Jun 1, 2022, 1:36 PM

7 points

0 comments7 min readLW link

Against sacrificing AI transparency for generality gains

Ape in the coatMay 7, 2023, 6:52 AM

4 points

0 comments2 min readLW link

Measuring Predictability of Persona Evaluations

Thee Ho and evhub

Apr 6, 2024, 8:46 AM

20 points

0 comments7 min readLW link

Self-Supervised Learning and AGI Safety

Steven ByrnesAug 7, 2019, 2:21 PM

30 points

9 comments12 min readLW link

[Question] When did Eliezer Yudkowsky change his mind about neural networks?

[deactivated]Nov 14, 2023, 9:24 PM

31 points

15 comments1 min readLW link

Beginning Machine Learning

crybxApr 30, 2018, 3:54 PM

12 points

4 comments6 min readLW link

Testing “True” Language Understanding in LLMs: A Simple Proposal

MtryaSamNov 2, 2024, 7:12 PM

−3 points

0 comments2 min readLW link

Developmental Stages in Multi-Problem Grokking

James SullivanSep 29, 2024, 6:58 PM

4 points

0 comments6 min readLW link

Conditions for mathematical equivalence of Stochastic Gradient Descent and Natural Selection

Oliver SourbutMay 9, 2022, 9:38 PM

70 points

19 comments8 min readLW link 1 review

(www.oliversourbut.net)

Testing “True” Language Understanding in LLMs: A Simple Proposal

MtryaSamNov 2, 2024, 7:12 PM

9 points

2 comments2 min readLW link

The Weighted Majority Algorithm

Eliezer YudkowskyNov 12, 2008, 11:19 PM

23 points

96 comments10 min readLW link

Deep learning—deeper flaws?

Richard_NgoSep 24, 2018, 6:40 PM

39 points

17 comments4 min readLW link

(thinkingcomplete.blogspot.com)

My Thoughts on the ML Safety Course

zeshenSep 27, 2022, 1:15 PM

50 points

3 comments17 min readLW link

Mech Interp Challenge: August—Deciphering the First Unique Character Model

CallumMcDougallAug 9, 2023, 7:14 PM

36 points

1 comment3 min readLW link

Addressing doubts of AI progress: Why GPT-5 is not late, and why data scarcity isn’t a fundamental limiter near term.

LDJJan 17, 2025, 6:53 PM

2 points

0 comments2 min readLW link

Secret Collusion: Will We Know When to Unplug AI?

schroederdewitt, srm, MikhailB, Lewis Hammond, chansmi and sofmonk

Sep 16, 2024, 4:07 PM

57 points

7 comments31 min readLW link

Machine Learning Model Sizes and the Parameter Gap [abridged]

Pablo VillalobosJul 18, 2022, 4:51 PM

20 points

0 comments1 min readLW link

(epochai.org)

Logit Prisms: Decomposing Transformer Outputs for Mechanistic Interpretability

ntt123Jun 17, 2024, 11:46 AM

5 points

4 comments6 min readLW link

(neuralblog.github.io)

Patterns or getting to Objective Truth – A thought piece on Artificial Intelligence

Thehumanproject.aiOct 20, 2024, 4:45 PM

1 point

0 comments8 min readLW link

LDL 7: I wish I had a map

magfrumpNov 30, 2017, 2:03 AM

13 points

2 comments3 min readLW link

[Question] Is the competition/cooperation between symbolic AI and statistical AI (ML) about historical approach to research / engineering, or is it more fundamentally about what intelligent agents “are”?

Edward HammondFeb 17, 2022, 11:11 PM

1 point

1 comment2 min readLW link

An analysis of the Less Wrong D&D.Sci 4th Edition game

Maxwell PetersonOct 4, 2021, 12:03 AM

18 points

7 comments5 min readLW link

Compute Trends Across Three eras of Machine Learning

Jsevillamol, Pablo Villalobos, lennart, Marius Hobbhahn, Tamay Besiroglu and anson.ho

Feb 16, 2022, 2:18 PM

94 points

13 comments2 min readLW link

Framing AI Childhoods

David UdellSep 6, 2022, 11:40 PM

37 points

8 comments4 min readLW link

The Unreasonable Effectiveness of Deep Learning

Richard_NgoSep 30, 2018, 3:48 PM

86 points

5 comments13 min readLW link

(thinkingcomplete.blogspot.com)

Steganography in Chain of Thought Reasoning

A RayAug 8, 2022, 3:47 AM

62 points

13 comments6 min readLW link

What’s going on with Per-Component Weight Updates?

4gateAug 22, 2024, 9:22 PM

1 point

0 comments6 min readLW link

Google DeepMind’s RT-2

SandXboxAug 11, 2023, 11:26 AM

9 points

1 comment1 min readLW link

(robotics-transformer2.github.io)

[Question] What is a training “step” vs. “episode” in machine learning?

Evan R. MurphyApr 28, 2022, 9:53 PM

10 points

4 comments1 min readLW link

Epoch is hiring an ML Distributed Systems Senior Researcher

merilalama and Jaime Sevilla Molina

Nov 24, 2023, 10:33 PM

2 points

0 comments4 min readLW link

(careers.rethinkpriorities.org)

Lessons After a Couple Months of Trying to Do ML Research

RowanWangMar 22, 2022, 11:45 PM

70 points

8 comments6 min readLW link

Causality and a Cost Semantics for Neural Networks

scottviteriAug 21, 2023, 9:02 PM

22 points

1 comment1 min readLW link

Compute Trends — Comparison to OpenAI’s AI and Compute

lennart, Jsevillamol, Pablo Villalobos, Marius Hobbhahn, Tamay Besiroglu and anson.ho

Mar 12, 2022, 6:09 PM

23 points

3 comments3 min readLW link

Quantum Advantage in Learning from Experiments

Dennis TowneJul 27, 2022, 3:49 PM

5 points

5 comments1 min readLW link

(ai.googleblog.com)

Frequentist practice incorporates prior information all the time

Maxwell PetersonNov 7, 2020, 8:43 PM

18 points

0 comments4 min readLW link

o1-preview is pretty good at doing ML on an unknown dataset

Håvard Tveit IhleSep 20, 2024, 8:39 AM

67 points

1 comment2 min readLW link

The subset parity learning problem: much more than you wanted to know

Dmitry VaintrobJan 3, 2025, 9:13 AM

94 points

18 comments11 min readLW link

The Perceptron Controversy

Yuxi_LiuJan 10, 2024, 11:07 PM

65 points

18 comments1 min readLW link

(yuxi-liu-wired.github.io)

Why Gradients Vanish and Explode

Matthew BarnettAug 9, 2019, 2:54 AM

25 points

9 comments3 min readLW link

Worse Than Random

Eliezer YudkowskyNov 11, 2008, 7:01 PM

46 points

102 comments12 min readLW link

From No Mind to a Mind – A Conversation That Changed an AI

parthibanarjuna sFeb 7, 2025, 11:50 AM

1 point

0 comments3 min readLW link

Understanding LLMs: Some basic observations about words, syntax, and discourse [w/ a conjecture about grokking]

Bill BenzonOct 11, 2023, 7:13 PM

6 points

0 comments5 min readLW link

Model splintering: moving from one imperfect model to another

Stuart_ArmstrongAug 27, 2020, 11:53 AM

79 points

10 comments33 min readLW link

Predicting AGI by the Turing Test

Yuxi_LiuJan 22, 2024, 4:22 AM

21 points

2 comments10 min readLW link

(yuxi-liu-wired.github.io)

Speculation on Path-Dependance in Large Language Models.

NickyPJan 15, 2023, 8:42 PM

16 points

2 comments7 min readLW link

Learning societal values from law as part of an AGI alignment strategy

John NayOct 21, 2022, 2:03 AM

5 points

18 comments54 min readLW link

Logical or Connectionist AI?

Eliezer YudkowskyNov 17, 2008, 8:03 AM

46 points

26 comments9 min readLW link

How LLMs Learn: What We Know, What We Don’t (Yet) Know, and What Comes Next

JonasbJul 9, 2024, 9:58 AM

2 points

0 comments16 min readLW link

(www.denominations.io)

GAN Discriminators Don’t Generalize?

tryactionsJun 8, 2020, 8:36 PM

18 points

7 comments2 min readLW link

Other Papers About the Theory of Reward Learning

Joar SkalseFeb 28, 2025, 7:26 PM

16 points

0 comments5 min readLW link

Machine Learning Projects on IDA

Owain_Evans, William_S and stuhlmueller

Jun 24, 2019, 6:38 PM

49 points

3 comments2 min readLW link

Discursive Competence in ChatGPT, Part 2: Memory for Texts

Bill BenzonSep 28, 2023, 4:34 PM

1 point

0 comments3 min readLW link

On scalable oversight with weak LLMs judging strong LLMs

zac_kenton, Noah Siegel, janos, Jonah Brown-Cohen, Samuel Albanie, David Lindner and Rohin Shah

Jul 8, 2024, 8:59 AM

49 points

18 comments7 min readLW link

(arxiv.org)

Prosaic AI alignment

paulfchristianoNov 20, 2018, 1:56 PM

48 points

10 comments8 min readLW link

Grokking, memorization, and generalization — a discussion

Kaarel and Dmitry Vaintrob

Oct 29, 2023, 11:17 PM

75 points

11 comments23 min readLW link

Using rationality to debug Machine Learning

Dr_ManhattanApr 10, 2018, 8:03 PM

20 points

3 comments1 min readLW link

(amid.fish)

Meta AI (FAIR) latest paper integrates system-1 and system-2 thinking into reasoning models.

happy fridayOct 24, 2024, 4:54 PM

8 points

0 comments1 min readLW link

Machine Unlearning in Large Language Models: A Comprehensive Survey with Empirical Insights from the Qwen 1.5 1.8B Model

RudaibaFeb 1, 2025, 9:26 PM

9 points

2 comments11 min readLW link

Research Questions from Stained Glass Windows

StefanHexJun 8, 2022, 12:38 PM

4 points

0 comments2 min readLW link

My Criticism of Singular Learning Theory

Joar SkalseNov 19, 2023, 3:19 PM

83 points

56 comments12 min readLW link

Reinforcement Learning Goal Misgeneralization: Can we guess what kind of goals are selected by default?

StefanHex and Julian_R

Oct 25, 2022, 8:48 PM

15 points

2 comments4 min readLW link

Seeing the Invisible (And How to Think About Machine Learning)

Filip DousekDec 8, 2021, 9:04 PM

3 points

0 comments3 min readLW link

How can Interpretability help Alignment?

RobertKirk and Tomáš Gavenčiak

May 23, 2020, 4:16 PM

37 points

3 comments9 min readLW link

Connectionism: Modeling the mind with neural networks

Scott AlexanderJul 19, 2011, 1:16 AM

61 points

20 comments8 min readLW link

Tutor-GPT & Pedagogical Reasoning

courtlandleerJun 5, 2023, 5:53 PM

26 points

3 comments4 min readLW link

A Girardian interpretation of the Altman affair, it’s on my to-do list

Bill BenzonNov 20, 2023, 12:21 PM

3 points

0 comments1 min readLW link

In-Context Learning: An Alignment Survey

alamertonSep 30, 2024, 6:44 PM

8 points

0 comments20 min readLW link

(docs.google.com)

Do humans really learn from “little” data?

Alice WanderlandJan 14, 2025, 10:46 AM

14 points

5 comments1 min readLW link

(aliceandbobinwanderland.substack.com)

Interpretable by Design—Constraint Sets with Disjoint Limit Points

Ronak_MehtaMay 8, 2025, 9:08 PM

23 points

1 comment9 min readLW link

(ronakrm.github.io)

Towards White Box Deep Learning

Maciej SatkiewiczMar 27, 2024, 6:20 PM

18 points

5 comments1 min readLW link

(arxiv.org)

Gradient descent might see the direction of the optimum from far away

Mikhail SaminJul 28, 2023, 4:19 PM

70 points

13 comments4 min readLW link

Thoughts on the Alignment Implications of Scaling Language Models

leogaoJun 2, 2021, 9:32 PM

82 points

11 comments17 min readLW link

Machines vs Memes Part 1: AI Alignment and Memetics

Harriet FarlowMay 31, 2022, 10:03 PM

19 points

1 comment6 min readLW link

Can AI improve the current state of molecular simulation?

Abhishaike MahajanDec 6, 2024, 8:22 PM

5 points

0 comments1 min readLW link

(www.owlposting.com)

My (Mis)Adventures With Algorithmic Machine Learning

AHartNtknSep 20, 2020, 5:31 AM

16 points

4 comments41 min readLW link

Some thoughts after reading Artificial Intelligence: A Modern Approach

swift_spiralMar 19, 2019, 11:39 PM

38 points

4 comments2 min readLW link

If Van der Waals was a neural network

George3d6Jan 28, 2020, 6:38 PM

18 points

3 comments11 min readLW link

(blog.cerebralab.com)

Trading off compute in training and inference (Overview)

Pablo VillalobosJul 31, 2023, 4:03 PM

42 points

2 comments7 min readLW link

(epochai.org)

Geoffrey Hinton on the Past, Present, and Future of AI

Stephen McAleeseOct 12, 2024, 4:41 PM

22 points

5 comments18 min readLW link

Solving the Mechanistic Interpretability challenges: EIS VII Challenge 1

StefanHex and Marius Hobbhahn

May 9, 2023, 7:41 PM

119 points

1 comment10 min readLW link

Predicting the Elections with Deep Learning—Part 1 - Results

Quentin ChenevierMay 14, 2022, 12:54 PM

0 points

0 comments1 min readLW link

Defining and Characterising Reward Hacking

Joar SkalseFeb 28, 2025, 7:25 PM

15 points

0 comments4 min readLW link

Specification gaming examples in AI

Samuel RødalNov 10, 2018, 12:00 PM

24 points

6 comments1 min readLW link

(docs.google.com)

Write Good Enough Code, Quickly

Oliver DanielsDec 15, 2024, 4:45 AM

19 points

10 comments8 min readLW link

Multi-Component Learning and S-Curves

Adam Jermyn and Buck

Nov 30, 2022, 1:37 AM

63 points

24 comments7 min readLW link

P=NP

OnePolynomialOct 17, 2024, 5:56 PM

−25 points

0 comments8 min readLW link

Engineering Monosemanticity in Toy Models

Adam Jermyn, evhub and Nicholas Schiefer

Nov 18, 2022, 1:43 AM

75 points

7 comments3 min readLW link

(arxiv.org)

LLM misalignment can probably be found without manual prompt engineering

ProgramCrafterJul 8, 2023, 2:35 PM

1 point

0 comments1 min readLW link

Adam Optimizer Causes Privileged Basis in Transformer LM Residual Stream

Diego Caples and rrenaud

Sep 6, 2024, 5:55 PM

70 points

7 comments4 min readLW link

The Japanese Quiz: a Thought Experiment of Statistical Epistemology

DanBApr 8, 2021, 5:37 PM

11 points

0 comments9 min readLW link

AI Alignment Research Engineer Accelerator (ARENA): Call for applicants v4.0

James Fox, Chloe Li, JamesH, Gracie Green and CallumMcDougall

Jul 6, 2024, 11:34 AM

57 points

7 comments6 min readLW link

Singular Learning Theory for Dummies

Rahul ChandOct 15, 2024, 9:13 PM

1 point

0 comments8 min readLW link

[Question] Natural Selection vs Gradient Descent

CuriousApe11May 1, 2023, 10:16 PM

4 points

3 comments1 min readLW link

Influence functions—why, what and how

Nina PanicksserySep 15, 2023, 8:42 PM

72 points

6 comments8 min readLW link

Grouped Loss may disfavor discontinuous capabilities

Adam JermynJul 9, 2022, 5:22 PM

14 points

2 comments4 min readLW link

Examples of AI’s behaving badly

Stuart_ArmstrongJul 16, 2015, 10:01 AM

41 points

41 comments1 min readLW link

Machine learning could be fundamentally unexplainable

George3d6Dec 16, 2020, 1:32 PM

26 points

15 comments15 min readLW link

(cerebralab.com)

Domain-specific SAEs

jacob_droriOct 7, 2024, 8:15 PM

28 points

2 comments5 min readLW link

Reasons compute may not drive AI capabilities growth

Tristan HDec 19, 2018, 10:13 PM

42 points

10 comments8 min readLW link

Selling Nonapples

Eliezer YudkowskyNov 13, 2008, 8:10 PM

76 points

78 comments7 min readLW link

The case for aligning narrowly superhuman models

Ajeya CotraMar 5, 2021, 10:29 PM

186 points

75 comments38 min readLW link 1 review

Explaining grokking through circuit efficiency

Vikrant Varma and Rohin Shah

Sep 8, 2023, 2:39 PM

101 points

11 comments3 min readLW link

(arxiv.org)

What will the scaled up GATO look like? (Updated with questions)

Amal Oct 25, 2022, 12:44 PM

34 points

22 comments1 min readLW link

“Learning to Summarize with Human Feedback”—OpenAI

[deleted]Sep 7, 2020, 5:59 PM

57 points

3 comments1 min readLW link

Competitive Markets as Distributed Backprop

johnswentworthNov 10, 2018, 4:47 PM

59 points

10 comments4 min readLW link 1 review

[Link] Computer improves its Civilization II gameplay by reading the manual

Kaj_SotalaJul 13, 2011, 12:00 PM

49 points

5 comments4 min readLW link

Perceptrons Explained

lifelonglearnerFeb 14, 2020, 5:34 PM

13 points

2 comments1 min readLW link

(owenshen24.github.io)

Implementing a Transformer from scratch in PyTorch—a write-up on my experience

Mislav JurićApr 25, 2023, 8:51 PM

20 points

0 comments10 min readLW link

AI-Generated GitHub repo backdated with junk then filled with my systems work. Has anyone seen this before?

rguntherMay 1, 2025, 8:14 PM

7 points

1 comment1 min readLW link

Reinforcement Learning Study Group

Kay KozaronekDec 26, 2021, 11:11 PM

20 points

8 comments1 min readLW link

Complexity Penalties in Statistical Learning

michael_hFeb 6, 2019, 4:13 AM

31 points

3 comments6 min readLW link
