epicurus

Karma: 40

epicurus 15 May 2026 9:04 UTC
3 points
0
on: Convergent Abstraction Hypothesis
I have converged (ha!) to similar views recently. I think it is worth trying to make this a lot more precise. Let me take a simplified version of a standard ML training setup. We have a dataset D that samples a subset of all possible inputs A, with binary labels in {0,1}, and a neural network architecture that defines a parameter space and an associated function space F of functions from A to {0,1}. Each point of this function space restricts to a “labelling function” on D, and in particular there is a subspace S of F consisting of the “correct functions”, i.e., functions that match the labels on the training set. In practice, our optimization algorithms almost always find a point in S, i.e., they minimize loss on the training set.

Now there is also a test set, which picks out an even smaller subspace T of S: the functions that also match the labels on the test set. For training to have worked, or to say that the trained net generalizes correctly, what we really mean is that the optimization algorithm finds not just a point in S, but a point in T. So we see that “correct training” is naturally a notion that depends on these two nested subspaces (T \subset S). And the power of neural networks is somehow that they consistently find a much smaller subspace than training alone would require (each test data point cuts down the size of the space by around a factor of 2).
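To make the counting concrete, here is a toy sketch. It assumes F is the set of all Boolean functions on 3-bit inputs, which is far less structured than the function space of a real architecture, and the names `target`, `train_idx` and `test_idx` are purely illustrative. It enumerates F, then filters down to S (consistent with the training labels) and T (also consistent with the test labels).

```python
from itertools import product

n_bits = 3
inputs = list(product([0, 1], repeat=n_bits))  # A: all 8 possible inputs

def target(x):
    # Hypothetical ground-truth labelling: majority vote of the bits.
    return int(sum(x) >= 2)

# F: every function A -> {0,1}, stored as a tuple with one label per input.
F = list(product([0, 1], repeat=len(inputs)))

train_idx = [0, 3, 5]  # illustrative choice of training inputs (the dataset D)
test_idx = [1, 6]      # illustrative choice of test inputs

def consistent(f, idx):
    # True if f agrees with the target labels on the inputs indexed by idx.
    return all(f[i] == target(inputs[i]) for i in idx)

S = [f for f in F if consistent(f, train_idx)]  # matches the training set
T = [f for f in S if consistent(f, test_idx)]   # also matches the test set

print(len(F), len(S), len(T))  # 256 32 8
```

In this unrestricted F every labelled point halves the space exactly (256 to 32 with three training points, 32 to 8 with two test points). For a real network the "around a factor of 2" is only a heuristic, since the function space is restricted and not uniformly weighted.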

Does this help us make your convergent abstraction hypothesis more precise? I think so. I think the key point is that data sets are naturally generated by learners (often humans). So if we have a trained net or a human who can assign labels to data points, we can generate the training data set D by querying that learner, and similarly for the test data set.

Then when we train a different neural network on the outputs of the first, to say that learning converges behaviorally is to say that both networks identify the smaller space T inside S as the “important” one.

I am not sure how legible that was; I am finding it hard to express mathematical ideas in this comment box...

===

Anyway, the upshot is that I think this lets us directly compare two learners. If learner A is trained on the outputs of learner B, do they generalize in similar ways? Do they find the same subsets of the function space as the effective target space?
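One rough way to operationalize this comparison is to measure how often the two learners agree on inputs neither was trained on. The sketch below is only a stand-in under loud assumptions: the `teacher` is a fixed majority-vote rule, the `student` is a 1-nearest-neighbour rule fit to the teacher's labels, and neither is a neural network.

```python
from itertools import product
import random

inputs = list(product([0, 1], repeat=5))  # all 32 possible 5-bit inputs

def teacher(x):
    # Hypothetical learner B: labels an input by majority vote of its bits.
    return int(sum(x) >= 3)

rng = random.Random(0)  # fixed seed so the split is reproducible
train = rng.sample(inputs, 12)  # dataset D, labelled by the teacher
held_out = [x for x in inputs if x not in train]
labels = {x: teacher(x) for x in train}

def hamming(x, y):
    # Number of positions where the two bit tuples differ.
    return sum(a != b for a, b in zip(x, y))

def student(x):
    # Hypothetical learner A: copy the label of the nearest training input.
    nearest = min(train, key=lambda t: hamming(x, t))
    return labels[nearest]

def agreement(f, g, xs):
    # Fraction of inputs in xs on which f and g give the same label.
    return sum(f(x) == g(x) for x in xs) / len(xs)

train_agreement = agreement(teacher, student, train)
held_out_agreement = agreement(teacher, student, held_out)
print(train_agreement, held_out_agreement)
```

Agreement on the training inputs is 1 by construction (both functions lie in S). The held-out agreement is the interesting number: it measures how far the student's choice within S lands from the teacher's.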

epicurus 12 May 2026 22:19 UTC
13 points
0
on: Optimisation: Selective versus Predictive
I think the really interesting interaction between these two frames is when selection pressures lead to predictive capacities. When does this happen? A first guess might be: when the training (selecting) environment is so complicated, and there is so much local variance, that the selective loop finds it easiest to instill a predictive agent and let that take care of the local adaptation.

A lot of stuff works like this: you can have generic chess/math heuristics but you need to be able to do local calculations to not fall flat on your face; evolution more or less works like this in mammals and obviously humans, maybe much more; presumably LLMs work like this; our central nervous system/mind works like this wrt individual cells in the body.

Are there other factors that mediate how a selective process can give rise to local predictive agents? What consequences does this transition have? Cancer/parasites/fraud are three instances of one example, what else?

epicurus 12 May 2026 22:14 UTC
3 points
0
in reply to: 1a3orn’s comment on: Optimisation: Selective versus Predictive
I think it’s possible that gradient descent works by applying a selection pressure to preexisting circuits in the initial randomization with some finetuning. This would explain why most weights are zero after training as well as stuff like the lottery ticket hypothesis.

Reinforcement Learning, Agency and Taste

epicurus 12 May 2026 18:22 UTC

7 points

0 comments, 9 min read

What is Claude?

epicurus 26 Feb 2026 4:26 UTC

14 points

0 comments, 7 min read

What are we doing when we do mathematics?

epicurus 14 Mar 2025 20:54 UTC

9 points

2 comments, 1 min read

(asving.com)

epicurus 4 May 2015 19:07 UTC
0 points
0
in reply to: Lumifer’s comment on: Is Scott Alexander bad at math?
Cutting-edge math is actually mostly about making fuzzy stuff precise, at least in the parts of math I am interested in (algebraic geometry, with Grothendieck and Weil as examples). Both of these mathematicians worked in a field where people had some stuff that worked but no foundations.

Also, the foundations of math have been changing for quite a long time and continue to do so. I think your reaction to mathematics might be to badly taught mathematics rather than mathematics as practiced. However, I don’t see an easy way to fix it.

To teach mathematics well would require a high level of mastery, and we don’t have enough people like that around.

epicurus 4 May 2015 16:28 UTC
9 points
0
in reply to: Lumifer’s comment on: Is Scott Alexander bad at math?
I am not sure what exactly going deeper at logic/patterns means if not getting into mathematical logic. It is incredibly easy to read mathematics you know and incredibly difficult to read mathematics that you don’t due to how dense it is. It might be the case that your impression is due to comparing these two.

I am training to become a mathematician and I do not know of a single person for whom learning mathematics is not slow and effortful. I do not think you are particularly exceptional in that, but I know very little about your particular scenario.

epicurus 4 May 2015 16:20 UTC
0 points
0
in reply to: JonahS’s comment on: Is Scott Alexander bad at math?
I have a bachelor’s in mathematics and estimate my current knowledge to be around that of a second-year graduate student, if my mathematical knowledge is useful. I am interested in getting better at doing math as well as teaching it.

Note: I am not the person you replied to.

epicurus 3 Mar 2015 15:25 UTC
2 points
0
on: Rationalization
According to this article, one can predict a decision 7 seconds before it is actually made. Doesn’t this, in some sense, mean that a large part of our thought process (certainly those 7 seconds) is actually rationalizing a decision we have already made?

Is my thinking off or is this one more thing to actively guard against and realize when we are letting our unconscious decide for us?

epicurus 2 Mar 2015 15:10 UTC
0 points
0
on: The Futility of Emergence
This is very curious. I never thought of emergent as an explanation but as a property. I roughly understood it to mean that the emergent quality was transferable. That is, intelligence is a product of neurons firing but it need not have been, it could also have been generated from transistors or whatever else.

This is roughly the opposite of your ant example. Something is emergent if it can be explained/predicted with no knowledge of the lower level. A lot of properties of Turing machines do not depend on the actual formalism of the Turing machine.

Edit: After browsing the other comments, I realize this is something that has been brought up before. My 2 cents for whatever it is worth, I guess…
