Lambda calculus is, however, the internal language of a very common kind of category, so, in a sense, category theory allows lambda calculus to compute not only with functions, but also with sets, topological spaces, manifolds, etc.

# MrMind

While I share your enthusiasm toward categories, I find the claim that CT is the correct framework from which to understand rationality suspicious. Around here, rationality is mainly equated with Bayesian probability, and the categorical grasp of probability, or even of measure, is less than impressive. The most interesting fact I’ve been able to dig up is that the Giry monad is the codensity monad of the inclusion of convex spaces into measure spaces, hardly an illuminating fact (basically a convoluted way of saying that probabilities are the most general ways of forming convex combinations out of measures).

I’ve searched and searched for categorical answers or hints about the problem of extending probabilities to other kinds of logic (or even simply extending them to classical predicate logic), but so far I’ve had no luck.

The difference between the two is literally a single summation, so… yeah?

# Odds are not easier

I’d like to point out a source of confusion around Occam’s Razor that I see you’re falling for; dispelling it will make things clearer: “you should not multiply entities **without necessity**!”. This means that Occam’s Razor helps decide between competing theories if and only if they have the same explanatory and predictive power. But in the history of science, it was almost *never* the case that competing theories had the same power. Maybe it happened a couple of times (epicycles, the Copenhagen interpretation), but in all other instances a theory was selected not because it was simpler, but because it was much more powerful. Contrary to popular misconception, Occam’s razor gets to be used very, very rarely.

We do have, anyway, a formalization of that principle in algorithmic information theory: Solomonoff induction. An agent that, to predict the outcome of a sequence, places the highest probabilities on the shortest compatible programs will eventually outperform every other class of predictor. The catch here is the word ‘eventually’: in every measure of complexity, there’s a constant that offsets the values, due to the choice of the reference universal Turing machine. Different reference machines will assign different complexities to the same initial programs, but all measures will converge after a finite amount.
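To make the ‘constant’ concrete, here are the standard textbook forms of both statements. The notation is an assumption of this sketch and is not fixed anywhere above: $U$ and $V$ are two reference universal machines, $|p|$ is the length of program $p$, $U(p) = x\ast$ means that the output of $p$ on $U$ starts with $x$, and $K_U$ is the Kolmogorov complexity relative to $U$.

$$M_U(x) = \sum_{p \,:\, U(p) = x\ast} 2^{-|p|}, \qquad K_U(x) \le K_V(x) + c_{U,V}$$

The first expression is the prior that favors short programs; the second is the invariance theorem, where $c_{U,V}$ depends only on the two machines and not on $x$.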

This is also why I think that the problem of explaining thunder with “Thor vs clouds” is such a poor example of Occam’s razor: Solomonoff induction is a formalization of Occam’s razor for *theories*, not *explanations*. Due to the aforementioned constant, you cannot have an absolutely simpler model of a finite sequence of events. There’s no such thing: it will always depend on the complexity of the starting Turing machine. However, you can have **eventually simpler** models of **infinite** sequences of events (infinite sequence predictors are equivalent to programs). In that case, the natural-event program will prevail because it will allow better control of the outcomes.

I arrived at the same conclusion when I tried to make sense of the Metaethics Sequence. My summary of Eliezer’s writings is: “morality is a bunch of mental computations shared between most human beings”. Morality thus grew out of our evolutionary history, and it should not be surprising that in extreme situations it might be incoherent or maladaptive.

Only if you believe that morality should be systematic, universal and coherent can you say that extreme examples are uncovering something interesting about people’s morality.

Otherwise, extreme situations are as interesting as saying that people cannot mentally factor long numbers.

First of all, the community around LW2.0 can only be loosely associated with a movement: I don’t think there’s anyone that explicitly endorses *every* technique or theory that has appeared here. LW is not CFAR, is not the Alignment Forum, etc. So I would caution against enticing someone into LW by saying that the community supports this or that technique.

The main advantage of rationality, in its present stage, is defensive: if you’re aspiring to be rational, you wouldn’t waste time attending religious gatherings that you despise; you wouldn’t waste money buying ineffective treatments (sugar pills, crystals, etc.); you wouldn’t waste resources following people that mistake fiction for facts. At the moment, rationality is just a very good filter for every product, knowledge and praxis that society presents to you (hint: 99% of those things is crap).

On the other hand, what you can or should do with all the resources you’re not wasting, is something rationality cannot answer in full today. Metaethics and akrasia are, after all, the greatest unsolved problems of our community.

There were notorious attempts (e.g. Torture vs Dust specks or the Basilisk), but nothing has emerged with the clarity and effectiveness of Bayesian reasoning. Effective Altruism and MIRI are perhaps the most famous examples of trying to solve the most pressing problems. A definitive framework though still eludes us.

In Foerster’s paper, he links the increase in productivity linearly with the increase in population. But Scott has also proposed that the rate of innovation is slowing down, due to a *logarithmic* increase of productivity from population. So maybe Foerster’s model is still valid, and 1960 is only the year where we exhausted the almost linear part of progress (the “low-hanging fruit”). Perhaps nowadays we combine the exponential growth of population with the logarithmic increase in productivity, to get the linear economic growth we see.
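A quick check of that arithmetic, under two simplifying assumptions that are only illustrative: population grows as a pure exponential, $P(t) = P_0 e^{rt}$, and output is proportional to the logarithm of population, $Y = a \log P$. The symbols $P_0$, $r$ and $a$ are placeholders, not fitted values.

$$Y(t) = a \log\left(P_0 e^{rt}\right) = a \log P_0 + a\,r\,t$$

So, under these assumptions, output grows linearly in time with slope $a r$.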

Algebraic topology is the discipline that studies geometries by associating them with algebraic objects (usually, groups or vector spaces) and observing how changing the underlying space affects the related algebras. In 1941, two mathematicians working in that field sought to generalize a theorem that they discovered, and needed to show that their solution was still valid for a larger class of spaces, obtained by “natural” transformations. Natural, at that point, was a term lacking a precise definition, and only meant something like “avoiding arbitrary choices”, in the same way a vector space is naturally isomorphic to its double dual, while it’s isomorphic to its dual only through the choice of a basis.

The need to make precise the notion of naturality for algebraic topology led them to the definition of natural transformation, which in turn required the notion of functor which in turn required the notion of category.

This answers questions 1 and 2: category theory was born to give a precise definition of naturality, and was sought to generalize the “universal coefficient theorem” to a larger class of spaces.

This story is told in a lot of detail in the first paragraphs of Riehl’s wonderful “Category Theory in Context”.

To answer n° 3, though, even if category theory was rapidly expanding during the ’50s and the ’60s, it was only with the work of Lawvere (whom I consider a genius on par with Gödel) in the ’70s that it became a foundational discipline: guided by his intuitions, category theory became the unifying language for every branch of mathematics, from geometry to computation to logic to algebra. Basically, it showed how the variety of mathematical disciplines are just different ways to say the same thing.

Is it really quite different, besides halo effect? It strongly depends on the detail, though if the two say the exact same thing, how are things different?

The concept of “fake framework”, elucidated in the original post, seems to me to be that of a model of reality that hides some complexity, sometimes even to the point of being very wrong, but that is nonetheless useful because it makes some other complex area manageable.

On the other hand, when I read the quotes you presented, I see a rich tapestry of metaphors and jargon, of which the proponent himself says that they can be wrong… but I fail completely to see what part of reality they make manageable. These frameworks seem to just add complexity to complexity, without any real leverage over reality. This makes those frameworks closer to fiction than to useful but simplified models.

For example, if there’s no post-rational stage of development, what use is the advice of not confusing it with a pre-rational stage of development? If Enlightenment is not a thing, what use is the exhortation to come up with a chronologically robust definition of the same?

This to me is the most striking difference between “Integral spirituality” and say a road map. With the road map, you know exactly what is hidden and why, and it’s evident how to use it. With Wilber’s framework, it seems exactly the opposite.

Maybe this is due to my unfamiliarity with that material… so someone who has actually gotten something useful out of that model can chime in and tell their experience, and I will stand corrected.

I’m sorry, but you cannot really learn anything from one example. I’m happy that your parents are faring well in their marriage, but if they weren’t, would you have learned the same thing?

I’ve consulted a few statistics on arranged marriage, and they all are:

- underpowered
- showing no significant difference between autonomous and arranged marriages

The latter part is somewhat surprising for a Westerner, but given what you say, the same should be said for an Indian coming from your background.

The only conclusion I can draw fairly conclusively is that, for a long term relationship, the way or the why it started doesn’t really matter.

Are you familiar with the concept of fold/unfold? Folds are functions that consume structures and produce values, while unfolds do the opposite. The composition of an unfold plus a fold is called a hylomorphism, of which the factorial is a perfect example: the unfold creates a list from 1 to *n*, the fold multiplies together the entire list. Your section on the “two-fold recursion” is a perfect description of a hylomorphism: you take a goal, unfold it into a plan composed of a list of micro-steps, then you fold it by executing each one of the micro-steps in order.
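The factorial example can be sketched in a few lines of Python. The helper names `unfold`, `fold` and `hylo` are made up for this illustration and are not standard library functions; only `functools.reduce` is.

```python
from functools import reduce


def unfold(step, seed):
    """Produce a stream of values from a seed.

    `step(state)` returns `(item, next_state)`, or None to stop.
    """
    state = seed
    while True:
        result = step(state)
        if result is None:
            return
        item, state = result
        yield item


def fold(combine, initial, items):
    """Consume a structure and produce a single value."""
    return reduce(combine, items, initial)


def hylo(step, combine, initial, seed):
    """A hylomorphism: an unfold followed by a fold."""
    return fold(combine, initial, unfold(step, seed))


def factorial(n):
    # Unfold: generate 1, 2, ..., n. Fold: multiply everything together.
    def count_up(i):
        return (i, i + 1) if i <= n else None

    return hylo(count_up, lambda acc, x: acc * x, 1, 1)
```

Because the unfold is a generator, the intermediate list is never fully built: each micro-step is consumed by the fold as soon as it is produced, which matches the idea of executing a plan step by step.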

Luke already wrote that there are at least four factors that feed motivation, and the expectation of success is only one of them. No amount of expectancy can increase drive if other factors are lacking, and as Eliezer notices, it’s not sane to expect only one factor to be 10x the others so that it alone powers the engine.

What Eliezer is asking is basically if anyone has solved the basic coordination problem of mankind, and I think he knows very well that the answer to his question is no. Also, because we are operating in a relatively small mindspace (humans’ system 1), the fact that no one solved that problem in hundreds of thousands of years of cooperation points strongly toward the fact that such a solution doesn’t exist.

Re: the third point, I think it’s important to differentiate between $f(k(f))$ and $t(k(f))$, where $t$ is the true prediction, that is what actually happens when an agent performs the action $k(f)$.

$f(k(f))$ is simply the outcome the agent is aiming at, while $t(k(f))$ is the outcome the agent eventually gets. So maybe a measure of similarity in $B$, by which you can compare the two, is more interesting.

Let’s say that $A$ is the set of available actions and $B$ is the set of consequences. $A \to B$ is then the set of predictions, where a single prediction associates to every possible action a consequence. $k: (A \to B) \to A$ is then a choice operator, that selects for each prediction an action to take.

What we have seen so far:

- There’s no ‘general’ or ‘natural’ choice operator, that is, every choice operator must be based on at least a partial knowledge of the domain or the codomain;
- Unless the possible consequences are trivial, a choice operator will choose the same action for many different predictions, that is, a choice operator only uses certain features of the predictions’ space and is indifferent to anything else [1];
- A choice operator naturally defines a ‘preferred outcome’ operator, which is simply the predicted outcome of the chosen action, and is defined by ‘sandwiching’ the choice operator between two predictions. I just thought *interleave* is a better name than *sandwich*. It’s of type $(A \to B) \to B$.

[1] To show this, let be a partition of and let be the equivalence relation uniquely generated by the partition. Then

# You’re never wrong injecting complexity, but rarely you’re right

I wonder if there are any plausible examples of this type where the constraints don’t look like ordering on B and search on A.

Yes, as I showed in my post, such operators must know at least an element of one of the domains of the function. If it knows at least an element of A, a constant function on that element has the right type. Unfortunately, it’s not very interesting.

It’s interesting to notice that there’s nothing with that type on Hoogle (the Haskell search engine), so it’s not the type of any common utility.

On the other hand, you can still say quite a bit on functions of that type, drawing from type and set theory.

First, let’s name a generic function with that type $k: (A \to B) \to A$. It’s possible to show that $k$ cannot be parametric in both types. If it were, $k: (0 \to B) \to 0$ would be valid, which is absurd ($0 \to B$ has an element!). It’s also possible to show that if $k$ is not parametric in one type, it must have access to at least an element of that type (think about and ).

A simple cardinality argument also shows that $k$ must be many-to-one (that is, non-injective) unless B is 1 (the one-element type): as soon as B has two elements, $|A \to B| = |B|^{|A|} \ge 2^{|A|} > |A|$.

There is an interesting operator that uses $k$, which I call interleave:

$$\mathrm{interleave}(k)(f) = f(k(f))$$

Trivially,

It’s interesting because partially applying interleave to some $k$ has the type $(A \to B) \to B$, which is the type of continuations, and I suspect that this is what underlies the common usage of such operators.
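A rough Python sketch of these operators, assuming interleave is defined by applying a prediction to the action that the choice operator picks for that same prediction (my reading of ‘sandwiching’ the operator between two predictions). The names `constant_choice` and `argmax_choice`, and the parameters `actions` and `score`, are hypothetical and introduced only for this illustration.

```python
def constant_choice(a):
    """A choice operator that knows one element of A and ignores the prediction."""
    return lambda prediction: a


def argmax_choice(actions, score):
    """A choice operator built from a search on A and an ordering on B.

    `actions` lists the candidate elements of A; `score` ranks elements of B.
    Both are assumptions: the operator needs some knowledge of A and of B.
    """
    candidates = list(actions)
    return lambda prediction: max(candidates, key=lambda a: score(prediction(a)))


def interleave(k):
    """Turn a choice operator k into a 'preferred outcome' operator.

    The result takes a prediction (a function from A to B) and returns
    the predicted consequence of the action that k chooses for it.
    """
    return lambda prediction: prediction(k(prediction))


# Two different predictions over the actions 0..5.
def parabola(a):
    return -(a - 3) ** 2


def tent(a):
    return -abs(a - 3)


best = argmax_choice(range(6), score=lambda b: b)
```

Here `best(parabola)` and `best(tent)` pick the same action even though the two predictions differ, a small instance of the many-to-one behaviour, while `interleave(best)` maps each prediction straight to its preferred outcome.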

Two of my favorite categories show that they really are everywhere: the free category on any graph and the presheaves of gamma.

The first: take any directed graph, unfocus your eyes and instead of arrows consider paths. That is a category!
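A minimal Python sketch of that construction, on a toy graph whose vertex and edge names are purely illustrative: morphisms are paths, identities are empty paths, and composition is concatenation.

```python
from dataclasses import dataclass
from typing import Tuple

# A directed graph given as named edges: edge name -> (source, target).
# This particular graph is an arbitrary example.
EDGES = {
    "f": ("x", "y"),
    "g": ("y", "z"),
    "h": ("x", "z"),
    "k": ("z", "w"),
}


@dataclass(frozen=True)
class Path:
    """A morphism of the free category: a composable sequence of edges."""
    source: str
    target: str
    edges: Tuple[str, ...] = ()


def identity(vertex):
    """The empty path at a vertex plays the role of the identity morphism."""
    return Path(vertex, vertex)


def arrow(name):
    """Every edge of the graph is a path of length one."""
    source, target = EDGES[name]
    return Path(source, target, (name,))


def compose(first, then):
    """Composition is concatenation, defined only when the endpoints match."""
    if first.target != then.source:
        raise ValueError("paths are not composable")
    return Path(first.source, then.target, first.edges + then.edges)
```

Associativity and the identity laws come for free from tuple concatenation. Note that the path `f` followed by `g` is a different morphism from the edge `h`, even though both go from `x` to `z`: the free category adds no equations between parallel paths.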

The second: take any finite graph. Take sets and functions that realize this graph. This is a category, moreover you can make it dagger-compact, so you can do quantum mechanics with it. Take as the finite graph gamma, which is just two vertices with two arrows between them. Sets and functions that realize this graph are… any graph! So, CT allows you to do quantum mechanics with graphs.

Amazing!