Karma: 2,142

Half-researcher, half-distiller (see https://distill.pub/2017/research-debt/), both in AI Safety. Funded, with a PhD in theoretical computer science (distributed computing).

If you’re interested in some of the research ideas you see in my posts, know that I probably have many private docs in the process of getting feedback (because for my own work, the AF has proved mostly useless in terms of feedback: https://www.lesswrong.com/posts/rZEiLh5oXoWPYyxWh/adamshimi-s-shortform?commentId=4ZciJDznzGtimvPQQ). I can give you access if you PM me!

• Caveat: I’m a bit of an algebraic abstractologist with a not-that-deep understanding of category theory. Thus I might represent the worst of both worlds: someone defending abstraction for abstraction’s sake, without concrete arguments.

My first reaction when seeing this post is that it was a great idea to give an intuitive explanation of category theory. My second reaction, when reading the introduction and the “Path in Graphs” section, was to feel like every useful part of category theory had been thrown away. After reading the whole post, I feel there are great parts (notably the extended concrete example for functors), and others with which I disagree. I will try to clarify my disagreement.

• First, category theory is not about graphs. Categories are not graphs. They are not even graphs with a tiny bit of structure on top. Categories are abstractions of mathematical structures, which can be represented as graphs. One can even argue that applications of category theory are simply about structuring the underlying objects and relations in such a way that what we want to prove is shown by a basic diagram. An intuition for this is that you can do category theory without using the graphs at all.

• There are very good reasons to use sets and the category of sets as primary examples. The most obvious one is that category theory aims to provide a foundation of mathematics in place of set theory. Thus starting with sets, which any student of modern mathematics knows, is being nice to the mathematical reader. Also, pretty much any book on category theory uses the category of sets as a foil, to show that categorical notions should not be given an intuition based only on set theory and functions on sets, as this intuition breaks down in other cases. For example, an isomorphism in category theory is not just a bijection as in the category of sets: in the category of posets, isomorphisms must be order-preserving, while bijections need not be.
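
As a concrete sanity check of that poset example, here is a tiny Python sketch (all the names are mine, not from the post): on a three-element set, every permutation is a bijection between a chain and an antichain, yet none of them is an order isomorphism.

```python
from itertools import permutations

# Two partial orders on {0, 1, 2}, encoded as sets of pairs (x, y) with x <= y.
chain = {(0, 0), (1, 1), (2, 2), (0, 1), (1, 2), (0, 2)}  # 0 <= 1 <= 2
antichain = {(0, 0), (1, 1), (2, 2)}                      # no two distinct elements comparable

def is_order_iso(f, src, dst):
    # A poset isomorphism is a bijection f with x <= y in src iff f(x) <= f(y) in dst.
    return all(((x, y) in src) == ((f[x], f[y]) in dst)
               for x in range(3) for y in range(3))

bijections = list(permutations(range(3)))  # every permutation is a bijection
isos = [f for f in bijections if is_order_iso(f, chain, antichain)]
print(len(bijections), len(isos))  # 6 0
```

So the two posets have the same cardinality (six bijections between them) but are not isomorphic as posets.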

• Lastly, about the applications you seem to be looking for from category theory. I feel like you want to use category theory as a tool to solve a problem, like you apply an inequality to derive a new theorem. But as far as I know, all uses of category theory come after the fact. It is a formal and structured perspective you can take on mathematical objects and how they relate. It might give you an understanding of the underlying structure behind your approach or results, but it rarely, if ever, gives you the means to accomplish an effective goal. Even the most concrete examples, like monads in Haskell, come from people already committed to finding a solution within category theory, not because category theory was the best tool to solve the problem at hand.

All that being said, I still think trying to boil down the main concepts of category theory is a great idea. I’m only worried that your approach throws away too much of what gives category theory its value.

• I don’t know if I should feel good or bad for taking the bait… let’s say it moves the debate forward.

From what you are saying, I think I get what you are aiming for. But I am also not sure that the issue you see is really there, because every textbook on category theory that I have read points out, in its first chapter or section, that you should not carry over your intuition for sets into the abstract setting of category theory. Even the historical origins of category theory come from algebraic topology, where the influence of sets and set theory is rather faint.

That being said, maybe there is some “Cantor bias” in the modern practice of mathematics, even when the point is to replace set theory. That’s actually one aspect of the Physics, Topology, Logic and Computation: A Rosetta Stone paper mentioned in my other answer that I like a lot: the authors give different categories, and argue that the category best capturing applications in physics and computer science is not the category of sets, but another one that behaves quite differently.

I think I have two questions following your comment:

• First, could you expand on your example of group theory in maths vs group theory in physics? I know a bit of mathy group theory, and I know it is applied to various parts of physics, but I’m curious about clear-cut differences between the uses.

• Second, does the already existing field of Applied Category Theory (intro paper and intro book) fit your bill for applications?

• Separate comment for references to classical category theory books and resources. I don’t think any of these are exactly what you are looking for, but they each propose a different perspective on these concepts, which might be the best we have now.

• The best textbook I know of is Category Theory by Awodey. It is both rigorous and intuitive, at least by the standards of maths textbooks. There are a lot of examples, as concrete as possible, and the differences between them, and how those differences inform the abstract definitions, are treated in detail.

• Do not go to Mac Lane’s Categories for the Working Mathematician. Not that it is a bad book; it is just the most honestly titled book I ever saw. Mac Lane writes for the working mathematician, so not even a graduate student in maths exactly fits his standards.

• For a glimpse of the structuring power of category theory and its links to physics and computer science, Physics, Topology, Logic and Computation: A Rosetta Stone by Baez and Stay is the place to go. This paper also argues eloquently that the most important categories are not the ones similar to the category of sets.

• A short one that I like is Basic Category Theory for Computer Scientists by Pierce. It is short, to the point, and goes deeper into the applications to theoretical computer science. One caveat is that Pierce is the kind of computer scientist who studies proof theory and teaches Coq and theorem proving. So it might be slightly too abstract for some people.

• If I understand your examples correctly, one way a classifier can be manipulative is by learning to control its training protocol/environment? Does this mean that a fixed training protocol (without changes in the training sets or programmer interventions) would forbid this kind of manipulation?

This still might be a problem, since some approaches to AI Safety rely on human supervision/​intervention.

• So your path-based approach to category theory would be analogous to the matrix-based approach to group theory in physics? That is, removing the abstraction that made us stumble into these concepts in the first place, and keeping only what is of use for our applications?

I would like to see that. I’m not sure that your own proposition is the right one, but the idea is exciting.

• This is interesting, because at least in theoretical computer science, verifying something is conjectured to be easier than creating it: take the P vs NP question, where almost all complexity theorists conjecture that P is not equal to NP. That is to say, some problems in NP (problems for which we can verify a solution in polynomial time) are conjectured not to be in P (problems for which we can find a solution in polynomial time).
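
To make the asymmetry concrete, here is a small Python sketch for subset-sum, a classic NP-complete problem (function names and the example numbers are my own illustration): checking a proposed subset is fast, while the only known general way to find one is essentially exhaustive search.

```python
from itertools import combinations

def verify(nums, target, certificate):
    # Polynomial-time check: is the certificate a subset of nums summing to target?
    return all(x in nums for x in certificate) and sum(certificate) == target

def solve(nums, target):
    # Brute-force search over all subsets: exponential in len(nums);
    # no polynomial-time algorithm is known for the general case.
    for r in range(len(nums) + 1):
        for subset in combinations(nums, r):
            if sum(subset) == target:
                return list(subset)
    return None

nums = [3, 34, 4, 12, 5, 2]
cert = solve(nums, 9)               # exponential-time search
print(cert, verify(nums, 9, cert))  # [4, 5] True — the verification itself is fast
```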

On the other hand, your examples hint at cases where verifying something (the quality of the product for example) is almost as hard as creating this thing (building a quality product).

Not sure if this adds anything to the conversation, but I found the connection surprising.

• Also, I am pretty sure that the xkcd example is wrong. Mathematically, the entropy of the second password should be lower, because we can guess the next letters of the words from dictionary analysis, or even from frequencies of next letters in languages like English. And practically, dictionary attacks are pretty much built for breaking passwords like the latter.

The standard for root passwords in my local community is more along the lines of finding a very long sentence and taking letters from each word (the first letter in the easiest scheme, but it can get harder) to build a long password that is both hard to guess and relatively easy to remember.

• When taking the digital evolution example, it seems that by “manipulation”, you mean that the tricks used by the programmers at training time did not prevent the behavior they were supposed to prevent. Which fits the intuitive notion of manipulative.

Then, we might see these examples as evidence that even classifiers, arguably the simplest of learning algorithms with the least amount of interaction with the environment, cannot be steered away from maximizing their goal by ad-hoc variations in the training protocol. Is that what your point was?

Somehow, I also see your examples (at least the first one, and the digital evolution one) as indicative that classifiers are robust against manipulations by their programmers. Because these classifiers are trying to maximize their predictive accuracy or their fitness, and the programmers are trying to make them maximize something else. Hence, we can see the behavior of these classifiers as “fault-tolerant”, in a way.

Though this can be a big issue when the target of prediction is not exactly what we wanted, and thus we would like to steer it to something different.

• Your point is that it all boils down to accountability, then. Not because of justice, but because failing at some aspects of your job for which you are held accountable by people on the outside (like not delivering the mail for the mail company, or polluting for the eco-friendly company) makes you vulnerable, and thus is really dangerous for your self-interest.

The fully cynical worldview is a bit too much for me, but I feel this explains a lot within this view.

• You’re right. I was thinking on the level of letters, but the fact that he gives the same number of bits of entropy to four quite different words should have alerted me. And with around 2000 common words to choose from, the entropy is indeed around 11 bits per word.
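
For reference, the back-of-envelope arithmetic (the xkcd comic assumes a word list of 2048 common words, which gives exactly 11 bits per word):

```python
import math

# Entropy of a passphrase of four words drawn uniformly and independently
# from a 2048-word list, as in xkcd 936's "correct horse battery staple".
word_list_size = 2048
bits_per_word = math.log2(word_list_size)  # 11.0
passphrase_bits = 4 * bits_per_word        # 44.0
print(bits_per_word, passphrase_bits)      # 11.0 44.0
```

The key point is that the entropy comes from the uniform choice of words, not from the letters inside them, so knowing the word list does not help the attacker below 44 bits.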

Thanks for the correction!

(For our local passwords, the sentences tend to be invented, to avoid some basic dictionary attacks, and they tend to be complex and full of puns. But you might be right about the entropy loss in this case.)

• The existence of problems whose answers are hard to verify does not entail that this verification is harder than finding the answer itself. Do you have examples of the latter case? Intuitively, it seems akin to comparing any (deterministic) complexity class with its non-deterministic version, where any problem solvable in the former is verifiable in the latter, by dropping the proof and just solving the problem.

For the difference between verifying a proof and an answer, I agree that interactive protocols are more appropriate for the discussion we’re having. Even if interactive protocols are not about distinguishing between different experts, they might serve this point indirectly by verifying the beauty of a car design or the security of a system. That is, we could (in theory) use interactive proofs to get convinced with good probability of the quality of a candidate-expert’s output.

• Could you give a list of some open problems or open questions related to this agenda (maybe with some pointers to the more relevant posts)? I am potentially interested in working on it, but I find it far easier to study a topic (and you sir write a lot of technical posts) while trying to solve some concrete problem.

• Okay, so we agree that it’s improbable (at least for decision problems) for verifying an answer to be harder than finding it. What you care about are cases where verification is strictly easier, as is conjectured for example for NP (where verification is polynomial, but finding an answer is believed not to be).

For IP, if we only want to verify a real-world property, I actually have a simple example I give in my intro to complexity theory lectures. Imagine that you are color-blind (precisely, a specific red and a specific green look exactly the same to you). If I have two balls, perfectly similar except that one is green and the other is red, I can convince you that these balls are of different colors. It is basically the interactive protocol for graph non-isomorphism: you flip a coin, and depending on the result, you exchange the balls without me seeing it. If I can tell, over a sufficient number of rounds, whether you exchanged the balls, then you should be convinced that I can actually differentiate them.

Of course this is not necessarily applicable to questions like tastes. Moreover, it is a protocol for showing that I can distinguish between the balls; it does not show why.
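
The colored-balls protocol is easy to simulate; here is a minimal Python sketch (the function name, round count, and seed are my own): a prover who can really see the colors answers every round correctly, while a color-blind prover survives k rounds only with probability 2⁻ᵏ.

```python
import random

def run_protocol(prover_can_distinguish, rounds=20, seed=0):
    # The color-blind verifier flips a secret coin each round and either
    # swaps the two balls or leaves them in place; the prover must say which.
    rng = random.Random(seed)
    for _ in range(rounds):
        swapped = rng.random() < 0.5
        if prover_can_distinguish:
            answer = swapped             # a sighted prover always answers correctly
        else:
            answer = rng.random() < 0.5  # a color-blind prover can only guess
        if answer != swapped:
            return False  # one wrong answer and the claim is rejected
    # Surviving all rounds by pure luck has probability 2**-rounds (~1e-6 for 20).
    return True

print(run_protocol(True))  # a genuine distinguisher always passes
```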

• Great post! It explained both positions clearly, clarified the potential uses of PAL, and proposed variations where that seemed feasible.

Maybe my only issue is with the (lack of) definition of the principal-agent problem. The rest of the post works relatively well without you defining it explicitly, but I think a short definition (even just a rephrasing of the one on Wikipedia) would make the post even more readable.

• Great post! I want to chew on it a bit before making a longer comment, but I noticed similarities between this post and Nate Soares’s Replacing Guilt sequence (which I consider the most important sequence… ever). More specifically, he seems to say similar things about guilt and shoulds in “should” considered harmful, Not because you “should” and Your “shoulds” are not a duty.

For example, from “should” considered harmful:

I see lots of guilt-motivated people use “shoulds” as ultimatums: “either I get the meds, or I am a bad person.” They leave themselves only two choices: go out of their way on the way to work and suffer through awkward human interaction at the pharmacy, or be bad. Either way, they lose: the should has set them up for failure.
But the actual options aren’t “suffer” or “be bad.” The actual options are “incur the social/​time costs of buying meds” or “incur the physical/​mental costs of feeling ill.” It’s just a choice: you weigh the branches, and then you pick. Neither branch makes you “bad.” It’s ok to decide that the social/​time costs outweigh the physical/​mental costs. It’s ok to decide the opposite. Neither side is a “should.” Both sides are an option.

Or the idea of preferring to punish someone (myself or another) instead of actually looking at the situation and accepting it makes me think of tolerification:

There’s a certain type of darkness in the world that most people simply cannot see. It’s not the abstract darkness: people will readily acknowledge that the world is broken, and explain how and why the hated out-group is responsible. And that’s exactly what I’m pointing at: upon seeing that the world is broken, people experience an impulse to explain the brokenness in a way that relieves the tension. When seeing that the world is broken, people reflexively feel a need to explain. Carol can acknowledge that there is suffering abroad, but this acknowledgement comes part and parcel with an explanation about why she bears no responsibility. Dave can acknowledge that he failed to pass the interview, but his mind automatically generates reasons why this is an acceptable state of affairs.
This is the type of darkness in the world that most people cannot see: they cannot see a world that is unacceptable. Upon noticing that the world is broken, they reflexively list reasons why it is still tolerable. Even cynicism, I think, can fill this role: I often read cynicism as an attempt to explain a world full of callous neglect and casual cruelty, in a framework that makes neglect and cruelty seem natural and expected (and therefore tolerable).
I call this reflexive response “tolerification,” and if you watch for it, you can see it everywhere.

The approach to these questions in the Replacing Guilt series is not exactly at the same level; most notably, I feel Nate is trying to explain why shoulds are not “useful” and only cause harm, without serving your goals. On the other hand, I see this post as more about examining the exact mechanism underlying this error we make.

Still, I feel the connection is strong enough to encourage people to read both.

• Yes, I agree that you are focusing more on how to see the mistake in a meta way, instead of from an outside view as Nate does.

Though I don’t think your example of the distinction is exactly the right one: the idea from Nate of banning “should” or cashing out “should” would be able, IMHO, to unearth the underlying “I should be taking things seriously” and apply the consequentialist analysis of “you will not be measured by how you felt or who you punished. You will be measured by what actually happened, as will we all” (paraphrasing). What I feel is different is that the Way provides a means for systematically finding this underlying should and explaining it from the inside.

Nonetheless, I find both useful, and I am better for having the Curse of the Counterfactual in my mental toolbox.