A basic mathematical structure of intelligence

An important concept here on LW is that of a singularity of intelligence, or at least a very rapid growth. Although it seems mostly hopeless, it would be nice if we could find a mathematical approach to quantify these things. I think the first point to note is that intelligence is surely not one-dimensional. The concept of general intelligence suggests that it might be approximately one-dimensional, in some sense. But how can we really know what is true? I would like to get a better understanding of the geometry of intelligence. To do this, a good starting point is to look for natural operations on intelligence, as operations can lead to geometry.

I see two obvious natural operations on intelligence: Acceleration and Cooperation. Acceleration means speeding up an intelligence or giving it more computation/​physical time. Cooperation means taking two intelligences and letting them cooperate on a task.

You may note that this is somewhat similar to sequential and parallel computing, although I cannot rigorously state the connection yet.

My main message in this post is that the “set of intelligences” with the operations of acceleration and cooperation has the mathematical structure of a semimodule. I believe that a further mathematical study of the “set of intelligences” is likely to take note of this fundamental structure.

In the rest of the post I will elaborate on what exactly this means.

Consider an abstract set $\mathcal{A}$ of all “intelligent agents”, where I do not want to give an explicit definition. I think there are many equivalent and non-equivalent ways to define this, but for now an intuitive idea suffices. An intelligent agent is a thing that can be given a task and an amount of time or computation to solve it. The result can then be scored. We therefore think of a task as a function $T: \mathcal{A} \to \mathbb{R}$ that assigns to each agent its score.

Given an agent $a \in \mathcal{A}$ and a set of tasks $\mathcal{T}$, the overall performance of the agent is represented by the induced function $\mathcal{T} \to \mathbb{R}$, $T \mapsto T(a)$. It is this function which grows during a rapid growth of intelligence, and we would like to know exactly how it might grow. I cannot answer this, but finding structure is a good start.

We endow the set $\mathcal{A}$ with an operation $\oplus: \mathcal{A} \times \mathcal{A} \to \mathcal{A}$. For $a, b \in \mathcal{A}$, the agent $a \oplus b$ represents a team consisting of $a$ and $b$, which may now collaborate on any task. Again, I do not wish to rigorously define this, but an example of a rigorous definition could be to model the agents as entities that write into a memory while they still have time/​computation left, and that may indicate something they wrote as an answer. The collaboration of two agents is then given by having them both write into a shared memory. They might collaborate and perform better at certain tasks than each on their own, but a priori there also exist agents which simply have no concept of this and do not collaborate, or which misunderstand or even deceive their partner. In general, for some task $T$ the inequality

$$T(a \oplus b) \geq \max(T(a), T(b))$$

might fail. More on this later.

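Acceleration is represented by a semigroup $(\mathbb{R}_{>0}, \cdot)$. Specifically, for an agent $a \in \mathcal{A}$ and a number $\lambda \in \mathbb{R}_{>0}$ we define an agent $\lambda a$ which represents $a$ but sped up by a factor $\lambda$, or equivalently given more time/​computation by a factor $\lambda$.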

We now state a number of axioms that the structure $(\mathcal{A}, \oplus)$, together with acceleration, should intuitively fulfill.

(A1) We have $a \oplus b = b \oplus a$ for all $a, b \in \mathcal{A}$, i.e. cooperation is a commutative operation.

(A2) We have $(a \oplus b) \oplus c = a \oplus (b \oplus c)$ for all $a, b, c \in \mathcal{A}$, i.e. cooperation is associative (note that we assume that cooperation happens in real-time or in a large number of “turns”, meaning that it does not matter who goes first and who goes last when exchanging information).

(A3) There exists a trivial agent $0 \in \mathcal{A}$ so that $a \oplus 0 = a$ for all $a \in \mathcal{A}$. This trivial agent simply gives no answer and contributes nothing to solving any task.

This makes $(\mathcal{A}, \oplus)$ a commutative semigroup with a neutral element, i.e. a commutative monoid. There are two axioms connecting our structures.

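(S1) $\lambda (\mu a) = (\lambda \mu) a$ for all $\lambda, \mu \in \mathbb{R}_{>0}$ and $a \in \mathcal{A}$. It does not matter if you accelerate a model 2x and then 3x, or just once 6x. This means that $\mathbb{R}_{>0}$ is a semigroup acting on $\mathcal{A}$.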

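(S2) $\lambda (a \oplus b) = \lambda a \oplus \lambda b$ for all $\lambda \in \mathbb{R}_{>0}$ and $a, b \in \mathcal{A}$. This is an axiom of distributivity which states that the order of the actions of acceleration and cooperation is interchangeable.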

With all these axioms, we can say that $\mathcal{A}$ is an $\mathbb{R}_{>0}$-semimodule. What does this mean? For a start, you can think of it as almost a vector space, except that there are no inverse elements, i.e. we have semigroups instead of groups, and the distributivity axiom

(S3) $(\lambda + \mu) a = \lambda a \oplus \mu a$ for all $\lambda, \mu \in \mathbb{R}_{>0}$ and $a \in \mathcal{A}$

does not hold. In fact, it is quite important that in general $2a \neq a \oplus a$. The standard examples are non-parallelizable tasks. One might in fact define that an agent $a$ parallelizes over a task $T$ if $T(na) = T(a \oplus \dots \oplus a)$, with $n$ copies of $a$ on the right, for every $n \in \mathbb{N}$. This is not supposed to be a good definition; I just want to demonstrate that useful definitions may pop out of this approach.
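To make the axioms concrete, here is a minimal toy model in Python. It is only a sketch under an assumption that is not part of the post: an agent is modelled as a multiset of worker speeds, cooperation pools the workers, and acceleration scales every speed. The names `coop`, `accel` and `TRIVIAL` are hypothetical.

```python
from fractions import Fraction

# Toy model (an assumption for illustration): an agent is a sorted tuple
# of worker speeds. Sorting makes the tuple behave like a multiset.
TRIVIAL = ()  # the agent that contributes nothing


def coop(a, b):
    """Cooperation: pool the workers of both agents."""
    return tuple(sorted(a + b))


def accel(lam, a):
    """Acceleration: every worker runs lam times faster."""
    return tuple(sorted(Fraction(lam) * s for s in a))


a = (Fraction(1),)
b = (Fraction(2), Fraction(3))
c = (Fraction(5),)

# Cooperation is commutative, associative, and has a neutral element.
assert coop(a, b) == coop(b, a)
assert coop(coop(a, b), c) == coop(a, coop(b, c))
assert coop(a, TRIVIAL) == a

# (S1), (S2): acceleration is a semigroup action that distributes over cooperation.
assert accel(2, accel(3, b)) == accel(6, b)
assert accel(2, coop(a, b)) == coop(accel(2, a), accel(2, b))

# (S3) fails: one worker at double speed is not two workers at normal speed.
assert accel(1 + 1, a) != coop(accel(1, a), accel(1, a))
```

In this toy model the failure of (S3) is visible directly in the data: doubling the speed gives a single fast worker, while cooperating with a copy gives two slow ones.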

I would now like to state a number of inequalities which are certainly not true for all agents and tasks, but which I believe should be true for all reasonable tasks and for all agents which have some form of general intelligence, understanding of cooperation, and actually want to perform well on the task. Whether and when these inequalities hold may be a question of interest, and a potential source of new definitions.

(H1) $T(\lambda a) \geq T(a)$ if $\lambda \geq 1$, and $T(\lambda a) \leq T(a)$ if $\lambda \leq 1$. More computation time means better performance.

(H2) $T(a \oplus b) \geq \max(T(a), T(b))$. This means that you manage to actually cooperate beneficially.

(H3) $T(na) \geq T(a \oplus \dots \oplus a)$, with $n$ copies of $a$ on the right, for any $n \in \mathbb{N}$. This means that cooperating with copies of yourself is never superior to being alone, but accelerated accordingly.
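The three inequalities above, and the notion of an agent parallelizing over a task, can be checked in a small Python sketch. Everything here is an assumption made for illustration: agents are tuples of worker speeds, and the two tasks `serial_task` and `parallel_task`, as well as the helpers `coop`, `accel`, `copies` and `parallelizes`, are hypothetical names rather than definitions from the post.

```python
from fractions import Fraction


def coop(a, b):
    """Cooperation: pool the workers of both agents."""
    return tuple(sorted(a + b))


def accel(lam, a):
    """Acceleration: every worker runs lam times faster."""
    return tuple(sorted(Fraction(lam) * s for s in a))


def copies(a, n):
    """A team of n copies of the agent a."""
    team = ()
    for _ in range(n):
        team = coop(team, a)
    return team


def serial_task(agent):
    """Non-parallelizable task: only the fastest worker counts."""
    return max(agent, default=Fraction(0))


def parallel_task(agent):
    """Perfectly parallelizable task: the speeds add up."""
    return sum(agent, Fraction(0))


def parallelizes(agent, task, max_n=5):
    """Check task(n * agent) == task(n copies of agent) for n = 1..max_n."""
    return all(
        task(accel(n, agent)) == task(copies(agent, n))
        for n in range(1, max_n + 1)
    )


a = (Fraction(1), Fraction(2))
b = (Fraction(3),)

for task in (serial_task, parallel_task):
    # More computation means better performance.
    assert task(accel(2, a)) >= task(a)
    # Cooperation is at least as good as the better partner alone.
    assert task(coop(a, b)) >= max(task(a), task(b))
    # Copies of yourself never beat the accelerated single agent.
    assert all(task(accel(n, a)) >= task(copies(a, n)) for n in range(1, 6))

assert parallelizes(a, parallel_task)
assert not parallelizes(a, serial_task)
```

Both toy tasks satisfy all three inequalities, but only the additive task is one the agent parallelizes over; on the serial task the accelerated agent is strictly better than a team of copies.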

Here are a number of questions that are potential directions for new insight:

(Q1) Are there other obvious/​natural operations on intelligence? How do they augment the structure of our semimodule? Could there be a multiplication of agents?

(Q2) Now that we have an algebraic structure on the set of agents, what about the set of tasks? Can we define morphisms from agents to tasks? Note that the maps here are generally non-linear.

(Q3) Are there other intuitive inequalities? Can we decompose the tasks into various categories using such definitions (parallelizable, non-parallelizable etc.)?

(Q4) Are there already natural geometries we can define on this structure? (Metrics, topologies or inner products?)

(Q5) How might a blow-up of intelligence look? Perhaps we should only expect blow-up of the form $T(a \oplus \dots \oplus a)$ but not $T(\lambda a)$, as an AGI can copy itself and cooperate but not trivially speed up its computing hardware $\lambda$-fold. This is what I mean when I say that the performance function $T \mapsto T(a)$ might blow up in a “certain way”, i.e. only on parallelizable tasks.
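The last question can be illustrated with a deliberately crude Python sketch. It assumes, purely for illustration, that an agent is a tuple of worker speeds and that copying the agent repeats that tuple; `copies`, `serial_task` and `parallel_task` are hypothetical names.

```python
# Toy assumption: an agent is a tuple of worker speeds, and a team of
# n copies simply repeats that tuple n times.
def copies(agent, n):
    return agent * n


def serial_task(agent):
    """Non-parallelizable task: only the fastest worker counts."""
    return max(agent, default=0)


def parallel_task(agent):
    """Perfectly parallelizable task: the speeds add up."""
    return sum(agent)


agent = (1, 2)
team_sizes = [1, 10, 100, 1000]

parallel_scores = [parallel_task(copies(agent, n)) for n in team_sizes]
serial_scores = [serial_task(copies(agent, n)) for n in team_sizes]

print(parallel_scores)  # [3, 30, 300, 3000]
print(serial_scores)    # [2, 2, 2, 2]
```

Under these assumptions, copying alone makes the score grow without bound on the additive task and not at all on the serial one, which is the kind of selective blow-up the question has in mind.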

Thank you for reading my post!