Loose Threads on Intelligence

Epistemic Status: Unfinished deep-dive into the nature of intelligence[1]. I committed to writing down my research path, but three weeks in I don’t have a coherent answer to what intelligence is, and I do have a next question I want to dig into instead. Thus, here are the rough and rambly threads on intelligence that I’ve gathered. This piece is less polished than I’d like because of the writing-vs-research trade-off. Skimming might be more productive than a full read!

Thread 1: Intelligence as path finding through reality

Intelligence is path finding through world states, where ‘path finding’ is a poetic term for optimization. Taking a closer look at optimization, it turns out that bad optimizers are still optimizers. Essentially, optimizers do not need to be optimal.

There exist three categories of optimization techniques:

  1. optimization algorithms (finitely terminating)

  2. iterative methods (convergent)

  3. heuristics (approximate solutions, but no guarantee)

Genetic algorithms and evolutionary algorithms are optimization heuristics. Thus we can trace our past back to the primordial soup through simpler and simpler optimization techniques, and we can project our future to the singularity through the creation of better and better ones. Humans are one point on this scale of increasingly sophisticated optimization techniques instantiated in reality.
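To make the "bad optimizers are still optimizers" point concrete, here's a minimal sketch of a heuristic optimizer (a toy random hill climber of my own, not anything from the course): it carries no guarantee of finding the optimum, yet it still optimizes.

```python
import random

def hill_climb(f, x0, step=0.1, iters=1000, seed=0):
    """Random hill climbing: a heuristic optimizer with no optimality guarantee."""
    rng = random.Random(seed)
    x, best = x0, f(x0)
    for _ in range(iters):
        candidate = x + rng.uniform(-step, step)  # blind local perturbation
        value = f(candidate)
        if value > best:                          # keep any improvement, however small
            x, best = candidate, value
    return x, best

# Maximize a simple concave function; the heuristic gets close, not exact.
x, val = hill_climb(lambda x: -(x - 3.0) ** 2, x0=0.0)
```

The point of the sketch: nothing here is optimal or even clever, but it still moves through the space of states toward better ones, which is all "optimizer" requires.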

Each of the optimization techniques can in turn be instantiated in three different ways:

  • Mechanical

  • Computational

  • Collective

I made these up—There must be an existing framework that outlines something like this. Or maybe I’m misunderstanding the concept of optimization or how one can categorize the types of instantiations. Either way, here is what I mean by each:

Mechanical optimization cannot learn. It’s a tree growing toward the light or a water wheel generating power.

Computational optimization can learn but cannot be divided. It can compute all computable functions (a Turing machine, or a human with pen and paper). However, if you break up the parts doing the cognitive processing, no computation will take place.

Collective optimization can be divided. Every unit can implement mechanical or computational optimization in itself, and the units work together, emergently or through coordination, toward a greater result than the individual pieces could achieve. For instance, a fungus can be split in two such that both halves will keep growing and functioning as individuals. A flock of birds can be split in two such that both halves will coordinate their flight in the same manner as when they were one. And of course, human societies can be split in two and both halves will coordinate again into societies.


The structure of deep learning mimics the structure of intelligence as path finding through world states

Intelligence = Mapping current world state to target world state (or target direction)

Deep learning = Mapping input layer to output layer

This seems analogous to me, but maybe it’s not. My reasoning is that deep learning relies on hidden layers between the in- and output layers. Learning consists of setting the right weights between all the neurons in all the layers. This is analogous to my understanding of human intelligence as path finding through reality: in machine learning, a neural network finds the function that maps inputs to outputs; in human intelligence, we look for the actions that map the current state of reality to a desired future state (or direction through reality).

Maybe this is a tautological non-insight.
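Still, the mapping half of the analogy is easy to make concrete. A minimal sketch (a single linear "layer", so a degenerate network, on a made-up target) in which gradient descent path-finds from ignorant weights to the mapping y = 2x:

```python
import numpy as np

# A network is a learned mapping from inputs to outputs; here one weight.
# Gradient descent "finds the path" from knowing nothing to fitting the target.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 1))
y = 2.0 * X                      # the target mapping the network must discover

w = np.zeros((1, 1))             # start with no knowledge of the mapping
lr = 0.5
for _ in range(200):
    pred = X @ w                       # current guess at the mapping
    grad = X.T @ (pred - y) / len(X)   # gradient of mean squared error
    w -= lr * grad                     # step toward a better mapping
```

After training, `w` sits at (approximately) 2: the network has found the function connecting input to output, which is the same shape of problem as finding actions connecting current to desired world states.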


Segue on data augmentation

Data augmentation is transforming input data such that the network can learn to recognize more forms of that data and extract different features from it. Is human imagination and “thinking through different ways past events might have gone” a form of data augmentation? We perturb a memory and then project out how we would have felt and what we would have wanted to do. This seems quite similar to using simulation to generate and improve predictions.
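A minimal sketch of this kind of augmentation, under the (loudly flagged) toy assumption that a "memory" is just a sequence of numbers we perturb into counterfactual variants:

```python
import random

def augment(sequence, rng):
    """Perturb one element of a 'memory' to generate a counterfactual variant."""
    variant = list(sequence)
    i = rng.randrange(len(variant))
    variant[i] = variant[i] + rng.choice([-1, 1])   # the counterfactual tweak
    return variant

rng = random.Random(42)
memory = [3, 1, 4, 1, 5]
# Each variant is the original memory with exactly one event nudged.
variants = [augment(memory, rng) for _ in range(5)]
```

One memory becomes several near-miss training examples, which is exactly what image augmentation (flips, crops, noise) does for a vision model.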


Thread 2: Alignment as preference mapping

My core insight here is 4 reasoning steps followed by an intuitive leap:

Neural Networks can encode any computable function.

Our neural activity is a computable function.

Our utility function is encoded in our neural activity.

Thus a neural network can encode our utility function.

Beep-boop-brrrrrr—MAGIC LEAP:

An aligned AGI is one that has learned the function that maps our neurally encoded utility function to observable world states.

This seems true to me but maybe is not—Loose threads indeed.
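The first premise, at least, is easy to illustrate in miniature: a hand-set two-neuron ReLU network that exactly encodes the computable function f(x) = |x|, using the identity |x| = relu(x) + relu(-x):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

# Hidden layer: two neurons with weights +1 and -1; output layer sums them.
W1 = np.array([[1.0], [-1.0]])
W2 = np.array([[1.0, 1.0]])

def net(x):
    """Two-layer network computing |x| exactly, for any input."""
    h = relu(W1 @ np.atleast_2d(x))
    return (W2 @ h).ravel()

xs = np.linspace(-2, 2, 9)
```

This is of course a trivial function; the leap in the argument is from "can encode any computable function" to "will in practice learn the right one", which the sketch says nothing about.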


Alignment and measurement error of human-in-the-loop

Alignment is human preference profiling performed by an artificial intelligence. In preference profiling, you need to decide which input parameters you will use to predict the output parameter (preferences, in this case for world states instead of products). Input parameters can be behavioral, linguistic, or biological. They can also be directly elicited or indirectly observed. Behavioral and linguistic measures are imprecise because a human’s actions are only the end result of how their cognitive ability and conflicting drives happen to converge. A lot of actions are suboptimal because humans are not good optimizers. Thus the most reliable signal of the human utility function is either:

  • Aggregation over a large enough sample that all the noise is cancelled out

  • Direct biological measures of our utility function

However, who says there are no systematic biases and errors in our behavior that do not cancel out over large samples?

And who says that observing our utility function directly won’t change it through observation? Our experiences change us, and if our experiences are limited to being measured in a lab room, then this will not represent anything current humans consider to be our utility function.
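The systematic-bias worry is easy to simulate with made-up numbers: zero-mean noise averages away over a large sample, while a bias of modest size survives aggregation untouched.

```python
import random

rng = random.Random(0)
true_pref = 1.0
n = 100_000

# Zero-mean noise cancels out in aggregate; a systematic bias does not.
noisy  = [true_pref + rng.gauss(0, 1.0) for _ in range(n)]
biased = [true_pref + 0.3 + rng.gauss(0, 1.0) for _ in range(n)]

noisy_mean  = sum(noisy) / n    # recovers the true preference
biased_mean = sum(biased) / n   # stuck 0.3 away, no matter how large n gets
```

No sample size rescues you from the second case; aggregation only buys you anything against the noise you assumed was symmetric.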

Notably, RLHF relies on linguistic (and/or behavioral) mappings, so it walks our humans-in-the-loop straight into the faulty mapping between what we actually want and our words and actions.


Human Utility Functions are more hyper- than parameter

Hyperparameters are parameters that range over your other parameters. For instance, the learning rate is the parameter that controls how much you update the weights in a neural network at each step. Human utility functions seem to have hyperparameters too, which makes conceptualizing and encoding them complicated, to say the least. Specifically, humans gain utility directly from various stimuli and observations, like eating sweet food or looking at puppies. These would be the parameters of the human utility function. But much of the utility people strive for is not this direct hedonic payoff. Instead, we have many (scarcely known) hyperparameters, where the utility we get from our observations comes from the transformation and evaluation of one or many sets of observations. For instance, the satisfaction of a job well done relies on observing the entire process and then evaluating the end result as good. Similarly, many observations that consist of directly negative stimuli (parameters) are evaluated as positive by some hyperparameter, such as the meaningfulness of childbirth or the beautiful release of a funeral.

The evaluation of the aggregates of our observations even changes our biochemistry, such that hyperparameters influence the parameters of direct experience. For instance, evaluating someone’s social cues as them liking you can directly generate feelings of relaxation that are physiologically embodied and thus direct parameters, while the exact same encounter, if evaluated negatively, could cause tension in the body that is also a direct parameter. Thus the exact same stimuli result in completely different reward signals purely based on the settings of the hyperparameters that control how a set of observations is transformed and then evaluated.
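As a toy sketch of that sign flip (my own stylized model, not a claim about real neuroscience): the same raw stimuli, run through an evaluative hyperparameter, produce opposite reward.

```python
def reward(stimuli, evaluation):
    """Toy utility: raw hedonic parameters summed, then re-weighted by an
    evaluative 'hyperparameter' applied to the whole set of observations."""
    raw = sum(stimuli)
    return evaluation * raw   # same stimuli; the evaluation sets the sign

encounter = [-2.0, -1.0, -0.5]   # directly negative stimuli (pain, effort, fatigue)

meaningful = reward(encounter, evaluation=-1.0)  # evaluated as meaningful
aversive   = reward(encounter, evaluation=+1.0)  # evaluated as purely bad
```

Identical parameters in, opposite reward out, depending only on the hyperparameter. Anything trying to learn the function from stimuli alone, without the internal evaluation, is missing the decisive variable.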

Thus it seems conceptually straightforward to map the parameters of our utility function into something learnable by an AGI, but it’s much less clear how we’d map the ever-fickle hyperparameters of our utility function, which entirely hinge on the evaluations and transformations we ourselves apply to our experiences. It’s a value we compute internally, and getting the exact same result would require the AGI to simulate us as full-bodied beings. This would be undesirable because such a simulation could then validly suffer as much as we ourselves do. And thus we don’t want to map our utility function directly to an AGI but use some proxy. The only sensible proxy is then “do as I say, don’t do as I seem to want to do”, which boils down to needing corrigibility.


Collaborative Filtering on Values?

Collaborative filtering finds the latent factors for how to match two types of things together (like humans and movies). Are there latent factors to humans and the values they espouse? If you ran Principal Component Analysis on the values, would you get a few limited clusters? This seems easy to google and has probably been done, but the values were probably not encoded well, and it’s hard to see how one would accurately extract value data from people such that the analysis makes sense and yields useful results.
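For what it’s worth, the mechanics are simple even if the data collection isn’t. A sketch on synthetic data (made-up "value ratings" secretly driven by two hidden factors), showing PCA via SVD recovering the low-dimensional structure:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic survey: 200 people rate 6 values, but the ratings are really
# generated by 2 latent factors (hypothetical, e.g. tradition vs openness).
factors = rng.normal(size=(200, 2))
loadings = rng.normal(size=(2, 6))
ratings = factors @ loadings + 0.1 * rng.normal(size=(200, 6))

# PCA via SVD on the centered matrix: variance explained by 2 components.
centered = ratings - ratings.mean(axis=0)
s = np.linalg.svd(centered, compute_uv=False)
explained = (s[:2] ** 2).sum() / (s ** 2).sum()
```

On data like this, two components explain nearly all the variance, because I built the latent structure in. The open question is whether real elicited values have any such structure, which no amount of SVD can settle.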


Thread 3: Natural language as data compression

Language is a data compression format that inherently encodes relational properties across abstract entities such that models of reality can be communicated and reasoned about. In contrast, images are sense data, where sense data can be compressed, but inherently is not. Similarly, sense data can encode relational properties across abstract entities, but inherently does not (e.g., a picture of a book with language in it, or a picture of a diagram).

Is it then true that AGI can result from language models but not from image models? The counterargument would be that language models lack grounding in reality. Image models can be grounded in reality because they consume sense data and thus can be hooked up to cameras. However, we’ve created systems that allow sense data to be directly translated to language data and language data to be directly translated to actions. Thus, even though an abstract data compression format like language is not inherently grounded in reality, we have given it eyes and hands such that it can sense and act in the real world without directly consuming sense data or outputting motor data.

So actually, AGI can result from image models that read and write, but that takes many more steps than you’d need when using a language model. Thus AGI from language models will exist before AGI from image models or other sense-data-only models.


What’s mentalese?

Human reasoning happens in “mentalese”. People’s introspection on how they reason is plausibly faulty, but many people have some experience of reasoning in language, imagery, and spatial relations. Are these just a side-effect of reasoning, and does it all take place “under the hood” anyway? Could one reason without having any conscious process of reasoning? Presumably, yes. Is that what the zombie-discussion points to? What happens if we input both language and image data into a big enough neural network? Will reasoning then take place in both? Is there any value in enhancing intelligence with sense data?


Supervised Learning as the bootstrap of collective intelligence

Self-supervised learning is the default form of learning for individual agents embedded in reality. You make a prediction of what reality will look like, and then time passes and you see whether your prediction came true. Or you make a prediction of what reality will look like if you do an action, then you do the action, and see whether the prediction came true.

Supervised learning in contrast is a form of collective intelligence. It only works if another intelligence has already learned the mapping and can thus output the labels for you. So supervised learning is how we bootstrap AI and launch it to a much higher entry point than we as biological organisms could start from. We’ve learned to integrate supervised data since we’re a collective intelligence that uses language (mostly) for coordination. However, self-supervised learning is the only option for an AGI to learn things we don’t know yet.
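A minimal sketch of the self-supervised setup, where the "labels" are just later parts of the data stream (here a toy model learning the step of an arithmetic sequence, no teacher anywhere):

```python
# Self-supervised learning needs no external teacher: the label for each
# prediction is simply what reality does next.
stream = [2, 5, 8, 11, 14, 17]   # the agent's stream of observations

step_estimate = 0.0
lr = 0.7
for prev, nxt in zip(stream, stream[1:]):
    prediction = prev + step_estimate   # predict the future...
    error = nxt - prediction            # ...then reality supplies the label
    step_estimate += lr * error         # update toward what actually happened
```

Supervised learning would be another agent handing over `step = 3` directly; self-supervision extracts it from the stream, which is why it’s the only mode available once you’re past everything the labelers already know.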


Even Looser Threads

Feature engineering seems like a form of pre-processing, and thus not a relevant concept for AGI? We’d expect AGI to learn its own features, which is what kernels in convolutional neural networks do, for instance.
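For instance, a convolution kernel is just a small feature extractor. A sketch with a hand-set edge-detector kernel (the kind of feature a CNN would normally learn on its own rather than have engineered in):

```python
def convolve1d(signal, kernel):
    """Valid 1-D convolution (really cross-correlation, as in CNN libraries)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

# A step in the signal is the "edge" the kernel [-1, 1] responds to.
signal = [0, 0, 0, 1, 1, 1]
edges = convolve1d(signal, [-1, 1])
```

The output fires exactly where the signal steps up; in a trained CNN, the values -1 and 1 would be weights the network arrived at itself, which is the sense in which it does its own feature engineering.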


Overfitting in a neural network is basically memorizing the data set. This lines up with Steve Byrnes’ explanation of almost all of the brain being essentially memory models, but particular types of memory models. But how does this work exactly?
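Taken to the limit, an overfit model is a lookup table: perfect on the training set, useless off it. A toy sketch:

```python
def fit_memorizer(xs, ys):
    """Overfitting taken to the extreme: pure memorization of the data set."""
    table = dict(zip(xs, ys))
    def model(x):
        return table.get(x, 0.0)   # zero generalization off the training set
    return model

xs = [0.0, 1.0, 2.0, 3.0]
ys = [x * x for x in xs]           # true underlying function: f(x) = x^2
model = fit_memorizer(xs, ys)

train_error = sum((model(x) - y) ** 2 for x, y in zip(xs, ys))  # exactly 0
test_error  = (model(1.5) - 1.5 ** 2) ** 2                      # large
```

The interesting question the thread points at is the middle ground: memory models that interpolate and compress rather than store verbatim, which is where generalization actually lives.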


Are System 1 and System 2 reasoning pretty much a 2-piece ensemble? If so, wouldn’t you expect more models? Maybe there are? Maybe we are integrating those lower down? This seems not super relevant.


What’s a positive feedback loop called in the alignment problem? In current ML you already need to watch out for positive feedback loops if the output of the network influences the input it will later get. The given example is a collaborative filtering network that matches users to movies: mostly the matched movies will be watched and thus rated, and thus matched again, etc. This clearly creates a massive issue with AGI that interacts with humans … what is this problem called in the existing alignment literature? I had a concept for this called “manipulation threshold”, meaning some formalization of how much and what kind of influence an AGI is allowed to have on any human when discussing plans that have not been signed off yet (as the other elements will be subsumed by corrigibility).
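A toy simulation of the collaborative-filtering loop (my own stylized setup): two equally likable movies, but only the recommended one can be rated, so an arbitrary early lead compounds into a monopoly.

```python
import random

rng = random.Random(0)

# Two movies with equal true appeal; the recommender always shows the
# current front-runner, and only shown movies can collect ratings.
ratings = {"a": 1, "b": 1}
for _ in range(500):
    shown = max(ratings, key=ratings.get)   # recommend the front-runner
    if rng.random() < 0.5:                  # both movies equally likable
        ratings[shown] += 1                 # feedback: exposure begets ratings

share = max(ratings.values()) / sum(ratings.values())
```

The winner is determined by tie-breaking on the first step, not by quality, and the loser never gets shown again. Replace "movies" with "beliefs the AGI nudges its overseers toward" and the stakes of the unnamed problem are clear.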

  1. ^

    The deep-dive consisted of running through the Fast.AI course in 2 weeks, generating my own spin-off questions from that, and then conceptually working through the nature of intelligence from scratch on my own, googling little bits and pieces as I went. The main body text will contain as many references as I can recall to link insights together and to sources.