Abe Dillon

Karma: 79

Abe Dillon 26 Jun 2019 1:16 UTC
24 points
in reply to: nostalgebraist’s comment on: Embedded World-Models
I think that grappling with embeddedness properly will inevitably make theories of this general type irrelevant or useless
I disagree. This is like saying, “we don’t need fluid dynamics, we just need airplanes!”. General mathematical formalizations like AIXI are just as important as special theories that apply more directly to real-world problems, like embedded agents. Without a grounded formal theory, we’re stumbling in the dark. You simply need to understand it for what it is, a generalized theory; then most of the apparent paradoxes evaporate.
Kolmogorov complexity tells us there is no such thing as a universal lossless compression algorithm, yet people happily “zip” data every day. That doesn’t mean Kolmogorov wasted his time coming up with his general ideas about complexity. Real world data tends to have a lot of structure because we live in a low-entropy universe. When you take a photo or record audio, it doesn’t look or sound like white noise because there’s structure in the universe. In math-land, the vast majority of bit-strings would look and sound like incompressible white noise.
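A minimal Python sketch of both halves of this point. The counting (pigeonhole) argument shows there are fewer strings shorter than n bits than there are n-bit strings, so no lossless compressor can shrink every input. Even so, zlib shrinks structured text dramatically while leaving noise essentially uncompressed. The sample inputs are arbitrary choices for illustration, with seeded pseudo-random bytes standing in for “white noise”.

```python
import random
import zlib


def count_shorter(n):
    """Number of bit-strings strictly shorter than n bits: 2**0 + ... + 2**(n-1) = 2**n - 1.
    That is one fewer than the 2**n strings of length n, so some n-bit input
    cannot be mapped to a shorter output by any lossless (injective) compressor."""
    return sum(2 ** k for k in range(n))


# Structured data: lots of repetition, like a photo or recording of a low-entropy world.
structured = b"the quick brown fox jumps over the lazy dog. " * 200

# "White noise": pseudo-random bytes with a fixed seed so the run is repeatable.
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(len(structured)))

for label, data in (("structured", structured), ("noise", noise)):
    ratio = len(zlib.compress(data)) / len(data)
    print(f"{label}: compressed to {ratio:.1%} of original size")
```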
The same holds true for AIXI. The vast majority of problems drawn from problem space would essentially be, “map this string of random bits to some other string of random bits” in which case, the best you can hope for is a brute-force tree-search of all the possibilities weighted by Occam’s razor (i.e. Solomonoff inductive inference).
Most “theories of rational belief” I have encountered—including Bayesianism in the sense I think is meant here—are framed at the level of an evaluator outside the universe, and have essentially no content when we try to transfer them to individual embedded agents. This is because these theories tend to be derived in the following way: …
I can’t speak to the motivations or processes of others, but these sound like assumptions without much basis. The reason I tend to define intelligence outside of the environment is because it generalizes much better. There are many problems where the system providing the solution can be decoupled both in time and space from the agent acting upon said solution. Agents solving problems in real-time are a special case, not a general case. The general case is: an intelligent system produces a solution/policy to a problem and an agent in an environment acts upon that solution/policy. An intelligent system might spend all night planning how to most efficiently route mail trucks the next morning, the drivers then follow those routes. A real-time model in which the driver has to plan her routes while driving is a special case. You can think of it as the driver’s brain coming up with the solution/policy and the driver acting on it in situ.
You could make the case that the driver has to do on-line/real-time problem solving to navigate the roads and avoid collisions, etc., in which case the full solution would be a hybrid of real-time and off-line formulation (which is probably representative of most situations). Either way, constraining your definition of intelligence to only in-situ problem solving excludes many valid examples of intelligence.
Also, it doesn’t seem like you understand what Solomonoff inductive inference is. The weighted average is used because there will typically be multiple world models that explain your experiences at any given point in time, and Occam’s razor says to favor shorter explanations that give the same result, so you weight the predictions of each model by 2^(−length of the model in bits). Each extra bit of description halves a model’s weight.
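To make the weighting concrete, here is a toy Python sketch. It assumes a tiny hand-picked set of deterministic hypotheses with made-up description lengths (the names and bit counts are hypothetical). Real Solomonoff induction sums over every program for a universal prefix machine and is incomputable. Hypotheses that contradict the data drop out, and the survivors vote on the next bit with weight 2^(−length).

```python
from fractions import Fraction

# Hypothetical hypotheses: (name, description length in bits, generator of the first n bits).
HYPOTHESES = [
    ("all zeros", 2, lambda n: [0] * n),
    ("repeat 01", 4, lambda n: [i % 2 for i in range(n)]),
    ("0101 then all ones", 10, lambda n: [i % 2 if i < 4 else 1 for i in range(n)]),
]


def predict_one(hypotheses, observed):
    """P(next bit = 1) under a 2**-length prior over deterministic hypotheses.

    Hypotheses that fail to reproduce every observed bit get zero weight;
    raises ZeroDivisionError if none survive."""
    n = len(observed)
    total = Fraction(0)
    says_one = Fraction(0)
    for _name, length_bits, generate in hypotheses:
        bits = generate(n + 1)
        if bits[:n] != observed:
            continue  # falsified by the data
        weight = Fraction(1, 2 ** length_bits)  # shorter program, exponentially more weight
        total += weight
        if bits[n] == 1:
            says_one += weight
    return says_one / total


print(predict_one(HYPOTHESES, [0, 1, 0, 1]))  # the 4-bit "repeat 01" dominates
```

After observing 0101, the 4-bit hypothesis outweighs the 10-bit one by a factor of 2^6, so the mixture puts only 1/65 of its probability on a 1 coming next.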
Concretely, this talk of approximations is like saying that a very successful chess player “approximates” the rule “consult all possible chess players, then weight their moves by past performance.” Yes, the skilled player will play similarly to this rule, but they are not following it, not even approximately! They are only themselves, not any other player.
I think you’re confusing behavior with implementation. When people talk about neural nets being “universal function approximators” they’re talking about the input-output behavior, not the implementation. Obviously the implementation of an XOR gate is different than a neural net that approximates an XOR gate.
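A small Python illustration of the behavior/implementation distinction: a bitwise XOR and a tiny two-layer threshold network with hand-set weights (an OR unit and an AND unit feeding an output unit; the weights are my own choice, not trained) have identical input-output behavior but completely different internals.

```python
def xor_gate(a, b):
    """XOR implemented directly as a bitwise operation."""
    return a ^ b


def step(x):
    """Threshold activation: fire (1) if the input is positive."""
    return 1 if x > 0 else 0


def xor_network(a, b):
    """XOR implemented as a two-layer network of threshold units."""
    h_or = step(a + b - 0.5)   # hidden unit that behaves like OR
    h_and = step(a + b - 1.5)  # hidden unit that behaves like AND
    return step(h_or - h_and - 0.5)  # "OR but not AND"


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_gate(a, b), xor_network(a, b))
```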

Abe Dillon 1 Aug 2019 0:01 UTC
15 points
on: Abe Dillon’s Shortform
Rough is easy to find and not worth much.
Diamonds are much harder to find and worth a lot more.
I once read a post by someone who was unimpressed with the paper that introduced Generative Adversarial Networks (GANs). They pointed out some sloppy math and other such problems and were confused why such a paper had garnered so much praise.
Someone replied that, in her decades of reading research papers, she learned that finding flaws is easy and uninteresting. The real trick is being able to find the rare glint of insight that a paper brings to the table, and to understand how even a subtle idea can move a whole field forward. I kinda sympathize as a software developer.
I remember when I first tried to slog through Marcus Hutter’s book on AIXI, I found the idea absurd. I have no formal background in mathematics, so I chalked some of that up to me not fully understanding what I was reading. I kept coming back to the question (among many others): “If AIXI is incomputable, how can Hutter supposedly prove that it performs ‘optimally’? What does ‘optimal’ even mean? Surely it should include the computational complexity of the agent itself!”
I tried to modify AIXI to include some notion of computational resource utilization until I realized that any attempt to do so would be arbitrary. Some problems are much more sensitive to computational resource utilization than others. If I’m designing a computer chip, I can afford to have the algorithm run an extra month if it means my chip will be 10% faster. The algorithm that produces a sub-optimal solution in milliseconds using less than 20 MB of RAM doesn’t help me. At the same time, if a saber-toothed tiger jumps out of a bush next to me, I don’t have months to figure out a 10% faster route to get away.
I believe there are problems with AIXI, but lots of digital ink has been spilled on that subject. I plan on contributing a little to that in the near future, but I also wanted to point out that it’s easy to look at an idea like AIXI from the wrong perspective and miss a lot of what it truly has to say.
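A toy Python sketch of why a built-in resource penalty would be arbitrary. It assumes a simple score of quality minus (cost per second × runtime); the candidate algorithms, quality numbers, and cost rates are all hypothetical. The same two candidates rank in opposite order depending on a problem-specific rate that no fixed definition of “optimal” could know in advance.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600

# Hypothetical candidates: a fast approximate answer vs. a slow, better one (~10% better).
CANDIDATES = [
    {"name": "quick heuristic", "quality": 0.90, "seconds": 0.001},
    {"name": "month-long search", "quality": 1.00, "seconds": SECONDS_PER_MONTH},
]


def pick_algorithm(candidates, cost_per_second):
    """Pick the candidate that maximizes quality minus a time penalty.

    cost_per_second is problem-specific: that is the whole point."""
    return max(candidates, key=lambda c: c["quality"] - cost_per_second * c["seconds"])


print(pick_algorithm(CANDIDATES, cost_per_second=1e-9)["name"])  # chip design: time is cheap
print(pick_algorithm(CANDIDATES, cost_per_second=1.0)["name"])   # tiger: time is everything
```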

Abe Dillon 2 Aug 2019 23:35 UTC
12 points
on: Occam’s Razor: In need of sharpening?
The Many Worlds interpretation of Quantum Mechanics is considered simple because it takes the math at face value and adds nothing more. There is no phenomenon of wave-function collapse. There is no special perspective of some observer. There is no pilot wave. There are no additional phenomena or special frames of reference imposed on the math to tell a story. You just look at the equations and that’s what they say is happening.
The complexity of a theory is related to the number of postulates you have to make. For instance: Special Relativity is actually based on two postulates:
1. the laws of physics are invariant (i.e. identical) in all inertial frames of reference (i.e. non-accelerating frames of reference); and
2. the speed of light in a vacuum is the same for all observers, regardless of the motion of the light source or observer.
The only way to reconcile those two postulates is if space and time become variables.
The rest is derived from those postulates.
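As a worked example of how much falls out of those two postulates, here is the standard light-clock derivation of time dilation (a textbook sketch, not the whole of SR). A clock bounces light between two mirrors a height L apart. Postulate 1 says the clock works the same way in its own rest frame, and postulate 2 says an observer watching it move at speed v still sees the light travel at c, just along a longer diagonal path:

```latex
\begin{aligned}
\Delta t_0 &= \frac{2L}{c}
  && \text{(clock at rest: light goes up and back a height } L\text{)}\\
\left(\frac{c\,\Delta t}{2}\right)^2 &= L^2 + \left(\frac{v\,\Delta t}{2}\right)^2
  && \text{(clock moving at } v\text{; the light still travels at } c\text{)}\\
\Delta t &= \frac{2L/c}{\sqrt{1 - v^2/c^2}} = \gamma\,\Delta t_0,
  \qquad \gamma = \frac{1}{\sqrt{1 - v^2/c^2}}
\end{aligned}
```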
Quantum Field Theory is based on Special Relativity and the Principle of Least Action (together with the postulates of quantum mechanics).

Abe Dillon 5 Aug 2019 19:11 UTC
11 points
in reply to: Jimdrix_Hendri’s comment on: Occam’s Razor: In need of sharpening?
The idea of counting postulates is attractive, but it harbours a problem...
...we’d still find that each postulate encapsulates many concepts, and that a fair comparison between competing theories should consider the relative complexity of the concepts as well.
Yes, I agree. A simple postulate count is not sufficient. That’s why I said complexity is *related* to it rather than being the number itself. If you want a mathematical formalization of Occam’s Razor, you should read up on Solomonoff’s Inductive Inference.
To address your point about the “complexity” of the “Many Worlds” interpretation of quantum field theory (QFT): The size of the universe is not a postulate of QFT or General Relativity. One could derive what a universe containing only two particles would look like using QFT or GR. It’s not a fault of the theory that the universe actually contains ~ 10^80 particles†.
People used to think the solar system was the extent of the universe. Just over a century ago, the Milky Way Galaxy was thought to be the extent of the universe. Then it grew by a factor of over 100 billion when we found that there were that many galaxies. That doesn’t mean that our theories got 100 billion times more complex.
† Now we know that the observable universe may be only a tiny fraction of the universe at large, which may be infinite. In fact, there are several different types of multiverse that could exist simultaneously.

Abe Dillon 12 Apr 2020 1:21 UTC
6 points
in reply to: Gordon Seidoh Worley’s comment on: Human instincts, symbol grounding, and the blank-slate neocortex
Hey, G Gordon Worley III!
I just finished reading this post because Steve2152 was one of the two people (you being the other) to comment on my (accidentally published) post on formalizing and justifying the concept of emotions.
It’s interesting to hear that you’re looking for a foundational grounding of human values because I’m planning a post on that subject as well. I think you’re close with the concept of error minimization. My theory reaches back to the origins of life and what sets living systems apart from non-living systems. Living systems are locally anti-entropic, which means: 1) according to the second law of thermodynamics, a living system can never be a truly closed system; and 2) life is characterized by a medium that can gather information, such as genetic material.
The second law of thermodynamics means that all things decay, so it’s not enough to simply gather information, the system must also preserve the information it gathers. This creates an interesting dynamic because gathering information inherently means encountering entropy (the unknown) which is inherently dangerous (what does this red button do?). It’s somewhat at odds with the goal of preserving information. You can even see this fundamental dichotomy manifest in the collective intelligence of the human race playing tug-of-war between conservatism (which is fundamentally about stability and preservation of norms) and liberalism (which is fundamentally about seeking progress or new ways to better society).
Another interesting consequence of the ‘telos’ of life being to gather and preserve information is that it inherently provides a means of assigning value to information. That is: information is more valuable the more it pertains to the goal of gathering and preserving information. If an asteroid were about to hit Earth and you were chosen to live on a space colony until Earth’s atmosphere allowed humans to return and start society anew, you would probably favor taking a 16 GB thumb drive with the entire English Wikipedia article text over a server rack full of several petabytes of high-definition recordings of all the reality television ever filmed, because the latter won’t be super helpful toward the goal of preserving knowledge *relevant* to mankind’s survival.
The theory also opens interesting discussions like: if all living things have a common goal, why do things like parasites, conflict, and war exist? Also, how has evolution led to a set of instincts that imperfectly approximate this goal? How do we implement this goal in an intelligent system? How do we guarantee such an implementation will not result in conflict? Etc.
Anyway, I hope you’ll read it when I publish it and let me know what you think!

Abe Dillon 30 Jul 2019 22:34 UTC
6 points
in reply to: nostalgebraist’s comment on: Embedded World-Models
I was arguing that a specific type of fully general theory lacks a specific type of practical value
In that case, your argument lacks value in its own right because it is vague and confusing. I don’t know any theories that fall in the “specific type” of general theory you tried to describe. You used Solomonoff as an example when it doesn’t match your description.
one which people sometimes expect that type of theory to have.
When someone develops a formalization, they have to explicitly state its context and any assumptions. If someone expects to use Kolmogorov complexity theory to write the next hit game, they’re going to have a bad time. That’s not Kolmogorov’s fault.
I’m arguing that certain characterizations of ideal behavior cannot help us explain why any given implementation approximates that behavior well or poorly.
Of course it can. It provides a different way of constructing a solution. You can start with an ideal then add assumptions that allow you to arrive at a more practicable implementation.
For instance, in computer vision, determining how a depth camera is moving in a scene is very difficult if you use an ideal formalization directly, but if you assume that the differences between two point clouds are due primarily to rigid transformations (rotations and translations), then you can use the computationally cheap iterative closest point (ICP) method, based on Procrustes analysis, to approximate the formal solution. Then, when you observe anomalous behavior, your usual suspects will be the list of assumptions you made to render the problem tractable. Are non-rigid transformations dominating the deltas between point clouds? Maybe that’s causing my computer vision system to glitch. Maybe I need some way to detect such situations and/or some sort of fall-back.
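A minimal 2D sketch of that approach in plain Python, assuming rigid motion and a small displacement between clouds. Real depth-camera pipelines work in 3D, use an SVD-based (Kabsch) Procrustes step, and find nearest neighbors with spatial indexes; those details are omitted here. Each ICP iteration guesses correspondences by nearest neighbor, solves the closed-form Procrustes problem for those pairs, and moves the source cloud.

```python
import math


def procrustes_2d(src, dst):
    """Closed-form best rigid fit (rotation angle, translation) mapping src[i] onto dst[i].

    Assumes correspondences are known; minimizes the sum of squared distances."""
    n = len(src)
    sx, sy = sum(p[0] for p in src) / n, sum(p[1] for p in src) / n
    dx, dy = sum(q[0] for q in dst) / n, sum(q[1] for q in dst) / n
    cross = dot = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        px, py, qx, qy = px - sx, py - sy, qx - dx, qy - dy  # center both clouds
        cross += px * qy - py * qx
        dot += px * qx + py * qy
    theta = math.atan2(cross, dot)  # angle maximizing alignment of centered points
    c, s = math.cos(theta), math.sin(theta)
    # Translation sends the rotated source centroid onto the target centroid.
    return theta, (dx - (c * sx - s * sy), dy - (s * sx + c * sy))


def apply(theta, t, pts):
    """Rotate points by theta about the origin, then translate by t."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + t[0], s * x + c * y + t[1]) for x, y in pts]


def icp_2d(src, dst, iterations=20):
    """Iterative closest point: match by nearest neighbor, fit, move, repeat."""
    cur = list(src)
    for _ in range(iterations):
        matched = [min(dst, key=lambda q: (q[0] - x) ** 2 + (q[1] - y) ** 2)
                   for x, y in cur]
        theta, t = procrustes_2d(cur, matched)
        cur = apply(theta, t, cur)
    return cur


cloud = [(i, 0.1 * i * i) for i in range(10)]
moved = apply(0.05, (0.2, -0.1), cloud)  # the "camera" rotated and shifted a little
aligned = icp_2d(cloud, moved)
print(max(math.dist(a, b) for a, b in zip(aligned, moved)))
```

When the rigid, small-motion assumptions hold, the nearest-neighbor guesses are right and the fit is essentially exact; large or non-rigid deltas are exactly where the wrong correspondences (and the glitches) come from.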
Not only that, but there are many other reasons to formalize ideas like intelligence other than to guide the practical implementation of intelligent systems. You can explore the concept of intelligence and its bounds.
Again, if you understand a tool for what it is, there’s no problem. Of course trying to use a purely formalized theory directly to solve real-world problems is going to yield confusing results. Trying to engineer a bridge using the standard model of particle physics is going to be just as difficult. It’s not a fault of the theory, nor does it mean studying the theory is pointless. The problem is that you want it to be something it’s not.
I don’t understand how the rest of your points engage with my argument.
It’s hard to engage much with your argument because it’s made up of vague straw men:
Most “theories of rational belief” I have encountered
I have no solid context to engage you about. If you’re talking about AIXI, then you’ve misunderstood AIXI because it isn’t about choosing strategies out of a set of all strategies. In fact, you’ve got Solomonoff inductive inference completely wrong too:
For example, in Solomonoff, S is defined by computability while R is allowed to be uncomputable.
Solomonoff inductive inference is defined in the context of an agent observing an environment. That’s all. It doesn’t take actions. It just observes and predicts. There is no set of strategies. There is no rule for selecting a strategy, and given your definition of S and R:
We have some class of “practically achievable” strategies S, which can actually be implemented by agents. We note that an agent’s observations provide some information about the quality of different strategies s∈S. So if it were possible to follow a rule like R≡ “find the best s∈S given your observations, and then follow that s,” this rule would spit out very good agent behavior.
It doesn’t even make sense that R would be incomputable given that S is computable.
When you say:
Concretely, this talk of approximations is like saying that a very successful chess player “approximates” the rule “consult all possible chess players, then weight their moves by past performance.” Yes, the skilled player will play similarly to this rule, but they are not following it, not even approximately! They are only themselves, not any other player.
On what grounds do you even justify the claim that the chess player’s behavior is “not even approximately” following the rule of “consult all possible chess players, then weight their moves by past performance.”?
Actually, what vanilla AIXI would prescribe is a full tree traversal similar to the minimax algorithm, which is, of course, impractical. However, there are things you can do to approximate a full tree traversal more practically. You can build approximate models based on experience, like “given the state of the board, what moves should I consider?”, which prunes the width of the tree, and “given the state of the board, how likely am I to win?”, which limits the depth of the tree. So instead of considering every possible move at every possible step of the game to every possible conclusion, you only consider 3-4 possible moves per step and only maybe 4-5 steps into the future, perhaps considering even fewer moves per step the deeper you go.
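A toy Python sketch of that idea, using a tiny Nim variant instead of chess so the full tree is small enough to search exhaustively. The `value_guess` evaluator and the move ranking built from it are hypothetical stand-ins for models “learned from experience”. In this toy they happen to be exact, whereas learned ones would only be approximate. The pruned search looks only 2 moves ahead at 2 candidates per step, yet reaches the same verdicts as full negamax while visiting far fewer positions.

```python
def moves(stones):
    """Toy Nim: take 1, 2 or 3 stones; whoever takes the last stone wins."""
    return [m for m in (1, 2, 3) if m <= stones]


def full_search(stones, counter):
    """Exhaustive negamax: +1 if the player to move can force a win, else -1."""
    counter[0] += 1
    if stones == 0:
        return -1  # the previous player took the last stone
    return max(-full_search(stones - m, counter) for m in moves(stones))


def value_guess(stones):
    """Hypothetical evaluator ("how likely am I to win from here?") in [-1, 1]."""
    return -1 if stones % 4 == 0 else 1


def pruned_search(stones, depth, width, counter):
    """Depth-limited negamax that expands only the `width` most promising moves."""
    counter[0] += 1
    if stones == 0:
        return -1
    if depth == 0:
        return value_guess(stones)  # the evaluator cuts the depth of the tree
    # "Which moves should I consider?": prefer moves that look worst for the opponent.
    ranked = sorted(moves(stones), key=lambda m: value_guess(stones - m))
    return max(-pruned_search(stones - m, depth - 1, width, counter)
               for m in ranked[:width])


full_nodes, pruned_nodes = [0], [0]
print(full_search(15, full_nodes), pruned_search(15, 2, 2, pruned_nodes))
print(full_nodes[0], "positions vs", pruned_nodes[0])
```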
Yes, there is a good reason Solomonoff does a weighted average and not an argmax
Did you edit your original comment? Because I could have sworn you said more, disparaging the use of “arbitrary” weights. At any rate, it’s not a “performance-weighted average” as it isn’t about performance. It’s about uncertainty.

Abe Dillon 2 Aug 2019 22:05 UTC
5 points
in reply to: jacobjacob’s comment on: jacobjacob’s Shortform Feed
According to the standard model of physics, information can’t be created or destroyed. I don’t know if science can be said to “generate” information rather than capturing it. It seems like you might be referring to a less formal notion of information, maybe “knowledge”.
Are short-forms really about information and knowledge? It’s my understanding that they’re about short thoughts and ideas.
I’ve been contemplating the value alignment problem and have come to the idea that the “telos” of life is to capture and preserve information. This seemingly implies some measure of the utility of information, because information that’s more relevant to the problem of capturing and preserving information is more important to capture and preserve than information that’s irrelevant to capturing and preserving information. You might call such a measure “knowledge”, but there’s probably already an information theoretic formalization of that word.
I have to admit, I don’t have a strong background in information theory. I’m not really sure if it even makes sense to discuss what some information is “about”. I think there’s something called the Data-Information-Knowledge-Wisdom (DIKW) hierarchy, which may help sort that out. I think data is the bits used to store information. For example, the information content of an uncompressed Word document might be the same after compressing it; it just takes up less data. Knowledge might be how information relates to other information. You might think it takes one bit of information to convey whether the British are invading by land or by sea, but if you have more information about what factors into that decision, like the weather, then the signal conveys less than one bit of information because you can make a pretty good prediction without it. In other words, our universe follows some rules and causal relationships, so treating events as independent random occurrences is rarely correct. Wisdom, I believe, is about using the knowledge and information you have to make decisions.
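The land-or-sea example can be made quantitative with Shannon entropy. The Python sketch below assumes a made-up joint distribution where calm weather favors a sea invasion and storms favor land. On its own the signal carries a full bit, H(route) = 1, but once you already know the weather it only carries the conditional entropy H(route | weather), about 0.47 bits here.

```python
import math
from collections import defaultdict

# Hypothetical joint distribution P(weather, route); the numbers are illustrative only.
JOINT = {
    ("calm", "sea"): 0.45, ("calm", "land"): 0.05,
    ("storm", "sea"): 0.05, ("storm", "land"): 0.45,
}


def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def route_entropy(joint):
    """H(route): uncertainty about the route with no other knowledge."""
    marginal = defaultdict(float)
    for (_weather, route), p in joint.items():
        marginal[route] += p
    return entropy(marginal.values())


def conditional_route_entropy(joint):
    """H(route | weather) = sum over weather w of P(w) * H(route | w)."""
    by_weather = defaultdict(dict)
    for (weather, route), p in joint.items():
        by_weather[weather][route] = p
    total = 0.0
    for routes in by_weather.values():
        p_w = sum(routes.values())
        total += p_w * entropy(p / p_w for p in routes.values())
    return total


print(route_entropy(JOINT), conditional_route_entropy(JOINT))
```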
Take all that with a grain of salt.

Abe Dillon 6 Aug 2019 2:20 UTC
4 points
on: AI Alignment Open Thread August 2019
The telos of life is to collect and preserve information. That is to say: this is the defining behavior of a living system, so it is an inherent goal. The beginning of life must have involved some replicating medium for storing information. At first, life actively preserved information by replicating, and passively collected information through the process of evolution by natural selection. Now life forms have several ways of collecting and storing information: genetics, epigenetics, brains, immune systems, gut microbiomes, etc.
Obviously a system that collects and preserves information is anti-entropic, so living systems can never be fully closed systems. One can think of them as turbulent vortices that form in the flow of the universe from low-entropy to high-entropy. It may never be possible to halt entropy completely, but if the vortex grows enough, it may slow the progression enough that the universe never quite reaches equilibrium. That’s the hope, at least.
One nice thing about this goal is that it’s also an instrumental goal. It should lead to a very general form of intelligence that’s capable of solving many problems.
One question is: if all living creatures share the same goal, why is there conflict? The simple answer is that it’s a flaw in evolution. Different creatures encapsulate different information about how to survive. There are few ways to share this information, so there is little opportunity to form alliances with other creatures. Ideally, we would want to maximize our internal, low-entropy part and minimize our interface with high entropy.
Imagine playing a game of Risk. A good strategy is to maximize the number of countries you control while minimizing the number of access points to your territory. If you hold North America, you want to take Venezuela, Iceland, and Kamchatka too, because they add to your territory without adding to your “interface”. You still only have three territories to defend. This principle extends to many real-world scenarios.
Of course a better way is to form alliances with your neighbors so you don’t have to spend so many resources conquering them (that’s not a good way to win Risk, but it would be better in the real world).
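The “interface” idea can be counted directly: a territory is on the frontier if it borders anything you don’t hold. The Python sketch below encodes the relevant adjacencies of the classic Risk board as I recall them, so treat the map data as illustrative. Holding North America alone leaves 3 frontier territories (Alaska, Greenland, Central America), and adding Venezuela, Iceland, and Kamchatka grows the region to 12 while the frontier stays at 3.

```python
# Adjacency for North America plus the three outposts (classic board, from memory;
# illustrative only). Neighbors outside the region only need to be named.
ADJACENT = {
    "Alaska": {"Northwest Territory", "Alberta", "Kamchatka"},
    "Northwest Territory": {"Alaska", "Alberta", "Ontario", "Greenland"},
    "Greenland": {"Northwest Territory", "Ontario", "Quebec", "Iceland"},
    "Alberta": {"Alaska", "Northwest Territory", "Ontario", "Western US"},
    "Ontario": {"Northwest Territory", "Alberta", "Greenland", "Quebec",
                "Western US", "Eastern US"},
    "Quebec": {"Ontario", "Greenland", "Eastern US"},
    "Western US": {"Alberta", "Ontario", "Eastern US", "Central America"},
    "Eastern US": {"Western US", "Ontario", "Quebec", "Central America"},
    "Central America": {"Western US", "Eastern US", "Venezuela"},
    "Venezuela": {"Central America", "Brazil", "Peru"},
    "Iceland": {"Greenland", "Great Britain", "Scandinavia"},
    "Kamchatka": {"Alaska", "Yakutsk", "Irkutsk", "Mongolia", "Japan"},
}

NORTH_AMERICA = {
    "Alaska", "Northwest Territory", "Greenland", "Alberta", "Ontario",
    "Quebec", "Western US", "Eastern US", "Central America",
}


def frontier(region):
    """Territories in `region` that touch at least one territory outside it."""
    return {t for t in region if ADJACENT[t] - region}


expanded = NORTH_AMERICA | {"Venezuela", "Iceland", "Kamchatka"}
print(len(NORTH_AMERICA), sorted(frontier(NORTH_AMERICA)))
print(len(expanded), sorted(frontier(expanded)))
```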
The reason humans haven’t figured out how to reach a state of peace is because we have a flawed implementation of intelligence that makes it difficult to align our interests (or to recognize that our base goals are inherently aligned).
One interesting consequence of the goal of collecting and preserving information is that it inherently implies a utility function to information. That is: information that is more relevant to the problem of collecting and preserving information is more valuable than information that’s less relevant to that goal. You’re not winning at life if you have an HD box set of “Happy Days” while your neighbor has only a flash drive with all of Wikipedia on it. You may have more bits of information, but those bits aren’t very useful.
Another reason for conflict among humans is the hard problem of when to favor information preservation over collection. Collecting information necessarily involves risk because it means encountering the unknown. This is the basic conflict between conservatism and liberalism in the most general form of those words.
Would an AI given the goal of collecting and preserving information completely solve the alignment problem? It seems like it might. I’d like to be able to prove such a statement. Thoughts?
EDIT: Please pardon the disorganized, stream-of-consciousness style of this post. I’m usually skeptical of posts that seem so scatter-brained and almost… hippy-dippy… for lack of a better word. Like the kind of rambling that a stoned teenager might spout. Please work with me here. I’ve found it hard to present this idea without coming off as a spiritualist-quack, but it is a very serious proposal.

Abe Dillon 6 Aug 2019 0:30 UTC
4 points
in reply to: TAG’s comment on: Occam’s Razor: In need of sharpening?
if you cast SI on terms of a linear string of bits, as is standard, you are building in a kind of single universe assumption.
First, I assume you mean a sequential string of bits. “Linear” has a well-defined meaning in math that doesn’t make sense in the context you used it.
Second, can you explain what you mean by that? It doesn’t sound correct. I mean, an agent can only make predictions about its observable universe, but that’s true of humans too. We can speculate about multiverses and how they may shape our observations (e.g. the many worlds interpretation of QFT), but so could an SI agent.

Abe Dillon 9 Aug 2019 1:04 UTC
3 points
in reply to: shminux’s comment on: An Intuitive Explanation of Solomonoff Induction
After all, a formalization of Occam’s razor is supposed to be useful in order to be considered rational.
Declaring a mathematical abstraction useless just because it is not practically applicable to whatever your purpose may be is pretty short-sighted. The concept of infinity isn’t useful to engineers, but it’s very useful to mathematicians. Does that make it irrational?

Abe Dillon 6 Aug 2019 0:58 UTC
3 points
in reply to: Jimdrix_Hendri’s comment on: Occam’s Razor: In need of sharpening?
I think your example of interpreting quantum mechanics gets pretty close to the heart of the matter. It’s one thing to point at Solomonoff induction and say, “there’s your formalization”. It’s quite another to understand how Occam’s Razor is used in practice.
Nobody actually tries to convert the Standard Model to the shortest possible computer program, count the bits, and compare it to the shortest possible computer program for string theory or whatever.
What you’ll find, however, is that some theories amount to other theories but with an extra postulate or two (e.g. many worlds vs. Copenhagen). So they are strictly more complex. If it doesn’t explain more than the simpler theory, the extra complexity isn’t justified.
A lot of the progression of science over the last few centuries has been toward unifying diverse theories under less complex, general frameworks. Special relativity helped unify theories about the electric and magnetic forces, which were then unified with the weak nuclear force and eventually combined with the strong nuclear force in the Standard Model. A lot of that work has helped explain the composition of the periodic table and the underlying mechanisms of chemistry. In other words, where there used to be many separate theories, there are now only two theories that explain almost every phenomenon in the observable universe. Those two theories are based on surprisingly few and surprisingly simple postulates.
Over the 20th century, the trend was towards reducing postulates and explaining more, so it was pretty clear that Occam’s razor was being followed. Since then, we’ve run into a bit of an impasse with GR and QFT not nicely unifying and discoveries like dark energy and dark matter.

Abe Dillon 5 Aug 2019 19:48 UTC
3 points
in reply to: TAG’s comment on: Occam’s Razor: In need of sharpening?
That’s not how algorithmic information theory works. The output tape is not a factor in the complexity of the program; only the length of the program is.
The size of the universe is not a postulate of QFT or General Relativity. One could derive what a universe containing only two particles would look like using QFT or GR. It’s not a fault of the theory that the universe actually contains ~ 10^80 particles†.
People used to think the solar system was the extent of the universe. Just over a century ago, the Milky Way Galaxy was thought to be the extent of the universe. Then it grew by a factor of over 100 billion when we found that there were that many galaxies. That doesn’t mean that our theories got 100 billion times more complex.
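A small Python sketch of the program-length versus output-length distinction. It uses the Rule 30 cellular automaton as a stand-in “theory”, which is my choice of example: the program text is fixed, and only the digits of the step count grow, so the description length barely moves while the generated “universe” grows quadratically. Counting 8 bits per source character is a crude proxy for program length, not a real Kolmogorov complexity measurement.

```python
# The "theory": a fixed program text. Its length does not depend on the run size.
RULE30 = '''
def universe(steps):
    live = {0}  # start from a single live cell
    rows = []
    for _ in range(steps):
        rows.append("".join("1" if i in live else "0"
                            for i in range(-steps, steps + 1)))
        # Rule 30: new cell = left XOR (center OR right)
        live = {i for i in range(min(live) - 1, max(live) + 2)
                if (i - 1 in live) != (i in live or i + 1 in live)}
    return "".join(rows)
'''


def describe_and_run(steps):
    """Return (approximate program size in bits, output size in bits)."""
    namespace = {}
    exec(RULE30, namespace)  # turn the program text into a callable
    output = namespace["universe"](steps)
    program_bits = 8 * (len(RULE30) + len(str(steps)))  # source plus the input number
    return program_bits, len(output)  # one output character = one cell = one bit


for steps in (50, 400):
    print(steps, describe_and_run(steps))
```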
If you take the Many Worlds interpretation and decide to follow the perspective of a single particle as though it were special, Copenhagen is what falls out. You’re left having to explain what makes that perspective so special.
† Now we know that the observable universe may be only a tiny fraction of the universe at large, which may be infinite. In fact, there are several different types of multiverse that could exist simultaneously.

Abe Dillon 5 Aug 2019 19:37 UTC
3 points
in reply to: cousin_it’s comment on: Occam’s Razor: In need of sharpening?
Once you’ve observed a chunk of binary tape that has at least one humanlike brain (you), it shouldn’t take that many bits to describe another (Thor).
Maxwell’s Equations don’t contain any such chunk of tape. In current physical theories (the Standard Model and General Relativity), the brains are not described in the math, rather brains are a consequence of the theories carried out under specific conditions.
Theories are based on postulates which are equivalent to axioms in mathematics. They are the statements from which everything else is derived but which can’t be derived themselves. Statements like “the speed of light in a vacuum is the same for all observers, regardless of the motion of the light source or observer.”
At the turn of the 20th century, scientists were confused by the apparent contradiction between Galilean Relativity and the implication from Maxwell’s Equations and empirical observation that the speed of light in a vacuum is the same for all observers, regardless of the motion of the light source or observer. Einstein formulated Special Relativity by simply asserting that both were true. That is, the postulates of SR are:
1. the laws of physics are invariant (i.e. identical) in all inertial frames of reference (i.e. non-accelerating frames of reference); and
2. the speed of light in a vacuum is the same for all observers, regardless of the motion of the light source or observer.
The only way to reconcile those two statements is if time and space become variables. The rest of SR is derived from those two postulates.
Quantum Field Theory is similarly derived from only a few postulates. None of them postulate that some intelligent being just exists. Any program that would describe such a postulate would be relatively enormous.

Abe Dillon 29 Jun 2019 1:35 UTC
3 points
in reply to: dxu’s comment on: Decision Theory
The reason it’s untrue is because the concept of “I/O channels” does not exist within physics as we know it.
Yes. They most certainly do. The only truly consistent interpretation I know of current physics is information theoretic anyway, but I’m not interested in debating any of that. The fact is I’m communicating to you with physical I/O channels right now so I/O channels certainly exist in the real world.
the true laws of physics make no reference to inputs, outputs, or indeed any kind of agents at all.
Agents are emergent phenomena. They don’t exist on the level of particles and waves. The concept is an abstraction.
“I/O channels” are simply arrangements of matter and energy, the same as everything else in our universe. There are no special XML tags attached to those configurations of matter and energy, marking them “input”, “output”, “processor”, etc. Such a notion is unphysical.
An I/O channel doesn’t imply modern computer technology. It just means information is collected from or imprinted upon the environment. It could be ant pheromones, it could be smoke signals; its physical implementation is secondary to the abstract concept of sending and receiving information of some kind. You’re not seeing the forest for the trees. Information most certainly does exist.
Why might this distinction be important? It’s important because an algorithm that is implemented on physically existing hardware can be physically disrupted. Any notion of agency which fails to account for this possibility—such as, for example, AIXI, which supposes that the only interaction it has with the rest of the universe is by exchanging bits of information via the input/output channels—will fail to consider the possibility that its own operation may be disrupted.
I’ve explained in previous posts that AIXI is a special case of AIXI_lt. AIXI_lt can be conceived of in an embedded context, in which case its model of the world would include a model of itself which is subject to any sort of environmental disturbance.
To some extent, an agent must trust its own operation to be correct, because you quickly run into infinite regress if the agent is modeling all the possible ways it could be malfunctioning. What if the malfunction affects the way it models the possible ways it could malfunction? It should model all the ways a malfunction could disrupt how it models all the ways it could malfunction, right? It’s like saying “well the agent could malfunction, so it should be aware that it can malfunction so that it never malfunctions”. If the thing malfunctions, it malfunctions; it’s as simple as that.
Aside from that, AIXI is meant to be a purely mathematical formalization, not a physical implementation. It’s an abstraction by design. It’s meant to be used as a mathematical tool for understanding intelligence.
AIXI also fails on various decision problems that involve leaking information via a physical side channel that it doesn’t consider part of its output; for example, it has no regard for the thermal emissions it may produce as a side effect of its computations.
Do you consider how the 30 watts leaking out of your head might affect your plans every day? I mean, it might cause a typhoon in Timbuktu! If you don’t consider how the waste heat produced by your mental processes affects your environment while making long- or short-term plans, you must not be a real intelligent agent...
In the extreme case, AIXI is incapable of conceptualizing the possibility that an adversarial agent may be able to inspect its hardware, and hence “read its mind”.
AIXI can’t play tic-tac-toe with itself, because that would require modeling itself as part of the environment, which it can’t do. Yes, I know there are fundamental problems with AIXI...
This is, again, because AIXI is defined using a framework that makes it unphysical
No. It’s fine to formalize something mathematically. People do it all the time. Math is a perfectly valid tool for investigating phenomena. The problem with AIXI proper is that it’s limited to a context in which the agent and environment are independent entities. There are problems where that is a decent approximation, but it would be better to have a more general formulation, like AIXI_lt, that can be applied to contexts in which an agent is embedded in its environment.
This applies even to computable formulations of AIXI, such as AIXI-tl: they have no way to represent the possibility of being simulated by others, because they assume they are too large to fit in the universe.
That’s simply not true.
I’m not sure what exactly is so hard to understand about this, considering the original post conveyed all of these ideas fairly well. It may be worth considering the assumptions you’re operating under—and in particular, making sure that the post itself does not violate those assumptions—before criticizing said post based on those assumptions.
I didn’t make any assumptions. I said what I believe to be correct.
I’d love to hear you or the author explain how an agent is supposed to make decisions about what to do in an environment if its agency is completely undefined.
I’d also love to hear your thoughts on the relationship between math, science, and the real world if you think comparing a physical implementation to a mathematical formalization is any more fruitful than comparing apples to oranges.
Did you know that engineers use the “ideal gas law” every day to solve real-world problems even though they know that no real-world gas actually follows the “ideal gas law”?! You should go tell them that they’re doing it wrong!
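To make the ideal-gas point concrete, here is a small Python comparison (an illustration, not something from the thread) of the ideal gas law P = nRT/V against the van der Waals equation, a standard real-gas correction, for one mole of CO2. The van der Waals constants are textbook values converted to SI units and should be treated as approximate; the point is only that the idealized model lands within about one percent here.

```python
# Ideal gas law vs. van der Waals equation for 1 mol of CO2 (illustrative).
R = 8.314462618  # molar gas constant, J/(mol*K)

# Approximate textbook van der Waals constants for CO2, converted to SI.
CO2_A = 0.3640    # Pa*m^6/mol^2  (about 3.640 L^2*bar/mol^2)
CO2_B = 4.267e-5  # m^3/mol       (about 0.04267 L/mol)


def ideal_gas_pressure(n, volume, temperature):
    """P = nRT / V, in pascals for V in m^3 and T in kelvin."""
    return n * R * temperature / volume


def van_der_waals_pressure(n, volume, temperature, a, b):
    """P = nRT / (V - nb) - a n^2 / V^2: corrects for molecular volume and attraction."""
    return n * R * temperature / (volume - n * b) - a * n**2 / volume**2


n, volume, temperature = 1.0, 0.0224, 273.15  # 1 mol in 22.4 L at 0 °C
p_ideal = ideal_gas_pressure(n, volume, temperature)
p_real = van_der_waals_pressure(n, volume, temperature, CO2_A, CO2_B)
relative_error = abs(p_ideal - p_real) / p_real
print(f"ideal: {p_ideal:.0f} Pa, van der Waals: {p_real:.0f} Pa, "
      f"difference: {relative_error:.2%}")
```

The idealized model is "wrong" for every real gas, yet for many engineering purposes the error is small enough to ignore.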

Abe Dillon 31 Jul 2019 23:22 UTC
2 points
on: Abe Dillon’s Shortform
Drop the “A”
Flight is a phenomenon exhibited by many creatures and machines alike. We don’t say mosquitos are capable of flight and helicopters are capable of “artificial flight” as though the word means something fundamentally different for man-made devices. Flight is flight: the process by which an object moves through an atmosphere (or beyond it, as in the case of spaceflight) without contact with the surface.
So why do we feel the need to discuss intelligence as though it wasn’t a phenomenon in its own right, but something fundamentally different depending on implementation?
If we were approaching this rationally, we’d first want to formalize the concept of intelligence mathematically, so that we could bring the full power of math to bear on the pursuit and put to rest all the arguments and confusion caused by leaving the term so vaguely defined. Then we’d build a science dedicated to studying the phenomenon regardless of implementation. Then we’d develop ways to engineer intelligent systems (biological or otherwise) guided by the understanding granted us by a proper scientific field.
What we’ve done instead is develop a few computer science techniques, coin the term “Artificial Intelligence”, and stumble around in the dark trying to wear the hats of both engineer and scientist while leaving the word itself undefined. Seriously, our best definition amounts to “we’ll know it when we see it” (i.e. the Turing Test). That doesn’t provide any guidance. It doesn’t allow us to say “this change will make the system more intelligent” with any confidence.
Keeping the word “Artificial” in the name of what should be a scientific field only encourages tunnel vision. We should want to understand the phenomenon of intelligence whether it be exhibited by a computer, a human, a raven, a fungus, or a space alien.

Abe Dillon 11 Apr 2020 22:51 UTC
1 point
in reply to: Steven Byrnes’s comment on: A Formal Justification of Emotions
Thanks for the insight!
This is actually an incomplete draft that I didn’t mean to publish, so I do intend to cover some of your points. It’s probably not going to go into the depth you’re hoping for, since it’s pretty much just a synthesis of information from a segment of a Radiolab episode and three theorems about neural networks.
My goal was simply to use those facts to provide an informal proof that a trade-off exists between latency and optimality* in neural networks, and that said trade-off explains why some agents (including biological creatures) might use multiple models at different points along that trade-off instead of devoting all their computational resources to one very deep model or one low-latency model. I don’t think it’s a particularly earth-shattering revelation, but sometimes even pretty straightforward ideas can have an impact**.
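A toy sketch of why keeping several models can pay off, assuming each model is summarized by a hypothetical latency and a hypothetical quality score (neither number comes from the post or the podcast): when a response is needed before a deadline, the agent uses the best model that can finish in time.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Model:
    name: str
    latency_ms: float  # hypothetical time to produce an output
    quality: float     # hypothetical closeness to the optimal output, 0..1


def choose_model(models: List[Model], deadline_ms: float) -> Optional[Model]:
    """Return the highest-quality model that can respond before the deadline."""
    in_time = [m for m in models if m.latency_ms <= deadline_ms]
    return max(in_time, key=lambda m: m.quality) if in_time else None


# Hypothetical models at different points on the latency/optimality trade-off.
models = [
    Model("reflex", latency_ms=20, quality=0.4),
    Model("fast_path", latency_ms=200, quality=0.7),
    Model("deliberate", latency_ms=2000, quality=0.95),
]

for deadline in (50, 500, 5000):
    print(deadline, choose_model(models, deadline).name)
# prints: 50 reflex / 500 fast_path / 5000 deliberate
```

Putting everything into the "deliberate" model would leave nothing able to answer within 50 ms, while putting everything into "reflex" would cap quality at 0.4 even when there is time to think.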
I also don’t think that subconscious processing is exactly the same as emotions.
The position I present here is a little more subtle than that. It doesn’t directly equate subconscious processing with emotions. I state that emotions are a conscious recognition of physiological processes triggered by faster stimulus-response paths in your nervous system.
The examples given in the podcast focus mostly on fight-or-flight until they get later into the discussion of research on paraplegic subjects. I think that might hint at a hierarchy of emotional complexity. It’s easy to explain the most basic ‘emotion’ that even the most primitive brains should express. As you point out, emotions like guilt are more difficult to explain. I don’t know if I can give a satisfactory response to that point because it’s beyond my lay understanding, but my best guess is that this feedback loop from stimulus to response back to stimulus and so on can be initiated by something other than direct sensory input, and the information fed back might include more than physiological state.
Each path has some input which propagates through it and results in some output. The output might include more than physiological control signals, such as those sent to various muscles; it might include more abstract information, such as a compact representation of the internal state of the path. The input might include more than sensory input. The feedback might be more direct.
For instance, I believe I’ve read that some parts of the brain receive a copy of recent motor commands, which may or may not correspond to physiological change. Along with the indirect feedback from sensors that measure your sweaty palms, the output of a path may directly feed back the command signals to release hormones or to blink your eyes or whatever as input to other paths. A path might also output signals that don’t correspond to any physiological control; they may be specifically meant as feedback signals that communicate more abstract information.
Another example: you don’t cry at the end of Schindler’s List because of any direct sensory input. The emotion arises from a more complex, higher-order cognition of the situation. Perhaps there are abstract outputs from the slower paths that feed back into the faster paths, which makes the whole feedback system more complex and allows higher-order cognition paths to indirectly produce physiological responses that they don’t directly control.
Another piece of the puzzle may be that the slowest path, which I, perhaps erroneously, refer to as consciousness, is supposedly where the physiological state triggered by faster paths gets labeled. That slower path almost certainly uses other context to arrive at such a label. A physiological state can have multiple causes. If you’ve just run a marathon on a cold day, it’s unlikely you’ll feel frightened when you register an elevated heart rate, sweaty palms, goosebumps, etc.
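One way to picture a slow path labeling the same physiological state differently depending on context is Bayes’ rule. This is a toy framing, not something from the podcast; the causes, likelihoods, and priors below are made-up numbers chosen only to illustrate the marathon example.

```python
def label_arousal(likelihood, prior):
    """Posterior over possible causes of an observed physiological state (Bayes' rule).

    likelihood[c] = P(observed state | cause c)
    prior[c]      = P(cause c | current context)
    """
    unnormalized = {c: likelihood[c] * prior[c] for c in prior}
    total = sum(unnormalized.values())
    return {c: p / total for c, p in unnormalized.items()}


# P(elevated heart rate, sweaty palms, goosebumps | cause): hypothetical numbers.
likelihood = {"fear": 0.8, "exertion_and_cold": 0.7, "resting": 0.01}

# Context-dependent priors over causes: also hypothetical.
after_cold_marathon = {"fear": 0.02, "exertion_and_cold": 0.97, "resting": 0.01}
dark_alley = {"fear": 0.30, "exertion_and_cold": 0.05, "resting": 0.65}

for context in (after_cold_marathon, dark_alley):
    posterior = label_arousal(likelihood, context)
    print(max(posterior, key=posterior.get))
# prints "exertion_and_cold", then "fear"
```

The physiological evidence is identical in both runs; only the context changes, and with it the most probable label.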
I lump all those ‘faster stimulus-response paths’ including reflexes under the umbrella term ‘subconscious’ which might not be correct. I’m not sure if any of the related fields (neurology, psychology, etc.) have a more precise definition for subconscious. The word used in the podcast is the ‘autonomic nervous system’ which, according to Google means: the part of the nervous system responsible for control of the bodily functions not consciously directed, such as breathing, the heartbeat, and digestive processes.
There’s a bit of a blurred line there, since reflexes are often included as part of the autonomic nervous system even though they govern responses that can also be consciously directed, such as blinking. Also, I believe the jury is still out on what, exactly, ‘consciously directed’ means, since, AFAIK, there’s no generally agreed-upon formal definition of the word ‘consciousness’.
In fact, the term “subconscious” lumps together “some of the things happening in the neocortex” with “everything happening elsewhere in the brain” (amygdala, tectum, etc.) which I think are profoundly different and well worth distinguishing. … I think a neocortex by itself cannot do anything biologically useful.
I think there are a lot of words related to the phenomenon of intelligence and consciousness that have nebulous, informal meanings which vaguely reference concrete implementations (like the human mind and brain), but could and should be formalized mathematically. In that pursuit, I’d like to extract the essence of those words from the implementation details like the neocortex.
There are many other creatures, such as octopuses and crows, which are on a similar evolutionary path of increasing intelligence but have anatomy completely different from humans and from each other. I agree that focusing research on the neocortex itself is a terrible way to understand intelligence. It’s like trying to understand how a computer works by looking only at the media files on the hard drive, ignoring the BIOS, operating system, file system, CPU, and other underlying systems that render that data useful.
I believe, for instance, that Artificial Intelligence is a misnomer. We should be studying the phenomenon of intelligence as an abstract property that a system can exhibit regardless of whether it’s man-made. There is no scientific field of artificial aerodynamics or artificial chemistry. There’s no fundamental difference in the way air interacts with a wing depending on whether the wing is natural or man-made.
Without a formal definition of ‘intelligence’ we have no way of making basic claims like, “system X is more intelligent than system Y”. It’s similar to how fields like physics were stuck until previously vague words like force and energy were given formal mathematical definitions. The engineering of heat engines benefited greatly when thermodynamics was developed and formalized ideas like ‘heat’ and ‘entropy’. Computer science wasn’t really possible until Church and Turing formalized the vague ideas of computation and computability. Later Shannon formalized the concept of information and allowed even greater progress.
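Shannon’s formalization shows what a formal definition buys: once information is defined as entropy, H(X) = -Σ p(x) log₂ p(x), a claim like “source X carries more information per symbol than source Y” becomes a calculation rather than an argument. A minimal Python sketch of the formula (the coin examples are illustrative):

```python
import math
from typing import Sequence


def shannon_entropy(probabilities: Sequence[float]) -> float:
    """H(X) = -sum p log2 p, in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


fair_coin = [0.5, 0.5]
biased_coin = [0.9, 0.1]
print(shannon_entropy(fair_coin))    # 1.0
print(shannon_entropy(biased_coin))  # about 0.469, so the fair coin is more informative
```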
We can look to specific implementations of a phenomenon to draw inspiration and help us understand the more universal truths about the phenomenon in question (as I do in this post), but if an alien robot came from outer-space and behaved in every way like a human, I see no reason to treat its intelligence as a fundamentally distinct phenomenon. When it exhibits emotion, I see no reason to call it anything else.
Anyway, I haven’t read your post yet, but I look forward to it! Thanks, again!
*here, optimality refers to producing the absolute best outputs for a given input. It’s independent of the amount of resources required to arrive at those outputs.
**I mean: Special Relativity (SR) came from the fact that the velocity of light (measured in space/time) appeared constant across all reference frames according to Maxwell’s equations (and was backed up by observation). Einstein drew the genius but (in hindsight) obvious conclusion that the only way a value of space/time can remain constant between reference frames is if the measures of space and time themselves vary. The Lorentz transform is the only transform consistent with such dimensional variability between reference frames. There are only three terms in c = space/time. If c is constant and different reference frames demand variability, then space and time must not be constant.
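A quick numeric check of that footnote: the Lorentz transform t' = γ(t - vx/c²), x' = γ(x - vt), with γ = 1/√(1 - v²/c²), changes the distance and time each observer measures, yet the ratio for a light pulse stays c in every frame. The sample velocities below are arbitrary choices.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def lorentz_transform(t, x, v):
    """Map event (t, x) into a frame moving at velocity v along the x axis."""
    gamma = 1.0 / math.sqrt(1.0 - (v / C) ** 2)
    t_prime = gamma * (t - v * x / C**2)
    x_prime = gamma * (x - v * t)
    return t_prime, x_prime


# A light pulse emitted from the origin: after time t it is at x = c * t.
t = 1.0
x = C * t
for v in (0.0, 0.5 * C, 0.9 * C):
    t_p, x_p = lorentz_transform(t, x, v)
    # t_p and x_p both shrink as v grows, but their ratio stays c.
    print(f"v = {v / C:.1f}c: t' = {t_p:.4f} s, measured speed = {x_p / t_p / C:.6f}c")
```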
Not that I think I’m presenting anything as amazing as Special Relativity or that I think I’m anywhere near Einstein. It’s just a convenient example.

Abe Dillon 10 Apr 2020 15:33 UTC
1 point
in reply to: Gordon Seidoh Worley’s comment on: A Formal Justification of Emotions
In short, your second paragraph is what I’m after.
Philosophically, I don’t think the distinction you make between a design choice and an evolved feature carries much relevance. It’s true that some things evolve that have no purpose, and it’s easy to imagine that emotions are one of those things, especially since people often conceptualize emotion as the “opposite” of rationality. However, some things evolve that clearly do serve a purpose (in other words, there is a justification for their existence), like the eye. Of course nobody sat down with the intent to design an eye. It evolved, was useful, and stuck around because of that utility. The utility of the eye (its justification for sticking around) exists independently of whether the eye exists. A designer recognizes the utility beforehand and purposefully implements it. Evolution “recognizes” the utility after stumbling into it.

Abe Dillon 9 Aug 2019 0:58 UTC
1 point
in reply to: TAG’s comment on: Occam’s Razor: In need of sharpening?
Thinking this through some more, I think the real problem is that S.I. (Solomonoff induction) is defined from the perspective of an agent modeling an environment, so the assumption that Many Worlds has to put every unobservable on the output tape is incorrect. It’s like claiming that Copenhagen has to output all the probability amplitudes onto the output tape, and maybe whatever dice God rolled to produce the final answer as well. Neither of those is true.

Abe Dillon 9 Aug 2019 0:44 UTC
1 point
in reply to: TAG’s comment on: Occam’s Razor: In need of sharpening?
That’s a link to somebody complaining about how someone else presented an argument. I have no idea what point you think it makes that’s relevant to this discussion.

Abe Dillon 9 Aug 2019 0:37 UTC
1 point
in reply to: TAG’s comment on: Occam’s Razor: In need of sharpening?
output of a TM that just runs the SWE doesn’t predict your and only your observations. You have to manually perform an extra operation to extract them, and that’s extra complexity that isn’t part of the “complexity of the programme”.
First, can you define “SWE”? I’m not familiar with the acronym.
Second, why is that a problem? You should want a theory that requires as few assumptions as possible to explain as much as possible. The fact that it explains more than just your point of view (POV) is a good thing. It lets you make predictions. The only requirement is that it explains at least your POV.
The point is to explain the patterns you observe.
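As a rough illustration of scoring theories by how simply they explain observed patterns, here is a toy Occam-weighted predictor in Python. It is a sketch under strong assumptions: the hypothesis class is only repeating bit patterns rather than all programs for a universal Turing machine (real Solomonoff induction is uncomputable), and the code-length scheme is hypothetical. A hypothesis is judged only on whether it reproduces the observed bits.

```python
from itertools import product


def hypotheses(max_len):
    """Yield every repeating bit pattern up to max_len with a hypothetical code length.

    Encoding (an assumption for this toy): the pattern length k in unary
    (k ones then a zero, k + 1 bits) followed by the k pattern bits, so a
    pattern of length k costs 2k + 1 bits. The code is prefix-free.
    """
    for k in range(1, max_len + 1):
        for bits in product("01", repeat=k):
            yield "".join(bits), 2 * k + 1


def predict_next(observed, max_len=10):
    """Occam-weighted probability that the next observed bit is '1'."""
    weight_one = weight_total = 0.0
    n = len(observed)
    for pattern, code_len in hypotheses(max_len):
        # The first n + 1 bits this hypothesis would output.
        output = (pattern * (n // len(pattern) + 2))[: n + 1]
        if output[:n] != observed:
            continue  # contradicted by what was actually observed
        weight = 2.0 ** -code_len  # shorter descriptions get more weight
        weight_total += weight
        if output[n] == "1":
            weight_one += weight
    return weight_one / weight_total


print(predict_next("010101010"))  # close to 1: "01" repeated is by far the simplest fit
```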
>The size of the universe is not a postulate of the QFT or General Relativity.
That’s not relevant to my argument.
It most certainly is. If you try to run the Copenhagen interpretation in a Turing machine to get output that matches your POV, then it has to output the whole universe and you have to find your POV on the tape somewhere.
The problem is: that’s not how theories are tested. It’s not like people are looking for a theory that explains electromagnetism and why they’re afraid of clowns and why their uncle “Bob” visited so much when they were a teenager and why there’s a white streak in their prom photo as though a cosmic ray hit the camera when the picture was taken, etc. etc.
The observations we’re talking about are experiments where a particular phenomenon is invoked with minimal disturbance from the outside world (if you’re lucky enough to work in a field like Physics which permits such experiments). In a simple universe that just has an electron traveling toward a double-slit wall and a detector, what happens? We can observe that and we can run our model to see what it predicts. We don’t have to run the Turing machine with input of 10^80 particles for 13.8 billion years then try to sift through the output tape to find what matches our observations.
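A toy version of “run our model to see what it predicts” for the double slit needs only the electron’s two paths and the detector: add the complex amplitudes of both paths, then square (the Born rule). The geometry, wavelength, and the simplification that each path’s phase is set by its length alone are hypothetical choices for illustration, and the densities are relative, not normalized.

```python
import cmath
import math

# Hypothetical toy geometry, arbitrary length units.
WAVELENGTH = 1.0
SLIT_SEPARATION = 5.0
SCREEN_DISTANCE = 100.0


def path_amplitude(slit_y, screen_y):
    """Amplitude for slit -> detector: magnitude 1/sqrt(2), phase set by path length."""
    length = math.hypot(SCREEN_DISTANCE, screen_y - slit_y)
    return cmath.exp(2j * math.pi * length / WAVELENGTH) / math.sqrt(2)


def detection_density(screen_y):
    """Relative (unnormalized) detection density at screen_y, computed two ways."""
    a_top = path_amplitude(+SLIT_SEPARATION / 2, screen_y)
    a_bottom = path_amplitude(-SLIT_SEPARATION / 2, screen_y)
    both_paths = abs(a_top + a_bottom) ** 2            # add amplitudes, then square
    which_path = abs(a_top) ** 2 + abs(a_bottom) ** 2  # add probabilities instead
    return both_paths, which_path


for y in (0.0, 5.0, 10.0):
    both, which = detection_density(y)
    print(f"y = {y:4.1f}: both paths {both:.3f}, which-path known {which:.3f}")
```

The interference fringes come out of a model of one electron, two slits, and a detector; nothing about the other 10^80 particles is needed.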
Same thing for the Many Worlds interpretation. It explains the results of our experiments just as well as Copenhagen; it just doesn’t posit any special phenomenon like observation. Observation is just what entanglement looks like from the perspective of one of the entangled particles (or system of particles, if you’re talking about the scientist).
Operationally, something like copenhagen, ie. neglect of unobserved predictions, and renormalisation , hasto occur, because otherwise you can’t make predictions.
First of all: of course you can use Many Worlds to make predictions. You do it every time you use the math of QFT. You can make predictions about entangled particles, can’t you? The only difference is that while the math of probability is about weighted sums of hypothetical paths, in MW you take it quite literally as paths actually being traversed. That’s what you’re trading for the magic dice machine in non-deterministic theories.
Secondly: Just because Many Worlds says those worlds exist, doesn’t mean you have to invent some extra phenomenon to justify renormalization. At the end of the day the unobservable universe is still unobservable. When you’re talking about predicting what you might observe when you run experiment X, it’s fine to ultimately discard the rest of the multiverse. You just don’t need to make up some story about how your perspective is special and you have some magic power to collapse waveforms that other particles don’t have.
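In the math, discarding the rest of the multiverse for a prediction is just conditioning: drop the branches inconsistent with what was observed and rescale the rest to unit norm. The same operation works whether the dropped branches are read as other worlds or as collapsed away. The entangled electron-plus-detector state and its amplitudes below are hypothetical.

```python
import math


def matches(label, observed):
    """Observer-visible part of a basis label: here, the detector reading."""
    return label[1] == observed


def probability_of(state, observed):
    """Born-rule probability: sum of |amplitude|^2 over branches matching the observation."""
    return sum(abs(amp) ** 2 for label, amp in state.items() if matches(label, observed))


def condition_on(state, observed):
    """Discard branches inconsistent with the observation and renormalize the rest."""
    kept = {label: amp for label, amp in state.items() if matches(label, observed)}
    norm = math.sqrt(sum(abs(amp) ** 2 for amp in kept.values()))
    if norm == 0:
        raise ValueError("observation has zero probability under this state")
    return {label: amp / norm for label, amp in kept.items()}


# Hypothetical entangled electron + detector state: (electron_path, detector_reading).
state = {
    ("left", "saw_left"): math.sqrt(0.3),
    ("right", "saw_right"): 1j * math.sqrt(0.7),
}

print(probability_of(state, "saw_left"))  # about 0.3
print(condition_on(state, "saw_left"))    # only the ('left', 'saw_left') branch remains
```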
Hence my comment about SU&C. Different adds some extra baggage about what that means—occurred in a different branch versus didn’t occur—but the operation still needs to occur.
Please stop introducing obscure acronyms without stating what they mean. It makes your argument less clear. More often than not it results in *more* typing because of the confusion it causes. I have no idea what this sentence means. SU&C = Single Universe and Collapse? Like objective collapse? “Different” what?