How? The person I’m responding to gets the math of probability wrong and uses it to make a confusing claim that “there’s nothing wrong” as though we have no more agency over the development of AI than we do over the chaotic motion of a die.
It’s foolish to liken the development of AI to a roll of the dice. Given the stakes, we must try to study, prepare for, and guide the development of AI as best we can.
This isn’t hypothetical. We’ve already built a machine that’s more intelligent than any man alive and which brutally optimizes toward a goal that’s incompatible with the good of mankind. We call it “Global Capitalism”. There isn’t a man alive who knows how to stock the shelves of stores all over the world with #2 pencils that cost only 2 cents each, yet it happens every day because *the system* knows how. The problem is: that system operates with a sociopathic disregard for life (human or otherwise) and has exceeded all limits of sustainability without so much as slowing down. It’s a short-sighted, cruel leviathan and there’s no human at the reins.
At this point, it’s not about waiting for the dice to settle, it’s about figuring out how to wrangle such a beast and prevent the creation of more.
This is a pretty lame attitude towards mathematics. If William Rowan Hamilton showed you his discovery of quaternions, you’d probably scoff and say “yeah, but what can that do for ME?”.
Occam’s razor has been a guiding principle for science for centuries without having any proof for why it’s a good policy. Now Solomonoff comes along and provides a proof and you’re unimpressed. Great.
After all, a formalization of Occam’s razor is supposed to be useful in order to be considered rational.
Declaring a mathematical abstraction useless just because it is not practically applicable to whatever your purpose may be is pretty short-sighted. The concept of infinity isn’t useful to engineers, but it’s very useful to mathematicians. Does that make it irrational?
Thinking this through some more, I think the real problem is that S.I. is defined from the perspective of an agent modeling an environment, so the assumption that Many Worlds has to put any unobservable on the output tape is incorrect. It’s like stating that Copenhagen has to output all the probability amplitudes onto the output tape and maybe whatever dice god rolled to produce the final answer as well. Neither of those is true.
That’s a link to somebody complaining about how someone else presented an argument. I have no idea what point you think it makes that’s relevant to this discussion.
output of a TM that just runs the SWE doesn’t predict your and only your observations. You have to manually perform an extra operation to extract them, and that’s extra complexity that isn’t part of the “complexity of the programme”.
First, can you define “SWE”? I’m not familiar with the acronym.
Second, why is that a problem? You should want a theory that requires as few assumptions as possible to explain as much as possible. The fact that it explains more than just your point of view (POV) is a good thing. It lets you make predictions. The only requirement is that it explains at least your POV.
The point is to explain the patterns you observe.
>The size of the universe is not a postulate of the QFT or General Relativity.
That’s not relevant to my argument.
It most certainly is. If you try to run the Copenhagen interpretation in a Turing machine to get output that matches your POV, then it has to output the whole universe and you have to find your POV on the tape somewhere.
The problem is: That’s not how theories are tested. It’s not like people are looking for a theory that explains electromagnetism and why they’re afraid of clowns and why their uncle “Bob” visited so much when they were a teenager and why there’s a white streak in their prom photo as though a cosmic ray hit the camera when the picture was taken, etc. etc.
The observations we’re talking about are experiments where a particular phenomenon is invoked with minimal disturbance from the outside world (if you’re lucky enough to work in a field like Physics which permits such experiments). In a simple universe that just has an electron traveling toward a double-slit wall and a detector, what happens? We can observe that and we can run our model to see what it predicts. We don’t have to run the Turing machine with input of 10^80 particles for 13.8 billion years then try to sift through the output tape to find what matches our observations.
Same thing for the Many Worlds interpretation. It explains the results of our experiments just as well as Copenhagen; it just doesn’t posit any special phenomenon like observation. Observation is just what entanglement looks like from the perspective of one of the entangled particles (or system of particles if you’re talking about the scientist).
Operationally, something like copenhagen, ie. neglect of unobserved predictions, and renormalisation , hasto occur, because otherwise you can’t make predictions.
First of all: Of course you can use many worlds to make predictions. You do it every time you use the math of QFT. You can make predictions about entangled particles, can’t you? The only thing is: while the math of probability is about weighted sums of hypothetical paths, in MW you take it quite literally as paths actually being traversed. That’s what you’re trading for the magic dice machine in non-deterministic theories.
Secondly: Just because Many Worlds says those worlds exist, doesn’t mean you have to invent some extra phenomenon to justify renormalization. At the end of the day the unobservable universe is still unobservable. When you’re talking about predicting what you might observe when you run experiment X, it’s fine to ultimately discard the rest of the multiverse. You just don’t need to make up some story about how your perspective is special and you have some magic power to collapse waveforms that other particles don’t have.
Hence my comment about SU&C. Different adds some extra baggage about what that means—occurred in a different branch versus didn’t occur—but the operation still needs to occur.
Please stop introducing obscure acronyms without stating what they mean. It makes your argument less clear. More often than not it results in *more* typing because of the confusion it causes. I have no idea what this sentence means. SU&C = Single Universe and Collapse? Like objective collapse? “Different” what?
Well, the original comment was about explaining lightning
You’re right. I think I see your point more clearly now. I may have to think about this a little deeper. It’s very hard to apply Occam’s razor to theories about emergent phenomena. Especially those several steps removed from basic particle interactions. There are, of course, other ways to weigh one theory against another. One of which is falsifiability.
If the Thor theory must be constantly modified to explain why nobody can directly observe Thor, then it gets pushed towards unfalsifiability. It gets ejected from science because there’s no way to even test the theory, which in turn means it has no predictive power.
As I explained in one of my replies to Jimdrix_Hendri, though there is a formalization for Occam’s razor, Solomonoff induction isn’t really used. It’s usually more like: individual phenomena are studied and characterized mathematically, then links between them are found that explain more with fewer and less complex assumptions.
In the case of Many Worlds vs. Copenhagen, it’s pretty clear cut. Copenhagen has the same explanatory power as Many Worlds and shares all the postulates of Many Worlds, but adds some extra assumptions, so it’s a clear violation of Occam’s razor. I don’t know of a *practical* way to handle situations that are less clear cut.
Thor isn’t quite as directly in the theory :-) In Norse mythology...
Tetraspace Grouping’s original post clearly invokes Thor as an alternate hypothesis to Maxwell’s equations to explain the phenomenon of electromagnetism. They’re using Thor as a generic stand-in for the God hypothesis.
Norse mythology he’s a creature born to a father and mother, a consequence of initial conditions just like you.
Now you’re calling them “initial conditions”. This is very different from “conditions” which are directly observable. We can observe the current conditions of the universe, come up with theories that explain the various phenomena we see and use those theories to make testable predictions about the future and somewhat harder to test predictions about the past. I would love to see a simple theory that predicts that the universe not only had a definite beginning (hint: your High School science teacher was wrong about modern cosmology) but started with sentient beings given the currently observable conditions.
Sure, you’d have to believe that initial conditions were such that would lead to Thor.
Which would be a lineage of Gods that begins with some God that created everything and is either directly or indirectly responsible for all the phenomena we observe according to the mythology.
I think you’re the one missing Tetraspace Grouping’s point. They weren’t trying to invoke all of Norse mythology, they were trying to compare the complexity of explaining the phenomenon of electromagnetism by a few short equations vs. saying some intelligent being does it.
You wouldn’t penalize the Bob hypothesis by saying “Bob’s brain is too complicated”, so neither should you penalize the Thor hypothesis for that reason.
The existence of Bob isn’t a hypothesis; it’s not used to explain any phenomenon. Thor is invoked as the cause of, not consequence of, a fundamental phenomenon. If I noticed some loud noise on my roof every full moon, and you told me that your friend Bob likes to do parkour on rooftops in my neighborhood in the light of the full moon, that would be a hypothesis for a phenomenon that I observed and I could test that hypothesis and verify that the noise is caused by Bob. If you posited that Bob was responsible for some fundamental forces of the universe, that would be much harder for me to swallow.
The true reason you penalize the Thor hypothesis is because he has supernatural powers, unlike Bob. Which is what I’ve been saying since the first comment.
No. The supernatural doesn’t just violate Occam’s Razor: it is flat-out incompatible with science. The one assumption in science is naturalism. Science is the best system we know for accumulating information without relying on trust. You have to state how you performed an experiment and what you observed so that others can recreate your result. If you say, “my neighbor picked up sticks on the sabbath and was struck by lightning” others can try to repeat that experiment.
It is, indeed, possible that life on Earth was created by an intelligent being or a group of intelligent beings. They need not be supernatural. That theory, however, is necessarily more complex than any abiogenesis theory because you have to then explain how the intelligent designer(s) came about, which would eventually involve some form of abiogenesis.
You’re trying to conflate theory, conditions, and what they entail in a not so subtle way. Occam’s razor is about the complexity of a theory, not conditions, not what the theory and conditions entail. Just the theory. The Thor hypothesis puts Thor directly in the theory. It’s not derived from the theory under certain conditions. In the case of the Thor theory, you have to assume more to arrive at the same conclusion.
It’s really not that complicated.
That’s not how rolling a die works. Each roll is completely independent. The expected value of rolling a 20-sided die is 10.5, but there’s no logical way to assign an expected outcome to any given roll. You can calculate how many times you’d have to roll before you’re more likely than not to have rolled a specific value: (1-P(specific_value))^n < 0.5, so n > log(0.5)/log(1-P(specific_value)). In this case P(specific_value) is 1⁄20 = 0.05. So n > log(0.5)/log(0.95) = 13.513. So you’re more likely than not to have rolled a “1” after 14 rolls, but that still doesn’t tell you what to expect your Nth roll to be.
I don’t see how your dice rolling example supports a pacifist outlook. We’re not rolling dice here. This is a subject we can study and gain more information about to understand the different outcomes better. You can’t do that with a die. The outcomes of rolling a die are not so dire. Probability is quite useful for making decisions in the face of uncertainty if you understand it better.
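The arithmetic above is easy to check numerically. Here is a minimal sketch in Python; the function names `expected_value` and `min_rolls_for_majority` are my own, made up for illustration:

```python
import math
from fractions import Fraction


def expected_value(sides: int) -> Fraction:
    """Mean face value of a fair die with faces 1..sides."""
    return Fraction(sum(range(1, sides + 1)), sides)


def min_rolls_for_majority(p: float, threshold: float = 0.5) -> int:
    """Smallest n such that the chance of at least one hit exceeds threshold.

    Solves (1 - p)**n < 1 - threshold, i.e. n > log(1 - threshold) / log(1 - p).
    """
    bound = math.log(1 - threshold) / math.log(1 - p)
    return math.floor(bound) + 1  # smallest integer strictly above the bound


p = 1 / 20
n = min_rolls_for_majority(p)
print(float(expected_value(20)))  # 10.5
print(n)                          # 14
print(1 - (1 - p) ** (n - 1))     # still below 0.5 after 13 rolls
print(1 - (1 - p) ** n)           # just above 0.5 after 14 rolls
```

Note that nothing here says anything about what the 14th roll itself will be; each roll is still 1 in 20 for any given face.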
The telos of life is to collect and preserve information. That is to say: this is the defining behavior of a living system, so it is an inherent goal. The beginning of life must have involved some replicating medium for storing information. At first, life actively preserved information by replicating, and passively collected information through the process of evolution by natural selection. Now life forms have several ways of collecting and storing information: genetics, epigenetics, brains, immune systems, gut biomes, etc.
Obviously a system that collects and preserves information is anti-entropic, so living systems can never be fully closed systems. One can think of them as turbulent vortices that form in the flow of the universe from low-entropy to high-entropy. It may never be possible to halt entropy completely, but if the vortex grows enough, it may slow the progression enough that the universe never quite reaches equilibrium. That’s the hope, at least.
One nice thing about this goal is that it’s also an instrumental goal. It should lead to a very general form of intelligence that’s capable of solving many problems.
One question is: if all living creatures share the same goal, why is there conflict? The simple answer is that it’s a flaw in evolution. Different creatures encapsulate different information about how to survive. There are few ways to share this information, so there’s not much way to form an alliance with other creatures. Ideally, we would want to maximize our internal, low entropy part, and minimize our interface with high entropy.
Imagine playing a game of Risk. A good strategy is to maximize the number of countries you control while minimizing the number of access points to your territory. If you hold North America, you want to take Venezuela, Iceland, and Kamchatka too because they add to your territory without adding to your “interface”. You still only have three territories to defend. This principle extends to many real-world scenarios.
Of course, a better way is to form alliances with your neighbors so you don’t have to spend so many resources conquering them (that’s not a good way to win Risk, but it would be better in the real world).
The reason humans haven’t figured out how to reach a state of peace is because we have a flawed implementation of intelligence that makes it difficult to align our interests (or to recognize that our base goals are inherently aligned).
One interesting consequence of the goal of collecting and preserving information is that it inherently implies a utility function over information. That is: information that is more relevant to the problem of collecting and preserving information is more valuable than information that’s less relevant to that goal. You’re not winning at life if you have an HD box set of “Happy Days” while your neighbor has only a flash drive with all of Wikipedia on it. You may have more bits of information, but those bits aren’t very useful.
Another reason for conflict among humans is the hard problem of when to favor information preservation over collection. Collecting information necessarily involves risk because it means encountering the unknown. This is the basic conflict between conservatism and liberalism in the most general form of those words.
Would an AI given the goal of collecting and preserving information completely solve the alignment problem? It seems like it might. I’d like to be able to prove such a statement. Thoughts?
EDIT: Please pardon the disorganized, stream-of-consciousness, style of this post. I’m usually skeptical of posts that seem so scatter-brained and almost… hippy-dippy… for lack of a better word. Like the kind of rambling that a stoned teenager might spout. Please work with me here. I’ve found it hard to present this idea without coming off as a spiritualist-quack, but it is a very serious proposal.
I think your example of interpreting quantum mechanics gets pretty close to the heart of the matter. It’s one thing to point at Solomonoff induction and say, “there’s your formalization”. It’s quite another to understand how Occam’s Razor is used in practice.
Nobody actually tries to convert the Standard Model to the shortest possible computer program, count the bits, and compare it to the shortest possible computer program for string theory or whatever.
What you’ll find, however, is that some theories amount to other theories but with an extra postulate or two (e.g. Copenhagen, which adds to many worlds). So they are strictly more complex. If a theory doesn’t explain more than the simpler theory, the extra complexity isn’t justified.
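As a toy illustration of why an extra postulate costs something on the formal side: Solomonoff induction weights each hypothesis by 2 raised to minus its description length in bits, so a theory that makes the same predictions but needs k extra bits gets 2^k times less prior weight. The sketch below uses made-up bit counts (500 and 20 are placeholders, not measurements of any real theory), and the function names are hypothetical:

```python
from fractions import Fraction


def prior_weight(description_bits: int) -> Fraction:
    """Unnormalized Solomonoff-style weight 2**-bits for a hypothesis."""
    return Fraction(1, 2 ** description_bits)


def relative_penalty(base_bits: int, extra_bits: int) -> Fraction:
    """Weight of the longer theory divided by the weight of the shorter one."""
    return prior_weight(base_bits + extra_bits) / prior_weight(base_bits)


# Hypothetical numbers: a 500-bit base theory plus a 20-bit extra postulate.
print(relative_penalty(500, 20))  # 1/1048576
```

The penalty depends only on the extra bits, not on how long the shared part is, which matches the informal practice of comparing theories by what they add to a common core.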
A lot of the progression of science over the last few centuries has been toward unifying diverse theories under less complex, general frameworks. Special relativity helped unify theories about the electric and magnetic forces, which were then unified with the weak nuclear force and eventually the strong nuclear force. A lot of that work has helped explain the composition of the periodic table and the underlying mechanisms to chemistry. In other words, where there used to be many separate theories, there are now only two theories that explain almost every phenomenon in the observable universe. Those two theories are based on surprisingly few and surprisingly simple postulates.
Over the 20th century, the trend was towards reducing postulates and explaining more, so it was pretty clear that Occam’s razor was being followed. Since then, we’ve run into a bit of an impasse with GR and QFT not nicely unifying and discoveries like dark energy and dark matter.
if you cast SI on terms of a linear string of bits, as is standard, you are building in a kind of single universe assumption.
First, I assume you mean a sequential string of bits. “Linear” has a well defined meaning in math that doesn’t make sense in the context you used it.
Second, can you explain what you mean by that? It doesn’t sound correct. I mean, an agent can only make predictions about its observable universe, but that’s true of humans too. We can speculate about multiverses and how they may shape our observations (e.g. the many worlds interpretation of QFT), but so could an SI agent.
That’s not how algorithmic information theory works. The output tape is not a factor in the complexity of the program. Just the length of the program.
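A small sketch of that point, with the caveat that the length of a Python expression is only a crude stand-in for program length on a universal Turing machine. The two "programs" below are made-up examples of identical length whose outputs differ in size by a factor of 1,000:

```python
def run(source: str) -> str:
    """Evaluate a tiny 'program' (a Python expression) and return its output."""
    return eval(source)


# Two programs of identical length; only one digit differs.
small = '"01" * 10**3'
large = '"01" * 10**6'

for source in (small, large):
    output = run(source)
    # Program length stays fixed while output length grows a thousandfold.
    print(len(source), len(output))
```

The description got no longer even though the "tape" it produces did, which is the sense in which a bigger universe doesn’t make for a more complex theory.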
The size of the universe is not a postulate of the QFT or General Relativity. One could derive what a universe containing only two particles would look like using QFT or GR. It’s not a fault of the theory that the universe actually contains ~ 10^80 particles†.
People used to think the solar system was the extent of the universe. Just over a century ago, the Milky Way Galaxy was thought to be the extent of the universe. Then it grew by a factor of over 100 Billion when we found that there were that many galaxies. That doesn’t mean that our theories got 100 Billion times more complex.
If you take the Many Worlds interpretation and decide to follow the perspective of a single particle as though it were special, Copenhagen is what falls out. You’re left having to explain what makes that perspective so special.
† Now we know that the observable universe may only be a tiny fraction of the universe at large, which may be infinite. In fact, there are several different types of multiverse that could exist simultaneously.
Once you’ve observed a chunk of binary tape that has at least one humanlike brain (you), it shouldn’t take that many bits to describe another (Thor).
Maxwell’s Equations don’t contain any such chunk of tape. In current physical theories (the Standard Model and General Relativity), the brains are not described in the math, rather brains are a consequence of the theories carried out under specific conditions.
Theories are based on postulates which are equivalent to axioms in mathematics. They are the statements from which everything else is derived but which can’t be derived themselves. Statements like “the speed of light in a vacuum is the same for all observers, regardless of the motion of the light source or observer.”
At the turn of the 20th century, scientists were confused by the apparent contradiction between Galilean Relativity and the implication from Maxwell’s Equations and empirical observation that the speed of light in a vacuum is the same for all observers, regardless of the motion of the light source or observer. Einstein formulated Special Relativity by simply asserting that both were true. That is: the postulates of SR are:
the laws of physics are invariant (i.e. identical) in all inertial frames of reference (i.e. non-accelerating frames of reference); and
the speed of light in a vacuum is the same for all observers, regardless of the motion of the light source or observer.
The only way to reconcile those two statements is if time and space become variables. The rest of SR is derived from those two postulates.
Quantum Field Theory is similarly derived from only a few postulates. None of them postulate that some intelligent being just exists. Any program that would describe such a postulate would be relatively enormous.
The idea of counting postulates is attractive, but it harbours a problem...
...we’d still find that each postulate encapsulates many concepts, and that a fair comparison between competing theories should consider the relative complexity of the concepts as well.
Yes, I agree. A simple postulate count is not sufficient. That’s why I said complexity is *related* to it rather than the number itself. If you want a mathematical formalization of Occam’s Razor, you should read up on Solomonoff’s Inductive Inference.
To address your point about the “complexity” of the “Many Worlds” interpretation of quantum field theory (QFT): The size of the universe is not a postulate of the QFT or General Relativity. One could derive what a universe containing only two particles would look like using QFT or GR. It’s not a fault of the theory that the universe actually contains ~ 10^80 particles†.
The Many Worlds interpretation of Quantum Mechanics is considered simple because it takes the math at face value and adds nothing more. There is no phenomenon of wave-function collapse. There is no special perspective of some observer. There is no pilot wave. There are no additional phenomena or special frames of reference imposed on the math to tell a story. You just look at the equations and that’s what they say is happening.
The complexity of a theory is related to the number of postulates you have to make. For instance: Special Relativity is actually based on two postulates:
The only way to reconcile those two postulates is if space and time become variables.
The rest is derived from those postulates.
Quantum Field Theory is based on Special Relativity and the Principle of Least Action.
According to the standard model of physics: information can’t be created or destroyed. I don’t know if science can be said to “generate” information rather than capturing it. It seems like you might be referring to a less formal notion of information, maybe “knowledge”.
Are short-forms really about information and knowledge? It’s my understanding that they’re about short thoughts and ideas.
I’ve been contemplating the value alignment problem and have come to the idea that the “telos” of life is to capture and preserve information. This seemingly implies some measure of the utility of information, because information that’s more relevant to the problem of capturing and preserving information is more important to capture and preserve than information that’s irrelevant to capturing and preserving information. You might call such a measure “knowledge”, but there’s probably already an information theoretic formalization of that word.
I have to admit, I don’t have a strong background in information theory. I’m not really sure if it even makes sense to discuss what some information is “about”. I think there’s something called the Data-Information-Knowledge-Wisdom (DIKW) hierarchy which may help sort that out. I think data is the bits used to store information. For example, the information content of an uncompressed Word document might be the same after compressing said document; it just takes up less data. Knowledge might be how information relates to other information. You might think it takes one bit of information to convey whether the British are invading by land or by sea, but if you have more information about what factors into that decision, like the weather, then the signal conveys less than one bit of information because you can make a pretty good prediction without it. In other words: our universe follows some rules and causal relationships, so treating events as independent random occurrences is rarely correct. Wisdom, I believe, is about using the knowledge and information you have to make decisions.
Take all that with a grain of salt.
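For what it’s worth, the land-or-sea example can be made concrete with Shannon entropy. This is a minimal sketch; the 0.9 is a made-up probability (say the weather makes a land invasion 90% likely), and `binary_entropy` is just an illustrative helper name:

```python
import math


def binary_entropy(p: float) -> float:
    """Shannon entropy, in bits, of a yes/no signal that is 'yes' with probability p."""
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome carries no information
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


print(binary_entropy(0.5))  # 1.0 bit when land and sea are equally likely
print(binary_entropy(0.9))  # about 0.469 bits when you can already guess well
```

So the same one-lantern-or-two signal is worth a full bit only to someone with no prior idea of the answer.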
A flaw in the Gödel Machine may provide a formal justification for evolution
I’ve never been a fan of the concept of evolutionary computation. Evolution isn’t fundamentally different from other forms of engineering; rather, it’s the most basic concept in engineering. The idea of slightly modifying an existing solution to arrive at a better solution is a fundamental part of engineering. When you take away all of an engineer’s other tools, like modeling, analysis, heuristics, etc., you’re left with evolution.
Designing something can be modeled as a series of choices, like traversing a tree. There are typically far more possibilities per choice than is practical to explore, so we use heuristics and intelligence to prune the tree. Sure, an evolutionary algorithm might consider branches you never would have considered, but it’s still aimless. If you have the resources, you could probably do better by simply pruning the search tree less aggressively; there will always be countless branches that are clearly not worth exploring. You want to make a flying machine? What material should the fuselage be made of? What? You didn’t even consider peanut butter? Why?!
I think that some of the draw to evolution comes from the elegant forms found in nature, many of which are beyond the capabilities of human engineering, but a lot of that can be chalked up to the fact that biology started by default with the “holy grail” of manufacturing technologies: codified molecular self-assembly. If we could harness that capability and bring all the techniques we’ve learned over the last few centuries about managing complexity (particularly from computer science), we would quickly be able to engineer some mind-blowing technology in a matter of decades rather than billions of years.
Despite all this, people still find success using evolutionary algorithms and generate a lot of hype even though the techniques are doomed not to scale. Is there a time and place where evolution really is the best technique? Can we derive some rule for when to try evolutionary techniques? Maybe.
There’s a particular sentence in the paper on the Gödel Machine that always struck me as odd:
Any formal system that encompasses arithmetics (or ZFC etc) is either flawed or allows for unprovable but true statements. Hence even a Gödel machine with unlimited computational resources must ignore those self-improvements whose effectiveness it cannot prove
It seems like the machine is making an arbitrary decision in the face of undecidability, especially after admitting that a formal system is either flawed or allows for unprovable but true statements. The more appropriate behavior would be for the Gödel machine to copy itself, where one copy implements the change and the other doesn’t. This introduces some more problems, like: what is the cost of duplicating the machine, and how should that be factored in? Still, I thought that observation might provide some food for thought.
Rough is easy to find and not worth much.
Diamonds are much harder to find and worth a lot more.
I once read a post by someone who was unimpressed with the paper that introduced Generative Adversarial Networks (GANs). They pointed out some sloppy math and other such problems and were confused why such a paper had garnered so much praise.
Someone replied that, in her decades of reading research papers, she learned that finding flaws is easy and uninteresting. The real trick is being able to find the rare glint of insight that a paper brings to the table. Understanding how even a subtle idea can move a whole field forward. I kinda sympathize as a software developer.
I remember when I first tried to slog through Marcus Hutter’s book on AIXI, I found the idea absurd. I have no formal background in mathematics, so I chalked some of that up to me not fully understanding what I was reading. I kept coming back to the question (among many others): “If AIXI is incomputable, how can Hutter supposedly prove that it performs ‘optimally’? What does ‘optimal’ even mean? Surely it should include the computational complexity of the agent itself!”
I tried to modify AIXI to include some notion of computational resource utilization until I realized that any attempt to do so would be arbitrary. Some problems are much more sensitive to computational resource utilization than others. If I’m designing a computer chip, I can afford to have the algorithm run an extra month if it means my chip will be 10% faster. The algorithm that produces a sub-optimal solution in milliseconds using less than 20 MB of RAM doesn’t help me. At the same time, if a saber-toothed tiger jumps out of a bush next to me, I don’t have months to figure out a 10% faster route to get away.
I believe there are problems with AIXI, but lots of digital ink has been spilled on that subject. I plan on contributing a little to that in the near future, but I also wanted to point out that it’s easy to look at an idea like AIXI from the wrong perspective and miss a lot of what it truly has to say.