Fake Causality

Eliezer Yudkowsky 23 Aug 2007 18:12 UTC

113 points

Phlogiston was the eighteenth century’s answer to the Elemental Fire of the Greek alchemists. Ignite wood, and let it burn. What is the orangey-bright “fire” stuff? Why does the wood transform into ash? To both questions, the eighteenth-century chemists answered, “phlogiston.”

. . . and that was it, you see, that was their answer: “Phlogiston.”

Phlogiston escaped from burning substances as visible fire. As the phlogiston escaped, the burning substances lost phlogiston and so became ash, the “true material.” Flames in enclosed containers went out because the air became saturated with phlogiston, and so could not hold any more. Charcoal left little residue upon burning because it was nearly pure phlogiston.

Of course, one didn’t use phlogiston theory to predict the outcome of a chemical transformation. You looked at the result first, then you used phlogiston theory to explain it. It’s not that phlogiston theorists predicted a flame would extinguish in a closed container; rather they lit a flame in a container, watched it go out, and then said, “The air must have become saturated with phlogiston.” You couldn’t even use phlogiston theory to say what you ought not to see; it could explain everything.

This was an earlier age of science. For a long time, no one realized there was a problem. Fake explanations don’t feel fake. That’s what makes them dangerous.

Modern research suggests that humans think about cause and effect using something like the directed acyclic graphs (DAGs) of Bayes nets. Because it rained, the sidewalk is wet; because the sidewalk is wet, it is slippery:

[Rain] → [Sidewalk Wet] → [Sidewalk Slippery]

From this we can infer—or, in a Bayes net, rigorously calculate in probabilities—that when the sidewalk is slippery, it probably rained; but if we already know that the sidewalk is wet, learning that the sidewalk is slippery tells us nothing more about whether it rained.
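The two claims above can be checked by brute force on a toy version of this chain. The sketch below is only an illustration: the probabilities are made-up assumptions (the post gives none), and the names `joint` and `prob_rain` are hypothetical helpers, not part of any library.

```python
from itertools import product

# Assumed, illustrative numbers; the post does not specify any.
P_RAIN = 0.2
P_WET_GIVEN_RAIN = {True: 0.9, False: 0.1}       # P(wet | rain)
P_SLIPPERY_GIVEN_WET = {True: 0.7, False: 0.05}  # P(slippery | wet)


def joint(rain, wet, slippery):
    """Joint probability of one world under Rain -> Wet -> Slippery."""
    p = P_RAIN if rain else 1 - P_RAIN
    p_wet = P_WET_GIVEN_RAIN[rain]
    p *= p_wet if wet else 1 - p_wet
    p_slip = P_SLIPPERY_GIVEN_WET[wet]
    p *= p_slip if slippery else 1 - p_slip
    return p


def prob_rain(**evidence):
    """P(rain | evidence), summing the joint over worlds that match the evidence."""
    numerator = denominator = 0.0
    for rain, wet, slippery in product([True, False], repeat=3):
        world = {"rain": rain, "wet": wet, "slippery": slippery}
        if any(world[name] != value for name, value in evidence.items()):
            continue
        p = joint(rain, wet, slippery)
        denominator += p
        if rain:
            numerator += p
    return numerator / denominator


if __name__ == "__main__":
    print("P(rain)                 =", prob_rain())
    print("P(rain | slippery)      =", prob_rain(slippery=True))
    print("P(rain | wet)           =", prob_rain(wet=True))
    print("P(rain | wet, slippery) =", prob_rain(wet=True, slippery=True))
```

With these assumed numbers, a slippery sidewalk raises the probability of rain above its prior, while the last two lines print the same value: once the sidewalk is known to be wet, slipperiness adds nothing further about rain.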

Why is fire hot and bright when it burns?

[Phlogiston] → [Fire Hot and Bright]

It feels like an explanation. It’s represented using the same cognitive data format. But the human mind does not automatically detect when a cause has an unconstraining arrow to its effect. Worse, thanks to hindsight bias, it may feel like the cause constrains the effect, when it was merely fitted to the effect.
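One way to make "unconstraining arrow" concrete, offered as a reading rather than as the post's own formalism: if a cause assigns an observation the same probability that its absence would, then seeing the observation cannot shift belief in the cause at all. The function name `posterior` and all the numbers below are assumptions chosen for illustration.

```python
def posterior(prior, p_obs_given_h, p_obs_given_not_h):
    """Bayes' rule for a binary hypothesis H after a single observation."""
    joint_h = prior * p_obs_given_h
    joint_not_h = (1 - prior) * p_obs_given_not_h
    return joint_h / (joint_h + joint_not_h)


if __name__ == "__main__":
    # A constraining arrow: the observation is much likelier under H.
    print("constraining:  ", posterior(0.5, 0.9, 0.1))
    # An unconstraining arrow: H makes the observation no likelier than not-H,
    # so the posterior equals the prior and nothing has been learned.
    print("unconstraining:", posterior(0.5, 0.5, 0.5))
```

The second case is the trouble spot: the bookkeeping says no support was earned, yet the node still sits in the graph looking like any other cause.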

Interestingly, our modern understanding of probabilistic reasoning about causality can describe precisely what the phlogiston theorists were doing wrong. One of the primary inspirations for Bayesian networks was noticing the problem of double-counting evidence if inference resonates between an effect and a cause. For example, let’s say that I get a bit of unreliable information that the sidewalk is wet. This should make me think it’s more likely to be raining. But, if it’s more likely to be raining, doesn’t that make it more likely that the sidewalk is wet? And wouldn’t that make it more likely that the sidewalk is slippery? But if the sidewalk is slippery, it’s probably wet; and then I should again raise my probability that it’s raining . . .

Judea Pearl uses the metaphor of an algorithm for counting soldiers in a line. Suppose you’re in the line, and you see two soldiers next to you, one in front and one in back. That’s three soldiers, including you. So you ask the soldier behind you, “How many soldiers do you see?” They look around and say, “Three.” So that’s a total of six soldiers. This, obviously, is not how to do it.

A smarter way is to ask the soldier in front of you, “How many soldiers forward of you?” and the soldier in back, “How many soldiers backward of you?” The question “How many soldiers forward?” can be passed on as a message without confusion. If I’m at the front of the line, I pass the message “1 soldier forward,” for myself. The person directly in back of me gets the message “1 soldier forward,” and passes on the message “2 soldiers forward” to the soldier behind them. At the same time, each soldier is also getting the message “N soldiers backward” from the soldier behind them, and passing it on as “N + 1 soldiers backward” to the soldier in front of them. How many soldiers in total? Add the two numbers you receive, plus one for yourself: that is the total number of soldiers in line.

The key idea is that every soldier must separately track the two messages, the forward-message and backward-message, and add them together only at the end. You never add any soldiers from the backward-message you receive to the forward-message you pass back. Indeed, the total number of soldiers is never passed as a message—no one ever says it aloud.
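The soldier-counting scheme can be sketched in a few lines. This is a minimal illustration under my own assumptions: soldiers are indexed from the front of the line, and the function names (`count_soldiers`, `local_view`, `naive_count`) are invented for the example.

```python
def count_soldiers(n):
    """Return the total each soldier computes, using separate forward and backward messages."""
    # forward_in[i]: "how many soldiers are forward of you", as received by soldier i.
    # backward_in[i]: "how many soldiers are backward of you", as received by soldier i.
    forward_in = [0] * n
    backward_in = [0] * n

    # Forward counts flow from the front of the line toward the back.
    for i in range(1, n):
        forward_in[i] = forward_in[i - 1] + 1

    # Backward counts flow from the back of the line toward the front.
    for i in range(n - 2, -1, -1):
        backward_in[i] = backward_in[i + 1] + 1

    # The two messages are combined only at the very end, plus one for yourself.
    return [forward_in[i] + backward_in[i] + 1 for i in range(n)]


def local_view(n, i):
    """How many soldiers soldier i can see: itself plus any immediate neighbors."""
    return 1 + (1 if i > 0 else 0) + (1 if i < n - 1 else 0)


def naive_count(n, i):
    """The broken method: add your own view to the view of the soldier behind you."""
    return local_view(n, i) + local_view(n, i + 1)


if __name__ == "__main__":
    print(count_soldiers(5))   # every soldier arrives at the same total
    print(naive_count(4, 1))   # overlapping views get counted twice
```

Note that the total itself never appears in either message array; each soldier derives it locally, which is the property the paragraph above emphasizes.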

An analogous principle operates in rigorous probabilistic reasoning about causality. If you learn something about whether it’s raining, from some source other than observing the sidewalk to be wet, this will send a forward-message from [Rain] to [Sidewalk Wet] and raise your expectation of the sidewalk being wet. If you observe the sidewalk to be wet, this sends a backward-message to your belief that it is raining, and this message propagates from [Rain] to all neighboring nodes except the [Sidewalk Wet] node. We count each piece of evidence exactly once; no update message ever “bounces” back and forth. The exact algorithm may be found in Judea Pearl’s classic Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference.
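To see why the bookkeeping matters, here is a rough sketch contrasting a single backward-message with the "resonating" update in which the same unreliable report about the wet sidewalk keeps bouncing back into the belief about rain. It is not Pearl's full algorithm, only the counting principle on one link; every number and function name is an assumption made for illustration.

```python
# Assumed, illustrative numbers.
PRIOR_RAIN = 0.2
P_WET_GIVEN_RAIN = {True: 0.9, False: 0.1}  # P(wet | rain)
# How strongly the unreliable report favors "wet" over "dry".
WET_LIKELIHOOD = {True: 0.8, False: 0.2}


def backward_message(rain):
    """Support the report lends to one value of rain, summed over wet and dry."""
    p_wet = P_WET_GIVEN_RAIN[rain]
    return p_wet * WET_LIKELIHOOD[True] + (1 - p_wet) * WET_LIKELIHOOD[False]


def update(p_rain):
    """Combine a belief about rain with the backward message, then renormalize."""
    yes = p_rain * backward_message(True)
    no = (1 - p_rain) * backward_message(False)
    return yes / (yes + no)


def counted_once():
    """Correct bookkeeping: the report is applied to the prior exactly once."""
    return update(PRIOR_RAIN)


def bounced(rounds):
    """Faulty bookkeeping: the same report is re-applied on every bounce."""
    p = PRIOR_RAIN
    for _ in range(rounds):
        p = update(p)
    return p


if __name__ == "__main__":
    print("evidence counted once:  ", counted_once())
    print("evidence bounced 5 times:", bounced(5))
```

Under these assumptions a single weak report, counted once, moves the belief in rain moderately; letting it bounce a handful of times drives the belief close to certainty without any new evidence arriving.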

So what went wrong in phlogiston theory? When we observe that fire is hot and bright, the [Fire Hot and Bright] node can send backward-evidence to the [Phlogiston] node, leading us to update our beliefs about phlogiston. But if so, we can’t count this as a successful forward-prediction of phlogiston theory. The message should go in only one direction, and not bounce back.

Alas, human beings do not use a rigorous algorithm for updating belief networks. We learn about parent nodes from observing children, and predict child nodes from beliefs about parents. But we don’t keep rigorously separate books for the backward-message and forward-message. We just remember that phlogiston is hot, which causes fire to be hot. So it seems like phlogiston theory predicts the hotness of fire. Or, worse, it just feels like phlogiston makes the fire hot.

Until you notice that no advance predictions are being made, the non-constraining causal node is not labeled “fake.” It’s represented the same way as any other node in your belief network. It feels like a fact, like all the other facts you know: Phlogiston makes the fire hot.

A properly designed AI would notice the problem instantly. This wouldn’t even require special-purpose code, just correct bookkeeping of the belief network. (Sadly, we humans can’t rewrite our own code, the way a properly designed AI could.)

Speaking of “hindsight bias” is just the nontechnical way of saying that humans do not rigorously separate forward and backward messages, allowing forward messages to be contaminated by backward ones.

Those who long ago went down the path of phlogiston were not trying to be fools. No scientist deliberately wants to get stuck in a blind alley. Are there any fake explanations in your mind? If there are, I guarantee they’re not labeled “fake explanation,” so polling your thoughts for the “fake” keyword will not turn them up.

Thanks to hindsight bias, it’s also not enough to check how well your theory “predicts” facts you already know. You’ve got to predict for tomorrow, not yesterday. It’s the only way a messy human mind can be guaranteed of sending a pure forward message.

Brandon_Reinhart 23 Aug 2007 18:37 UTC
39 points
I just wanted to say that this is the best damn blog I’ve read. The high level of regular, insightful, quality updates is stunning. Reading this blog, I feel like I’ve not just accumulated knowledge, but processes I can apply to continue to refine my understanding of how I think and how I accumulate further knowledge.

I am honestly surprised, with all the work the contributors do in other realms, that you are able to maintain this high level of quality output on a blog.

Recently I have been continuing my self-education in ontology and epistemology. Some sources are more rigorous than others. Reading Rand, for example, shows an author who seems to utilize “phlogiston”-like mechanics to describe her ethical solutions to moral problems. Her explanations use convincing but unbounded turns of phrase instead of a meaningful process of explanation. It can be very challenging to read and process new data and also maintain a lack of bias (or at least an awareness of bias that can be accounted for and challenged). It requires a very high level of active, conscious information processing. That means rereading, working exercises, and thinking through what a person is saying and why they are saying it. This blog has provided me lots of new tools to improve my methods of critical thinking.

Rock on.
Eliezer Yudkowsky 23 Aug 2007 18:40 UTC
26 points
I feel like I’ve not just accumulated knowledge, but processes I can apply to continue to refine my understanding of how I think and how I accumulate further knowledge.

You’ve warmed my heart for the day.
Hopefully_Anonymous 23 Aug 2007 19:13 UTC
1 point
Great post and I agree with Brandon. Eliezer, I recommend you admin a message board (I’ve been recommending an overcomingbias message board for a while) but I think in particular you’d thrive in that environment due to your high posting volume and multiple threads of daily interest. I think you’re a bit constrained intellectually, pedagogically, and speculatively by this format.
TGGP3 23 Aug 2007 21:09 UTC
4 points
I think I’ve said this before, but there is some defense that can be made for the phlogiston theorists. Phlogiston is like an absence of oxygen in modern combustion theory. The falsifiable prediction that caused phlogiston to be abandoned was that phlogiston would have mass, whereas an absence of oxygen (what it was in reality) does not.
Cure_of_Ars 23 Aug 2007 22:38 UTC
2 points
Could evolution be a fake explanation in that it doesn’t predict anything? I’m no creationist but what you’re explaining in regards to phlogiston seems to have a lot of similarity to evolution. Seems to me like no matter what the data is you can put the tag of evolution on it. Now I’m no expert on evolution so don’t flame me. Just a question on how evolution is different.
- Perplexed 23 Jul 2010 15:14 UTC
  13 points
  Parent
  Since the Theory of Evolution is in the business of explaining the past and present rather than predicting the future, it certainly runs the risk of deluding itself. But running a risk is not the same thing as failing. And whenever individual biologists succumb to hindsight bias, there are other biologists ready to point out their mistakes.
  
  Evolutionary biology is a remarkably introspective discipline with plenty of remora-like philosopher-commensals waiting to devour any sloppy thinking that gets generated. See, for example, the Wikipedia article on “The Spandrels of San Marco” or on G. C. Williams’s book “Adaptation and Natural Selection”. Or, check the high level of scorn in which practitioners of unfalsifiable “Evolutionary Psychology” are held by other evolutionary scientists.
  
  Evolutionary biology is (in part) a historical science in which hypotheses about the past are used to explain features of the present. So too are geology and much of astrophysics. This class of scientific disciplines certainly seems to run afoul of the deprecation which Eliezer dispenses in the final paragraph of his posting. But, no problem. Other philosophers are on the case, and they point out that this way of doing science can still be ok, in spite of the natural human tendency toward hindsight bias, so long as you are careful to check that you get more bits’ worth of explanation out than you feed bits of hypothesis in. The evolutionary meta-hypothesis of common descent and the various more detailed hypotheses (such as that man diverged from chimp roughly 6 million years ago in Africa) have definitely yielded far more information in explanation than was supplied in hypothesis.
  
  But there are many aspects of evolutionary biology that can be tested using methods of which Eliezer’s final paragraph would approve. For example the theories of evolutionary mechanism (random mutation, natural selection, and drift) can and have been tested in the lab and in the field. And, most importantly, we can reasonably extrapolate these observations of small-scale evolution in a short time to hypotheses of large scale evolution over deep time. What justifies this extrapolation? Well, one thing that does is the modern validation of Darwin’s prediction that, once the underlying basis of heritable variation were known, it would be confirmed that variation between species is the same kind of thing as variation within species—only more so. And as anyone who knows some molecular genetics can testify, Darwin’s prediction has been spectacularly confirmed in the genomics era.
  - timtyler 15 Aug 2010 19:26 UTC
    1 point
    Parent
    “Since the Theory of Evolution is in the business of explaining the past and present rather than predicting the future”
    
    Ouch! I hope not! That makes it sound awful! Theories should be consistent with existing observations—sure—but a bigger challenge for them comes in predicting new observations before they are made.
    - bigjeff5 29 Jan 2011 4:25 UTC
      12 points
      Parent
      I think the key is that theories don’t predict the future at all.
      
      They predict observations.
      
      Because of my model, I expect to see X under the given conditions. If I test for X, and I do not find it, this is evidence that my model is wrong. If I test for X and find it, this is evidence that my model is correct.
      
      This says nothing about the future or past, only what you have and have not observed yet, and what you expect to observe next (which can be about the future or past, it doesn’t matter).
    - [deleted] 13 Dec 2014 14:07 UTC
      0 points
      Parent
      There is at least one rather specialised area in which theory offers predictions: evolution of communities. It’s like, ‘when true grasses appeared, they created grasslands through having some novel features. They circumvented successions that would lead to preexistent plant habitats; in the beginning, they were weeds compared to the rest of vegetation. Nowadays, we have a group of species that spread widely, are considered weeds and share ecological similarity, not a [relatively recent] common ancestor. We predict that in future, these weeds will form habitats through disruption of current eco networks.’ Admittedly, this is hard to observe.
- listo 30 Dec 2011 22:38 UTC
  7 points
  Parent
  “Could evolution be a fake explanation in that it doesn’t predict anything?”
  
  I wouldn’t think so. Evolutionary theory predicted things like the kinds of animals found on some islands before they were explored. It also predicts what kind of fossils you will find in a given place. It also predicts that if you get a bunch of dogs and select only the bigger ones, several generations later you will have “created” a new race of bigger dogs. Etc.
Steve_Massey 23 Aug 2007 22:43 UTC
1 point
What TGGP said. Also, would an AI really be better at determining the falsifiability of a theory? It seems to me that, given a particular theory, an algorithm for determining the set of testable predictions thereof isn’t going to be easy to optimize. How does the AI prove that one algorithm is better than another? Test it against a set of random theories?
- Joshua 29 Jan 2011 5:00 UTC
  1 point
  Parent
  When I read that, the “properly” part really stood out to me. I felt like I was reading about a “true” Scotsman, the sort that would never commit a crime.
Davis 24 Aug 2007 0:32 UTC
2 points
C of A, TalkOrigins addresses your argument.
Vladimir_Nesov2 24 Aug 2007 1:04 UTC
1 point
Phlogiston is not necessarily a bad thing. Concepts are utilized in reasoning to reduce and structure search space. Concepts can be placed in correspondence with multitude of contexts, selecting a branch with required properties, which correlate with its usage. In this case active ‘phlogiston’ concept correlates with presence of fire. Unifying all processes that exhibit fire under this tag can help in development of induction contexts. Process of this refinement includes examination of protocols which include ‘phlogiston’ concept. It’s just not a causal model, which can rigorously predict nontrivial results through deduction.
- tristanhaze 25 Feb 2014 1:07 UTC
  0 points
  Parent
  More than six years late, but better late than never...
  
  ‘Concepts are utilized in reasoning to reduce and structure search space’ - anyone have any references or ideas for further developments of this line of thought? Seems very interesting and related to the philosophical idea of abduction or inference to the best explanation. (Perhaps the relation is one of justification.)
  
  Also, since I find the OP compelling despite this point, I would be interested to see how far they can be reconciled.
  
  My guess, loosely expressed, is that the stuff in Eliezer’s OP above about the importance of good bookkeeping to prevent update messages bouncing back is sound, and should be implemented in designing intelligent systems, but some additional, more abductionesque process could be carefully laid on top. And when interpreting human reasoning, we should perhaps try to learn to distinguish whether, in a given case of a non-predictive empirical belief, the credence comes from bad bookkeeping, in which case it’s illegitimate, or an abductive process which may be legitimate, and indeed may be legitimated along the lines of Vladimir’s tantalizing hint in the parent comment.
Hopefully_Anonymous 24 Aug 2007 1:18 UTC
0 points
Eliezer, we need more posts from you elucidating the importance of optimizing science, etc., as opposed to the current, functional elements of it. In my opinion people are wasting significant comment time responding to each of your posts by saying “hey, such-and-such that you criticized actually has some functionality”.
Robin_Hanson2 24 Aug 2007 1:32 UTC
3 points
An analogous principle operates in rigorous probabilistic reasoning about causality. … We count each piece of evidence exactly once; no update message ever “bounces” back and forth. The exact algorithm may be found in Judea Pearl’s classic “Probabilistic Reasoning …

Actually, Pearl’s algorithm only works for a tree of cause/effects. For non-trees it is provably hard, and it remains an open question how best to update. I actually need a good non-tree method without predictable errors for combinatorial market scoring rules.
TGGP3 24 Aug 2007 1:46 UTC
1 point
In response to Hopefully Anonymous, I think there is a real difference between unfalsifiable pseudosciences and genuine scientific theories (both correct and incorrect). Coming up with methods to distinguish the two will be helpful for us in doing science. It is one thing to say in hindsight how obviously wrong something is; it is another to understand why it is wrong and whether its wrongness could have been detected at the time with the information then available. That understanding could assist us later, when we do not have all the information we would wish for.
Eliezer Yudkowsky 24 Aug 2007 2:51 UTC
2 points
Robin: Yes indeed. If you can find a cutset for the tree, or cluster a manageable set of variables, all is well and good. I suspect this is what happens with most real-life causal models.

But in general, finding a good non-tree method is not just NP-hard but AI-complete. It is the problem of modeling reality itself.
Gray_Area 24 Aug 2007 10:58 UTC
1 point
Robin Hanson said: “Actually, Pearl’s algorithm only works for a tree of cause/effects. For non-trees it is provably hard, and it remains an open question how best to update. I actually need a good non-tree method without predictable errors for combinatorial market scoring rules.”

To be even more precise, Pearl’s belief propagation algorithm works for the so-called ‘poly-tree graphs,’ which are directed acyclic graphs without undirected cycles (i.e., cycles which show up if you drop directionality). The state of the art for exact inference in Bayesian networks is the family of junction-tree-based algorithms (essentially you run an algorithm similar to belief propagation on a graph where you force cycles out by merging nodes). For large intractable networks people resort to approximating what they are interested in by sampling. Of course there are lots of approaches to this problem: Bayesian network inference is a huge industry.
Geoff 24 Aug 2007 13:53 UTC
9 points
Very interesting. In computer networking, we deal with this same information problem, and the solution (not sending the information from the forward node back to the forward node) is referred to as Split Horizon.

Suppose that Node A can reach Network 1 directly—in one hop. So he tells his neighbor, Node B, “I can get to Network 1 in one hop!”. Node B records “okay, I can get there in two hops then.” The worry is that when Node A loses his connection to Network 1, he asks Node B how to get there, and Node B says “don’t worry, I can get there in two hops!”. This causes Node A to hand his traffic to Node B, who promptly turns it around and hands it back, and thus a loop is created. The solution, split horizon, is exactly as you say here: when you learn a piece of information, record which direction you learned it, and do not advertise that information back in that direction.
Cure_of_Ars 24 Aug 2007 13:56 UTC
0 points
Thanks for the link Davis but it does not address the issue that is brought up in the original post. The examples given in your link were “retrodictions”. To quote the original post...

“Thanks to hindsight bias, it’s also not enough to check how well your theory “predicts” facts you already know. You’ve got to predict for tomorrow, not yesterday. It’s the only way a messy human mind can be guaranteed of sending a pure forward message.”

I’m not arguing that evolution is pseudoscience. I’m just saying that evolution as an explanation could make us think we understand more than we really do. Again, I am no creationist; the data clearly does not fit the creationist explanation.
- simplicio 13 Mar 2010 2:53 UTC
  6 points
  Parent
  @C of A:
  
  Prediction doesn’t have to mean literally predicting future events; it can mean predicting what more we will discover about the past.
  
  E by NS holds that there is one tree of life (at least for complex organisms), just like a family tree. That is a prediction. It means that we won’t find a human in the same fossil stratum and dating to the same time period as a fishlike creature that’s supposed to be our great-to-the-nth-power grammy. So that’s a prediction about our future discoveries, one that has been borne out. That’s one example from a non-expert.
Richard_Hollerith 25 Aug 2007 17:11 UTC
1 point
Another superb post. I learn so much from your writings.
aram 27 Aug 2007 17:14 UTC
3 points
Is phlogiston theory so much worse than dark matter? Both are placeholders for our ignorance, but neither is completely mysterious, nor does either prevent further questions or investigation into its true nature. If people had an excellent phenomenological understanding of oxygen, but called it phlogiston and didn’t know about atoms or molecules, I wouldn’t discount that. Similarly, it can be very useful to use partial, vague and not-completely-satisfactory models, like dark matter.
- rocurley 1 Aug 2011 21:55 UTC
  6 points
  Parent
  This is an old comment, but dark matter has made predictions that were verified: see the Bullet Cluster
  - Dreaded_Anomaly 1 Aug 2011 22:51 UTC
    1 point
    Parent
    And it’s not just the Bullet Cluster!
JoshuaZ 15 Apr 2010 17:37 UTC
13 points
I’m not sure that this is fair to phlogiston or the scientists who worked with it. In fact, phlogiston theory made predictions. For example, substances in general were supposed to lose mass when combusting. The fact that metals gain mass when rusting was a data point that was actively against what was predicted by phlogiston theory. The theory also helped scientists see general patterns. So even if the term had been a placeholder, it allowed scientists to see that combustion, rust and metabolism were all ultimately linked processes. The very fact that these processes shared patterns, and that phlogiston theory predicted changes in mass in the wrong direction (and failed to predict the behavior of air rich in oxygen and carbon dioxide, although those terms were not used until later), helped lead to the rejection of phlogiston theory. There are classical examples of theories with zero or close to zero predictive value. But phlogiston is not one of them.

Edit: Said lost mass when meant gain mass. Metals gain mass when rusting they don’t lose mass. Phlogiston said they should lose mass but they actually gain mass. Also, fixed some grammar issues.
[deleted] 6 Jun 2010 14:09 UTC
0 points
Maybe phlogiston is also magic, thus witches were imbued with it. This might explain some things.
TheatreAddict 8 Jul 2011 15:04 UTC
1 point
So.. How precisely would I go about doing this? I mean, let’s say I really thought that phlogiston was the reason fire was hot and bright when it burns. Something that today, we know to be untrue. But if I really thought it was true, and I decided to test my hypothesis, how would I go about proving it false?

What I think the point is about, is that if I already believe that phlogiston was the reason fire is hot and bright, and I observe fire being both hot and bright, then I think this proves that phlogiston is the reason fire is hot and bright. When actually, that’s pointless because I’d have to prove that phlogiston is indeed the reason fire is hot and bright, not the other way around. Am I right? Or did I just end up confusing myself even more, because I’m not entirely sure that what I said is correct and/or makes any sense. O_o
- KPier 8 Jul 2011 16:12 UTC
  3 points
  Parent
  Yes, this is right. A better way of saying it might be: “Phlogiston”, as ancient chemists understood it, meant “that which makes stuff burn”. So saying “Phlogiston causes fire” is like saying “The stuff that makes things burn causes stuff to burn.” If you look at the second statement, phlogiston obviously doesn’t mean anything.
  
  If you wanted to test the hypothesis “phlogiston causes stuff to burn” you really couldn’t, because phlogiston isn’t a proper explanation—there aren’t any conditions that would disprove it. If you want to even consider the hypothesis in the first place it has to make better predictions than other hypotheses.
  - Peterdjones 8 Jul 2011 17:08 UTC
    1 point
    Parent
    The phlogiston theory was tested and disproved because combustion products increase in mass rather than decreasing as the theory predicted.
    - KPier 8 Jul 2011 20:57 UTC
      1 point
      Parent
      I thought that just made theorists respond “So phlogiston must be lighter than air”. But you’re right, the article exaggerates the unfalsifiable, fails-to-constrain-expectations, fake-causality aspects of the theory and oversimplifies it a bit.
      - TheatreAddict 9 Jul 2011 6:47 UTC
        0 points
        Parent
        Ehh, I don’t mind the exaggeration and oversimplification.. If it wasn’t simplified, I probably wouldn’t understand it. :3
  - TheatreAddict 8 Jul 2011 16:57 UTC
    0 points
    Parent
    Thank you for clearing that up for me.
a_mshri 5 Oct 2011 8:12 UTC
0 points
I really enjoyed reading this post, especially its connection with Pearl’s belief propagation algorithm in Bayesian networks. Thank you Eliezer!
royf 4 Jun 2012 2:45 UTC
0 points
This is a great layperson explanation of the belief propagation algorithm.

However, the phlogiston example doesn’t show how this algorithm is improperly implemented in humans. To show this, you need an example of incorrect beliefs drawn from a correct model, i.e. good input to the algorithm resulting in bad output. The phlogiston model was clearly incorrect. As other commenters have pointed out, contemporary scientists were painfully aware of this, and eventually abandoned the model. Bad output from bad input doesn’t demonstrate a bug in implementation, certainly not the specific bug you mentioned:

we don’t keep rigorously separate books for the backward-message and forward-message

Such a defect would probably not even allow a mouse to be as intelligent as it is.
royf 4 Jun 2012 3:53 UTC
−2 points

Sadly, we humans can’t rewrite our own code, the way a properly designed AI could.

Sure we can!
In fact, we can’t stop rewriting our own code.

When you use the word “code” to describe humans, you take a certain degree of semantic liberty. So we first need to understand what is meant by “code” in this context.

In artificial computing machines, code is nothing more than a state of a chunk of memory hardware that causes the computation hardware to operate in a certain way (for a certain input). Only a tiny subset of the possible states of any chunk of memory hardware are “executable”, i.e. don’t cause the computation hardware to reach a certain “failure” state. This gives us an almost clear-cut distinction between (executable) code and (non-executable) data, under the assumption that data is very unlikely to be executable by chance. Given the correct design, a machine can write code to its memory and then execute it.

In humans, the distinction between memory hardware and computation hardware is unclear, if it exists at all. Moreover, it’s unclear how to apply the above distinction between code and data: what is a human’s “failure” state? I guess a state of the brain (containing both memory and computation hardware, until and unless we can ever separate the two) can be said to be “executable” if, placed in a certain environment, the person doesn’t go and die.

It follows that any change that the brain does to its own state, which then affects its computation, to the result of not dying, is, in fact, “rewriting its own code”. This, of course, happens all the time and (perhaps ironically) cannot be stopped without killing the brain.

In a wider loop, we also have drugs, medications and, eventually, gene therapy. But that’s more similar to a robot reaching for the keyboard (or the screwdriver).
- wedrifid 4 Jun 2012 3:55 UTC
  1 point
  Parent
  
  Sadly, we humans can’t rewrite our own code, the way a properly designed AI could.
  
  Sure we can!
  
  Not the way a properly designed AI could. The difference is qualitative.
  - royf 4 Jun 2012 4:51 UTC
    0 points
    Parent
    Having asserted that your claim is, in fact, new information: can you please clarify and explain why you believe that?
    - CuSithBell 4 Jun 2012 4:56 UTC
      2 points
      Parent
      An advanced AI could reasonably be expected to be able to explicitly edit any part of its code however it desires. Humans are unable to do this.
      - royf 4 Jun 2012 5:08 UTC
        0 points
        Parent
        I believe that is a misconception. Perhaps I’m not being reasonable, but I would expect the level at which you could describe such a creature in terms of “desires” to be conceptually distinct from the level at which it can operate on its own code.
        
        This is the same old question of “free will” again. Desires don’t exist as a mechanism. They exist as an approximate model of describing the emergent behavior of intelligent agents.
        CuSithBell 4 Jun 2012 5:18 UTC
        0 points
        Parent
        You are saying that a GAI being able to alter its own “code” on the actual code-level does not imply that it is able to alter in a deliberate and conscious fashion its “code” in the human sense you describe above?
        
        Generally GAIs are ascribed extreme powers around here—if it has low-level access to its code, then it will be able to determine how its “desires” derive from this code, and will be able to produce whatever changes it wants. Similarly, it will be able to hack human brains with equal finesse.
        royf 4 Jun 2012 6:13 UTC
        3 points
        Parent
        I am saying pretty much exactly that. To clarify further, the words “deliberate”, “conscious” and “wants” again belong to the level of emergent behavior: they can be used to describe the agent, not to explain it (what could not be explained by “the agent did X because it wanted to”?).
        
        Let’s instead make an attempt to explain. A complete control of an agent’s own code, in the strict sense, is in contradiction of Gödel’s incompleteness theorem. Furthermore, information-theoretic considerations significantly limit the degree to which an agent can control its own code (I’m wondering if anyone has ever done the math. I expect not. I intend to look further into this). In information-theoretic terminology, the agent will be limited to typical manipulations of its own code, which will be a strict (and presumably very small) subset of all possible manipulations.
        
        Can an agent be made more effective than humans in manipulating its own code? I have very little doubt that it can. Can it lead to agents qualitatively more intelligent than humans? Again, I believe so. But I don’t see a reason to believe that the code-rewriting ability itself can be qualitatively different than a human’s, only quantitatively so (although of course the engineering details can be much different; I’m referring to the algorithmic level here).
        
        Generally GAIs are ascribed extreme powers around here
        
        As you’ve probably figured out, I’m new here. I encountered this post while reading the sequences. Although I’m somewhat learned on the subject, I haven’t yet reached the part (which I trust exists) where GAI is discussed here.
        
        On my path there, I’m actively trying to avoid a certain degree of group thinking which I detect in some of the comments here. Please take no offense, but it’s phrases like the above quote which worry me: is there really a consensus around here about such profound questions? Hopefully it’s only the terminology which is agreed upon, in which case I will learn it in time. But please, let’s make our terminology “pay rent”.
        CuSithBell 4 Jun 2012 14:49 UTC
        0 points
        Parent
        
        You are saying that a GAI being able to alter its own “code” on the actual code-level does not imply that it is able to alter in a deliberate and conscious fashion its “code” in the human sense you describe above?
        
        I am saying pretty much exactly that. To clarify further, the words “deliberate”, “conscious” and “wants” again belong to the level of emergent behavior: they can be used to describe the agent, not to explain it (what could not be explained by “the agent did X because it wanted to”?).
        
        Sure, but we could imagine an AI deciding something like “I do not want to enjoy frozen yogurt”, and then altering its code in such a way that it is no longer appropriate to describe it as enjoying frozen yogurt, yeah?
        
        Let’s instead make an attempt to explain. A complete control of an agent’s own code, in the strict sense, is in contradiction of Gödel’s incompleteness theorem. Furthermore, information-theoretic considerations significantly limit the degree to which an agent can control its own code (I’m wondering if anyone has ever done the math. I expect not. I intend to look further into this). In information-theoretic terminology, the agent will be limited to typical manipulations of its own code, which will be a strict (and presumably very small) subset of all possible manipulations.
        
        This seems trivially false—if an AI is instantiated as a bunch of zeros and ones in some substrate, how could Godel or similar concerns stop it from altering any subset of those bits?
        
        Can an agent be made more effective than humans in manipulating its own code? I have very little doubt that it can. Can it lead to agents qualitatively more intelligent than humans? Again, I believe so. But I don’t see a reason to believe that the code-rewriting ability itself can be qualitatively different than a human’s, only quantitatively so (although of course the engineering details can be much different; I’m referring to the algorithmic level here).
        
        You see reasons to believe that any artificial intelligence is limited to altering its motivations and desires in a way that is qualitatively similar to humans? This seems like a pretty extreme claim—what are the salient features of human self-rewriting that you think must be preserved?
        
        Generally GAIs are ascribed extreme powers around here
        
        As you’ve probably figured out, I’m new here. I encountered this post while reading the sequences. Although I’m somewhat learned on the subject, I haven’t yet reached the part (which I trust exists) where GAI is discussed here.
        
        On my path there, I’m actively trying to avoid a certain degree of group thinking which I detect in some of the comments here. Please take no offense, but it’s phrases like the above quote which worry me: is there really a consensus around here about such profound questions? Hopefully it’s only the terminology which is agreed upon, in which case I will learn it in time. But please, let’s make our terminology “pay rent”.
        
        I don’t think it’s a “consensus” so much as an assumed consensus for the sake of argument. Some do believe that any hypothetical AI’s influence is practically unlimited, some agree to assume that because it’s not ruled out and is a worst-case scenario or an interesting case (see wedrifid’s comment on the grandparent (aside: not sure how unusual or nonobvious this is, but we often use familial relationships to describe the relative positions of comments, e.g. the comment I am responding to is the “parent” of this comment, the one you were responding to when you wrote it is the “grandparent”. I think that’s about as far as most users take the metaphor, though.)).
        royf 4 Jun 2012 23:27 UTC
        0 points
        Parent
        Thanks for challenging my position. This discussion is very stimulating for me!
        
        Sure, but we could imagine an AI deciding something like “I do not want to enjoy frozen yogurt”, and then altering its code in such a way that it is no longer appropriate to describe it as enjoying frozen yogurt, yeah?
        
        I’m actually having trouble imagining this without anthropomorphizing (or at least zoomorphizing) the agent. When is it appropriate to describe an artificial agent as enjoying something? Surely not when it secretes serotonin into its bloodstream and synapses?
        
        This seems trivially false—if an AI is instantiated as a bunch of zeros and ones in some substrate, how could Godel or similar concerns stop it from altering any subset of those bits?
        
        It’s not a question of stopping it. Gödel is not giving it a stern look, saying: “you can’t alter your own code until you’ve done your homework”. It’s more that these considerations prevent the agent from being in a state where it will, in fact, alter its own code in certain ways. This claim can and should be proved mathematically, but I don’t have the resources to do that at the moment. In the meanwhile, I’d agree if you wanted to disagree.
        
        You see reasons to believe that any artificial intelligence is limited to altering its motivations and desires in a way that is qualitatively similar to humans? This seems like a pretty extreme claim—what are the salient features of human self-rewriting that you think must be preserved?
        
        I believe that this is likely, yes. The “salient feature” is being subject to the laws of nature, which in turn seem to be consistent with particular theories of logic and probability. The problem with such a claim is that these theories are still not fully understood.
        CuSithBell 5 Jun 2012 18:20 UTC
        0 points
        Parent
        
        Thanks for challenging my position. This discussion is very stimulating for me!
        
        It’s a pleasure!
        
        Sure, but we could imagine an AI deciding something like “I do not want to enjoy frozen yogurt”, and then altering its code in such a way that it is no longer appropriate to describe it as enjoying frozen yogurt, yeah?
        
        I’m actually having trouble imagining this without anthropomorphizing (or at least zoomorphizing) the agent. When is it appropriate to describe an artificial agent as enjoying something? Surely not when it secretes serotonin into its bloodstream and synapses?
        
        Yeah, that was sloppy of me. Leaving aside the question of when something is enjoying something, let’s take a more straightforward example: Suppose an AI were to design and implement more efficient algorithms for processing sensory stimuli? Or add a “face recognition” module when it determines that this would be useful for interacting with humans?
        
        This seems trivially false—if an AI is instantiated as a bunch of zeros and ones in some substrate, how could Godel or similar concerns stop it from altering any subset of those bits?
        
        It’s not a question of stopping it. Gödel is not giving it a stern look, saying: “you can’t alter your own code until you’ve done your homework”. It’s more that these considerations prevent the agent from being in a state where it will, in fact, alter its own code in certain ways. This claim can and should be proved mathematically, but I don’t have the resources to do that at the moment. In the meanwhile, I’d agree if you wanted to disagree.
        
        Hm. It seems that you should be able to write a simple program that overwrites its own code with an arbitrary value. Wouldn’t that be a counterexample?
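As a minimal sketch of the kind of program meant here, the following Python replaces the full contents of a file with an arbitrary byte string. The names `overwrite_file` and `demo` are made up for illustration, and a temporary file stands in for "its own code"; a script that really overwrote itself would pass its own path (for example `__file__`) instead, which this sketch deliberately avoids.

```python
import os
import tempfile


def overwrite_file(path: str, payload: bytes) -> None:
    """Replace the entire contents of the file at `path` with `payload`."""
    with open(path, "wb") as f:
        f.write(payload)


def demo() -> bytes:
    """Overwrite a stand-in 'program' and return what is left on disk."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-in for the program's own source file.
        script = os.path.join(tmp, "agent.py")
        with open(script, "w") as f:
            f.write("print('original behaviour')\n")

        # The "arbitrary value": any byte string, not necessarily valid code.
        overwrite_file(script, b"\x00\x01arbitrary")

        with open(script, "rb") as f:
            return f.read()
```

Nothing in the write step checks whether the result is still a runnable program, which is the sense in which the value is arbitrary.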
        
        You see reasons to believe that any artificial intelligence is limited to altering its motivations and desires in a way that is qualitatively similar to humans? This seems like a pretty extreme claim—what are the salient features of human self-rewriting that you think must be preserved?
        
        I believe that this is likely, yes. The “salient feature” is being subject to the laws of nature, which in turn seem to be consistent with particular theories of logic and probability. The problem with such a claim is that these theories are still not fully understood.
        
        This sounds unjustifiably broad. Certainly, human behavior is subject to these restrictions, but it is also subject to much more stringent ones—we are not able to do everything that is logically possible. Do we agree, then, that humans and artificial agents are both subject to laws forbidding logical contradictions and the like, but that artificial agents are not in principle necessarily bound by the same additional restrictions as humans?
        royf 5 Jun 2012 21:50 UTC
        0 points
        Parent
        
        Suppose an AI were to design and implement more efficient algorithms for processing sensory stimuli? Or add a “face recognition” module when it determines that this would be useful for interacting with humans?
        
        The ancient Greeks have developed methods of improved memorization. It has been shown that human-trained dogs and chimps are more capable of human-face recognition than others of their kind. None of them were artificial (discounting selective breeding in dogs and Greeks).
        
        It seems that you should be able to write a simple program that overwrites its own code with an arbitrary value. Wouldn’t that be a counterexample?
        
        Would you consider such a machine an artificial intelligent agent? Isn’t it just a glorified printing press?
        
        I’m not saying that some configurations of memory are physically impossible. I’m saying that intelligent agency entails typicality, and therefore, for any intelligent agent, there are some things it is extremely unlikely to do, to the point of practical impossibility.
        
        Do we agree, then, that humans and artificial agents are both subject to laws forbidding logical contradictions and the like, but that artificial agents are not in principle necessarily bound by the same additional restrictions as humans?
        
        I would actually argue the opposite.
        
        Are you familiar with the claim that people are getting less intelligent since modern technology allows less intelligent people and their children to survive? (I never saw this claim discussed seriously, so I don’t know how factual it is; but the logic of it is what I’m getting at.) The idea is that people today are less constrained in their required intelligence, and therefore the typical human is becoming less intelligent.
        
        Other claims are that activities such as browsing the internet and video gaming are changing the set of mental skills which humans are good at. We improve in tasks which we need to be good at, and give up skills which are less useful. You gave yet another example in your comment regarding face recognition.
        
        The elasticity of biological agents is (quantitatively) limited, and improvement by evolution takes time. This is where artificial agents step in. They can be better than humans, but the typical agent will only actually be better if it has to. Generally, more intelligent agents are those which are forced to comply to tighter constraints, not looser ones.
        Ronny Fernandez 5 Jun 2012 22:15 UTC
        2 points
        Parent
        
        The idea is that people today are less constrained in their required intelligence, and therefore the typical human is becoming less intelligent.
        
        That’s an empirical inquiry, which I’m sure has been answered within some acceptable error range (it’s interesting and easy-ish to test). If you’re going to use it as evidence for your conclusion, or part of your worldview, you should really be sure that it’s true, because using “logic” that leads to empirically falsifiable claims—is essentially never fruitful.
        
        Check out Steven Pinker for a start.
        royf 6 Jun 2012 4:07 UTC
        0 points
        Parent
        
        If you’re going to use it as evidence for your conclusion, or part of your worldview, you should really be sure that it’s true
        
        (I never saw this claim discussed seriously, so I don’t know how factual it is; but the logic of it is what I’m getting at.)
        
        Was my disclaimer insufficient? I was using the unchecked claim to convey a piece of reasoning. The claim itself is unimportant in this context; what matters is its reasoning that the conclusion should follow from the premise. Checking the truth of the conclusion may not be difficult, but the premise itself could be false, and I suspect that it is, and that it’s much harder to verify.
        
        And even the reasoning, which is essentially mathematically provable, I have repeatedly urged the skeptic reader to doubt until they see a proof.
        
        using “logic” that leads to empirically falsifiable claims—is essentially never fruitful.
        
        Did you mean false claims? I sure do hope that my logic (without quotes) implies empirically falsifiable (but unfalsified) claims.
        Ronny Fernandez 6 Jun 2012 17:34 UTC
        1 point
        Parent
        Any set of rules for determining validity is useless if even sound arguments have empirically false conclusions every now and again. So my point was that if it is sound but has a false conclusion, you should forget about the reasoning altogether.
        
        And yes, I did mean “empirically falsified.” My mistake.
        
        (edit):
        
        Actually, it’s not a sound or unsound, or valid or invalid argument. The argument points out some pressures that should make us expect that people are getting dumber, and ignores the presence of pressures which should make us expect that we’re getting smarter. Either way, if from your “premises” you can derive too much belief for certain false claims, either you are too confident in your premises, or your rules for deriving belief are crappy, i.e., far from approximating Bayesian updating.
        royf 6 Jun 2012 21:26 UTC
        0 points
        Parent
        
        if it [...] has a false conclusion, you should forget about the reasoning altogether
        
        That’s both obvious and irrelevant.
        
        [...] either you are too confident in your premises [...]
        
        the premise itself could be false, and I suspect that it is
        
        Are you even trying to have a discussion here? Or are you just stating obvious and irrelevant facts about rationality?
        Ronny Fernandez 7 Jun 2012 3:33 UTC
        0 points
        Parent
        Above you said that you weren’t sure if the conclusion of some argument you were using was true. Don’t do that. That is all the advice I wanted to give.
        royf 7 Jun 2012 3:46 UTC
        0 points
        Parent
        I’ll try to remember that, if only for the reason that some people don’t seem to understand contexts in which the truth value of a statement is unimportant.
        Ronny Fernandez 7 Jun 2012 3:48 UTC
        0 points
        Parent
        
        if it [...] has a false conclusion, you should forget about the reasoning altogether
        
        and
        
        some people don’t seem to understand contexts in which the truth value of a statement is unimportant.
        
        You see no problem here?
        royf 7 Jun 2012 4:09 UTC
        0 points
        Parent
        Not at all. If you insist, let’s take it from the top:
        
        I wanted to convey my reasoning, let’s call it R.
        
        I quoted a claim of the form “because P is true, Q is true”, where R is essentially “if P then Q”. This was a rhetorical device, to help me convey what R is.
        
        I indicated clearly that I don’t know whether P or Q are true. Later I said that I suspect P is false.
        
        Note that my reasoning is, in principle, falsifiable: if P is true and Q is false, then R must be false.
        
        While Q may be relatively easy to check, I think P is not.
        
        I expect to have other means of proving R.
        
        I feel that I’m allowed to focus on conveying R first, and attempting to prove or falsify it at a later date. The need to clarify my ideas helped me understand them better, in preparation for a future proof.
        
        I stated clearly and repeatedly that I’m just conveying an idea here, not providing evidence for it, and that I agree with readers who choose to doubt it until shown evidence.
        
        Do you still think I’m at fault here?
        
        EDIT: Your main objection to my presentation was that Q could be false. Would you like to revise that objection?
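The falsifiability point above can be checked mechanically if R is read as a plain material implication "if P then Q". That reading is an assumption made only for this sketch, and the helper names `implies` and `falsifying_cases` are illustrative.

```python
from itertools import product
from typing import List, Tuple


def implies(p: bool, q: bool) -> bool:
    """Material implication: 'if p then q' is false only when p holds and q fails."""
    return (not p) or q


def falsifying_cases() -> List[Tuple[bool, bool]]:
    """Return every (P, Q) truth assignment under which R = 'if P then Q' is false."""
    return [
        (p, q)
        for p, q in product([True, False], repeat=2)
        if not implies(p, q)
    ]
```

Calling `falsifying_cases()` returns the single assignment with P true and Q false. Observing that Q is false therefore refutes R only together with evidence that P is true, which is why the difficulty of checking P matters.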
        Ronny Fernandez 7 Jun 2012 20:13 UTC
        0 points
        Parent
        I don’t want to revise my objection, because it’s not really a material implication that you’re using. You’re using probabilistic reasoning in your argument, i.e., pointing out certain pressures that exist, which rule out certain ways that people could be getting smarter, and therefore increase our probability that people are not getting smarter. But if people are in fact getting smarter, this reasoning is either too confident in the pressures, or is using something far from Bayesian updating.
        
        Either way, I feel like we took up too much space already. If you would like to continue, I would love to do so in a private message.
        CuSithBell 6 Jun 2012 17:45 UTC
        0 points
        Parent
        
        Suppose an AI were to design and implement more efficient algorithms for processing sensory stimuli? Or add a “face recognition” module when it determines that this would be useful for interacting with humans?
        
        The ancient Greeks have developed methods of improved memorization. It has been shown that human-trained dogs and chimps are more capable of human-face recognition than others of their kind. None of them were artificial (discounting selective breeding in dogs and Greeks).
        
        It seems that you should be able to write a simple program that overwrites its own code with an arbitrary value. Wouldn’t that be a counterexample?
        
        Would you consider such a machine an artificial intelligent agent? Isn’t it just a glorified printing press?
        
        I’m not saying that some configurations of memory are physically impossible. I’m saying that intelligent agency entails typicality, and therefore, for any intelligent agent, there are some things it is extremely unlikely to do, to the point of practical impossibility.
        
        Certainly that doesn’t count as an intelligent agent—but a GAI with that as its only goal, for example, why would that be impossible? An AI doesn’t need to value survival.
        
        I’d be interested in the conclusions derived about “typical” intelligences and the “forbidden actions”, but I don’t see how you have derived them.
        
        Do we agree, then, that humans and artificial agents are both subject to laws forbidding logical contradictions and the like, but that artificial agents are not in principle necessarily bound by the same additional restrictions as humans?
        
        I would actually argue the opposite.
        
        Are you familiar with the claim that people are getting less intelligent since modern technology allows less intelligent people and their children to survive? (I never saw this claim discussed seriously, so I don’t know how factual it is; but the logic of it is what I’m getting at.) The idea is that people today are less constrained in their required intelligence, and therefore the typical human is becoming less intelligent.
        
        Other claims are that activities such as browsing the internet and video gaming are changing the set of mental skills which humans are good at. We improve in tasks which we need to be good at, and give up skills which are less useful. You gave yet another example in your comment regarding face recognition.
        
        The elasticity of biological agents is (quantitatively) limited, and improvement by evolution takes time. This is where artificial agents step in. They can be better than humans, but the typical agent will only actually be better if it has to. Generally, more intelligent agents are those which are forced to comply to tighter constraints, not looser ones.
        
        I think we have our quantifiers mixed up? I’m saying an AI is not in principle bound by these restrictions—that is, it’s not true that all AIs must necessarily have the same restrictions on their behavior as a human. This seems fairly uncontroversial to me. I suppose the disconnect, then, is that you expect a GAI will be of a type bound by these same restrictions. But then I thought the restrictions you were talking about were “laws forbidding logical contradictions and the like”? I’m a little confused—could you clarify your position, please?
        royf 6 Jun 2012 22:19 UTC
        0 points
        Parent
        
        a GAI with [overwriting its own code with an arbitrary value] as its only goal, for example, why would that be impossible? An AI doesn’t need to value survival.
        
        A GAI with the utility of burning itself? I don’t think that’s viable, no.
        
        I’d be interested in the conclusions derived about “typical” intelligences and the “forbidden actions”, but I don’t see how you have derived them.
        
        At the moment it’s little more than professional intuition. We also lack some necessary shared terminology. Let’s leave it at that until and unless someone formalizes and proves it, and then hopefully blogs about it.
        
        could you clarify your position, please?
        
        I think I’m starting to see the disconnect, and we probably don’t really disagree.
        
        You said:
        
        This sounds unjustifiably broad
        
        My thinking is very broad but, from my perspective, not unjustifiably so. In my research I’m looking for mathematical formulations of intelligence in any form—biological or mechanical.
        
        Taking a narrower viewpoint, humans “in their current form” are subject to different laws of nature than those we expect machines to be subject to. The former use organic chemistry, the latter probably electronics. The former multiply by synthesizing enormous quantities of DNA molecules, the latter could multiply by configuring solid state devices.
        
        Do you count the more restrictive technology by which humans operate as a constraint which artificial agents may be free of?
        CuSithBell 8 Jun 2012 14:29 UTC
        0 points
        Parent
        
        a GAI with [overwriting its own code with an arbitrary value] as its only goal, for example, why would that be impossible? An AI doesn’t need to value survival.
        
        A GAI with the utility of burning itself? I don’t think that’s viable, no.
        
        What do you mean by “viable”? You think it is impossible due to Godelian concerns for there to be an intelligence that wishes to die?
        
        As a curiosity, this sort of intelligence came up in a discussion I was having on LW recently. Someone said “why would an AI try to maximize its original utility function, instead of switching to a different / easier function?”, to which I responded “why is that the precise level at which the AI would operate, rather than either actually maximizing its utility function or deciding to hell with the whole utility thing and valuing suicide rather than maximizing functions (because it’s easy)”.
        
        But anyway it can’t be that Godelian reasons prevent intelligences from wanting to burn themselves, because people have burned themselves.
        
        I’d be interested in the conclusions derived about “typical” intelligences and the “forbidden actions”, but I don’t see how you have derived them.
        
        At the moment it’s little more than professional intuition. We also lack some necessary shared terminology. Let’s leave it at that until and unless someone formalizes and proves it, and then hopefully blogs about it.
        
        Fair enough, though for what it’s worth I have a fair background in mathematics, theoretical CS, and the like.
        
        could you clarify your position, please?
        
        I think I’m starting to see the disconnect, and we probably don’t really disagree.
        
        You said:
        
        This sounds unjustifiably broad
        
        My thinking is very broad but, from my perspective, not unjustifiably so. In my research I’m looking for mathematical formulations of intelligence in any form—biological or mechanical.
        
        I meant that this was a broad definition of the qualitative restrictions to human self-modification, to the extent that it would be basically impossible for something to have qualitatively different restrictions.
        
        Taking a narrower viewpoint, humans “in their current form” are subject to different laws of nature than those we expect machines to be subject to. The former use organic chemistry, the latter probably electronics. The former multiply by synthesizing enormous quantities of DNA molecules, the latter could multiply by configuring solid state devices.
        
        Do you count the more restrictive technology by which humans operate as a constraint which artificial agents may be free of?
        
        Why not? Though of course it may turn out that AI is best programmed on something unlike our current computer technology.
        royf 11 Jun 2012 2:43 UTC
        0 points
        Parent
        
        A GAI with the utility of burning itself? I don’t think that’s viable, no.
        
        What do you mean by “viable”?
        
        Intelligence is expensive. More intelligence costs more to obtain and maintain. But the sentiment around here (and this time I agree) seems to be that intelligence “scales”, i.e. that it doesn’t suffer from diminishing returns in the “middle world” like most other things; hence the singularity.
        
        For that to be true, more intelligence also has to be more rewarding. But not just in the sense of asymptotically approaching optimality. As intelligence increases, it has to constantly find new “revenue streams” for its utility. It must not saturate its utility function, in fact its utility must be insatiable in the “middle world”. A good example is curiosity, which is probably why many biological agents are curious even when it serves no other purpose.
        
        Suicide is not such a utility function. We can increase the degree of intelligence an agent needs to have to successfully kill itself (for example, by keeping the gun away). But in the end, it’s “all or nothing”.
        
        But anyway it can’t be that Godelian reasons prevent intelligences from wanting to burn themselves, because people have burned themselves.
        
        Gödel’s theorem doesn’t prevent any specific thing. In this case I was referring to information-theoretic reasons. And indeed, suicide is not a typical human behavior, even without considering that some contributing factors are irrelevant for our discussion.
        
        Do you count the more restrictive technology by which humans operate as a constraint which artificial agents may be free of?
        
        Why not? Though of course it may turn out that AI is best programmed on something unlike our current computer technology.
        
        In that sense, I completely agree with you. I usually don’t like making the technology distinction, because I believe there’s more important stuff going on in higher levels of abstraction. But if that’s where you’re coming from then I guess we have resolved our differences :)
        Kindly 5 Jun 2012 1:58 UTC
        0 points
        Parent
        
        It’s not a question of stopping it. Gödel is not giving it a stern look, saying: “you can’t alter your own code until you’ve done your homework”. It’s more that these considerations prevent the agent from being in a state where it will, in fact, alter its own code in certain ways. This claim can and should be proved mathematically, but I don’t have the resources to do that at the moment. In the meanwhile, I’d agree if you wanted to disagree.
        
        I’d like to understand what you’re saying here better. An agent instantiated as a binary program can do any of the following:
        
        Rewrite its own source code with a random binary string.
        
        Do things until it encounters a different agent, obtain its source code, and replace its own source code with that.
        
        It seems to me that either of these would be enough to provide “complete control” over the agent’s source code in the sense that any possible program can be obtained as a result. So you must mean something different. What is it?
        royf 5 Jun 2012 2:19 UTC
        1 point
        Parent
        
        Rewrite its own source code with a random binary string
        
        This is in a sense the electronic equivalent of setting oneself on fire—replacing oneself with maximum entropy. An artificial agent is extremely unlikely to “survive” this operation.
        
        any possible program can be obtained as a result
        
        Any possible program could be obtained, and the huge number of possible programs should hint that most are extremely unlikely to be obtained.
        
        I assumed we were talking about an agent that is active and kicking, and with some non-negligible chance to keep surviving. Such an agent must have a strongly non-uniform distribution over its next internal state (code included). This means that only a tiny fraction of possible programs will have any significant probability of being obtained. I believe one can give a formula for (at least an upper bound on) the expected size of this fraction (actually, the expected log size), but I also believe nobody has ever done that, so you may doubt this particular point until I prove it.
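        A minimal sketch of the kind of quantity gestured at above, under a loud assumption: it uses Shannon entropy as a stand-in for "how many successor programs have significant probability". This is not the formula royf says is unproven; `entropy_bits` and `log2_effective_fraction` are hypothetical names introduced only for illustration.

```python
import math

def entropy_bits(probs):
    """Shannon entropy, in bits, of a distribution over successor programs."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def log2_effective_fraction(probs, n_bits):
    """log2 of (effective number of likely successors / all n_bits-long programs).

    Assumption: the 'effective number' of likely successors is taken to be
    2**entropy, the usual perplexity-style proxy.
    """
    return entropy_bits(probs) - n_bits

n_bits = 16  # toy program length, chosen only to keep the example small

# Overwriting the code with a random string: uniform over all 2**16 programs.
uniform = [1.0 / 2**n_bits] * 2**n_bits

# A "surviving" agent: nearly all probability on a handful of successors.
concentrated = [0.25, 0.25, 0.25, 0.25]

print(log2_effective_fraction(uniform, n_bits))
print(log2_effective_fraction(concentrated, n_bits))
```

        In this toy setting the uniform overwrite reaches every program (log fraction of zero), while the concentrated distribution reaches only a tiny slice of program space, and the gap grows linearly with the program length.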
        Kindly 5 Jun 2012 2:30 UTC
        0 points
        Parent
        I don’t think “surviving” is a well-defined term here. Every time you self-modify, you replace yourself with a different agent, so in that sense any agent that keeps surviving is one that does not self-modify.
        
        Obviously, we really think that sufficiently similar agents are basically the same agent. But “sufficiently similar” is vague. Can I write a program that begins by computing the cluster of all agents similar to it, and switches to the next one (lexicographically) every 24 hours? If so, then it would eventually take on all states that are still “the same agent”.
        
        The natural objection is that there is one part of the agent’s state that is inviolate in this example: the 24-hour rotation period (if it ever self-modified to get rid of the rotation, then it would get stuck in that state forever, without “dying” in an information theoretic sense). But I’m skeptical that this limitation can be encoded mathematically.
        Strange7 5 Jun 2012 3:20 UTC
        0 points
        Parent
        In addition to the rotation period, the “list of sufficiently similar agents” would become effectively non-modifiable in that case. If it ever recalculated the list, starting from a different baseline or with a different standard of ‘sufficiently similar,’ it would not be rotating, but rather on a random walk through a much larger cluster of potential agent-types.
        royf 5 Jun 2012 2:56 UTC
        0 points
        Parent
        
        I don’t think “surviving” is a well-defined term here. Every time you self-modify, you replace yourself with a different agent, so in that sense any agent that keeps surviving is one that does not self-modify.
        
        I placed “survive” in quotation marks to signal that I was aware of that, and that I meant “the other thing”. I didn’t realize that this was far from clear enough, sorry.
        
        For lack of better shared terminology, what I meant by “surviving” is continuing to be executable. Self-modification is not suicide; you and I are doing it all the time.
        
        Can I write a program that begins by computing the cluster of all agents similar to it, and switches to the next one (lexicographically) every 24 hours?
        
        No, you cannot. This function is non-computable in the Turing sense.
        
        A computable limited version of it (whatever it is) could be possible. But this particular agent cannot modify itself “in any way it wants”, so it’s consistent with my proposition.
        
        The natural objection is that there is one part of the agent’s state that is inviolate in this example: the 24-hour rotation period
        
        This is a very weak limitation of the space of possible modifications. I meant a much stronger one.
        
        But I’m skeptical that this limitation can be encoded mathematically.
        
        This weak limitation is easy to formalize.
        
        The stronger limitation I’m thinking of is challenging to formalize, but I’m pretty confident that it can be done.
        Kindly 5 Jun 2012 3:22 UTC
        0 points
        Parent
        
        No, you cannot. This function is non-computable in the Turing sense.
        
        Aha! I think this is the important bit. I’ll have to think about this, but it’s probably what the problem is.
        TheOtherDave 5 Jun 2012 1:17 UTC
        0 points
        Parent
        When is it appropriate to describe a natural agent as enjoying something?
        royf 5 Jun 2012 1:47 UTC
        0 points
        Parent
        As I said, when it secretes serotonin into its bloodstream and synapses.
        TheOtherDave 5 Jun 2012 3:47 UTC
        0 points
        Parent
        You didn’t say; rather, you said (well, implied) that it wasn’t appropriate to describe an artificial agent as enjoying something in that case. But, OK, you’ve said now. Thanks for clarifying.
        wedrifid 5 Jun 2012 2:47 UTC
        0 points
        Parent
        
        As I said, when it secretes serotonin into its bloodstream and synapses.
        
        That strikes me as a terrible definition of enjoyment, particularly because serotonin release isn’t nearly as indicative of enjoyment as popular culture would suggest. Even using dopamine would be better (but still not particularly good).
        royf 5 Jun 2012 3:09 UTC
        0 points
        Parent
        I wasn’t basing it on popular culture, but that doesn’t mean I’m not wrong.
        
        Do you have a better suggestion?
        
        If not, I’d ask CuSithBell to please clarify her (or his) ideas without using controversially defined terminology (which was also my sentiment before).
        wedrifid 5 Jun 2012 3:46 UTC
        0 points
        Parent
        
        I’d ask CuSithBell to please clarify his ideas
        
        My impression was ‘her’, not ‘his’.
        royf 5 Jun 2012 3:49 UTC
        0 points
        Parent
        That’s a big “ouch” on my part. Sorry. Lesson learned.
        wedrifid 4 Jun 2012 5:47 UTC
        0 points
        Parent
        
        Generally GAIs are ascribed extreme powers around here
        
        (Yes, and this is partly just because AIs that don’t meet a certain standard are implicitly excluded from the definition of the class being described. AIs below that critical threshold are considered boring and irrelevant for most purposes.)
        TheOtherDave 4 Jun 2012 13:27 UTC
        0 points
        Parent
        Indeed, the same typically goes for NIs. Though some speakers make exceptions for some speakers.
    - wedrifid 4 Jun 2012 5:09 UTC
      0 points
      Parent
      
      Having asserted that your claim is, in fact, new information
      
      I wouldn’t assert that. I thought I was stating the obvious.
      
      can you please clarify and explain why you believe that?
      
      See CuSithBell’s reply.
      - CuSithBell 4 Jun 2012 5:20 UTC
        1 point
        Parent
        
        Having asserted that your claim is, in fact, new information
        
        I wouldn’t assert that. I thought I was stating the obvious.
        
        Yes, I think I misspoke earlier, sorry. It was only “new information” in the sense that it wasn’t in that particular sentence of Eliezer’s—to anyone familiar with discussions of GAI, your assertion certainly should be obvious.
        wedrifid 4 Jun 2012 5:23 UTC
        0 points
        Parent
        Ahh. That’s where the “new information” thing came into it. I didn’t think I’d said anything about “new”, so I’d wondered.
  - CuSithBell 4 Jun 2012 4:04 UTC
    0 points
    Parent
    To be fair, when structured as
    
    Sadly, we humans can’t rewrite our own code, the way a properly designed AI could.
    
    then the claim is in fact “we humans can’t rewrite our own code (but a properly designed AI could)”. If you remove a comma:
    
    Sadly, we humans can’t rewrite our own code the way a properly designed AI could.
    
    only then is the sentence interpreted as you describe.
    - wedrifid 4 Jun 2012 4:14 UTC
      0 points
      Parent
      To be even more fair I also explicitly structured my own claim such that it still technically applies to your reading. That allowed me to make the claim both technically correct to a pedantic reading and an expression of the straightforward point that the difference is qualitative. (The obvious alternative response was to outright declare the comment a mere equivocation.)
      
      only then is the sentence interpreted as you describe.
      
      Meaning that I didn’t, in fact, describe.
      - CuSithBell 4 Jun 2012 4:26 UTC
        0 points
        Parent
        Not meant as an attack. I’m saying, “to be fair it didn’t actually say that in the original text, so this is new information, and the response is thus a reasonable one”. Your comment could easily be read as implying that this is not new information (and that the response is therefore mistaken), so I wanted to add a clarification.
Mati_Roy 21 Jan 2014 1:53 UTC
0 points
Another interesting (and sad) example: during the conversation between Deepak Chopra and Richard Dawkins here, Deepak Chopra used the words “quantum leap” as an “explanation” for the origin of language, the origin of life, jumps in the fossil record, etc.

Edit: Finally he claimed it was a metaphor.
Epictetus 2 Mar 2015 7:32 UTC
1 point
There’s a fair amount of hindsight bias going on with this critique of phlogiston. Phlogiston sounds plausible on the surface. It’s a reasonable conjecture to make given the knowledge at the time and certainly worth investigation. Is it really any less absurd to postulate some mystery substance in the air that essentially plays the same role? If they’d chosen the latter, we’d be lauding them for their foresight.

It’s perfectly feasible to draw up tables of the presumed phlogiston content of various materials and use this to deduce the mass of the residue after complete combustion. You could even predict how much of a material would burn if placed in a closed container. Phlogiston or oxygen, you get the same tables and the same empirical laws—before anyone discovered the latter. It’s a lot easier to come up with empirical laws than it is to explain the underlying mechanism.

Is anyone satisfied with a physical theory content to give empirical laws and which makes no comment on underlying mechanisms? Gravity was used because it worked, but I doubt anyone was comfortable with the action at a distance. The Copenhagen Interpretation is an active refusal to engage in any kind of speculation as to why quantum mechanics “really” works, and routinely gets slammed on this site.
- CellBioGuy 24 Jul 2015 22:43 UTC
  0 points
  Parent
  Indeed it works great for any substance burning that produces gaseous oxides (CO2, water, etc). It broke down when people noted that burning metals produced solid oxides that weighed more, and the excess mass came from the air. Thus, it was revealed that the sign was wrong.
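  The sign problem can be made concrete with a small worked calculation. The sketch below assumes magnesium as the burning metal (an example chosen here, not one named in the thread) and uses standard rounded atomic weights; `calx_mass` and `implied_phlogiston` are hypothetical helper names.

```python
# Standard atomic weights in g/mol, rounded.
M_MG = 24.305
M_O = 15.999

def calx_mass(metal_mass_g):
    """Mass of solid MgO left after fully burning magnesium: 2 Mg + O2 -> 2 MgO."""
    moles_mg = metal_mass_g / M_MG
    return moles_mg * (M_MG + M_O)

def implied_phlogiston(metal_mass_g):
    """Phlogiston 'released' if one insists that residue = metal - phlogiston."""
    return metal_mass_g - calx_mass(metal_mass_g)

metal = 10.0  # grams of magnesium before burning
print("residue (g):", round(calx_mass(metal), 2))
print("implied phlogiston (g):", round(implied_phlogiston(metal), 2))
```

  The residue comes out heavier than the metal, so a phlogiston table fitted to this case would need a negative entry, which is the "wrong sign" described above.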
paul ince 31 Jan 2019 2:41 UTC
0 points
Does Phlogiston make the fire hot the same way CO2 makes the climate change?
- [Error communicating with LW2 server] 1 Jan 2020 16:35 UTC
  3 points
  Parent
  CO2 as a cause of climate change “pays rent in anticipation”. Phlogiston as a cause of fire doesn’t.
Mark Neyer 14 May 2020 16:15 UTC
9 points
The phlogiston theory gets a bad rap. I 100% agree with the idea that theories need to place constraints on our anticipations, but I think you’re taking for granted all the constraints phlogiston makes.
The phlogiston theory is basically a baby step towards empiricism and materialism. Is it possible that our modern perspective causes us to take these things for granted to the point that the steps phlogiston adds aren’t noticed? In another essay you talk about walking through the history of science, trying to imagine being in the perspective of someone taken in by a new theory, and I found that practice particularly instructive here. I came up with a number of ways in which this theory DOES constrain anticipation. Seeing these predictions may make it easier to help raise new predictions for existing theories, as well as suggest that theories don’t need to be rigorous and mathematical in order to constrain the space of anticipations.
The phlogiston theory says “there is no magic here, fire is caused by some physical property of the substances involved in it”. By modern standards this does nothing to constrain anticipation further, but from a space of total ignorance about what fire is and how it works, the phlogiston theory rules out such things as:
- performing the correct incantation can make the difference between something catching and not catching fire
- If some elements catch fire in one location, changing the location of those elements, or the time of day, or the time of year that the experiment is performed, shouldn’t make it easier or harder to start a fire. The material conditions are the only element that matters when determining whether something will catch fire.
- If a jar placed over the candle caused the candle to go out, because the air is ‘saturated with phlogiston’, then placing a new candle under the same jar should result in the new candle also going out. As long as the air under the jar hasn’t been swapped out, if it was ‘saturated with phlogiston’ before we changed the candle, it should remain ‘saturated with phlogiston’ after we change the candle.
The last example is particularly instructive, because the phrase “saturated with phlogiston” is correct as long as we interpret it to mean “no longer containing sufficient oxygen.” That is a correct prediction based on the same mechanism as our current (extremely predictive) understanding of what makes fires go out. The phlogiston model just got the language upside down and backwards, mistaking the absence of fuel for the presence of something that inhibits the reaction. They did call oxygen “dephlogisticated air”, and so again, the theory says “this stuff is flammable, wherever it goes, whatever the time of day, or whatever incantation or prayer you say over it”, which is correct, but so obviously true that we perhaps aren’t seeing it as constraining anticipation.
From my understanding of the history of science, it’s possible that the phlogiston theory constrained the hypothesis space enough to get people to search for strictly material-based explanations of phenomena like fire. In this sense, a belief that “there is a truth, and our models can come closer to it over time” also constrains anticipation, because it says what you won’t experience: a search for truth that involves gathering evidence over time, and refining models, which never get better at predicting experience.
Is a model still useful if it only constrains the space of hypotheses that are likely to pan out with predictive models, rather than constraining the space of empirical observations?