Air Conditioner Test Results & Discussion

Background is in the preregistration post. This post will assume you’ve read that one.

First, the headline result: my main prediction was wrong. Paul’s predictions were also wrong. By the preregistered metric, the second hose improves performance by much less than either of us expected. That said, it looks like the experiment’s main metric mostly did not measure the thing it was intended to measure, which is why we were both so far off.

I did collect a bunch of data, which allows us to estimate the air conditioner’s performance in other ways. That analysis was not preregistered and you should therefore be suspicious of it, but I nonetheless believe that it gives a more accurate view of the air conditioner’s performance. The main takeaway of that analysis is that the air conditioner performs about twice as well with two hoses as with one. Going by that analysis, both Paul’s formula and my prediction were correct.

Experiment Setup

Here’s the air conditioner with the cardboard “second hose” attached:

The t-shirts are to cover gaps. It doesn’t actually need to be airtight.

I hung thermometers at each corner of the room, in the middle of each wall, and on the ceiling in the center of the room. I also placed one thermometer each in the inlet (i.e. the hole in the window covering through which the cardboard hose draws air), in the outlet (i.e. the hole in the window covering through which the other hose blows air), and outside on the balcony.

I ran a few different tests:

  • Air conditioner with and without the cardboard intake hose, on low fan

  • Air conditioner with and without the cardboard intake hose, on high fan

  • Air conditioner off (control)

For both the 1-hose and control tests, I left the inlet hole in the window covering open. (This was mainly to reduce infiltration from places other than outside.)

Assorted Notes:

  • I did try the experiment previously (about a month ago), and ran into issues which resulted in some minor changes to the experiment setup; info about that is here.

  • Neither the experiment setup, the thermometers, nor the room itself was in direct sun.

  • The time for each test was mostly determined by when I had meetings scheduled, and when the temperatures seemed to stop changing.


Data is here. Some numbers:

  • Outdoor temperature was 85-88°F (29.4-31.1°C) for most of the testing, though it dropped to 82°F (27.8°C) in the evening during the control test

  • Temperature difference between outdoor and indoor (higher is better), in each test:

    • Low fan: 20.6°F (11.4°C) with one hose, 22.7°F (12.6°C) with two hoses

    • High fan: 18.7°F (10.4°C) with one hose, 22.2°F (12.3°C) with two hoses

    • Control: 13.1°F (7.3°C)

  • Temperature variation across the room was fairly high (~2.5-3.0°F, or ~1.4-1.7°C), and consistent over time (i.e. measurements 30 minutes apart had similar relative temperature patterns)

  • Exhaust temperature was 98-100°F (36.7-37.8°C) with one hose, 112-119°F (44.4-48.3°C) with two
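
One thing that is easy to trip over in the numbers above: absolute temperatures and temperature differences convert between °F and °C differently. An absolute temperature needs the 32° offset, while a difference only needs the 5/9 scale factor. The sketch below is a sanity check on the conversions listed, not part of the original analysis; the function names are made up for illustration.

```python
def f_to_c(temp_f):
    """Convert an absolute temperature from Fahrenheit to Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def f_diff_to_c_diff(delta_f):
    """Convert a temperature difference from Fahrenheit degrees to Celsius degrees.

    Differences have no 32-degree offset, only the 5/9 scale factor.
    """
    return delta_f * 5.0 / 9.0


# Absolute outdoor temperatures reported above (°F).
outdoor_c = {t: round(f_to_c(t), 1) for t in (82, 85, 88)}

# Outdoor-indoor temperature differences reported above (°F).
diffs_f = {
    "low fan, one hose": 20.6,
    "low fan, two hoses": 22.7,
    "high fan, one hose": 18.7,
    "high fan, two hoses": 22.2,
    "control": 13.1,
}
diffs_c = {name: round(f_diff_to_c_diff(d), 1) for name, d in diffs_f.items()}

print(outdoor_c)
print(diffs_c)
```

Applying the absolute-temperature formula to a difference by mistake would turn the 20.6°F gap into a negative number of °C, which is a quick way to spot the error.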


My main prediction was that the outdoor-indoor temperature difference would be at least 50% greater with two hoses than with one hose, with my median estimate around 100% (i.e. a factor-of-two difference). That was definitely wrong: the actual number was 10% with the fan on low, 19% with the fan on high.
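
For anyone who wants to check the arithmetic, here is a minimal sketch of the preregistered metric: the percentage by which the outdoor-indoor temperature difference grows when the second hose is added. The inputs are the °F differences reported above; the function and variable names are just illustrative.

```python
def percent_improvement(one_hose_diff, two_hose_diff):
    """Percent increase in outdoor-indoor temperature difference from adding the second hose."""
    return 100.0 * (two_hose_diff - one_hose_diff) / one_hose_diff


# Outdoor-indoor differences in °F, from the results above.
low_fan = percent_improvement(20.6, 22.7)
high_fan = percent_improvement(18.7, 22.2)

PREDICTED_MINIMUM = 50.0  # "at least 50% greater", from the prediction above

for label, value in (("low fan", low_fan), ("high fan", high_fan)):
    verdict = "meets" if value >= PREDICTED_MINIMUM else "falls short of"
    print(f"{label}: {value:.1f}% improvement, {verdict} the predicted minimum")
```

Because the metric is a ratio, it comes out the same whether the differences are given in °F or °C.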

Paul’s prediction for the same number was 33-43%, though that was based on some very rough guesses about indoor, outdoor and exhaust temperatures. Using Paul’s formula with the actual indoor, outdoor and exhaust temperatures from the tests gives predictions anywhere from 75% to 180%, consistent with my own factor-of-two median guess. (The difference is mainly because the exhaust temperature was considerably lower than Paul had speculated. It really is a very shitty air conditioner.)

So experimentally, the difference came out way lower than anybody guessed. Why?

The result from the control test basically tells us the answer. With the air conditioner off, the room did not return to anywhere near outdoor temperature over the course of an hour. That implies some combination of:

  • Very slow equilibration, such that the other test results were probably also not near steady-state.

  • Indoor temperatures driven more by (cooler) temperatures in neighboring rooms, rather than outdoor temperature.

The high and consistent-over-time variation of temperatures between different locations within the room also points toward some combination of these two issues. I would guess the second issue is more important than the first, though I’m not confident in that.

One relatively simple way to correct for the problem: use the temperature from the control test as the baseline temperature, rather than using the outdoor temperature as baseline. If we do that, then with the fan on high the results are:

  • AC cools the room by 2.6°F (1.4°C) relative to control with one hose

  • AC cools the room by 5.1°F (2.8°C) relative to control with two hoses (despite the outdoor temperature being slightly higher during the two-hose test)

Two hoses perform better by about a factor of two, consistent with both Paul’s formula (using the real indoor/outdoor/exhaust temperatures) and my guess.
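
The control-baseline comparison can be sketched in a few lines. The first part just takes the ratio of the two cooling figures. The second part is a consistency check that rests on an assumption: if both tests are measured against the same control indoor temperature, then the gap between the two metrics should equal the change in outdoor temperature between the tests. Variable names are illustrative.

```python
# Cooling relative to the control test, fan on high, in °F (from the list above).
cooling_one_hose_f = 2.6
cooling_two_hose_f = 5.1

# How many times better two hoses do than one, using control as the baseline.
ratio = cooling_two_hose_f / cooling_one_hose_f

# Outdoor-indoor differences for the same two high-fan tests, in °F.
diff_one_hose_f = 18.7
diff_two_hose_f = 22.2

# Assumption: one shared control indoor temperature is the baseline for both tests.
# Then (change in outdoor-indoor difference) - (change in cooling vs control)
# equals how much warmer it was outdoors during the two-hose test.
implied_outdoor_shift_f = (diff_two_hose_f - diff_one_hose_f) - (
    cooling_two_hose_f - cooling_one_hose_f
)

print(f"two-hose / one-hose cooling relative to control: {ratio:.2f}x")
print(f"implied outdoor temperature shift: {implied_outdoor_shift_f:.1f}°F")
```

Under that assumption the implied shift is about 1°F, which fits the note that the outdoor temperature was slightly higher during the two-hose test.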

My Updates

On the Experiment

First and foremost: I made a confident wrong prediction, so there had better be some substantial update from that. My main update is to put even more weight on “the specific metric you choose will inevitably fail to measure the thing you thought it would measure, especially on your first try”. That’s something I already knew on some level; I already considered it the main bottleneck to making prediction markets actually useful in practice, and in hindsight it’s embarrassing that I didn’t put more weight on it when making the air conditioner prediction.

Second: I’ve basically thrown out my preregistration and done a bunch of analysis which disagrees with the preregistered analysis. I do think that was the right choice for maximal epistemic accuracy (and in fact is very often the right choice for maximal epistemic accuracy, because the specific metric you choose will inevitably fail to measure the thing you thought it would measure, especially on your first try). But it’s important to increment the counter in the back of one’s head when doing that, and occasionally go re-examine to see if one is systematically steering away from some undesired conclusion. Counter incremented.

Third: shortly after I put up the preregistration post, both Paul and ADifferentAnonymous left comments explaining where Paul’s formula came from, and I updated a lot toward that formula being a good model based on the thermodynamic argument they gave. (The main thing the formula leaves out is extra waste heat generated with one hose vs two, and I still had some uncertainty about that.) After seeing the (admittedly very rough) agreement between the formula and the performance-relative-to-control, I currently think the formula is basically correct.

Fourth: both the temperature-change-relative-to-control and the rough thermodynamic calculation (i.e. Paul’s formula) with real exhaust temperature point to two hoses performing about twice as well as one. I think that’s probably about right (modulo some large error bars, of course, since none of this was super precise).

On One vs Two Hose

Now for the real claim of interest: that single hose air conditioners are “stupidly inefficient in a way which I do not think consumers would plausibly choose over the relatively-low cost of a second hose if they recognized the problems”.

That claim boils down to a cost-benefit analysis, and this whole experiment has been about the benefit. What about the cost side? Air conditioner hoses similar to the one my unit uses cost about $20 on Amazon. Presumably the actual cost-to-the-manufacturer is lower, especially if they’re shipping in a box with the rest of the air conditioner. So, the second hose adds at most $20 to a $300-500 air conditioner. It also adds a little bit more annoying fiddliness when setting up the AC.

That cost sounds like it is very obviously worth paying for a factor-of-two performance improvement. It would be very obviously worth paying for the second hose even if my estimates of performance improvement are quite far off; the performance improvement would have to be well below 30% before I’d even start to consider a second hose not-obviously-worthwhile. The marginal cost is just so small.
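
To put rough numbers on "the marginal cost is just so small", here is a small sketch using the figures above: a hose costing at most $20 on a $300-500 unit. The "cooling per dollar" ratio is a hypothetical metric introduced only for illustration, not something measured in the experiment, and it ignores electricity costs and setup hassle.

```python
HOSE_COST_USD = 20.0            # upper bound on the second hose's cost, from the text
AC_PRICES_USD = (300.0, 500.0)  # price range of the air conditioner, from the text


def cost_increase_pct(ac_price, hose_cost=HOSE_COST_USD):
    """Percent increase in purchase price from adding the second hose."""
    return 100.0 * hose_cost / ac_price


def cooling_per_dollar_ratio(perf_gain, ac_price, hose_cost=HOSE_COST_USD):
    """Hypothetical metric: two-hose cooling per dollar over one-hose cooling per dollar.

    perf_gain is the fractional performance improvement (1.0 means twice as good).
    """
    return (1.0 + perf_gain) / (1.0 + hose_cost / ac_price)


for price in AC_PRICES_USD:
    print(f"${price:.0f} unit: second hose adds {cost_increase_pct(price):.1f}% to the price")
    for gain in (1.0, 0.3):  # factor-of-two estimate, and the 30% figure from the text
        r = cooling_per_dollar_ratio(gain, price)
        print(f"  {gain:.0%} performance gain -> {r:.2f}x cooling per dollar")
```

Even at the 30% figure, the price increase of roughly 4-7% is several times smaller than the performance gain.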

So, yeah, I do think that single hose air conditioners are stupidly inefficient in a way which I do not think consumers would plausibly choose over the relatively-low cost of a second hose if they recognized the problems.

On Civilizational Adequacy

The air conditioner was intended as an example in which a product is shitty in ways the large majority of consumers don’t notice, and therefore market pressures don’t fix it. Two further implications of our ability to actually find such an example:

  • It can’t be that rare for products to be shitty in ways the large majority of consumers don’t notice, otherwise we wouldn’t have found one.

  • If there are products where it takes an unusually-good-relative-to-the-populace understanding of technical topics to recognize major problems, then there are probably products which have major problems nobody recognizes, because nobody yet knows the right technical topics well enough. Again, such cases probably aren’t that rare.

I still think that such cases are not only “not rare”, but common. I still expect major problems which nobody is able to recognize, due to an insufficient understanding of the right technical topics, to be the main source of AI X-risk. And I still expect that opportunities to iterate will mostly not help us to directly fix such problems, for the same reason that markets don’t fix single hose air conditioners: people have to notice the problem in order for the feedback loop to fix it.

(Also, of course, the Department of Energy coming up with an utterly bullshit energy rating which makes single hose air conditioners look much less bad than they are is a metaphor for everything, and is very much the sort of thing I expect to generalize.)

… though at the same time, a counter has incremented in the back of my head, and I do have a slight concern that I’m avoiding evidence against the “people don’t notice major problems” model. I don’t actually think I’m updating incorrectly, at this point, but it’s a possibility which has risen to my conscious attention and I’m keeping an eye on it.