Preregistration: Air Conditioner Test

Warning: None of the participants in the Great Air Conditioner Debate Of 2022 have endorsed my summaries of their positions in this post. Including me.

Background

In Everything I Need To Know About Takeoff Speeds I Learned From Air Conditioner Ratings On Amazon, I complained about the top-rated new air conditioner on Amazon. I claimed that it’s a straightforward example of a product with a major problem, but a major problem which most people will not notice, and which therefore never gets fixed. Specifically: although it does cool the room, the air conditioner also pulls hot air from outside into the house. People do notice the cool air blowing from the air conditioner, but don’t think to blame the air conditioner for hot air drawn into the house elsewhere. Simply adding a second hose would fix the problem at relatively low extra cost, and dramatically improve the effectiveness of the air conditioner. But companies don’t actually do that because (apparently) people mostly don’t notice the problem.

To my surprise, multiple commenters disagreed with my interpretation of the air conditioner example. They argue that in fact one-hose air conditioners work fine. Sure, single-hose air conditioners are less-than-ideally efficient compared to two-hose, but it’s not a very big difference in practice. CEER efficiency ratings account for the problems, and the efficiency difference is typically only about 20-30%. Also, The Wirecutter tested lots of portable air conditioners and found that there wasn’t much difference between one-hose and two-hose designs. (Credit to Paul for both those pieces of evidence.) Really, what this example illustrates is that simple models and clever arguments are not actually very reliable at predicting how things work in practice. One should instead put more trust in experiment and reported experiences, including all those 5-star ratings on Amazon.

I, on the other hand, think the “second hose doesn’t help much” claim is a load of baloney. I think it is far more probable that CEER ratings are bullshit and The Wirecutter messed up their test, than that a second hose makes only a small difference.

And so began The Great Air Conditioner Debate Of 2022.

… Why Do Air Conditioners Need Hoses?

Ideally, an air conditioner should work much like a fridge: it pumps heat from air inside to air outside. Inside and outside air do not touch or mix; only heat flows from one to the other.

A portable air conditioner sits inside the house. So, in order to pump heat to the outside air (without letting it mix with inside air) it needs two hoses. One hose runs from a window to the air conditioner, and sucks in outside air. The other runs from the air conditioner back to the window, and blows the outside air back out. Inside air comes in and out through vents in the air conditioner, and the unit pumps heat from the inside air to the outside air, keeping the two separate throughout the process.

A single-hose air conditioner doesn’t do that. A single-hose air conditioner sucks in indoor air, splits it into two streams, and pumps heat from one stream to the other. The hotter stream blows out the window (via the one hose); the cooler stream blows back into the room.

The problem with a single-hose design is that it blows air from inside to outside; it removes air from the room. That lowers the pressure in the room slightly, so new air is pulled back in via whatever openings the house has. That air comes from outside, so presumably it’s warm—and it’s replacing formerly-cool indoor air. (Technical term for this problem: “infiltration”.)

Oversimplified Summary Of The Debate

I’m not even going to try to do the debate justice here; I’ll just give what I currently think are the key points, in roughly chronological order:

  • A couple people thought I was claiming that single-hose air conditioners do not cool a room at all. (I did not intend to claim that, though in hindsight I could see how the original post was unclear, so I added a clarification.) Paul, Shminux and jbash all correctly explained why single-hose air conditioners can cool a room: the exhaust is much hotter than outdoor air, so heat is removed on net even with some warm outdoor air coming back in.

  • Paul claimed that two-hose only improves efficiency by about 25-30%. He cited CEER ratings and The Wirecutter’s air conditioner tests (and provided some very helpful links). In particular, The Wirecutter did a direct comparison between the same air conditioner in one-hose and two-hose mode, and found little difference.

  • I called bullshit on The Wirecutter, and went looking for how their tests screwed up so badly. Paul showed that my first guess was completely wrong (after I was pretty sure it was right) and I lost some Bayes points (and actually came close to being convinced). But it eventually turned out that The Wirecutter was only measuring temperature within 6 feet of the unit.

  • Habryka looked into things, concluded that “The testing procedure of Wirecutter does not seem to address infiltration in any way”, but also that “Overall efficiency loss from going from dual to single is something like 20-30%”.

  • I also called bullshit on the “two-hose is only about 20-30% better” claim, and sketched out a test and bet. (More on that later in this post.) Ben also offered to run a test; I’m not sure whether he still intends to do so.

  • Since the 20-30% claim came largely from CEER ratings, I looked into the CEER test setup. Turns out that they use a weighted average of test conditions, with 80% of the weight on conditions where the difference between outdoor and indoor temperature is only 3°F/1.7°C.

That last link includes some other important information too: one estimate that 20-30% lower CEER ratings imply one-hose air conditioners have roughly 0% efficiency under a 15°F/​8.3°C temperature delta, as well as some quotes from discussion on the Department of Energy’s CEER rulemaking process suggesting that air conditioner manufacturers themselves thought single-hose units might not be viable in the marketplace at all if infiltration were fully included in ratings.
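To get a feel for how much that weighting dilutes hot-day performance, here is a minimal sketch. The 80% weight on a 3°F delta comes from the text above; placing the remaining 20% on a 15°F delta is an assumption made for illustration, as is the helper name `weighted_delta`. This is not the actual CEER formula, only the weighted-average idea behind it.

```python
def weighted_delta(conditions):
    """Weighted-average outdoor-minus-indoor temperature delta (°F).

    `conditions` is a list of (weight, delta_f) pairs; weights must sum to 1.
    """
    total_weight = sum(w for w, _ in conditions)
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * d for w, d in conditions)


# 80% weight on a 3°F delta is from the text above.
# ASSUMPTION: the remaining 20% sits on a 15°F delta.
ceer_like_conditions = [(0.8, 3.0), (0.2, 15.0)]
avg_delta = weighted_delta(ceer_like_conditions)
print(f"effective delta: {avg_delta:.1f}°F")  # effective delta: 5.4°F
```

Under that assumed split, the rating behaves as if the typical temperature delta were about 5.4°F, much milder than the 15°F case.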

Paul and I each suggested a quantitative toy model during the discussion as well. Those models are in the appendix, for those interested.

Why Is This Interesting?

One thing to keep in mind throughout all of this: the actual claim of interest is that single-hose air conditioners are

stupidly inefficient in a way which I do not think consumers would plausibly choose over the relatively-low cost of a second hose if they recognized the problems.

Why are alignment researchers debating this claim?

There’s a general model/​worldview that the world is filled with problems which are not fixed because most people do not notice them. (This is a particular form of “civilizational inadequacy”.) This includes problems which are bad enough that people would have a strong preference to fix the problem if they did notice it; we’re not just talking about small problems here. That worldview informs AI strategy: if we expect that ultimately-fatal problems with AI will not be fixed because most people do not notice them, then we’re generally more pessimistic about things working out all right “by default” and more reliant on doing things ourselves. Also, it means that we ourselves could easily miss the key problems, so we need to invest heavily in deep understanding, and in the kinds of models which tell us which questions to ask, and in techniques for noticing when our models are missing key pieces.

On the other hand, if we expect that major problems are usually noticed and fixed “by default”, and that AI will also work this way, that suggests very different strategies. We can rely more on marginal progress, making problems marginally more visible, helping existing institutions deal with problems marginally better, etc. We also don’t have to worry as much that we ourselves will miss the key problems for lack of understanding.

In general, alignment has terrible feedback loops: we can’t just build an AGI and test it. In this case, we can’t just have a team build an AGI and see whether any problems come up which the team missed. So if we want to test these two models/worldviews, then we need to get our bits from somewhere else. Fortunately, the real world is absolutely packed with bits of evidence; these worldviews make predictions in lots of different places, so there are lots of opportunities to compare them.

In this case, the air conditioner example was cherry-picked, so even if my claims turn out to be correct it’s not very strong evidence for the civilizational inadequacy worldview in general. But if even my cherry-picked example is wrong, then that is a nontrivial chunk of evidence against the inadequacy worldview. I myself was almost convinced at one point during the debate, and started to think about how I’d have to adjust my priors on AI strategy in general (mostly it would have meant spending more effort researching questions which I had previously considered settled or irrelevant).

Also, we can update on the kinds of evidence and reasoning on display during the debate. For instance, many people took CEER ratings as strong evidence. If those indeed turn out to be bullshit, then it should produce a correspondingly strong update against trusting that kind of evidence in the future. Same with The Wirecutter’s tests.

Test Plan

I myself bought this single-hose portable air conditioner back in 2019 (for an apartment in Mountain View). My plan is to rig up a cardboard “second hose” for it, and try it out in my apartment both with and without the second hose next time we have a decently-hot day.

Particulars:

  • I plan to perform the test in a roughly 250 sq ft, roughly square bedroom. (Estimated dimensions; I haven’t measured it carefully.) It will not be in direct sun. The door will be closed.

  • I plan to open the outside door and window in the apartment’s main room and air it out beforehand, so any infiltration should be outdoor-temperature air. (So e.g. if the neighbor is running an AC, I shouldn’t suck in their cool air.)

  • I plan to wait for a day when it’s at least 80°F/​26.7°C outside. AC will be set to its minimum temperature (60°F/​15.6°C).

  • I am assuming that the AC runs continuously (as opposed to getting the room down to target temperature easily, at which point it will shut off until the temperature goes back up). If that’s not the case, I will consider the test invalid, and retry on a hotter day.

  • I plan to run the AC until the temperature distribution in the room equilibrates (i.e. my temperature measurements stop noticeably trending), then measure temperatures. I expect that will take under an hour, but I’m not sure.
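One way to make "stop noticeably trending" concrete is sketched below. The window size, the tolerance, and the function name `has_equilibrated` are all hypothetical choices for illustration; the plan above does not commit to any particular rule.

```python
def has_equilibrated(readings, window=5, tolerance_f=0.2):
    """Return True once the most recent readings have stopped trending.

    `readings` is a list of temperatures (°F) taken at regular intervals.
    ASSUMPTION: "stopped trending" means the spread (max - min) of the
    last `window` readings is no more than `tolerance_f`.
    """
    if len(readings) < window:
        return False  # not enough data to judge yet
    recent = readings[-window:]
    return max(recent) - min(recent) <= tolerance_f


# Hypothetical log: the room cools quickly, then levels off.
log = [80.0, 78.1, 76.9, 76.2, 75.8, 75.5, 75.5, 75.4, 75.5, 75.5]
print(has_equilibrated(log[:5]))  # False: still dropping
print(has_equilibrated(log))      # True: last five readings span 0.1°F
```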

If there are other particulars of the experiment which people think will be relevant, leave a comment and I’ll declare how I plan to control the variables in question.

Predictions

The main experimental endpoint I plan to test is temperature, not efficiency. Specifically, once the temperature equilibrates, I plan to check air temperature at nine points around the room (4 corners, midpoint of each wall, and center) at roughly head height, average them, and also check temperature outside around the same time. The main outcome of interest will be the difference in temperature between inside and outside (“equilibrium temperature delta”). Two reasons for testing equilibrium temperature delta rather than efficiency:

  • Equilibrium indoor temperature was the main thing I cared about when using this air conditioner; electricity is relatively cheap.

  • I don’t have the equipment on hand to easily measure power consumption.

Main prediction: equilibrium temperature delta in two-hose mode will be at least 50% greater than in one-hose mode. Example: suppose it’s 80°F/​26.7°C outside. In one-hose mode, the average equilibrium temperature in the room is 75°F/​23.9°C (temperature delta = 5°F/​2.8°C). Then I expect the average equilibrium temperature in two-hose mode to be below 72.5°F/​22.5°C (temperature delta > 7.5°F/​4.2°C).
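The check itself is simple arithmetic; a minimal sketch follows. The readings are hypothetical numbers chosen to match the worked example above, and the function names are illustrative only.

```python
from statistics import mean


def temperature_delta(outdoor_f, indoor_readings_f):
    """Outdoor temperature minus the average of the indoor readings (°F)."""
    return outdoor_f - mean(indoor_readings_f)


def prediction_holds(delta_one_hose, delta_two_hose, threshold=1.5):
    """True if the two-hose delta is at least `threshold` times the one-hose delta."""
    if delta_one_hose <= 0:
        raise ValueError("one-hose mode did not cool the room; ratio is undefined")
    return delta_two_hose >= threshold * delta_one_hose


# Hypothetical readings at the nine measurement points (°F).
one_hose_readings = [75.0] * 9
two_hose_readings = [72.0] * 9

delta_one = temperature_delta(80.0, one_hose_readings)  # 5.0
delta_two = temperature_delta(80.0, two_hose_readings)  # 8.0
print(prediction_holds(delta_one, delta_two))  # True, since 8.0 >= 7.5
```

Each run takes its own outdoor reading, so the comparison still works if the outdoor temperature drifts between the one-hose and two-hose measurements.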

Confidence: originally I put 80% on this. After finding the problem with CEER ratings, I think I’m up to more like 90%. My median expectation is that equilibrium temperature delta in two-hose mode will be ~double the equilibrium temperature delta in one-hose mode.

Paul disagrees with this, and expects the two-hose temperature delta to be more like 20% greater than the one-hose delta (roughly proportional to the efficiency difference he expects). [EDIT: Paul clarified that he expects a 25-30% efficiency difference, which he expects to translate into a 33-43% difference in temperature delta. He also listed a few conditions under which that prediction would change. 33-43% is pretty close to my 50% cutoff, though my median expectation is much bigger, so we do still have a substantive disagreement to test.]
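One reading that reproduces the 33-43% figure is sketched below. It assumes that one-hose mode delivers a fraction (1 - loss) of the two-hose cooling and that the equilibrium delta is proportional to cooling delivered. That mapping is an assumption that happens to match the numbers, not a confirmed account of Paul's calculation.

```python
def delta_gain_from_efficiency_loss(loss):
    """Fractional increase in temperature delta when going from one hose to two.

    ASSUMPTION: equilibrium delta is proportional to cooling delivered, and
    one-hose mode delivers (1 - loss) of the two-hose cooling.
    """
    if not 0 <= loss < 1:
        raise ValueError("loss must be in [0, 1)")
    return 1 / (1 - loss) - 1


for loss in (0.25, 0.30):
    gain = delta_gain_from_efficiency_loss(loss)
    print(f"{loss:.0%} efficiency loss -> {gain:.0%} larger delta")
# 25% efficiency loss -> 33% larger delta
# 30% efficiency loss -> 43% larger delta
```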

Prediction Market & Bets

There’s a Manifold prediction market for the experiment here. If you want everyone to see your probability on LessWrong, you can also use this prediction widget:

[Prediction widget: Will John's air conditioner test find at least 50% better temperature delta with two hoses compared to one hose?]

If anybody wants to make real-money bets, feel free to use the comment section on this post.

Appendix: Toy Models

In the course of the discussion, two simple models came up.

One model which I introduced, for equilibrium temperature: model the single-hose air conditioner as removing air from the room, and replacing it with a mix of air at two temperatures: $T_C$ (the temperature of cold air coming from the air conditioner), and $T_H$ (the temperature outdoors). If we assume that $T_C$ is constant and that the cold and hot air are introduced in roughly 1:1 proportions (i.e. the flow rate from the exhaust is roughly equal to the flow rate from the cooling outlet), then we should end up with an equilibrium average temperature of $\frac{T_C + T_H}{2}$. If we model the switch to two-hose as just turning off the stream of hot air, then the equilibrium average temperature should drop to $T_C$. So, the two-hose system has double the equilibrium temperature delta of the one-hose system: the delta goes from $T_H - \frac{T_C + T_H}{2} = \frac{T_H - T_C}{2}$ to $T_H - T_C$.

Note that the 1:1 flow rate assumption does a lot of work here, but I think it’s on the right order of magnitude based on seeing my single-hose air conditioner in action; if anything the exhaust blows more. The constant cold-temperature assumption is more suspect.
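A minimal sketch of this mixing model is below. The `exhaust_to_cold_flow` parameter generalizes the 1:1 case to other flow ratios; that extension, the function names, and the example temperatures are additions for illustration.

```python
def one_hose_equilibrium(t_cold, t_outdoor, exhaust_to_cold_flow=1.0):
    """Equilibrium room temperature for the single-hose mixing model.

    Replacement air is a mix of cold air at `t_cold` and infiltrating air
    at `t_outdoor`. `exhaust_to_cold_flow` is the ratio of exhaust flow to
    cold-outlet flow; 1.0 is the 1:1 case described above.
    """
    r = exhaust_to_cold_flow
    return (t_cold + r * t_outdoor) / (1 + r)


def two_hose_equilibrium(t_cold, t_outdoor):
    """With the hot stream turned off, the room settles at the cold-air temperature."""
    return t_cold


t_cold, t_outdoor = 60.0, 80.0  # hypothetical values, °F
delta_one = t_outdoor - one_hose_equilibrium(t_cold, t_outdoor)  # 10.0
delta_two = t_outdoor - two_hose_equilibrium(t_cold, t_outdoor)  # 20.0
print(delta_two / delta_one)  # 2.0
```

In this model the ratio of deltas works out to 1 + r, where r is the exhaust-to-cold flow ratio, so the factor of two comes directly from the 1:1 assumption; an exhaust that blows harder than the cold outlet would make the predicted gap larger.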

Paul instead talked about efficiency, and claimed that

… the efficiency lost is roughly (outside temp - inside temp) / (exhaust temp - inside temp). And my guess was that exhaust temp is ~130.

I’m not sure how that formula was derived, but here’s my best guess. [EDIT: Paul summarizes his actual argument in this comment, and it makes much more sense than my guess below. Leaving the guess here for legibility, but it’s definitely not the calculation Paul did.]

In general, for an efficient air conditioner, the “efficiency” is $\frac{W}{Q} = \frac{T_H - T_C}{T_C}$, where:

  • $T_H$ is hot temperature, $T_C$ is cold temperature

  • $\frac{W}{Q}$ is work required per unit of heat pumped

For the two-hose setup, there’s no downside to blowing lots and lots of outdoor air through the system, so the “hot” side of the heat pump can be kept at outdoor temperature. So, $\left(\frac{W}{Q}\right)_{\text{two-hose}}$ = (outside temp - inside temp)/(inside temp). But in the one-hose setup, the exhaust flow rate needs to be kept low to minimize infiltration losses, resulting in a higher exhaust temp. So, $\left(\frac{W}{Q}\right)_{\text{one-hose}}$ = (exhaust temp - inside temp)/(inside temp). Combine those two, and we get the ratio of work required to pump the same amount of heat in one-hose vs two-hose mode:

$\frac{(W/Q)_{\text{two-hose}}}{(W/Q)_{\text{one-hose}}}$ = (outside temp - inside temp)/(exhaust temp - inside temp)

… i.e. Paul’s formula. With outside temp 90°F/​32.2°C, inside temp 80°F/​26.7°C, and exhaust temp 130°F/​54.4°C, this ratio would be around 20%.

… but that’s not a formula for efficiency lost. That formula is saying that two-hose takes only 20% as much energy as single-hose to pump the same amount of heat. The efficiency loss would be one minus that, i.e. around 80%. So my current best guess is that Paul found this formula, but accidentally used one minus efficiency loss rather than efficiency loss, and it just happened to match the 20-30% number he expected so he didn’t notice the error.
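The arithmetic in this guess can be checked with a short sketch. It assumes the ideal heat-pump relation, work per unit heat = (hot - cold)/cold with temperatures on an absolute scale; the function names are illustrative.

```python
def fahrenheit_to_rankine(t_f):
    """Convert °F to an absolute scale (°R), which the ideal formula requires."""
    return t_f + 459.67


def ideal_work_per_heat(t_hot_f, t_cold_f):
    """Work per unit heat pumped for an ideal heat pump: (T_hot - T_cold) / T_cold."""
    t_hot = fahrenheit_to_rankine(t_hot_f)
    t_cold = fahrenheit_to_rankine(t_cold_f)
    return (t_hot - t_cold) / t_cold


inside, outside, exhaust = 80.0, 90.0, 130.0  # °F, from the example above

two_hose = ideal_work_per_heat(outside, inside)  # hot side at outdoor temp
one_hose = ideal_work_per_heat(exhaust, inside)  # hot side at exhaust temp

work_ratio = two_hose / one_hose
print(f"two-hose work / one-hose work: {work_ratio:.0%}")      # 20%
print(f"implied efficiency loss:       {1 - work_ratio:.0%}")  # 80%
```

The inside temperature in the denominator cancels in the ratio, which is why the simplified expression only involves temperature differences.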

How realistic are the assumptions for this model? I think the two main problems are:

  • It’s not accounting for heat lost to infiltration; instead it’s assuming that exhaust flow is low enough to make infiltration loss small. I don’t think that’s realistic. Once we include infiltration losses, single-hose will need to pump more heat than two-hose in order to maintain the same temperature, whereas the ratio above is work required to pump the same amount of heat.

  • I doubt that exhaust in two-hose mode will be close to outside temp (especially for my experiment with a second hose attached to an air conditioner designed to operate with one hose, but also for ordinary two-hose air conditioners).

These two issues would push the error in opposite directions, though, so it’s not clear whether the 80% efficiency loss estimate is too high or too low.