anonymousaisafety

Karma: 403
• Is the concept of “murphyjitsu” supposed to be different than the common exercise known as a premortem in traditional project management? Or is this just the same idea, but rediscovered under a different name, exactly like how what this community calls a “double crux” is just the evaporating cloud, which was first described in the 90s?

If you’ve heard of a postmortem or possibly even a retrospective, then it’s easy to guess what a premortem is. I cannot say the same for “murphyjitsu”.

I see that premortem is even referenced in the “further resources” section, so I’m confused why you’d describe it under a different name that cannot be researched easily outside of this site, where there is tons of literature and examples of how to do premortems correctly.

• The core problem remains computational complexity.

Statements like “does this image look reasonable” or saying “you pay attention to regularities in the data”, or “find the resolution by searching all possible resolutions” are all hiding high computational costs behind short English descriptions.

Let’s consider the case of a 1280x720 pixel image.
That’s the same as 921600 pixels.

How many bytes is that?

It depends. How many bytes per pixel?[1] In my post, I explained there could be 1-byte-per-pixel grayscale, or perhaps 3-bytes-per-pixel RGB using [0, 255] values for each color channel, or maybe 6-bytes-per-pixel with [0, 65535] values for each color channel, or maybe something like 4-bytes-per-pixel because we have 1-byte RGB channels and a 1-byte alpha channel.

Let’s assume that a reasonable cutoff for how many bytes per pixel an encoding could be using is say 8 bytes per pixel, or a hypothetical 64-bit color depth.

How many ways can we divide this between channels?

If we assume 3 channels, it’s 1953.
If we assume 4 channels, it’s 39711.
Also if it turns out to be 5 channels, it’s 595665.
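A quick sketch of where those three numbers come from, assuming each channel gets at least 1 bit and that the order of the channels matters (that assumption reproduces the figures above; `channel_splits` is just an illustrative name):

```python
from math import comb


def channel_splits(total_bits: int, channels: int) -> int:
    """Ways to split total_bits among ordered channels, each at least 1 bit wide."""
    # Stars and bars: pick channels-1 cut points among the total_bits-1 gaps.
    return comb(total_bits - 1, channels - 1)


for channels in (3, 4, 5):
    print(channels, channel_splits(64, channels))
# 3 1953
# 4 39711
# 5 595665
```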

This is a pretty fast growing function. The following is a plot.

Note that the red line is `O(2^N)` and the black line barely visible at the bottom is `O(N^2)`. `N^2` is a notorious runtime complexity because it’s right on the threshold of what is generally unacceptable performance.[2]

Let’s hope that this file isn’t actually a frame buffer from a graphics card with 32 bits per channel, i.e. 128 bits per pixel / 16 bytes per pixel.

Unfortunately, we still need to repeat this calculation for all of the possibilities for how many bits per pixel this image could be. We need to add in the possibility that it is 63 bits per pixel, or 62 bits per pixel, or 61 bits per pixel.

In case anyone wants to claim this is unreasonable, it’s not impossible to have image formats that have RGBA data, but only 1 bit associated with the alpha data for each pixel. [3]

And for each of these scenarios, we need to question how many channels of color data there are.

• 1? Grayscale.

• 2? Grayscale, with an alpha channel maybe?

• 3? RGB, probably, or something like HSV.

• 4? RGBA, or maybe it’s the RGBG layout I described for a RAW encoding of a Bayer filter, or maybe it’s CMYK for printing.

• 5? This is getting weird, but it’s not impossible. We could be encoding additional metadata into each pixel, e.g. distance from the camera.

Actually, this question of how many channels there are is very important, given the fast growing function above.

This one question, if we don’t know the right answer, is sufficient to make this algorithm pretty much impossible to run.

When we say we can try all of the options, that’s not actually possible.

What I think people mean is that we can use heuristics to pick the likely options first and try them, and then fall back to more esoteric options if the initial results don’t make sense.

That’s the difference between average run-time and worst case run-time.

The point that I am trying to make is that the worst case run-time for decoding an arbitrary binary file is pretty much unbounded, because there’s a ridiculous amount of choice possible.

Some examples of “image” formats that have large numbers of channels per “pixel” are things like RADAR / LIDAR sensors, e.g. it’s possible to have 5 channels per pixel for defining 3D coordinates (relative to the sensor), range, and intensity.

You actually ran into this problem yourself.

Similarly (though you’d likely do this first), you can tell the difference between RGB and RGBA. If you have (255, 0, 0, 255, 0, 0, 255, 0, 0, 255, 0, 0), this is probably 4 red pixels in RGB, and not a fully opaque red pixel, followed by a fully transparent green pixel, followed by a fully transparent blue pixel in RGBA. It could be 2 pixels that are mostly red and slightly green in 16 bit RGB, though. Not sure how you could piece that out.
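To make the ambiguity in that quote concrete, here is a small sketch that reads the same 12 bytes three ways. The `chunk` helper is only illustrative, and the big-endian byte order for the 16-bit case is an assumption, since nothing in the bytes themselves says which order is right:

```python
data = bytes([255, 0, 0] * 4)  # the 12 bytes from the quoted example


def chunk(values, size):
    """Group a flat sequence into tuples of `size` items."""
    return [tuple(values[i:i + size]) for i in range(0, len(values), size)]


rgb8 = chunk(data, 3)   # 3 channels, 1 byte each
rgba8 = chunk(data, 4)  # 4 channels, 1 byte each

# 3 channels, 2 bytes each, assuming big-endian byte order.
words = [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]
rgb16 = chunk(words, 3)

print(rgb8)   # [(255, 0, 0), (255, 0, 0), (255, 0, 0), (255, 0, 0)]
print(rgba8)  # [(255, 0, 0, 255), (0, 0, 255, 0), (0, 255, 0, 0)]
print(rgb16)  # [(65280, 255, 0), (65280, 255, 0)]
```

All three readings consume every byte with nothing left over, so the data alone does not rule any of them out.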

Summing up all of the possibilities above is left as an exercise for the reader, and we’ll call that sum `K`.
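For anyone who wants a starting point on that exercise, here is a sketch under one possible reading of `K`: count every way to split every pixel width from 1 to 64 bits among ordered channels of at least 1 bit each. This reading is an assumption; it ignores what the channels mean, byte order, padding, and everything else that would push the real number higher.

```python
from math import comb


def count_layouts(max_bits: int, max_channels: int) -> int:
    """Total channel splits over all pixel widths up to max_bits."""
    total = 0
    for bits in range(1, max_bits + 1):
        for channels in range(1, min(bits, max_channels) + 1):
            total += comb(bits - 1, channels - 1)
    return total


print(count_layouts(64, 5))                # 8303632
print(count_layouts(64, 64) == 2**64 - 1)  # True
```

With no cap on the channel count, this version of the sum comes out to `2^64 - 1`.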

Without loss of generality, let’s say our image was encoded as 3 bytes per pixel divided between 3 RGB color channels of 1 byte each.

Our 1280x720 image is actually 2764800 bytes as a binary file.

But since we’re decoding it from the other side, and we don’t know it’s 1280x720, when we’re staring at this pile of 2764800 bytes, we need to first assume how many bytes per pixel it is, so that we can divide the total bytes by the bytes per pixel to calculate the number of pixels.

Then, we need to test each possible resolution as you’ve suggested.

The number of possible resolutions is the same as the number of divisors of the number of pixels. The equation for providing an upper bound is `exp(log(N)/log(log(N)))`[4], but the average number of divisors is approximately `log(N)`.
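As a sanity check on the divisor claim, here is a sketch that enumerates the candidate widths for the 921600-pixel example (`divisors` is an illustrative helper, and `log` is the natural log):

```python
from math import isqrt, log


def divisors(n: int) -> list[int]:
    """All positive divisors of n, in ascending order."""
    small, large = [], []
    for d in range(1, isqrt(n) + 1):
        if n % d == 0:
            small.append(d)
            if d != n // d:
                large.append(n // d)
    return small + large[::-1]


pixels = 1280 * 720
widths = divisors(pixels)
print(len(widths))            # 117
print(round(log(pixels), 1))  # 13.7
```

This particular pixel count factors as `2^12 * 3^2 * 5^2`, so it has far more divisors than the `log(N)` average would suggest.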

Oops, no it isn’t!

Files have headers! How large is the header? For a bitmap, it’s anywhere between 26 and 138 bytes. The JPEG header is at least 2 bytes. PNG uses 8 bytes. GIF uses at least 14 bytes.

Now we need to make the following choices:

1. Guess at how many bytes per pixel the data is.

2. Guess at the length of the header. (maybe it’s 0, there is no header!)

3. Calculate the factorization of the remaining bytes N for the different possible resolutions.

4. Hope that there isn’t a footer, checksum, or any type of other metadata hanging out in the sea of bytes. This is common too!
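The first three choices can be sketched as a brute-force enumeration. Everything here is a simplifying assumption: whole-byte pixels of 1 to 8 bytes, a header of 0 to 138 bytes, no footer or checksum, and a made-up file with a 54-byte header in front of the RGB payload.

```python
from math import isqrt
from typing import Iterator


def candidate_layouts(
    file_size: int, max_header: int = 138, max_bpp: int = 8
) -> Iterator[tuple[int, int, int, int]]:
    """Yield every (header_len, bytes_per_pixel, width, height) that fits exactly."""
    for header in range(max_header + 1):
        payload = file_size - header
        if payload <= 0:
            break
        for bpp in range(1, max_bpp + 1):
            if payload % bpp:
                continue  # payload is not a whole number of pixels
            pixels = payload // bpp
            for w in range(1, isqrt(pixels) + 1):
                if pixels % w == 0:
                    h = pixels // w
                    yield (header, bpp, w, h)
                    if w != h:
                        yield (header, bpp, h, w)


file_size = 54 + 1280 * 720 * 3  # hypothetical 54-byte header + RGB payload
candidates = list(candidate_layouts(file_size))
print(len(candidates))
print((54, 3, 1280, 720) in candidates)  # True
```

The true layout is in the list, but so is every other exact fit, and this is before considering bit-level pixel widths or trailing metadata.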

Once we’ve made our choices above, then we multiply that by `log(N)` for the number of resolutions to test, and then we’ll apply the suggested metric. Remember that when considering the different pixel formats and ways the color channel data could be represented, the number was `K`, and that’s what we’re multiplying by `log(N)`.

In most non-random images, pixels near to each other are similar. In an MxN image, the pixel below is a[i+M], whereas in an NxM image, it’s a[i+N]. If, across the whole image, the difference between a[i+M] is less than the difference between a[i+N], it’s more likely an MxN image. I expect you could find the resolution by searching all possible resolutions from 1x<length> to <length>x1, and finding which minimizes average distance of “adjacent” pixels.

What you’re describing here is actually similar to a common metric used in algorithms for automatically focusing cameras by calculating the contrast of an image, except for focusing you want to maximize contrast instead of minimize it.

The interesting problem with this metric is that it’s basically a one-way function. For a given image, you can compute this metric. However, minimizing this metric is not the same as knowing that you’ve decoded the image correctly. It says you’ve found a decoding, which did minimize the metric. It does not mean that is the correct decoding.

A trivial proof:

1. Consider an image and the reversal of that image along the horizontal axis.

2. These have the same metric.

3. So the same metric can yield two different images.
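A minimal sketch of that trivial proof, using one plausible reading of the proposed metric (mean absolute difference between each pixel and the pixel directly below it, on a single grayscale channel). The tiny 4x3 image and the function names are made up for illustration; here the image is mirrored left to right, and a top-to-bottom flip behaves the same way.

```python
def adjacency_metric(pixels: list[int], width: int) -> float:
    """Mean absolute difference between each pixel and the one directly below it."""
    diffs = [abs(pixels[i] - pixels[i + width]) for i in range(len(pixels) - width)]
    return sum(diffs) / len(diffs)


def mirror_rows(pixels: list[int], width: int) -> list[int]:
    """Reverse every row, i.e. mirror the image left to right."""
    out = []
    for start in range(0, len(pixels), width):
        out.extend(reversed(pixels[start:start + width]))
    return out


image = [10, 20, 30, 40,
         12, 25, 33, 41,
         90, 80, 70, 60]
flipped = mirror_rows(image, 4)

print(flipped != image)              # True
print(adjacency_metric(image, 4))    # 25.0
print(adjacency_metric(flipped, 4))  # 25.0
```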

A slightly less trivial proof:

1. For a given “image” of `N` bytes of image data, there are `2^(N*8)` possible bit patterns.

2. Assuming the metric is calculated as an 8-byte IEEE 754 double, there are only `2^(8*8)` possible bit patterns.

3. When `N > 8`, there are more bit patterns than values allowed in a double, so multiple images need to map to the same metric.

The difference between our `2^(2764800*8)` image space and the `2^64` metric is, uhhh, `10^(10^6.8)`. Imagine `10^(10^6.8)` pigeons. What a mess.[5]
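The arithmetic behind that figure, as a quick sketch: the ratio of the two spaces is `2^(2764800*8 - 64)`, and taking `log10` twice gives the exponent of the exponent.

```python
from math import log10

image_bits = 2764800 * 8
metric_bits = 64

# log10 of the ratio 2^(image_bits - metric_bits)
exponent = (image_bits - metric_bits) * log10(2)
print(round(log10(exponent), 2))  # 6.82
```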

The metric cannot work as described. There will be various arbitrary interpretations of the data possible to minimize this metric, and almost all of those will result in images that are definitely not the image that was actually encoded, but did minimize the metric. There is no reliable way to do this because it isn’t possible. When you have a pile of data, and you want to reverse meaning from it, there is not one “correct” message that you can divine from it.[6] See also: numerology, for an example that doesn’t involve binary file encodings.

Even pretending that this metric did work, what’s the time complexity of it? We have to check each pixel, so it’s `O(N)`. There’s a constant factor for each pixel computation. How large is that constant? Let’s pretend it’s small and ignore it.

So now we’ve got `K*O(N*log(N))` which is the time complexity of lots of useful algorithms, but we’ve got that awkward constant `K` in the front. Remember that the constant `K` reflects the number of choices for different bits per pixel, bits per channel, and the number of channels of data per pixel. Unfortunately, that constant is the one that was growing at a rate best described as “absurd”. That constant is the actual definition of what it means to have no priors. When I said “you can generate arbitrarily many hypotheses, but if you don’t control what data you receive, and there’s no interaction possible, then you can’t rule out hypotheses”, what I’m describing is this constant.

I think it would be very weird, if we were trying to train an AI, to send it compressed video, and much more likely that we do, in fact, send it raw RGB values frame by frame.

What I care about is the difference between:

1. Things that are computable.
2. Things that are computable efficiently.

These sets are not the same.
Capabilities of a superintelligent AGI lie only in the second set, not the first.

It is important to understand that a superintelligent AGI is not brute forcing this in the way that has been repeatedly described in this thread. Instead the superintelligent AGI is going to use a bunch of heuristics or knowledge about the provenance of the binary file, combined with access to the internet so that it can just look up the various headers and features of common image formats, and it’ll go through and check all of those, and then if it isn’t any of the usual suspects, it’ll throw up metaphorical hands, and concede defeat. Or, to quote the title of this thread, intelligence isn’t magic.

1. ^

This is often phrased as bits per pixel, because a variety of color depth formats use less than 8 bits per channel, or other non-byte divisions.

2. ^

3. ^

A fun question to consider here becomes: where are the alpha bits stored? E.g. if we assume 3 bytes for RGB data, and then we have the 1 alpha bit, is each pixel taking up 9 bits, or are the pixels stored in runs of 8 pixels followed by a single “alpha” pixel with 8 bits describing the alpha channels of the previous 8 pixels?

4. ^

5. ^

6. ^

The way this works for real reverse engineering is that we already have expectations of what the data should look like, and we are tweaking inputs and outputs until we get the data we expected. An example would be figuring out a camera’s RAW format by taking pictures of carefully chosen targets like an all white wall, or a checkerboard wall, or an all red wall, and using the knowledge of those targets to find patterns in the data that we can decode.

• Why do you say that Kolmogorov complexity isn’t the right measure?

most uniformly sampled programs of equal KC that produce a string of equal length.

...

“typical” program with this KC.

I am worried that you might have this backwards?

Kolmogorov complexity describes the output, not the program. The output file has low Kolmogorov complexity because there exists a short computer program to describe it.

• 28 Jun 2022 16:26 UTC
12 points
6 ∶ 0

I have mixed thoughts on this.

I was delighted to see someone else put forth a challenge, and impressed with the number of people who took it up.

I’m disappointed though that the file used a trivial encoding. When I first saw the comments suggesting it was just all doubles, I was really hoping that it wouldn’t turn out to be that.

I think maybe where the disconnect is occurring is that in the original That Alien Message post, the story starts with aliens deliberately sending a message to humanity to decode, as this thread did here. It is explicitly described as such:

From the first 96 bits, then, it becomes clear that this pattern is not an optimal, compressed encoding of anything. The obvious thought is that the sequence is meant to convey instructions for decoding a compressed message to follow...

But when I argued against the capability of decoding binary files in the I No Longer Believe Intelligence To Be Magical thread, that argument was on a tangent—is it possible to decode an arbitrary binary file? I specifically ruled out trivial encodings in my reasoning. I listed the features that make a file difficult to decode. A huge issue is ambiguity because in almost all binary files, the first problem is just identifying when fields start or end.

I gave examples like

1. Camera RAW formats

2. Compressed image formats like PNG or JPG

3. Video codecs

4. Any binary protocol between applications

    1. Network traffic

    2. Serialization to or from disk

    3. Data in RAM

On the other hand, an array of doubles falls much more into this bucket

data that is basically designed to be interpreted correctly, i.e. the data, even though it is in a binary format, is self-describing.

With all of the above said, the reason why I did not bother uploading an example file in the first thread is frankly because it would have taken me some number of hours to create and I didn’t think there would be any interest in actually decoding it by enough people to justify the time spent. That assumption seems wrong now! It seems like people really enjoyed the challenge. I will update accordingly, and I’ll likely post my example of a file later this week after I have an evening or day free to do so.

• Some thoughts for people looking at this:

• It’s common for binary schemas to distinguish between headers and data. There could be a single header at the start of the file, or there could be multiple headers throughout the file with data following each header.

• There’s often checksums on the header, and sometimes on the data too. It’s common for the checksums to follow the respective thing being checksummed, i.e. the last bytes of the header are a checksum, or the last bytes after the data are a checksum. 16-bit and 32-bit CRCs are common.

• If the data represents a sequence of messages, e.g. from a sensor, there will often be a counter of some sort in the header on each message. E.g. a 1, 2, or 4-byte counter that provides ordering (“message 1”, “message 2”, “message N”) that wraps back to 0.
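To make those three hints concrete, here is a sketch of a made-up message format: a 4-byte header (2-byte wrapping counter, 2-byte payload length), the payload, then a trailing CRC-32 over header plus payload. The layout, field sizes, and byte order are all hypothetical and are not taken from the challenge file; the point is only what "header, counter, trailing checksum" looks like in bytes.

```python
import struct
import zlib

HEADER = struct.Struct(">HH")  # hypothetical: counter, payload length (big-endian)
CRC = struct.Struct(">I")      # hypothetical: CRC-32 trailing each message


def pack_message(counter: int, payload: bytes) -> bytes:
    body = HEADER.pack(counter & 0xFFFF, len(payload)) + payload
    return body + CRC.pack(zlib.crc32(body))


def parse_messages(blob: bytes) -> list[tuple[int, bytes]]:
    """Walk the blob message by message, verifying each trailing checksum."""
    messages, offset = [], 0
    while offset < len(blob):
        counter, length = HEADER.unpack_from(blob, offset)
        end = offset + HEADER.size + length
        (crc,) = CRC.unpack_from(blob, end)
        if zlib.crc32(blob[offset:end]) != crc:
            raise ValueError(f"checksum mismatch at offset {offset}")
        messages.append((counter, blob[offset + HEADER.size:end]))
        offset = end + CRC.size
    return messages


blob = b"".join(pack_message(65534 + i, bytes([i]) * 3) for i in range(4))
print([counter for counter, _ in parse_messages(blob)])  # [65534, 65535, 0, 1]
```

When reverse engineering, the direction is reversed: a field that increments by one every fixed stride and wraps at 65535 is a strong hint about where messages begin.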

• I’m not sure if your comment is disagreeing with any of this. It sounds like we’re on the same page about the fact that exact reasoning is prohibitively costly, and so you will be reasoning approximately, will often miss things, etc.

I agree. The term I’ve heard to describe this state is “violent agreement”.

so in practice wrong conclusions are almost always due to a combination of both “not knowing enough” and “not thinking hard enough” /​ “not being smart enough.”

The only thing I was trying to point out (maybe more so for everyone else reading the commentary than for you specifically) is that it is perfectly rational for an actor to “not think hard enough” about some problem and thus arrive at a wrong conclusion (or correct conclusion but for a wrong reason), because that actor has higher priority items requiring their attention, and that puts hard time constraints on how many cycles they can dedicate to lower priority items, e.g. debating AC efficiency. Rational actors will try to minimize the likelihood that they’ve reached a wrong conclusion, but they’ll also be forced to minimize or at least not exceed some limit on allowed computation cycles, and on most problems that means the computation cost + any type of hard time constraint is going to be the actual limiting factor.

Although even that, I think that’s more or less what you meant by

in some sense you’ve probably spent too long thinking about the question relative to doing something else

In engineering R&D we often do a bunch of upfront thinking at the start of a project, and the goal is to identify where we have uncertainty or risk in our proposed design. Then, rather than spend 2 more months in meetings debating back-and-forth who has done the napkin math correctly, we’ll take the things we’re uncertain about and design prototypes to burn down risk directly.

• First, it only targeted Windows machines running a Microsoft SQL Server reachable via the public internet. I would not be surprised if ~70% or more of theoretically reachable targets were not infected because they ran some other OS (e.g. Linux) or server software instead (e.g. MySQL). This page makes me think the market share was actually more like 15%, so 85% of servers were not impacted. By not impacted, I mean, “not actively contributing to the spread of the worm”. They were however impacted by the denial-of-service caused by traffic from infected servers.

Second, the UDP port (1434) that the worm used could be trivially blocked. I have discussed network hardening in many of my posts. The easiest way to prevent yourself from getting hacked is to not let the hacker send traffic to you—blocking IP ranges, ports, unneeded Ethernet or IP protocols, and other options available in both network hardware (routers) and software firewalls provide a low cost and highly effective way to do so. This contained the denial-of-service.

Third, the worm’s attack only persisted in RAM, so the only thing a host had to do was restart the infected application. Combined with the second point, this would prevent the machine from being reinfected.

This graph[1] shows the result of wide-spread adoption of filter rules within hours of the attack being detected

1. ^

• This was actually a kind of fun test case for a priori reasoning. I think that I should have been able to notice the consideration denkenbgerger raised, but I didn’t think of it. In fact when I stared reading his comment my immediate reaction was “this methodology is so simple, how could the equilibrium infiltration rate end up being relevant?” My guess would be that my a priori reasoning about AI is wrong in tons of similar ways even in “simple” cases. (Though obviously the whole complexity scale is shifted up a lot, since I’ve spent hundreds of hours thinking about key questions.)

This idea—that you should have been able to notice the issue with infiltration rates—is what I’ve been questioning when I ask “what is the computational complexity of general intelligence” or “what does rational decision making look like in a world with computational costs for reasoning”.

There is a mindset that people are simply not rational enough, and if they were more rational, they wouldn’t fall to those traps. Instead, they would more accurately model the situation, correctly anticipate what will and won’t matter, and arrive at the right answer, just by exercising more careful, diligent thought.

My hypothesis is that whatever that optimal “general intelligence” algorithm[1] is—the one where you reason a priori from first principles, and then you exhaustively check all of your assumptions for which one might be wrong, and then you recursively use that checking to re-reason from first principles—it is computationally inefficient enough in such a way that for most interesting[2] problems, it is not realistic to assume that it can run to completion in any reasonable[3] time with realistic computation resources, e.g. a human brain, or a supercomputer.[4]

I suspect that the human brain is implementing some type of randomized vaguely-Monte-Carlo-like algorithm when reasoning, which is how people can (1) often solve problems in a reasonable amount of time[5], (2) often miss factors during a priori reasoning but understand them easily after they’ve seen it confirmed experimentally, (3) different people miss different things, (4) often if someone continues to think about a problem for an arbitrarily long period of time[6] they will continue to generate insights, and (5) often those insights generated from thinking about a problem for an arbitrarily long period of time are only loosely correlated[7].

In that world, while it is true that you should have been able to notice the problem, there is no guarantee on how much time it would have taken you to do so.

1. ^

The “God algorithm” for reasoning, to use a term that Jeff Atwood wrote about in this blog post. It describes the idea of an optimal algorithm that isn’t possible to actually use, but the value of thinking about that algorithm is that it gives you a target to aim towards.

2. ^

The use of the word “interesting” is intended to describe the nature of problems in the real world, which require institutional knowledge, or context-dependent reasoning.

3. ^

The use of the word “reasonable” is intended to describe the fact that if a building is on fire and you are inside of it, you need to calculate the optimal route out of that burning building in a time period that is less than a few minutes in length in order to maximize your chance of survival. Likewise, if you are tasked to solve a problem at work, you have somewhere between weeks and months to show progress or be moved to a separate problem. For proving a theorem, it might be reasonable to spend 10+ years on it if there’s nothing necessitating a more immediate solution.

4. ^

This is mostly based on an observation that for any scenario with say some fixed number of “obvious” factors influencing it, there are effectively arbitrarily many “other” factors that may influence the scenario, and the process of deterministically ordering an arbitrarily long list and then proceeding down the list from “most likely to impact the scenario” to “least likely to impact the scenario” to manually check if each “other” factor actually does matter has an arbitrarily high computational cost.

5. ^

Feel free to put “solve” in quotes and read this as “halt in a reasonable time” instead. Getting the correct answer is optional.

6. ^

Like mathematical proofs, or the thing where people take a walk and suddenly realize the answer to a question they’ve been considering.

7. ^

It’s like the algorithm jumped from one part of solution space where it was stuck to a random, new part of the solution space and that’s where it made progress.

• 23 Jun 2022 3:32 UTC
3 points
0 ∶ 0

I deliberately tried to focus on “external” safety features because I assumed everyone else was going to follow the task-as-directed and give a list of “internal” safety features. I figured that I would just wait until I could signal-boost my preferred list of “internal” safety features, and I’m happy to do so now—I think Lauro Langosco’s list here is excellent and captures my own intuition for what I’d expect from a minimally useful AGI, and that list does so in probably a clearer /​ easier to read manner than what I would have written. It’s very similar to some of the other highly upvoted lists, but I prefer it because it explicitly mentions various ways to avoid weird maximization pitfalls, like that the AGI should be allowed to fail at completing a task.

• We can even consider the proposed plan (add a 2nd hose and increase the price by \$20) in the context of an actual company.

The proposed plan does not actually redesign the AC unit around the fact that we now have 2 hoses. It is “just” adding an additional hose.

Let’s assume that the distribution of AC unit cooling effectively looks something like this graphic that I made in 3 seconds.

In this image, we are choosing to assume that yes, in fact, 2-hose units are more efficient on average than a 1-hose unit. We are also recognizing that perhaps there is some overlap. Perhaps there are especially bad 2-hose units, and especially good 1-hose units.

Based on all of the evidence, I’m going to say that the average 1-hose unit does represent the minimum efficiency needed for cooling in an average consumer’s use-case—i.e. it is sufficient for their needs.

When I consider what would make a 2-hose unit good or bad, I suspect it has a lot to do with how much of the design is built around the fact that there are 2-hoses.

In your proposal, we simply add a 2nd hose to a unit that was otherwise designed functionally as a 1-hose unit. Let’s consider where that might be plotted on this graph.

I’m going to claim based on vague engineering intuition /​ judgment /​ experience that it goes right here.

If I am right about where this proposal falls against the competition, then here’s what we’ve done:

1. This is not a 1-hose unit any more. Despite it being more efficient than the average 1-hose units, and only slightly more expensive, consumers looking at 1-hose units (because they are concerned about cost) will not see this model. The argument that it is “only \$20 more expensive” is irrelevant. Their search results are filtered, they read online that they wanted a one-hose unit, this product has been removed from their consideration.

2. This is a bad 2-hose unit. It is at the bottom of the efficiency scale, because other 2-hose units were actually designed to take full advantage of the 2-hoses. They will beat you on efficiency, even if they cost more. Wirecutter will list this in the “also ran” when discussing 2-hose units, “So and so sells a 2-hose model, but it was barely more efficient than a 1-hose, we cannot recommend it”.

3. A consumer looking at 2-hose units is already selecting for efficiency over cost, so they will not buy the “just add another hose” 2-hose unit, since it is on the wrong end of the 2-hose distribution.

4. You will acquire a reputation as the company that sells “cheap” products—your unit is cheaper than other 2-hose units, but isn’t better because it wasn’t designed as a 2-hose unit, and it was torn apart by reviewers.

5. Fixing this inefficiency requires actually designing around 2-hoses, which likely results in something like this

“Minimum viable”, in the context of a “minimum viable product” or MVP, is a term in engineering that represents the minimal thing that a consumer will pay to acquire. This is a product that can actually be sold. It’s not the literal worst in its category, and it has a clear supremacy over cheaper categories. This is also called table stakes. Reviewers will consider it fairly, consumers will not rage review it, etc.

However, it’s probably also a lot more expensive than the hypothetical “only \$20 more” that has been repeatedly stated.

Even in the scenario where a reviewer does consider the “just add another hose” model when viewing one-hose units, we’ve already established that the one-hose unit is cheaper (by \$20! if it’s a \$200 unit, that’s 10%), and that the average 1-hose unit is sufficient for some average use-case. Therefore the rational consumer choice is to buy the cheaper one-hose anyway, because it’s irrational to pay more for efficiency that isn’t needed![1][2]

1. ^

The exception here is some hypothetical consumer who knows, for a fact, that their unique situation requires a two-hose unit, e.g. they tried a one-hose unit already and it was insufficient.

2. ^

There’s also an argument here that a rational option is to buy a 1-hose unit, and then if you need slightly more efficiency, just buy & wrap the 1-hose with insulation, as described here. This allows the consumer to purchase at the lower price point and then add efficiency if needed for the cost of the insulation. It’s unclear to me that the “just add another hose” AC would still perform better than an insulated 1-hose.

• I didn’t even think to check this math, but now that I’ve gone and tried to calculate it myself, here’s what I got:

EDIT: I see the issue. The parent post says that the control test was done at evening, where the temperature was 82 F. So it’s not even comparable at all, imo.

• I’ll edit the range, and note that “uncomfortably hot” is my opinion. Rest of my analysis / rant still applies. In fact, in your case, you don’t need the AC unit at all, since you’d be fine with the control temperature.

• I take fault with your primary conclusion, for the same reasons I gave in the first thread:

1. You claim how little adding a 2nd hose would impact the system, without analyzing the actual constraints that apply to engineers building a product that must be shipped & distributed

2. You still neglect the existence of insulating wraps for the hose which do improve efficiency, but are also not sold with the single-hose AC system, which lends evidence to my first point—companies are aware of small cost items that improve AC system efficiency, but do not include them with the AC by default, suggesting that there is an actual price point /​ consumer market /​ confounding issue at play that prevents them doing so

The full posts, quoted here for convenience

I think one reason that this error occurs is that there’s a mistaken assumption that the available literature captures all institutional knowledge on a topic, so if one simply spends enough time reading the literature, they’ll have all requisite knowledge needed for policy recommendations. I realize that this statement could apply equally to your own claims here, but in my experience I see it happen most often when someone reads a handful of the most recently released research papers and from just that small sample of work tries to draw conclusions that are broadly applicable to the entire field.

Engineering claims are particularly suspect because institutional knowledge (often in the form of proprietary or confidential information held by companies and their employees) is where the difference between what is theoretically efficient and what is practically more efficient is found. It doesn’t even need to be protected information though—it can also just be that due to manufacturing reasons, or marketing reasons, or some type of incredibly aggravating constraint like “two hoses require a larger box and the larger box pushes you into a shipping size with much higher per-volume /​ mass costs so the overall cost of the product needs to be non-linearly higher than what you’d expect would be needed for a single hose unit, and that final per-unit cost is outside of what people would like to pay for an AC unit, unless you then also make drastic improvements to the motor efficiency, thermal efficiency, and reduce the sound level, at which point the price is now even higher than before, but you have more competitive reasons to justify it which will be accepted by a large enough % of the market to make up for the increased costs elsewhere, except the remaining % of the market can’t afford that higher per-unit cost at all, so we’re back to still making and selling a one-hose unit for them”.

Concrete example while we’re on the AC unit debate—there’s a very simple way to increase efficiency of portable AC units, and it’s to wrap the hot exhaust hose with insulating duct wrap so that less of the heat on that very hot hose radiates directly back into the room you’re trying to cool. Why do companies not sell their units with that wrap? Probably for one of any of the following reasons—A.) takes up a lot of space, B.) requires a time investment to apply to the unit which would dissuade buyers who think they can’t handle that complexity, C.) would cost more money to sell and no longer be profitable at the market’s price point, D.) has to be applied once the AC unit is in place, and generally is thick enough that the unit is no longer “portable” which during market testing was viewed as a negative by a large % of surveyed people, or E.) some other equally trivial sounding reason that nonetheless means it’s more cost effective for companies to NOT sell insulating duct wrap in the same box as the portable AC unit.

Example of an AC company that does sell an insulating wrap as an optional add-on: https://www.amazon.com/DeLonghi-DLSA003-Conditioner-Insulated-Universal/dp/B07X85CTPX

EDIT: I want to make a meta point here, which is that I have not personally worked on ACs, but I have built & shipped multiple products to consumers, and the type of stupid examples I gave in the first AC post are not just made-up for fun. Engineers argue extensively in meetings about “how can we make product A better”, and ideas get shot down for seemingly trivial reasons that basically come down to: yes, in a vacuum, that would be better, but unfortunately, there’s a ton of existing context, like how large a truck is or what parts can actually be bought off the shelf, that kneecaps those ideas before they leave the design room. The engineers who designed the AC were not idiots, or morons, or clowns who don’t understand thermodynamic efficiency. Engineering is about working around limitations. Those limitations do not have to be rooted in physics; society or infrastructure or consumer behavior around critical price points can all be just as real in terms of what it is feasible for a company to create. Just look at how many startups fail and the founder claims in a postmortem, “Yeah, our tech was way better, but unfortunately people wouldn’t pay 10% more for it, even though it was AMAZING compared to our competitor. We just couldn’t get them to switch.”

EDIT 2: I’m pretty annoyed that you doubled down on your conclusion even after admitting the actual efficiency difference was significantly less than expected, and then chose a different analysis to let you defend your original point anyway, so these edits might keep coming. Regarding market pressures, two-hose AC units do exist. Companies do sell them, and if consumers want to buy a two-hose AC unit, they can do so. But the presence of both one-hose AC units and two-hose AC units in the market tells us it is not winner-take-all and there is consumer behavior, e.g. around price or complexity, that prevents two-hose units from acquiring literally all market share. So until that changes, it will always be more rational for companies to sell one-hose AC units in addition to their two-hose AC units, because otherwise they’d be leaving money on the table by only servicing part of the consumer market. (EDIT 5: see also this post, which was itself a reply to AllAmericanBreakfast’s reply on this thread here)

EDIT 3: Let’s look at your math. Outdoor temp is 85-88 F; let’s just take the average and call it 86.5 F. That’s pretty hot. I’d definitely be uncomfortable in that scenario. How much did the AC cool the rooms? You say that on low fan the room was 20.6 F below outdoors with one hose and 22.7 F with two hoses, and on high fan, 18.3 F with one hose and 22.2 F with two hoses. The control was 13.1 F. Looking at the control, that gives a room temperature of ~73.4 F. That is uncomfortably hot in my opinion. I keep my room temperature around 68-70 F ish. The internet tells me that this is within the window of a “comfortable room temperature” defined as 67-75 F[1], so I’m just a normal human, I guess. How well did the ACs accomplish that? With one hose, you got it down to ~66 F, and with two hoses, you had it down to ~64 F. That is pretty cold in my mind; I would not set my AC that low if it actually reached that temperature. What does this mean? The one-hose unit literally did the job it was designed to do. With an incredibly hot outside temperature that resulted in an uncomfortable indoor “control” temperature, the one-hose AC was able to lower the temperature to a comfortable, ideal range, and then go below that, showing it even has margin left over. But now you’re saying that they should make the thing more expensive and optimize it for even greater efficiency because … why!? It works!
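To make the arithmetic in EDIT 3 easy to check, here is a minimal sketch that recomputes the indoor temperatures. It assumes, as the paragraph does, that each quoted figure is a drop relative to the 86.5 F outdoor average; the dictionary keys are just labels I made up.

```python
# Outdoor range quoted in the paragraph above, in degrees F.
outdoor_low, outdoor_high = 85.0, 88.0
outdoor_avg = (outdoor_low + outdoor_high) / 2  # 86.5

# Temperature drop below outdoors for each configuration (degrees F),
# taken directly from the figures quoted above.
drops = {
    "control": 13.1,
    "one hose, low fan": 20.6,
    "one hose, high fan": 18.3,
    "two hoses, low fan": 22.7,
    "two hoses, high fan": 22.2,
}

# Indoor temperature is the outdoor average minus the drop.
indoor = {name: round(outdoor_avg - drop, 1) for name, drop in drops.items()}

for name, temp in indoor.items():
    print(f"{name}: {temp} F")
```

Under this reading, the "~66 F" figure corresponds to the one-hose unit on low fan; on high fan the same unit lands closer to 68 F, which is still inside the comfortable window.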

EDIT 4: I will die on this hill. This is the problem with how the rationalist community approaches the concept of what it means to “make a rational decision” perfectly demonstrated in a single debate. You do not make a “rational decision” in the real world by reasoning in a vacuum. That is how you arrive at a hypothetically good action, but it is not necessarily feasible or possible to perform, so you always need to check your analysis by looking at real world constraints and then pick the action that is 1.) actually possible in the real world, and 2.) still has the highest expected value. Failing to do that is not more clever or more rational, it is just a bad, broken model for how an ideal, optimal agent would behave. An optimal agent doesn’t ignore their surroundings—they play to them, exploit them, use them.

1. ^

I averaged the following lower /​ upper temperatures.

Wikipedia: 64-75
www.cielowigle.com: 68-72
www.vivint.com: 68-76
www.provicincialheating.ca: 68-76
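
The averaging is simple enough to spell out. A small sketch, using only the four ranges listed above (the variable names are mine):

```python
# (lower, upper) comfortable-temperature bounds in degrees F, per source above.
ranges = {
    "Wikipedia": (64, 75),
    "www.cielowigle.com": (68, 72),
    "www.vivint.com": (68, 76),
    "www.provicincialheating.ca": (68, 76),
}

lower_avg = sum(low for low, _ in ranges.values()) / len(ranges)
upper_avg = sum(high for _, high in ranges.values()) / len(ranges)

print(f"comfortable range: {lower_avg} to {upper_avg} F")
```

The lower bound averages to 67 and the upper bound to 74.75, which I rounded to 75.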

• I really like this list because it does a great job of explicitly specifying the same behavior I was trying to vaguely gesture at in my list when I kept referring to AGI-as-a-contract-engineer.

Even your point that it doesn’t have to succeed, that it’s OK for it to fail at a task if it can’t accomplish it in some obvious, non-insane way: that’s what I’d expect from a contractor. The idea that an AGI would find that a task is generally impossible, but identify a novel edge case that allows it to be accomplished with some ridiculous solution involving nanotech, and then not alert or tell a human about that plan prior to carrying it out, has always been confusing to me.

In engineering work, we almost always have expected budget /​ time /​ material margins for what a solution looks like. If someone thinks that solution space is empty (it doesn’t close), but they find some other solution that would work, people discuss that novel solution first and agree to it.

That’s a core behavior I’d want to preserve. I sketched it out in another document I was writing a few weeks ago, but I was considering it in the context of what it means for an action to be acceptable. I was thinking that it’s actually very context dependent—if we approve an action for AGI to take in one circumstance, we might not approve that action in some vastly different circumstance, and I’d want the AGI to recognize the different circumstances and ask for the previously-approved-action-for-circumstance-A to be reapproved-for-circumstance-B.

EDIT: Posting this has made me realize that the idea of context dependencies is applicable more widely than just allowable actions, and it’s relevant to the discussion of what it means to “optimize” or “solve” a problem as well. I’ve suggested this in my other posts but I don’t think I ever said it explicitly: if you consider human infrastructure, and human economies, and human technology, almost all “optimal” solutions (from the perspective of a human engineer) are going to be built on the existing pile of infrastructure we have, in the context of “what is cheapest, easiest, the most straight line path to a reasonably good solution that meets the requirements”. There is a secret pile of “optimal” (in the context of someone doing reasoning from first principles) solutions that involve ignoring all of human technology and bootstrapping a new technology tree from scratch, but I’d argue that pile has a huge overlap with, if it isn’t exactly the same set as, the things people have called “weird” in multiple lists. Like if I gave a contractor a task to design a more efficient paperclip factory and they gave me a proposed plan that made zero reference to buying parts from our suppliers, or to a better layout of traditional paperclip-making machines, or to improvements in how an existing paperclip machine works, I’d be confused, because that contractor is likely handing me a plan that would require vertically integrating all of the dependencies, which feels like complete overkill for the task that I assigned. Even if I phrased my question to a contractor as “design me the most efficient paperclip factory”, they’d understand constraints like: this company does not own the Earth, therefore you may not reorder the Earth’s atoms into a paperclip factory. They’d want to know, how much space am I allowed? How tall can the building be? What’s the allowable power usage? Then they’d design the solution inside of those constraints. That is how human engineering works.
If an AGI mimicked that process and we could be sure it wasn’t deceptive (e.g. due to interpretability work), then I suspect that almost all claims about how AGI will immediately kill everyone are vastly less likely, and the remaining ways AGI can kill people basically reduce to the people controlling the AGI deliberately using it to kill people, in the same way that the government uses military contractors to design new and novel ways of killing people, except the AGI would be arbitrarily good at that exercise.

• 21 Jun 2022 7:22 UTC
13 points
0 ∶ 0

Oh, sorry, you’re referring to this:

includes a distributed network of non-nuclear electromagnetic pulse emitters that will physically shut down any tech infrastructure appearing to be running rogue AI agents.

This just seems like one of those things people say, in the same vein as “melt all of the GPUs”. I think that non-nuclear EMPs are still based on chemical warheads. I don’t know if a “pulse emitter” is a thing that someone could build. Like I think what this sentence actually says is equivalent to saying

includes a distributed network of non-nuclear ICBMs that will be physically shot at any target believed to be running a rogue AI agent

and then we can put an asterisk on the word “ICBM” and say it’ll cause an EMP at the detonation site, and only a small explosion.

But you can see how this now has a different tone to it, doesn’t it? It makes me wonder how the system defines “appears to be running rogue AI agents”, because now I wonder what the % chance of false positives is—since on a false positive, the system launches a missile.

What happens if this hypothetical system is physically located in the United States, but the rogue AI is believed to be in China or Russia? Does this hypothetical system fire a missile into another country? That seems like it could be awkward if they’re not already on board with this plan.

because they’re doing something pretty non-trivial, they probably have to be big complex systems. Because they’re big complex systems, they’re hackable. Does this sound right to you? I’m mostly asking you about the step “detecting rogue AI implies hackable”. Or to expand the question, for what tasks XYZ can you feasibly design a system that does XYZ, but is really seriously not hackable even by a significantly superhuman hacker?

It’s not really about “tasks”, it’s about how the hardware/​software system is designed. Even a trivial task, if done on a general-purpose computer, with a normal network switch, the OS firewall turned off, etc, is going to be vulnerable to whatever exploits exist for applications or libraries running on that computer. Those applications or libraries expose vulnerabilities on a general-purpose computer because they’re connected to the internet to check for updates, or they send telemetry, or they’re hosting a Minecraft server with log4j.

It seems like you could not feasibly make an unhackable system that takes a bunch of inputs from another (unsafe) system and processes them in a bunch of complex ways using software that someone is constantly updating, because having the ability to update to the latest Detect-O-Matic-v3.4 without knowing in advance what sort of thing the Detect-O-Matic is, beyond that it’s software, seems to imply being Turing-completely programmable, which seems to imply being hackable.

When you’re analyzing the security of a system, what you’re looking for is “what can the attacker control?”

If the attacker can’t control anything, the system isn’t vulnerable.

We normally distinguish between remote attacks (e.g. over a network) and physical attacks (e.g. due to social engineering or espionage or whatever). It’s generally safe to assume that if an attacker has physical access to a machine, you’re compromised.[1] So first, we don’t want the attacker to have physical access to these computers. That means they’re in a secure facility, with guards, and badges, and access control on doors, just like you’d see in a tech company’s R&D lab.

That leaves remote attacks. These generally come in two forms:

1. The attacker tricks you into downloading and running some compromised software. For example, visiting a website with malicious JavaScript, or running some untrusted executable you downloaded because it was supposed to be a cheat engine for a video game but it was actually just a keylogger, or the attacker has a malicious payload in a seemingly innocent file type like a Word document or PDF file and it’s going to exploit a bug in the Word program or Adobe Acrobat program that tries to read that file.

2. The attacker sends network traffic to the machine which is able to compromise the machine in some way, generally by exploiting open ports or servers running on the target machine.

All of the attacks in (1) fall under the “when you run untrusted code, you will get pwned” umbrella. There’s a bunch of software mitigations for trying to make this not terrible, like admin users vs non-admin users, file system permissions, VM sandboxing, etc, but ultimately it’s just like rearranging deck chairs on the Titanic. It doesn’t matter what you do, someone else is going to find a side channel attack and ruin your day if you let them run code on your machine. So don’t do that. This is actually easier than you might think: plenty of systems are “secure” because they run an incredibly minimal Linux OS (or some RTOS or even just bare metal) and they’re effectively static. The software image is flashed to some SoC’s read-only memory (ROM) by an external debugger[2], and there’s no capability from within the software to write or update that memory. The processor is not configured for running code outside of that ROM. There’s no user accounts, or “default applications”, or browsers, or anything else other than the code for performing the actual task required by the system.

For (2), in cases where the system is not supposed to be receiving arbitrary traffic, we solve this by using networking hardware downstream of the system that drops any unexpected traffic, and rate-limits everything. If the attacker’s traffic gets dropped, they can’t control anything. The rate-limiting is to prevent a denial-of-service based on traffic we expect. For the “expected” traffic, the ideal mechanism to prevent abuse is a binary schema for valid messages, a fixed MTU size on messages, and the use of authenticated encryption so that the attacker is unable to generate the traffic without access to encryption keys. Encryption keys can be stored in hardware security modules that are physically located on the senders & receivers within the system.[3]
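
As a rough illustration of the “binary schema, fixed size, authenticated” idea, here is a minimal sketch. It is not real authenticated encryption: to stay inside the Python standard library it only authenticates frames with an HMAC-SHA256 tag and does not encrypt them, whereas a real design would use an AEAD scheme and keep the key inside the hardware security module. The three-field schema and the names `seal` and `open_frame` are hypothetical, and replay protection is left out.

```python
import hashlib
import hmac
import struct

# Hypothetical schema: sequence number (u32), message type (u16), value (u16),
# all big-endian. Real schemas would be defined per system.
SCHEMA = struct.Struct(">IHH")
TAG_LEN = 32  # length of an HMAC-SHA256 tag in bytes
FRAME_LEN = SCHEMA.size + TAG_LEN  # every valid frame is exactly this long


def seal(key: bytes, seq: int, msg_type: int, value: int) -> bytes:
    """Pack the fields into the fixed layout and append an authentication tag."""
    body = SCHEMA.pack(seq, msg_type, value)
    tag = hmac.new(key, body, hashlib.sha256).digest()
    return body + tag


def open_frame(key: bytes, frame: bytes):
    """Return the decoded fields, or None if the frame should be dropped."""
    if len(frame) != FRAME_LEN:
        return None  # wrong size: not one of our messages
    body, tag = frame[:SCHEMA.size], frame[SCHEMA.size:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None  # sender did not have the key, or the frame was altered
    return SCHEMA.unpack(body)
```

The point of the sketch is that a sender without the key cannot produce any frame that gets past the first two checks, so nothing attacker-controlled reaches the code that interprets the fields.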

With that in mind, I hope it is clear that the problem isn’t about complexity of a system, or the number of inputs that the system has, or if the system supports software updates.

The actual issue with the system we’re discussing is that the attacker can control their network traffic, and in this hypothetical, we need to look at arbitrary network traffic to decide if it’s evidence of a rogue AI.[4] We can’t just drop everything in hardware and ignore messages that don’t fit an authenticated encryption scheme, because we want to read that traffic.

So we have to assume that the attacker can generate arbitrary payloads, send them to our system, and our system is going to need to not get pwned.

I think I’d probably approach it with the following thoughts:

1. I’d design the system so that the traffic to be inspected is distributed across a cluster of computers. This is both to prevent denial-of-service attacks and to prevent targeted attacks like Nethammer. If the system is receiving traffic in excess of what the cluster is designed to handle, it’ll be dropped—while this represents a degradation in monitoring capability, it can be fixed by adding additional machines to the cluster as needed. It’s often useful to gracefully degrade capabilities when a system is under load.

2. The computers in the cluster that receive traffic have a simple role: send an alert message to a central supervisor if suspicious traffic is detected.

3. The alert message uses authenticated encryption.

4. The central supervisor is only communicated with via authenticated encryption messages on a local network. It is not connected to the wider network and it is not possible for the attacker to send traffic to the supervisor machine, unless they first compromised one of the cluster machines.
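
The “drop traffic beyond what the cluster can handle” behavior from item 1 can be sketched as bounded per-machine queues. This is a toy model under my own assumptions: the `Cluster` class, the round-robin assignment, and the capacity numbers are all hypothetical, and a real design would want an assignment policy the sender can neither predict nor influence.

```python
from collections import deque


class Cluster:
    """Toy model of inspection machines with bounded input queues."""

    def __init__(self, n_machines: int, queue_capacity: int):
        self.queues = [deque() for _ in range(n_machines)]
        self.capacity = queue_capacity
        self.dropped = 0
        self._next = 0  # index of the machine that gets the next packet

    def submit(self, packet: bytes) -> bool:
        """Queue a packet for inspection; return False if it was shed."""
        queue = self.queues[self._next]
        self._next = (self._next + 1) % len(self.queues)
        if len(queue) >= self.capacity:
            # Over capacity: degrade monitoring rather than fall over.
            self.dropped += 1
            return False
        queue.append(packet)
        return True
```

Tracking `dropped` matters because it is the signal that tells the operators when to add machines.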

The supervisor machine is pretty safe unless a cluster machine is compromised. If a cluster machine is compromised, maybe we think that the attacker can control the alert message, and in doing so, control traffic sent to the central supervisor? Let’s fix that. What do we need for the alert message? Maybe the IP of the packet, geolocation information, reasons why it was suspicious? Etc. We’ll define a binary schema for that information and we’ll stick a serial device or FPGA onto each cluster machine.

1. The cluster machines send the alert message by setting the relevant fields of the message via the serial device or FPGA registers, and it’s that device which sends the actual alert. Now if the cluster machine is compromised, all the attacker can do is change fields in the alert message, but they can’t send different messages entirely.[5]

2. We test the central supervisor to confirm that any permutation of the alert message is still safe to receive. For testing permutations, we normally use a method based on equivalence classes.
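
Here is a minimal sketch of what the fixed alert schema and the equivalence-class testing could look like on the supervisor side. The field layout, the reason codes, and the function names are all hypothetical; the idea is only that every field has a small number of classes (valid, too low, too high) and we check one representative of each class plus the boundaries.

```python
import ipaddress
import itertools
import struct

# Hypothetical fixed layout: IPv4 address (4 bytes), latitude and longitude in
# hundredths of a degree (signed 16-bit each), and a reason code (1 byte).
ALERT = struct.Struct(">4shhB")
KNOWN_REASONS = {0, 1, 2}  # hypothetical reason codes


def parse_alert(raw: bytes):
    """Decode an alert, returning None for anything outside the schema."""
    if len(raw) != ALERT.size:
        return None
    ip_bytes, lat, lon, reason = ALERT.unpack(raw)
    if not -9000 <= lat <= 9000:
        return None
    if not -18000 <= lon <= 18000:
        return None
    if reason not in KNOWN_REASONS:
        return None
    return str(ipaddress.IPv4Address(ip_bytes)), lat / 100, lon / 100, reason


def check_all_classes() -> int:
    """Feed one representative per equivalence class; return cases checked."""
    ips = [bytes(4), bytes([255] * 4)]
    lats = [-32768, -9001, -9000, 0, 9000, 9001, 32767]
    lons = [-32768, -18001, -18000, 0, 18000, 18001, 32767]
    reasons = [0, 2, 3, 255]
    count = 0
    for ip, lat, lon, reason in itertools.product(ips, lats, lons, reasons):
        should_accept = (
            -9000 <= lat <= 9000
            and -18000 <= lon <= 18000
            and reason in KNOWN_REASONS
        )
        result = parse_alert(ALERT.pack(ip, lat, lon, reason))
        assert (result is not None) == should_accept
        count += 1
    return count
```

Because the message is a fixed 9 bytes with a handful of classes per field, the full cross product of representatives is small enough to test exhaustively, which is exactly why a rigid schema is attractive here.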

So now attackers can maybe compromise cluster machines because they control the traffic that we’re trying to monitor, but they can’t use a compromised cluster machine to control traffic to the central supervisor, because compromising a cluster machine doesn’t let you change the traffic sent to a supervisor in a way that would be dangerous.

At this point we can ask what the attacker can do:

• They can maybe send garbage alert messages.

• They can maybe send no message at all—i.e. they crash the cluster machine so it fails silent.

We can mitigate the 2nd problem by requiring a heartbeat from each cluster machine to the central supervisor. A heartbeat is another message sent via authenticated encryption, but it’s sent at a known frequency, like 100 Hz. If an attacker or any other event crashes the machine, the central supervisor will know.
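
A heartbeat check on the supervisor could be sketched like this. The class name, the allowance of three missed beats, and passing the clock in as an argument are my assumptions; the sketch also assumes each heartbeat has already passed the authenticated-encryption check before `record` is called.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat seen from each cluster machine."""

    def __init__(self, machine_ids, period_s=0.01, missed_allowed=3):
        # 100 Hz heartbeats arrive every 0.01 s; tolerate a few late ones.
        self.timeout = period_s * missed_allowed
        self.last_seen = {machine: None for machine in machine_ids}

    def record(self, machine_id, now):
        """Note a verified heartbeat; ignore machines we do not know about."""
        if machine_id in self.last_seen:
            self.last_seen[machine_id] = now

    def silent_machines(self, now):
        """Return machines that never reported or whose heartbeat is overdue."""
        return sorted(
            machine
            for machine, seen in self.last_seen.items()
            if seen is None or now - seen > self.timeout
        )
```

Anything returned by `silent_machines` is treated the same way regardless of cause: the supervisor does not need to know whether the machine crashed, lost power, or was attacked.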

If the attacker sends a garbage alert message, what is the downside? It could be bad if this system launches missiles, so maybe we’ll have human review on any alert that the central supervisor receives, prior to any action being taken by the system. The human review can include a log of the network traffic that generated the alert, captured by the cluster machine, and a diagnosis of the cluster machine’s telemetry for how it concluded the traffic was suspicious. This is intended to mitigate the possibility of a false positive where the attacker can control where we think a rogue AI is located.

Now, where we stand is that an attacker can maybe compromise a cluster machine, and in doing so, they can maybe generate an erroneous alert message for a rogue AI, which a human review maybe won’t be able to detect as erroneous, so whatever action gets taken on belief that a rogue AI exists would be sent to a location controlled by the attacker. The magnitude of how bad that is depends on what action is taken. If it is “fire an ICBM”, it could be pretty bad.

The other possibility is that an attacker compromises a machine in such a way that no alert is generated, but no heartbeat is missed either. This is an interesting failure mode because it’s equivalent to another failure mode that we have not discussed: the system could simply fail to detect the behavior of a rogue AI. In other words, this system doesn’t need to be hacked in order to fail. If the algorithm we’re using for detecting rogue AI has any margin for error at all, then it should be no surprise to anyone that this system is susceptible to a false negative, i.e.: it did not send an alert when it should have.

The relevant question becomes, what’s the margin of error on the hypothetical algorithm for detecting a rogue AI? If that algorithm has a higher likelihood of a false positive than our hypothetical attacker, after all of our mitigations, then maybe this system isn’t even worth building, depending on how high that likelihood is. Likewise, if the algorithm has a high likelihood of a false negative, such that a rogue AI can reasonably evade detection due to not being suspicious enough, then this system might still not be worth building, because now the most likely failure mode is that the system just doesn’t work reliably enough, hacked or not.
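
One way to put numbers on “margin of error” is to ask how likely an alert is to be real, given the detector’s error rates. The sketch below is just Bayes’ rule; the function name is mine, and the rates in the usage line are invented purely for illustration, since nobody has measured anything like them.

```python
def alert_precision(prior, true_positive_rate, false_positive_rate):
    """P(rogue AI is present | the system raised an alert), by Bayes' rule."""
    true_alerts = prior * true_positive_rate
    false_alerts = (1 - prior) * false_positive_rate
    return true_alerts / (true_alerts + false_alerts)


# Made-up numbers: rogue AI traffic is extremely rare, the detector catches
# 99% of it, and it misfires on 0.1% of ordinary traffic.
print(alert_precision(prior=1e-6, true_positive_rate=0.99, false_positive_rate=0.001))
```

With a rare enough target, even a small false positive rate means almost every alert is a false alarm, which is the scenario where the detector’s own error dominates anything an attacker could add.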

1. ^

This is a safe assumption because if the attacker has physical access to a machine, they can do dumb things like pull out the hard drive and put it into another computer that they control. Enable encryption on your drives.

2. ^

When I say “external debugger”, this is implying physical access to the system.

3. ^

In order to get the encryption keys, the attacker needs physical access to the system. Side-channel attacks won’t work here, because we don’t accept arbitrary traffic, and in order to generate traffic that we do accept, you need the keys. It’s a catch-22, unless you have physical access. But if you have physical access, we assume the system is compromised.

4. ^

To be clear, I am not convinced that “evidence of a rogue AI” is a meaningful description of behavior.

5. ^

This is assuming that we’ve verified the device that sends the alert message, which we’ll have high confidence in because of how simple the implementation will be.

• Would you agree that if there were a system that could automatically (without humans) monitor for rogue AI, then that system is probably hackable?

I can’t comment on this, because I have no idea what this system would even do, or what it would be looking for, or who would be running it.

What indicates a rogue AI? How is that differentiated from human hackers? How is that different from state-level actors like the NSA? How is it different from some white hat security researchers doing pen-testing in the wild for questionable reasons? What “behavior” is the system trying to look for?

How is it monitoring? What inputs is it taking? Is it only looking at internet traffic? How? Where is it tapping for those packets? There’s a lot of internet traffic. Is this even “a computer”, or is it “an entire facility of computers”? Why are there no humans involved in this process? Where do the logs, monitoring, or other telemetry go if this is done “without humans”? Wouldn’t this system be reporting some type of digest or summary for review? Why do you think that it would need to be a general computer? Do you think that high-performance SoCs do not exist, or that it would not be possible to have a distributed system containing multiple SoCs with the equivalent RAM and compute of some general-purpose machine?

What would it mean to hack this system? Just that it doesn’t report any rogue AI anymore? Or that it erroneously reports the presence of rogue AI? What would hacking it to not report rogue AI look like? Are we concerned that it would log the presence of a rogue AI, but then the rogue AI would be able to wipe that entry from some log file, like a movie? Or are we thinking that the rogue AI is going to disable this system before it is even noticed by the system? Do we count “turning off the power” as “hacking”?

• 21 Jun 2022 0:43 UTC
6 points
1 ∶ 0