#!/usr/bin/env python3
import struct
a = [1.3, 2.1, -0.5] # initialize data
L = 2095 # total length
i = 0
while len(a)<L:
if abs(a[i]) > 2.0 or abs(a[i+1]) > 2.0:
a.append(a[i]/2)
i += 1
else:
a.append(a[i] * a[i+1] + 1.0)
a.append(a[i] - a[i+1])
a.append(a[i+1] - a[i])
i += 2
f = open("out_reproduced.bin","wb") # save in binary
f.write(struct.pack('d'*L,*a)) # use IEEE 754 double format
f.close()
Then one can check that the produced file is identical:
edit: How I found the solution: I found some of the other comments helpful, especially from gjm (although I did not read all). In particular, interpreting the data as a sequence of 64-bit floating point numbers saved me a lot of time. Also gjm’s mention of the pattern a, -a, b, c, -c, d was an inspiration.
If you look at the first couple of numbers, you can see that they are sometimes half of an earlier number. Playing around further with the numbers I eventually found the patterns a[i] * a[i+1] + 1.0 and a[i] - a[i+1].
It remained to figure out when the a[i]/2 rule applies and when the a[i] * a[i+1] + 1.0 rule applies. Here it was a hint that the numbers do not grow too large in size. After trying out several rules that form bounds on a[i] and a[i+1], I eventually found the right one.
I was delighted to see someone else put forth an challenge, and impressed with the amount of people who took it up.
I’m disappointed though that the file used a trivial encoding. When I first saw the comments suggesting it was just all doubles, I was really hoping that it wouldn’t turn out to be that.
I think maybe where the disconnect is occurring is that in the original That Alien Message post, the story starts with aliens deliberately sending a message to humanity to decode, as this thread did here. It is explicitly described as such:
From the first 96 bits, then, it becomes clear that this pattern is not an optimal, compressed encoding of anything. The obvious thought is that the sequence is meant to convey instructions for decoding a compressed message to follow...
But when I argued against the capability of decoding binary files in the I No Longer Believe Intelligence To Be Magical thread, that argument was on a tangent—is it possible to decode an arbitrary binary files? I specifically ruled out trivial encodings in my reasoning. I listed the features that make a file difficult to decode. A huge issue is ambiguity because in almost all binary files, the first problem is just identifying when fields start or end.
I gave examples like
Camera RAW formats
Compressed image formats like PNG or JPG
Video codecs
Any binary protocol between applications
Network traffic
Serialization to or from disk
Data in RAM
On the other hand, an array of doubles falls much more into this bucket
data that is basically designed to be interpreted correctly, i.e. the data, even though it is in a binary format, is self-describing.
With all of the above said, the reason why I did not bother uploading an example file in the first thread is frankly because it would have taken me some number of hours to create and I didn’t think there would be any interest in actually decoding it by enough people to justify the time spent. That assumption seems wrong now! It seems like people really enjoyed the challenge. I will update accordingly, and I’ll likely post my example of a file later this week after I have an evening or day free to do so.
Your interlocutor in the other thread seemed to suggest that they were busy until mid-July or so. Perhaps you could take this into account when posting.
I agree that IEEE754 doubles was quite an unrealistic choice, and too easy. However, the other extreme of having a binary blob with no structure at all being manifest seems like it would not make for an interesting challenge. Ideally, there should be several layers of structure to be understood, like in the example of a “picture of an apple”, where understanding the file encoding is not the only thing one can do.
This suggests a different question. For non-participants who are given the program which creates the data, what probability/timeframe to assign to success.
On this one I think that I would have put a high probability to be solved but would have anticipated a longer timeframe.
Kolmogorov Complexity isn’t the right measure. I’m pretty sure this program is way simpler (according to you or me) than most uniformly sampled programs of equal KC that produce a string of equal length.
And i wouldn’t consider it short given how hard it would be to find a “typical” program with this KC.
Why do you say that Kolmogorov complexity isn’t the right measure?
most uniformly sampled programs of equal KC that produce a string of equal length.
...
“typical” program with this KC.
I am worried that you might have this backwards?
Kolmogorov complexity describes the output, not the program. The output file has low Kolmogorov complexity because there exists a short computer program to describe it.
Let me rephrase then: most strings of same length and same KC will have much more intuitively complex programs-of-minimum-length that generated them.
Why is the program intuitively simple? Well for example, suppose you have the following code in the else block:
a.append(min(max(int(i+a[i]),2),i)
i += 1
I think the resulting program has lower length (so whatever string it generates has lower KC), but it would be way harder to find. And I bet that a uniformly sampled string with the same KC will have a program that looks much worse than that. The kinds or programs that look normal to you or me are a heavily skewed sample, and this program does look reasonably normal.
Eh. I agree with both replies that the example was bad. I wanted to do something unintuitive with pointer arithmetic, but the particular code probably doesn’t capture it well, so it may in fact be easier to find. (Still totally stand by the broader point though.)
Even more obviously, why do aliens adhere to the IEEE 754 standard? My interpretation is that the cryptanalyst from the post has indeed been pranked by their friend at NASA.
Solution:
Then one can check that the produced file is identical:
edit: How I found the solution: I found some of the other comments helpful, especially from gjm (although I did not read all). In particular, interpreting the data as a sequence of 64-bit floating point numbers saved me a lot of time. Also gjm’s mention of the pattern a, -a, b, c, -c, d was an inspiration. If you look at the first couple of numbers, you can see that they are sometimes half of an earlier number. Playing around further with the numbers I eventually found the patterns
a[i] * a[i+1] + 1.0
anda[i] - a[i+1]
. It remained to figure out when thea[i]/2
rule applies and when thea[i] * a[i+1] + 1.0
rule applies. Here it was a hint that the numbers do not grow too large in size. After trying out several rules that form bounds ona[i]
anda[i+1]
, I eventually found the right one.Replicated!
I have mixed thoughts on this.
I was delighted to see someone else put forth an challenge, and impressed with the amount of people who took it up.
I’m disappointed though that the file used a trivial encoding. When I first saw the comments suggesting it was just all doubles, I was really hoping that it wouldn’t turn out to be that.
I think maybe where the disconnect is occurring is that in the original That Alien Message post, the story starts with aliens deliberately sending a message to humanity to decode, as this thread did here. It is explicitly described as such:
But when I argued against the capability of decoding binary files in the I No Longer Believe Intelligence To Be Magical thread, that argument was on a tangent—is it possible to decode an arbitrary binary files? I specifically ruled out trivial encodings in my reasoning. I listed the features that make a file difficult to decode. A huge issue is ambiguity because in almost all binary files, the first problem is just identifying when fields start or end.
I gave examples like
Camera RAW formats
Compressed image formats like PNG or JPG
Video codecs
Any binary protocol between applications
Network traffic
Serialization to or from disk
Data in RAM
On the other hand, an array of doubles falls much more into this bucket
With all of the above said, the reason why I did not bother uploading an example file in the first thread is frankly because it would have taken me some number of hours to create and I didn’t think there would be any interest in actually decoding it by enough people to justify the time spent. That assumption seems wrong now! It seems like people really enjoyed the challenge. I will update accordingly, and I’ll likely post my example of a file later this week after I have an evening or day free to do so.
Your interlocutor in the other thread seemed to suggest that they were busy until mid-July or so. Perhaps you could take this into account when posting.
I agree that IEEE754 doubles was quite an unrealistic choice, and too easy. However, the other extreme of having a binary blob with no structure at all being manifest seems like it would not make for an interesting challenge. Ideally, there should be several layers of structure to be understood, like in the example of a “picture of an apple”, where understanding the file encoding is not the only thing one can do.
I have posted my file here https://www.lesswrong.com/posts/BMDfYGWcsjAKzNXGz/eavesdropping-on-aliens-a-data-decoding-challenge.
Interesting, I expected the solution to be simpler, and I’m surprised you found it this quickly given that it’s this complex.
This suggests a different question. For non-participants who are given the program which creates the data, what probability/timeframe to assign to success.
On this one I think that I would have put a high probability to be solved but would have anticipated a longer timeframe.
https://en.wikipedia.org/wiki/Kolmogorov_complexity
The fact that the program is so short indicates that the solution is simple. A complex solution would require a much longer program to specify it.
Kolmogorov Complexity isn’t the right measure. I’m pretty sure this program is way simpler (according to you or me) than most uniformly sampled programs of equal KC that produce a string of equal length.
And i wouldn’t consider it short given how hard it would be to find a “typical” program with this KC.
Why do you say that Kolmogorov complexity isn’t the right measure?
I am worried that you might have this backwards?
Kolmogorov complexity describes the output, not the program. The output file has low Kolmogorov complexity because there exists a short computer program to describe it.
Let me rephrase then: most strings of same length and same KC will have much more intuitively complex programs-of-minimum-length that generated them.
Why is the program intuitively simple? Well for example, suppose you have the following code in the
else
block:a.append(min(max(int(i+a[i]),2),i)
i += 1
I think the resulting program has lower length (so whatever string it generates has lower KC), but it would be way harder to find. And I bet that a uniformly sampled string with the same KC will have a program that looks much worse than that. The kinds or programs that look normal to you or me are a heavily skewed sample, and this program does look reasonably normal.
I don’t think this follows—your code is shorter in python but it includes 3 new built in functions which is hidden complexity.
I do agree with the general point that KC isn’t a great measure of difficulty for humans—we are not exactly arbitrary encoders.
Why do you think that “would be way harder to find”? (My intuition goes exactly the other way.)
Eh. I agree with both replies that the example was bad. I wanted to do something unintuitive with pointer arithmetic, but the particular code probably doesn’t capture it well, so it may in fact be easier to find. (Still totally stand by the broader point though.)
Lovely!
Might be an obvious thought, but why did the aliens send us this gibberish. Is there an interpretation of this that makes sense?
Even more obviously, why do aliens adhere to the IEEE 754 standard? My interpretation is that the cryptanalyst from the post has indeed been pranked by their friend at NASA.