harfe comments on Contest: An Alien Message

harfe 28 Jun 2022 14:06 UTC
74 points
28
Solution:
```
#!/usr/bin/env python3

import struct

a = [1.3, 2.1, -0.5] # initialize data
L = 2095 # total length
i = 0

while len(a)<L:
    if abs(a[i]) > 2.0 or abs(a[i+1]) > 2.0:
        a.append(a[i]/2)
        i += 1
    else:
        a.append(a[i] * a[i+1] + 1.0)
        a.append(a[i] - a[i+1])
        a.append(a[i+1] - a[i])
        i += 2

f = open("out_reproduced.bin","wb") # save in binary
f.write(struct.pack('d'*L,*a)) # use IEEE 754 double format
f.close()
```
Then one can check that the produced file is identical:
```
$ sha1sum *bin
70d32ee20ffa21e39acf04490f562552e11c6ab7  out.bin
70d32ee20ffa21e39acf04490f562552e11c6ab7  out_reproduced.bin
```
edit: How I found the solution: I found some of the other comments helpful, especially from gjm (although I did not read all). In particular, interpreting the data as a sequence of 64-bit floating point numbers saved me a lot of time. Also gjm’s mention of the pattern a, -a, b, c, -c, d was an inspiration. If you look at the first couple of numbers, you can see that they are sometimes half of an earlier number. Playing around further with the numbers I eventually found the patterns a[i] * a[i+1] + 1.0 and a[i] - a[i+1]. It remained to figure out when the a[i]/2 rule applies and when the a[i] * a[i+1] + 1.0 rule applies. Here it was a hint that the numbers do not grow too large in size. After trying out several rules that form bounds on a[i] and a[i+1], I eventually found the right one.
What links here?
- Alien Message Contest: Solution by DaemonicSigil (13 Jul 2022 4:07 UTC; 29 points)
- Challenge: A Much More Alien Message by kman (28 Jun 2022 21:50 UTC; 24 points)
- dkirmani 28 Jun 2022 14:35 UTC
  14 points
  6
  Parent
  Replicated!
- anonymousaisafety 28 Jun 2022 16:26 UTC
  12 points
  15
  Parent
  I have mixed thoughts on this.
  I was delighted to see someone else put forth an challenge, and impressed with the amount of people who took it up.
  I’m disappointed though that the file used a trivial encoding. When I first saw the comments suggesting it was just all doubles, I was really hoping that it wouldn’t turn out to be that.
  I think maybe where the disconnect is occurring is that in the original That Alien Message post, the story starts with aliens deliberately sending a message to humanity to decode, as this thread did here. It is explicitly described as such:
  From the first 96 bits, then, it becomes clear that this pattern is not an optimal, compressed encoding of anything. The obvious thought is that the sequence is meant to convey instructions for decoding a compressed message to follow...
  But when I argued against the capability of decoding binary files in the I No Longer Believe Intelligence To Be Magical thread, that argument was on a tangent—is it possible to decode an arbitrary binary files? I specifically ruled out trivial encodings in my reasoning. I listed the features that make a file difficult to decode. A huge issue is ambiguity because in almost all binary files, the first problem is just identifying when fields start or end.
  I gave examples like
  1. Camera RAW formats
  2. Compressed image formats like PNG or JPG
  3. Video codecs
  4. Any binary protocol between applications
    Network traffic
    Serialization to or from disk
    Data in RAM
  On the other hand, an array of doubles falls much more into this bucket
  data that is basically designed to be interpreted correctly, i.e. the data, even though it is in a binary format, is self-describing.
  With all of the above said, the reason why I did not bother uploading an example file in the first thread is frankly because it would have taken me some number of hours to create and I didn’t think there would be any interest in actually decoding it by enough people to justify the time spent. That assumption seems wrong now! It seems like people really enjoyed the challenge. I will update accordingly, and I’ll likely post my example of a file later this week after I have an evening or day free to do so.
  - blf 28 Jun 2022 18:39 UTC
    1 point
    0
    Parent
    Your interlocutor in the other thread seemed to suggest that they were busy until mid-July or so. Perhaps you could take this into account when posting.
    I agree that IEEE754 doubles was quite an unrealistic choice, and too easy. However, the other extreme of having a binary blob with no structure at all being manifest seems like it would not make for an interesting challenge. Ideally, there should be several layers of structure to be understood, like in the example of a “picture of an apple”, where understanding the file encoding is not the only thing one can do.
    - anonymousaisafety 24 Jul 2022 4:37 UTC
      1 point
      0
      Parent
      I have posted my file here https://www.lesswrong.com/posts/BMDfYGWcsjAKzNXGz/eavesdropping-on-aliens-a-data-decoding-challenge.
- Rafael Harth 28 Jun 2022 15:33 UTC
  9 points
  0
  Parent
  Interesting, I expected the solution to be simpler, and I’m surprised you found it this quickly given that it’s this complex.
  - Bucky 28 Jun 2022 19:55 UTC
    5 points
    1
    Parent
    This suggests a different question. For non-participants who are given the program which creates the data, what probability/timeframe to assign to success.
    
    On this one I think that I would have put a high probability to be solved but would have anticipated a longer timeframe.
  - anonymousaisafety 28 Jun 2022 15:44 UTC
    3 points
    0
    Parent
    https://en.wikipedia.org/wiki/Kolmogorov_complexity
    The fact that the program is so short indicates that the solution is simple. A complex solution would require a much longer program to specify it.
    - Rafael Harth 28 Jun 2022 16:09 UTC
      5 points
      0
      Parent
      Kolmogorov Complexity isn’t the right measure. I’m pretty sure this program is way simpler (according to you or me) than most uniformly sampled programs of equal KC that produce a string of equal length.
      
      And i wouldn’t consider it short given how hard it would be to find a “typical” program with this KC.
      - anonymousaisafety 28 Jun 2022 16:33 UTC
        1 point
        0
        Parent
        Why do you say that Kolmogorov complexity isn’t the right measure?
        most uniformly sampled programs of equal KC that produce a string of equal length.
        ...
        “typical” program with this KC.
        I am worried that you might have this backwards?
        Kolmogorov complexity describes the output, not the program. The output file has low Kolmogorov complexity because there exists a short computer program to describe it.
        Rafael Harth 28 Jun 2022 17:29 UTC
        2 points
        0
        Parent
        Let me rephrase then: most strings of same length and same KC will have much more intuitively complex programs-of-minimum-length that generated them.
        
        Why is the program intuitively simple? Well for example, suppose you have the following code in the else block:
        
        a.append(min(max(int(i+a[i]),2),i)
        
        i += 1
        
        I think the resulting program has lower length (so whatever string it generates has lower KC), but it would be way harder to find. And I bet that a uniformly sampled string with the same KC will have a program that looks much worse than that. The kinds or programs that look normal to you or me are a heavily skewed sample, and this program does look reasonably normal.
        Bucky 28 Jun 2022 18:49 UTC
        5 points
        0
        Parent
        
        I think the resulting program has lower length (so whatever string it generates has lower KC)
        
        I don’t think this follows—your code is shorter in python but it includes 3 new built in functions which is hidden complexity.
        
        I do agree with the general point that KC isn’t a great measure of difficulty for humans—we are not exactly arbitrary encoders.
        gjm 28 Jun 2022 18:07 UTC
        4 points
        1
        Parent
        Why do you think that “would be way harder to find”? (My intuition goes exactly the other way.)
        Rafael Harth 28 Jun 2022 19:18 UTC
        5 points
        2
        Parent
        Eh. I agree with both replies that the example was bad. I wanted to do something unintuitive with pointer arithmetic, but the particular code probably doesn’t capture it well, so it may in fact be easier to find. (Still totally stand by the broader point though.)
- gjm 28 Jun 2022 15:06 UTC
  7 points
  2
  Parent
  Lovely!
- Alex Vermillion 28 Jun 2022 15:09 UTC
  2 points
  0
  Parent
  Might be an obvious thought, but why did the aliens send us this gibberish. Is there an interpretation of this that makes sense?
  - dkirmani 28 Jun 2022 15:16 UTC
    10 points
    0
    Parent
    Even more obviously, why do aliens adhere to the IEEE 754 standard? My interpretation is that the cryptanalyst from the post has indeed been pranked by their friend at NASA.