Eavesdropping on Aliens: A Data Decoding Challenge

This is my reply to faul_sname’s claim in the “I No Longer Believe Intelligence To Be Magical” thread:

In terms of concrete predictions, I’d expect that if we

  1. Had someone² generate a description of a physical scene in a universe that runs on different laws than ours, recorded by sensors that use a different modality than we use

  2. Had someone code up a simulation of what sensor data with a small amount of noise would look like in that scenario and dump it out to a file

  3. Created a substantial prize structured mostly³ like the Hutter Prize for compressed file + decompressor

we would see that the winning programs would look more like “generate a model and use that model and a similar rendering process to what was used to generate the original file, plus an error correction table” and less like a general-purpose compressor⁴.

...

If you were to give me a binary file with no extension and no metadata that is

  1. Above 1,000,000 bytes in size

  2. Able to be compressed to under 50% of its uncompressed size with some simple tool like gzip (to ensure that there is actually some discoverable structure)

  3. Not able to be compressed under 10% of its uncompressed size by any well-known existing tools (to ensure that there is actually a meaningful amount of information in the file)

  4. Not generated by some tricky gotcha process (e.g. a file that is 250,000 bytes from /dev/random followed by 750,000 bytes from /dev/zero)

then I’d expect that

  1. It would be possible for me, given some time to examine the data, to create a decompressor and a payload such that running the decompressor on the payload yields the original file, and the decompressor program + the payload have a total size of less than the original gzipped file

  2. The decompressor would legibly contain a substantial amount of information about the structure of the data.

Here is such a file: https://mega.nz/file/VYxE3T5A#xN3524gW4Q68NXK2rmYgTqq6e-2RSaEF2HW8rLGfK7k.

  1. It is 2 MB in size.

  2. When I compress it with zip using default settings, it comes out at ~1.16 MB, or a little over 50% of the original size.

  3. I have tried to compress it to smaller sizes with various configurations of 7zip and I’ve been unable to get it significantly smaller than 50% of the file size.

  4. This file represents the binary output of a fake sensor that I developed for the purpose of this challenge.

    1. There is no “gotcha” on what type of sensor is used, or what the sensor is being used to detect. It will be obvious if the data is decoded correctly.

    2. I have not obfuscated the fundamental design or mechanism by which the sensor works, or the way that the sensor data is presented. The process to decode the data from similar sensors is straightforward and well documented online.

    3. The data is not compressed or encrypted.
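For anyone who wants to check the size and compressibility figures above for themselves, here is a minimal sketch using Python's standard `gzip` and `lzma` modules. It is not the procedure I used (I used zip and 7zip), so the exact percentages will differ somewhat; the helper names `compression_ratios` and `report` are made up for this example, and the file name is assumed to match the one written by the packing script further down.

```python
import gzip
import lzma


def compression_ratios(data: bytes) -> dict:
    """Return compressed size / original size for two stdlib compressors."""
    if not data:
        raise ValueError("cannot compute a ratio for empty input")
    return {
        "gzip": len(gzip.compress(data)) / len(data),
        "lzma": len(lzma.compress(data)) / len(data),
    }


def report(path: str) -> None:
    """Print the file size and how far each compressor shrinks it."""
    with open(path, "rb") as f:
        data = f.read()
    print(f"size: {len(data)} bytes")
    for name, ratio in compression_ratios(data).items():
        print(f"{name}: {ratio:.1%} of original size")


# Example (assumes the downloaded file sits in the current directory):
# report("mystery_file_difficult.bin")
```

A ratio near 100% would suggest data that is already compressed or encrypted, and a ratio near 0% would suggest almost no information; the file in this challenge should land somewhere in between.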

Unfortunately, the data in this file comes from eavesdropping on aliens.

Background on LV-478

It was a long overdue upgrade of the surface-to-orbit radio transmitter used for routine communication with the space dock. However, a technician forgot to lower the power level of the transmitter after the upgrade process, and the very first transmission of the upgraded system was at the new (and much higher) maximum power level. The aliens accidentally transmitted an utterly mundane and uninteresting file into deep space.

The technician was fired.

Hundreds of years later...

Background on Earth

Astronomers on Earth noticed a bizarre radio transmission from a nearby star system. In particular, the transmission seemed to use frequency modulation to carry some unknown data stream. They were lucky enough to be recording when this stream was received, and they are therefore confident that they received all of the data.

Furthermore, the astronomers were able to analyze the transmission and assign each received symbol either a “0-modulated” or “1-modulated” value. The astronomers are confident that they’ve analyzed the frequencies correctly, and that the data carried by the transmission was definitely binary in nature. The full transmission contained a total of 16800688 (~2 MB) binary values.

The astronomers ran out of funding before they could finish analyzing the data, so an intern in the lab took the readings, packed the bits together, and uploaded the result to the internet in the hope that someone else would be able to figure it out.

This is the exact Python code that the intern used when packing the data:

def main():
    bits = None
    with open("data.bits", "rb") as f:
        bits = f.read()

    print(len(bits))

    byte_values = []
    byte_value = 0
    bit_idx = 0
    for bit in bits:
        byte_value |= (bit << bit_idx)
        bit_idx += 1
        if bit_idx % 8 == 0:
            byte_values.append(byte_value)
            byte_value = 0
            bit_idx = 0

    with open("mystery_file_difficult.bin", "wb") as f:
        f.write(bytearray(byte_values))

if __name__ == "__main__":
    main()
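The packing loop above puts the first received bit into the least significant bit of each byte. As a starting point for analysis, here is a minimal sketch of the inverse operation, recovering the original bit sequence from the packed file. The function names `unpack_bits` and `pack_bits` are hypothetical (they are not part of the intern's script), and `pack_bits` simply restates the intern's loop so the round trip can be checked.

```python
def unpack_bits(packed: bytes) -> list:
    """Expand each byte into 8 bit values, least significant bit first."""
    bits = []
    for byte_value in packed:
        for bit_idx in range(8):
            bits.append((byte_value >> bit_idx) & 1)
    return bits


def pack_bits(bits) -> bytes:
    """Restatement of the intern's packing loop, used only for a round-trip check."""
    byte_values = []
    byte_value = 0
    bit_idx = 0
    for bit in bits:
        byte_value |= (bit << bit_idx)
        bit_idx += 1
        if bit_idx % 8 == 0:
            byte_values.append(byte_value)
            byte_value = 0
            bit_idx = 0
    return bytes(byte_values)


# Example (assumes the downloaded file sits in the current directory):
# with open("mystery_file_difficult.bin", "rb") as f:
#     bits = unpack_bits(f.read())
# print(len(bits))
```

Since 16800688 is divisible by 8, the packed file should be 2100086 bytes long, and unpacking it should give back exactly 16800688 values in the order they were received.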

Challenge Details

Decode the alien transmission in the file uploaded at https://mega.nz/file/VYxE3T5A#xN3524gW4Q68NXK2rmYgTqq6e-2RSaEF2HW8rLGfK7k.

  1. This is a collaborative challenge.

  2. The challenge ends on August 27th, or when someone can explain what was sent in the aliens’ unintentional transmission, whichever occurs first.

  3. At the end of the challenge, I will award partial-credit points for any correct statements that describe the aliens’ hardware/software systems.

  4. The points are not worth anything.