Here is an example of a large file that:
I don’t see the file, but again, if you want to run this, the dates mentioned in the other thread work best.
I am not exactly sure what you mean by the phrase “derive structure but not meaning”; a couple of possibilities come to mind.
Let’s say I have a file that is composed of 1,209,600 bytes of 0s. That file was generated by taking pressure readings, in kPa, once per second on the Hubble Space Telescope for a two-week period. If I said “this file is a sequence of 2^8 x 3^3 x 5^2 x 7 zeros”, would that be a minimal example of “deriving the structure but not the meaning”? If so I expect that situation to be pretty common.
If not, some further questions:
Is it coherent to “understand the meaning but not the structure”?
Would it be possible to go from “understands the structure but not the meaning” to “understands both” purely through receiving more data?
If not, what about through interaction?
If not, what differences would you expect to observe between a world where I understood the structure but not the meaning of something and a world where I understood both?
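To make the all-zeros example concrete, here is a minimal sketch of the purely structural description. The helper name `prime_factors` is just something I made up for illustration. Note that nothing in it knows about pressure, kPa, or Hubble; it only sees the length and the content of the bytes.

```python
def prime_factors(n):
    """Return the prime factorization of n as a {prime: exponent} dict."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        # Whatever is left over is itself prime.
        factors[n] = factors.get(n, 0) + 1
    return factors


SECONDS_PER_DAY = 24 * 60 * 60
n_readings = 14 * SECONDS_PER_DAY   # one reading per second for two weeks
data = bytes(n_readings)            # stand-in for the file: all zero bytes

print(len(data))                    # 1209600
print(set(data))                    # {0}
print(prime_factors(len(data)))     # {2: 8, 3: 3, 5: 2, 7: 1}
```

The output is a complete description of the file's structure, and the two-weeks-of-seconds origin is at best hinted at by the factorization.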
I don’t think this algorithm could decode arbitrary data in a reasonable amount of time. I think it could decode some particularly structured types of data, and I think “fairly unprocessed sensor data from a very large number of nearly identical sensors” is one of those types of data.
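As a toy illustration of why that kind of data is friendlier (this is not the method discussed in the thread, just a sketch under my own assumptions), consider a smooth synthetic "frame" flattened into a byte stream. The names `synthetic_frame`, `row_mismatch`, and `guess_width` are hypothetical. Because neighboring sensors report similar values, the row width shows up as the offset at which bytes best match the bytes that offset later.

```python
import math


def synthetic_frame(width, height):
    """Smooth 2D field quantized to bytes and flattened row by row.

    Assumption: this stands in for raw readings from a grid of nearly
    identical sensors; real sensor data would be noisier.
    """
    out = bytearray()
    for y in range(height):
        for x in range(width):
            v = 128 + 60 * math.sin(x / 5.0) + 60 * math.cos(y / 40.0)
            out.append(int(v))  # stays within 8..248, so it fits in a byte
    return bytes(out)


def row_mismatch(data, w):
    """Mean absolute difference between each byte and the byte w positions later."""
    n = len(data) - w
    return sum(abs(data[i] - data[i + w]) for i in range(n)) / n


def guess_width(data, candidates):
    """Pick the candidate offset at which the stream best matches itself."""
    return min(candidates, key=lambda w: row_mismatch(data, w))


stream = synthetic_frame(width=64, height=40)
print(guess_width(stream, range(8, 129)))
```

On this synthetic input the guess should land on the true width of 64, since vertically adjacent values differ far less than values at any misaligned offset. That only recovers a piece of structure, though, and says nothing about what the values mean.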
I actually don’t know if my proposed method would work with JPEGs: the whole discrete cosine transform and quantization thing destroys data in a way that might not leave you with enough information to make much progress. In general, if there’s lossy compression I expect that to make things much harder, and lossless compression I’d expect either makes it impossible (if you don’t figure out the encoding) or not meaningfully different from uncompressed (if you do).
RAW I’d expect is better than a bitmap, assuming no encryption step, on the hypothesis that more data is better.
Also I kind of doubt that the situation where some AI with no priors encounters a single frame of video but somehow has access to 1000 GPU years to analyze that frame would come up IRL. The point as I see it is more about whether it’s possible with a halfway reasonable amount of compute or whether That Alien Message was completely off-base.