As we’ve all discovered, the data is most productively viewed as a sequence of 2095 8-byte blocks.
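For concreteness, here's a minimal sketch of that framing. The `sample` bytes are a stand-in, since I'm not pasting the real file:

```python
# Split raw bytes into fixed-size blocks (the 8-byte rows discussed above).
def blocks(data: bytes, size: int = 8):
    if len(data) % size != 0:
        raise ValueError("data length is not a multiple of the block size")
    return [data[i:i + size] for i in range(0, len(data), size)]

# With the real file, this would yield 2095 blocks; here, a 16-byte stand-in:
sample = bytes(range(16))
print(blocks(sample))  # two 8-byte blocks
```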
The eighth byte in each block takes the values 64, 63, 192, and 191. 64 and 192 are much less common than 63 and 191.
The seventh byte takes a value between 0 and 16 for 64⁄192 rows, weighted toward the 0 end of the scale. For 63⁄191 rows, it takes a value between ??? and 256, strongly weighted toward the 256 end of the scale (the lowest observed is 97, but there’s nothing special about that number, so the generator probably has the capacity to go lower and just never exercised it during generation).
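A quick way to reproduce that tally, grouping each row’s seventh byte by its eighth. The four rows below are hypothetical stand-ins illustrating the pattern, not real data:

```python
from collections import defaultdict

def seventh_byte_stats(rows):
    # Group each row's seventh byte (index 6) by its eighth byte (index 7),
    # and report the (min, max) seen for each row type.
    by_type = defaultdict(list)
    for row in rows:
        by_type[row[7]].append(row[6])
    return {k: (min(v), max(v)) for k, v in by_type.items()}

rows = [bytes([0] * 6 + [3, 64]),    # 64-row: low seventh byte
        bytes([0] * 6 + [250, 63]),  # 63-row: high seventh byte
        bytes([0] * 6 + [12, 192]),
        bytes([0] * 6 + [200, 191])]
print(seventh_byte_stats(rows))
```

On the real data, the 64/192 buckets should land in the 0–16 range and the 63/191 buckets up near the top of the byte range.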
I agree with gjm that at least the first six bytes should probably be read as little-endian fractions. The first ten lines are variations on “x/5, expressed in little-endian hexadecimal, with the last digit rounded”: notice how there’s an ‘a’ in the early 999 row and a ‘d’ in the early ccc row, but no equivalent for the early 000 or 666 rows. And applying this reasoning to the rest of the data gets a lot of very neat fractions . . . for the first 80 rows or so, after which things rapidly degenerate.
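Here’s a sketch of that interpretation, formalising “little-endian fraction” as the first six bytes read as a little-endian integer over 256⁶ (my formalisation, which may not be exactly what gjm meant). The example row approximates 3⁄5, i.e. hex 0.999999999999… rounded to twelve digits, which puts the rounded-up ‘a’ in the first (least-significant) byte:

```python
from fractions import Fraction

def as_fraction(row: bytes, nbytes: int = 6) -> Fraction:
    # First `nbytes` bytes as a fixed-point fraction in [0, 1):
    # byte 0 is the least significant.
    n = int.from_bytes(row[:nbytes], "little")
    return Fraction(n, 256 ** nbytes)

# Hypothetical "999 row": 0x99999999999a, least-significant byte first.
# Trailing two bytes (seventh and eighth) are arbitrary here.
row = bytes([0x9A] + [0x99] * 5 + [0, 63])
print(float(as_fraction(row)))  # very close to 3/5
```

Scanning all 2095 rows for values close to small-denominator fractions is then a one-liner with `Fraction.limit_denominator`.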
I can’t find any more correlations between these features, at least on a per-row basis. Between rows . . . there are a lot of ‘pairs’ of rows in which a 63-row is followed by an otherwise-identical 191-row (or 191 by 63, or 64 by 192, or 192 by 64). These pairs are usually but not invariably separated from the next pair by at least one unpaired row.
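The pairing check can be stated precisely: two adjacent rows ‘pair’ when their first seven bytes match and their eighth bytes differ by exactly 128 (63↔191 or 64↔192). A sketch, with hypothetical rows:

```python
def is_pair(a: bytes, b: bytes) -> bool:
    # First seven bytes identical, eighth bytes 128 apart (63<->191, 64<->192).
    return a[:7] == b[:7] and abs(a[7] - b[7]) == 128

a = bytes([1, 2, 3, 4, 5, 6, 250, 63])
b = bytes([1, 2, 3, 4, 5, 6, 250, 191])   # pairs with a
c = bytes([9, 9, 9, 9, 9, 9, 250, 191])   # differs in the first six bytes
print(is_pair(a, b), is_pair(a, c))  # True False

# Indices where a pair starts, given a list of rows:
rows = [a, b, c]
print([i for i in range(len(rows) - 1) if is_pair(rows[i], rows[i + 1])])
```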