If someone wanted to continue this project and rigorously find out how well Bing AI can generate Voynichese, here is how I would do it:
1. Either use an existing VMS transcription or prepare a slightly modified one that ignores all standalone label vords and inserts single tokens, such as a comma [,] to denote line breaks and a [>] to denote section breaks. There are pros and cons each way. The modified transcription would be slightly less familiar to Bing AI than what is in its training data, but it would have the advantage of representing line and section breaks. That may be important if you want to investigate whether Bing AI can reproduce statistical phenomena like the “Line as a Functional Unit” or gallows characters appearing more frequently at the start of sections.
2. Feed existing strings of Voynich text into Bing AI (or some other LLM) systematically, from the beginning of the VMS to the end, in chunks as big as the context window allows. Record what Bing AI outputs.
3. Compile Bing AI’s outputs into a 2nd master transcription. Analyze Bing AI’s compendium for things like: Zipf’s Law, 1st order entropy, 2nd order entropy, curve/line “vowel” juxtaposition frequencies (à la Brian Cham), “Grove Word” frequencies, probabilities of finding certain bigrams at the beginnings or endings of words, ditto with lines, etc. (The more statistical attacks, the better.)
4. Compare the results with the same analyses applied to the original VMS.
5. Compile a second Bing AI-generated Voynich compendium, and a third, a fourth, and a fifth, and see whether the statistical attacks give the same results each time.
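For steps 3 and 4, here is a minimal Python sketch of a few of the statistical attacks, using only the standard library. It assumes the transcription format from step 1 (EVA-style words separated by spaces, a standalone "," for line breaks and a standalone ">" for section breaks). The function names are hypothetical, not taken from any existing Voynich toolkit. "2nd order entropy" is implemented here as conditional character entropy within words, which is only one of the definitions in use, and the gallows check assumes the EVA letters k, t, p and f are the gallows.

```python
import math
from collections import Counter

# EVA gallows letters; assumes an EVA-based transcription.
GALLOWS = set("ktpf")


def parse(text):
    """Split a transcription into sections -> lines -> words.

    Assumes the format from step 1: words separated by whitespace,
    a standalone "," for a line break and a standalone ">" for a section break.
    """
    sections, lines, words = [], [], []
    for tok in text.split():
        if tok in (",", ">"):
            if words:
                lines.append(words)
            words = []
            if tok == ">" and lines:
                sections.append(lines)
                lines = []
        else:
            words.append(tok)
    if words:
        lines.append(words)
    if lines:
        sections.append(lines)
    return sections


def all_words(sections):
    return [w for sec in sections for line in sec for w in line]


def entropy(counts):
    """Shannon entropy in bits of a Counter of outcomes."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def h1(words):
    """First-order (single-character) entropy, ignoring spaces and break markers."""
    return entropy(Counter(ch for w in words for ch in w))


def h2(words):
    """Conditional entropy H(next char | previous char), within words only.

    One common reading of "2nd order entropy"; some authors use the entropy
    of character pairs instead, or include word spaces as a character.
    """
    pairs = Counter((w[i], w[i + 1]) for w in words for i in range(len(w) - 1))
    prev = Counter()
    for (a, _), c in pairs.items():
        prev[a] += c
    # H(X2 | X1) = H(X1, X2) - H(X1)
    return entropy(pairs) - entropy(prev)


def zipf_slope(words):
    """Least-squares slope of log(frequency) vs. log(rank); Zipf's law predicts about -1.

    Needs at least two distinct words.
    """
    freqs = sorted(Counter(words).values(), reverse=True)
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def edge_bigrams(words, at_start=True):
    """Relative frequency of the first (or last) two characters of each word."""
    grams = Counter(w[:2] if at_start else w[-2:] for w in words if len(w) >= 2)
    total = sum(grams.values())
    return {g: c / total for g, c in grams.items()}


def gallows_rates(sections):
    """Share of gallows-initial words: (section-initial words, all words)."""
    def rate(ws):
        return sum(w[0] in GALLOWS for w in ws) / len(ws)
    return rate([sec[0][0] for sec in sections]), rate(all_words(sections))


def summarize(text):
    sections = parse(text)
    words = all_words(sections)
    return {
        "h1": h1(words),
        "h2": h2(words),
        "zipf_slope": zipf_slope(words),
        "word_initial_bigrams": edge_bigrams(words, at_start=True),
        "word_final_bigrams": edge_bigrams(words, at_start=False),
        "gallows_section_start_vs_all": gallows_rates(sections),
    }
```

Running `summarize` on the original transcription and on each Bing AI compendium gives numbers that can be compared side by side. Line-level statistics ("ditto with lines") could be gathered the same way, e.g. by passing only the first word of each line to `edge_bigrams`.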
There are probably ways to automate this that people smarter than me could figure out.
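One possible way to automate steps 2, 3 and 5 is sketched below in Python. It is only a sketch under some assumptions: the context window is approximated by a character budget (real limits are counted in tokens), chunks are cut only between words, and `generate` is a hypothetical placeholder for whatever interface to Bing AI or another LLM you have access to; it is not a real API. It could, for instance, wrap each chunk in a prompt asking the model to continue the Voynich text. The `echo_stub` function just returns its input so the code runs without any AI service.

```python
from typing import Callable, List


def chunk_transcription(text: str, max_chars: int) -> List[str]:
    """Split a transcription into consecutive chunks of at most max_chars characters.

    Splits only between whitespace-separated tokens, so no word or break marker
    is cut in half. A single token longer than max_chars becomes its own chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    length = 0  # length of " ".join(current)
    for tok in text.split():
        extra = len(tok) + (1 if current else 0)
        if current and length + extra > max_chars:
            chunks.append(" ".join(current))
            current, length, extra = [], 0, len(tok)
        current.append(tok)
        length += extra
    if current:
        chunks.append(" ".join(current))
    return chunks


def build_compendium(text: str, generate: Callable[[str], str], max_chars: int) -> str:
    """Steps 2-3: feed each chunk to `generate` in order and join the outputs."""
    return " ".join(generate(chunk) for chunk in chunk_transcription(text, max_chars))


def build_compendia(text: str, generate: Callable[[str], str],
                    max_chars: int, runs: int = 5) -> List[str]:
    """Step 5: repeat the whole pass to get several independent compendia."""
    return [build_compendium(text, generate, max_chars) for _ in range(runs)]


def echo_stub(chunk: str) -> str:
    # Hypothetical stand-in for a real LLM call; replace with your own client code.
    return chunk


if __name__ == "__main__":
    vms = "qokeedy daiin chedy , okaiin shedy qokedy > tchedy daiin chol , chol daiin"
    for chunk in chunk_transcription(vms, 30):
        print(chunk)
    print(build_compendia(vms, echo_stub, 30, runs=2))
```

Each compendium returned by `build_compendia` can then be analysed on its own and the results compared across runs.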
Matthew, AI will never be able to decipher the text of MS 408. The manuscript is written in a very complex substitution. I also asked the bot, telling it that the text is a substitution, and the AI replied: “I know what substitution is, but if I don’t know the key, I can’t decipher the manuscript.”
So the important thing is to give the AI a key. The key is written on the last page of the manuscript, 116v.
In addition, the entire text of the manuscript is written in Czech, as stated by the author and on his website (sheets of parchment).
So the AI needs a key.