Marco Ponzi kindly pointed out that there was a mistake in my previous entropy posts. It did not invalidate the results, but it was enough to noticeably change the numbers. However, instead of rewriting the previous posts with corrected numbers, I decided to redo the experiment, improving it at the same time.
What was this about again?
One of the main problems we have with the text of the Voynich manuscript is that its characters are relatively easy to predict – more so than in any language we know of. We say that its conditional character entropy (h2) is too low. One part of this problem (though certainly not all of it!) is caused by the fact that the commonly used EVA transliteration system splits certain glyphs into separate strokes. For example, the “bench” glyph becomes EVA [ch]. This means that the c+h pairing suddenly becomes extremely predictable, dragging down h2.

The example above would be [chain] in EVA, while it could just as well represent two glyphs: the “bench” and something that looks like a loopy “m”. Since such stroke groups are common, it is possible that EVA decreases h2 by using multiple letters to represent them. This is not a criticism of EVA; I like it, and the fact that everybody can use it to communicate about the MS. The problem is rather that not all researchers are aware of EVA’s impact on statistics.
While I started by mapping the impact of EVA, my interest then shifted to common n-grams (groups of glyphs) in general, because even if we factor out potentially “introduced” problems like [ch], there are still many common n-grams like [ol] and [dy] which are harder to interpret as single glyphs, because they consist of clearly separate parts.
Imagine that [o] modifies the next glyph. So for example, [t] is “t”, but [ot] is “d”. This hypothesis assumes that Voynichese is a verbose cipher, whereby one glyph in the source text is represented by a group of glyphs in the ciphertext. While I do not wish to argue that Voynichese is a verbose cipher, it is still interesting to see how entropy changes if we reverse this hypothetical process.
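To make the reversal concrete, here is a toy sketch in Python. The group-to-glyph mappings are invented for illustration only and are not a claim about the actual cipher:

```python
# Invented mappings for illustration only.
reverse_map = {"ot": "d", "ok": "g", "qo": "q"}

def undo_verbose(text, mapping):
    """Undo a hypothetical verbose cipher by merging each glyph group
    back into a single stand-in plaintext glyph (longest groups first,
    so a short group never eats part of a longer one)."""
    for group in sorted(mapping, key=len, reverse=True):
        text = text.replace(group, mapping[group])
    return text

print(undo_verbose("otol okal", reverse_map))  # -> dol gal
```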
Improved experiment – setup
My goal is to increase h2, preferably up to the magical barrier of 3.00, while at the same time keeping h1 in check. I do this by replacing common n-grams like [aiin] with a single new glyph, for example [A]. If you keep doing this, you can of course increase character entropy up to the point where it matches word entropy, since each word (type) will eventually be a unique glyph. This is why it is important to keep an eye on h1 as well, which makes the experiment much trickier: some replacements increase h1 by much more than they increase h2, so those are to be avoided.
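For reference, h1 and h2 can be estimated with a few lines of Python. (The entropy figures in this post were computed with Nablator’s Java code; this is just a minimal equivalent, which keeps spaces as ordinary characters.)

```python
import math
from collections import Counter

def entropies(text):
    """Return (h1, h2) in bits: h1 is the entropy of single characters,
    h2 the conditional entropy of a character given its predecessor,
    via H(Y|X) = H(X,Y) - H(X)."""
    unigrams = Counter(text)
    bigrams = Counter(zip(text, text[1:]))
    n1, n2 = sum(unigrams.values()), sum(bigrams.values())
    h1 = -sum(c / n1 * math.log2(c / n1) for c in unigrams.values())
    h12 = -sum(c / n2 * math.log2(c / n2) for c in bigrams.values())
    return h1, h12 - h1
```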
While in previous posts I did this by trial and error, based on an n-gram frequency list Marco sent me, I now opted for a more systematic approach and wanted to take more variables into account:
- What is the impact of the transliteration file used? Takahashi (TT) vs. Zandbergen-Landini (ZL).
- Do different sections behave differently? I will test Herbal A, Herbal B, Q13 and Q20.
- Does the way we treat benched gallows make a difference?
That last point deserves some explanation. Since benches (EVA-ch and -sh) are almost certainly glyphs that are “split” by EVA, I want to merge them already in the original files to be modified: merged benches should be part of the “zero” state. But this leaves us with a problem for benched gallows, which consist of a bench glyph and a gallows glyph stacked on top of each other.
EVA splits these into three parts: half bench + gallows + half bench, e.g. [cth]. But since benches have been merged, [c] and [h] are now gone. There are generally two ways to tackle this problem. One is “unstacking”, preferred by, for example, Nick Pelling and myself. I had somewhat arbitrarily chosen to rewrite them as bench + gallows, but in response Nick argued that gallows + bench is the better order, and I will follow his advice. The other way is to represent benched gallows as their own unique characters, as preferred by, for example, Marco Ponzi and Emma May Smith.
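In code, the two treatments could look like the sketch below; the stand-in characters (“C” for the merged bench, capitals for the unique benched gallows) are arbitrary placeholders, not a proposed alphabet:

```python
BENCHED = ["cth", "ckh", "cph", "cfh"]

def unstack(text):
    """Rewrite benched gallows as gallows + merged bench (Pelling's order)."""
    for bg in BENCHED:
        text = text.replace(bg, bg[1] + "C")  # e.g. [cth] -> "tC"
    return text

def make_unique(text):
    """Rewrite each benched gallows as a single new character."""
    for bg, sub in zip(BENCHED, ["T", "K", "P", "F"]):
        text = text.replace(bg, sub)
    return text
```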
Because all these questions required me to test and edit a lot of different files, I faced the inevitable conclusion that I needed to learn how to automate these tasks in Python. This worked for the most part, and Marco was kind enough to help me fix the last issues with the code. Given an input text file, the script generates a series of new files, each with a different n-gram merged. The n-grams to be merged are simply based on a list of the most frequent n-grams, up to n=4. Benches are merged in the input already, so they count as one.
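My script is not reproduced here, but a simplified sketch of the idea might look like this (the file names and the placeholder glyph are hypothetical):

```python
from collections import Counter
from pathlib import Path

def frequent_ngrams(text, max_n=4, top=30):
    """Most frequent n-grams for n = 2..max_n, never crossing a space."""
    counts = Counter()
    for n in range(2, max_n + 1):
        for i in range(len(text) - n + 1):
            gram = text[i:i + n]
            if " " not in gram:
                counts[gram] += 1
    return [gram for gram, _ in counts.most_common(top)]

def generate_variants(infile, outdir, placeholder="A"):
    """Write one copy of the input per frequent n-gram, with that
    n-gram merged into a single placeholder glyph."""
    text = Path(infile).read_text()
    out = Path(outdir)
    out.mkdir(exist_ok=True)
    for gram in frequent_ngrams(text):
        (out / f"merged_{gram}.txt").write_text(text.replace(gram, placeholder))
```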
Running this script on a base file results in a series of new files, each with one of the most frequent n-grams merged to a single glyph. By running Nablator’s Java code on these files, I was able to calculate the impact of each different transformation.
What matters?
I prepared four different starting files: both Takahashi and ZL, each with benched gallows unstacked and with benched gallows rewritten as unique glyphs. It appears that the impact of the transliteration used is very small: h2 for the ZL files differs from the corresponding TT files by a mere 0.005 on average. The impact of the way benched gallows are treated was five times as large, with an average difference of 0.028 between the two methods.
This is a relief, because it means that at least for these very broad statistics, the two most commonly used transliterations are similar enough. Therefore, this post will henceforth focus on the TT files. Interpreting benched gallows in different ways does make a difference, but this was to be expected: rewriting benched gallows introduces new glyphs, increasing h2.

Do the differences between various sections matter? Apparently, they do. In the graph below, black dots are the original, unmodified EVA files. Red dots have had benches merged and benched gallows replaced by unique characters. Finally, green dots have benches merged and benched gallows unstacked.

There’s a lot to unpack in this graph. Q13, Q20 and HB follow a similar pattern, where both modified versions gain h2 and lose h1 (which is exactly what we want). Herbal A breaks this pattern: its version where benched gallows are replaced by unique characters (red dot) gains a lot of h2, but at the cost of an increase in h1.
I wonder whether this is a consequence of the “language”; Q20, Q13 and HB are all in Currier’s B-dialect, while HA is, as its name suggests, in the A-dialect. Still, Currier language does not explain everything, since Q13 distances itself from the others with its notoriously low entropy values.
So in summary, what matters?
- Section? Yes: A and B languages might show different behavior and Q13 has much lower entropy values than the rest.
- Treatment of benched gallows? Yes, especially the impact on h1 is significant.
- Transliteration file used? No, not for these broad statistics.
Since I find the differences between sections the most interesting, I will focus on this aspect to limit the number of variables. We will start the experiment with the version where benched gallows have been unstacked in Nick Pelling’s favored manner. At the end, I will check whether there is a big difference with the version where benched gallows were replaced with unique characters.
What am [i]?
When I ran the script for a first test, I noticed a few things. The following graph is probably hard to read, but it is enough as an illustration for now.

Each section is a color: blue for HA, red for HB, yellow for Q13, green for Q20. The height of each bar represents the increase in h2 after the transformation, minus the increase in h1. In other words, it shows how far the dot moves toward the top left of the scatter plot, which is what we want. Tall bars represent glyph groups that appear to have been extremely redundant.
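The bar height can be computed directly from the entropies before and after a single merge; a sketch, reusing the entropies() helper from earlier:

```python
def replacement_score(text, gram, placeholder="A"):
    """Gain in h2 minus gain in h1 after merging one n-gram: the
    higher the score, the further the dot moves to the top left."""
    h1_before, h2_before = entropies(text)
    h1_after, h2_after = entropies(text.replace(gram, placeholder))
    return (h2_after - h2_before) - (h1_after - h1_before)
```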
The graph is sorted by increasing values for Herbal A, which also allows us to see where the B-sections differ. Q13 presents itself as something of an extreme version of the B-language, with tall peaks at [edy, qo, qok]. These differences can be explained to a large extent by frequency: if an n-gram is much more numerous in one section, it can have a larger impact on that section. Still, frequency does not explain everything, since some frequent n-grams also have low values. But this graph certainly shows why it is important to treat sections separately.
By far the tallest bars for most sections are those replacements involving [ai], [in], or some combination of these. Therefore, I will first sort out the [i]-clusters separately. As with the benches, I suspect their effect on entropy is partly explained by the way we represent Voynichese. Of course, the fact that [i]-clusters are always at the end of words is an inherent property of Voynichese, but the way we divide those clusters into minims might lower entropy in an artificial way.
But what is an [i]-cluster?
7195 words in the ZL transcription contain one or more [i] characters. Of those, 6043 end in [in]. In 6788 cases, the [i]s are immediately preceded by [a]. This is why [ain] and [aiin] are so common. Only two other bigrams involving [i] occur more than 100 times: [oi] and [ir].
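Counts like these are easy to reproduce; a rough sketch, where “ZL_words.txt” stands in for a hypothetical one-word-per-line dump of the transcription (the third count is an approximation, since it only checks for the bigram [ai]):

```python
from pathlib import Path

words = Path("ZL_words.txt").read_text().split()
with_i = [w for w in words if "i" in w]

print(len(with_i))                            # words containing [i]
print(sum(w.endswith("in") for w in with_i))  # of those, ending in [in]
print(sum("ai" in w for w in with_i))         # [i] preceded by [a]
```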

Put differently, 78% of VM words containing [i] end either in [ain] or [aiin]. It might be a surprise to some readers that [a] is even more frequently connected to the [i]-cluster than [n] is. The only other combination that seems to matter is [air], with 8%. Next are [oiin] and [aiiin], which don’t have a great impact at 2% each.
Now don’t get me wrong: the fact that [oiin] appears 176 times must be relevant somehow, but not so much for this particular study, which is mostly concerned with frequent phenomena. Therefore, I will reduce the [i]-problem to [ain, aiin] and, like the benches, replace these separately beforehand.
Target
My intention is to increase h2 (conditional character entropy) while keeping h1 in check. Looking at the corpus of medieval texts I collected, I established a threshold for h1. For the level of h2 I am expecting to reach, I should really stay below:
max h1 = 4.20
For h2, I aim for the magical limit of 3.00, but this will probably be impossible. In my previous entropy post I did reach a value above 3.00, but as Marco pointed out, it contained two mistakes:
- I used the entire MS instead of just one section. Given the different nature of the sections, it is easier to reach high h2 on the full MS.
- I made a mistake in formatting the text which also somewhat increased h2.
So this is what we are aiming for: get h2 as close as possible to 3.00, while keeping h1 below 4.2.
Each of the following sections has had the following n-grams rewritten before running the tests:
ch, sh, ain, aiin
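As a sketch, this pre-replacement step amounts to a handful of substitutions; the single-character stand-ins are arbitrary placeholders:

```python
PRE = [("aiin", "N"), ("ain", "M"), ("ch", "C"), ("sh", "S")]

def preprocess(text):
    """Merge benches and the two dominant [i]-clusters before any
    further merges; longer n-grams go first as a general precaution."""
    for gram, sub in PRE:
        text = text.replace(gram, sub)
    return text
```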
The following plot should clarify these choices. The many black dots are various medieval texts. The colored dots in the bottom left are the four selected Voynich quires. I have circled the original, unmodified EVA versions. With benches, [ain] and [aiin] “fixed”, you can see that they already get a fairer starting position: all dots shift to the top left.
I know from experience that reaching the green line will be hard. Therefore, the red line at 4.2 seems like a reasonable limit for h1. Past this point, h1 tends to increase without raising h2, which offers no prospect of reaching any acceptable level.

Herbal A
For Herbal A, I reached the following values:
h1 = 4.20
h2 = 2.94
Sequence: qok, chol, chor, che, chy, ol, cho, or, qot, ar, eey, al, qo
This is closer than I thought to the “dream destination” of 3.00, but still not quite in the normal range. I am surprised by the number of [ch] replacements, and that several versions of [cho] were eliminated. It may be worth pointing out that the VM scribes often write [cho] as a ligature, and may indeed have thought of it as one unit. But this is something that should be studied separately.
If we apply this exact series of changes to the B-sections, we see how it affects them differently. They do move in the right direction, but Herbal A (purple dot) benefits more. It remains to be seen whether a different series will be better for the other sections. Things are not looking good for Q13.

Herbal B
For Herbal B, I reached the following values:
h1 = 4.15
h2 = 2.82
Sequence: ar, qok, ol, al, qot, or, edy, she, che, ok, qo
Q20
For Q20, I reached the following values:
h1 = 4.19
h2 = 2.78
Sequence: qok, edy, qot, ar, eey, al, she, che, chol
Q13
Now comes the VM’s worst crime against entropy: Q13. This section’s vocabulary is the most limited by far, and its characters are the most predictable. These are the values I attained:
h1 = 4.18
h2 = 2.73
Sequence: qok, edy, qot, ol, ar, al, ot, or, ok, eey, she, che
The final graph looks like this:

In the original EVA, Q13 is the outlier while the other sections are relatively close together. The modified versions, however, group by Currier language; it was much easier to get close to h2 = 3.00 with Herbal A than with any of the B-sections, even though I spent a lot of time trying to squeeze the most out of each section individually.
Comparison of selected n-grams
I obtained these values by repeatedly running a Python script on the files, each time checking which operations produced the best offspring. Ideally I would have liked to create a version of each possible combination, but this is impossible given the exponential growth. (Once I accidentally ran the script four times in a row and ended up with over 60,000 text files, or 14 GB, which took several minutes to delete.)
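What I did by hand amounts to a greedy search, and it could be automated along these lines (a sketch, assuming the entropies() helper from earlier; candidates must be spelled in the post-preprocess alphabet, e.g. [che] becomes “Ce” once benches are merged):

```python
def greedy_search(text, candidates, max_h1=4.20):
    """Repeatedly apply the single n-gram merge that raises h2 the most
    while keeping h1 under max_h1; stop when no merge improves h2."""
    stand_ins = iter("0123456789BDEFGJLQUVWXZ")  # arbitrary new glyphs
    chosen, remaining = [], list(candidates)
    _, best_h2 = entropies(text)
    while remaining:
        sub = next(stand_ins)
        best = None
        for gram in remaining:
            h1, h2 = entropies(text.replace(gram, sub))
            if h1 <= max_h1 and h2 > best_h2 and (best is None or h2 > best[0]):
                best = (h2, gram)
        if best is None:
            break
        best_h2, gram = best
        text = text.replace(gram, sub)
        chosen.append(gram)
        remaining.remove(gram)
    return chosen, text
```

Unlike generating every possible combination, this explores a single path, which keeps the number of evaluated files linear rather than exponential.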
The n-grams included in the script were the following:
ol, ok, ot, or, od, qo, qot, che, ee, cho, ed, ey, al, ar, eo, she, chol, chor, qok, chy, sho, da, dy, edy, eedy, eey
While selecting these, I had to draw the line somewhere in order not to get overwhelmed. Therefore, I excluded common n-grams with a gallows in the middle, like [oke]. It is possible that including these would improve results, but this seemed too fanciful even for my taste.
The following graph shows which n-grams were changed where. Note that the order of operations may be relevant; this graph does not take it into account, but the section-specific sequences above do. Before anything else, benched gallows were unstacked in the fashion preferred by Nick Pelling, that is, gallows-bench.
Marked in green are those replacements which were widely required: ch, che, sh, ain, aiin, qot, qok, al, ar, ol, or, eey.

Herbal A seemed to benefit most from additional replacements involving [ch]: chol, chor, cho, chy. This came as a surprise to me, but I think something can be said for at least [cho] being a unit.
The replacements [she] and [edy] (blue) were required in all B-sections, but not in Herbal A.
A whole group of replacements (marked in red) was never selected by the system: od, ee, ed, ey, eo, sho, da, dy, eedy. We can think of this as [edy] being preferred as a unit, excluding [dy], [ed] and [eedy]. In previous experiments I had already learned that e-based replacements like [ee, eo] don’t tend to give great results.
I was planning to check what the difference would be if benched gallows became unique characters, but this post has already taken me way too long. Since this post expands and corrects the previous entropy posts, those are no longer valid; I will edit them to refer to this new post.
A final note: this is just one way to do this, and I do not want to propose it as any kind of solution. The point I want to make is that the information value of Voynichese words might be higher than the “unmodified” EVA h2 suggests. Researchers should take this into account when, for example, dismissing on these grounds the possibility that Voynichese contains real language.
I can certainly believe that there may be significant differences and distinctions between Herbal A and all the B-sections, but I find it hard to believe that the entire writing systems of Dialect A and Dialect B are fundamentally different.
One detail in your results that strikes me as illogical is the fact that [qok] and [qot] appear to function as single units in all sections, but [ok] and [ot] only appear to do so in certain B-sections. Frankly it is hard to believe that the three-glyph sequences [qok] and [qot] are single units when they appear together, but the two-glyph subsequences [ok] and [ot] are not single units but rather each consist of two individual units [o] + [k] and [o] + [t].
I am curious to know how significantly the entropy statistics would be affected in all sections if [ok] and [ot] were everywhere treated as single units. Perhaps this does not produce the absolute maximum conditional entropy in each section, but if it does not affect conditional entropy too significantly, it may well still be a more plausible hypothesis, due to the greater overall logical consistency of the resulting system.
This is good analysis, well done. 🙂
Me, I’m not at all surprised by the differences between the sections, I’ve been saying for years that treating the whole manuscript as if it had a uniform language is statistically misleading.
And yet some elements (such as the or/ar/ol/al cluster) do seem to thread through the whole system. What can you do, eh?
Like Geoffrey, I’m wondering where results like this leave o + gallows pairs, simply because it’s almost impossible for me to believe that o and a can be letters in a verbose cipher covertext only some of the time. That’s basically what I argued in Curse in 2006, anyway, and the point still stands. 🙂
While this is all good work, I can’t help but feel that we aren’t yet asking the right questions of the text: we’re machinery-rich, but question-poor.
I should add that your analysis has reminded me how suspicious I am about chol and chor in Herbal A. There’s some properly funny business going on there, and we haven’t really put our collective finger on it yet.
The question of looking for a system is certainly a valid one, though I intentionally avoided this. On the one hand, this was a lot of work already as it was, and I had to set some boundaries to maintain my sanity 🙂
But more importantly, I really wanted to see what came out if I just looked at the stats and nothing else, to avoid any kind of confirmation bias. I only “discovered” the list of replacements at the end, when I looked at the “winning” file.
Of course this can be taken one step further by cleaning things up a bit, aiming for more consistency. One thing this approach begs for is to entirely get rid of certain original EVA glyphs. For example, it is possible that [i] isn’t really a thing. And if there is a verbose cipher, [o] may not represent a separate sound either, and so on.
This would lower h0 and h1, but it would also involve looking at the details, while this approach is entirely based on frequent phenomena that have a huge impact on the stats. When certain [o]-combinations are not included, it may simply be because they were not frequent enough in the given section to have a noticeable impact. For this particular exercise, it seemed more important to me to list the operations that had the most effect, rather than create the longest possible list that doesn’t screw up the stats.
Thank you, Koen! What you have done is well thought and clearly presented. You also managed to keep the complexity at a moderate level, so that everything is easy to follow.
Your second chart (the one with the black, red and green dots) is a great visualization of the smoothness of the Currier A-B drift. That phenomenon is of course a major obstacle for any serious approach to Voynichese. As Rene pointed out on the forum, for verbose ciphers one must add the huge search space of possible replacements, and the fact that this or similar methods would unavoidably misclassify some frequent bigrams in the underlying language as verbose tokens. The result is that the verbose-cipher path is immensely complex. If one also wants to treat spaces as possibly non-significant and/or some of the characters as nulls, one easily arrives at something that looks close to hopeless. This could be a good area for AI methods: maybe they will be able to explore the trillions of different combinations for each Voynich section, together with the many candidates for the underlying language.
Thankfully, researchers like Nick and Bowern-Lindemann believe that Voynichese words likely correspond to words in the underlying language: this leaves a line of research that is independent from the intricacies of character entropy. Though of course word-level research is also affected by the dialect-drift and riddled with problems of its own.
While performing this experiment, I was thinking that I was manually doing what an “AI” would do: perform a modification and continue with the best result for the next step. But the sheer number of possibilities made this a lot of work to do by hand. Generating over 10,000 txt files is one thing, but calculating their entropy values and selecting the most desirable ones is something else. And to keep things manageable, I already omitted larger gallows patterns like [oke]. For a true verbose-cipher approach, these should have been included as well (and might have allowed me to reach 3.00).
In the current result, I see a continuum from “reasonable modifications to EVA for entropy calculations” to “verbose cipher”. Merging benches is the best example of the first category. Then come [ain] and [aiin]. These are more tricky already since combinations like [oin, air] also exist. Something like [ok] is in the last category since these are clearly separate characters, so they would be a digraph at best.
But for me, this is not about presenting a definite “verbose cipher” theory, it is more for the kinds of things I learn along the way.
Splendid work, Koen. I absolutely agree with you, by the way. I’ve done a comprehensive symbolic study of the rosettes page, and the concept is absolutely brilliant. It defies belief that anyone responsible for it – the care, the attention to detail disguised as sloppiness, the clever puzzles and puns, the careful avoidance of all triangles, squares and pentagrams while still managing to convey their actual structure – would write or translate the text into the choppy, simplified, one-word transcriptions we have so far seen. This is a purposeful cipher text. I do believe we’ll find a verbose plaintext.
I wanted to mention something about the benched gallows. One symbol that comes up time and again, sometimes almost hidden, is the shepherd’s crook. Davidsch on Voynich.nu did a spectrogram of the first page figure, and there is a very clear crook beside a sort of invisible being with possible clothes and feet. I know why it’s there, but it’s too long to explain here. (It appears again on the last page – as a gnome (n)!) I will say the crook relates to the elongated c of the bench, and possibly the a if you tuck the stem down. Both are connectors within different systems, symbolic of the head and mouth of the ecliptic dragon, to indicate the alphabet never ends. You’ll see that symbol, or those c and i symbols (the i has a bridge, no + signs then), between every 9 and o repetition.
But long story short: has anyone looked at the strange symbol just before, not after, the 9? It is a shepherd’s crook holding up three fours. Our number 13 in the zodiac symbol of the rosettes, I believe. 1 + 12. Head and tail. Draconis, ouroboros, divine nature.
But if you set the crook sideways, it becomes the first c of the benched gallows. We have three half-benched gallows characters coming before the 9 in that one chaotic-looking sign. Pretty sure.
And the other half of the bench is the lightning strike with the dot, I think. This one’s more of a stretch, but it’s very similar to our backwards S, on its back with a 9 rising out of it, in 57v’s second circle. The o rising from the last c is for clockwise, the 9 is for counter. These are in fact our direction ladies in the centre of 57v, one with her “s” nose, the other with her “crook” neck holding up a circle (she’s actually also Saturn holding up a circle, a symbol for the pentagram, so when I talk about grid lines, you know what I’m referring to). S faces counterclockwise, r with o faces clockwise.
It’s pretty evident too, if I’m right about these, that when you hit 9 or c i (the first time round it becomes an “a”) in that sequence, you might need to read the alphabet backwards (to the first T? Or the end bench?) to get a fuller complement of letters.
The alphabet is not in an abc order, by the way. a seems to represent the horizontal axis, the elongated c the vertical. They both possibly stand for the alphabet letter a, and the full bench for ae, the same as ar might be ae. Character groups such as OX mark the beginning and end of a grid line, or 8 (a) r do the same. The five usual endings, 9, r, x, s, d (an 8 but swirled back for that sequence), are the end points in one permutation of my pentagram, and I think they indicate which grid line to follow, because the values of the letters will possibly change depending on the end letter.
Anyway, this is what I’m working on. It takes so much time! When I finally publish my symbolic analysis of both the text symbols and the rosette symbols, I want to have given the text pattern a good shot too. But you know, because of that gnomon pun, I’m very afraid this will turn into a Fibonacci sequence – divine nature – and not just a sundial.
Anybody out there good with geometry? I might need help with angles.
Full support, Koen. I hope my experiments with gridlines help rather than hinder your own analysis, which gives me a lot of food for thought and certainly helps me as I scramble along behind our originator. I hope you’ll give me permission to quote it in my upcoming paper.
The section information is interesting. Is there any trend in your controls as to subject matter or type of writing (fiction vs. nonfiction, biblical vs. secular, time period of the source text, author similarity) that could help explain the differences? Can any of your controls be broken into sections, to see whether this amount of entropy difference is common with a change of subject, or more likely due to something unique to the VM (perhaps differing cipher approaches, etc.)? Thanks for sharing these results and doing the work.
The only trend you see is really the language (to some extent), but more so the writing system: all major outliers in the control are non-Latin. I also have a lot of Greek. The problem is that I don’t know enough about the more exotic writing systems to judge the quality of the texts (i.e. to what extent they reflect something that could have been written in the Middle Ages).
It is an interesting question what the difference would be when comparing sections from the same work about different subjects. My prediction is that any differences will be tiny, but I may be wrong; entropy is tricky to predict. I will test this tomorrow and post the results on the forum 🙂