EDIT 19 October 2020: part of this post is based on faulty values. The points made remain the same, but do not rely on the exact values for further calculations.

In this series of posts, I have been tracking the effects of various manipulations on the character entropy of a given transliteration of Voynichese. This started out with practical intentions, but moved towards a more experimental approach.

Entropy Hunting and Entropy Hunting II were mainly intended to chart the effects of the popular transliteration system EVA on character entropy. For example, EVA represents the “bench” glyph as [ch], while in the manuscript it appears as one character. Since inexperienced researchers tend to assume that EVA = Voynichese, I felt like it was necessary to show the impact of these choices.

The third post, Entropy Hunting III: most frequent n-grams, is similar, but with a different purpose. Here, I attempted to merge glyphs in such a way that h2 became as close to “normal” as possible, while still keeping h1 in check. While in the first posts I tried to select the most reasonable mergers, in the third post I simply tried to optimize the statistics. So the question shifted from “what are the caveats when using EVA?” to “what if we treat Voynichese like a verbose cipher?”.

Having moved into full-fledged cipher territory, the next step is to think about nulls (“empty” characters that are inserted to fool the reader or to pad out lines, or other characters without a sound value). My intention is not to detect which characters are most likely to be nulls – for this I refer to Nick Pelling, who seems to suspect line-initial EVA [s]. Rather, I simply wish to present the effect of various characters being nulls on entropy.

There is not necessarily a consistent correlation between nulls and entropy – the effect depends on how the supposed nulls are distributed. For example, if I insert [0] in a text in completely random and varied places, the conditional character entropy (h2) will rise because the behavior of this new character is unpredictable. On the other hand, if I insert [0] at the beginning of every word or only after [a], predictability will increase and h2 will drop.

Let me illustrate this with an example. I took a short chapter of some 500 words from Pliny’s Natural History in Latin. Then I selected the frequency of the vowel [o] as reference: 150. For one file, I inserted 150 nulls [0] at random, making sure to position them after various letters. For the other, I inserted the numeral [0] after every [o] vowel by simply find-and-replacing [o] with [o0]. These represent two extremes: “random” nulls versus structured nulls. I am still using Nablator’s Java code for entropy calculations.
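To make the procedure concrete, here is a minimal Python sketch of the idea. This is not Nablator’s Java code (which is what I actually used); the h2 calculation and the two insertion helpers are simplified illustrations of my own:

```python
import math
import random

def h2(text):
    """Conditional character entropy H(next char | current char), in bits."""
    unigrams, bigrams = {}, {}
    for a, b in zip(text, text[1:]):
        unigrams[a] = unigrams.get(a, 0) + 1
        bigrams[(a, b)] = bigrams.get((a, b), 0) + 1
    total = sum(bigrams.values())
    return -sum((n / total) * math.log2(n / unigrams[a])
                for (a, b), n in bigrams.items())

def insert_random_nulls(text, null="0", count=150):
    """Scatter `count` nulls at random positions in the text."""
    chars = list(text)
    for pos in sorted(random.sample(range(len(chars)), count), reverse=True):
        chars.insert(pos, null)
    return "".join(chars)

def insert_structured_nulls(text, null="0", after="o"):
    """Insert a null after every occurrence of one chosen character."""
    return text.replace(after, after + null)
```

Comparing h2(text), h2(insert_random_nulls(text)) and h2(insert_structured_nulls(text)) on the same sample shows the two extremes described above.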

As expected, inserting 150 nulls at random raises h2 by a considerable 0.13, while consistently inserting a null after each of the 150 [o]’s lowers h2 by a similar amount.

The slippery slope of Voynich cipher thinking

Manipulating a Voynichese transliteration in order to optimize one metric may have the undesired effect of messing up others. An obvious issue with nulls is that eliminating them would further decrease Voynichese’s character inventory (h0), which is already low, especially when we consider only frequent characters. (Of course, this is only an issue when a character is considered to be a null in all contexts, removing it entirely from the set of meaningful characters.)

But if one were to go “full cipher”, then this issue might in fact solve another one. In the previous post in this series, I introduced new characters by merging frequent glyph pairs, thereby increasing h0. The effect was that characters like [o] would end up modifying the following character. So if one were to combine modifiers and nulls, then h0 might balance out.

[Side note: I don’t know what the expected h0 would be for a medieval manuscript, counting abbreviation symbols, ligatures and other positional glyph variations. This is why I usually don’t even mention the stat.]

However, this leaves us with yet another problem: word length. As Marco Ponzi wonderfully explained, my glyph merging would leave Voynichese with an average word length of 3.5, which is similar to Vietnamese and way too short for a European language. Now remove some potential nulls, and you are left with something too short for any language. This, as usual, leads back to spaces, and to the question of whether Voynichese words are really words at all. It is one of the reasons why I prefer to see this series of posts as experiments exploring statistics rather than potential solutions.

Pliny nulls

To get a better feel for the effect of removing characters, let us first return to Pliny’s Latin. I made a number of files where each time one character was eliminated. So Latin without [i], without [a] and so on. This allowed me to graph the impact on h2 of each removal compared to the base file.

In the above graph, we see that removing [c] decreases h2 for Latin. Apparently [c] is an unpredictable character, and removing it makes the text more predictable overall. This appears to be the case for most characters. On the other end of the spectrum, removing [q], [e], [u] or [i] causes entropy to increase. This makes sense: they appear in highly predictable combinations like [qu] and [iu], so removing them takes away predictable transitions and leaves a less predictable text behind.

What I wondered, though, is whether I should correct for frequency. Imagine a rare glyph like [y] in Latin. Removing it will barely have any effect, even though it may be exceptionally predictable or erratic. This is why I made the next graph, dividing the difference in h2 by the glyph’s frequency in the sample. What it shows is the increase or decrease in h2 per removed token.
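Reusing the h2() sketch from above, the removal experiment is little more than a loop over candidate glyphs; again, this is an illustration rather than the code I actually ran:

```python
def null_removal_scan(text, glyphs):
    """For each candidate null, strip all its tokens and report the change
    in h2, both as a raw difference and per removed token."""
    base = h2(text)
    rows = []
    for g in glyphs:
        count = text.count(g)
        if count == 0:
            continue
        diff = h2(text.replace(g, "")) - base
        rows.append((g, count, diff, diff / count))
    # glyphs whose removal raises h2 the most come first
    return sorted(rows, key=lambda r: r[2], reverse=True)
```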

This appears to work as expected, somewhat rearranging the bars and pushing glyphs with special behavior to the edges. Note how [q], probably the most predictable character, is now correctly indicated as an outlier.

Voynich nulls

How do Voynich entropy values react if we remove each glyph as if it were a null? To test this, I started with the TT transliteration and applied only two modifications: unstack benched gallows into a bench followed by a gallows, and then merge the benches [ch] and [sh] into [1] and [2] respectively. The situation of [a] and [i] is more complex, so I left those unaltered.
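In code, these two preprocessing steps could look roughly as follows, assuming the transliteration writes benched gallows as the usual EVA sequences [ckh], [cth], [cph] and [cfh] (the placeholders [1] and [2] are arbitrary):

```python
def preprocess(eva):
    """Unstack benched gallows, then merge the benches into single characters."""
    for gallows in "ktpf":
        # e.g. [ckh] becomes bench + gallows: [ch][k]
        eva = eva.replace("c" + gallows + "h", "ch" + gallows)
    # merge the two benches into one character each
    return eva.replace("ch", "1").replace("sh", "2")
```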

(Chart: decrease or increase in h2 after removing each glyph)
(Chart: decrease or increase in h2 per removed token)

Overall the charts for Latin and Voynichese look similar, with some glyphs lowering h2 upon removal and others raising it. However, for Latin most glyphs reduce h2, while the balance for Voynichese is more even. Moreover, the increases in h2 are much greater for Voynichese.

Below is a table with the numbers for anyone who is interested. “diff” is the change in h2 relative to the base file, and “diff/count” is that change per removed token.

glyph   count   h0     h1     h2     h2/h1   diff     diff/count
base            4.39   3.80   2.11   0.55
e       7384    4.32   3.71   2.12   0.57    0.014    0.0000019
o       6691    4.32   3.70   2.12   0.57    0.017    0.0000025
a       4799    4.32   3.69   2.22   0.60    0.117    0.0000244
y       4790    4.32   3.69   2.18   0.59    0.075    0.0000156
i       4033    4.32   3.69   2.16   0.58    0.049    0.0000121
d       3632    4.32   3.69   2.08   0.56    -0.029   -0.0000080
ch      3612    4.32   3.69   2.07   0.56    -0.041   -0.0000115
k       3446    4.32   3.69   2.12   0.58    0.018    0.0000051
l       2970    4.32   3.70   2.08   0.56    -0.024   -0.0000081
n       2162    4.32   3.71   2.18   0.59    0.074    0.0000341
r       2107    4.32   3.71   2.13   0.57    0.022    0.0000103
q       1880    4.32   3.71   2.10   0.57    -0.008   -0.0000044
t       1837    4.32   3.72   2.10   0.57    -0.004   -0.0000021
sh      1116    4.32   3.74   2.09   0.56    -0.014   -0.0000123
p       581     4.32   3.76   2.08   0.55    -0.026   -0.0000447
s       526     4.32   3.76   2.09   0.55    -0.022   -0.0000418
m       326     4.32   3.77   2.11   0.56    0.006    0.0000187
f       108     4.32   3.79   2.10   0.55    -0.009   -0.0000867

Combination

Over at the Voynich.ninja forum, Geoffrey Caveney wondered what would happen to entropy if first glyphs were combined as described in the previous post, and then [y] removed as a suspected null. As a reminder, what I did was replace the n-grams [ch, sh, ain, aiin, aiiin, air, am, ar, al, or, qok, qot, qo, ol, ot, ok, od] by a single new character. I must add again that this was already slightly more experimental than what I would usually be comfortable with.
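As a sketch, the merging can be done with ordered replacements, longest n-grams first, so that for example [aiin] is not eaten by [ain] and [qok] is not eaten by [qo]. The placeholder characters and the exact ordering below are my choices for illustration, not necessarily identical to what I did at the time:

```python
# n-grams from the previous post, longest (most specific) first
MERGES = ["aiiin", "aiin", "ain", "air", "qok", "qot", "qo",
          "ch", "sh", "am", "ar", "al", "or", "ol", "ot", "ok", "od"]

def merge_ngrams(text, ngrams=MERGES):
    """Replace each n-gram with a single new placeholder character."""
    for i, ngram in enumerate(ngrams):
        text = text.replace(ngram, chr(0xE000 + i))  # private-use codepoints
    return text

def merge_then_drop_null(text, null="y"):
    """Apply the merges, remove a suspected null, and return the new h2."""
    return h2(merge_ngrams(text).replace(null, ""))
```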

If on top of that we assume that one of the remaining characters might be a null, the result is quite spectacular. I can best show this by adding them to the scatter plot I used in the previous post.

The grey dots represent medieval texts in various European languages. There is a thick cluster around h2 = 3.3, but below that, between 3.0 and 3.3, there are still plenty of texts as well. The VM dots all fall within this second range. The black dot is the “original merge” (OM) version, where I tried to increase entropy by merging common n-grams. Red is OM with [d] removed; this has no effect on h2. Removing only [e] raises h2 to a respectable 3.11.

Removing only [y] performs best, raising h2 to 3.20. If you look only at h2 and h1, the VM is now a perfectly normal text, but of course there are plenty of other stats to consider, word length being a major one. Still, I must say Geoffrey was correct in assuming [y] would have the largest impact upon removal.

What this means, if anything, is another question. Entropy is not the best way to detect nulls, unless you know for sure that the nulls were applied in a very consistent way. These numbers should be treated with caution.