Some thoughts on how to solve the Voynich

Date: 15/02/2018 | Author: Koen

I mostly write about Voynich images, and must often conclude that multiple explanations remain plausible, and that it’s impossible to know the full intentions of figures this unusual and complex without understanding the accompanying text. So while I believe the imagery is of great historical value and its study is useful, I agree with the general sentiment that understanding the text should remain our ultimate goal.

In this post I will discuss three types of attack on the text and their respective flaws and strengths. It won’t contain much of my own research, but rather a synthesis of and commentary on what others have published.

The methods are the following:

Label reading
Computer attacks
Block paradigm

1. Label reading

By “label reading” I mean the attempted reading of single words, whether these are formatted as labels or not. Attempting to read a label with an image or the first word of a paragraph is essentially the same strategy. The most famous example of label reading is probably Stephen Bax’s work, but there are others walking similar paths (see for example Ruby Novacna’s blog).

It’s the technique I used to rely on myself when I first started writing about the VM, but I see that my latest post with an attempted label reading is already one and a half years old (and still I feel quite new to this): Monkey Business. Since then I haven’t paid much attention to label reading, but I plan to revisit my initial attempts with a fresh look soon.

The idea is to start small. You select an image and a word which is supposed to name the image, and take it from there. Let’s say we think the plant pictured below is broccoli. Did broccoli exist at the time and place this image could be made? (It did, apparently). What were names for broccoli in various languages? Which of those names could the label encode?

Advantages

The main advantage of label reading is that we can use the images and labels in the manuscript without the absolute need for external sources. By which I mean that, even if the Voynich is unique and has no relation to surviving manuscripts, we still have an avenue of attack. If the text is related to the images and the manuscript can be read, label reading might eventually get us there.

Disadvantages

Oh, where to begin? Even though I am a fan of the strategy myself, I must admit that there are so many problems associated with it. I think I’ll need a bullet list…

- Uncertainty whether the selected word actually describes the image.
  - Are labels really labels? Or do they add some property of the item? Or something else?
  - When selecting a word from a paragraph, you don’t even know whether you’ve picked the right one.
- Uncertainty about the language. Many people presume Latin, but there are a thousand other possibilities, and even in the large European languages names for plants vary or are loanwords.
- Uncertainty about what the image depicts. If my broccoli is actually a tree, the entire exercise is futile and even misleading, and it can set you on the wrong path for months.
- Requires lots of research to do properly, for just one word.
- And finally, no attempt so far has proved scalable.

The last point deserves some more explanation. Several researchers have attempted to read labels and assign sound values to individual glyphs, but if you use those readings on a whole paragraph the result is gibberish in every single language. In other words, it’s possible to propose a reading of one word, or selected words throughout the manuscript, but this has never allowed any researcher to convincingly read even a single sentence. This is a criticism of others’ attempts at label reading as well as my own past efforts.

However, the apparent lack of scalability is in my opinion not inherent to the method. It may just imply that so far nobody has succeeded in reading the glyphs the correct way. Hence, I don’t believe criticism of label reading itself is justified; it should be possible to correctly identify an image and read its label, and gradually increase your scope. The criticism should be focused on potential misinterpretation of the image and unlikely glyph readings.

2. Computer attacks

I’ll keep this segment short since computer-assisted strategies are not my area of expertise. They must be mentioned, however, since they are often employed. A recent example is the news reports on AI being used to “uncover the mystery of an ancient manuscript”. This mystery-uncovering was debunked by Nick Pelling before it even got up to speed in mainstream media, as such things go.

I don’t have to talk about the advantages of using computers as assistants in manuscript studies, code breaking etc. Some might say advanced statistics and computing power are indispensable, and, who knows, it may one day be an AI who solves the Voynich text.

But as we stand now, there are still many problems. For example, Marco Ponzi recently compared various algorithms to sort out which Voynich glyphs are vowels (based on expected alternation patterns). However, a reaction by JKP highlights what I also believe to be the most common difficulty in this matter: parsing. In order to feed the VM text to a computer, we must transcribe it somehow, and that is where we must inevitably make choices.
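To make the idea of "vowels by alternation" concrete, here is a minimal sketch of one classic algorithm of this kind, Sukhotin's. It assumes only that vowels tend to sit next to consonants rather than next to each other. Whether it matches any of the algorithms Marco actually compared is an assumption on my part, and the function name is made up for illustration.

```python
from collections import defaultdict

def sukhotin_vowels(words):
    """Classify symbols as vowels using Sukhotin's algorithm.

    `words` is an iterable of symbol sequences: plain strings (one
    character per symbol) or lists of glyph tokens both work.
    Returns the vowels in the order they were picked.
    """
    # Symmetric counts of adjacent, different symbols inside each word.
    adj = defaultdict(lambda: defaultdict(int))
    symbols = set()
    for word in words:
        symbols.update(word)
        for a, b in zip(word, word[1:]):
            if a != b:
                adj[a][b] += 1
                adj[b][a] += 1

    # Every symbol starts out as a consonant with its total adjacency count.
    sums = {s: sum(adj[s].values()) for s in symbols}
    consonants = set(symbols)
    vowels = []
    while consonants:
        # Highest remaining sum wins; ties are broken alphabetically.
        best = max(sorted(consonants), key=lambda s: sums[s])
        if sums[best] <= 0:
            break
        vowels.append(best)
        consonants.remove(best)
        # Neighbours of a new vowel become less likely to be vowels.
        for c in consonants:
            sums[c] -= 2 * adj[c][best]
    return vowels
```

On a toy input like `["banana"]` this picks out `a` and nothing else. The catch is that the answer depends entirely on what you hand in as "symbols", which is exactly where the trouble starts.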

And there’s our catch 22: before we can properly enlist the help of computers, we must know how to feed the text to them. But before we can do that, we must understand the text better, which is what we’d like the computers’ help with in the first place.
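A small sketch of how much hangs on that first choice. The words below are common Voynichese words written in the EVA transcription alphabet; the list of multi-character units is purely an illustrative assumption (one possible opinion about which EVA letter groups form a single glyph), and `parse` is a hypothetical helper, not part of any existing tool.

```python
def parse(word, units):
    """Greedy longest-match tokenizer: split `word` into glyph tokens.

    `units` lists multi-character strings to treat as ONE glyph;
    everything else falls back to a single character.
    """
    ordered = sorted((u for u in units if u), key=len, reverse=True)
    tokens, i = [], 0
    while i < len(word):
        for u in ordered:
            if word.startswith(u, i):
                tokens.append(u)
                i += len(u)
                break
        else:
            # No multi-character unit matched at this position.
            tokens.append(word[i])
            i += 1
    return tokens

sample = ["chedy", "shedy", "daiin", "qokeedy"]

# Choice 1: every EVA letter is its own glyph.
fine = [parse(w, []) for w in sample]
# Choice 2 (assumed grouping): "ch", "sh" and "iin" are single glyphs.
coarse = [parse(w, ["ch", "sh", "iin"]) for w in sample]

fine_inventory = {t for w in fine for t in w}
coarse_inventory = {t for w in coarse for t in w}
print(len(fine_inventory), len(coarse_inventory))
```

The same four words yield two different "alphabets" and two different sets of neighbour counts, so any statistic computed downstream (vowel detection included) is really a statistic about the transcription as much as about the manuscript.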

I do believe that perhaps a true AI can help us a lot, but this AI does not exist yet, and making it would first require a ton of human effort.

We, as “interpreters”, stand between the VM and computers, but the VM itself may resist an all too rigid machine approach as well. My intuitive approach to the script two years ago was that it contains plenty of abbreviations which must be expanded based on one’s intuitive understanding of the sentence and familiarity with the vocabulary. Think of the “bench” character which has various possible meanings in normal medieval texts, or generic ending abbreviations. These are easy for trained humans to understand, but can prove to be extremely problematic for machines.

3. Block paradigm

The block paradigm, based on established cryptological practices, was introduced to Voynich studies by Nick Pelling. We recently did an interview with him, where he explains in detail how it works (click the link for his full explanation in the video).

In code breaking generally there are two types of attack: one where you try to attack the system (working backwards). You can also do a forward attack; if you find out what the plaintext is, you can work forwards from there to the ciphertext.

In other words, if you have the plaintext and the corresponding section in the Voynich, you can discover the method that was used to turn one into the other. Nick discusses quire 20 as an example. Since each page consists of a number of star-labelled paragraphs, one could try to find other medieval sources with similar lists.
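As a toy illustration of "working forwards", the sketch below assumes the simplest possible case: the plaintext block and the ciphertext block line up symbol by symbol, and the method was a one-to-one substitution. Nobody claims the Voynich works like this; the function name and the example strings are invented, and the point is only to show how a matched pair either yields a consistent key or refutes the assumed method.

```python
def derive_substitution(plain, cipher):
    """Return a plain->cipher symbol map if the aligned pair is consistent
    with a one-to-one substitution, otherwise None."""
    if len(plain) != len(cipher):
        return None
    forward, backward = {}, {}
    for p, c in zip(plain, cipher):
        # Each plain symbol must always map to the same cipher symbol,
        # and each cipher symbol must come from a single plain symbol.
        if forward.setdefault(p, c) != c or backward.setdefault(c, p) != p:
            return None
    return forward

# Invented strings, for illustration only.
key = derive_substitution("attack", "zyyzxw")
no_key = derive_substitution("attack", "zyxzxw")
```

Here `key` comes back as a mapping (`a` to `z`, `t` to `y`, and so on), while `no_key` is `None` because `t` would need two different cipher symbols. With a real block, a failed check like this would not end the search; it would just tell you to try a different hypothesis about the method.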

For a while I was convinced that I was on the trail of a 15th century recipe book that was going to be the same as quire 20. I pursued it, but in the end I don’t believe that there is a good version of it.

Like all strategies, the block paradigm has advantages and disadvantages.

Advantages

As Nick says, the advantage of the block paradigm is that everybody should agree the method will work. Imagine that something really unusual was done to the Voynich text, something so unique or far-fetched that we will never understand it by ourselves. Even in such a case, the block paradigm will help us out. We’d have the starting text and the end result (the corresponding VM “block”). All that would be left to do is figure out how they went from A to B, and then see if this method applies to the whole manuscript.

Even if the VM were uncrackable in isolation, the block paradigm would still be able to hand us a key. It’s immensely powerful and an almost guaranteed Voynich (block)buster.

Problems

The problem with the block paradigm is that it’s like looking for a needle in a haystack which may not have any needles in it. Countless medieval texts have been lost, and like Nick says in the interview, others are incomplete, scattered, unreliably copied… Yes, the ancients were eager copyists, but they were also pragmatic copyists, especially when “scientific” subjects were involved. Texts would often undergo some form of alteration to suit the user’s needs.

And that’s assuming we even find a block to compare a piece of VM text to. Nick tried using the layout as a guideline, without much success so far. Another route is of course to look for parallels for the imagery, which brings us back to the problems mentioned in the “label reading” method. The block paradigm also assumes that the VM is an altered or encrypted version of an existing text, but there is always the possibility that people were writing in Voynichese directly.

Bottom line: if the contents of the VM text are unique in the record, we will not be able to use the block paradigm method and our block will remain a holy grail in more senses than one (in that it doesn’t exist). In order to do a successful test based on the block paradigm, you have to win the proverbial lottery first, if at all possible.

Conclusion

Brute force computer analysis can help us gain insights into the text, and it has its uses as an auxiliary tool. I have great admiration for people like Marco Ponzi who produce one fascinating graph after the other. But before AI can really solve the Voynich for us (which is our ultimate goal), lots of human effort will be required.

On the opposite end of the scale there is label reading, which might work but demands a decent understanding of the imagery. The same can be said about the block paradigm, which presents the additional condition that a suitable text exist in the first place. Still, I think Pelling is right to imply that we should all have our feelers out for a suitable block, since finding one would make our job a lot easier.

But for now, since I can’t build AIs, I keep studying the imagery. It’s likely our best basis for understanding the text.

20 thoughts on “Some thoughts on how to solve the Voynich”

EMSmith says:

15/02/2018 at 08:38

The block paradigm is a worse version of label reading.

1. Koen says:
  
  15/02/2018 at 09:08
  
  Haha! I must say that a similar thought crossed my mind. One is like a nimble version of the other. Sometimes I try to be diplomatic though.
  
  1. EMSmith says:
    
    15/02/2018 at 21:13
    
    I see these kinds of problems as having three parts: 1) we don’t know the script, 2) we don’t know the language, and 3) we don’t know the content. The goal is to keep two of these parts as static as possible while you work at the third.
    
    Label reading does this by proposing that a single word has a meaning which can be read translingually. Hence why proper names are so successful, and why plant or star names might work. If f13r really does show a banana plant, then you’re going to find a reflex of ‘musa’ or ‘banana’ somewhere on that page.
    
    The block paradigm fails because it attempts to match lots of words to lots of words. Even if we know that a certain text is on a page, we’re still not sure which word is which. Worse, we still don’t know what language we’re working with. I can assure you that the first page of the Bible will say, “in the beginning, god made the heaven and the earth,” but I can’t tell you which of the nearly 700 translations it might be. (Of course, Abimelech will always be the son of Gideon.)
    
    1. Koen says:
      
      15/02/2018 at 23:26
      
      Yes, you’re right. That’s somewhat reassuring actually 🙂 Plant names almost always spread over a large area and sometimes the world’s name for a spice or fruit is limited to two or three different roots.
      
      That won’t work for common herbs though, since those have folk names everywhere. But I’d be surprised if the VM is all common herbs..
      
2. nickpelling says:
  
  16/02/2018 at 22:47
  
  The fact that there are 100+ Voynich theories out there that all start by “reading labels” before quickly descending into utter nuttiness would seem to indicate that label reading is a terrible way to begin.
  
  The block paradigm is a particular form of plaintext attack, a technique which was extensively used during WW2 (and with great success).
  
  1. Koen says:
    
    16/02/2018 at 22:51
    
    If we all start skateboarding we’d probably fall and break something, or at least look stupid. But that doesn’t mean that skateboarding is impossible.
    
  2. nickpelling says:
    
    16/02/2018 at 22:56
    
    Koen: it looks more like jumping naked over a cliff than skateboarding to me. Feel free to shout “geronimo!” as loudly as you like… it won’t help. 😉
    
  3. EMSmith says:
    
    17/02/2018 at 17:44
    
    Didn’t the WWII codebreakers know the target language?
    
  4. nickpelling says:
    
    17/02/2018 at 18:39
    
    Emma: that’s why I describe the block paradigm approach as being a specific kind of plaintext attack. The central point of finding a matching block from a parallel text would be to identify a mapping between the structure of one and the structure of the other. If we can do that, then I believe that we stand a high chance of moving from there to understand the
    
    For what it’s worth, I would tend to agree with Marco Ponzi that much of what we see in the Voynich text may well have no parallel text. However, those few other pages – for example, a block of text accompanying a Sagittarius crossbowman, or a magic circle, or whatever – that came from somewhere else (however rare or hard to find) are what I believe gives us our best chance of decrypting Voynichese.
    
    Hence your criticisms of the block paradigm as being somehow linguistically naive are a bit wide of the mark: the necessary first stage is actually to find a structural match, not a grammatical or linguistic match.
    
  5. nickpelling says:
    
    17/02/2018 at 18:41
    
    […to understand the text itself as a further stage.]
    
    1. Koen says:
      
      17/02/2018 at 20:52
      
      It’s true that judging by the images and the variety of scientific themes we’d expect the VM to go back on a variety of sources. If we can assume the same for the text, this does increase our chances of finding some extant document that’s of relevance.
      
D.N. O'Donovan says:

15/02/2018 at 10:09

Koen,
I try to stay out of things that are not my area, as you know, but a couple of times – really, just twice in the past six years or so, I’ve come to a point where I felt it sure enough to think sharing the information wouldn’t be a waste of others’ time.

Label-reading doesn’t strike me as being necessarily a bad idea, but I’d suggest those trying it think long and hard about the discrepancy you mention – that is, the difference between the language of a name and that informing the remaining text.

Place-names and star-names are often garbled versions of a different language. About plants, as you say, if a person’s in an area where there are foreign plants such as bananas (for example) they need to know what to call them in the local market, and labels might then cover the range of languages covered by the range of the content.

In that context, I thought the difficulties experienced by Poggio Bracciolini very telling, given his position and the range of authority and resources available to him.

The post which I wrote on that subject isn’t among the publicly-available ones at present, but here’s the principal source I used, in case others are interested.

Lluch, R.S., ‘Translators, Interpreters and Cultural Mediators in Late Medieval Western Iberia and Western Islamic Diplomatic relationships’. (Paper delivered to the 10th MRM meeting, 24-27th March, 2009).

Marco Ponzi says:

15/02/2018 at 19:42

Thank you for your kind words, Koen. A good thing of quantitative approaches (whether computer-based or not) is that they produce objective data. For instance, Emma’s results should be of interest also to those who prefer a cryptological approach over a linguistic one. It’s hard work, but it can only move forward.

My recent post you mentioned had the main objective of making Jacques Guy’s 1991 papers known. Guy is referenced in Hulden’s 2017 paper and, in the field of computational linguistics, when a work is still relevant after 26 years it is likely good. Hulden’s algorithm, applied to six different VMS transcriptions (Currier, First Study Group, EVA, CUVA, Neal, Bennett), consistently points to some characters behaving like vowels (EVA:a,e,o,y). These characters were already identified as the most likely vowels by Guy in 1991: I have only added different transcriptions and tested different algorithms. I think that considering different options is a reasonable way to cope with uncertainty, but I am open to suggestions for better approaches. If the alternative is doing nothing until we are 100% sure of the transcription to use, how are we going to make progress?

About the Block Paradigm: as presented here, it seems like a version of the Rosetta stone. I guess that, if the VMS corresponds to a text available in an ordinary language and an ordinary script, then it is not encrypted but translated (like the Rosetta stone was). As Prof.Bowern suggested on the voynich.ninja forum, why should someone want to encrypt Dioscorides or Mattheus Platearius? But translations of a well-known text are typically accompanied by illustrations similar to those of the original (see the various Latin, Greek and Arabic Dioscorides, or illustrated bibles or Ovid in different languages), while the Voynich illustrations are mostly unique.

Basically, I believe the manuscript must be one of these two:
1) The product of an “exotic” culture which possibly had no previous writing system (the idea of Stephen Bax, and still my opinion)
2) An encrypted original work (like the 1420 illustrated encrypted manuscript by Giovanni Fontana, BSB Cod.icon. 242, “Bellicorum instrumentorum liber cum figuris”).
In both cases, finding a “block” elsewhere seems nearly impossible.

I believe that a compromise between a label-based and a block-based approach is what Darren Worley called “structured data sets”, such as the sets of labels in diagrams (including T-O “maps”).
https://stephenbax.net/?p=1476#comment-162576
These sets of labels appear to refer to something homogeneous (zodiac signs, months, continents, elements, etc. depending on the number of elements in the set). Even if the text is completely original, the “cosmological” section seems likely to include lists of these entities, in one form or the other.

1. Koen says:
  
  15/02/2018 at 23:19
  
  Hi Marco
  A bit after posting this I realized I should have distinguished more between quantitative approaches in general and full-blown AI solutions. Obviously I’m always happy to see people produce the kind of insights I can’t, whether computer assisted or otherwise. The point I really wanted to get across is that we’re still very far away from computers ‘solving’ the MS. And maybe that it may not work without a human’s soft touch 🙂
  
  The difference between “encoded” and exotic is something I’ve been thinking a lot about. I maintain that it can’t be a complex code because of its relatively large size.
  
  I think the MS contains traditional material which did *not* reach us by a traditional way of transmission. So along the way it became something new. I have always seen it as a cultural product, by which I mean that nobody tried to hide, obscure or fabricate anything, but lately I’d also consider the possibility that someone obtained this rare material and copied it in a way that made it impossible to consult for others. That still wouldn’t be my first choice though.
  
  But whatever it may be, the result for us will be the same. A simple cipher may be as hard to understand as a “natural” transcription in a script we don’t know. Both can do funky things to the way sounds are represented compared to what we’re used to.
  
  A possibility I rank very low is that the VM is an obscured version of a work that is known to us in its original state. So I agree that the block paradigm is unlikely to help us out; there may be no needles in the haystack.
  
2. nickpelling says:
  
  16/02/2018 at 22:53
  
  Marco: surely the presence of the Sagittarius crossbowman is a direct indication that we may well be looking at something that is (at least in part) a derivative work? In that scenario, the block paradigm would be exactly the right way to proceed – after all, we would probably only need a single block match to solve the mystery.
  
R. Sale says:

16/02/2018 at 00:51

We can’t read a thing in the VMs. Either we attack the language – clueless, or we look to the illustrations for guidance.

Mr. Pelling has suggested his block paradigm as a possible method of investigation. This is only a potential solution paradigm, not an example of a solution paradigm because it has yet to identify the corresponding parts of the comparison.

Why not look at a comparison where we have the corresponding parts. An example of this would be the cosmic comparison of Ms. Velinska. [VMs f68v and Oresme’s BNF Fr. 565] This shows a high degree of structural similarity despite distinct differences in appearance between the two illustrations. One way to interpret the visual discrepancies is that they are intended to disguise the source of the VMs representation.

Another possible solution paradigm that has numerous examples in the VMs Zodiac section is the pairing paradigm. Look at all the visual pairings in the medallions of the first five houses of the Zodiac and in the tub patterns. Does the idea of pairing contribute in any way to finding a solution?

Intentional visual disguise also plays a part in VMs White Aries. If you can match the blue-striped tub patterns with the Fieschi armorial insignia, then you have another example of pairing. Pairing exists in the form of two versions of Stolfi’s markers on White Aries. And an abundance of pairs and triplets exist in the words found in the outer, circular band of text. Here is a segment of text where both sequence and pattern of repetition can be applied. This still requires finding the necessary plain text segment for comparison, but when found, it should be obvious.

This method is using a specific segment of text ostensibly proffered by the author as a guide to further investigation. The method of resolution of this segment may or may not apply elsewhere.

EMSmith says:

17/02/2018 at 21:44

I can’t reply directly to one of Nick’s comments, so I’ll have to add it here.

Nick said:
“Emma: that’s why I describe the block paradigm approach as being a specific kind of plaintext attack. The central point of finding a matching block from a parallel text would be to identify a mapping between the structure of one and the structure of the other. If we can do that, then I believe that we stand a high chance of moving from there to understand the text itself as a further stage.”

I still don’t see how this improves the position from saying “some pages are astronomical diagrams” and seeking matches between star labels and star names. Star names themselves are a ‘block’ of meaningful content. Yet they have the benefit of being stripped of grammatical structure and are often translingual.

The block paradigm adds complexity, not reduces it, by asking that we handle lots of text that can vary wildly from language to language.

1. nickpelling says:
  
  17/02/2018 at 23:18
  
  Emma: it might appear pragmatic to assume that those ‘astro’ pages contain astronomical diagrams containing star labels, yet doing this has achieved precisely nothing in terms of revealing anything at all of what Voynichese is doing.
  
  Yet I demonstrated more than a decade ago that at least some of the layouts of many of the pages were duplicated from an earlier (almost certainly also vellum) manuscript. So there is a reasonably good probability that what we are seeing on the pages of the Voynich Manuscript in some way retains the block structure of its immediately preceding document(s).
  
  To be precise, I freely admit I don’t believe that the exact document(s) from which it was in some way copied still exists. However, I think there is a high chance that at least some of those source pages were in turn copied from other sources: and so if we can identify a block even as small as a paragraph (or perhaps even as little as a line) from those other sources, we stand a chance of making progress.
  
  Hence I’d instead argue that the block paradigm reduces complexity by avoiding the whole thorny issue of having to start from a known language and grammar: a single match to a single source document is all we need to solve the mysteries solidly forward (rather than sketchily backward).
  
D.N. O'Donovan says:

23/02/2018 at 02:51

“And by God’s help I did learn the Chamanian language, and the Uigurian character; which language and character are commonly used throughout all those kingdoms or empires of the Tartars, Persians, Chaldaeans, Medes, and of Cathay.” 1338 AD

D.N. O'Donovan says:

23/02/2018 at 04:35

also – Marco Ponzi (I hope you can read this alright) – the rhetorical question asked on your forum by Prof.Bowern is one which has been uttered so very often, by so very many people, that it is (in the longer perspective) platitudinous and to name him is not only ‘unnecessary’ – to use the favourite disincentive of mutual friends – but if it is intended to suggest Bowern himself expects to be credited with it, potentially embarrassing to him as well as suggesting a surprising lack of knowledge on your part of things said by earlier and other scholars. (But of course you have limited time, and have to be selective: so do we all).

Personally, I find it far less interesting to see the same old observations and rhetorical questions than it would be to see someone seriously ask whether there was ever a time before 1438 when a text about plants would have been enciphered. Was there any class of people prohibited from owning books? Prohibited from practicing medicine?

Come to think of that, I’d like to see someone prove that the Voynich botanical pictures have anything to do with materia medica. It’s another of those false analogies presumed as long ago as 1912 and allowed first to pass, and then to harden, without challenge, investigation or reasoned debate. As so very very many ‘Voynich facts’ prove to be.
