Translating languages that haven't been understood since ancient times typically requires intensive linguistic research. It turns out that neural networks can do the job.
What’s new: Researchers at MIT CSAIL and Google Brain devised an algorithm that deciphers lost languages. It’s not the first, but it achieves state-of-the-art results across a variety of tongues.
Key insight: The new approach identifies cognates, words in different languages that have the same meaning and similar roots. Cognates follow consistent rules, such as:
- Related characters appear in similar places in matching cognates.
- The vocabulary surrounding cognates is often similar, since they have the same meaning.
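The first rule is easy to see in miniature. A toy sketch (the word pairs are our own illustrative examples, not data from the paper) that counts how often two words share a character at the same position:

```python
def positional_overlap(a, b):
    """Fraction of positions where two words share the same character."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))

# Cognates (Latin "octo" vs. Spanish "ocho") overlap heavily...
print(positional_overlap("octo", "ocho"))  # 0.75
# ...while unrelated words of similar length barely overlap.
print(positional_overlap("octo", "luna"))  # 0.0
```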
How it works: The method learns a mapping between words in the lost language and their cognates in a known language.
- The mapping begins at random.
- A sequence-to-sequence LSTM network treats the current mapping as its training ground truth.
- Given a word in a lost language, the LSTM tries to predict the spelling of that word in the known language.
- The map is updated to minimize the distance between predicted spellings and actual words in the known language, using a minimum-cost flow network, a mathematical structure common in operations research.
- The network and map bootstrap one another as they converge on a consistent mapping.
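One round of that loop can be sketched in miniature. In this toy (all words are invented Latin/Spanish-flavored examples), plain edit distance stands in for the LSTM's prediction cost, and a greedy one-to-one assignment stands in for the minimum-cost flow step:

```python
def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute
        prev = curr
    return prev[-1]

def match_cognates(lost_words, known_words):
    """Pair each lost word with a distinct known word, cheapest pairs
    first -- a greedy stand-in for the paper's min-cost flow matching."""
    pairs = sorted(
        ((edit_distance(l, k), l, k) for l in lost_words for k in known_words),
        key=lambda t: t[0],
    )
    mapping, used = {}, set()
    for cost, l, k in pairs:
        if l not in mapping and k not in used:
            mapping[l] = k
            used.add(k)
    return mapping

lost = ["noche", "ocho", "padre", "madre"]    # stand-in "lost" language
known = ["pater", "octo", "mater", "noctem"]  # stand-in known relative
# Recovers noche-noctem, ocho-octo, padre-pater, madre-mater.
print(match_cognates(lost, known))
```

The real system then retrains the network on these matched pairs and repeats, so spelling model and mapping sharpen each other.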
Results: The new approach outperforms previous methods on Ugaritic, a lost language related to Hebrew, translating cognates with up to 93.5 percent accuracy. It also sets a new state of the art on Linear B, an ancient script that encodes an early form of Greek, matching its words to their Greek counterparts with 84.7 percent accuracy.
Why it matters: Previous translation algorithms for lost languages were designed specifically for a particular language. The new method generalizes to wildly dissimilar tongues and achieves stunningly high accuracy.
Takeaway: Historically, specialists labored for decades to decipher the thoughts encoded in lost languages. Now they have a general-purpose power tool. We look forward to ancient secrets revealed.