Digital Rosetta Stone

Published

Jul 10, 2019

Reading time

1 min read

Translating languages that haven't been understood since ancient times typically requires intensive linguistic research. It turns out that neural networks can do the job.

What’s new: Researchers at MIT CSAIL and Google Brain devised an algorithm that deciphers lost languages. It’s not the first, but it achieves state-of-the-art results across a variety of tongues.

Key insight: The new approach identifies cognates, words in different languages that have the same meaning and similar roots. Cognates follow consistent rules, such as:

Related characters appear in similar places in matching cognates.
The vocabulary surrounding cognates is often similar, since they have the same meaning.

How it works: The new method is based on a mapping between cognates in an unknown language and a known language.

The mapping begins at random.
A sequence-to-sequence LSTM network draws its ground truth from the map.
Given a word in a lost language, the LSTM tries to predict the spelling of that word in the known language.
The map is updated (using a mathematical structure, commonly used in operations research, known as a flow network) to minimize the distance between predicted spellings and actual words in the known language.
The network and map bootstrap one another as they converge on a consistent mapping.

Results: The new approach outperforms previous methods on Ugartic, a relative of Hebrew, translating cognates with up to 93.5 percent accuracy. It also sets a new state of the art in translating the proto-Greek script Linear B into Greek, spotting cognates with 84.7 percent accuracy.

Why it matters: Previous translation algorithms for lost languages were designed specifically for a particular language. The new method generalizes to wildly dissimilar tongues and achieves stunningly high accuracy.

Takeaway: Historically, specialists labored for decades to decipher the thoughts encoded in lost languages. Now they have a general-purpose power tool. We look forward to ancient secrets revealed.

Subscribe to The Batch