The secrets of history are locked in troves of handwritten documents. Now a machine learning platform is making them amenable to digital search.
What’s new: Transkribus, a program developed by the University of Innsbruck, is transcribing centuries-old records en masse and making them available to scholars worldwide. The system has transcribed letters between the Brothers Grimm, manuscripts by English philosopher Jeremy Bentham, and Amsterdam’s city archives.
How it works: Since handwriting varies so much, the platform trains a bespoke transcription model for each individual scribe.
- The system’s key component is a handwriting recognition module that correlates a writer’s scrawl with letters of the alphabet. The recognizer stacks LSTM layers atop convolutional layers, Transkribus co-creator Günter Mühlberger told The Batch.
- To train a model, users manually transcribe and upload around 15,000 words in any language (that is, 50 to 100 pages) penned by the target author. After training, they can upload new documents by the same author for automatic transcription.
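To make the training requirement concrete, here is a minimal Python sketch of how someone might track whether the hand-transcribed pages collected for one writer have reached the roughly 15,000-word mark. It is an illustration only, not Transkribus code: the class `ScribeGroundTruth`, the helper `implied_words_per_page`, and the choice to treat 15,000 as a hard threshold and to count words by splitting on whitespace are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List

# Assumption: the article's "around 15,000 words" is treated as a fixed target.
TARGET_WORDS = 15_000


@dataclass
class ScribeGroundTruth:
    """Manual transcriptions gathered for one writer (hypothetical structure)."""

    scribe: str
    pages: List[str] = field(default_factory=list)

    def add_page(self, transcription: str) -> None:
        """Store the hand-made transcription of one page."""
        self.pages.append(transcription)

    def word_count(self) -> int:
        """Count words by splitting on whitespace, a rough proxy."""
        return sum(len(page.split()) for page in self.pages)

    def words_missing(self, target: int = TARGET_WORDS) -> int:
        """Words still needed before the target is reached (never negative)."""
        return max(0, target - self.word_count())

    def ready_for_training(self, target: int = TARGET_WORDS) -> bool:
        """True once the collected transcriptions reach the target."""
        return self.word_count() >= target


def implied_words_per_page(total_words: int, pages: int) -> float:
    """Average words per page implied by a word total and a page count."""
    return total_words / pages
```

With the article's figures, `implied_words_per_page(15_000, 50)` gives 300.0 and `implied_words_per_page(15_000, 100)` gives 150.0, so the quoted range corresponds to pages averaging about 150 to 300 words each.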
Behind the news: Transkribus launched in 2015 as a collaboration between 17 archives, universities, and research groups across Europe. Some 45,000 users have trained 7,700 models so far.
Why it matters: Most optical character recognition approaches perform poorly on the millions of handwriting styles represented in historical archives. By transcribing these documents and making them searchable and sortable, machine learning is helping to deepen our understanding of past people and events.
We’re thinking: This platform could also be a gift to amateur historians with shoeboxes full of their forebears’ diaries, documents, and love letters.