An ambitious company is using deep learning to extract and find associations from all the information on the internet — and it isn’t Google.
What’s new: Diffbot, a Stanford offshoot founded in 2008, built a system that reads web code, parses text, classifies images, and assembles them into what it says is the world’s largest knowledge graph, according to MIT Technology Review.
How it works: Diffbot’s web crawler rebuilds the graph every four to five days, adding roughly 150 million new subject-verb-object associations monthly. The graph encompasses more than 10 billion entities (people, businesses, products, locations, and so on) and a trillion bits of information about those entities.
- The company uses image recognition to classify content into 20 categories such as news, discussion, and images.
- It analyzes any text to find statements made up of a subject, verb, and object and stores their relationships. Its knowledge graph has captured subject-verb-object associations from 98 percent of the internet in nearly 50 languages. The image recognition tool also picks up implicit associations such as that between a product and its price.
- A suite of machine learning techniques, including knowledge fusion (which weighs the trustworthiness of various sources), incorporates new information and overwrites outdated information, Diffbot founder and CEO Mike Tung told The Batch.
- The company’s customers can sift the graph using a query language, point-and-click interface, or geographic map (as shown above). The system automatically corrects misspellings and other inconsistencies.
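
To make the idea of subject-verb-object associations and source weighting concrete, here is a minimal Python sketch. It is not Diffbot's implementation, which the article does not describe. The source names, trust scores, and the `fuse` function are all hypothetical, and summing trust per candidate is just one simple way to resolve conflicting statements.

```python
from collections import defaultdict

# Hypothetical per-source trust scores in [0, 1]; illustrative values only.
SOURCE_TRUST = {
    "official-site.example": 0.9,
    "forum.example": 0.4,
    "blog.example": 0.3,
}


def fuse(observations, default_trust=0.1):
    """Keep one object per (subject, verb) pair.

    Each candidate object is scored by summing the trust of the sources
    that assert it; the highest-scoring candidate wins.
    """
    votes = defaultdict(lambda: defaultdict(float))
    for subject, verb, obj, source in observations:
        votes[(subject, verb)][obj] += SOURCE_TRUST.get(source, default_trust)
    return {
        key: max(candidates, key=candidates.get)
        for key, candidates in votes.items()
    }


# Each observation is (subject, verb, object, source). All data is made up.
observations = [
    ("Acme", "is headquartered in", "Berlin", "official-site.example"),
    ("Acme", "is headquartered in", "Munich", "forum.example"),
    ("Acme", "is headquartered in", "Munich", "blog.example"),
    ("Acme", "sells", "Widget", "blog.example"),
]

graph = fuse(observations)
for (subject, verb), obj in graph.items():
    print(subject, verb, obj)
```

In this toy data, two low-trust sources disagree with one high-trust source, and the high-trust statement is the one retained. A production system would presumably also track timestamps so that newer facts can replace outdated ones.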
Behind the news: More than 400 companies, including Adidas, Nasdaq, and Snapchat, use Diffbot’s technology to understand their customers and competition, and to train their own models. Researchers can apply for free access.
Why it matters: A knowledge graph that encompasses the entire internet could reveal a wealth of obscure connections between people, places, and things. This tool could also be useful for machine learning engineers who aim to train models that have a good grasp of facts.
We’re thinking: Knowledge graphs have proven to be powerful tools for companies such as Google and Microsoft, but they’ve received little attention in academia relative to their practical impact. Tools to automatically build large knowledge graphs will help more teams reap their benefits.