Models that summarize documents and answer questions work pretty well with limited source material, but they can slip into incoherence when they draw from a sizeable corpus. Recent work by Facebook AI Research and Université de Lorraine’s computer science research lab addresses this problem.
What’s new: Angela Fan and collaborators developed a model for multi-document summarization and question answering. While most previous efforts combine all input documents into one, the authors improved the state of the art by representing them in a more compact form.
Key insight: The combined length of major source documents pertaining to a given topic overwhelms current language models' ability to extract meaning. A knowledge graph squeezes out irrelevant and redundant information, enabling models to work more effectively.
How it works: The authors’ method involves three steps: constructing a knowledge graph from source documents, encoding the graph as a sequence of words, and extracting information from the sequence.
- The model reads a set of source documents and converts each sentence into a (subject, object, relationship) triplet. It transforms each triplet into two nodes corresponding to the subject and object plus an edge between them that represents their relationship. Nodes and edges also capture the number of times a given subject, object, or relationship appears, reducing redundancy.
- For each word, a word embedding encodes meaning and a position embedding encodes relative position. A graph-weight embedding captures the number of times a node or edge appears and a query-relevance embedding reflects a given source document’s relevance to the latest input query. These embeddings combine to yield the vector representation of the graph.
- The model flattens the graph by concatenating triplets.
- At this point, the input is much smaller but still large. A modified attention mechanism finds the most salient parts of the graph and focuses there while generating output text.
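To make the graph-building and flattening steps concrete, here is a minimal sketch in Python. It is not the authors' code: it assumes the triplets have already been extracted from the text, and the function names (`build_graph`, `linearize`) and the subject-relation-object token order are hypothetical choices made for illustration. It merges repeated triplets while counting occurrences, then concatenates the triplets into one word sequence, with a parallel list of counts of the kind a graph-weight embedding could consume.

```python
from collections import Counter


def build_graph(triples):
    """Merge duplicate triplets, counting how often each node and edge occurs.

    `triples` is an iterable of (subject, relation, object) strings.
    Triplet extraction itself is assumed to have happened upstream.
    """
    node_counts = Counter()
    edge_counts = Counter()
    for subj, rel, obj in triples:
        node_counts[subj] += 1
        node_counts[obj] += 1
        edge_counts[(subj, rel, obj)] += 1
    return node_counts, edge_counts


def linearize(node_counts, edge_counts):
    """Flatten the graph by concatenating its triplets into one word sequence.

    Returns the words plus, for each word, the occurrence count of the node
    or edge it came from (an illustrative stand-in for graph weights).
    """
    tokens, weights = [], []
    for (subj, rel, obj), edge_n in edge_counts.items():
        parts = ((subj, node_counts[subj]), (rel, edge_n), (obj, node_counts[obj]))
        for text, count in parts:
            for word in text.split():
                tokens.append(word)
                weights.append(count)
    return tokens, weights


triples = [
    ("einstein", "developed", "relativity"),
    ("einstein", "developed", "relativity"),  # repeated fact, stored once
    ("einstein", "born in", "ulm"),
]
nodes, edges = build_graph(triples)
tokens, weights = linearize(nodes, edges)
print(tokens)   # ['einstein', 'developed', 'relativity', 'einstein', 'born', 'in', 'ulm']
print(weights)  # [3, 2, 2, 3, 1, 1, 1]
```

The repeated fact appears once in the flattened sequence, but its count survives as a weight. That is the sense in which the graph removes redundancy without discarding how often something was said.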
Results: The authors tested their model on a question answering task based on the dataset called Explain Like I'm Five (ELI5). This dataset contains 270,000 question-answer pairs along with source documents (the top 100 web sources from the CommonCrawl corpus for each question). The graph approach edged out the earlier state of the art on F1 for ROUGE-1 (30 percent versus 28.9 percent). They also compared performance on the WikiSum dataset for multi-document summarization using an article’s title as the input query, the footnotes as source documents, and the first paragraph as the target summary. Here the graph approach fell short of the previous state of the art on ROUGE-L (36.5 percent versus 38.8 percent), but the comparison wasn't apples-to-apples: The previous research supplemented the corpus with a web search, while the new work used only CommonCrawl.
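For readers unfamiliar with the metric, ROUGE-1 F1 measures unigram overlap between a generated text and a reference. The sketch below is a simplified illustration, assuming lowercased whitespace tokenization; standard ROUGE tooling applies its own tokenization and optional stemming, so its scores can differ. The function name `rouge1_f1` is hypothetical.

```python
from collections import Counter


def rouge1_f1(candidate, reference):
    """Simplified ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each word counts at most as often as it appears in both.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Precision is 3/3 and recall is 3/6, so F1 is about 0.667.
print(round(rouge1_f1("the cat sat", "the cat sat on the mat"), 3))
```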
Why it matters: This research shows that natural language generation based on very large bodies of input text can work well. It also shows that source documents don’t need to be composed of well-formed sentences. New ways of representing source documents may well lead to better language generation.
We’re thinking: Many search engines produce summaries or answer questions by choosing the most relevant document. The ability to draw on any number of documents could enable such models to deliver a far wider diversity of information, leading to better research tools and ultimately a better-informed public.