Showing posts with the label Document Understanding

2023-01-08: A Summary of "DocFormer: End to End Transformer for Document Understanding" (Appalaraju et al. 2021 ICCV)

Our previous blog post described the importance of document understanding for layout analysis. While layout analysis is important, many downstream tasks, including document classification, entity extraction, and sequence labeling, often require visual document understanding (VDU). VDU requires an understanding of both the structure and the layout of a document. Some VDU approaches rely only on textual features, while others combine textual and spatial features. The best results are obtained by fusing textual, spatial, and visual features. Appalaraju et al. proposed "DocFormer: End-to-End Transformer for Document Understanding" at the IEEE/CVF International Conference on Computer Vision (ICCV) in 2021. It incorporates a novel multimodal self-attention with shared embeddings in an encoder-only transformer architecture. DocFormer achieved state-of-the-art results on four different downstream VDU tasks. The contributions of this paper are: DocFormer has ...

2022-07-11: A Summary of "Document Domain Randomization for Deep Learning Document Layout Analysis" (Ling et al. 2021 ICDAR)

Document Understanding is the task of automatically parsing and ingesting the content of documents into a system using artificial intelligence methods to accomplish downstream challenges, such as information retrieval, Q&A, and textual and non-textual analysis. Document Understanding is increasingly important for processing digital documents at scale. Many documents are visually rich, meaning layout and visual information are critical to understanding document content. In the scholarly domain, layout analysis is challenging because of the variety of document templates (e.g., single- or double-column papers), which contain title pages, section headings, tables, figures, algorithms, equations, references, and so on. Building an intelligent system for such downstream tasks requires annotating a large number of documents, which is laborious. Besides, developing training data with an equal number of samples for each template is challenging and may not be attainable at a large scale. Thus, we often see im...