Proceedings of the Institute for System Programming of the RAS (Oct 2018)
Extracting Objects and Their Attributes from Tables in Text Documents
Abstract
Extracting information from tables is an important and rather complex part of information retrieval. For the task of extracting objects from HTML tables, we introduce methods for determining table orientation and for processing aggregating objects (such as Total) and scattered headers (super row labels, subheaders).
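
To make the terms in the abstract concrete, the following Python sketch shows one simple way that aggregating rows and subheader rows could be handled when turning a row-oriented table into objects. It is an illustration under stated assumptions, not the method of the paper: the function name `extract_objects`, the label list `AGGREGATE_LABELS`, and the rule "a row with only its first cell filled is a subheader" are all hypothetical choices made for this example.

```python
# Hypothetical list of labels that mark aggregating rows (an assumption).
AGGREGATE_LABELS = {"total", "subtotal", "sum", "average"}


def extract_objects(rows):
    """Turn a row-oriented table into a list of attribute dictionaries.

    rows[0] is assumed to be the header row; every later row is a list
    of cell strings with the same length as the header.
    """
    header, *body = rows
    objects = []
    group = None  # most recent subheader (super row label), if any
    for row in body:
        label = row[0].strip()
        rest = [cell.strip() for cell in row[1:]]
        # Aggregating rows such as "Total" do not describe an object.
        if label.lower() in AGGREGATE_LABELS:
            continue
        # Assumed heuristic: only the first cell is filled -> subheader.
        if label and not any(rest):
            group = label
            continue
        obj = dict(zip(header, [label] + rest))
        obj["group"] = group  # attach the scattered header as an attribute
        objects.append(obj)
    return objects


table = [
    ["Item", "Price"],
    ["Fruit", ""],
    ["Apple", "3"],
    ["Pear", "4"],
    ["Total", "7"],
]
result = extract_objects(table)
```

In this sketch the `Fruit` row becomes a `group` attribute on the rows that follow it, and the `Total` row is dropped rather than reported as an object. A column-oriented table would first have to be transposed, which is where determining table orientation comes in.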