Mathematics (Jan 2023)

Column-Type Prediction for Web Tables Powered by Knowledge Base and Text

  • Junyi Wu,
  • Chen Ye,
  • Haoshi Zhi,
  • Shihao Jiang

DOI
https://doi.org/10.3390/math11030560
Journal volume & issue
Vol. 11, no. 3
p. 560

Abstract

Web tables are essential for applications such as data analysis. However, they are often incomplete and lack critical information, which makes their content difficult to interpret. Automatically predicting column types for tables without metadata is therefore important for handling the wide variety of tables found on the Internet. This paper proposes CNN-Text, a method for this task that fuses CNN prediction with voting processes. We present data augmentation and synthetic column generation approaches to improve the CNN’s performance, and we use extracted text to obtain better predictions. Experimental results show that CNN-Text outperforms the baseline methods, demonstrating that it is well suited to table column-type prediction.

Keywords