PaperNet: A Dataset and Benchmark for Fine-Grained Paper Classification

Tan Yue; Yong Li; Xuzhao Shi; Jiedong Qin; Zijiao Fan; Zonghai Hu

doi:10.3390/app12094554

Applied Sciences (Apr 2022)

Affiliations

Tan Yue: School of Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China
Yong Li: School of Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China
Xuzhao Shi: School of Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China
Jiedong Qin: School of Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China
Zijiao Fan: School of Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China
Zonghai Hu: School of Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China

DOI: https://doi.org/10.3390/app12094554
Journal volume & issue: Vol. 12, no. 9, p. 4554

Abstract

Document classification is an important area of Natural Language Processing (NLP). Because scientific papers are being published in huge numbers and at an accelerating rate, intelligent paper classification, especially fine-grained classification, is valuable to researchers. However, a public scientific paper dataset for fine-grained classification is still lacking, so existing document classification methods have not been put to the test on this task. To fill this gap, we designed and collected the PaperNet-Dataset, which consists of multi-modal data (texts and figures). PaperNet version 1.0 contains hierarchical categories of papers in the fields of computer vision (CV) and NLP: 2 coarse-grained and 20 fine-grained categories (7 in CV and 13 in NLP). We ran current mainstream models on the PaperNet-Dataset, along with a multi-modal method that we propose. Interestingly, none of these methods reaches an accuracy of 80% in fine-grained classification, which leaves plenty of room for improvement. We hope that the PaperNet-Dataset will inspire more work in this challenging area.

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
