GNI Corpus Version 1.0: Annotated Full-Text Corpus of Genomics & Informatics to Support Biomedical Information Extraction

So-Yeon Oh; Ji-Hyeon Kim; Seo-Jin Kim; Hee-Jo Nam; Hyun-Seok Park

doi:10.5808/GI.2018.16.3.75

Genomics & Informatics (Sep 2018)

Affiliations

So-Yeon Oh: Bioinformatics Laboratory, ELTEC College of Engineering, Ewha Womans University, Seoul 03760, Korea
Ji-Hyeon Kim: Bioinformatics Laboratory, ELTEC College of Engineering, Ewha Womans University, Seoul 03760, Korea
Seo-Jin Kim: Bioinformatics Laboratory, ELTEC College of Engineering, Ewha Womans University, Seoul 03760, Korea
Hee-Jo Nam: Bioinformatics Laboratory, ELTEC College of Engineering, Ewha Womans University, Seoul 03760, Korea
Hyun-Seok Park: Bioinformatics Laboratory, ELTEC College of Engineering, Ewha Womans University, Seoul 03760, Korea

Journal volume & issue: Vol. 16, no. 3, pp. 75–77

Abstract

Genomics & Informatics (NLM title abbreviation: Genomics Inform) is the official journal of the Korea Genome Organization. A text corpus of this journal, annotated with various levels of linguistic information, would be a valuable resource, because information extraction requires syntactic, semantic, and higher-level natural language processing. In this study, we publish a new corpus, GNI Corpus version 1.0, extracted and annotated from the full texts of Genomics & Informatics using an NLTK (Natural Language Toolkit)-based text mining script. This preliminary version of the corpus can serve as a training and test set for systems that perform a variety of biomedical text mining tasks.
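
The abstract does not reproduce the authors' script, so the following is only a minimal sketch of what an NLTK-based annotation pass might look like. It assumes a regex-based sentence splitter, word tokenizer, and toy part-of-speech rules, chosen here so that no NLTK model downloads are needed. The sample text, the tag rules, and the names `SAMPLE`, `toy_tagger`, and `annotate` are all hypothetical and are not taken from the GNI Corpus itself.

```python
from nltk import FreqDist
from nltk.tag import RegexpTagger
from nltk.tokenize import RegexpTokenizer

# Hypothetical sample text; the real corpus is built from full-text articles.
SAMPLE = (
    "Genomics & Informatics is the official journal of the "
    "Korea Genome Organization. "
    "The corpus supports biomedical information extraction."
)

# Assumption: whitespace after ., ! or ? marks a sentence boundary.
sentence_splitter = RegexpTokenizer(r"(?<=[.!?])\s+", gaps=True)

# Words (with internal hyphens or apostrophes) or single punctuation marks.
word_tokenizer = RegexpTokenizer(r"\w+(?:[-']\w+)*|[^\w\s]")

# Toy tag rules, tried in order; the last rule is a catch-all.
toy_tagger = RegexpTagger([
    (r"^[^\w\s]+$", "SYM"),          # punctuation and symbols
    (r"^\d+(?:\.\d+)?$", "CD"),      # cardinal numbers
    (r"(?i)^(?:the|a|an)$", "DT"),   # articles
    (r"^\w+ing$", "VBG"),            # gerunds (very rough)
    (r".*", "NN"),                   # everything else defaults to noun
])


def annotate(text):
    """Return one list of (token, tag) pairs per sentence."""
    sentences = sentence_splitter.tokenize(text)
    return [toy_tagger.tag(word_tokenizer.tokenize(s)) for s in sentences]


annotated = annotate(SAMPLE)

# Case-insensitive token frequencies across all sentences.
token_freq = FreqDist(tok.lower() for sent in annotated for tok, _ in sent)

if __name__ == "__main__":
    for sent in annotated:
        print(sent)
    print(token_freq.most_common(3))
```

A real pipeline would more likely rely on NLTK's trained sentence tokenizer and part-of-speech tagger, which need separately downloaded model data. The rule-based tagger here only illustrates the shape of sentence-level and token-level annotation layers.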

Published in Genomics & Informatics

ISSN: 1598-866X (Print); 2234-0742 (Online)
Publisher: Korea Genome Organization
Country of publisher: Korea, Republic of
LCC subjects: Science: Biology (General): Genetics
Website: https://genominfo.org/
