The NLP4NLP Corpus (I): 50 Years of Publication, Collaboration and Citation in Speech and Language Processing

Joseph Mariani; Gil Francopoulo; Patrick Paroubek

doi:10.3389/frma.2018.00036

Frontiers in Research Metrics and Analytics (Feb 2019)

Affiliations

Joseph Mariani: LIMSI-CNRS, Université Paris-Saclay, Orsay, France
Gil Francopoulo: Tagmatica, Paris, France
Patrick Paroubek: LIMSI-CNRS, Université Paris-Saclay, Orsay, France

DOI: https://doi.org/10.3389/frma.2018.00036
Journal volume & issue: Vol. 3

Abstract

This paper introduces the NLP4NLP corpus, which contains articles published in 34 major conferences and journals in the field of speech and natural language processing over a period of 50 years (1965–2015). It comprises 65,000 documents by 50,000 authors, includes 325,000 references, and represents ~270 million words. Most of these publications are in English; some are in French, German, or Russian. Some are open access; others were provided by the publishers. To build and analyze this corpus, several tools were used or developed. Many of them use Natural Language Processing methods that were themselves published in the corpus, hence its name. The paper presents the corpus and some findings regarding its content (evolution over time of the number of articles and authors, collaborations between authors, citations between papers and authors), both globally and through comparison between sources. Numerous manual corrections were necessary, which demonstrated the importance of establishing standards for uniquely identifying authors, articles, or publications.

Published in Frontiers in Research Metrics and Analytics

ISSN: 2504-0537 (Online)
Publisher: Frontiers Media S.A.
Country of publisher: Switzerland
LCC subjects: Bibliography. Library science. Information resources
Website: http://journal.frontiersin.org/journal/research-metrics-and-analytics
