Native Language Identification Across Text Types: How Special Are Scientists?

Sabrina Stehwien; Sebastian Padó

doi:10.4000/ijcol.348

IJCoL (Jun 2016)

DOI: https://doi.org/10.4000/ijcol.348
Journal volume & issue: Vol. 2, no. 1

Abstract

Native Language Identification (NLI) is the task of recognizing an author's native language from text written in another language. In this paper, we investigate how well NLI models generalize across learner corpora, and from learner corpora to a new text type, namely scientific articles. Our main results are: (a) the science corpus is not harder to model than some learner corpora; (b) it benefits less than learner corpora from corpus combination via domain adaptation; (c) this pattern can be explained by the respective models relying on language transfer and topic indicators to different extents.

Published in IJCoL

ISSN: 2499-4553 (Online)
Publisher: Accademia University Press
Country of publisher: Italy
LCC subjects: Social Sciences; Language and Literature: Philology. Linguistics: Computational linguistics. Natural language processing
Website: https://journals.openedition.org/ijcol
