Patterns (Apr 2020)

Sentiment Analysis of Conservation Studies Captures Successes of Species Reintroductions

  • Kyle S. Van Houtan,
  • Tyler Gagne,
  • Clinton N. Jenkins,
  • Lucas Joppa

Journal volume & issue
Vol. 1, no. 1
p. 100005

Abstract

Summary: Learning from the rapidly growing body of scientific articles is constrained by human bandwidth. Existing methods in machine learning have been developed to extract knowledge from human language and may automate this process. Here, we apply sentiment analysis, a type of natural language processing, to facilitate a literature review in reintroduction biology. We analyzed 1,030,558 words from 4,313 scientific abstracts published over four decades using four previously trained lexicon-based models and one recursive neural tensor network model. We find that frequently used terms share both a general and a domain-specific value, with either positive (success, protect, growth) or negative (threaten, loss, risk) sentiment. Sentiment trends suggest that reintroduction studies have become less variable and increasingly successful over time, and they seem to capture known successes and challenges for conservation biology. This approach offers promise for rapidly extracting explicit and latent information from a large corpus of scientific texts.

The Bigger Picture: The volume of peer-reviewed published science is growing rapidly, presenting new opportunities for research on research itself, also known as meta-analysis. Such research operates by (1) acquiring a body of scientific texts from public archives, (2) extracting the desired information from the texts, and (3) performing analyses on the extracted data. While such analyses hold great value, they may require substantial resources and manual effort throughout the project pipeline. Here, we detail how much of the process of scientific meta-analysis may be automated using a type of machine learning known as natural language processing (NLP). We apply this technique to a specific problem in environmental conservation, show how off-the-shelf NLP models perform, and offer recommendations for future improvements to the process. Such investments may be critical for advancing research, and perhaps especially for ensuring that scientific productivity translates into practical progress.
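
As a rough illustration of what lexicon-based sentiment scoring involves, the sketch below scores an abstract by averaging the weights of any lexicon terms it contains, then averages those scores by publication year. It is not the authors' pipeline and does not reproduce any of the four trained lexicons they used: the toy lexicon holds only the six example terms named in the summary, the +1/-1 weights are assumed, and the function names `score_abstract` and `mean_score_by_year` are hypothetical.

```python
import re
from statistics import mean

# Toy lexicon (assumption): only the six example terms from the summary,
# with illustrative +1 / -1 weights. Real lexicons hold thousands of terms.
TOY_LEXICON = {
    "success": 1, "protect": 1, "growth": 1,
    "threaten": -1, "loss": -1, "risk": -1,
}

def score_abstract(text, lexicon=TOY_LEXICON):
    """Return the mean weight of lexicon words found in text, or 0.0 if none match."""
    words = re.findall(r"[a-z]+", text.lower())
    hits = [lexicon[w] for w in words if w in lexicon]
    return mean(hits) if hits else 0.0

def mean_score_by_year(records):
    """Average abstract scores per year; records is an iterable of (year, text) pairs."""
    by_year = {}
    for year, text in records:
        by_year.setdefault(year, []).append(score_abstract(text))
    return {year: mean(scores) for year, scores in sorted(by_year.items())}
```

This sketch matches exact word forms only, so "threatened" or "losses" would be missed; a practical version would need stemming or a fuller lexicon, and would still not capture the sentence-level structure that a neural model such as the recursive neural tensor network is designed to handle.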

Keywords