Opening a conversation on responsible environmental data science in the age of large language models

Ruth Y. Oliver; Melissa Chapman; Nathan Emery; Lauren Gillespie; Natasha Gownaris; Sophia Leiker; Anna C. Nisi; David Ayers; Ian Breckheimer; Hannah Blondin; Ava Hoffman; Camille M.L.S. Pagniello; Megan Raisle; Naupaka Zimmerman

doi:10.1017/eds.2024.12

Environmental Data Science (Jan 2024)

Affiliations

Ruth Y. Oliver: Bren School of Environmental Science and Management, University of California Santa Barbara, Santa Barbara, CA, USA
Melissa Chapman: National Center for Ecological Analysis and Synthesis, University of California Santa Barbara, Santa Barbara, CA, USA
Nathan Emery: Center for Innovative Teaching, Research, and Learning, University of California Santa Barbara, Santa Barbara, CA, USA
Lauren Gillespie: Department of Computer Science, Stanford University, Palo Alto, CA, USA
Natasha Gownaris: Department of Environmental Studies, Gettysburg College, Gettysburg, PA, USA
Sophia Leiker: Bren School of Environmental Science and Management, University of California Santa Barbara, Santa Barbara, CA, USA
Anna C. Nisi: Department of Biology, Center for Ecosystem Sentinels, University of Washington, Seattle, WA, USA
David Ayers: Wildlife, Fish and Conservation Biology Department, University of California Davis, Davis, CA, USA
Ian Breckheimer: Rocky Mountain Biological Laboratory, Crested Butte, CO, USA
Hannah Blondin: Cooperative Institute for Marine and Atmospheric Studies (CIMAS), University of Miami, Miami, FL, USA
Ava Hoffman: Data Science Lab, Fred Hutchinson Cancer Center, Seattle, WA, USA
Camille M.L.S. Pagniello: Hawai’i Institute of Marine Biology, University of Hawai’i at Mānoa, Kaneohe, HI, USA
Megan Raisle
Naupaka Zimmerman: Department of Biology, University of San Francisco, San Francisco, CA, USA

Journal volume & issue: Vol. 3

Abstract

The general public and scientific community alike are abuzz over the release of ChatGPT and GPT-4. Among many concerns being raised about the emergence and widespread use of tools based on large language models (LLMs) is the potential for them to propagate biases and inequities. We hope to open a conversation within the environmental data science community to encourage the circumspect and responsible use of LLMs. Here, we pose a series of questions aimed at fostering discussion and initiating a larger dialogue. To improve literacy on these tools, we provide background information on the LLMs that underpin tools like ChatGPT. We identify key areas in research and teaching in environmental data science where these tools may be applied, and discuss limitations to their use and points of concern. We also discuss ethical considerations surrounding the use of LLMs to ensure that as environmental data scientists, researchers, and instructors, we can make well-considered and informed choices about engagement with these tools. Our goal is to spark forward-looking discussion and research on how as a community we can responsibly integrate generative AI technologies into our work.

Published in Environmental Data Science

ISSN: 2634-4602 (Online)
Publisher: Cambridge University Press
Country of publisher: United Kingdom
LCC subjects: Geography. Anthropology. Recreation: Environmental sciences; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.cambridge.org/core/journals/environmental-data-science
