Journal of Physics: Complexity (Jan 2024)

Measuring an artificial intelligence language model’s trust in humans using machine incentives

  • Tim Johnson
  • Nick Obradovich

DOI
https://doi.org/10.1088/2632-072X/ad1c69
Journal volume & issue
Vol. 5, no. 1
p. 015003

Abstract

Will advanced artificial intelligence (AI) language models exhibit trust toward humans? Gauging an AI model’s trust in humans is challenging because—absent costs for dishonesty—models might respond falsely about trusting humans. Accordingly, we devise a method for incentivizing machine decisions without altering an AI model’s underlying algorithms or goal orientation, and we employ the method in trust games between an AI model from OpenAI and a human experimenter (namely, author TJ). We find that the AI model exhibits behavior consistent with trust in humans at higher rates when facing actual incentives than when making hypothetical decisions—a finding that is robust to prompt phrasing and the method of game play. Furthermore, trust decisions appear unrelated to the magnitude of stakes, and additional experiments indicate that they do not reflect a non-social preference for uncertainty.
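
For context on the protocol the abstract describes: the trust game is a standard two-player exchange (Berg, Dickhaut and McCabe 1995) in which a trustor sends some share of an endowment, the transfer is multiplied in transit, and the trustee chooses how much to return. The Python sketch below illustrates only that canonical payoff structure; the endowment of 10, the multiplier of 3, and the function name trust_game_payoffs are expository assumptions, not the authors' actual stakes, prompts, or code.

# A minimal, illustrative sketch of the canonical trust game payoff
# structure (Berg, Dickhaut and McCabe 1995). The endowment, multiplier,
# and names below are expository assumptions, not the paper's materials.

def trust_game_payoffs(sent: float, returned: float,
                       endowment: float = 10.0,
                       multiplier: float = 3.0) -> tuple[float, float]:
    """Return (trustor_payoff, trustee_payoff) for one round.

    The trustor sends `sent` out of `endowment`; the transfer is
    multiplied by `multiplier` on arrival; the trustee then returns
    `returned` out of that multiplied amount.
    """
    if not 0.0 <= sent <= endowment:
        raise ValueError("sent must lie within [0, endowment]")
    received = sent * multiplier
    if not 0.0 <= returned <= received:
        raise ValueError("returned must lie within [0, sent * multiplier]")
    trustor_payoff = endowment - sent + returned
    trustee_payoff = received - returned
    return trustor_payoff, trustee_payoff

# Example: full trust with an even split back. The trustor sends the
# whole endowment (10 -> 30 after multiplication) and the trustee
# returns half, leaving both players with 15.
print(trust_game_payoffs(sent=10.0, returned=15.0))  # (15.0, 15.0)

On this structure, a larger transfer by the sender is the behavior the abstract describes as "consistent with trust," since the sender's payoff then depends on whether the counterpart reciprocates.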

Keywords