Scientific Data (May 2024)

Open e-commerce 1.0, five years of crowdsourced U.S. Amazon purchase histories with user demographics

  • Alex Berke
  • Dan Calacci
  • Robert Mahari
  • Takahiro Yabe
  • Kent Larson
  • Sandy Pentland

DOI
https://doi.org/10.1038/s41597-024-03329-6
Journal volume & issue
Vol. 11, no. 1
pp. 1 – 14

Abstract

This is a first-of-its-kind dataset containing detailed purchase histories from 5027 U.S. Amazon.com consumers, spanning 2018 through 2022, with more than 1.8 million purchases. Consumer spending data are customarily collected through government surveys to produce public datasets and statistics, which serve public agencies and researchers. Companies now collect similar data through consumers’ use of digital platforms, at rates that exceed data collection by public agencies. We published this dataset to help democratize access to the rich data sources routinely used by companies. The data were crowdsourced through an online survey and shared with participants’ informed consent. Data columns include order date, product code, title, price, quantity, and shipping address state. Each purchase history is linked to survey data with information about participants’ demographics, lifestyle, and health. We validate the dataset by showing that expenditure correlates with public Amazon sales data (Pearson r = 0.978, p < 0.001), and we analyze specific product categories, demonstrating expected seasonal trends and strong relationships to other public datasets.
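
The validation described in the abstract compares expenditure in the dataset against public Amazon sales figures using a Pearson correlation. The following is a minimal sketch of that kind of check, not the authors' code: it assumes purchases are available as (order date, price, quantity) rows with ISO-formatted dates, and that spending is aggregated per calendar quarter. The function names, the row layout, and the quarterly granularity are all illustrative assumptions, and no figures from the paper are reproduced here.

```python
from collections import defaultdict
from math import sqrt


def quarterly_spend(purchases):
    """Sum price * quantity per (year, quarter).

    `purchases` is an iterable of (order_date, price, quantity) rows,
    where order_date is an ISO string such as "2018-01-15".
    This row layout is an assumption made for illustration.
    """
    totals = defaultdict(float)
    for order_date, price, quantity in purchases:
        year = int(order_date[:4])
        month = int(order_date[5:7])
        quarter = (month - 1) // 3 + 1
        totals[(year, quarter)] += price * quantity
    return dict(totals)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def correlate_with_reference(purchases, reference_sales):
    """Correlate quarterly spend with a reference series.

    `reference_sales` maps (year, quarter) to a sales figure (hypothetical
    input). Only quarters present in both series are compared.
    """
    spend = quarterly_spend(purchases)
    shared = sorted(set(spend) & set(reference_sales))
    return pearson_r([spend[q] for q in shared],
                     [reference_sales[q] for q in shared])
```

Restricting the comparison to quarters present in both series avoids treating a missing period as zero spending, which would otherwise distort the coefficient.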