Applied Sciences (Aug 2024)

GenAI-Assisted Database Deployment for Heterogeneous Indigenous–Native Ethnographic Research Data

  • Reen-Cheng Wang,
  • David Yang,
  • Ming-Che Hsieh,
  • Yi-Cheng Chen,
  • Weihsuan Lin

DOI
https://doi.org/10.3390/app14167414
Journal volume & issue
Vol. 14, no. 16
p. 7414

Abstract

Read online

In ethnographic research, data collected through surveys, interviews, or questionnaires in the fields of sociology and anthropology often appear in diverse forms and languages. Building a powerful database system to store and process such data, as well as making good and efficient queries, is very challenging. This paper extensively investigates modern database technology to find out what the best technologies to store these varied and heterogeneous datasets are. The study examines several database categories: traditional relational databases, the NoSQL family of key-value databases, graph databases, document databases, object-oriented databases and vector databases, crucial for the latest artificial intelligence solutions. The research proves that when it comes to field data, the NoSQL lineup is the most appropriate, especially document and graph databases. Simplicity and flexibility found in document databases and advanced ability to deal with complex queries and rich data relationships attainable with graph databases make these two types of NoSQL databases the ideal choice if a large amount of data has to be processed. Advancements in vector databases that embed custom metadata offer new possibilities for detailed analysis and retrieval. However, converting contents into vector data remains challenging, especially in regions with unique oral traditions and languages. Constructing such databases is labor-intensive and requires domain experts to define metadata and relationships, posing a significant burden for research teams with extensive data collections. To this end, this paper proposes using Generative AI (GenAI) to help in the data-transformation process, a recommendation that is supported by testing where GenAI has proven itself a strong supplement to document and graph databases. It also discusses two methods of vector database support that are currently viable, although each has drawbacks and benefits.

Keywords