IEEE Access (Jan 2024)

QueryMintAI: Multipurpose Multimodal Large Language Models for Personal Data

  • Ananya Ghosh,
  • K. Deepa

DOI
https://doi.org/10.1109/ACCESS.2024.3468996
Journal volume & issue
Vol. 12
pp. 144631 – 144651

Abstract

QueryMintAI is a versatile multimodal Large Language Model (LLM) framework designed to address the complex challenges of processing diverse user inputs and generating corresponding outputs across modalities. The proliferation of heterogeneous data formats, including text, images, videos, documents, URLs, and audio recordings, necessitates an intelligent system capable of understanding and responding to user queries effectively. Existing models often exhibit limitations in handling multimodal inputs and in generating coherent outputs across modalities. The proposed QueryMintAI framework leverages state-of-the-art models such as GPT-3.5 Turbo, DALL-E-2, TTS-1, and Whisper v2, among others, to enable seamless interaction with users across multiple modalities. By integrating advanced natural language processing (NLP) techniques with domain-specific models, QueryMintAI offers a comprehensive solution for text-to-text, text-to-image, text-to-video, and text-to-audio conversion. The system also supports document processing, URL analysis, image description, video summarization, audio transcription, and database querying, catering to diverse user needs and preferences. QueryMintAI addresses several limitations of existing approaches, including restricted modality support, lack of adaptability to varied data formats, and limited response-generation capabilities, by combining advanced NLP algorithms, deep learning architectures, and multimodal fusion techniques.
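The abstract does not describe the implementation, but a minimal routing sketch in Python illustrates how such a framework might dispatch queries to the models it names via the OpenAI API. The dispatch scheme and helper names are illustrative assumptions, not the authors' code, and Whisper v2 is addressed through the "whisper-1" identifier that the API exposes for it.

```python
# Illustrative sketch only: one plausible way to route multimodal queries to the
# models named in the abstract, using the OpenAI Python SDK (>= 1.0). The routing
# scheme and helper names are assumptions, not QueryMintAI's actual implementation.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def text_to_text(prompt: str) -> str:
    # Text queries go to GPT-3.5 Turbo.
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def text_to_image(prompt: str) -> str:
    # Image generation via DALL-E-2; returns a URL to the generated image.
    resp = client.images.generate(model="dall-e-2", prompt=prompt, n=1, size="1024x1024")
    return resp.data[0].url

def text_to_audio(text: str, out_path: str = "speech.mp3") -> str:
    # Speech synthesis via TTS-1; writes an MP3 file and returns its path.
    resp = client.audio.speech.create(model="tts-1", voice="alloy", input=text)
    with open(out_path, "wb") as f:
        f.write(resp.content)
    return out_path

def audio_to_text(audio_path: str) -> str:
    # Transcription via Whisper v2 (exposed in the API as "whisper-1").
    with open(audio_path, "rb") as f:
        resp = client.audio.transcriptions.create(model="whisper-1", file=f)
    return resp.text

if __name__ == "__main__":
    print(text_to_text("Summarize the QueryMintAI framework in one sentence."))
```

Per the abstract, the full system layers further capabilities (document processing, URL analysis, video summarization, database querying) on top of this kind of per-modality dispatch.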

Keywords