Hydrology (Sep 2024)

The Implementation of Multimodal Large Language Models for Hydrological Applications: A Comparative Study of GPT-4 Vision, Gemini, LLaVa, and Multimodal-GPT

  • Likith Anoop Kadiyala,
  • Omer Mermer,
  • Dinesh Jackson Samuel,
  • Yusuf Sermet,
  • Ibrahim Demir

DOI
https://doi.org/10.3390/hydrology11090148
Journal volume & issue
Vol. 11, no. 9
p. 148

Abstract

Large Language Models (LLMs) combined with visual foundation models have demonstrated significant advancements, achieving intelligence levels comparable to human capabilities. This study analyzes the latest Multimodal LLMs (MLLMs), including Multimodal-GPT, GPT-4 Vision, Gemini, and LLaVa, with a focus on hydrological applications such as flood management, water level monitoring, agricultural water discharge, and water pollution management. We evaluated these MLLMs on hydrology-specific tasks, testing their response generation and real-time suitability in complex real-world scenarios. Prompts were designed to enhance the models’ visual inference capabilities and contextual comprehension from images. Our findings reveal that GPT-4 Vision demonstrated exceptional proficiency in interpreting visual data, providing accurate assessments of flood severity and water quality. Additionally, MLLMs showed potential in various hydrological applications, including drought prediction, streamflow forecasting, groundwater management, and wetland conservation. These models can optimize water resource management by predicting rainfall, evaporation rates, and soil moisture levels, thereby promoting sustainable agricultural practices. This research provides valuable insights into the potential applications of advanced AI models in addressing complex hydrological challenges and improving real-time decision-making in water resource management.

Keywords