IEEE Access (Jan 2024)

UDNet: A Unified Deep Learning-Based AutoML Framework to Execute Multiple ML Strategies for Multi-Modal Unstructured Data Processing

  • Shubham Jain
  • Enda Fallon

DOI
https://doi.org/10.1109/ACCESS.2024.3403724
Journal volume & issue
Vol. 12
pp. 77959 – 77975

Abstract

The rise of multi-modal unstructured data has intensified the need for processing solutions that can derive meaningful insights efficiently. UDNet is a concurrent ML processing framework that uses two novel methodologies, Adaptive Confidence Bound (ACB) and Multi-Fidelity Meta-Learning (MFM), to execute multiple ML strategies for multi-modal unstructured data processing. Together, these methods dynamically adjust algorithm selection based on performance trends and evaluate algorithm efficacy across data fidelity levels, conserving computational resources while maintaining accuracy in model selection. UDNet was evaluated on a suite of benchmark datasets, including the ICDAR 2013 Table Competition Dataset, the Marmot Dataset, the MIMIC-III Clinical Database, and TabLeX, each chosen for its complexity and representativeness of real-world scenarios. This evaluation demonstrates UDNet’s robustness and versatility across data types and formats. To assess its performance, we introduced three specialized evaluation metrics: Extraction Accuracy (EA), which measures the precision of unstructured data extraction; Table Integrity (TI), which evaluates the accuracy of structured data conversion; and File-Type Versatility (FV), which assesses adaptability across diverse data formats. These metrics, alongside traditional ones, show that UDNet is more efficient and accurate than existing AutoML solutions for unstructured data processing. The new metrics also address the nuances of evaluating performance in unstructured data processing, which is itself a contribution to the field.

Keywords