IEEE Access (Jan 2022)
GasBotty: Multi-Metric Extraction in the Wild
Abstract
The lived environment, particularly when proximal to roadways, is filled with multi-digit and multi-number values corresponding to advertised commodities. The reliable detection of single multi-digit values from natural imagery has been widely studied over the last decade (e.g., Street View House Numbers [SVHN]); however, extracting those values and assigning their contextual meaning is far more difficult given the diversity and unstructured nature of advertisements. To operationalize information extracted from values detected in the wild, the contextual meaning that those values represent is critical. To our knowledge, no large-scale visual dataset comprising multi-digit, multi-number values with associated context labels exists; we denote this class of problems as “multi-metric extraction in the wild”. In this work, we focus on the accurate detection and reading of gas prices and their contextual association with gas grade and payment type. We provide complete annotations for the Gas Prices of America (GPA) dataset, comprising 2,048 training and 512 test images sampled across the United States and including over 2,600 signs, 6,000 prices, 27,000 digits, and 7,800 gas grade and payment type labels. With these data, we develop the GasBotty predictor, a composite neural network model, and evaluate it over eight benchmark tasks of increasing difficulty. Finally, we define a new, highly stringent binary metric, denoted “All-or-Nothing Accuracy” (ANA), which requires that a predictor perfectly extract and correctly associate all information in a gas sign. Our proposed model achieves 72.9% ANA over the independent test set of 512 images, whereas conventional state-of-the-art object detection models, as well as uniform random and biased random predictors, all tend towards 0% ANA. GasBotty and the GPA dataset will serve as a valuable benchmark for the development of future systems for multi-metric extraction in the wild.
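To illustrate the stringency of the ANA metric described above, the following minimal Python sketch credits a sign only when every price, grade, and payment association is extracted exactly as annotated; the record format and function names are hypothetical and not drawn from the GasBotty implementation.

```python
# A minimal sketch of All-or-Nothing Accuracy (ANA), assuming each sign is
# represented as a set of (gas_grade, payment_type, price) tuples.
# These structures are illustrative, not the paper's actual data format.

def sign_matches(predicted, ground_truth):
    """A sign counts only if every (grade, payment, price) tuple is
    extracted and associated exactly as annotated -- no partial credit."""
    return set(predicted) == set(ground_truth)

def all_or_nothing_accuracy(predictions, ground_truths):
    """Fraction of test images whose signs are reproduced perfectly."""
    assert len(predictions) == len(ground_truths)
    correct = sum(
        sign_matches(pred, gt) for pred, gt in zip(predictions, ground_truths)
    )
    return correct / len(ground_truths)

# Example: one perfectly read sign and one with a single wrong digit
# yields 50% ANA, even though most of the information was recovered.
gt = [
    {("Regular", "Cash", 3.299), ("Diesel", "Credit", 4.099)},
    {("Regular", "Cash", 3.299)},
]
pred = [
    {("Regular", "Cash", 3.299), ("Diesel", "Credit", 4.099)},
    {("Regular", "Cash", 3.399)},  # one digit off -> whole sign is wrong
]
print(all_or_nothing_accuracy(pred, gt))  # 0.5
```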
Keywords