Remote Sensing (Dec 2023)
MobileSAM-Track: Lightweight One-Shot Tracking and Segmentation of Small Objects on Edge Devices
Abstract
Tracking and segmenting small targets in remote sensing videos on edge devices carries significant engineering implications. However, many semi-supervised video object segmentation (S-VOS) methods heavily rely on extensive video random-access memory (VRAM) resources, making deployment on edge devices challenging. Our goal is to develop an edge-deployable S-VOS method that can achieve high-precision tracking and segmentation by selecting a bounding box for the target object. First, a tracker is introduced to pinpoint the position of the tracked object in different frames, thereby eliminating the need to save the results of the split as other S-VOS methods do, thus avoiding an increase in VRAM usage. Second, we use two key lightweight components, correlation filters (CFs) and the Mobile Segment Anything Model (MobileSAM), to ensure the inference speed of our model. Third, a mask diffusion module is proposed that improves the accuracy and robustness of segmentation without increasing VRAM usage. We use our self-built dataset containing airplanes and vehicles to evaluate our method. The results show that on the GTX 1080 Ti, our model achieves a J&F score of 66.4% under the condition that the VRAM usage is less than 500 MB, while maintaining a processing speed of 12 frames per second (FPS). The model we propose exhibits good performance in tracking and segmenting small targets on edge devices, providing a solution for fields such as aircraft monitoring and vehicle tracking that require executing S-VOS tasks on edge devices.
Keywords