A Distribution Agnostic Rank-Based Measure for Proximity Search

Mayur Garg; Ashutosh Nayak; Rajasekhara Reddy Duvvuru Muni

doi:10.1109/ACCESS.2024.3522669

IEEE Access (Jan 2025)


Affiliations

Mayur Garg: United Airlines, Gurugram, India
Ashutosh Nayak: Samsung Research Institute Bangalore, Bengaluru, India
Rajasekhara Reddy Duvvuru Muni: Samsung Research Institute Bangalore, Bengaluru, India

Journal volume: Vol. 13, pp. 12103–12112

Abstract


Proximity search is used extensively in modern machine learning algorithms across a wide range of applications. It aims to find the data points that are closest to a data point of interest. Existing algorithms rely on distance-based metrics to find the closest data points. However, these metrics depend on the distribution of the data along each dimension, which makes them sensitive to scaling and translation, and their performance degrades as the number of dimensions increases. Furthermore, when existing metrics estimate the proximity between two data points, they do not take into account the relative positions of the rest of the data. In this paper, we propose an alternative to these metrics, the Rank Adjacency Measure (RAM), which is agnostic to the distribution of the data. RAM estimates the probability that two points are proximate by extending the concept of ordering in one dimension. We provide a detailed mathematical construction of RAM. We illustrate the effectiveness of the proposed methodology on five datasets in three application areas: Outlier Detection, Nearest Neighbor Search, and Text Similarity. Our methodology outperforms existing algorithms in outlier detection by 50% and performs on par with existing metrics in the other two applications. We conclude the paper with a discussion of its limitations and of research directions for improving RAM.
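The abstract contrasts distance-based metrics, which are sensitive to scaling and translation, with an ordering-based view of the data. The following sketch illustrates only that general contrast. It is not the authors' Rank Adjacency Measure, whose construction is given in the paper. The `rank_gap` dissimilarity, the synthetic data, and the transformations applied to it are hypothetical choices made purely for illustration.

```python
import math
import random

def euclidean(a, b):
    """Ordinary Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def column_ranks(data):
    """Replace each coordinate by its 0-based rank within its own dimension."""
    n, d = len(data), len(data[0])
    ranks = [[0] * d for _ in range(n)]
    for j in range(d):
        order = sorted(range(n), key=lambda i: data[i][j])
        for r, i in enumerate(order):
            ranks[i][j] = r
    return ranks

def rank_gap(ranks, i, k):
    """Hypothetical rank-based dissimilarity (NOT the paper's RAM).

    For each dimension, count the fraction of the other n - 2 points whose
    rank lies strictly between points i and k, then average over dimensions.
    Assumes i != k and n > 2.
    """
    n, d = len(ranks), len(ranks[0])
    return sum((abs(ranks[i][j] - ranks[k][j]) - 1) / (n - 2) for j in range(d)) / d

def nearest(n, query, score):
    """Index of the point (other than query) with the smallest score."""
    return min((k for k in range(n) if k != query), key=lambda k: score(query, k))

random.seed(0)
data = [[random.random(), random.random()] for _ in range(20)]
# Scale and translate dimension 0; apply a non-linear but strictly
# increasing map to dimension 1. Per-dimension ordering is unchanged.
warped = [[1000 * x + 5, math.exp(3 * y)] for x, y in data]

for name, pts in (("original", data), ("warped", warped)):
    r = column_ranks(pts)
    nn_euclid = nearest(len(pts), 0, lambda a, b: euclidean(pts[a], pts[b]))
    nn_rank = nearest(len(pts), 0, lambda a, b: rank_gap(r, a, b))
    print(name, "euclidean NN:", nn_euclid, "rank-based NN:", nn_rank)
```

Because a strictly increasing transformation of a dimension leaves the ordering within that dimension unchanged, the rank-based nearest neighbor is the same for both datasets. The Euclidean nearest neighbor, by contrast, may change once one axis is stretched by a factor of 1000.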

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
