New guidance for using t-SNE: Alternative defaults, hyperparameter selection automation, and comparative evaluation

Robert Gove; Lucas Cadalzo; Nicholas Leiby; Jedediah M. Singer; Alexander Zaitzeff

Visual Informatics (Jun 2022)

Affiliations

Robert Gove (corresponding author): Two Six Technologies, USA
Lucas Cadalzo: Two Six Technologies, USA
Nicholas Leiby: Two Six Technologies, USA
Jedediah M. Singer: Two Six Technologies, USA
Alexander Zaitzeff: Two Six Technologies, USA

Journal volume & issue: Vol. 6, no. 2, pp. 87–97

Abstract

We present new guidelines for choosing hyperparameters for t-SNE and an evaluation comparing these guidelines to current ones. These guidelines include a proposed empirically optimum guideline derived from a t-SNE hyperparameter grid search over a large collection of data sets. We also introduce a new method to featurize data sets using graph-based metrics called scagnostics; we use these features to train a neural network that predicts optimal t-SNE hyperparameters for the respective data set. This neural network has the potential to simplify the use of t-SNE by removing guesswork about which hyperparameters will produce the best embedding. We evaluate our neural network-derived and empirically optimum hyperparameters against several other t-SNE hyperparameter guidelines from the literature on 68 data sets. The hyperparameters predicted by our neural network yield embeddings with accuracy similar to that of the best current t-SNE guidelines. Using our empirically optimum hyperparameters is simpler than following previously published guidelines and yields more accurate embeddings, in some cases by a statistically significant margin. We find that the useful ranges for t-SNE hyperparameters are narrower and include smaller values than previously reported in the literature. Importantly, we also quantify the potential for future improvements in this area: using data from a grid search of t-SNE hyperparameters, we find that an optimal selection method could improve embedding accuracy by up to two percentage points over the methods examined in this paper.
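The grid search mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the helper names `grid_search` and `make_tsne_evaluator` are hypothetical, the choice of which hyperparameters to vary is an assumption, and scikit-learn's `trustworthiness` score stands in for the paper's accuracy measure, which the abstract does not specify.

```python
from itertools import product


def grid_search(grid, evaluate):
    """Exhaustively score every combination in `grid`.

    grid     -- dict mapping hyperparameter name to a list of candidate values
    evaluate -- callable taking a params dict and returning a score (higher is better)
    Returns (best_params, best_score, results), where results lists every
    (params, score) pair in the order it was tried.
    """
    names = list(grid)
    results = []
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((params, evaluate(params)))
    best_params, best_score = max(results, key=lambda pair: pair[1])
    return best_params, best_score, results


def make_tsne_evaluator(X, n_neighbors=5, seed=0):
    """Build an `evaluate` callable backed by scikit-learn (assumed installed).

    Assumption: trustworthiness is used here only as a stand-in quality score.
    """
    from sklearn.manifold import TSNE, trustworthiness  # imported lazily

    def evaluate(params):
        # params is forwarded to TSNE, e.g. {"perplexity": 30, "learning_rate": 200}
        embedding = TSNE(n_components=2, random_state=seed, **params).fit_transform(X)
        return trustworthiness(X, embedding, n_neighbors=n_neighbors)

    return evaluate
```

For a data matrix `X`, a call such as `grid_search({"perplexity": [5, 30, 50], "learning_rate": [10, 200]}, make_tsne_evaluator(X))` would return the best-scoring combination together with the full table of scores. The candidate values shown are placeholders, not the ranges studied in the paper, and every perplexity value must be smaller than the number of samples in `X`.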

Published in Visual Informatics

ISSN: 2468-502X (Online)
Publisher: Elsevier
Country of publisher: Netherlands
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology
Website: https://www.journals.elsevier.com/visual-informatics/
