Revisiting Sequential Information Bottleneck: New Implementation and Evaluation

Assaf Toledo; Elad Venezian; Noam Slonim

doi:10.3390/e24081132

Entropy (Aug 2022)

Affiliations

Assaf Toledo: IBM Research AI, Haifa University Campus, Mount Carmel Haifa, Haifa 3498825, Israel
Elad Venezian: IBM Research AI, Haifa University Campus, Mount Carmel Haifa, Haifa 3498825, Israel
Noam Slonim: IBM Research AI, Haifa University Campus, Mount Carmel Haifa, Haifa 3498825, Israel

Journal volume & issue: Vol. 24, no. 8
p. 1132

Abstract

We introduce a modern, optimized, and publicly available implementation of the sequential Information Bottleneck (sIB) clustering algorithm, which strikes a highly competitive balance between clustering quality and speed. We describe a set of optimizations that make the algorithm's computation more efficient, particularly for the common case of sparse data representations. The results are substantiated by an extensive evaluation that compares the algorithm to commonly used alternatives, focusing on the practically important use case of text clustering. The evaluation covers a range of publicly available benchmark datasets and a set of clustering setups employing modern word and sentence embeddings obtained from state-of-the-art neural models. The results show that, despite relying on the more basic term-frequency representation, the proposed implementation offers a highly attractive trade-off between quality and speed that outperforms the alternatives considered. This new release facilitates the use of the algorithm in real-world text clustering applications.
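The abstract does not spell out the algorithm itself, so the following is a minimal pure-Python sketch of the generic sequential IB loop on sparse term-frequency vectors. Each document is drawn out of its cluster and then merged into the cluster whose merge costs the least information, measured as (p(x) + p(t)) times a weighted Jensen-Shannon divergence. This is not the authors' released implementation. The names `merge_cost` and `sib`, the balanced random start, the rule that a cluster is never emptied, and the closed-form handling of terms missing from a document (one possible way to make each cost depend only on the document's nonzeros) are all assumptions of this sketch.

```python
import math
import random

# Illustrative sketch only: names and design choices are hypothetical,
# not the API of the implementation described in the paper.


def merge_cost(px, x_dist, pt, t_counts, t_total):
    """Information loss of merging item x into cluster t:
    (p(x) + p(t)) * JS_pi(p(y|x), p(y|t)), pi = (p(x), p(t)) / (p(x) + p(t)).

    x_dist is sparse ({term: p(y|x)}, nonzeros only); p(y|t) = t_counts[y] / t_total.
    """
    total = px + pt
    pi1, pi2 = px / total, pt / total
    js, covered = 0.0, 0.0
    for y, a in x_dist.items():
        b = t_counts[y] / t_total  # p(y|t)
        m = pi1 * a + pi2 * b
        js += pi1 * a * math.log(a / m)
        if b > 0:
            js += pi2 * b * math.log(b / m)
        covered += b
    # Every term with p(y|x) = 0 adds pi2 * p(y|t) * log(1 / pi2); their
    # total is computed in closed form, so the loop touches only x's nonzeros.
    js -= pi2 * math.log(pi2) * (1.0 - covered)
    return total * js


def sib(docs, k, n_terms, max_iter=20, seed=0):
    """Sequential IB over docs given as sparse {term: count} dicts.

    Assumes every doc has at least one nonzero count and k <= len(docs).
    Returns one cluster label per document.
    """
    rng = random.Random(seed)
    n = len(docs)
    grand = sum(sum(d.values()) for d in docs)
    labels = [0] * n
    for pos, i in enumerate(rng.sample(range(n), n)):
        labels[i] = pos % k  # balanced start: no cluster begins empty
    counts = [[0] * n_terms for _ in range(k)]
    totals, sizes = [0] * k, [0] * k
    for d, t in zip(docs, labels):
        sizes[t] += 1
        for y, c in d.items():
            counts[t][y] += c
            totals[t] += c

    for _ in range(max_iter):
        moved = 0
        for i in rng.sample(range(n), n):
            d, old = docs[i], labels[i]
            if sizes[old] == 1:
                continue  # never empty a cluster
            # Draw x out of its current cluster.
            sizes[old] -= 1
            for y, c in d.items():
                counts[old][y] -= c
                totals[old] -= c
            dsum = sum(d.values())
            x_dist = {y: c / dsum for y, c in d.items()}
            costs = [merge_cost(dsum / grand, x_dist, totals[t] / grand,
                                counts[t], totals[t]) for t in range(k)]
            # Merge x into the cheapest cluster, preferring to stay on ties.
            new = min(range(k), key=lambda t: (costs[t], t != old))
            sizes[new] += 1
            for y, c in d.items():
                counts[new][y] += c
                totals[new] += c
            if new != old:
                labels[i] = new
                moved += 1
        if moved == 0:
            break  # a full pass with no moves: local optimum reached
    return labels
```

For instance, `sib(docs, k=2, n_terms=4)` on six small count dictionaries returns six integer labels. Keeping raw counts per cluster, rather than normalized distributions, lets each draw and merge update only the entries where the document is nonzero.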

Published in Entropy

ISSN: 1099-4300 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Astronomy: Astrophysics; Science: Physics
Website: http://www.mdpi.com/journal/entropy
