CBMAP: Clustering-Based Manifold Approximation and Projection for Dimensionality Reduction

Date

2025

Journal Title

Journal ISSN

Volume Title

Publisher

IEEE-Inst Electrical Electronics Engineers Inc

Access Rights

info:eu-repo/semantics/openAccess

Abstract

Dimensionality reduction methods reduce the number of dimensions in data, either to improve machine learning performance or to enable visualization in two- or three-dimensional spaces. Among these, popular nonlinear methods such as t-SNE, UMAP, TriMap, and PaCMAP excel at capturing local relationships and nonlinear structures. However, they often distort the global arrangement of clusters, rely heavily on hyperparameter tuning, and are sensitive to initialization. Moreover, most of these methods cannot project unseen test samples, which limits their applicability in real-world scenarios. To address these challenges, this study introduces CBMAP (Clustering-Based Manifold Approximation and Projection), which explicitly incorporates clustering in the high-dimensional space to guide the embedding. CBMAP computes membership values based on cluster centers in the original space and preserves these memberships during projection. This design enables CBMAP to better retain the global layout of the data while maintaining meaningful local relationships. CBMAP shows low sensitivity to initialization strategies, depends minimally on hyperparameters, and supports projection of unseen test samples. Experimental evaluations on both toy and real-world benchmark datasets show that CBMAP consistently preserves global structures and inter-cluster distances more effectively than state-of-the-art methods, while delivering competitive results in local structure preservation. The method is freely available at https://github.com/doganlab/cbmap and can be installed from the Python Package Index with the command pip install cbmap.
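
The idea of computing memberships relative to cluster centers and then preserving them in the embedding can be illustrated with a small sketch. This is not the CBMAP implementation or the cbmap package API: the Gaussian-kernel membership function, the scale parameter, the mean-squared mismatch measure, and the hand-picked centers and layouts are all assumptions made for illustration. The sketch only shows that a low-dimensional layout which keeps each point near the same center scores a lower mismatch than one that mixes the groups up.

```python
import math

def memberships(points, centers, scale=1.0):
    """Soft membership of each point to each center (each row sums to 1).

    Assumption: a Gaussian kernel on Euclidean distance, normalized per
    point. The membership function used in the paper may differ.
    """
    rows = []
    for p in points:
        weights = [math.exp(-(math.dist(p, c) / scale) ** 2) for c in centers]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows

def membership_mismatch(high, low):
    """Mean squared difference between two membership matrices (hypothetical loss)."""
    n_entries = len(high) * len(high[0])
    squared = sum(
        (h - l) ** 2
        for high_row, low_row in zip(high, low)
        for h, l in zip(high_row, low_row)
    )
    return squared / n_entries

# Toy data: two well-separated groups in 3-D with hand-picked cluster centers.
X = [(0.0, 0.0, 0.0), (0.2, 0.1, 0.0), (5.0, 5.0, 5.0), (5.1, 4.9, 5.2)]
C_high = [(0.1, 0.05, 0.0), (5.05, 4.95, 5.1)]
M_high = memberships(X, C_high, scale=2.0)

# Two candidate 2-D layouts of the same four points, sharing the same 2-D centers.
C_low = [(0.1, 0.05), (7.05, 6.95)]
good_Y = [(0.0, 0.0), (0.2, 0.1), (7.0, 7.0), (7.1, 6.9)]  # groups kept intact
bad_Y = [(0.0, 0.0), (7.0, 7.0), (0.2, 0.1), (7.1, 6.9)]   # points 1 and 2 swapped

loss_good = membership_mismatch(M_high, memberships(good_Y, C_low, scale=2.0))
loss_bad = membership_mismatch(M_high, memberships(bad_Y, C_low, scale=2.0))
print(loss_good < loss_bad)
```

In a full method, the low-dimensional coordinates would be optimized to reduce such a mismatch rather than compared by hand. Under the same assumptions, an unseen sample could be handled by computing its memberships against the fixed high-dimensional centers, which is one plausible reading of why a center-based design supports out-of-sample projection.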

Description

Keywords

Clustering algorithms, Dimensionality reduction, Data visualization, Principal component analysis, Machine learning algorithms, High dimensional data, Sensitivity, Manifolds, Data structures, Standards, Clustering, dimensionality reduction, k-means, PCA, t-SNE, UMAP

Source

IEEE Access

WoS Q Value

Q2

Scopus Q Value

Q1

Volume

13

Issue

Citation