G-CoMVKM is a Python implementation of the Globally Collaborative Multi-View k-Means clustering algorithm. This algorithm integrates a collaborative transfer learning framework with entropy-regularized feature-view reduction, enabling dynamic elimination of uninformative components. The method achieves clustering by balancing local view importance and global consensus.
- Multi-View Clustering: Process data from multiple views/sources simultaneously
- Feature Weight Learning: Automatically determine the importance of each feature
- View Weight Learning: Automatically determine the importance of each view
- Feature Selection: Entropy-regularized mechanism to discard irrelevant features
- Global Consensus: Balance local view objectives with global clustering agreement
You can install G-CoMVKM directly from PyPI:
pip install gcomvkm
- Python 3.7+
- NumPy
- SciPy
- Matplotlib
- scikit-learn
- seaborn
Here's a simple example of how to use G-CoMVKM:

    from gcomvkm import GCoMVKM
    from gcomvkm.utils import load_synthetic_data
    from gcomvkm.evaluation import nmi, rand_index, adjusted_rand_index

    # Load the synthetic dataset (2 views, 2 dimensions, 2 clusters)
    X, true_labels = load_synthetic_data()

    # Create and fit the model
    model = GCoMVKM(
        n_clusters=2,
        gamma=5.0,  # Feature selection regularization parameter
        theta=4.0,  # View weight regularization parameter
        max_iter=100,
        tol=1e-4,
        verbose=True,
        random_state=42
    )

    # Fit the model to the data
    model.fit(X)

    # Get the clustering results
    predicted_labels = model.labels_
    feature_weights = model.feature_weights_
    view_weights = model.view_weights_

    # Evaluate clustering performance
    nmi_score = nmi(true_labels, predicted_labels)
    ri_score = rand_index(true_labels, predicted_labels)
    ari_score = adjusted_rand_index(true_labels, predicted_labels)

    print(f"NMI Score: {nmi_score:.4f}")
    print(f"Rand Index: {ri_score:.4f}")
    print(f"Adjusted Rand Index: {ari_score:.4f}")

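The scores in the example compare partitions rather than label values, so a clustering whose cluster ids are swapped relative to the ground truth still scores perfectly. The following is a minimal pure-Python sketch of the Rand index that illustrates this. The function name `rand_index_sketch` is hypothetical, and this is not the code behind `gcomvkm.evaluation.rand_index`.

```python
from itertools import combinations


def rand_index_sketch(labels_a, labels_b):
    """Fraction of sample pairs on which two labelings agree.

    A pair counts as an agreement when both labelings put the two samples
    in the same cluster, or both put them in different clusters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    if len(labels_a) < 2:
        raise ValueError("need at least two samples to form a pair")
    pairs = list(combinations(range(len(labels_a)), 2))
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


true_labels = [0, 0, 1, 1]
same_partition = rand_index_sketch(true_labels, [1, 1, 0, 0])   # ids swapped only
other_partition = rand_index_sketch(true_labels, [0, 1, 0, 1])  # different grouping
print(same_partition, other_partition)
```

The first call returns 1.0 because only the cluster ids differ. The second returns 2/6, since just two of the six pairs are treated the same way by both labelings.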
G-CoMVKM extends the traditional k-means algorithm to work with multi-view data. The algorithm:
1. Initializes cluster centers randomly or using k-means++
2. Computes memberships for each data point to the clusters
3. Updates cluster centers based on these memberships
4. Updates feature weights using an entropy-regularized optimization
5. Discards irrelevant features based on a threshold
6. Updates view weights to balance view importance
7. Repeats steps 2-6 until convergence
The objective function minimizes the within-cluster variance while encouraging feature and view sparsity through entropy regularization.
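The loop described above can be sketched in NumPy. This is a simplified illustration under stated assumptions, not the package's implementation or the paper's exact update rules. It assumes hard memberships, a farthest-first initialization as a deterministic stand-in for k-means++, softmax-style weights (the closed-form minimizer of a linear cost plus an entropy term on the simplex), and a hypothetical pruning threshold of 1/(n·d). All function names are illustrative.

```python
import numpy as np


def entropy_weights(costs, reg):
    """Minimize sum(w * costs) + reg * sum(w * log(w)) subject to sum(w) == 1.

    The closed-form solution is a softmax of -costs / reg.
    """
    z = -(costs - costs.min()) / reg  # shift by the minimum for numerical stability
    w = np.exp(z)
    return w / w.sum()


def farthest_first_init(views, n_clusters, rng):
    """Pick one random sample, then repeatedly add the sample farthest from those chosen."""
    stacked = np.hstack(views)
    chosen = [int(rng.integers(stacked.shape[0]))]
    while len(chosen) < n_clusters:
        d = ((stacked[:, None, :] - stacked[chosen][None, :, :]) ** 2).sum(axis=2)
        chosen.append(int(d.min(axis=1).argmax()))
    return chosen


def multiview_kmeans_sketch(views, n_clusters, gamma=5.0, theta=4.0,
                            max_iter=100, tol=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    n = views[0].shape[0]
    # Step 1: initial centers, using the same sample indices in every view
    idx = farthest_first_init(views, n_clusters, rng)
    centers = [X[idx].astype(float) for X in views]
    feat_w = [np.full(X.shape[1], 1.0 / X.shape[1]) for X in views]
    view_w = np.full(len(views), 1.0 / len(views))
    prev = np.inf
    for _ in range(max_iter):
        # Step 2: hard memberships from the weighted squared distance summed over views
        dist = sum(
            v * (((X[:, None, :] - C[None, :, :]) ** 2) * w).sum(axis=2)
            for X, C, w, v in zip(views, centers, feat_w, view_w)
        )
        labels = dist.argmin(axis=1)
        view_cost = np.empty(len(views))
        for h, (X, C) in enumerate(zip(views, centers)):
            # Step 3: move each non-empty cluster's center to the mean of its members
            for k in range(n_clusters):
                if np.any(labels == k):
                    C[k] = X[labels == k].mean(axis=0)
            # Step 4: entropy-regularized feature weights from per-feature scatter
            scatter = ((X - C[labels]) ** 2).mean(axis=0)
            w = entropy_weights(scatter, gamma)
            # Step 5: zero out features below the (assumed) threshold, then renormalize
            w[w < 1.0 / (n * X.shape[1])] = 0.0
            feat_w[h] = w / w.sum()
            view_cost[h] = float(feat_w[h] @ scatter)
        # Step 6: entropy-regularized view weights from per-view cost
        view_w = entropy_weights(view_cost, theta)
        # Step 7: stop once the weighted objective stops changing
        objective = float(view_w @ view_cost)
        if abs(prev - objective) < tol:
            break
        prev = objective
    return labels, feat_w, view_w
```

Calling `multiview_kmeans_sketch([X1, X2], n_clusters=2)` with one `(n_samples, n_features)` array per view returns the labels, a list of per-view feature weights, and the view weights. Features with large within-cluster scatter receive small weights, and larger values of `gamma` or `theta` push the corresponding weights toward uniform.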
G-CoMVKM also provides a command-line interface:

    # Run with default parameters on the synthetic dataset
    gcomvkm --dataset 2V2D2C

    # Run with custom parameters
    gcomvkm --dataset 2V2D2C --gamma 5.0 --theta 4.0 --n-clusters 2 --max-iter 100

- Comprehensive Cross-Platform Development
  - Production-grade MATLAB implementation (original repository)
  - Professional Python package (PyPI: gcomvkm 0.1.0)
  - Industry-standard documentation and interactive tutorials
  - 100% reproducible experiments with provided code and data
  - Optimized performance with GPU acceleration
- Quality Assurance
  - Rigorous testing across multiple datasets
  - Comprehensive error handling and input validation
  - Performance benchmarking against state-of-the-art methods
  - Clean, well-documented, and maintainable code
- User Experience
  - Intuitive API design following scikit-learn conventions
  - Detailed documentation with examples and tutorials
  - Visualizations for better interpretation of results
  - Command-line interface for quick experimentation
If you use G-CoMVKM in your research, please cite:

    @Article{electronics14112129,
        AUTHOR = {Sinaga, Kristina P. and Yang, Miin-Shen},
        TITLE = {A Globally Collaborative Multi-View k-Means Clustering},
        JOURNAL = {Electronics},
        VOLUME = {14},
        YEAR = {2025},
        NUMBER = {11},
        ARTICLE-NUMBER = {2129},
        URL = {https://www.mdpi.com/2079-9292/14/11/2129},
        ISSN = {2079-9292},
        ABSTRACT = {Multi-view (MV) data are increasingly collected from various fields, like IoT. The surge in MV data demands clustering algorithms capable of handling heterogeneous features and high dimensionality. Existing feature-weighted MV k-means (MVKM) algorithms often neglect effective dimensionality reduction such that their scalability and interpretability are limited. To address this, we propose a novel procedure for clustering MV data, namely a globally collaborative MVKM (G-CoMVKM) clustering algorithm. The proposed G-CoMVKM integrates a collaborative transfer learning framework with entropy-regularized feature-view reduction, enabling dynamic elimination of uninformative components. This method achieves clustering by balancing local view importance and global consensus, without relying on matrix reconstruction. We design a feature-view reduction by embedding transferred learning processes across view components by using penalty terms and entropy to simultaneously reduce these unimportant feature-view components. Experiments on synthetic and real-world datasets demonstrate that G-CoMVKM consistently outperforms these existing MVKM clustering algorithms in clustering accuracy, performance, and dimensionality reduction, affirming its robustness and efficiency.},
        DOI = {10.3390/electronics14112129}
    }

The original code has been tested on MATLAB R2020a. Performance on other versions may vary. This Python implementation has been tested on Python 3.7+ and is compatible with most modern Python environments.
As Arthur C. Clarke said, "The only way of discovering the limits of the possible is to venture a little way past them into the impossible."
We didn't just venture; we blazed a trail:
- Where they saw complexity, we found elegance
- Where they predicted failure, we achieved excellence
- Where they set limits, we broke boundaries
- Where they said "impossible," we said "watch us"
To aspiring researchers: Let our journey be a reminder that in science, "impossible" is often just a challenge waiting to be accepted. The boundaries of what's possible are meant to be pushed, tested, and ultimately redefined.
- Kristina P. Sinaga
- Email: kristinapestaria.sinaga@isti.cnr.it (The address kristinasinaga41@gmail.com is no longer under my control. Please do not use it to contact me.)
- GitHub
- A Globally Collaborative Multi-View k-Means Clustering - Electronics MDPI
- Original MATLAB Implementation: G-CoMVKM
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.