Fuzzy Spectral Clustering with Variable-Weighted Adjacency Matrices

Jesse S. Ghashti and John R. J. Thompson

October 4, 2025

Introduction

The FuzzySpec package implements the FVIBES (Fuzzy Variable-Importance Based Eigenspace Separation) algorithm, a fuzzy spectral clustering procedure that incorporates variable-weighted distance metrics and adaptive adjacency matrix constructions. This package accompanies the paper Variable-Weighted Adjacency Constructions for Fuzzy Spectral Clustering by Ghashti, Hare, and Thompson (2025).

The key features of this package include:

a variable-weighted distance metric that automatically determines variable importance using nonparametric kernel density estimation,
an adaptive adjacency construction framework with multiple options for building similarity graphs including locally-adaptive scaling (Zelnik-Manor and Perona, 2004),
clustering outputs that return fuzzy membership matrices rather than just hard cluster assignments, and
built-in synthetic dataset generators for benchmarking fuzzy clustering algorithms.

Package Overview

FVIBES clustering proceeds in three steps, each with its own primary function:

Build an adjacency matrix from the data using make.adjacency()
Perform fuzzy spectral clustering using fuzzy.spectral.clustering()
Optionally, examine the results with the 2D visualization function plot.fuzzy(), or compare them to true class labels using clustering.accuracy().

Installation

Install the latest version of FuzzySpec from GitHub with the following:

 library(devtools)
 install_github("ghashti-j/FuzzySpec")
 library(FuzzySpec)

Sample Usage

The basic steps using the built-in functions are shown below.

First, we generate the synthetic spirals dataset; see the help file for gen.fuzzy() for more options and information.

 set.seed(1)
data <- FuzzySpec::gen.fuzzy(n = 300, dataset = "spirals", noise = 0.15) # data generation
FuzzySpec::plot.fuzzy(data, plotFuzzy = TRUE, colorCluster = TRUE) # plot data generating process

Build a variable-weighted locally-adaptive adjacency matrix, corresponding to the adjacency \(\mathbf{W}^{(\text{vwla-id})}\) in Ghashti et al. (2025):

W <- FuzzySpec::make.adjacency(
 data = data$X,
 method = "vw", # variable-weighted distances
 isLocWeighted = TRUE, # Locally-adaptive scaling
 scale = FALSE # scaling not required for kernel methods
)
 #> Multistart 1 of 3 ... Multistart 2 of 3 ... Multistart 3 of 3
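To make the idea behind isLocWeighted = TRUE concrete, the following is a minimal sketch of the self-tuning affinity of Zelnik-Manor and Perona (2004), \(W_{ij} = \exp(-d_{ij}^2 / (\sigma_i \sigma_j))\), where \(\sigma_i\) is the distance from observation \(i\) to its k-th nearest neighbor. It uses plain Euclidean distances instead of the variable-weighted distances, and the names local_scaling_adjacency and k_nn are illustrative assumptions, not part of the FuzzySpec API; the package's own construction may differ in its details.

```r
# Sketch of locally-adaptive (self-tuning) Gaussian affinities.
# `local_scaling_adjacency` and `k_nn` are hypothetical names for illustration.
local_scaling_adjacency <- function(X, k_nn = 7) {
  D <- as.matrix(dist(X))                  # pairwise Euclidean distances
  n <- nrow(D)
  stopifnot(k_nn < n)
  # sigma_i: distance from point i to its k_nn-th nearest neighbor
  # (position k_nn + 1 after sorting, because position 1 is the point itself)
  sigma <- apply(D, 1, function(d) sort(d)[k_nn + 1])
  W <- exp(-D^2 / outer(sigma, sigma))     # W_ij = exp(-d_ij^2 / (sigma_i * sigma_j))
  diag(W) <- 0                             # no self-loops
  W
}

set.seed(1)
X_demo <- matrix(rnorm(40), ncol = 2)      # 20 toy observations in 2D
W_demo <- local_scaling_adjacency(X_demo, k_nn = 5)
dim(W_demo)
```

Because each pair of points is scaled by its own local neighborhood sizes, dense and sparse regions of the data receive comparable affinities without tuning a single global bandwidth.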

Perform fuzzy spectral clustering given the adjacency matrix \(\mathbf{W}\), the number of clusters k = 3, and the commonly chosen fuzzy parameter m = 1.5. We display the first 5 rows of the membership matrix \(\mathbf{U}\):

res <- FuzzySpec::fuzzy.spectral.clustering(
 W = W, k = 3, m = 1.5, method = "CM" 
)
res$u[1:5,]
 #> Clus 1 Clus 2 Clus 3
 #> Obj 1 0.9048457 0.05660900 0.03854527
 #> Obj 2 0.9549308 0.02402001 0.02104920
 #> Obj 3 0.9092418 0.05372531 0.03703293
 #> Obj 4 0.9764813 0.01192352 0.01159519
 #> Obj 5 0.9590105 0.02179147 0.01919800
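The role of the fuzzy parameter m can be illustrated with the classical fuzzy c-means membership update of Bezdek (1981), in which \(u_{ij} \propto d_{ij}^{-2/(m-1)}\) and each row is normalized to sum to one. The sketch below assumes that method = "CM" refers to a c-means-type step applied to the spectral embedding; that reading, and the helper name fcm_memberships, are assumptions made for illustration and not a description of the package internals.

```r
# Fuzzy c-means membership update for fixed centers (Bezdek, 1981).
# `fcm_memberships` is a hypothetical helper, not a FuzzySpec function.
fcm_memberships <- function(Z, centers, m = 1.5) {
  stopifnot(m > 1, ncol(Z) == ncol(centers))
  # squared Euclidean distance from every row of Z to every center (n x k)
  D2 <- matrix(
    unlist(lapply(seq_len(nrow(centers)), function(j) {
      rowSums(sweep(Z, 2, centers[j, ])^2)
    })),
    nrow = nrow(Z)
  )
  D2 <- pmax(D2, .Machine$double.eps)      # guard against division by zero
  P <- D2^(-1 / (m - 1))                   # proportional to d^(-2 / (m - 1))
  P / rowSums(P)                           # rows sum to one
}

Z_demo <- rbind(c(0.25, 0), c(0.5, 0))     # toy embedded points
centers_demo <- rbind(c(0, 0), c(1, 0))    # toy cluster centers
U_15 <- fcm_memberships(Z_demo, centers_demo, m = 1.5)
U_20 <- fcm_memberships(Z_demo, centers_demo, m = 2)
```

In this toy example the first point is three times farther from the second center than from the first, so its membership in the first cluster is 0.9 with m = 2 and 81/82 (about 0.988) with m = 1.5: smaller values of m give crisper memberships. The second point is equidistant from both centers and receives membership 0.5 in each cluster for any m.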

We can compare the hard clustering results to the true class labels:

acc <- FuzzySpec::clustering.accuracy(data$y, res$cluster)
 cat("Clustering accuracy:", round(acc, 3), "\n")
 #> Clustering accuracy: 0.99
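Cluster labels are arbitrary, so an accuracy measure has to match predicted labels to true labels before counting agreements. The sketch below shows one simple way to do this, by brute force over all label permutations, which is feasible only for a small number of clusters. Whether clustering.accuracy() uses this matching strategy is not stated here, so permutation_accuracy should be read as a hypothetical stand-in.

```r
# Accuracy after optimally relabelling clusters (brute force over permutations).
# `permutation_accuracy` is a hypothetical helper, not a FuzzySpec function.
permutation_accuracy <- function(truth, pred) {
  truth <- as.integer(factor(truth))
  pred  <- as.integer(factor(pred))
  k <- max(truth, pred)
  # k x k contingency table of true versus predicted labels
  tab <- table(factor(truth, levels = seq_len(k)),
               factor(pred,  levels = seq_len(k)))
  # recursively enumerate all permutations of a vector
  perms <- function(v) {
    if (length(v) <= 1) return(list(v))
    out <- list()
    for (i in seq_along(v)) {
      for (p in perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
    }
    out
  }
  # for each matching, count the observations that agree
  best <- max(vapply(perms(seq_len(k)),
                     function(p) sum(tab[cbind(seq_len(k), p)]),
                     numeric(1)))
  best / length(truth)
}

truth_demo <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
pred_demo  <- c(2, 2, 2, 3, 3, 1, 1, 1, 1)  # permuted labels, one misassignment
permutation_accuracy(truth_demo, pred_demo)
#> [1] 0.8888889
```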

We can compare the membership matrix \(\mathbf{U}\) determined by FVIBES to the true probabilistic cluster memberships with the function fari(), which computes a fuzzy generalization of the Adjusted Rand Index (FARI) based on Frobenius inner products of membership matrices (Andrews, Browne and Hvingelby, 2022).

far <- FuzzySpec::fari(data$U, res$u)
 cat("FARI:", round(far, 3), "\n")
 #> FARI: 0.98
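As a rough illustration of the Frobenius inner product that underlies this comparison, the sketch below forms the co-membership matrices \(\mathbf{U}_1\mathbf{U}_1^\top\) and \(\mathbf{U}_2\mathbf{U}_2^\top\) and sums their elementwise product. This is only the raw agreement term: the normalization and chance adjustment that turn it into the FARI are defined in Andrews et al. (2022) and are deliberately not reproduced here. The helper names comembership_agreement and one_hot are hypothetical.

```r
# Frobenius inner product of two co-membership matrices.
# `comembership_agreement` is a hypothetical helper; it is NOT the FARI itself.
comembership_agreement <- function(U1, U2) {
  stopifnot(nrow(U1) == nrow(U2))
  A <- tcrossprod(U1)    # A[i, j] = sum over clusters of U1[i, c] * U1[j, c]
  B <- tcrossprod(U2)
  sum(A * B)             # <A, B>_F
}

# one-hot encoding of hard labels, so hard partitions can be compared as well
one_hot <- function(z, k) diag(k)[z, , drop = FALSE]

U_a <- one_hot(c(1, 1, 2, 2), 2)
U_b <- one_hot(c(2, 2, 1, 1), 2)   # same partition, labels swapped
comembership_agreement(U_a, U_b)
#> [1] 8
```

For hard partitions the result counts the ordered pairs (including each observation paired with itself) that are co-clustered in both partitions, and it is unaffected by relabelling clusters, which is why co-membership matrices are a natural basis for comparing fuzzy partitions.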

Finally, we can visualize the clustering results, where observations are colored by their hard cluster labels and sized according to the membership matrix \(\mathbf{U}\):

resDF <- list(
 X = data$X, U = res$u, y = factor(res$cluster), k = 3
)
FuzzySpec::plot.fuzzy(resDF, plotFuzzy = TRUE, colorCluster = TRUE)

Adjacency Construction

See the respective help files for details on each function; here we give a basic overview of the arguments of make.adjacency(). This function allows flexible adjacency matrix constructions based on Ghashti et al. (2025). The parameters are as follows:

method: distance metric
- "eu": squared Euclidean distance
- "vw": variable-weighted distance using kernel density bandwidth estimation
isLocWeighted: scaling approach
- TRUE: locally-adaptive scaling (Zelnik-Manor & Perona, 2004)
- FALSE: global scaling with parameter sig
isModWeighted: apply similarity weightings
- ModMethod = "snn": shared nearest neighbors (Jarvis & Patrick, 1973)
- ModMethod = "sim": similarity-based weighting
- ModMethod = "both": combined SNN and SIM
isSparse: returns a sparse matrix when using weightings
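To give a feel for the shared nearest neighbor idea behind ModMethod = "snn", the following sketch counts, for every pair of observations, how many of their k nearest neighbors they have in common, in the spirit of Jarvis and Patrick (1973). It uses Euclidean distances, and the names snn_counts and k_nn are illustrative assumptions; how FuzzySpec combines such counts with the adjacency matrix is described in the package help files and in Ghashti et al. (2025), not here.

```r
# Shared-nearest-neighbor counts between all pairs of observations.
# `snn_counts` and `k_nn` are hypothetical names, not FuzzySpec API.
snn_counts <- function(X, k_nn = 5) {
  D <- as.matrix(dist(X))                  # pairwise Euclidean distances
  n <- nrow(D)
  stopifnot(k_nn < n)
  # N[i, j] = 1 if j is among the k_nn nearest neighbors of i (excluding i)
  N <- matrix(0, n, n)
  for (i in seq_len(n)) {
    nn <- setdiff(order(D[i, ]), i)[seq_len(k_nn)]
    N[i, nn] <- 1
  }
  S <- tcrossprod(N)     # S[i, j] = number of neighbors shared by i and j
  diag(S) <- 0           # ignore self-pairs
  S
}

set.seed(1)
X_snn <- matrix(rnorm(40), ncol = 2)       # 20 toy observations in 2D
S_demo <- snn_counts(X_snn, k_nn = 5)
range(S_demo)
```

Pairs with many shared neighbors lie in the same dense region, so weighting similarities by such counts tends to suppress edges that bridge separate clusters.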

References

Andrews, J.L., Browne, R. and C.D. Hvingelby (2022). On Assessments of Agreement Between Fuzzy Partitions. Journal of Classification, 39, 326–342.
J.C. Bezdek (1981). Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press, New York.
K. R. Coombes (2025). Thresher: Threshing and Reaping for Principal Components. R package version 1.1.5.
Ferraro, M.B., Giordani, P., and A. Serafini (2019). fclust: An R Package for Fuzzy Clustering. The R Journal, 11.
Ghashti, J. S., Hare, W., and J. R. J. Thompson (2025). Variable-weighted adjacency constructions for fuzzy spectral clustering. Submitted.
Jarvis, R. A., and A. E. Patrick (1973). Clustering using a similarity measure based on shared near neighbors. IEEE Transactions on Computers, 22(11), 1025-1034.
Hayfield, T., and J. S. Racine (2008). Nonparametric Econometrics: The np Package. Journal of Statistical Software 27(5).
McLachlan, G. and T. Krishnan (2008). The EM algorithm and extensions, Second Edition. John Wiley & Sons.
Ng, A., Jordan, M., and Y. Weiss (2001). On spectral clustering: Analysis and an algorithm. Advances in Neural Information Processing Systems, 14.
Scrucca, L., Fraley, C., Murphy, T.B., and A. E. Raftery (2023). Model-Based Clustering, Classification, and Density Estimation Using mclust in R. Chapman & Hall.
H. Wickham (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York.
Zelnik-Manor, L., and P. Perona (2004). Self-tuning spectral clustering. Advances in Neural Information Processing Systems, 17.
Zhu, Q., Feng, J., and J. Huang (2016). Natural neighbor: A self-adaptive neighborhood method without parameter K. Pattern Recognition Letters, 80, 30-36.