Questions tagged [unsupervised-learning]
Finding hidden (statistical) structure in unlabelled data, including clustering and feature extraction for dimensionality reduction.
709 questions
0
votes
0
answers
23
views
Modeling recurring monthly transactions with weekend-shift effects: DBSCAN vs rule-based temporal detection?
I have 3 months of categorized bank transaction data and need to identify recurring cash inflows and outflows for lending risk modeling.
Complications:
1. Income dates shift earlier when payday falls ...
0
votes
0
answers
27
views
How to identify and quantify main tendencies across participants from cluster membership heatmaps?
I'd appreciate your thoughts on the following problem.
I've created a heatmap plot (attached) showing the cluster membership ratio for each participant (in separate subplots) and condition (η).
Now, I'...
2
votes
3
answers
90
views
If a point is a marginal anomaly, should it be considered a joint anomaly no matter how mundane the other multivariate components are?
I envision a situation where multivariate data are observed and one observation of one variable seems way far away from any kind of expected behavior, say a value of $7$ for data assumed to be or ...
0
votes
0
answers
28
views
What is the interval of values of the CDbw index for clustering internal evaluation?
I'm currently studying the CDbw (Compose Density between and within clusters) index, which is a metric designed for internal clustering evaluation.
The original article of this index was published in ...
1
vote
1
answer
122
views
Pseudo label as ground truth?
I'm new to machine learning and currently working on new topic discovery and topic modelling in NLP.
If I have unlabeled survey responses that I want to categorise but don't know how, run an NMF ...
0
votes
0
answers
35
views
Rigorous books on unsupervised ML / latent variable modelling?
I'm looking for some rigorous book(s) on unsupervised machine learning, especially latent variable modelling (e.g., EM algorithm and various instances of it, state space models, filtering).
Time ...
Community wiki
0
votes
0
answers
41
views
Is analyzing test scores a clustering problem or an EDA problem?
I have a dataset of 28 personality assessment features, which measure personality attributes like Diligence or Sociability to determine performance in the corporate workplace. I'm tasked with ...
0
votes
0
answers
51
views
Calculating Standard Deviation of RMSE of an unsupervised algorithm
If there is an ML model, the standard deviation (SD) of the root mean squared error (RMSE) can be calculated using time series splits by fitting the model on different training sets and evaluating it ...
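The procedure described in this excerpt (fit on successive training windows, score each held-out block, then take the spread of the scores) can be sketched as follows. This is a minimal illustration under stated assumptions, not the asker's method: the expanding-window split, the least-squares line used as a stand-in model, and the synthetic series are all hypothetical choices made for the example.

```python
import math
import statistics


def expanding_window_splits(n_samples, n_splits):
    """Yield (train, test) index ranges; the training window grows over time."""
    fold = n_samples // (n_splits + 1)
    for k in range(1, n_splits + 1):
        train_end = k * fold
        # The last test block absorbs any leftover samples.
        test_end = train_end + fold if k < n_splits else n_samples
        yield range(train_end), range(train_end, test_end)


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x (stand-in for any model)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def rmse_per_split(xs, ys, n_splits=4):
    """Fit on each training window and return the RMSE on each test block."""
    scores = []
    for train, test in expanding_window_splits(len(xs), n_splits):
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        squared_errors = [(ys[i] - (a + b * xs[i])) ** 2 for i in test]
        scores.append(math.sqrt(statistics.fmean(squared_errors)))
    return scores


# Synthetic series (assumption): linear trend plus a small alternating term.
xs = list(range(50))
ys = [2.0 + 0.5 * x + ((-1) ** x) * 0.3 for x in xs]

scores = rmse_per_split(xs, ys)
rmse_sd = statistics.stdev(scores)  # sample SD of the per-split RMSE values
```

Each entry of `scores` comes from a model that never saw its test block, so `rmse_sd` summarizes how much the error varies from split to split.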
5
votes
2
answers
602
views
How can I use unsupervised methods to recommend an "ideal" number of managers for companies when no labels exist?
I have a dataset of around 100,000 companies. For each company, I have a bunch of features such as:
Number of employees,
Number of customers,
Number of complaints,
other additional company attributes ...
1
vote
1
answer
100
views
Dimension reduction on ordinal, related features with additional continuous features
I have what I think is a peculiar dataset that is a set of molecule features relating to a simple bead and spring molecular model. The raw molecule data is as follows
...
0
votes
0
answers
103
views
Finding Dependencies in Blackbox Way
Given a rank-3 tensor with dimensions $x, y, z$, where:
$x$: number of graphs (number of samples)
$y$: number of nodes (let's say $5$: $a, b, c, d,$ and $e$)
$z$: embedding dimension (e.g. $2$ for ...
9
votes
3
answers
1k
views
What is a good approach to show my data only belongs to one cluster?
I hope the question is not stupid, but after a long search I have not found a satisfactory answer. I have a question about how to proceed if I want to test whether my data is from just one cluster or ...
2
votes
1
answer
97
views
How does Barlow Twins avoid embeddings that differ by an affine transformation?
I am reading the Barlow Twins (BT) paper and just don't get how it can avoid the following scenario.
The BT loss is minimized when the cross-correlation matrix equals the identity matrix. A necessary ...
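For reference, the loss mentioned in this excerpt can be sketched in plain Python. This is a minimal illustration rather than the paper's implementation: the function names are hypothetical, the weight `lam` is an assumed value, and the four-sample batch is toy data. The embeddings of the two views are standardized per dimension over the batch, the cross-correlation matrix `C` is formed, and the loss penalizes deviations of the diagonal from 1 and of the off-diagonal entries from 0.

```python
import math


def standardize_columns(z):
    """Center and scale each embedding dimension over the batch."""
    n, d = len(z), len(z[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [row[j] for row in z]
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        for i in range(n):
            out[i][j] = (z[i][j] - mean) / std
    return out


def cross_correlation(z_a, z_b):
    """C[i][j]: correlation over the batch of dim i (view A) and dim j (view B)."""
    a, b = standardize_columns(z_a), standardize_columns(z_b)
    n, d = len(a), len(a[0])
    return [
        [sum(a[k][i] * b[k][j] for k in range(n)) / n for j in range(d)]
        for i in range(d)
    ]


def barlow_twins_loss(z_a, z_b, lam=5e-3):
    """Sum of (1 - C_ii)^2 plus lam times the sum of C_ij^2 for i != j."""
    c = cross_correlation(z_a, z_b)
    d = len(c)
    on_diag = sum((1.0 - c[i][i]) ** 2 for i in range(d))
    off_diag = sum(c[i][j] ** 2 for i in range(d) for j in range(d) if i != j)
    return on_diag + lam * off_diag


# Toy batch (assumption): two uncorrelated dimensions, identical in both views.
z = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
loss_identical = barlow_twins_loss(z, z)
```

With identical views and decorrelated dimensions, `C` is the identity matrix and the loss reaches its minimum of zero.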
6
votes
1
answer
249
views
Why is the loss not considered a "supervisory signal" in unsupervised learning?
It is said that supervised learning differs from unsupervised learning due to the presence of "supervisory signals", aka labels.
However, in both cases we have a loss function. Isn't the loss a ...
1
vote
0
answers
80
views
What if PCA is unable to group my samples, but K-means perfectly clusters them? Is there any problem with my data analysis? Is it possible? [closed]
I am not an expert, but I am currently using unsupervised methods to better explain my mass spectrometry data obtained via DART-MS analyses. I am still learning.
It turned out that when analyzing my ...