I am new to machine learning, so forgive me if I am doing something absolutely absurd.
I have a classification task (~100 classes) with about 2 million training data points in a 2000-dimensional space. The coordinates of the data points are integers (discrete), and each point has non-zero coordinates in fewer than 10 dimensions. That is, each point can be uniquely defined in a subspace of fewer than 10 dimensions.
If I use a Gaussian mixture model (GMM) for each class, I will end up with ~100 GMMs in a 2000-dimensional space. Given that each point is definable in fewer than 10 dimensions, I feel there should be a better way of doing this.
What am I missing here?
Why not do something like sparse PCA? What was your reasoning for using a Gaussian mixture? Does the data conform to normal distributions? Methods that explicitly account for the sparsity of your problem would likely far outperform it; a simple Gaussian mixture model may overfit. – Nicholas Mancuso, Mar 3, 2014 at 17:20
I was not aware of sparse models. GMMs and SVMs have been my choices for classification tasks because they were easy to implement when I started out. I will go through the literature on sparse models now. It would be nice if you could suggest some references for a beginner like me. Thank you. – hrs, Mar 3, 2014 at 17:43
1 Answer
Since your data are extremely sparse, using GMMs or a traditional SVM will result in an over-fit model. Methods that exploit the sparse structure should give you much better results. Such methods typically add a penalty term to the objective that measures the number (or size) of the non-zero coefficients; this is usually referred to as "regularization". Penalizing the exact number of non-zero coefficients (the $L_0$ norm) is computationally intractable, so relaxations are used in practice (lasso: $L_1,ドル ridge: $L_2$).
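To make the penalty idea concrete (my notation, not part of the original answer), a lasso-style objective for a linear model with per-example loss $\ell$ is
$$\min_{w}\; \sum_{i=1}^{n} \ell\!\left(y_i,\, w^\top x_i\right) \;+\; \lambda \lVert w \rVert_1,$$
where larger $\lambda$ drives more coefficients of $w$ to exactly zero, which is why the $L_1$ relaxation still yields sparse models.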
If you are comfortable with Python, scikit-learn has implementations of several approaches that use either lasso or ridge penalties, including linear SVMs trained with a lasso penalty. I would try that and see how well it works. These models take a parameter that controls the amount of regularization; use cross-validation to find the value that works best.
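Here is a minimal sketch of what an $L_1$-penalized linear SVM with a cross-validated regularization parameter could look like. This is my own illustration, not the example originally referenced; the toy data sizes and the grid of C values are assumptions, chosen to mimic the shape of the data in the question:

    # A minimal sketch (illustration only): an L1-penalized linear SVM on
    # sparse data, with cross-validation over the regularization strength C.
    # The data are a toy stand-in shaped like the question (sparse rows,
    # ~100 classes), not real data.
    import numpy as np
    from scipy.sparse import random as sparse_random
    from sklearn.svm import LinearSVC
    from sklearn.model_selection import GridSearchCV

    # Toy data: 5,000 sparse rows in 2,000 dimensions, ~5 non-zeros per row.
    X = sparse_random(5000, 2000, density=5 / 2000, format="csr", random_state=0)
    y = np.random.RandomState(0).randint(0, 100, size=5000)  # ~100 classes

    # penalty="l1" produces sparse weight vectors; dual=False is required
    # when using the L1 penalty with LinearSVC.
    svm = LinearSVC(penalty="l1", dual=False, max_iter=5000)

    # Smaller C means stronger regularization; choose it by cross-validation.
    search = GridSearchCV(svm, {"C": [0.01, 0.1, 1.0, 10.0]}, cv=3)
    search.fit(X, y)
    print("best C:", search.best_params_["C"])

The appeal of the L1 penalty here is that it zeroes out most of the 2000 weights per class, which matches the sparse structure of your data.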