I am using Scikit-learn to train a classification model. I have both discrete and continuous features in my training data.
I want to do feature selection using mutual information.
Features 1, 2, and 3 are discrete. To this end, I tried the code below:
mutual_info_classif(x, y, discrete_features=[1, 2, 3])
but it did not work; it gives me this error:
ValueError: could not convert string to float: 'INT'
samira: I have applied the code that W.P. McNeill proposed in stackoverflow.com/q/43643278, but it did not work either.

silgon: We need more information in order to be able to help you. It would be useful if you copied a simplified example of your code.

samira: This is my code:
from sklearn.feature_selection import mutual_info_classif
res_M_train = mutual_info_classif(data_train, Y_train, discrete_features=[1, 2, 3])
Thank you.

samira: My data looks like this:
[0.983874,tcp,http,FIN,10,8,816,1172,17.278635,62,252,5976.375,8342.53125,2,2,109.319333,124.932859,5929.211713,192.590406,255,794167371,1624757001,255,0.206572,0.108393,0.098179,82,147,1,184,2,1,1,1,1,2,0,0,1,1,3,0,]
As you can see, my first three features are categorical, and I want to calculate the mutual information of each feature:
from sklearn.feature_selection import mutual_info_classif
res_M_train = mutual_info_classif(data_train, Y_train, discrete_features=[1, 2, 3])
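The error can be reproduced with a small mixed-type array; the values below are made up to mirror the layout described in the question (column 0 continuous, columns 1-3 categorical strings). Passing such an array to mutual_info_classif fails because scikit-learn tries to convert the whole matrix to floats before the discrete_features indices are even considered:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

# Hypothetical mixed-type sample modeled on the data in the question:
# column 0 is continuous, columns 1-3 hold strings such as 'tcp', 'http', 'FIN'.
data_train = np.array([[0.98, 'tcp', 'http', 'FIN'],
                       [1.20, 'udp', 'dns', 'CON'],
                       [0.50, 'tcp', 'ssh', 'FIN']], dtype=object)
Y_train = np.array([0, 1, 0])

err = None
try:
    mutual_info_classif(data_train, Y_train, discrete_features=[1, 2, 3])
except ValueError as exc:
    err = exc
print(err)  # e.g. "could not convert string to float: 'tcp'"
```

This matches the error in the question: the string columns must be encoded as numbers first.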
3 Answers
A simple example with mutual_info_classif:

import numpy as np
from sklearn.feature_selection import mutual_info_classif

X = np.array([[0, 0, 0],
              [1, 1, 0],
              [2, 0, 1],
              [2, 0, 1],
              [2, 0, 1]])
y = np.array([0, 1, 2, 2, 1])

mutual_info_classif(X, y, discrete_features=True)
# result: array([0.67301167, 0.22314355, 0.39575279])
mutual_info_classif can only take numeric data. You need to do label encoding of the categorical features first, then run the same code:

from sklearn.preprocessing import LabelEncoder

x1 = x.apply(LabelEncoder().fit_transform)
mutual_info_classif(x1, y, discrete_features=[1, 2, 3])
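As a runnable sketch of this approach (the column names and values below are made up to mirror the question's layout), you can encode just the string columns so the continuous feature is left untouched:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.feature_selection import mutual_info_classif

# Hypothetical toy frame: column 0 continuous, columns 1-3 categorical.
x = pd.DataFrame({'dur':     [0.98, 1.20, 0.50, 0.75, 1.10, 0.60],
                  'proto':   ['tcp', 'udp', 'tcp', 'udp', 'tcp', 'udp'],
                  'service': ['http', 'dns', 'ssh', 'http', 'dns', 'ssh'],
                  'state':   ['FIN', 'CON', 'FIN', 'CON', 'FIN', 'CON']})
y = [0, 1, 0, 1, 0, 1]

# Encode only the string columns; the continuous 'dur' stays as-is.
x1 = x.copy()
for col in ['proto', 'service', 'state']:
    x1[col] = LabelEncoder().fit_transform(x1[col])

res = mutual_info_classif(x1, y, discrete_features=[1, 2, 3],
                          random_state=0)
print(res)  # one non-negative MI estimate per feature
```

Encoding column by column (rather than applying the encoder to the whole frame) avoids accidentally turning the continuous feature into arbitrary integer labels.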
LabelEncoder should be used to encode target values, i.e. y, and not the input X, so in this case it is a better option to use OrdinalEncoder. Note there is also a difference between 'discrete' and 'categorical': here the function simply demands numeric data. You can use an ordinal encoding if your features are ordinal; otherwise, use one-hot encoding for nominal features, e.g. with pd.get_dummies.
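Both options above can be sketched on a small made-up frame (the column names and values are hypothetical, mirroring the question's layout):

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder
from sklearn.feature_selection import mutual_info_classif

# Hypothetical frame: one continuous column, three nominal string columns.
x = pd.DataFrame({'dur':     [0.98, 1.20, 0.50, 0.75, 1.10, 0.60],
                  'proto':   ['tcp', 'udp', 'tcp', 'udp', 'tcp', 'udp'],
                  'service': ['http', 'dns', 'ssh', 'http', 'dns', 'ssh'],
                  'state':   ['FIN', 'CON', 'FIN', 'CON', 'FIN', 'CON']})
y = [0, 1, 0, 1, 0, 1]
cat_cols = ['proto', 'service', 'state']

# Option 1: OrdinalEncoder works column-wise on the feature matrix X
# (unlike LabelEncoder, which is meant for the target y).
x_ord = x.copy()
x_ord[cat_cols] = OrdinalEncoder().fit_transform(x[cat_cols])
res_ord = mutual_info_classif(x_ord, y, discrete_features=[1, 2, 3],
                              random_state=0)

# Option 2: one-hot encode the nominal features with pd.get_dummies;
# every resulting dummy column is discrete.
x_ohe = pd.get_dummies(x, columns=cat_cols)
discrete_cols = [i for i, c in enumerate(x_ohe.columns) if c != 'dur']
res_ohe = mutual_info_classif(x_ohe, y, discrete_features=discrete_cols,
                              random_state=0)
print(res_ord, res_ohe)
```

With one-hot encoding you get one MI estimate per dummy column rather than per original feature, which can be useful to see which individual category values carry the information.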