I get an error in the following code unless I do a fit on the SVC:
This SVC instance is not fitted yet. Call 'fit' with appropriate arguments before using this method.
Unless I do this:
clf = svm.SVC(kernel='linear', C=1).fit(X_train, y_train)
Why do I need to do a fit before doing cross-validation?
import numpy as np
from sklearn import datasets
from sklearn import model_selection  # replaces sklearn.cross_validation, removed in scikit-learn 0.20
from sklearn import svm
iris = datasets.load_iris()
# Split the iris data into train/test data sets with 40% reserved for testing
X_train, X_test, y_train, y_test = model_selection.train_test_split(iris.data, iris.target,
                                                                    test_size=0.4, random_state=0)
# Build an SVC model for predicting iris classifications using training data
clf = svm.SVC(kernel='linear', C=1).fit(X_train, y_train)
# Now measure its performance with the test data
clf.score(X_test, y_test)
# We give cross_val_score a model, the entire data set and its "real" values, and the number of folds:
scores = model_selection.cross_val_score(clf, iris.data, iris.target, cv=5)
2 Answers
You don't. Your cross_val_score runs fine without the fit.
You do need to call fit before calling clf.score, though.
You are seeing that error because you are asking your estimator (clf) to compute the accuracy of its classifications (with the clf.score method) before it knows how to classify anything. To teach clf how to classify, you have to train it by calling the fit method. This is what the error message is telling you.
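A minimal sketch of that failure and its fix, reusing the iris split from the question. It assumes a recent scikit-learn, where train_test_split lives in sklearn.model_selection and the error is raised as sklearn.exceptions.NotFittedError; the variable names raised and accuracy are only illustrative.

```python
from sklearn import datasets, svm
from sklearn.exceptions import NotFittedError
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.4, random_state=0)

# The estimator is created here, but not trained yet
clf = svm.SVC(kernel='linear', C=1)

# Scoring an unfitted estimator fails with the "not fitted yet" error
try:
    clf.score(X_test, y_test)
    raised = False
except NotFittedError as exc:
    raised = True
    print(exc)

# After fit, the same call returns the accuracy on the test data
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(accuracy)
```

The only difference between the failing call and the working one is the fit in between.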
score in the above sense has nothing to do with cross-validation; it only reports accuracy on the data you pass it. The cross_val_score helper you use can take an untrained estimator and compute a cross-validated score for your data. This helper trains the estimator for you, which is why you don't have to call fit before using it.
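As a rough illustration, the sketch below passes an estimator that was never fitted to cross_val_score. It assumes a recent scikit-learn (cross_val_score imported from sklearn.model_selection). The check on support_, an attribute SVC only gains once it has been fitted, is just one illustrative way to see that the helper trains copies of the estimator and leaves the object you passed in untrained.

```python
from sklearn import datasets, svm
from sklearn.model_selection import cross_val_score

iris = datasets.load_iris()

# Note: no call to fit anywhere in this script
clf = svm.SVC(kernel='linear', C=1)

# cross_val_score fits and scores the model once per fold
scores = cross_val_score(clf, iris.data, iris.target, cv=5)
print(scores)         # one accuracy value per fold
print(scores.mean())  # average accuracy across the folds

# The helper works on clones, so clf itself is still unfitted afterwards
still_unfitted = not hasattr(clf, "support_")
print(still_unfitted)
```

If you want a trained model to use for predictions afterwards, you still have to call clf.fit yourself.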
See the documentation for cross-validation for more information.