machine learning - Python scikits SVM grid search and classification -


I am a beginner with scikit-learn and SVMs, and I have a couple of questions. I have a sample of 700 items with 35 features, and there are 3 classes. I have an array X of samples and features, scaled using "preprocessing.scale(X)". The first step is to find suitable SVM parameters using grid search with nested cross-validation (see http://scikit-learn.org/stable/auto_examples/grid_search_digits.html#). I use all my samples (X) in the grid search. During the grid search, the data is split into training and testing sets (using StratifiedKFold). Once I have the SVM parameters, I perform the classification, dividing the data into training and testing sets. Is it OK to use the same data in the grid search that I use during the real classification?
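The setup described above can be sketched roughly as follows. This is a minimal, hypothetical example: the synthetic data (700 samples, 35 features, 3 classes) and the parameter grid values are assumptions standing in for the real dataset, and the imports use a current scikit-learn layout (`sklearn.model_selection`) rather than the older one the linked example may show.

```python
# Sketch of grid search with stratified cross-validation on scaled data.
# The data here is random noise -- an assumption in place of the real set.
import numpy as np
from sklearn import preprocessing
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, StratifiedKFold

rng = np.random.RandomState(0)
X = rng.randn(700, 35)                 # 700 items, 35 features
y = rng.randint(0, 3, size=700)        # 3 classes

X_scaled = preprocessing.scale(X)      # zero mean, unit variance per feature

# Hypothetical parameter grid; real values depend on the problem.
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}

cv = StratifiedKFold(n_splits=5)       # stratified splits inside the search
search = GridSearchCV(SVC(), param_grid, cv=cv)
search.fit(X_scaled, y)
print(search.best_params_)
```

Each candidate parameter combination is scored on held-out folds, so the search itself never evaluates a model on data it was fitted on.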

Is it OK to use the same data in the grid search that I use during the real classification?

It is OK to use that data for training (fitting) the classifier. Cross-validation, as done by StratifiedKFold, is intended for situations where you don't have enough data to hold out a validation set while optimizing hyperparameters (the algorithm settings). You can also use it if you're too lazy to write a validation set splitter and want to rely on scikit-learn's built-in cross-validation :)
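As a small illustration of cross-validation standing in for a held-out validation set, `cross_val_score` with StratifiedKFold scores the classifier only on folds it was not fitted on. The random data below is an assumption for demonstration.

```python
# Cross-validation scores a model on folds it never trained on,
# so no separate validation set is needed. Data here is synthetic.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score, StratifiedKFold

rng = np.random.RandomState(0)
X = rng.randn(700, 35)
y = rng.randint(0, 3, size=700)

scores = cross_val_score(SVC(C=1.0), X, y, cv=StratifiedKFold(n_splits=5))
print(scores.mean())   # average held-out accuracy across the 5 folds
```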

The refit option of GridSearchCV will retrain the estimator on the full training set after finding the optimal settings with cross-validation.
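A short sketch of the refit behavior, again on assumed synthetic data: with `refit=True` (the default), the best parameter setting is refitted on everything passed to `fit()`, and the result is exposed as `best_estimator_`, ready for prediction with no extra fitting step.

```python
# refit=True (the default): after the cross-validated search,
# GridSearchCV retrains the best setting on all of X.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

rng = np.random.RandomState(0)
X = rng.randn(200, 35)                 # synthetic stand-in data
y = rng.randint(0, 3, size=200)

search = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=3, refit=True)
search.fit(X, y)

# best_estimator_ is already fitted on all of X
preds = search.best_estimator_.predict(X)
```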

It is, however, senseless to apply the trained classifier to the data you grid searched or trained on, since you already have its labels. If you want a formal evaluation of the classifier, you should hold out a test set from the beginning and not touch it again until you've finished all your grid searching, validation and fitting.
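The recommended protocol can be sketched as below: split off a test set first, run the grid search only on the training portion, and score the test set exactly once at the end. The data and split ratio are assumptions for illustration.

```python
# Hold out a test set before any model selection; evaluate once at the end.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.RandomState(0)
X = rng.randn(700, 35)                 # synthetic stand-in data
y = rng.randint(0, 3, size=700)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

search = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=5)
search.fit(X_train, y_train)           # the search never sees the test set

score = search.score(X_test, y_test)   # one final, unbiased evaluation
print(score)
```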

