I am trying to solve a multi-class classification problem involving predicting the outcome of a football match (target variable = Win, Lose or Draw), with a dataset of 2280 rows covering six seasons of football data.
I have features with both numerical and categorical values (which I have encoded using one-hot encoding). The data is split into a train and test set, so that the test set is only the most recent season of data.
As this is my first machine learning project, I wanted to understand whether this overall process looks correct and whether there is anything I should be doing better or more optimally.
Splitting the data into train and test sets
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report, accuracy_score
from sklearn.naive_bayes import MultinomialNB
# Assign our target variable to label
label = match_df['FTR']
# Flatten the label array
y = np.ravel(label)
# Assign all columns expect the FTR column to the features variable
X = match_df.loc[:, match_df.columns != 'FTR']
# Split our data into training and testing sets
# We set shuffle to false as we want to keep the order of the matches in the data frame so we can use the 2022/2023 season as our test set
# Use a test size of 0.1665 as this will give us 380 test samples which is the same as the number of matches in a season
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1665, random_state=0, shuffle=False)
Testing our base models, then performing hyper parameter tuning
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import GridSearchCV, StratifiedKFold
# Try without normalization also try min max scaler
# Normalize the data set as we have several features with different data scales
scaler = MinMaxScaler()
# Fit the scaler to the training set and transform the training set
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Create SVM model
svm_model = SVC(random_state=0)
# Create KNN model
knn_model = KNeighborsClassifier()
# Create Naive Bayes model
nb_model = GaussianNB()
# Create a dictionary of the models
models = {'KNN':knn_model, 'SVM':svm_model, 'Naive_Bayes':nb_model}
# Create the StratifiedKFold object
skf = StratifiedKFold(n_splits=10)
# Train the base models and evaluate them using cross validation
for model_name, model in models.items():
    model.fit(X_train, y_train)
    scores = cross_val_score(model, X_train, y_train, cv=skf)
    print(f"Accuracy during cross validation for BASE {model_name}: {scores.mean()}")
# Perform hyper parameter tuning on each model using grid search
# Create a dictionary of hyper parameters for each model we want to tune
svm_parameters = {'kernel':['poly', 'rbf', 'linear'], 'C':[0.1, 1, 10, 100], 'gamma':['scale', 'auto', 0.1, 1], 'degree':list(range(1, 10))}
# For knn neighbor param, we make sure it is odd to prevent ties
knn_parameters = {'n_neighbors':[i for i in range(2, 31) if i % 2 != 0], 'weights':['uniform', 'distance'], 'algorithm':['auto', 'ball_tree', 'kd_tree', 'brute'],
                  'leaf_size':[i for i in range(1, 40)], 'p':[1, 2], 'metric':['minkowski', 'euclidean', 'manhattan']}
nb_parameters = {'var_smoothing':[1e-09, 1e-08, 1e-07, 1e-06, 1e-05]}
# Create a dictionary of the parameters
parameters = {'SVM':svm_parameters, 'KNN':knn_parameters, 'Naive_Bayes':nb_parameters}
# import scoring metrics
from sklearn.metrics import accuracy_score, balanced_accuracy_score, precision_score, recall_score, f1_score, roc_auc_score
from sklearn.metrics import make_scorer
scoring = {
    'accuracy': make_scorer(accuracy_score),
    'balanced_accuracy': make_scorer(balanced_accuracy_score),
    'precision': make_scorer(precision_score, average='macro'),
    'recall': make_scorer(recall_score, average='macro'),
    'f1': make_scorer(f1_score, average='macro')
}
# Loop through each model and perform hyper parameter tuning
for model_name, model in models.items():
    print(f"Performing hyper parameter tuning on {model_name}...")
    # Create a grid search object and fit it to the data to perform hyper parameter tuning
    search = GridSearchCV(estimator=model, param_grid=parameters[model_name], scoring=scoring, refit='accuracy', cv=skf, n_jobs=-1)
    # Fit the grid search object to the train data
    searchResults = search.fit(X_train, y_train)
    # Get the optimal hyper parameters and corresponding accuracy score
    print(f"Best parameters: {search.best_params_}, Best Score: {search.best_score_}")
    print("Evaluating the model on the test data...")
    bestModel = searchResults.best_estimator_
    print(bestModel)
    print(f"Test Score: {bestModel.score(X_test, y_test)}\n\n")
    # Fit the best parameters to the model
    models[model_name] = bestModel
Final test of our hyper parameter tuned models, displaying their confusion matrices
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns
for model_name, model in models.items():
    # Produce a confusion matrix for the final model
    conf_matrix = confusion_matrix(y_test, model.predict(X_test))
    # Plot the confusion matrix
    sns.heatmap(conf_matrix, annot=True, cmap='Blues')
    # Set our x, y labels and title
    plt.xlabel('Predicted labels')
    plt.ylabel('True labels')
    plt.title(f'Confusion Matrix for {model_name}')
    # Display the plot
    plt.show()
1 Answer
missing review context
label = match_df['FTR']
This line makes no sense, as it will produce "NameError: name 'match_df' is not defined"; we didn't define it in previous code such as the imports.
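For next time, a single line of setup would give reviewers that context. A minimal sketch, assuming the matches live in a CSV file (the actual data source and filename are not shown in the question):
import pandas as pd

# Hypothetical setup -- the question never shows where match_df comes from.
match_df = pd.read_csv('matches.csv')   # six seasons of matches, with an 'FTR' result column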
extra temp var
Maybe do it all in one go, since we do not later refer to label?
y = np.ravel(match_df['FTR'])
I do thank you for the helpful reminder that ravel() means "flatten".
(Some other comments, like "fit scaler ... transform", just say what the code says and could be elided.)
nit, typo: "expect" --> "except"
comment could be code
# Use a test size of 0.1665 as this will give us 380 test samples which is the same as the number of matches in a season
This is a helpful comment and I thank you for it.
(Oddly, the final digit is 5 rather than 7.)
It makes an assertion about how our data relates to the real world.
Assertions are more believable when they are code instead of prose. Usually comments start out being true, but then they bit-rot as the code changes and the comments don't keep up. Consider rephrasing this as
matches_per_season = 380
test_size = matches_per_season / len(y)
assert round(test_size, 4) == 0.1667
But wait! Perhaps confusingly, perhaps conveniently, train_test_split behaves differently according to whether the parameter is in the unit interval or is a large integer.
We could more clearly convey Author's Intent by simply saying
matches_per_season = 380
assert len(y) == 6 * matches_per_season # dataset covers six seasons
..., ..., ..., ... = train_test_split(X, y, test_size=matches_per_season, ... )
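As a quick sanity check of that claim, here is a sketch on synthetic data standing in for the match table (not part of your pipeline itself):
import numpy as np
from sklearn.model_selection import train_test_split

X_demo = np.arange(2280).reshape(-1, 1)
y_demo = np.zeros(2280)

# Fractional test_size: sklearn computes ceil(0.1665 * 2280) == 380 rows for us.
_, X_test_frac, _, _ = train_test_split(X_demo, y_demo, test_size=0.1665, shuffle=False)
# Integer test_size: "exactly this many rows", no arithmetic on our part.
_, X_test_int, _, _ = train_test_split(X_demo, y_demo, test_size=380, shuffle=False)

assert len(X_test_frac) == len(X_test_int) == 380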
magic number
skf = StratifiedKFold(n_splits=10)
Number of splits would have defaulted to 5, a natural fit for the five seasons of training data you're cross-validating on. Splitting into the {first, second} half of each season seems arbitrary, and worth a # comment.
On the bright side, at least we're using a multiple of five.
My maintenance concern is that someone may change this to, say, 8, and then observe a bigger effect than anticipated, puzzling them.
Had you shuffled the time series of scores up top, none of these concerns would be relevant here. But having decided to preserve the time series, that colors how we look at these subsequent pipeline stages.
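In the spirit of the "comment could be code" remark above, the choice could document itself. A sketch, assuming the intent really is one fold per training season (note that StratifiedKFold balances classes rather than respecting season boundaries, so folds will only be season-sized, not season-aligned):
from sklearn.model_selection import StratifiedKFold

seasons_in_train = 5                       # five seasons remain after holding out 2022/2023
skf = StratifiedKFold(n_splits=seasons_in_train)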
step parameter
# For knn neighbor param, we make sure it is odd to prevent ties
knn_parameters = {'n_neighbors':[i for i in range(2, 31) if i % 2 != 0]
The "tie breaking" comment is helpful.
Ending the range at odd 31 is unusual, given that the final i tested will be even 30, which is rejected. Starting at 2 is similarly unusual, and does not aid human cognition.
range takes a 3rd parameter. Prefer:
[i for i in range(3, 30, 2)]
which of course is simply
list(range(3, 30, 2))
The grid searching could be better motivated. As stated it looks like "throw stuff at the wall to see what sticks".
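For example, a smaller grid whose comments explain why each range was chosen reads less like wall-throwing. A sketch with hypothetical value choices, not a recommendation for these particular data:
# Separate sub-grids so 'degree' is only searched where it matters ('poly'),
# and C / gamma are swept on a coarse logarithmic scale first.
svm_parameters = [
    {'kernel': ['rbf'], 'C': [0.1, 1, 10, 100], 'gamma': ['scale', 0.01, 0.1, 1]},
    {'kernel': ['linear'], 'C': [0.1, 1, 10, 100]},
    {'kernel': ['poly'], 'C': [1, 10], 'degree': [2, 3], 'gamma': ['scale']},
]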
extract helpers
Each time you write a helpful comment like this:
# Loop through each model and perform hyper parameter tuning
it suggests that you might have written something like def tune_hyperparameters(). The biggest advantage of extracting such helpers is that all their local variables go out-of-scope when they return, thereby reducing coupling and the cognitive load that comes from juggling all those global variables.
Clearly showing what is fed into a function, and what it produces,
is helpful for future maintenance engineers.
It will also give you a place to add a """docstring""". And when a function morphs into doing something a little different, a function name that "lies" is more likely to be updated to tell the truth, compared with prose trying to tell the same story.
It also admits of unit testing.
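A sketch of what such a helper might look like, reusing names from the question (the exact signature and docstring are of course up to you):
from sklearn.model_selection import GridSearchCV

def tune_hyperparameters(model, param_grid, X_train, y_train, cv, scoring):
    """Grid-search `model` over `param_grid` and return the refit best estimator."""
    search = GridSearchCV(estimator=model, param_grid=param_grid,
                          scoring=scoring, refit='accuracy', cv=cv, n_jobs=-1)
    search.fit(X_train, y_train)
    print(f"Best parameters: {search.best_params_}, Best Score: {search.best_score_}")
    return search.best_estimator_

# The tuning loop then shrinks to:
# for model_name, model in models.items():
#     models[model_name] = tune_hyperparameters(model, parameters[model_name],
#                                               X_train, y_train, skf, scoring)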
choosing identifiers
searchResults = ...
bestModel = ...
PEP 8 asks that you spell these search_results and best_model.
We preserved causal (chronological) order among example rows, but it's not clear that any of the models benefit from that.
Kudos for labeling your axes!
This ML exercise hews pretty closely to standard textbook formats, and achieves its design goals.
I would be willing to delegate or assign maintenance tasks on this code.
- Thank you for the help, I will go ahead and adjust what you recommend. In terms of the overall process, does it seem correct? e.g. the way I am training and testing my code? There seem to be a million different answers and methods for structuring a machine learning project, and as I don't have much previous experience I'm not sure if I'm making a stupid mistake. Also, regarding the "magic number" section, are you suggesting a cross fold of 5 would be better in my scenario than 10? – pastybake2002, Feb 5, 2024 at 17:10
- I was saying the non-default setting wasn't motivated by anything in the Review Context nor by any comments in the code, so it had me scratching my head why 10 is somehow "better" than 5. // Does it mostly seem "correct", modulo various critiques? Yes, it does; it seems close to standard textbooks and web tutorials. In particular I didn't notice any data leakage of "test" rows into the "train" dataset, something I was nervous about as I reviewed the hyper-parameter tuning and scoring code. We could have better test/train separation if we had more helper functions --> fewer globals. – J_H, Feb 5, 2024 at 17:27