Logistic Regression for non linearly separable data

Question 1

The Iris data set consists of three classes, of which versicolor and virginica are not linearly separable from each other.

I constructed a subset containing only these two classes. Here is the code:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import numpy as np
iris = load_iris()
x_train = iris.data[50:]
y_train = iris.target[50:]
y_train = y_train - 1
x_train, x_test, y_train, y_test = train_test_split(
    x_train, y_train, test_size=0.33, random_state=2021)

Then I built a logistic regression model for this binary classification:

def sigmoid(z):
    s = 1 / (1 + np.exp(-z))
    return s

class LogisticRegression:
    def __init__(self, eta=.05, n_epoch=10, model_w=np.full(4, .5), model_b=.0):
        self.eta = eta
        self.n_epoch = n_epoch
        self.model_w = model_w
        self.model_b = model_b

    def activation(self, x):
        z = np.dot(x, self.model_w) + self.model_b
        return sigmoid(z)

    def predict(self, x):
        a = self.activation(x)
        if a >= 0.5:
            return 1
        else:
            return 0

    def update_weights(self, x, y, verbose=False):
        a = self.activation(x)
        dz = a - y
        self.model_w -= self.eta * dz * x
        self.model_b -= self.eta * dz

    def fit(self, x, y, verbose=False, seed=None):
        indices = np.arange(len(x))
        for i in range(self.n_epoch):
            n_iter = 0
            np.random.seed(seed)
            np.random.shuffle(indices)
            for idx in indices:
                if(self.predict(x[idx])!=y[idx]):
                    self.update_weights(x[idx], y[idx], verbose)
                else:
                    n_iter += 1
            if(n_iter==len(x)):
                print('model gets 100% train accuracy after {} epoch(s)'.format(i))
                break

I added the seed parameter for reproducibility.

import time
start_time = time.time()
w_mnist = np.full(4, .1)
classifier_mnist = LogisticRegression(.05, 1000, w_mnist)
classifier_mnist.fit(x_train, y_train, seed=0)
print('model trained {:.5f} s'.format(time.time() - start_time))
y_prediction = np.array(list(map(classifier_mnist.predict, x_train)))
acc = np.count_nonzero(y_prediction==y_train)
print('train accuracy {:.5f}'.format(acc/len(y_train)))
y_prediction = np.array(list(map(classifier_mnist.predict, x_test)))
acc = np.count_nonzero(y_prediction==y_test)
print('test accuracy {:.5f}'.format(acc/len(y_test)))

The accuracy is

train accuracy 0.95522
test accuracy 0.96970

The code is also available in my GitHub repo.

Question 2

This is a very nice little project, but there are some things to improve here :)

Code beautification

Split everything into functions. There is no reason to keep logic outside of a function, and that includes the prediction part (doing so removes the code duplication). Then call everything from a main function. For example, a loading function:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

def load_and_split_iris(data_cut: int = 50, train_test_ratio: float = 0.333):
    iris = load_iris()
    # Keep only versicolor and virginica, then relabel them as 0 and 1.
    x_train = iris.data[data_cut:]
    y_train = iris.target[data_cut:]
    y_train = y_train - 1
    x_train, x_test, y_train, y_test = train_test_split(
        x_train, y_train, test_size=train_test_ratio, random_state=2021)
    return x_train, x_test, y_train, y_test
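The same idea applies to the prediction part. As a minimal sketch (the helper name accuracy and the ThresholdStub stand-in classifier are hypothetical and only there to make the example self-contained), the duplicated train/test evaluation could collapse into one function that is called from main:

```python
import numpy as np


def accuracy(classifier, x: np.ndarray, y: np.ndarray) -> float:
    """Return the fraction of samples in x that classifier labels as y."""
    predictions = np.array([classifier.predict(sample) for sample in x])
    return float(np.mean(predictions == y))


class ThresholdStub:
    """Hypothetical stand-in for the LogisticRegression class from the question."""

    def predict(self, sample: np.ndarray) -> int:
        # Label 1 when the first feature is at least 0.5, otherwise 0.
        return 1 if sample[0] >= 0.5 else 0


def main() -> None:
    # Toy data, not the Iris subset; any object with a predict method works here.
    x = np.array([[0.1], [0.9], [0.7], [0.2]])
    y = np.array([0, 1, 0, 0])
    print(f'accuracy {accuracy(ThresholdStub(), x, y):.5f}')


if __name__ == '__main__':
    main()
```

With the real classifier, the two near-identical blocks for train and test accuracy would become two calls to the same helper.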

Magic numbers make the code harder to read; turn them into named constants (CODE_CONSTANTS).
I really like type annotations: they make the code easier to understand later, and you will not get confused about the types. I added them in the loading function example above. Another example: def fit(self, x: np.ndarray, y: np.ndarray, verbose: bool = False, seed: Optional[int] = None):. Type annotations can also declare the return type; read up on that.
String formatting: replace 'model gets 100% train accuracy after {} epoch(s)'.format(i) with the f-string f'model gets 100% train accuracy after {i} epoch(s)'.
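A short sketch of these three points together. The constant names and the helpers to_label and convergence_message are hypothetical, chosen only for illustration; the values come from the question's code:

```python
import numpy as np

# Named constants instead of magic numbers (names are illustrative assumptions).
FIRST_BINARY_SAMPLE = 50   # row where the versicolor/virginica samples start
TEST_SIZE = 0.33
RANDOM_STATE = 2021
DECISION_THRESHOLD = 0.5


def sigmoid(z: float) -> float:
    """Logistic function; the return annotation documents the output type."""
    return 1 / (1 + np.exp(-z))


def to_label(probability: float, threshold: float = DECISION_THRESHOLD) -> int:
    """Map a probability to a class label: 1 at or above the threshold, else 0."""
    return 1 if probability >= threshold else 0


def convergence_message(epoch: int) -> str:
    """Build the log line with an f-string instead of str.format."""
    return f'model gets 100% train accuracy after {epoch} epoch(s)'
```

Keeping the threshold in one constant means predict and any later evaluation code cannot drift apart.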

Bug

You reset the seed on every epoch (in LogisticRegression.fit). If you pass None this is fine (NumPy then reseeds from fresh OS entropy each time), but if you pass a specific seed, the generated random numbers will be the same each time you shuffle. Move the seed call outside of the loop.
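The effect can be seen in isolation with a small sketch. The helper drawn_permutations is hypothetical and not part of the reviewed class; it only records which permutation NumPy draws in each epoch under the two seeding strategies:

```python
import numpy as np


def drawn_permutations(n_samples, n_epoch, seed, reseed_each_epoch):
    """Collect the permutation drawn in each epoch (hypothetical helper)."""
    if not reseed_each_epoch:
        np.random.seed(seed)      # suggested fix: seed once, before the loop
    permutations = []
    for _ in range(n_epoch):
        if reseed_each_epoch:
            np.random.seed(seed)  # behaviour of the reviewed code: reseed every epoch
        permutations.append(np.random.permutation(n_samples))
    return permutations


reseeded = drawn_permutations(20, 3, seed=0, reseed_each_epoch=True)
seeded_once = drawn_permutations(20, 3, seed=0, reseed_each_epoch=False)

# Reseeding replays the same random draw in every epoch.
print(all(np.array_equal(reseeded[0], p) for p in reseeded))
# Seeding once keeps the run reproducible while the draws differ between epochs.
print(all(np.array_equal(seeded_once[0], p) for p in seeded_once))
```

An alternative that avoids touching NumPy's global state is to create a local generator with np.random.default_rng(seed) before the loop and call its shuffle method inside it.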

Future work

If you are looking to continue the work, I recommend trying to build a multiclass logistic regression.
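One possible starting point is softmax regression. The sketch below is an assumption about how such an extension might look, not something taken from the question: the class name SoftmaxRegression, its parameters, and the choice of full-batch gradient descent (rather than the per-sample, update-on-mistake scheme used in the binary model) are all illustrative.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    """Row-wise softmax; subtracting the row max keeps np.exp from overflowing."""
    shifted = z - z.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


class SoftmaxRegression:
    """Hypothetical multiclass counterpart of the binary model (batch gradient descent)."""

    def __init__(self, n_features: int, n_classes: int,
                 eta: float = 0.1, n_epoch: int = 500) -> None:
        self.eta = eta
        self.n_epoch = n_epoch
        self.w = np.zeros((n_features, n_classes))
        self.b = np.zeros(n_classes)

    def predict_proba(self, x: np.ndarray) -> np.ndarray:
        return softmax(x @ self.w + self.b)

    def predict(self, x: np.ndarray) -> np.ndarray:
        return self.predict_proba(x).argmax(axis=1)

    def fit(self, x: np.ndarray, y: np.ndarray) -> None:
        # y holds integer class labels 0 .. n_classes - 1.
        one_hot = np.eye(self.w.shape[1])[y]
        for _ in range(self.n_epoch):
            # Gradient of the mean cross-entropy w.r.t. the logits is (p - one_hot) / n.
            dz = (self.predict_proba(x) - one_hot) / len(x)
            self.w -= self.eta * (x.T @ dz)
            self.b -= self.eta * dz.sum(axis=0)
```

For two classes this reduces to the same a - y error term that update_weights already uses, so the existing code is a reasonable base to generalize from.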

Yonlif · Accepted Answer · 2021-08-06 11:59:48Z
