\$\begingroup\$

The Iris data set consists of three classes, of which versicolor and virginica are not linearly separable from each other.

I constructed a subset containing only these two classes. Here is the code:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import numpy as np
iris = load_iris()
x_train = iris.data[50:]
y_train = iris.target[50:]
y_train = y_train - 1
x_train, x_test, y_train, y_test = train_test_split(
    x_train, y_train, test_size=0.33, random_state=2021)

Then I built a logistic regression model for this binary classification:

def sigmoid(z):
    s = 1 / (1 + np.exp(-z))
    return s
class LogisticRegression:
    def __init__(self, eta=.05, n_epoch=10, model_w=np.full(4, .5), model_b=.0):
        self.eta = eta
        self.n_epoch = n_epoch
        self.model_w = model_w
        self.model_b = model_b
    def activation(self, x):
        z = np.dot(x, self.model_w) + self.model_b
        return sigmoid(z)

    def predict(self, x):
        a = self.activation(x)
        if a >= 0.5:
            return 1
        else:
            return 0
    def update_weights(self, x, y, verbose=False):
        a = self.activation(x)
        dz = a - y
        self.model_w -= self.eta * dz * x
        self.model_b -= self.eta * dz

    def fit(self, x, y, verbose=False, seed=None):
        indices = np.arange(len(x))
        for i in range(self.n_epoch):
            n_iter = 0
            np.random.seed(seed)
            np.random.shuffle(indices)
            for idx in indices:
                if(self.predict(x[idx])!=y[idx]):
                    self.update_weights(x[idx], y[idx], verbose)
                else:
                    n_iter += 1
            if(n_iter==len(x)):
                print('model gets 100% train accuracy after {} epoch(s)'.format(i))
                break

I added the seed parameter for reproducibility.

import time
start_time = time.time()
w_mnist = np.full(4, .1)
classifier_mnist = LogisticRegression(.05, 1000, w_mnist)
classifier_mnist.fit(x_train, y_train, seed=0)
print('model trained {:.5f} s'.format(time.time() - start_time))
y_prediction = np.array(list(map(classifier_mnist.predict, x_train)))
acc = np.count_nonzero(y_prediction==y_train)
print('train accuracy {:.5f}'.format(acc/len(y_train)))
y_prediction = np.array(list(map(classifier_mnist.predict, x_test)))
acc = np.count_nonzero(y_prediction==y_test)
print('test accuracy {:.5f}'.format(acc/len(y_test)))

The resulting accuracy is:

train accuracy 0.95522
test accuracy 0.96970

The link is my GitHub repo.

asked Aug 6, 2021 at 11:00
\$\endgroup\$

1 Answer

\$\begingroup\$

This is a very nice little project, but there are some things to improve here :)


Code beautification

  1. Split everything into functions. There is no reason to keep logic outside of a function, and that includes the prediction part (wrapping it in a function removes the code duplication). Then call everything from a main function. For example, a loading function:
def load_and_split_iris(data_cut: int = 50, train_test_ratio: float = 0.333):
    iris = load_iris()
    # Skip the first data_cut samples so only versicolor and virginica remain.
    x_train = iris.data[data_cut:]
    y_train = iris.target[data_cut:]
    # Shift the labels from {1, 2} to {0, 1} for binary classification.
    y_train = y_train - 1
    x_train, x_test, y_train, y_test = train_test_split(
        x_train, y_train, test_size=train_test_ratio, random_state=2021)
    return x_train, x_test, y_train, y_test
  2. Magic numbers make your code harder to read and maintain; turn them into named CODE_CONSTANTS.
  3. I really like type annotations. They make your code easier to understand later, and you will not mix up the types. I added them in the code example in point 1. Another example: def fit(self, x: np.ndarray, y: np.ndarray, verbose: bool = False, seed: Optional[int] = None): (with Optional imported from typing). Type annotations can also declare the return type; it is worth reading up on that.
  4. String formatting: take this, 'model gets 100% train accuracy after {} epoch(s)'.format(i), and turn it into an f-string: f'model gets 100% train accuracy after {i} epoch(s)'.
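The points in the list above could be combined along the following lines. This is only a sketch: `DECISION_THRESHOLD`, `accuracy`, `report`, and `toy_predict` are hypothetical names that do not appear in the original code, and `toy_predict` merely stands in for a trained classifier's `predict` method.

```python
from typing import Callable

import numpy as np

# A named constant instead of a magic number (hypothetical name).
DECISION_THRESHOLD = 0.5


def accuracy(predict: Callable[[np.ndarray], int],
             x: np.ndarray, y: np.ndarray) -> float:
    """Return the fraction of samples in x whose prediction matches y."""
    predictions = np.array([predict(sample) for sample in x])
    return float(np.count_nonzero(predictions == y)) / len(y)


def report(name: str, predict: Callable[[np.ndarray], int],
           x: np.ndarray, y: np.ndarray) -> str:
    """Format one accuracy line with an f-string."""
    return f'{name} accuracy {accuracy(predict, x, y):.5f}'


def main() -> None:
    # Stand-in for classifier.predict; an assumption made for illustration.
    def toy_predict(sample: np.ndarray) -> int:
        return int(sample.sum() >= DECISION_THRESHOLD)

    x = np.array([[0.0, 0.1], [0.4, 0.3], [0.2, 0.2], [1.0, 1.0]])
    y = np.array([0, 1, 1, 1])
    print(report('train', toy_predict, x, y))


if __name__ == '__main__':
    main()
```

With a helper like `accuracy`, the train and test evaluation become two calls to the same function instead of two copies of the same three lines.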

Bug

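You reset the seed on every epoch (inside the loop in LogisticRegression.fit). If you pass None this is fine, since the generator is then reseeded from a fresh source of entropy each time. If you pass a specific seed, however, the generator returns to the same state before every shuffle, so every shuffle draws the same random numbers and applies the same permutation. Move the seed call outside of the loop.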
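A small experiment can make the effect visible. The helper `epoch_orders` below is hypothetical (it is not part of the reviewed code); it only mimics the shuffling done in `fit` and records the order in which samples would be visited in each epoch, with the seed call either inside or outside the loop.

```python
from typing import List, Optional

import numpy as np


def epoch_orders(n_samples: int, n_epoch: int, seed: Optional[int],
                 reseed_each_epoch: bool) -> List[np.ndarray]:
    """Return the sample visiting order used in each epoch."""
    indices = np.arange(n_samples)
    orders = []
    if not reseed_each_epoch:
        np.random.seed(seed)  # suggested fix: seed once, before the loop
    for _ in range(n_epoch):
        if reseed_each_epoch:
            np.random.seed(seed)  # reviewed behaviour: reseed every epoch
        np.random.shuffle(indices)  # shuffles in place, like in fit
        orders.append(indices.copy())
    return orders


if __name__ == '__main__':
    print('reseeding inside the loop:')
    for order in epoch_orders(5, 3, seed=0, reseed_each_epoch=True):
        print(order)
    print('seeding once:')
    for order in epoch_orders(5, 3, seed=0, reseed_each_epoch=False):
        print(order)
```

Both variants are reproducible for a fixed seed, so seeding once keeps the benefit the `seed` parameter was added for, while letting each epoch draw fresh random numbers.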

Future work

If you want to continue this work, I recommend trying to implement multiclass logistic regression.
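One possible starting point is softmax regression, sketched below. This is an assumption about what a multiclass version could look like, not a method taken from the question: the class name `SoftmaxRegression` is hypothetical, and it uses full-batch gradient descent on the cross-entropy loss rather than the per-sample, update-on-mistake rule of the reviewed code.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = z - z.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


class SoftmaxRegression:
    """Minimal multiclass logistic regression (hypothetical sketch)."""

    def __init__(self, n_features: int, n_classes: int,
                 eta: float = 0.05, n_epoch: int = 100) -> None:
        self.eta = eta
        self.n_epoch = n_epoch
        self.w = np.zeros((n_features, n_classes))
        self.b = np.zeros(n_classes)

    def predict_proba(self, x: np.ndarray) -> np.ndarray:
        # One probability per class for each row of x.
        return softmax(x @ self.w + self.b)

    def predict(self, x: np.ndarray) -> np.ndarray:
        # The predicted class is the one with the highest probability.
        return np.argmax(self.predict_proba(x), axis=-1)

    def fit(self, x: np.ndarray, y: np.ndarray) -> None:
        one_hot = np.eye(self.w.shape[1])[y]
        for _ in range(self.n_epoch):
            # Gradient of the mean cross-entropy with respect to the logits.
            dz = (self.predict_proba(x) - one_hot) / len(x)
            self.w -= self.eta * (x.T @ dz)
            self.b -= self.eta * dz.sum(axis=0)


if __name__ == '__main__':
    # Toy data, invented for illustration: one feature is active per class.
    x = np.array([[3.0, 0, 0], [2.0, 0, 0], [0, 3.0, 0],
                  [0, 2.0, 0], [0, 0, 3.0], [0, 0, 2.0]])
    y = np.array([0, 0, 1, 1, 2, 2])
    model = SoftmaxRegression(n_features=3, n_classes=3)
    model.fit(x, y)
    print(model.predict(x))
```

For the full Iris data set this would be constructed with `n_features=4` and `n_classes=3`. With two classes, softmax regression is equivalent to the sigmoid model in the question.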

answered Aug 6, 2021 at 11:59
\$\endgroup\$
