I am a complete newbie to ML with scikit-learn. I just want this script to work after all the time I spent learning what ML is, its types, and so on.


from sklearn import tree
import pandas as pd
import numpy as np
df = pd.read_csv('test.csv')
age = df.Age.to_list()
age = np.array(age).reshape(-1,1)
inc = df.Income.to_list()
inc = np.array(inc).reshape(-1,1)
stud = df.Student.to_list()
stud = np.array(stud).reshape(-1,1)
buy = df.Buy.to_list()
buy = np.array(buy).reshape(-1,1)
X = [age,inc,stud]
y = [[buy]]
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X, y)
'''
Income:
1 - high
2 - medium
3 - low
Student:
1 - yes
2 - no
'''
age = 34
inc = 1
stud = 2
pred = clf.predict(age,ince,stud)
print(pred)

But I get this error:

Traceback (most recent call last):
  File "D:\Huzefa\Desktop\ML.py", line 23, in <module>
    clf = clf.fit(X, y)
  File "C:\Users\Huzefa\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\tree_classes.py", line 894, in fit
    X_idx_sorted=X_idx_sorted)
  File "C:\Users\Huzefa\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\tree_classes.py", line 158, in fit
    check_y_params))
  File "C:\Users\Huzefa\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\base.py", line 429, in _validate_data
    X = check_array(X, **check_X_params)
  File "C:\Users\Huzefa\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\utils\validation.py", line 73, in inner_f
    return f(**kwargs)
  File "C:\Users\Huzefa\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\utils\validation.py", line 642, in check_array
    % (array.ndim, estimator_name))
ValueError: Found array with dim 3. Estimator expected <= 2.

If I could just correct my script to make it work, I would be motivated to continue further with ML. All help is greatly appreciated!

asked Jun 15, 2020 at 6:37
  • Try - pred = clf.predict([age,ince,stud]) Commented Jun 15, 2020 at 6:39
  • Thanks for your answer, bro! But it won't work. Still the same error. Commented Jun 15, 2020 at 6:46
  • buy is already a list, so when you define y, do it as y = [buy] or y = np.array(buy). Commented Jun 15, 2020 at 13:42

1 Answer

The way you're defining your X and y seems overcomplicated to me. Is there a specific reason behind that choice? You could also do the following:

X = df[["Age","Income","Student"]]
y = df.Buy
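
To see why the original fit call fails and why the column selection above avoids it, here is a minimal sketch. The small inline DataFrame is made-up data standing in for test.csv (whose contents are not shown in the question); only the column names are taken from the question.

```python
import numpy as np
import pandas as pd
from sklearn import tree

# Hypothetical stand-in for test.csv; the values are invented for illustration.
df = pd.DataFrame({
    "Age":     [25, 34, 45, 52, 23, 40],
    "Income":  [1, 2, 3, 1, 2, 3],   # 1 = high, 2 = medium, 3 = low
    "Student": [1, 2, 2, 1, 1, 2],   # 1 = yes, 2 = no
    "Buy":     [1, 0, 0, 1, 1, 0],
})

# What the question builds: a list of three (n, 1) columns is read as a 3-D array,
# which is what triggers "Found array with dim 3".
age = df.Age.to_numpy().reshape(-1, 1)
inc = df.Income.to_numpy().reshape(-1, 1)
stud = df.Student.to_numpy().reshape(-1, 1)
bad_X = np.array([age, inc, stud])
print(bad_X.shape)  # (3, 6, 1)

# What scikit-learn expects: one row per sample, one column per feature.
X = df[["Age", "Income", "Student"]]
y = df["Buy"]
print(X.shape, y.shape)  # (6, 3) (6,)

clf = tree.DecisionTreeClassifier(random_state=0)
clf.fit(X, y)

# predict also takes a 2-D input: a single row holding the three feature values.
new_sample = pd.DataFrame([[34, 1, 2]], columns=["Age", "Income", "Student"])
pred = clf.predict(new_sample)
print(pred)
```

Note that predict receives one 2-D object (a single row with three columns), not three separate arguments as in the question's last lines.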

Also, by doing

clf = clf.fit(X, y)

you're training your decision tree on all the data available. If this is a training dataset and you have a test dataset stored elsewhere, that's okay. If not, you need to split the data first, so that you can train the model on one part AND evaluate how well it performs on the part it has not seen. train_test_split (from sklearn.model_selection) is a useful function for this.
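
As a rough sketch of that split, assuming the same column names as in the question and a made-up DataFrame in place of test.csv, it could look like the following. The 25% test size and the fixed random_state are arbitrary choices for illustration, not recommendations.

```python
import pandas as pd
from sklearn import tree
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for test.csv; the values are invented for illustration.
df = pd.DataFrame({
    "Age":     [25, 34, 45, 52, 23, 40, 31, 60, 28, 37, 48, 22],
    "Income":  [1, 2, 3, 1, 2, 3, 1, 3, 2, 1, 2, 3],
    "Student": [1, 2, 2, 1, 1, 2, 1, 2, 1, 2, 2, 1],
    "Buy":     [1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1],
})

X = df[["Age", "Income", "Student"]]
y = df["Buy"]

# Hold back 25% of the rows for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Train only on the training part.
clf = tree.DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)

# Evaluate on the rows the model has never seen.
y_pred = clf.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print(len(X_train), len(X_test))  # 9 3
print(acc)
```

With a dataset this small the accuracy figure is very noisy, so treat it as a check that the pipeline runs rather than as a measure of model quality.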

answered Jun 16, 2020 at 12:51

2 Comments

Can you help me with the "train_test_split" function? Actually, I am pretty new to ML and am struggling with the basics.
@HuzefaUsama You're welcome! If you have new questions, I encourage you to make other posts. It'll make it easier for people to answer you.
