I'm trying to encode the non-numeric columns of a pandas DataFrame as numeric values. I'm using:

import numpy as np
from sklearn.preprocessing import LabelEncoder

# df is an existing pandas DataFrame
df = df.fillna('0')
# Random 80/20 train/test split
msk = np.random.rand(len(df)) < 0.8
df_train = df[msk]
df_test = df[~msk]
columns_to_encode = df.select_dtypes(exclude=[np.number]).columns
encoder_dict = {col: LabelEncoder() for col in columns_to_encode}
df_train_enc = df_train
df_test_enc = df_test
for col in columns_to_encode:
    encoder_dict[col].fit_transform(df_train_enc[col])

This, however, raises TypeError: '<' not supported between instances of 'str' and 'float'. What am I missing here? I thought LabelEncoder should be able to transform strings to numeric values.

asked Apr 13, 2018 at 10:53
  • You might have nan values in your data, see: stackoverflow.com/q/43956705/4121573 Commented Apr 13, 2018 at 10:57
  • I don't, see updated post! Commented Apr 13, 2018 at 10:59

1 Answer

LabelEncoder handles string labels without an issue, but it sorts the unique values while fitting, and Python 3 cannot compare a str with a float. So if a column holds mixed types (due to missing values, for example), cast it to str before encoding:

for col in columns_to_encode:
    # Cast to str so every value in the column has the same, sortable type.
    encoder_dict[col].fit_transform(df_train_enc[col].astype(str))
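
For illustration, here is a minimal end-to-end sketch of the cast on a toy frame. The frame and its column names (`city`, `price`) are made up for this example and are not from the question's data. It first confirms that the mixed-type column fails without the cast, then stores the encoded result, since `fit_transform` returns the codes rather than modifying the column in place.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy frame: the object column "city" mixes str and float values.
df = pd.DataFrame({
    "city": ["Oslo", 3.5, "Bergen", "Oslo"],
    "price": [1.0, 2.0, 3.0, 4.0],
})

columns_to_encode = df.select_dtypes(exclude=[np.number]).columns
encoder_dict = {col: LabelEncoder() for col in columns_to_encode}

# Without the cast, sorting the unique labels compares str with float and fails.
try:
    LabelEncoder().fit_transform(df["city"])
    raised = False
except TypeError:
    raised = True

# With the cast, every label is a str; assign the returned codes back.
df_enc = df.copy()
for col in columns_to_encode:
    df_enc[col] = encoder_dict[col].fit_transform(df_enc[col].astype(str))
```

Each fitted encoder stays in `encoder_dict`, so `encoder_dict[col].classes_` can be used afterwards to map the integer codes back to the original labels.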
answered Apr 13, 2018 at 10:58

2 Comments

Did you try astype(str)?
Yes, that helped, will accept in 7 mins. Thank you!
