Is the following an acceptable way to convert an object column to numeric when its cardinality exceeds a threshold (cardinality_threshold)? I think it would also be good to check whether the column values are actually numeric, though. Thanks!

import pandas as pd

data = {
    'col1': ['1', '2', '3'],
    'col2': ['1', '1', '2'],
}
df = pd.DataFrame(data)
df
cardinality_threshold = 2
cols = [i for i in df.columns]
for i in cols:
    if(len(df[i].value_counts()) > cardinality_threshold) :
        df[i] = pd.to_numeric(df[i]) # , errors = 'coerce'

df.info()  # info() prints its summary itself and returns None
asked Mar 30, 2022 at 13:45

1 Answer

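CARDINALITY_THRESHOLD = 2
# Columns whose number of distinct non-null values exceeds the threshold
breached = df.columns[df.nunique() > CARDINALITY_THRESHOLD]
# Convert only those columns; values that cannot be parsed become NaN
df[breached] = df[breached].apply(pd.to_numeric, errors='coerce')
>>> df.dtypes
# col1 int64
# col2 object
# dtype: object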
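The answer swaps the question's per-column len(df[i].value_counts()) for a single df.nunique() call. A minimal sketch, reusing the question's sample data, that checks both approaches produce the same distinct-value counts (both ignore NaN by default). The names loop_counts and vector_counts are just illustrative.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    "col1": ["1", "2", "3"],
    "col2": ["1", "1", "2"],
})

# Loop form from the question: distinct values per column via value_counts().
loop_counts = {col: len(df[col].value_counts()) for col in df.columns}

# Vectorised form from the answer: nunique() over every column at once.
vector_counts = df.nunique().to_dict()

# Both count distinct non-null values, so the results should agree.
print(loop_counts == vector_counts)  # True
```

Because nunique() returns a Series indexed by column name, the boolean mask df.nunique() > CARDINALITY_THRESHOLD can index df.columns directly, which is what removes the explicit loop.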

> I think it would be good to check if column values are numeric though.

Note that to_numeric returns columns that are already numeric unchanged, so the simplest approach is to let pandas handle them.
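A small sketch of the to_numeric behaviour the answer relies on: an already-numeric Series passes through with the same dtype and values, while errors='coerce' turns strings that cannot be parsed into NaN instead of raising. The sample Series are made up for illustration.

```python
import pandas as pd

numeric = pd.Series([1, 2, 3])
strings = pd.Series(["1", "2", "x"])

# Already-numeric input comes back with the same dtype and values.
unchanged = pd.to_numeric(numeric)
print(unchanged.equals(numeric))  # True

# With errors="coerce", "x" cannot be parsed and becomes NaN,
# which also forces the result to float64.
coerced = pd.to_numeric(strings, errors="coerce")
print(coerced.tolist())  # [1.0, 2.0, nan]
```

Without errors="coerce" (the default is errors="raise"), the second call would raise a ValueError on "x", which is the behaviour the question's loop has as written.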

If you still want to explicitly exclude numeric columns:

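# High-cardinality columns, as before
breached = df.columns[df.nunique() > CARDINALITY_THRESHOLD]
# Columns that are not already numeric
non_numeric = df.select_dtypes(exclude='number').columns
# Convert only columns that are both high-cardinality and non-numeric
cols = non_numeric.intersection(breached)
df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')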
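If "check whether the values are numeric" means "only convert a column when its strings actually parse as numbers", one possible sketch (an assumption about the intent, not part of the answer above) is to coerce each candidate column and keep the result only when no non-null value turned into NaN. The helper convert_numeric_like and the extra col3 are hypothetical.

```python
import pandas as pd


def convert_numeric_like(df: pd.DataFrame, threshold: int) -> pd.DataFrame:
    """Hypothetical helper: convert high-cardinality, non-numeric columns to
    numbers, but only when every non-null value in the column parses."""
    out = df.copy()  # leave the caller's frame untouched
    breached = out.columns[out.nunique() > threshold]
    candidates = out.select_dtypes(exclude="number").columns.intersection(breached)
    for col in candidates:
        converted = pd.to_numeric(out[col], errors="coerce")
        # A value that was present but became NaN did not parse as a number.
        failed = converted.isna() & out[col].notna()
        if not failed.any():
            out[col] = converted
    return out


df = pd.DataFrame({
    "col1": ["1", "2", "3"],  # high cardinality, numeric strings: converted
    "col2": ["1", "1", "2"],  # below the threshold: left alone
    "col3": ["a", "b", "c"],  # high cardinality, not numeric: left alone
})
result = convert_numeric_like(df, threshold=2)
print(result.dtypes)
```

The design choice here is to refuse a partial conversion rather than silently fill a column with NaN, which is what errors='coerce' alone would do to col3.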
answered Mar 31, 2022 at 4:31