Best practice for cleaning Pandas dataframe columns

Asked 4 years, 11 months ago

Viewed 140 times

\$\begingroup\$

I am cleaning str columns in a Pandas dataframe (see below for an example), and I have been wondering if there are more concise ways or additional inplace methods to do so. What are the general best practices for cleaning columns in Pandas?

import pandas as pd
df = pd.DataFrame.from_dict({"col1": [0, 1, 2, 3], "col2": ["abcd efg", ".%ues", "t12 ^&3", "yupe"]})
df["col2"] = df["col2"].str.lower()
df["col2"] = df["col2"].str.strip()
df["col2"].replace(to_replace="[^a-zA-Z ]", value="", regex=True, inplace=True)

edited Sep 29, 2020 at 9:20

Ben A

asked Sep 29, 2020 at 7:32

p.vitzliputzli

\$\endgroup\$

1 Answer

\$\begingroup\$

This is not too bad. It's a good thing that you use keyword arguments for the replace method.

I always try to keep my original data untouched, and continue working with a separate, cleaned dataframe.

fluent style

Leaving the original untouched lends itself very well to a fluent (method-chaining) style. I use it too, along with a lot of df.assign, df.pipe, df.query...

In this example I would do something like:

df_cleaned = df.assign(
    col2=(
        df["col2"]
        .str.lower()
        .str.strip()
        .replace(to_replace="[^a-zA-Z ]", value="", regex=True)
    )
)

So definitely no inplace replacements: they modify the original dataframe instead of returning a cleaned copy.
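The other methods mentioned above, df.pipe and df.query, slot into the same kind of chain. Here is a minimal sketch of how that could look. The helper names clean_text and clean_columns are hypothetical, and the col1 > 0 filter is only there to show df.query. The sketch also strips after the regex replacement, so that a trailing space left behind by removed characters (as in "t12 ^&3") is trimmed too; that ordering is an assumption of this sketch and differs from the snippet above.

```python
import pandas as pd


def clean_text(series: pd.Series) -> pd.Series:
    """Hypothetical helper: lowercase, drop everything but letters and spaces, then trim."""
    return (
        series
        .str.lower()
        .str.replace("[^a-z ]", "", regex=True)
        .str.strip()
    )


def clean_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a new frame with col2 cleaned; the input is not modified."""
    return frame.assign(col2=clean_text(frame["col2"]))


df = pd.DataFrame(
    {"col1": [0, 1, 2, 3], "col2": ["abcd efg", ".%ues", "t12 ^&3", "yupe"]}
)

df_cleaned = (
    df
    .pipe(clean_columns)        # a whole cleaning step as one link in the chain
    .query("col1 > 0")          # illustrative row filter, still inside the chain
    .reset_index(drop=True)     # renumber the remaining rows from 0
)
```

Because every step returns a new object, df still holds the raw strings afterwards, and each helper can be tested on its own.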

edited Sep 29, 2020 at 14:32

answered Sep 29, 2020 at 11:19

Maarten Fabré

\$\endgroup\$
