\$\begingroup\$

I am cleaning str columns in a Pandas dataframe (see below for an example), and I have been wondering if there are more concise ways or additional inplace methods to do so. What are the general best practices for cleaning columns in Pandas?

import pandas as pd
df = pd.DataFrame.from_dict({"col1": [0, 1, 2, 3], "col2": ["abcd efg", ".%ues", "t12 ^&3", "yupe"]})
df["col2"] = df["col2"].str.lower()
df["col2"] = df["col2"].str.strip()
df["col2"].replace(to_replace="[^a-zA-Z ]", value="", regex=True, inplace=True)
Ben A
asked Sep 29, 2020 at 7:32
\$\endgroup\$

1 Answer

\$\begingroup\$

This is not too bad. It's good that you use keyword arguments for the replace method.

I always try to keep my original data in its original state, and continue with the cleaned dataframe.
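
As a minimal sketch of that habit (the frame name `raw` and its sample values are made up for illustration), the cleaning below is done on a copy, so the raw data stays available for comparison:

```python
import pandas as pd

# Hypothetical raw data, used only for illustration.
raw = pd.DataFrame({"col1": [0, 1], "col2": [" ABcd ", "Yupe"]})

# The .str methods return a new Series; `raw` itself is not modified.
cleaned_col = raw["col2"].str.lower().str.strip()

# Put the result in a separate frame instead of overwriting the raw one.
cleaned = raw.copy()
cleaned["col2"] = cleaned_col

print(raw["col2"].tolist())
print(cleaned["col2"].tolist())
```

Keeping both frames around makes it easy to spot-check what the cleaning step actually changed.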

fluent style

This lends itself very well to a fluent, method-chaining style. I use it too, and rely a lot on df.assign, df.pipe, df.query...

In this example I would do something like

df_cleaned = df.assign(
    col2=(
        df["col2"]
        .str.lower()
        .str.strip()
        .replace(to_replace="[^a-zA-Z ]", value="", regex=True)
    )
)

So definitely no inplace replacements.
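
If the same cleaning is needed for several columns or frames, one possible way to go is to wrap the chain in a small function and apply it with `df.pipe`. The sketch below is only an illustration: the helper name `clean_text_column` and the `col1 > 0` filter are assumptions, not something from the question.

```python
import pandas as pd


def clean_text_column(frame: pd.DataFrame, column: str) -> pd.DataFrame:
    """Hypothetical helper: return a new frame with `column` lower-cased,
    stripped, and reduced to ASCII letters and spaces."""
    return frame.assign(
        **{
            column: frame[column]
            .str.lower()
            .str.strip()
            .replace(to_replace="[^a-zA-Z ]", value="", regex=True)
        }
    )


df = pd.DataFrame(
    {"col1": [0, 1, 2, 3], "col2": ["abcd efg", ".%ues", "t12 ^&3", "yupe"]}
)

df_cleaned = (
    df
    .pipe(clean_text_column, column="col2")
    .query("col1 > 0")  # illustrative filter only, not part of the question
)
```

Every step returns a new object, so `df` keeps its original values. Note that because `.str.strip()` runs before the regex replacement, a value such as `"t12 ^&3"` ends up as `"t "` with a trailing space; moving the strip to the end of the chain would avoid that.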

answered Sep 29, 2020 at 11:19
\$\endgroup\$
