The code:
excluded_columns = ['sort_id', 'weekday', 'sort']
for pct_col in sort_df.columns:
if pct_col in excluded_columns:
continue
else:
sort_df[pct_col + '_pct'] = (sort_df[pct_col] / sort_df['volume'] * 100)
There are no specific concerns I have with this code, it just looks off. I know pandas is quite liberal when it comes to how you choose to achieve the desired outcome, but I just get a funky smell from this code. How could it be more pythonic/pandonic?
-
1\$\begingroup\$ This code doesn't run. Your brackets are imbalanced. Also, it's probably too small of a snippet to review. \$\endgroup\$Reinderien– Reinderien2022年05月06日 18:53:52 +00:00Commented May 6, 2022 at 18:53
-
\$\begingroup\$ Fixed missing bracket. Too small to review? I'm just asking for the method of operating on the dataframe to be reviewed. Should I post the head of the dataframe? \$\endgroup\$rangeseeker– rangeseeker2022年05月06日 19:04:45 +00:00Commented May 6, 2022 at 19:04
-
2\$\begingroup\$ The head of the dataframe would help, but really you need to show your whole program for this to be reviewable in context. \$\endgroup\$Reinderien– Reinderien2022年05月06日 19:06:18 +00:00Commented May 6, 2022 at 19:06
-
3\$\begingroup\$ Posting here is perfectly fine! And it may be difficult to find a more appropriate place on StackExchange, quite honestly. However, this code that you've shown may make sense in some call contexts and far less sense in others. For you to get the most value out of a review, you really should show more context. \$\endgroup\$Reinderien– Reinderien2022年05月06日 19:23:45 +00:00Commented May 6, 2022 at 19:23
-
2\$\begingroup\$ Some possible input and the expected corresponding output would help reviewers (or better, some of your actual unit tests of the code). \$\endgroup\$Toby Speight– Toby Speight2022年05月06日 20:02:02 +00:00Commented May 6, 2022 at 20:02
2 Answers 2
Don't loop. Pandas is designed to be vectorised, which means that 99% of the time when you imagine an operation to require a loop, that loop should be baked into some Numpy or Pandas call rather than you writing it out.
In your case, let's invent a raft of totally bogus data since you haven't shown anything real. We then exclude columns via a negated isin
, vectorise-divide via div
, and concatenate to a new dataframe:
import pandas as pd
from numpy.random import default_rng
rand = default_rng(seed=0)
sort_df = pd.DataFrame(
rand.uniform(low=0, high=100, size=(100, 7)),
columns=(
'sort_id', 'weekday', 'sort', 'volume', 'bananas', 'pears', 'apples',
)
)
excluded_cols = {'sort_id', 'weekday', 'sort', 'volume'}
to_divide = sort_df.loc[:, ~sort_df.columns.isin(excluded_cols)].add_suffix('_pct')
percents = to_divide.div(sort_df.volume/100, axis='rows')
sort_df = pd.concat((sort_df, percents), axis='columns')
It's likely that volume
missing from excluded_cols
is an oversight, since it doesn't make sense to divide it by itself and always get 100%.
-
1\$\begingroup\$ Sorry, I got busy and forgot to post context. This is the perfect answer, very enlightening. Thanks so much. \$\endgroup\$rangeseeker– rangeseeker2022年05月07日 18:04:31 +00:00Commented May 7, 2022 at 18:04
The content of the loop is of the form
if condition:
continue
else:
action()
It would probably be easier to read as
if not condition:
action()
excluded_columns = ['sort_id', 'weekday', 'sort']
for pct_col in sort_df.columns:
if pct_col not in excluded_columns:
sort_df[pct_col + '_pct'] = sort_df[pct_col] / sort_df['volume'] * 100
You might want to consider just looping over the set difference of sort_df.columns
and excluded_columns
, which would eliminate the if
inside the loop. That's certainly valuable if that set of columns will be re-used later (which we don't know - that's why a review of such a short fragment of code can be problematic).
-
\$\begingroup\$ Loops are not called for here. \$\endgroup\$Reinderien– Reinderien2022年05月07日 15:43:22 +00:00Commented May 7, 2022 at 15:43
-
1\$\begingroup\$ I've not used numpy in anger, so just addressing readability here. Your advice is better informed than mine! \$\endgroup\$Toby Speight– Toby Speight2022年05月07日 17:53:18 +00:00Commented May 7, 2022 at 17:53
-
\$\begingroup\$ Not a problem; in that context what you've written is certainly useful. \$\endgroup\$Reinderien– Reinderien2022年05月07日 18:11:09 +00:00Commented May 7, 2022 at 18:11