\$\begingroup\$

I have been working on a new function for a pandas DataFrame. It is meant to sort the values by column name while excluding specific rows from the sort, and to return the sorted DataFrame with those rows kept at their original positions.

E.g. I have a DataFrame

       Name  marks
rank1   Tom     10
rank2  Jack     30
rank3  nick     50
rank4  juli      5

So I created a function (not finalized, but it shows the idea):

import operator

import pandas as pd


def sort_df_values(df, column_names, ascending=False, fixed_categories=None):
    if fixed_categories is not None and fixed_categories:
        # Remember where each fixed row sits, ordered so that popitem()
        # below yields the smallest position first.
        original_positions = {name: position for position, name in enumerate(df.index.values) if name in fixed_categories}
        original_positions = dict(sorted(original_positions.items(), key=operator.itemgetter(1), reverse=True))

        excluded = df.loc[fixed_categories]
        included = [name for name in list(df.index) if name not in fixed_categories]
        new_df = df.loc[included].sort_values(column_names, ascending=ascending)
        result = pd.concat([new_df, excluded])
        new_index_values = list(result.index.values)
        # Move each fixed label back to its original position.
        while original_positions:
            val, old_position = original_positions.popitem()
            print(val, old_position)
            for current_position, name in enumerate(new_index_values):
                if name == val:
                    new_index_values.insert(old_position, new_index_values.pop(current_position))
                    break
        result = result.reindex(new_index_values)
    else:
        result = df.sort_values(column_names, ascending=ascending)
    return result

And then I call it like this:

sort_df_values(df, ['marks'], fixed_categories=['rank3'])

The result is correct: all other rows are sorted, while the excluded rows keep their positions:

       Name  marks
rank2  Jack     30
rank1   Tom     10
rank3  nick     50
rank4  juli      5

Is there a better way to do this? Maybe pandas already has a feature that could help me out here?

DataFrame example:

{'Name': {'rank1': 'Tom', 'rank2': 'Jack', 'rank3': 'nick', 'rank4': 'juli'},
 'marks': {'rank1': 10, 'rank2': 30, 'rank3': 50, 'rank4': 5}}
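For anyone who wants to reproduce the example, the dictionary above can be passed straight to the DataFrame constructor. This is only a minimal sketch; the variable names `data` and `df` are my own choice.

```python
import pandas as pd

# Outer keys become the columns, inner keys become the index labels.
data = {
    "Name": {
        "rank1": "Tom",
        "rank2": "Jack",
        "rank3": "nick",
        "rank4": "juli",
    },
    "marks": {"rank1": 10, "rank2": 30, "rank3": 50, "rank4": 5},
}
df = pd.DataFrame(data)
```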
asked Nov 30, 2020 at 13:10
\$\endgroup\$
  • \$\begingroup\$ Can you share the data in a more convenient format? \$\endgroup\$ Commented Dec 1, 2020 at 1:04
  • \$\begingroup\$ @AMC added example \$\endgroup\$ Commented Dec 1, 2020 at 6:58

1 Answer

\$\begingroup\$

When using pandas, lean on the functions its API already provides; almost all of them are vectorized.

You can break your problem down into the following steps.

  • If fixed is None, just sort by the given column_names.
  • Otherwise, sort by the given column_names, but filter out the fixed indices.
  • Insert the fixed rows back at their previous positions.

Put together, that looks like this:
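
def sort_values(df, column_names, ascending=False, fixed=None):
    if fixed is None:
        return df.sort_values(column_names, ascending=ascending)
    # Original integer positions of the fixed rows.
    idx = df.index.get_indexer(fixed).tolist()
    # Transpose so the rows become columns and DataFrame.insert can be used.
    s_df = df.sort_values(column_names, ascending=ascending).T
    keep_cols = s_df.columns.difference(fixed, sort=False)
    out = s_df[keep_cols].copy()
    # Insert in ascending position order so that a later insert
    # cannot shift an earlier one.
    for loc, name in sorted(zip(idx, fixed)):
        out.insert(loc, name, s_df[name])
    return out.T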

Note: fixed should be a list.

Example runs:

fixed = ['rank3']
sort_values(df, ['marks'], ascending=False, fixed=fixed)
#        Name marks
# rank2  Jack    30
# rank1   Tom    10
# rank3  nick    50  # rank3 position fixed
# rank4  juli     5

fixed = ['rank2', 'rank3']
sort_values(df, ['marks'], ascending=True, fixed=fixed)  # ascending order
#        Name marks
# rank4  juli     5
# rank2  Jack    30  # rank2 position fixed
# rank3  nick    50  # rank3 position fixed
# rank1   Tom    10
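
One caveat with the function above: transposing a frame whose columns have different types generally leaves the result with object dtype columns. If the dtypes matter, a variant that reorders the index labels instead of transposing might look like the sketch below. The name `sort_keep_fixed` is hypothetical, and the sketch assumes the index labels are unique.

```python
import pandas as pd


def sort_keep_fixed(df, column_names, ascending=False, fixed=None):
    """Sort df by column_names, leaving the rows labelled in `fixed` in place.

    Assumes that the index labels of df are unique.
    """
    if not fixed:
        return df.sort_values(column_names, ascending=ascending)
    is_fixed = df.index.isin(fixed)
    # Sort only the rows that are allowed to move.
    movable = df.loc[~is_fixed].sort_values(column_names, ascending=ascending)
    # Start from the original label order, then overwrite the free slots
    # with the sorted labels; the fixed slots are left untouched.
    new_index = df.index.to_numpy(copy=True)
    new_index[~is_fixed] = movable.index.to_numpy()
    return df.loc[new_index]


df = pd.DataFrame(
    {
        "Name": ["Tom", "Jack", "nick", "juli"],
        "marks": [10, 30, 50, 5],
    },
    index=["rank1", "rank2", "rank3", "rank4"],
)
descending = sort_keep_fixed(df, ["marks"], fixed=["rank3"])
ascending = sort_keep_fixed(
    df, ["marks"], ascending=True, fixed=["rank2", "rank3"]
)
```

Because rows are selected with `.loc` rather than moved through a transpose, the `marks` column keeps its integer dtype in both results.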

Code Review

  • PEP-8 Violations

    • E501 (line too long): many of your lines are longer than 79 characters.

# Examples
original_positions = {name: position for position, name in enumerate(df.index.values) if name in fixed_categories}
# Can be refactored as below
original_positions = {
    name: position
    for position, name in enumerate(df.index.values)
    if name in fixed_categories
}

original_positions = dict(sorted(original_positions.items(), key=operator.itemgetter(1), reverse=True))
# Can be refactored as below
original_positions = dict(
    sorted(
        original_positions.items(), key=operator.itemgetter(1), reverse=True
    )
)
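
To make the rule concrete, here is a rough sketch of the kind of check E501 performs. It is not how any real linter is implemented; the helper `long_lines` and the sample text are made up for illustration.

```python
def long_lines(source, limit=79):
    """Return (line_number, length) for every line longer than `limit`."""
    return [
        (number, len(line))
        for number, line in enumerate(source.splitlines(), start=1)
        if len(line) > limit
    ]


# Hypothetical two-line sample: the second line is 105 characters long.
sample = "x = 1\n" + "y = " + "1 + " * 25 + "1\n"
report = long_lines(sample)  # [(2, 105)]
```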

You can use pep8online * to check whether your code complies with PEP 8.

You can use black * (a code formatting tool) to reformat it automatically:

black --line-length 79 your_file.py

* I'm not affiliated with pep8online or black.

answered Dec 17, 2020 at 13:49
\$\endgroup\$
  • \$\begingroup\$ Thanks very much for the good idea; that sounds like what I was looking for. Regarding PEP 8, I know, this was just dummy code :) But thanks for the note! \$\endgroup\$ Commented Dec 17, 2020 at 14:01
