Pandas DataFrame.sort_values() new method to exclude by row name

Question 1

I was working on a new function for Pandas DataFrames. It's meant to sort values by column name but exclude specific rows from sorting, returning a sorted DataFrame in which the excluded rows keep their original positions.

E.g. I have a DataFrame

           Name  marks
    rank1   Tom     10
    rank2  Jack     30
    rank3  nick     50
    rank4  juli      5

So I created a function (not finalized, but the idea)

    import operator

    import pandas as pd


    def sort_df_values(df, column_names, ascending=False, fixed_categories=None):
        if fixed_categories:
            # Remember the original position of every row that must not move.
            original_positions = {name: position for position, name in enumerate(df.index.values) if name in fixed_categories}
            # Order by position, descending, so popitem() yields the smallest first.
            original_positions = dict(sorted(original_positions.items(), key=operator.itemgetter(1), reverse=True))

            excluded = df.loc[fixed_categories]
            included = [name for name in list(df.index) if name not in fixed_categories]
            new_df = df.loc[included].sort_values(column_names, ascending=ascending)
            result = pd.concat([new_df, excluded])
            new_index_values = list(result.index.values)
            # Move each excluded label back to its original position.
            while original_positions:
                val, old_position = original_positions.popitem()
                for current_position, name in enumerate(new_index_values):
                    if name == val:
                        new_index_values.insert(old_position, new_index_values.pop(current_position))
                        break
            result = result.reindex(new_index_values)
        else:
            result = df.sort_values(column_names, ascending=ascending)
        return result

And then I call it like this:

sort_df_values(df, ['marks'], fixed_categories=['rank3'])

The result is correct: all other rows are sorted, while the excluded rows keep their positions:

           Name  marks
    rank2  Jack     30
    rank1   Tom     10
    rank3  nick     50
    rank4  juli      5

Is there a better way to do this? Maybe pandas already has a feature that could help me out on this one?

DataFrame example:

{'Name': {'rank1': 'Tom', 'rank2': 'Jack', 'rank3': 'nick', 'rank4': 'juli'},
 'marks': {'rank1': 10, 'rank2': 30, 'rank3': 50, 'rank4': 5}}
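
For anyone who wants to reproduce the example, the dictionary above can be passed straight to the `pd.DataFrame` constructor. This is only a small sketch; the variable names `data` and `df` are my own choice:

```python
import pandas as pd

# Outer keys become columns, inner keys become the row labels (index).
data = {
    'Name': {'rank1': 'Tom', 'rank2': 'Jack', 'rank3': 'nick', 'rank4': 'juli'},
    'marks': {'rank1': 10, 'rank2': 30, 'rank3': 50, 'rank4': 5},
}
df = pd.DataFrame(data)
print(df)
```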

Question 2

Can you share the data in a more convenient format?

Question 3

@AMC added example

Question 4

When using pandas, lean on the functions in its API; almost every one of them is vectorized.

You can break your problem down into cases:

- If fixed is None, just sort by the given column_names.
- Otherwise, sort by the given column_names and filter out the fixed indices.
- Insert the fixed rows back at their previous positions.

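    def sort_values(df, column_names, ascending=False, fixed=None):
        if fixed is None:
            return df.sort_values(column_names, ascending=ascending)
        # Integer positions of the fixed labels in the original frame.
        idx = df.index.get_indexer(fixed).tolist()
        # Transpose so row labels become columns and DataFrame.insert can be used.
        s_df = df.sort_values(column_names, ascending=ascending).T
        keep_cols = s_df.columns.difference(fixed, sort=False)
        # Work on an explicit copy before inserting columns.
        out = s_df[keep_cols].copy()
        # Insert in ascending position order so earlier inserts do not shift later ones.
        for loc, name in sorted(zip(idx, fixed)):
            out.insert(loc, name, s_df[name])
        return out.T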

Note: fixed should be a list.

Example runs:

    fixed = ['rank3']
    sort_values(df, ['marks'], ascending=False, fixed=fixed)
    #        Name marks
    # rank2  Jack    30
    # rank1   Tom    10
    # rank3  nick    50   # rank3 position fixed.
    # rank4  juli     5

    fixed = ['rank2', 'rank3']
    sort_values(df, ['marks'], ascending=True, fixed=fixed)  # ascending order.
    # sorted in ascending order
    #        Name marks
    # rank4  juli     5
    # rank2  Jack    30   # rank2 position fixed
    # rank3  nick    50   # rank3 position fixed
    # rank1   Tom    10
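
One side effect worth knowing about: transposing a frame with mixed column types makes pandas fall back to object dtype, so the function above returns object columns even for marks. If that matters, the same idea can be sketched on integer positions instead, without transposing. This is an alternative under my own assumptions rather than a drop-in replacement; the name `sort_values_keep_fixed` and the variables inside it are hypothetical:

```python
import numpy as np
import pandas as pd


def sort_values_keep_fixed(df, column_names, ascending=False, fixed=None):
    """Sort df by column_names, leaving rows whose label is in `fixed` in place."""
    if not fixed:
        return df.sort_values(column_names, ascending=ascending)
    is_fixed = df.index.isin(fixed)
    # Positions of the rows that are allowed to move.
    movable = np.flatnonzero(~is_fixed)
    # Sort only the movable rows; after reset_index the resulting index
    # holds offsets into `movable`, in sorted order.
    sorted_offsets = (
        df.iloc[movable]
        .reset_index(drop=True)
        .sort_values(column_names, ascending=ascending)
        .index.to_numpy()
    )
    # Fixed rows keep their own position; movable slots are refilled in sorted order.
    order = np.arange(len(df))
    order[movable] = movable[sorted_offsets]
    return df.iloc[order]


df = pd.DataFrame({
    'Name': {'rank1': 'Tom', 'rank2': 'Jack', 'rank3': 'nick', 'rank4': 'juli'},
    'marks': {'rank1': 10, 'rank2': 30, 'rank3': 50, 'rank4': 5},
})
result = sort_values_keep_fixed(df, ['marks'], ascending=False, fixed=['rank3'])
print(result)
```

Because the rows are selected with `iloc`, the column dtypes of the input are kept, and the order in which the labels appear in fixed does not matter.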

Code Review

PEP-8 Violations
- E501 - Lines longer than 79 characters. You have many lines with length greater than 79 characters.

    # Examples
    original_positions = {name: position for position, name in enumerate(df.index.values) if name in fixed_categories}
    # Can be refactored as below
    original_positions = {
        name: position
        for position, name in enumerate(df.index.values)
        if name in fixed_categories
    }
    original_positions = dict(sorted(original_positions.items(), key=operator.itemgetter(1), reverse=True))
    # Can be refactored as below
    original_positions = dict(
        sorted(
            original_positions.items(), key=operator.itemgetter(1), reverse=True
        )
    )

You can use pep8online* to check whether your code conforms to the PEP-8 standard.

You can use black* (a code formatting tool):

black --line-length 79 your_file.py

* I'm not affiliated with pep8online or black.
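
If you just want a quick look at which lines would trip E501 before reaching for a tool, a few lines of plain Python are enough. This is only an illustrative sketch: `find_long_lines` is a hypothetical helper of mine, not part of pep8online or black, and it checks line length only:

```python
def find_long_lines(source, limit=79):
    """Return (line_number, length) for every line longer than `limit`."""
    return [
        (number, len(line))
        for number, line in enumerate(source.splitlines(), start=1)
        if len(line) > limit
    ]


sample = "\n".join([
    "short = 1",
    "original_positions = {name: position for position, name in "
    "enumerate(df.index.values) if name in fixed_categories}",
])
print(find_long_lines(sample))
```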

Question 5

Thanks very much for the good idea; that sounds like what I was looking for. Regarding PEP-8, I know, this was just dummy code :) But thanks for the note!

Ch3steR · Accepted Answer · 2020-12-17 13:49:52Z

