I am working on a new function for a pandas DataFrame. It is meant to sort values by column name but exclude specific rows from the sort, returning a sorted DataFrame in which those rows keep their original positions.
E.g. I have a DataFrame
       Name  marks
rank1   Tom     10
rank2  Jack     30
rank3  nick     50
rank4  juli      5
So I created a function (not finalized, but it shows the idea):
import operator

import pandas as pd


def sort_df_values(df, column_names, ascending=False, fixed_categories=None):
    if fixed_categories is not None and fixed_categories:
        original_positions = {name: position for position, name in enumerate(df.index.values) if name in fixed_categories}
        original_positions = dict(sorted(original_positions.items(), key=operator.itemgetter(1), reverse=True))
        excluded = df.loc[fixed_categories]
        included = [name for name in list(df.index) if name not in fixed_categories]
        new_df = df.loc[included].sort_values(column_names, ascending=ascending)
        result = pd.concat([new_df, excluded])
        new_index_values = list(result.index.values)
        while original_positions:
            val, old_position = original_positions.popitem()
            print(val, old_position)
            for current_position, name in enumerate(new_index_values):
                if name == val:
                    new_index_values.insert(old_position, new_index_values.pop(current_position))
                    break
        result = result.reindex(new_index_values)
    else:
        result = df.sort_values(column_names, ascending=ascending)
    return result
And then I call it like this:
sort_df_values(df, ['marks'], fixed_categories=['rank3'])
The result is correct: all the data is sorted, but the excluded rows keep their positions:
       Name  marks
rank2  Jack     30
rank1   Tom     10
rank3  nick     50
rank4  juli      5
Is there a better way to do this? Maybe pandas already has a feature that could help me out here?
DataFrame example:
{'Name': {'rank1': 'Tom', 'rank2': 'Jack', 'rank3': 'nick', 'rank4': 'juli'},
'marks': {'rank1': 10, 'rank2': 30, 'rank3': 50, 'rank4': 5}}
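For convenience, the dictionary above can be turned back into the example DataFrame as follows. This is only a small sketch; the variable names data and df are chosen here for illustration.

```python
import pandas as pd

# Nested dict in the {column: {index_label: value}} layout shown above.
data = {
    'Name': {'rank1': 'Tom', 'rank2': 'Jack', 'rank3': 'nick', 'rank4': 'juli'},
    'marks': {'rank1': 10, 'rank2': 30, 'rank3': 50, 'rank4': 5},
}

# Outer keys become columns, inner keys become the index labels.
df = pd.DataFrame(data)
print(df)
```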
- Can you share the data in a more convenient format? (AMC, Dec 1, 2020 at 1:04)
- @AMC added example (simkusr, Dec 1, 2020 at 6:58)
1 Answer
When using pandas, lean on the functions in its API; almost all of them are vectorized.
You can break your problem down into cases.
- If fixed is None, just sort with respect to the given column_names.
- Otherwise, sort with respect to the given column_names but filter out the fixed indices.
- Insert the fixed rows at their previous positions.
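def sort_values(df, column_names, ascending=False, fixed=None):
    if fixed is None:
        return df.sort_values(column_names, ascending=ascending)
    # Original integer positions of the fixed index labels.
    idx = df.index.get_indexer(fixed).tolist()
    # Sort, then transpose so that rows become columns and can be inserted.
    s_df = df.sort_values(column_names, ascending=ascending).T
    keep_cols = s_df.columns.difference(fixed, sort=False)
    out = s_df[keep_cols]
    # Insert in ascending position order so earlier inserts do not shift later ones.
    for (loc, name) in sorted(zip(idx, fixed)):
        out.insert(loc, name, s_df[name])
    return out.T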
Note: fixed should be a list.
Example runs:
fixed = ['rank3']
sort_values(df, ['marks'], ascending=False, fixed=fixed)
# Name marks
# rank2 Jack 30
# rank1 Tom 10
# rank3 nick 50 # rank3 position fixed.
# rank4 juli 5
fixed = ['rank2', 'rank3']
sort_values(df, ['marks'], ascending=True, fixed=fixed) # ascending order.
# sorted in ascending order
# Name marks
# rank4 juli 5
# rank2 Jack 30 # rank2 position fixed
# rank3 nick 50 # rank3 position fixed
# rank1 Tom 10
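One thing to be aware of with the transpose-based function above: transposing a DataFrame with mixed column types produces object columns, so the returned frame may no longer have an integer marks column. As a possible alternative, here is a minimal sketch that builds a positional order array and uses iloc instead, which keeps the original dtypes. The name sort_values_keep_fixed is hypothetical, and the sketch assumes the index labels are unique.

```python
import numpy as np
import pandas as pd


def sort_values_keep_fixed(df, column_names, ascending=False, fixed=None):
    """Sort df by column_names while the rows labelled in fixed stay in place.

    Hypothetical helper; assumes df has a unique index.
    """
    if not fixed:
        return df.sort_values(column_names, ascending=ascending)
    # Boolean mask marking the rows whose position must not change.
    is_fixed = df.index.isin(fixed)
    # Sort only the rows that are allowed to move.
    sorted_rest = df.loc[~is_fixed].sort_values(column_names, ascending=ascending)
    # Start from the identity order, then fill the free slots
    # with the original positions of the sorted rows.
    order = np.arange(len(df))
    order[~is_fixed] = df.index.get_indexer(sorted_rest.index)
    return df.iloc[order]


df = pd.DataFrame({
    'Name': {'rank1': 'Tom', 'rank2': 'Jack', 'rank3': 'nick', 'rank4': 'juli'},
    'marks': {'rank1': 10, 'rank2': 30, 'rank3': 50, 'rank4': 5},
})

result = sort_values_keep_fixed(df, ['marks'], fixed=['rank3'])
print(result)
```

Because the fixed rows are identified by a mask over the original index, the order in which labels appear in fixed does not matter in this version.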
Code Review
PEP-8 violations
- E501: lines longer than 79 characters. Many of your lines exceed 79 characters.
# Examples
original_positions = {name: position for position, name in enumerate(df.index.values) if name in fixed_categories}
# Can be refactored as below
original_positions = {
    name: position
    for position, name in enumerate(df.index.values)
    if name in fixed_categories
}
original_positions = dict(sorted(original_positions.items(), key=operator.itemgetter(1), reverse=True))
# Can be refactored as below
original_positions = dict(
    sorted(
        original_positions.items(), key=operator.itemgetter(1), reverse=True
    )
)
You can use pep8online* to check whether your code complies with the PEP-8 standard.
You can also use black* (a code formatting tool):
black --line-length 79 your_file.py
* I'm not affiliated with pep8online or black.
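If you only want a quick look at which lines exceed the limit without installing anything, a few lines of Python are enough. This is a rough sketch, not a replacement for a real linter; find_long_lines is a hypothetical helper name, and it checks line length only.

```python
def find_long_lines(source, max_length=79):
    """Return (line_number, length) pairs for lines longer than max_length."""
    return [
        (number, len(line))
        for number, line in enumerate(source.splitlines(), start=1)
        if len(line) > max_length
    ]


# Small demonstration on an in-memory string.
sample = "short line\n" + "x" * 80 + "\n" + "y" * 79
long_lines = find_long_lines(sample)
print(long_lines)
```

To check a file, read its contents first, for example with pathlib.Path("your_file.py").read_text(), and pass the resulting string to the function.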
- Thanks very much for the good idea, that sounds like what I was looking for. Regarding PEP-8, I know, this was just dummy code :) But thanks for the note! (simkusr, Dec 17, 2020 at 14:01)