Below is a function I wrote to format all the floats in a pandas DataFrame to a specific precision (6 d.p.) and convert them to strings for output to a GUI (which is why I didn't just change the pandas display options). Is this the most efficient way to convert all floats in a pandas DataFrame to strings of a specified format?
import pandas as pd

df = pd.DataFrame({'x': [5.4525242, 6.5254245, 7.254542], 'y': [5.4525242, 6.5254245, 7.254542]})

def _format_floats(df):
    df.loc[:, df.dtypes == float] = df.loc[:, df.dtypes == float].apply(lambda row: ["{:.6f}".format(num) for num in row])

_format_floats(df)
# Example output: the numbers change to strings with 6 decimals
df.iloc[0, 0]
# '5.452524'
1 Answer
Since you're already calling .apply, I'd stick with that approach to iteration rather than mixing it with a list comprehension.
It's generally better to avoid modifying data in place within a function unless explicitly asked to (via an argument, like the inplace=False that you'll see in many pandas methods) or unless it's made clear by the function's name and/or docstring.
The logic is reasonably complex, so it might be clearer as a named function.
The leading _ in the function name is usually reserved for "private" functions, whereas this seems to be a general utility function. It's fine if you don't want external code to touch it; that's just not clear from this code snippet.
Here's one way you might re-write the function to follow these tips:
def format_floats(df):
    """Replaces all float columns with string columns formatted to 6 decimal places"""
    def format_column(col):
        # Leave non-float columns untouched
        if col.dtype != float:
            return col
        return col.apply("{:.6f}".format)

    return df.apply(format_column)
And a usage example:
In [1]: format_floats(pd.DataFrame([{'a': 1, 'b': 2.3}, {'a': 2, 'b': 3.0}]))
Out[1]:
   a         b
0  1  2.300000
1  2  3.000000
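To illustrate the point about in-place modification, here is a minimal sketch of a variant that makes that behaviour explicit. The precision and inplace parameters are assumptions added for illustration (they are not part of the original function), and the sketch swaps DataFrame.apply for select_dtypes plus Series.map, which is just one of several reasonable ways to do this.

```python
import pandas as pd


def format_floats(df, precision=6, inplace=False):
    """Format float columns as fixed-precision strings.

    `precision` and `inplace` are hypothetical parameters, added here
    only to illustrate an explicit opt-in to in-place modification.
    """
    # Work on a copy unless the caller explicitly asked for in-place changes
    target = df if inplace else df.copy()
    fmt = "{:." + str(precision) + "f}"

    # Pick out the floating-point columns and replace each with formatted strings
    float_cols = target.select_dtypes(include="float").columns
    for name in float_cols:
        target[name] = target[name].map(fmt.format)

    # Mirror the usual pandas convention: in-place calls return None
    return None if inplace else target


if __name__ == "__main__":
    frame = pd.DataFrame({"a": [1, 2], "b": [2.3, 3.0]})
    print(format_floats(frame, precision=2))
```

By default the caller's DataFrame is left as it was and a formatted copy is returned; passing inplace=True modifies the original and returns None.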
df.loc[...] == ... looks like it should just be df.loc[...] = ...