I have a DataFrame that contains the data shown below:
soc [%] r0 [ohm] tau1 [s] tau2 [s] r1 [ohm] r2 [ohm] c1 [farad] c2 [farad]
0 90 0.001539 1725.035378 54.339882 0.001726 0.001614 999309.883552 33667.261120
1 80 0.001385 389.753276 69.807148 0.001314 0.001656 296728.345634 42164.808208
2 70 0.001539 492.320311 53.697439 0.001139 0.001347 432184.454388 39865.959637
3 60 0.001539 656.942558 63.233445 0.000990 0.001515 663400.436465 41727.472274
4 50 0.001539 296.080424 53.948112 0.000918 0.001535 322490.860387 35139.878909
5 40 0.001539 501.978979 72.015509 0.001361 0.001890 368919.408585 38100.665763
6 30 0.001539 585.297624 76.972464 0.001080 0.001872 542060.285388 41114.220492
7 20 0.001385 1308.176576 60.541172 0.001426 0.001799 917348.863136 33659.124096
8 10 0.001539 1194.993755 57.078336 0.002747 0.001851 435028.073957 30839.130201
Given a value z
, I want to select a row in the data frame where soc [%]
is closest to z
. The code below demonstrates my current approach.
import pandas as pd
import time
def rc_params(df, z):
if z > 90:
params = df.loc[0]
elif 80 < z <= 90:
params = df.loc[0]
elif 70 < z <= 80:
params = df.loc[1]
elif 60 < z <= 70:
params = df.loc[2]
elif 50 < z <= 60:
params = df.loc[3]
elif 40 < z <= 50:
params = df.loc[4]
elif 30 < z <= 40:
params = df.loc[5]
elif 20 < z <= 30:
params = df.loc[6]
elif 10 < z <= 20:
params = df.loc[7]
else:
params = df.loc[8]
r0 = params['r0 [ohm]']
tau1 = params['tau1 [s]']
tau2 = params['tau2 [s]']
r1 = params['r1 [ohm]']
r2 = params['r2 [ohm]']
return r0, tau1, tau2, r1, r2
start = time.time()
z = 20
df = pd.read_csv('results/soc_rc.csv')
r0, tau1, tau2, r1, r2 = rc_params(df, z)
end = time.time()
print(f"""
z = {z}
r0 = {r0:.4f}
tau1 = {tau1:.4f}
tau2 = {tau2:.4f}
r1 = {r1:.4f}
r2 = {r2:.4f}
run time = {end - start:.4g} s
""")
Results from the above code give:
z = 20
r0 = 0.0014
tau1 = 1308.1766
tau2 = 60.5412
r1 = 0.0014
r2 = 0.0018
run time = 0.002264 s
My approach works fine but is there a better (faster) way to lookup the values in the data frame? There is a lookup
function in Pandas but it finds exact values, so if a value doesn't exist then nothing is returned.
2 Answers 2
Not sure if this will help, but I'm using this to find nearest in a sorted column: (time series stuff)
result_index = df['col_to_search'].sub(search_value).abs().idxmin()
.sub(search_value)
subtracts search_value
from the df[col_to_search]
to make the nearest value almost-zero,
.abs()
makes the almost-zero the minimum of the column,
.idxmin()
yields the df.index
of the minimum value, or the closest match to search_value.
I got this approach from a similar search, but didn't note the original author or the site I found it.
-
\$\begingroup\$ Welcome to Code Review! Do you know if this approach is faster, or is it simpler because it requires fewer lines? \$\endgroup\$2019年12月17日 18:45:42 +00:00Commented Dec 17, 2019 at 18:45
Adapting from here would be a cleaner way to do what you want.
params = df.iloc[(df['soc [%]']-z).abs().argsort()[:1]]
There might be faster ways if your soc [%]
column is fixed with those values.
Also, you should consider not measuring the time for pd.read_csv
as that isn't what you are wanting to know the execution for.
-
\$\begingroup\$ Thank you for your suggestion about measuring the execution time. I have updated my question with the new timing result. I also tried your suggestion using
argsort()
which gets rid of theif
statements but unfortunately this is about 3 times slower than my original example. \$\endgroup\$wigging– wigging2018年09月29日 14:13:39 +00:00Commented Sep 29, 2018 at 14:13 -
\$\begingroup\$ Wouldn't
idxmax()
be better thanargsort()[:1]
? \$\endgroup\$Gareth Rees– Gareth Rees2018年09月29日 14:31:03 +00:00Commented Sep 29, 2018 at 14:31 -
\$\begingroup\$ @GarethRees Using
idxmax()
does not give the correct results; however,idxmin()
gives the right results but it is still about 3 times slower than my original example. \$\endgroup\$wigging– wigging2018年09月29日 14:43:32 +00:00Commented Sep 29, 2018 at 14:43 -
\$\begingroup\$ @wigging, you are essentially hardcoding a lookup table with your elifs. Its going to be hard to speed that part up dramatically. In the speedup context, there are a couple options: 1. Speed up the elifs ( you could do this with a binary tree) 2. Speed up the lookups ( you could do this with 1 lookup)
r0, tau1, tau2, r1, r2 = params[['r0 [ohm]', 'tau1 [s]', 'tau2 [s]', 'r1 [ohm]', 'r2 [ohm]']]
\$\endgroup\$Derek Thomas– Derek Thomas2018年09月30日 16:03:24 +00:00Commented Sep 30, 2018 at 16:03
min(max(9 - round(z / 10), 0), 8)
\$\endgroup\$params = df.iloc[min(max(9 - round(z / 10), 0), 8)]
in the function. This gets rid of theif
statements but execution time is the same as my original example. My goal is to find a faster way to lookup the values form the data frame compared to my original example. \$\endgroup\$