6
\$\begingroup\$

I have a DataFrame that contains the data shown below:

 soc [%] r0 [ohm] tau1 [s] tau2 [s] r1 [ohm] r2 [ohm] c1 [farad] c2 [farad]
0 90 0.001539 1725.035378 54.339882 0.001726 0.001614 999309.883552 33667.261120
1 80 0.001385 389.753276 69.807148 0.001314 0.001656 296728.345634 42164.808208
2 70 0.001539 492.320311 53.697439 0.001139 0.001347 432184.454388 39865.959637
3 60 0.001539 656.942558 63.233445 0.000990 0.001515 663400.436465 41727.472274
4 50 0.001539 296.080424 53.948112 0.000918 0.001535 322490.860387 35139.878909
5 40 0.001539 501.978979 72.015509 0.001361 0.001890 368919.408585 38100.665763
6 30 0.001539 585.297624 76.972464 0.001080 0.001872 542060.285388 41114.220492
7 20 0.001385 1308.176576 60.541172 0.001426 0.001799 917348.863136 33659.124096
8 10 0.001539 1194.993755 57.078336 0.002747 0.001851 435028.073957 30839.130201

Given a value z, I want to select a row in the data frame where soc [%] is closest to z. The code below demonstrates my current approach.

import pandas as pd
import time
def rc_params(df, z):
 if z > 90:
 params = df.loc[0]
 elif 80 < z <= 90:
 params = df.loc[0]
 elif 70 < z <= 80:
 params = df.loc[1]
 elif 60 < z <= 70:
 params = df.loc[2]
 elif 50 < z <= 60:
 params = df.loc[3]
 elif 40 < z <= 50:
 params = df.loc[4]
 elif 30 < z <= 40:
 params = df.loc[5]
 elif 20 < z <= 30:
 params = df.loc[6]
 elif 10 < z <= 20:
 params = df.loc[7]
 else:
 params = df.loc[8]
 r0 = params['r0 [ohm]']
 tau1 = params['tau1 [s]']
 tau2 = params['tau2 [s]']
 r1 = params['r1 [ohm]']
 r2 = params['r2 [ohm]']
 return r0, tau1, tau2, r1, r2
start = time.time()
z = 20
df = pd.read_csv('results/soc_rc.csv')
r0, tau1, tau2, r1, r2 = rc_params(df, z)
end = time.time()
print(f"""
z = {z}
r0 = {r0:.4f}
tau1 = {tau1:.4f}
tau2 = {tau2:.4f}
r1 = {r1:.4f}
r2 = {r2:.4f}
run time = {end - start:.4g} s
""")

Results from the above code give:

z = 20
r0 = 0.0014
tau1 = 1308.1766
tau2 = 60.5412
r1 = 0.0014
r2 = 0.0018
run time = 0.002264 s

My approach works fine but is there a better (faster) way to lookup the values in the data frame? There is a lookup function in Pandas but it finds exact values, so if a value doesn't exist then nothing is returned.

Jamal
35.2k13 gold badges134 silver badges238 bronze badges
asked Sep 28, 2018 at 20:13
\$\endgroup\$
3
  • \$\begingroup\$ min(max(9 - round(z / 10), 0), 8) \$\endgroup\$ Commented Sep 29, 2018 at 14:49
  • \$\begingroup\$ @GarethRees I implemented params = df.iloc[min(max(9 - round(z / 10), 0), 8)] in the function. This gets rid of the if statements but execution time is the same as my original example. My goal is to find a faster way to lookup the values form the data frame compared to my original example. \$\endgroup\$ Commented Sep 29, 2018 at 15:27
  • \$\begingroup\$ @GarethRees Would working with the data in a NumPy array (instead of a DataFrame) allow me to get faster lookup times? \$\endgroup\$ Commented Sep 29, 2018 at 15:34

2 Answers 2

12
\$\begingroup\$

Not sure if this will help, but I'm using this to find nearest in a sorted column: (time series stuff)

result_index = df['col_to_search'].sub(search_value).abs().idxmin()

.sub(search_value) subtracts search_value from the df[col_to_search] to make the nearest value almost-zero,
.abs() makes the almost-zero the minimum of the column,
.idxmin() yields the df.index of the minimum value, or the closest match to search_value.

I got this approach from a similar search, but didn't note the original author or the site I found it.

greybeard
7,3713 gold badges21 silver badges55 bronze badges
answered Dec 17, 2019 at 18:05
\$\endgroup\$
1
  • \$\begingroup\$ Welcome to Code Review! Do you know if this approach is faster, or is it simpler because it requires fewer lines? \$\endgroup\$ Commented Dec 17, 2019 at 18:45
4
\$\begingroup\$

Adapting from here would be a cleaner way to do what you want.

params = df.iloc[(df['soc [%]']-z).abs().argsort()[:1]]

There might be faster ways if your soc [%] column is fixed with those values.

Also, you should consider not measuring the time for pd.read_csv as that isn't what you are wanting to know the execution for.

answered Sep 29, 2018 at 13:14
\$\endgroup\$
4
  • \$\begingroup\$ Thank you for your suggestion about measuring the execution time. I have updated my question with the new timing result. I also tried your suggestion using argsort() which gets rid of the if statements but unfortunately this is about 3 times slower than my original example. \$\endgroup\$ Commented Sep 29, 2018 at 14:13
  • \$\begingroup\$ Wouldn't idxmax() be better than argsort()[:1]? \$\endgroup\$ Commented Sep 29, 2018 at 14:31
  • \$\begingroup\$ @GarethRees Using idxmax() does not give the correct results; however, idxmin() gives the right results but it is still about 3 times slower than my original example. \$\endgroup\$ Commented Sep 29, 2018 at 14:43
  • \$\begingroup\$ @wigging, you are essentially hardcoding a lookup table with your elifs. Its going to be hard to speed that part up dramatically. In the speedup context, there are a couple options: 1. Speed up the elifs ( you could do this with a binary tree) 2. Speed up the lookups ( you could do this with 1 lookup) r0, tau1, tau2, r1, r2 = params[['r0 [ohm]', 'tau1 [s]', 'tau2 [s]', 'r1 [ohm]', 'r2 [ohm]']] \$\endgroup\$ Commented Sep 30, 2018 at 16:03

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.