Lookup closest value in Pandas DataFrame

Question 1

I have a DataFrame that contains the data shown below:

 soc [%] r0 [ohm] tau1 [s] tau2 [s] r1 [ohm] r2 [ohm] c1 [farad] c2 [farad]
0 90 0.001539 1725.035378 54.339882 0.001726 0.001614 999309.883552 33667.261120
1 80 0.001385 389.753276 69.807148 0.001314 0.001656 296728.345634 42164.808208
2 70 0.001539 492.320311 53.697439 0.001139 0.001347 432184.454388 39865.959637
3 60 0.001539 656.942558 63.233445 0.000990 0.001515 663400.436465 41727.472274
4 50 0.001539 296.080424 53.948112 0.000918 0.001535 322490.860387 35139.878909
5 40 0.001539 501.978979 72.015509 0.001361 0.001890 368919.408585 38100.665763
6 30 0.001539 585.297624 76.972464 0.001080 0.001872 542060.285388 41114.220492
7 20 0.001385 1308.176576 60.541172 0.001426 0.001799 917348.863136 33659.124096
8 10 0.001539 1194.993755 57.078336 0.002747 0.001851 435028.073957 30839.130201

Given a value z, I want to select a row in the data frame where soc [%] is closest to z. The code below demonstrates my current approach.

import pandas as pd
import time
def rc_params(df, z):
 if z > 90:
 params = df.loc[0]
 elif 80 < z <= 90:
 params = df.loc[0]
 elif 70 < z <= 80:
 params = df.loc[1]
 elif 60 < z <= 70:
 params = df.loc[2]
 elif 50 < z <= 60:
 params = df.loc[3]
 elif 40 < z <= 50:
 params = df.loc[4]
 elif 30 < z <= 40:
 params = df.loc[5]
 elif 20 < z <= 30:
 params = df.loc[6]
 elif 10 < z <= 20:
 params = df.loc[7]
 else:
 params = df.loc[8]
 r0 = params['r0 [ohm]']
 tau1 = params['tau1 [s]']
 tau2 = params['tau2 [s]']
 r1 = params['r1 [ohm]']
 r2 = params['r2 [ohm]']
 return r0, tau1, tau2, r1, r2
start = time.time()
z = 20
df = pd.read_csv('results/soc_rc.csv')
r0, tau1, tau2, r1, r2 = rc_params(df, z)
end = time.time()
print(f"""
z = {z}
r0 = {r0:.4f}
tau1 = {tau1:.4f}
tau2 = {tau2:.4f}
r1 = {r1:.4f}
r2 = {r2:.4f}
run time = {end - start:.4g} s
""")

Results from the above code give:

z = 20
r0 = 0.0014
tau1 = 1308.1766
tau2 = 60.5412
r1 = 0.0014
r2 = 0.0018
run time = 0.002264 s

My approach works fine but is there a better (faster) way to lookup the values in the data frame? There is a lookup function in Pandas but it finds exact values, so if a value doesn't exist then nothing is returned.

Question 2

min(max(9 - round(z / 10), 0), 8)

Question 3

@GarethRees I implemented params = df.iloc[min(max(9 - round(z / 10), 0), 8)] in the function. This gets rid of the if statements but execution time is the same as my original example. My goal is to find a faster way to lookup the values form the data frame compared to my original example.

Question 4

@GarethRees Would working with the data in a NumPy array (instead of a DataFrame) allow me to get faster lookup times?

Question 5

Not sure if this will help, but I'm using this to find nearest in a sorted column: (time series stuff)

result_index = df['col_to_search'].sub(search_value).abs().idxmin()

.sub(search_value) subtracts search_value from the df[col_to_search] to make the nearest value almost-zero,
.abs() makes the almost-zero the minimum of the column,
.idxmin() yields the df.index of the minimum value, or the closest match to search_value.

I got this approach from a similar search, but didn't note the original author or the site I found it.

Question 6

Welcome to Code Review! Do you know if this approach is faster, or is it simpler because it requires fewer lines?

Question 7

Adapting from here would be a cleaner way to do what you want.

params = df.iloc[(df['soc [%]']-z).abs().argsort()[:1]]

There might be faster ways if your soc [%] column is fixed with those values.

Also, you should consider not measuring the time for pd.read_csv as that isn't what you are wanting to know the execution for.

Question 8

Thank you for your suggestion about measuring the execution time. I have updated my question with the new timing result. I also tried your suggestion using argsort() which gets rid of the if statements but unfortunately this is about 3 times slower than my original example.

Question 9

Wouldn't idxmax() be better than argsort()[:1]?

Question 10

@GarethRees Using idxmax() does not give the correct results; however, idxmin() gives the right results but it is still about 3 times slower than my original example.

Question 11

@wigging, you are essentially hardcoding a lookup table with your elifs. Its going to be hard to speed that part up dramatically. In the speedup context, there are a couple options: 1. Speed up the elifs ( you could do this with a binary tree) 2. Speed up the lookups ( you could do this with 1 lookup) r0, tau1, tau2, r1, r2 = params[['r0 [ohm]', 'tau1 [s]', 'tau2 [s]', 'r1 [ohm]', 'r2 [ohm]']]

Doug Griggs Doug Griggs 1211 silver badge3 bronze badges · Answer 1 · 2019-12-17 18:05:50Z

Not sure if this will help, but I'm using this to find nearest in a sorted column: (time series stuff)

result_index = df['col_to_search'].sub(search_value).abs().idxmin()

.sub(search_value) subtracts search_value from the df[col_to_search] to make the nearest value almost-zero,
.abs() makes the almost-zero the minimum of the column,
.idxmin() yields the df.index of the minimum value, or the closest match to search_value.

I got this approach from a similar search, but didn't note the original author or the site I found it.

Welcome to Code Review! Do you know if this approach is faster, or is it simpler because it requires fewer lines?

Derek Thomas Derek Thomas 411 bronze badge · Answer 2 · 2018-09-29 13:14:53Z

4

\$\begingroup\$

Adapting from here would be a cleaner way to do what you want.

params = df.iloc[(df['soc [%]']-z).abs().argsort()[:1]]

There might be faster ways if your soc [%] column is fixed with those values.

Also, you should consider not measuring the time for pd.read_csv as that isn't what you are wanting to know the execution for.

Share

answered Sep 29, 2018 at 13:14

Derek Thomas's user avatar

Derek Thomas Derek Thomas

411 bronze badge

\$\endgroup\$

4

\$\begingroup\$ Thank you for your suggestion about measuring the execution time. I have updated my question with the new timing result. I also tried your suggestion using argsort() which gets rid of the if statements but unfortunately this is about 3 times slower than my original example. \$\endgroup\$

wigging
– wigging

2018年09月29日 14:13:39 +00:00
Commented Sep 29, 2018 at 14:13
\$\begingroup\$ Wouldn't idxmax() be better than argsort()[:1]? \$\endgroup\$

Gareth Rees
– Gareth Rees

2018年09月29日 14:31:03 +00:00
Commented Sep 29, 2018 at 14:31
\$\begingroup\$ @GarethRees Using idxmax() does not give the correct results; however, idxmin() gives the right results but it is still about 3 times slower than my original example. \$\endgroup\$

wigging
– wigging

2018年09月29日 14:43:32 +00:00
Commented Sep 29, 2018 at 14:43
\$\begingroup\$ @wigging, you are essentially hardcoding a lookup table with your elifs. Its going to be hard to speed that part up dramatically. In the speedup context, there are a couple options: 1. Speed up the elifs ( you could do this with a binary tree) 2. Speed up the lookups ( you could do this with 1 lookup) r0, tau1, tau2, r1, r2 = params[['r0 [ohm]', 'tau1 [s]', 'tau2 [s]', 'r1 [ohm]', 'r2 [ohm]']] \$\endgroup\$

Derek Thomas
– Derek Thomas

2018年09月30日 16:03:24 +00:00
Commented Sep 30, 2018 at 16:03

Add a comment |

Stack Exchange Network

Lookup closest value in Pandas DataFrame

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Lookup closest value in Pandas DataFrame

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions