pandas.DaraFrame resample OHLC based on a non-time value

Question 1

I have a CSV file that looks like this:

 time, price
0 2021年07月23日T20:00:00.000221421Z 368.06
1 2021年07月23日T20:00:00.001131397Z 368.06
2 2021年07月23日T20:00:00.008030544Z 368.06
3 2021年07月23日T20:00:00.00807574Z 368.06
4 2021年07月23日T20:00:00.008084129Z 368.06
... ... ...
32435 2021年07月23日T23:59:46.272261376Z 367.84
32436 2021年07月23日T23:59:53.782418944Z 367.84
32437 2021年07月23日T23:59:56.24058112Z 367.84
32438 2021年07月23日T23:59:58.981374464Z 367.84
32439 2021年07月23日T23:59:58.981374464Z 367.84
32440 rows ×ばつ 2 columns

I'm trying to merge the rows into 0.10 OHLC price range intervals, so that the end result looks like this:

time, open, high, low, close
0 2021年07月23日T20:00:00.000221421Z 368.06 368.06 367.96 367.96
1 2021年07月23日T20:00:00.337192Z 367.95 368.04 367.94 368.04
2 2021年07月23日T20:00:11.632994Z 368.05 368.06 367.96 368.06
3 2021年07月23日T20:00:39.91849344Z 368.07 368.08 367.98 367.98
4 2021年07月23日T20:01:04.188584192Z 367.97 368.04 367.94 367.94

Here's my attempt, which works but it seems quite slow:

import pandas
print('time, open, high, low, close', file = open(file = 'range.csv', mode = 'a'))
dataframe = pandas.read_csv(filepath_or_buffer = 'quotes.csv')
bar = [{'time': dataframe.iloc[0, 0], 'open': dataframe.iloc[0, 1], 'high': dataframe.iloc[0, 1], 'low': dataframe.iloc[0, 1], 'close': dataframe.iloc[0, 1]}]
cycle = [0]
while cycle[-1] < len(dataframe.index):
 high = float(bar[-1]['high'])
 low = float(bar[-1]['low'])
 price = float(dataframe.iloc[cycle[-1], 1])
 time = dataframe.iloc[cycle[-1], 0]
 if price < high - 0.10 or price > low + 0.10:
 bar.append({'time': time, 'open': price, 'high': price, 'low': price})
 bar[-2]['close'] = dataframe.iloc[cycle[-2], 1]
 print(bar[-2]['time'], ',', bar[-2]['open'], ',', bar[-2]['high'], ',', bar[-2]['low'], ',', bar[-2]['close'], file = open(file = 'range.csv', mode ='a'))
 if price > high:
 bar[-1]['high'] = price
 if price < low:
 bar[-1]['low'] = price
 cycle.append(cycle[-1] + 1)

I'm a novice coder at best, and so is this the correct/"pythonic" way of achieving this result, or is there a better way? I haven't found a way of implementing pandas.DataFrame.resample to achieve this result.

Question 2

Layout

For readability, it is better to use 4 spaces per indentation level than 2.

Long lines are hard to understand and maintain. The black program can be used to automatically reformat the code to split up long lines and use recommended indentation.

Documentation

You should add a docstring to the top of the code to summarize its purpose. It should mention expected input CSV file(s) and which file is modified.

Efficiency

If high is always greater than low, then these separate if statements:

if price > high:
 bar[-1]['high'] = price
if price < low:

can be combined into a single if/elif statement:

if price > high:
 bar[-1]['high'] = price
elif price < low:

The checks are mutually exclusive. This makes the code more efficient since you don't have to perform the 2nd check if the first is true.

Naming

bar and cycle look like they might be good variable names, but it would be better to add documentation or comments describing what they stand for.

toolic toolic 14.6k5 gold badges29 silver badges204 bronze badges · Answer 1 · 2025-03-17 18:22:38Z

Layout

For readability, it is better to use 4 spaces per indentation level than 2.

Long lines are hard to understand and maintain. The black program can be used to automatically reformat the code to split up long lines and use recommended indentation.

Documentation

You should add a docstring to the top of the code to summarize its purpose. It should mention expected input CSV file(s) and which file is modified.

Efficiency

If high is always greater than low, then these separate if statements:

if price > high:
 bar[-1]['high'] = price
if price < low:

can be combined into a single if/elif statement:

if price > high:
 bar[-1]['high'] = price
elif price < low:

The checks are mutually exclusive. This makes the code more efficient since you don't have to perform the 2nd check if the first is true.

Naming

bar and cycle look like they might be good variable names, but it would be better to add documentation or comments describing what they stand for.

Stack Exchange Network

pandas.DaraFrame resample OHLC based on a non-time value

1 Answer 1

Layout

Documentation

Efficiency

Naming

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

pandas.DaraFrame resample OHLC based on a non-time value

1 Answer 1

Layout

Documentation

Efficiency

Naming

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions