I have a CSV file that looks like this:
time, price
0 2021年07月23日T20:00:00.000221421Z 368.06
1 2021年07月23日T20:00:00.001131397Z 368.06
2 2021年07月23日T20:00:00.008030544Z 368.06
3 2021年07月23日T20:00:00.00807574Z 368.06
4 2021年07月23日T20:00:00.008084129Z 368.06
... ... ...
32435 2021年07月23日T23:59:46.272261376Z 367.84
32436 2021年07月23日T23:59:53.782418944Z 367.84
32437 2021年07月23日T23:59:56.24058112Z 367.84
32438 2021年07月23日T23:59:58.981374464Z 367.84
32439 2021年07月23日T23:59:58.981374464Z 367.84
32440 rows ×ばつ 2 columns
I'm trying to merge the rows into 0.10 OHLC price range intervals, so that the end result looks like this:
time, open, high, low, close
0 2021年07月23日T20:00:00.000221421Z 368.06 368.06 367.96 367.96
1 2021年07月23日T20:00:00.337192Z 367.95 368.04 367.94 368.04
2 2021年07月23日T20:00:11.632994Z 368.05 368.06 367.96 368.06
3 2021年07月23日T20:00:39.91849344Z 368.07 368.08 367.98 367.98
4 2021年07月23日T20:01:04.188584192Z 367.97 368.04 367.94 367.94
Here's my attempt, which works but it seems quite slow:
import pandas
print('time, open, high, low, close', file = open(file = 'range.csv', mode = 'a'))
dataframe = pandas.read_csv(filepath_or_buffer = 'quotes.csv')
bar = [{'time': dataframe.iloc[0, 0], 'open': dataframe.iloc[0, 1], 'high': dataframe.iloc[0, 1], 'low': dataframe.iloc[0, 1], 'close': dataframe.iloc[0, 1]}]
cycle = [0]
while cycle[-1] < len(dataframe.index):
high = float(bar[-1]['high'])
low = float(bar[-1]['low'])
price = float(dataframe.iloc[cycle[-1], 1])
time = dataframe.iloc[cycle[-1], 0]
if price < high - 0.10 or price > low + 0.10:
bar.append({'time': time, 'open': price, 'high': price, 'low': price})
bar[-2]['close'] = dataframe.iloc[cycle[-2], 1]
print(bar[-2]['time'], ',', bar[-2]['open'], ',', bar[-2]['high'], ',', bar[-2]['low'], ',', bar[-2]['close'], file = open(file = 'range.csv', mode ='a'))
if price > high:
bar[-1]['high'] = price
if price < low:
bar[-1]['low'] = price
cycle.append(cycle[-1] + 1)
I'm a novice coder at best, and so is this the correct/"pythonic" way of achieving this result, or is there a better way? I haven't found a way of implementing pandas.DataFrame.resample
to achieve this result.
1 Answer 1
Layout
For readability, it is better to use 4 spaces per indentation level than 2.
Long lines are hard to understand and maintain. The black program can be used to automatically reformat the code to split up long lines and use recommended indentation.
Documentation
You should add a docstring to the top of the code to summarize its purpose. It should mention expected input CSV file(s) and which file is modified.
Efficiency
If high
is always greater than low
, then
these separate if
statements:
if price > high:
bar[-1]['high'] = price
if price < low:
can be combined into a single if/elif
statement:
if price > high:
bar[-1]['high'] = price
elif price < low:
The checks are mutually exclusive. This makes the code more efficient since you don't have to perform the 2nd check if the first is true.
Naming
bar
and cycle
look like they might be good variable names,
but it would be better to add documentation or comments describing
what they stand for.