2
\$\begingroup\$

I did the following

  1. Created DB and table in prepate_table function
  2. Extracted data from binance in GetHistoricalData function
  3. Saved the data to DB

My code works fine but I want to optimize my code and remove redundant steps

My solution

import os
import pandas as pd
import time
import clickhouse_driver
import datetime
from binance.client import Client
# Binance test_key https://testnet.binance.vision/key/generate
API_KEY = "--"
API_SECRET = "--"
def GetHistoricalData(
 timedelta_days=10,
 ticker="BTCUSDT",
 kline_interval=Client.KLINE_INTERVAL_1HOUR
 ):
 # Calculate the timestamps for the binance api function
 untilThisDate = datetime.datetime.now()
 sinceThisDate = untilThisDate - datetime.timedelta(days=timedelta_days)
 client = Client(API_KEY, API_SECRET) 
 client.API_URL = 'https://testnet.binance.vision/api'
 candle = client.get_historical_klines(ticker, kline_interval, str(sinceThisDate), str(untilThisDate))
 # Create a dataframe to label all the columns returned by binance so we work with them later.
 df = pd.DataFrame(candle, columns=['dateTime', 'open', 'high', 'low', 'close', 'volume', 'closeTime', 'quoteAssetVolume', 'numberOfTrades', 'takerBuyBaseVol', 'takerBuyQuoteVol', 'ignore'])
 # as timestamp is returned in ms, let us convert this back to proper timestamps.
 df.dateTime = pd.to_datetime(df.dateTime, unit='ms').dt.strftime("%Y-%m-%d %X")
 df.set_index('dateTime', inplace=True)
 # Get rid of columns we do not need
 df = df.drop(['quoteAssetVolume', 'numberOfTrades', 'takerBuyBaseVol','takerBuyQuoteVol', 'ignore'], axis=1)
 return df
def prepare_table():
 client = clickhouse_driver.Client.from_url(f'clickhouse://default:{os.getenv("CLICK_PASSWORD")}@localhost:9000/crypto_exchange')
 
 # field names from binance API
 client.execute('''
CREATE TABLE IF NOT EXISTS historical_data_binance
(
 dateTime DateTime,
 closeTime Int64,
 open Float64,
 high Float64,
 low Float64,
 close Float64,
 volume Float64,
 kline_type String,
 ticker String
) ENGINE = Memory
''')
 return client
def insert_data(client, insert_data, db_name="crypto_exchange", table_name="historical_data_binance"):
 """
 insert_data = {
 "dateTime": dateTime, 
 "closeTime": closeTime, 
 "open": open,
 "high": hign,
 "low": low,
 "close": close,
 "volume": volume,
 "kline_type": kline_type,
 "ticker": ticker
 }
 """
 columns = ', '.join(insert_data.keys())
 query = 'insert into {}.{} ({}) values'.format(db_name, table_name, columns)
 
 data = []
 data.append(insert_data)
 client.execute(query, data)
client_db = prepare_table()
hist_data = GetHistoricalData(kline_interval=Client.KLINE_INTERVAL_1HOUR, ticker="BTCUSDT",)
for row in hist_data.iterrows():
 data = row[1].to_dict()
 data["dateTime"] = datetime.datetime.strptime(row[0], "%Y-%m-%d %X")
 data["closeTime"] = int(data["closeTime"])
 data["open"] = float(data["open"])
 data["high"] = float(data["high"])
 data["low"] = float(data["low"])
 data["close"] = float(data["close"])
 data["volume"] = float(data["volume"])
 data["kline_type"] = Client.KLINE_INTERVAL_1HOUR
 data["ticker"] = "BTCUSDT"
 insert_data(client_db, data)
 

What can be improved?

asked Nov 2, 2021 at 6:21
\$\endgroup\$

1 Answer 1

2
\$\begingroup\$
  • By PEP8, GetHistoricalData should be get_historical_data; likewise for your local variables like until_this_date
  • Introduce PEP484 type hints, for instance timedelta_days: Real (if it's allowed to be floating-point) or int otherwise
  • You should not be redefining 'https://testnet.binance.vision/api'; that's already defined as BaseClient.API_TESTNET_URL - see the documentation. This also suggests that you should be using testnet=True upon construction.
  • This date conversion:
 df.dateTime = pd.to_datetime(df.dateTime, unit='ms').dt.strftime("%Y-%m-%d %X")

is only half good idea. Datetime data should be stored in machine format instead of rendered user presentation data, so keep the to_datetime and drop the strftime.

  • When you call df.drop, pass inplace=True since you overwrite df anyway.
  • Your use of iterrows broken down into individual ClickHouse insert statements is going to be slow. ClickHouse supports multi-row values() syntax. Try making insertion batches that use this syntax to reduce the total number of inserts that you perform.
answered Nov 4, 2021 at 20:07
\$\endgroup\$
0

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.