I wrote a cryptocurrency (or any other) market forecasting application in Python 3.5. The code is getting bigger and more complex, so I would appreciate it if some experts could look it over, paying particular attention to the following:
- Loop indexes, range values, array indexes being correctly set (I find problems with this constantly)
- Correct exchange of values between global and local variables
- Local variables used in a nested loop should be nulled out before the 2nd loop
- Possibly making the code more efficient/faster/cleaner
- Other potential bugs
Description
The app is in Python 3.5+. It loads a .csv file which contains market data, for example for BTC/USD. A dummy file can be downloaded here for testing: Pastebin
Then you have 3 options:
- Backtest, which backtests the current configuration on the data
- Forecast, which forecasts forward_steps ahead into the future
- Brute force, which calculates the optimal parameter for the period
I would like if the aesthetics of the code would not be changed too much. I know it's messy, I am not a professional, just a hobbyist. So unless the change directly improves the speed or efficiency of the code, I'd like it to be left as it is. Otherwise, all bugs and problems should be fixed.
#☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀ ▶ dependencies ◀ ☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀
import os.path
import math
#☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀ ▶ parameters ◀ ☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀
version = "2.6"
ARRAY = []
FORECAST = []
file_name= "btc_usd.csv"
period = 840
forward_steps =1
best_period = 0
min_error = 999999999999.0
#☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀ ▶ functions ◀ ☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀
def process_file():
    with open("data/"+file_name,'r') as f:
        lines=f.readlines()
    ARRAY.append(['dummytime', 'dummyprice'])
    FORECAST.append('dummyforecast')
    for i in range(0,len(lines)):
        ARRAY.append(lines[i].strip().split(','))
        FORECAST.append("dummy")
    return (len(ARRAY) -1)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
def linear_forecast_diff(limit,f_period):
    f_diff=0.0
    for i in range(limit-f_period+1,limit+1): # make it [,]
        f_diff=f_diff + (float(ARRAY[i][1])-float(ARRAY[i-1][1]))
    f_diff/=f_period
    return (f_diff)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
def verification(limit,f_period_corrector,f_error_ignore,f_period):
    f_diff=linear_forecast_diff(limit,f_period)
    last_real =float(ARRAY[limit+1][1])
    last_forecast=float(ARRAY[limit ][1])+(f_diff*forward_steps)
    # manual correction for negative log values or zero division
    if(last_forecast<=0):
        f_period_corrector+=1
        f_error_ignore=True
        FORECAST[limit+1]="error"
        # print("here "+str(f_period)+" "+str(f_diff)+" "+str(limit))
        return (0,f_period_corrector,f_error_ignore)
    else:
        FORECAST[limit+1]=last_forecast
        error=abs(math.log(float(last_real/last_forecast)))
        return (error,f_period_corrector,f_error_ignore)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# formula: https://docs.oracle.com/cd/E17236_01/epm.1112/cb_statistical/frameset.htm?ch07s02s03s04.html
def theil_u():
    upper=0.0
    lower=0.0
    for x in range(1+period,arraysize-forward_steps): # N-1 for Theil's U
        if(ARRAY[x+1][1]!="error" and ARRAY[x+2][1]!="error" and FORECAST[x+1]!="error" and FORECAST[x+2]!="error"):
            cur_real =float(ARRAY [x+1][1])
            cur_forecast =float(FORECAST[x+1])
            future_real =float(ARRAY [x+2][1])
            future_forecast=float(FORECAST[x+2])
            upper+= ((future_forecast-future_real)/cur_real)**2
            lower+= ((future_real-cur_real) /cur_real)**2
    if(upper!=0.0 and lower!=0.0):
        theil=math.sqrt(upper/lower)
        return (theil)
    else:
        return (1)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
def user_interaction():
    print ("\n")
    print ("Choose Option!")
    print ("Backtest: 0")
    print ("Forecast: 1")
    print ("Brute Force: 2")
    return(input())
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
def main_function(f_period,f_best_period,f_min_error):
    if(userinput=="0" or userinput=="2" ):
        error=0.0
        period_corrector=0
        error_ignore=False
        theilu=0.0
        for k in range(1+f_period,arraysize-forward_steps+1): # +1 to fill the [] zone , -1 to go until the prev value
            output=verification(k,period_corrector,error_ignore,f_period)
            if(output[2]==False):
                error+=output[0]
            error_ignore=False
        error/= (arraysize-f_period-forward_steps-output[1])
        if(userinput=="2" ):
            if (error<f_min_error):
                f_min_error=error
                f_best_period=f_period
        if(userinput=="0"):
            theilu=theil_u()
            print (" "+'\033[103m'+'\033[1m' +"Profitgenerator's CRYPTO FORECASTING TOOL v"+version+ '\033[0m')
            print ("\n")
            print (" Mode: "+"Backtest")
            print (" File Name: "+file_name)
            print (" From: "+str(ARRAY[1][0])+" to "+str(ARRAY[arraysize][0]))
            print (" Array Size: "+str(arraysize))
            print (" Period: "+str(f_period))
            print (" Forward Steps: "+str(forward_steps))
            print (" Iterations: "+str((arraysize-f_period-forward_steps)))
            print (" "+'\033[106m' +'\033[1m' +"LN Error: "+str(error)+ '\033[0m' )
            print (" Theil's U: "+str(theilu))
            if(theilu<1):
                print (" "+'\033[102m'+'\033[1m' +"Predictive Edge: "+str((1-theilu)) + '\033[0m')
            else:
                print (" "+'\033[101m'+'\033[1m' +"Predictive Edge: "+str((1-theilu)) + '\033[0m')
    if(userinput=="1"):
        difference=linear_forecast_diff(arraysize,f_period)
        lastprice=float(ARRAY[arraysize][1])
        forecast=lastprice+(difference*forward_steps)
        print (" "+'\033[103m'+'\033[1m' +"Profitgenerator's CRYPTO FORECASTING TOOL v"+version+ '\033[0m')
        print ("\n")
        print(" Mode: "+"Forecast")
        print(" File Name: "+file_name)
        print(" Array Size: "+str(arraysize))
        print(" Period: "+str(f_period))
        print(" Forward Steps: "+str(forward_steps))
        print(" Last Price: "+str(lastprice))
        print(" "+'\033[106m' +'\033[1m' +"Forecasted Next Price: "+str(forecast)+ '\033[0m')
    # print(f_best_period,f_min_error)
    return (f_best_period,f_min_error)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
#☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀ ▶ main ◀ ☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀☀
arraysize=process_file()
if((period+forward_steps)>arraysize-2):
    period=1
    forward_steps=1
userinput = user_interaction()
print ("\n\n")
if(userinput=="0" or userinput=="1" ):
    main_function(period,best_period,min_error)
if(userinput=="2" ):
    for period in range (1,arraysize-forward_steps):
        main_output=main_function(period,best_period,min_error) # +1 to fill the [] zone
        best_period=main_output[0]
        min_error =main_output[1]
    print (" Mode: "+"Brute Force")
    print (" Forward Steps: "+str(forward_steps))
    print (" Best Period: "+str(best_period))
    print (" Error: "+str(min_error))
print ("\n\n")
- Just to note, the code you post here, and we do, is licensed implicitly as CC BY-SA 3.0. This is stated in the footer of all pages. So you may not be able to take code from an answer and add it straight to your MIT code. – Peilonrayz, Oct 25, 2017 at 8:16
- @Peilonrayz well I have posted the code in other places too under MIT license, so from my point of view I don't care which license people use. As for contributor's code, okay fair enough, still good as an inspiration source, I would just like if somebody would take a look at it and debug it. – user8244558, Oct 26, 2017 at 10:59
1 Answer
Unused code
This line should be deleted because it is not used:
import os.path
This line can also be deleted, because the variable is assigned but never read:
cur_forecast =float(FORECAST[x+1])
These were discovered automatically using the ruff tool.
Efficiency
In main_function, these separate if statements:
if(userinput=="2" ):
if(userinput=="0"):
should be combined into an if/elif:
if(userinput=="2" ):
elif(userinput=="0"):
The checks are mutually exclusive. This makes the code more efficient since you don't have to perform the 2nd check if the first is true. Also, this more clearly shows the intent of the code.
The same is true for the outermost pair of if statements in this function.
And again in the "main" section of the code.
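A minimal sketch of the same idea, using a hypothetical describe_mode helper that is not part of the original program. Because the menu choices are mutually exclusive, each later branch is only evaluated when the earlier ones did not match:

```python
def describe_mode(userinput):
    """Return the mode name for a menu choice (hypothetical helper)."""
    if userinput == "0":
        return "Backtest"
    elif userinput == "1":
        # Only reached when userinput is not "0".
        return "Forecast"
    elif userinput == "2":
        # Only reached when userinput is neither "0" nor "1".
        return "Brute Force"
    else:
        # Any other input falls through to a single default branch.
        return "Unknown"


print(describe_mode("2"))
```

The final else branch also gives one obvious place to handle unexpected input, which the separate if statements do not provide.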
The remainder of the review concerns code style.
I would like if the aesthetics of the code would not be changed too much
I'm not sure what you consider aesthetics, but code style is an important aspect of coding, especially for reviews. Also, answers on Code Review are for everyone, not just the person who posted the question. You are free to adopt or ignore any of the advice.
Documentation
The PEP 8 style guide recommends adding docstrings for functions. The docstrings should summarize the purpose of the function and contain details of the input and return types.
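As an illustration, here is one way a docstring for linear_forecast_diff might look. This is only a sketch: the prices parameter is an assumption added so the example runs on its own (the original function reads the global ARRAY instead), and the wording of the docstring is a suggestion rather than the author's own description.

```python
def linear_forecast_diff(prices, limit, f_period):
    """Return the average one-step price change over a window.

    Args:
        prices (list of float): price series (assumed parameter; the
            original code reads the second column of the global ARRAY).
        limit (int): index of the last price in the window.
        f_period (int): number of consecutive differences to average.

    Returns:
        float: mean of prices[i] - prices[i - 1] for i from
        limit - f_period + 1 through limit, inclusive.
    """
    f_diff = 0.0
    for i in range(limit - f_period + 1, limit + 1):
        f_diff += prices[i] - prices[i - 1]
    return f_diff / f_period


# Differences over the last two steps are 3.0 and 4.0.
sample_prices = [0.0, 10.0, 12.0, 15.0, 19.0]
result = linear_forecast_diff(sample_prices, 4, 2)
print(result)
```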
Naming
Many of the functions and variables are well-named.
However, the function theil_u could use either a more descriptive name or at least a comment explaining what it computes.
The PEP 8 guide recommends all caps for constants:
version = "2.6"
would be:
VERSION = "2.6"
Conversely, the variable FORECAST is not a constant (it is modified while the program runs), so it would be forecast.
The variable name ARRAY does not convey any meaning.
Comments
Commented-out code should be deleted to remove clutter:
# print("here "+str(f_period)+" "+str(f_diff)+" "+str(limit))
Layout
The code uses inconsistent indentation and whitespace around operators. The black program can be used to automatically reformat the code for consistency.
Conventionally, simple if statements do not use parentheses:
if(userinput=="2" ):
would be cleaner as:
if userinput=="2":
Similarly for return statements:
return (f_diff)
would be cleaner as:
return f_diff
Another convention is to avoid equality comparisons to True/False:
if(output[2]==False):
would be cleaner as:
if not output[2]:
Simpler
Lines such as:
print (" File Name: "+file_name)
can be simplified using f-strings (available from Python 3.6 onward, so not in 3.5):
print (f" File Name: {file_name}")
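The same approach extends to the numeric fields of the report. The sketch below uses made-up sample values, and the `.4f` format specifier is an optional addition that the original output does not use:

```python
# Sample values for illustration only; they are not taken from a real run.
file_name = "btc_usd.csv"
period = 840
error = 0.0123456

report_lines = [
    f" File Name: {file_name}",
    f" Period: {period}",          # no str() call needed
    f" LN Error: {error:.4f}",     # round to 4 decimal places when printing
]
for line in report_lines:
    print(line)
```

Values are converted to text automatically inside the braces, which removes the repeated str() calls and string concatenation.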