I built an algorithm to estimate real-time (hourly) temperature and humidity values from daily values of the same variables. The algorithm works, but it is too slow, and this causes me problems when the data cover many years or decades.
I need to increase the processing speed. Here is the algorithm:
```r
names(hrTable) <- c("date","tmin","tminnext","tmax","tdew","tdewnext", "dayIndex")
```
`hrTable` is a data frame with 7 columns. The first, `date`, contains the date in the format `"%Y-%m-%d"`; the rest are numeric variables.
Example: 1995-01-20, 12, 13, 25, 15, 16, 360
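A minimal way to make the example reproducible is to turn the example row into a one-row `hrTable` and print it with `dput()`, which the comment further down asks for. This is a sketch: only the single example row comes from the question. Storing `date` as a `Date` rather than as a character string is an assumption.

```r
# One-row hrTable built from the example in the question.
# Assumption: "date" is stored as a Date; the question only gives its format.
hrTable <- data.frame(
  date     = as.Date("1995-01-20", format = "%Y-%m-%d"),
  tmin     = 12,
  tminnext = 13,
  tmax     = 25,
  tdew     = 15,
  tdewnext = 16,
  dayIndex = 360
)

str(hrTable)

# dput() prints R code that recreates the object exactly, so others can paste it.
dput(head(hrTable, n = 50))
```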
```r
CalculateHumidHoursAndHumidTemp1 <-
  function(hrTable,yearLength,lat,incli,rhThreshold,verbose) {
    nrowTab = nrow(hrTable);
    lines2 = split(hrTable,rep(1:nrowTab,each=1));
    hrResult = c("hrvalues");
    HumHr = 0;
    AccumHumHr = 0;
    HumTm = 0;
    AccumHumTm = 0;
    day0 = 1;
    dayn = nrowTab;
    for(k in day0:dayn) {
      line2 = lines2[[k]];
      HumHr = 0;
      tempArray2 = line2;
      date2 = tempArray2[1];
      tMin = tempArray2[2];
      tMinNext = tempArray2[3]; # INTEGER, SO IT CAN BE COMPARED IN THE TABLE
      tMax = tempArray2[4];
      tDew = tempArray2[5];
      tDewNext = tempArray2[6];
      dayIndex = tempArray2[7];
      alpha = tMax - tMin;
      t0 = tMax - 0.39*(tMax - tMinNext);
      r = tMax - t0;
      sindec = -1.0*incli*cos(6.283185307179586*((dayIndex + 10)/yearLength));
      cosdec = sqrt(abs(1.0 - (sindec * sindec)));
      b = cos(lat*(3.141592653589793/180.0))*cosdec;
      a = sin(lat*(3.141592653589793/180.0))*sindec;
      dayLength = 12.0*(1.0 + (0.6366197723675814*sin(a/b)));
      ho = 12.0 + (dayLength/2.0);
      hn = 12.0 - (dayLength/2.0);
      hp = hn + 24.0;
      beta = (tMinNext - t0)/(sqrt(abs(hp - ho)));
      hx = ho - 4.0;
      h = as.numeric(round(hn));
      if (verbose) {
        print(paste(hn,ho));
      }
      for(n in 1:24) {
        tAux1 = 0;
        tAux2 = 0;
        tAux3 = 0;
        hour = h + n;
        if (hour >= hn && hour < hx) {
          tAux1 = tMin + alpha*sin(((hour - hn)/(hx - hn))*(3.141592653589793/2.0));
        }
        if (hour >= hx && hour <= ho) {
          tAux2 = t0 + r*sin(1.5707963267948966 + ((hour - hx)/4.0)*(3.141592653589793/2.0));
        }
        if (hour > ho && hour <= hp) {
          tAux3 = t0 + beta*sqrt(hour - ho);
        }
        tSum = tAux1 + tAux2 + tAux3;
        if (verbose) {
          print(paste("Indice=",n,"\tHour", hour,"\ttSum=",tSum ,"\tTaux1=",tAux1,"\ttAux2=",tAux2,"\ntAux3=",tAux3," dayLength=",dayLength," a=",a," b=",b));
        }
        es = 611.0*exp(17.27*(tSum/(237.3+tSum)));
        if (hour <= 24) {
          e = 611.0*exp(17.27*(tDew/(237.3+tDew)));
          hr = (e/es)*100.0;
          if (hr > rhThreshold) {
            HumHr = HumHr + 1;
            HumTm = tSum + HumTm;
          }
          if (verbose) {
            print(paste("Indice=",n,"\tHour",hour,"\te=",e,"\tes=",es));
          }
        }
        if (hour > 24) {
          e = 611.0*exp(17.27*(tDewNext/(237.3+tDewNext)));
          hr = (e/es)*100;
          if (hr > rhThreshold) {
            AccumHumHr = AccumHumHr + 1;
            AccumHumTm = tSum + AccumHumTm;
          }
          if (verbose) {
            print(paste("Indice=",n,"\tHour",hour,"\te=",e,"\tes=",es));
          }
        }
        hrResult = paste(hrResult,hr,tSum)
      }
      if (verbose) {
        print(paste("Horas Humedas= ",HumHr,"Temp Humedad= ",HumTm));
      }
    }
    return(hrResult)
  }

HhrandHt <- CalculateHumidHoursAndHumidTemp1(hrTable,365,-12,12,90,T)
```
1 Answer
The first thing to improve about your code/procedure is readability.
Not "just" for others trying to understand your code, but for your later self, should you want to maintain it.
I don't know the first thing about R. The first style guide a superficial web search turned up is from Hadley Wickham's Advanced R, characterised as short and sweet.
Things I think harm readability:
- (lack of) structuring source code by formatting
- (lack of) documentation in the source code: what is "everything" about? What is the problem to solve, and what is the sequence of well-defined steps perceived to solve it?
- abbreviated names without an illuminating comment: is `hr` for hour or humidity, relative or something entirely else? `hx` seems to be four hours before sunset: what does the x signify, and why 4 hours?
- in-line arithmetic with magical constants: while I can guess that `lat*(3.141592653589793/180.0)` is `radian(latitude in degrees)`, I'd rather not. What is the significance of 10 in `dayIndex + 10`, or of 0.39 in `tMax - 0.39*(tMax - tMinNext)`? Why isn't the latter just `0.39*tMinNext + 0.61*tMax`?
- computing things that don't get used: the only things I see used (pasted to `hrResult`, see below) are `hr` and `tSum`; some code struggles with `HumHr`/`HumTm` and "their Accum counterparts".
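Here is a sketch of what the constants could look like once they are named. It uses R's built-in `pi` instead of the spelled-out digits: 6.283185307179586 is 2π, 0.6366197723675814 is 2/π, and 1.5707963267948966 is π/2. The names `T0_WEIGHT`, `deg2rad()`, `day_length_hours()` and `t0_temp()` are hypothetical. The formulas are copied unchanged from the question's loop body, so the meaning of `incli`, the `+ 10` and the 0.39 weight still needs a comment from the author.

```r
# Hypothetical names; formulas copied unchanged from the question's loop body.
T0_WEIGHT <- 0.39  # weight of the next day's minimum in t0 (meaning undocumented)

deg2rad <- function(deg) deg * pi / 180

# Day length in hours for a given day index and latitude in degrees.
day_length_hours <- function(dayIndex, lat, incli, yearLength) {
  sindec <- -incli * cos(2 * pi * (dayIndex + 10) / yearLength)
  cosdec <- sqrt(abs(1 - sindec^2))
  a <- sin(deg2rad(lat)) * sindec
  b <- cos(deg2rad(lat)) * cosdec
  # Kept as sin() to reproduce the question's output. The usual day-length
  # formula applies asin() to a / b (= tan(lat) * tan(decl)) here; worth checking.
  12 * (1 + (2 / pi) * sin(a / b))
}

# t0 = tMax - 0.39 * (tMax - tMinNext), written as the weighted mean it is.
t0_temp <- function(tMax, tMinNext) (1 - T0_WEIGHT) * tMax + T0_WEIGHT * tMinNext
```

With the values from the question's call, `day_length_hours(360, -12, 12, 365)` returns about 13.6 hours.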
Once the code is readable, one can start pondering performance impact:
- Is there anything straining the memory hierarchy?
- Do costly actions get repeated instead of results being reused?

Then there is general advice like "Use vectorisation", and (resource demand) pitfalls: using `paste()` to accumulate the result may be one.
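Here is a sketch of the vectorisation advice. It assumes the column names from the question and assumes the goal is the 24 hourly `hr`/`tSum` pairs per day. The original splits the table into one-row data frames, which is likely expensive on its own, and grows a string with `paste()`. This version instead computes every per-day quantity once as a plain vector. `outer()` then builds a days × 24 matrix of hours, so the three temperature branches become element-wise `ifelse()` calls. The function name `calc_hourly_vectorised()` and the returned list are hypothetical, and `verbose` is dropped. Returning the humid-hour counts per day is an assumption about what `HumHr`/`AccumHumHr` were meant to hold. The sketch has not been benchmarked against the original.

```r
# Hypothetical vectorised rewrite of the question's loops.
# Arithmetic is copied from the question; only the control flow changes.
calc_hourly_vectorised <- function(hrTable, yearLength, lat, incli, rhThreshold) {
  # Plain numeric vectors, one element per day (no per-row data frames).
  tMin     <- hrTable$tmin
  tMinNext <- hrTable$tminnext
  tMax     <- hrTable$tmax
  tDew     <- hrTable$tdew
  tDewNext <- hrTable$tdewnext
  dayIndex <- hrTable$dayIndex

  alpha  <- tMax - tMin
  t0     <- tMax - 0.39 * (tMax - tMinNext)
  r      <- tMax - t0
  sindec <- -incli * cos(2 * pi * (dayIndex + 10) / yearLength)
  cosdec <- sqrt(abs(1 - sindec^2))
  b      <- cos(lat * pi / 180) * cosdec
  a      <- sin(lat * pi / 180) * sindec
  dayLength <- 12 * (1 + (2 / pi) * sin(a / b))
  ho   <- 12 + dayLength / 2
  hn   <- 12 - dayLength / 2
  hp   <- hn + 24
  beta <- (tMinNext - t0) / sqrt(abs(hp - ho))
  hx   <- ho - 4

  # days x 24 matrix: row k holds round(hn[k]) + 1, ..., round(hn[k]) + 24.
  # A length-nDays vector combined with this matrix recycles down the rows,
  # so element [k, n] is paired with day k's values.
  hour <- outer(round(hn), 1:24, "+")

  tAux1 <- ifelse(hour >= hn & hour < hx,
                  tMin + alpha * sin((hour - hn) / (hx - hn) * pi / 2), 0)
  tAux2 <- ifelse(hour >= hx & hour <= ho,
                  t0 + r * sin(pi / 2 + (hour - hx) / 4 * pi / 2), 0)
  # pmax() avoids NaN warnings from sqrt() in cells that ifelse() discards anyway.
  tAux3 <- ifelse(hour > ho & hour <= hp,
                  t0 + beta * sqrt(pmax(hour - ho, 0)), 0)
  tSum <- tAux1 + tAux2 + tAux3

  es    <- 611 * exp(17.27 * tSum / (237.3 + tSum))
  eDay  <- 611 * exp(17.27 * tDew / (237.3 + tDew))
  eNext <- 611 * exp(17.27 * tDewNext / (237.3 + tDewNext))
  e  <- ifelse(hour <= 24, eDay, eNext)  # today's or the next day's dew point
  hr <- 100 * e / es

  humid <- hr > rhThreshold
  list(
    hour = hour, tSum = tSum, hr = hr,
    humidHours     = rowSums(humid & hour <= 24),  # per-day HumHr
    humidHoursNext = rowSums(humid & hour > 24)    # per-day AccumHumHr share
  )
}
```

With the question's arguments, `res <- calc_hourly_vectorised(hrTable, 365, -12, 12, 90)` puts day k's 24 values in `res$hr[k, ]` and `res$tSum[k, ]`. These are the same values the original pastes into `hrResult`. Keeping them as numeric matrices also avoids re-copying an ever-longer string on every `paste()` call.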
Thank you for your comments, I will correct my code and upload it again. (Marvin Jónathan Quispe Sedano, Nov 7, 2019 at 11:51)
@MarvinJónathanQuispeSedano be sure to ask a new question. (greybeard, Nov 7, 2019 at 14:07)
`hrTable` like so: `dput(head(hrTable, n = 50))`? It makes testing and understanding your code a bit easier.