R: Optimize algorithm speed that estimates hourly temperature and relative humidity

Question 1

I built an algorithm that estimates hourly temperature and relative humidity from daily values of the same variables. The algorithm works, but it is too slow, which becomes a problem when the data cover many years or decades.

I need to increase the processing speed. I attach the algorithm:

names(hrTable) <- c("date","tmin","tminnext","tmax","tdew","tdewnext", "dayIndex")

hrTable is a data frame that contains 8 columns; the first, "date", holds the date in "%Y-%m-%d" format. The rest are numeric variables.

Example: 1995-01-20, 12, 13, 25, 15, 16, 360
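For anyone who wants to reproduce the layout, here is a minimal sketch of a one-row table built from the example values. It uses the seven names from the names() call, and it assumes the values map to the columns in that same order.

```r
# One-row table built from the example row. The value-to-column mapping is
# an assumption: it follows the order of the names() call above.
hrTable <- data.frame(
  date     = as.Date("1995-01-20", format = "%Y-%m-%d"),
  tmin     = 12,
  tminnext = 13,
  tmax     = 25,
  tdew     = 15,
  tdewnext = 16,
  dayIndex = 360
)

# Show the column types: one Date column followed by numeric columns.
str(hrTable)
```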

CalculateHumidHoursAndHumidTemp1 <- 
function(hrTable,yearLength,lat,incli,rhThreshold,verbose) {
nrowTab = nrow(hrTable);
lines2 = split(hrTable,rep(1:nrowTab,each=1)); 
hrResult = c("hrvalues");
HumHr = 0;
AccumHumHr = 0;
HumTm = 0;
AccumHumTm = 0;
day0 = 1;
dayn = nrowTab;
for(k in day0:dayn) {
 line2 = lines2[[k]];
 HumHr = 0;
 tempArray2 = line2; 
 date2 = tempArray2[1];
 tMin = tempArray2[2]; 
 tMinNext = tempArray2[3]; # integer, so it can be compared in the table
 tMax = tempArray2[4];
 tDew = tempArray2[5];
 tDewNext = tempArray2[6];
 dayIndex = tempArray2[7];
 alpha = tMax - tMin;
 t0 = tMax - 0.39*(tMax - tMinNext);
 r = tMax - t0;
 sindec = -1.0*incli*cos(6.283185307179586*((dayIndex + 10)/yearLength));
 cosdec = sqrt(abs(1.0 - (sindec * sindec)));
 b = cos(lat*(3.141592653589793/180.0))*cosdec;
 a = sin(lat*(3.141592653589793/180.0))*sindec;
 dayLength = 12.0*(1.0 + (0.6366197723675814*sin(a/b)));
 ho = 12.0 + (dayLength/2.0);
 hn = 12.0 - (dayLength/2.0);
 hp = hn + 24.0;
 beta = (tMinNext - t0)/(sqrt(abs(hp - ho)));
 hx = ho - 4.0;
 h = as.numeric(round(hn));
 if (verbose) {
 print(paste(hn,ho));
 }
 for(n in 1:24) {
 tAux1 = 0;
 tAux2 = 0;
 tAux3 = 0;
 hour = h + n;
if (hour >= hn && hour < hx) {
 tAux1 = tMin + alpha*sin(((hour - hn)/(hx - hn))*(3.141592653589793/2.0));
}
if (hour >= hx && hour <= ho) {
 tAux2 = t0 + r*sin(1.5707963267948966 + ((hour - hx)/4.0)*(3.141592653589793/2.0));
}
if (hour > ho && hour <= hp) {
 tAux3 = t0 + beta*sqrt(hour - ho);
}
tSum = tAux1 + tAux2 + tAux3;
if (verbose) {
 print(paste("Indice=",n,"\tHour", hour,"\ttSum=",tSum ,"\tTaux1=",tAux1,"\ttAux2=",tAux2,"\ntAux3=",tAux3," dayLength=",dayLength," a=",a," b=",b));
}
es = 611.0*exp(17.27*(tSum/(237.3+tSum)));
if (hour <= 24) {
 e = 611.0*exp(17.27*(tDew/(237.3+tDew)));
 hr = (e/es)*100.0;
 if (hr > rhThreshold) {
 HumHr = HumHr + 1;
 HumTm = tSum + HumTm;
 }
 if (verbose) {
 print(paste("Indice=",n,"\tHour",hour,"\te=",e,"\tes=",es));
 }
}
if (hour > 24) {
 e = 611.0*exp(17.27*(tDewNext/(237.3+tDewNext)));
 hr = (e/es)*100;
 if (hr > rhThreshold) {
 AccumHumHr = AccumHumHr + 1;
 AccumHumTm = tSum + AccumHumTm;
 }
 if (verbose) {
 print(paste("Indice=",n,"\tHour",hour,"\te=",e,"\tes=",es));
 }
}
hrResult = paste(hrResult,hr,tSum)
}
if (verbose) {
 print(paste("Horas Humedas= ",HumHr,"Temp Humedad= ",HumTm));
 }
}
return(hrResult)
}
HhrandHt <- CalculateHumidHoursAndHumidTemp1(hrTable,365,-12,12,90,T)

Question 2

Welcome to Code Review! The current question title, which states your concerns about the code, is too general to be useful here. Please edit to the site standard, which is for the title to simply state the task accomplished by the code. Please see How to get the best value out of Code Review: Asking Questions for guidance on writing good question titles.

Question 3

Can you maybe provide a sample of the data in hrTable like so: dput(head(hrTable, n = 50))? It makes testing and understanding your code a bit easier.

Question 4

What do you mean by "too slow"? What would be an acceptable speed?
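One way to put a number on "too slow" is to time a call with system.time(). The sketch below is only an illustration: slow_fn is a hypothetical stand-in that grows a string with paste() inside a loop, and the input size of 5000 is arbitrary.

```r
# Hypothetical stand-in for the function under test. It accumulates its
# result with paste() in a loop, similar in spirit to the posted code.
slow_fn <- function(n) {
  out <- "hrvalues"
  for (i in seq_len(n)) {
    out <- paste(out, i)
  }
  out
}

# system.time() reports user, system and elapsed time in seconds.
timing <- system.time(res <- slow_fn(5000))
print(timing[["elapsed"]])
```

Timing the real function on, say, one year of rows and again on ten years would show how the run time grows with the input.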

Question 5

You should always format your code properly before pasting it here. Currently the indentation and spacing are totally broken. Remember that code is written for humans to read and understand ideas, and only incidentally for computers to execute.

Question 6

You say "4 columns" but then list 8 columns. That's inconsistent.

Question 8

Thank you for your comments, I will correct my code and upload it again.

Question 9

@MarvinJónathanQuispeSedano be sure to ask a new question.

greybeard · Answer 1 · 2019-11-07 04:06:33Z

The first thing to improve about your code/procedure is readability.
Not "just" for others trying to understand your code, but for your later self should you want to maintain it.
I don't know the first thing about R. The first style guide a superficial web search turned up is from Hadley Wickham's Advanced R, characterised as short and sweet.

Things I think harm readability:

(lack of) structuring source code by formatting
(lack of) documentation in the source code: What is "everything" about?
What is the problem to solve, the sequence of well-defined steps perceived to solve it?
abbreviated names without an illuminating comment: is "hr" for hour or humidity, relative or something entirely else?
hx seems to be four hours before sunset - what does the x signify, and why 4 hours?
in-line arithmetic with magical constants: while I can guess that lat*(3.141592653589793/180.0) is radian(latitude in degrees), I'd rather not.
What is the significance of 10 in dayIndex + 10,
0.39 in tMax - 0.39*(tMax - tMinNext), why isn't the latter just 0.39*tMinNext + 0.61*tMax?
computing things that don't get used
The only things I see used (pasted to hrResult - see below) are hr and tSum;
some code updates HumHr/HumTm and "their Accum counterparts", but those values are never returned.
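As a rough illustration of the naming points, the inline arithmetic could be pulled into small documented helpers. This is a sketch only: every function and variable name below is made up for illustration, the pressure unit is assumed to be pascal, and the sample temperatures are taken from the example row in the question.

```r
# Convert degrees to radians using R's built-in pi instead of a typed-out literal.
deg2rad <- function(deg) deg * pi / 180

# Saturation vapour pressure for a temperature in degrees Celsius, using the
# constants from the posted code (611.0, 17.27, 237.3). Unit assumed to be Pa.
sat_vapour_pressure <- function(temp_c) {
  611.0 * exp(17.27 * temp_c / (237.3 + temp_c))
}

# Relative humidity in percent from dew point and air temperature.
relative_humidity <- function(t_dew, t_air) {
  100 * sat_vapour_pressure(t_dew) / sat_vapour_pressure(t_air)
}

# The posted expression for t0 and the weighted-mean form give the same value.
t_max      <- 25
t_min_next <- 13
t0_posted   <- t_max - 0.39 * (t_max - t_min_next)
t0_weighted <- 0.39 * t_min_next + 0.61 * t_max
stopifnot(isTRUE(all.equal(t0_posted, t0_weighted)))
```

With helpers like these, a line such as cos(deg2rad(lat)) states its intent without the reader having to recognise the constant.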

Once the code is readable, one can start pondering performance impact:
Is there anything straining the memory hierarchy?
Do costly actions get repeated instead of results reused?
Then, there is general advice like Use Vectorisation,
and (resource demand) pitfalls - using paste() to accumulate the result may be one.
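To make the last two points concrete, here is a minimal sketch of computing relative humidity for all 24 hours in one vectorised expression and keeping the results as numbers instead of pasting them into a string. It is not a rewrite of the posted function: the hourly temperatures are made-up values, and the helper and variable names are illustrative.

```r
# Saturation vapour pressure with the constants from the posted code.
sat_vp <- function(temp_c) 611.0 * exp(17.27 * temp_c / (237.3 + temp_c))

# Made-up hourly temperatures for one day (degrees Celsius). In the real
# algorithm these would come from the piecewise temperature curve.
hours        <- 1:24
t_hour       <- 18.5 + 6.5 * sin((hours - 9) * pi / 12)
t_dew        <- 15
rh_threshold <- 90

# One vectorised expression replaces 24 loop iterations.
rh <- 100 * sat_vp(t_dew) / sat_vp(t_hour)

# Logical indexing replaces the running counters.
humid       <- rh > rh_threshold
humid_hours <- sum(humid)
humid_temp  <- sum(t_hour[humid])

# Keep results as numeric columns rather than one ever-growing string.
result <- data.frame(hour = hours, rh = rh, temp = t_hour)
head(result)
```

Growing a string with paste() inside a loop creates a new, longer string on every iteration, so the cost likely rises faster than linearly with the number of rows. A numeric vector or data frame built in one step avoids that.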
