Temperature Interpolation

Question 1

I want to interpolate temperature in correlation with height. I have temperature data from stations with coordinates and height in this format:

Lat Lon Temp Height
46.2956 16.3978 10.71196 120
45.9776 16.0159 11.78539 150
45.8521 15.9598 11.18838 133
etc.

After reading the data there are 4 arrays (y, x, temp, and height) and variable z which represents correlation between temperature and height, e.g., z = -0.005 which means that for each meter of height temperature decreases for 0.005 °C.

Z is calculated like this: z = np.polyfit(height, temp, 1,2)

After that I can calculate temperature on 0 meters for each station:

for i in range(len(temp)):
 temp0m.append((temp[i])-(height[i]*z))

I also have data with heights in meters with resolution of 0.003333333333 degrees and I want to use these heights and interpolated temperature to create map.

Heights are in "dem_final2.txt" (ncolums = 2700, nrows = 1560)

1653 1571 1493 1429 1354...
1730 1699 1620 1528 1399...

So this is what I've done so far, everything is working but it's very slow:

if __name__ == '__main__':
 nx, ny = 1080,624
 chunkSize = 10
 t = []
 grid2 = []
 xEast = 12
 ySouth = 42
 xWest = 21
 yNorth = 47.2
 res = 0.003333333333
 x,y,temp0m,z = downloadData()
 xi = np.linspace(xEast+0.1, xWest-0.1, nx)
 yi = np.linspace(ySouth+0.1, yNorth-0.1, ny)
 xi, yi = np.meshgrid(xi, yi)
 xi, yi = xi.flatten(), yi.flatten()
 l = len(xi)/chunkSize
 for i in range(chunkSize):
 interp = rbfInterp(x,y,temp0m,xi[(l*i):(l*(i+1))],yi[(l*i):(l*(i+1))])
 grid2 = grid2 + list(interpData(xi[(l*i):(l*(i+1))],yi[(l*i):(l*(i+1))],z,interp,xEast,ySouth,xWest,yNorth,res))[(l*i):(l*(i+1))]
 grid2 = np.array(grid2, np.float) 
 grid2 = grid2.reshape((ny, nx))
 plot (x,y,temp0m,grid2) #plotting to png with matplotlib
def rbfInterp(x, y, temp0m,xi,yi):
 interp = Rbf(x, y, temp0m, function='linear')
 return interp(xi,yi)
def interpData(xi,yi,z,interp,xEast,ySouth,xWest,yNorth,res):
 append = t.append
 row= -1
 for i in xrange(len(xi)): 
 tmp = int(round((yi[i]-ySouth)/res,0))
 tmp1 = int(round((xi[i]-xEast)/res,0)+1)
 if (row!=tmp): 
 tmp2 = linecache.getline('dem_final2.txt', tmp).split(' ') 
 if (float(tmp2[tmp1])>-10):
 append ((float(tmp2[tmp1]))*z+float(interp[i])) 
 else:
 append (None) 
 row = tmp
 return t

I divided into chunks because a lot of memory was used without them and I think this can be used to speed up things maybe with more processes or something like that but I don't have knowledge to do that without any help. Of course, any other help is also welcome.

Btw this script runs for about 25-30 sec

Question 2

The problems start with this code being fairly unreadable. A few simple rules can help:

Every comma is followed by a space: xi, yi instead of xi,yi.
Every binary operator like + is surrounded by a space on each side:
```
int(round((xi[i] - xEast) / res, 0) + 1)
```
instead of int(round((xi[i]-xEast)/res,0)+1).

If an expression becomes to complicated, assign intermediate values to variables which also assist in explaining the code to a reader:

start = l * i
end = l * (i + 1)
chunk_x = xi[start:end]
chunk_y = yi[start:end]
interp = rbfInterp(x, y, temp0m, chunk_x, chunk_y)
result = interpData(chunk_x, chunk_y, z, interp, xEast, ySouth, xWest, yNorth, res)
grid2 += list(result)[start:end]

instead of

interp = rbfInterp(x,y,temp0m,xi[(l*i):(l*(i+1))],yi[(l*i):(l*(i+1))])
grid2 = grid2 + list(interpData(xi[(l*i):(l*(i+1))],yi[(l*i):(l*(i+1))],z,interp,xEast,ySouth,xWest,yNorth,res))[(l*i):(l*(i+1))]

At least try to provide meaningful names for your variables. Expressions like tmp2[tmp1] are unacceptable.
Avoid overly clever things like append = t.append; ...; append(...), as you gain nothing by this indirection. t.append(...) directly.
In a function call, never put a space between the function name and the parens.
- plot(x, y, temp0m, grid2) instead of plot (x,y,temp0m,grid2)
- append(None) instead of append (None)
Most of Python is using the convention that variable names should use snake_case not camelCase. While consistency is more important than following any specific style, I suggest you adopt this common style.
Some variable names like chunkSize are misleading – this is the number of chunks, not their size.

Once we have cleaned up the code, we can investigate where performance could be increased. This is done mostly by avoiding unnecessary recalculations, but also by finding more efficient expressions.

For example, Rbf(x, y, temp0m, function='linear') will always produce the same value as the parameters never change. Therefore, calculate it once outside of any loops. Actually, you can remove the whole rbfInterp function.

Inside interpData, you translate and rescale the data: (yi[i]-ySouth)/res. If you do this outside of an explicit loop, then numpy can do this more efficiently: translated_yi = (yi - ySouth)/res.

The whole meshgrid business is obfuscating which calculation you are actually performing:

# prepare an interpolation function for this data
interpolation_function = Rbf(x, y, temp0m, function='linear')
# the actual coordinates
yis = np.linspace(ySouth + 0.1, yNorth - 0.1, ny)
xis = np.linspace(xEast + 0.1, xWest - 0.1, nx)
# the grid positions in the data file
yis_normalized = ((yis - ySouth)/res).round(0).astype(np.int)
xis_normalized = ((xis - xEast)/res + 1).round(0).astype(np.int)
grid = np.zeros((ny, nx))
for i in range(ny):
 heights = linecache.getline('dem_final2.txt', yis_normalized[i]).split(' ')
 for j in range(nx):
 height = float(height[xis_normalized[j]])
 result = height * z + interpolation_function(xis[j], yis[i])
 grid[i][j] = result if data_item > -10 else np.nan

Some important points:

We fetch and parse each line once per line, instead of once per grid column.
We factor out repeated calculations outside of the loops
We avoid any confusing reshaping by writing the results directly into the correct data structure.
We try to avoid explicit loops by using numpy transformations

The purpose of the code is now much more obvious.

However, the usage of linecache is quite suspicious. We read the lines in sequential order, so caching will be of no use → read the file manually. You should also consider reading it into a numpy array, as the file in this form should only be using a few MB of memory:

heights = np.loadtxt('dem_final2.txt')
...
result = heights[yis_normalized[i]][xis_normalized[j]] * z + ...

amon amon 12.7k37 silver badges67 bronze badges · Accepted Answer · 2014-04-14 14:46:29Z

The problems start with this code being fairly unreadable. A few simple rules can help:

Every comma is followed by a space: xi, yi instead of xi,yi.
Every binary operator like + is surrounded by a space on each side:
```
int(round((xi[i] - xEast) / res, 0) + 1)
```
instead of int(round((xi[i]-xEast)/res,0)+1).

If an expression becomes to complicated, assign intermediate values to variables which also assist in explaining the code to a reader:

start = l * i
end = l * (i + 1)
chunk_x = xi[start:end]
chunk_y = yi[start:end]
interp = rbfInterp(x, y, temp0m, chunk_x, chunk_y)
result = interpData(chunk_x, chunk_y, z, interp, xEast, ySouth, xWest, yNorth, res)
grid2 += list(result)[start:end]

instead of

interp = rbfInterp(x,y,temp0m,xi[(l*i):(l*(i+1))],yi[(l*i):(l*(i+1))])
grid2 = grid2 + list(interpData(xi[(l*i):(l*(i+1))],yi[(l*i):(l*(i+1))],z,interp,xEast,ySouth,xWest,yNorth,res))[(l*i):(l*(i+1))]

At least try to provide meaningful names for your variables. Expressions like tmp2[tmp1] are unacceptable.
Avoid overly clever things like append = t.append; ...; append(...), as you gain nothing by this indirection. t.append(...) directly.
In a function call, never put a space between the function name and the parens.
- plot(x, y, temp0m, grid2) instead of plot (x,y,temp0m,grid2)
- append(None) instead of append (None)
Most of Python is using the convention that variable names should use snake_case not camelCase. While consistency is more important than following any specific style, I suggest you adopt this common style.
Some variable names like chunkSize are misleading – this is the number of chunks, not their size.

Once we have cleaned up the code, we can investigate where performance could be increased. This is done mostly by avoiding unnecessary recalculations, but also by finding more efficient expressions.

For example, Rbf(x, y, temp0m, function='linear') will always produce the same value as the parameters never change. Therefore, calculate it once outside of any loops. Actually, you can remove the whole rbfInterp function.

Inside interpData, you translate and rescale the data: (yi[i]-ySouth)/res. If you do this outside of an explicit loop, then numpy can do this more efficiently: translated_yi = (yi - ySouth)/res.

The whole meshgrid business is obfuscating which calculation you are actually performing:

# prepare an interpolation function for this data
interpolation_function = Rbf(x, y, temp0m, function='linear')
# the actual coordinates
yis = np.linspace(ySouth + 0.1, yNorth - 0.1, ny)
xis = np.linspace(xEast + 0.1, xWest - 0.1, nx)
# the grid positions in the data file
yis_normalized = ((yis - ySouth)/res).round(0).astype(np.int)
xis_normalized = ((xis - xEast)/res + 1).round(0).astype(np.int)
grid = np.zeros((ny, nx))
for i in range(ny):
 heights = linecache.getline('dem_final2.txt', yis_normalized[i]).split(' ')
 for j in range(nx):
 height = float(height[xis_normalized[j]])
 result = height * z + interpolation_function(xis[j], yis[i])
 grid[i][j] = result if data_item > -10 else np.nan

Some important points:

We fetch and parse each line once per line, instead of once per grid column.
We factor out repeated calculations outside of the loops
We avoid any confusing reshaping by writing the results directly into the correct data structure.
We try to avoid explicit loops by using numpy transformations

The purpose of the code is now much more obvious.

However, the usage of linecache is quite suspicious. We read the lines in sequential order, so caching will be of no use → read the file manually. You should also consider reading it into a numpy array, as the file in this form should only be using a few MB of memory:

heights = np.loadtxt('dem_final2.txt')
...
result = heights[yis_normalized[i]][xis_normalized[j]] * z + ...

Stack Exchange Network

Temperature Interpolation

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Temperature Interpolation

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions