I want to interpolate temperature in correlation with height. I have temperature data from stations with coordinates and height in this format:
Lat Lon Temp Height
46.2956 16.3978 10.71196 120
45.9776 16.0159 11.78539 150
45.8521 15.9598 11.18838 133
etc.
After reading the data there are 4 arrays (y
, x
, temp
, and height
) and variable z
which represents correlation between temperature and height, e.g., z = -0.005
which means that for each meter of height temperature decreases for 0.005 °C.
Z is calculated like this: z = np.polyfit(height, temp, 1,2)
After that I can calculate temperature on 0 meters for each station:
for i in range(len(temp)):
temp0m.append((temp[i])-(height[i]*z))
I also have data with heights in meters with resolution of 0.003333333333 degrees and I want to use these heights and interpolated temperature to create map.
Heights are in "dem_final2.txt"
(ncolums = 2700, nrows = 1560)
1653 1571 1493 1429 1354...
1730 1699 1620 1528 1399...
So this is what I've done so far, everything is working but it's very slow:
if __name__ == '__main__':
nx, ny = 1080,624
chunkSize = 10
t = []
grid2 = []
xEast = 12
ySouth = 42
xWest = 21
yNorth = 47.2
res = 0.003333333333
x,y,temp0m,z = downloadData()
xi = np.linspace(xEast+0.1, xWest-0.1, nx)
yi = np.linspace(ySouth+0.1, yNorth-0.1, ny)
xi, yi = np.meshgrid(xi, yi)
xi, yi = xi.flatten(), yi.flatten()
l = len(xi)/chunkSize
for i in range(chunkSize):
interp = rbfInterp(x,y,temp0m,xi[(l*i):(l*(i+1))],yi[(l*i):(l*(i+1))])
grid2 = grid2 + list(interpData(xi[(l*i):(l*(i+1))],yi[(l*i):(l*(i+1))],z,interp,xEast,ySouth,xWest,yNorth,res))[(l*i):(l*(i+1))]
grid2 = np.array(grid2, np.float)
grid2 = grid2.reshape((ny, nx))
plot (x,y,temp0m,grid2) #plotting to png with matplotlib
def rbfInterp(x, y, temp0m,xi,yi):
interp = Rbf(x, y, temp0m, function='linear')
return interp(xi,yi)
def interpData(xi,yi,z,interp,xEast,ySouth,xWest,yNorth,res):
append = t.append
row= -1
for i in xrange(len(xi)):
tmp = int(round((yi[i]-ySouth)/res,0))
tmp1 = int(round((xi[i]-xEast)/res,0)+1)
if (row!=tmp):
tmp2 = linecache.getline('dem_final2.txt', tmp).split(' ')
if (float(tmp2[tmp1])>-10):
append ((float(tmp2[tmp1]))*z+float(interp[i]))
else:
append (None)
row = tmp
return t
I divided into chunks because a lot of memory was used without them and I think this can be used to speed up things maybe with more processes or something like that but I don't have knowledge to do that without any help. Of course, any other help is also welcome.
Btw this script runs for about 25-30 sec
1 Answer 1
The problems start with this code being fairly unreadable. A few simple rules can help:
Every comma is followed by a space:
xi, yi
instead ofxi,yi
.Every binary operator like
+
is surrounded by a space on each side:int(round((xi[i] - xEast) / res, 0) + 1)
instead of
int(round((xi[i]-xEast)/res,0)+1)
.If an expression becomes to complicated, assign intermediate values to variables which also assist in explaining the code to a reader:
start = l * i end = l * (i + 1) chunk_x = xi[start:end] chunk_y = yi[start:end] interp = rbfInterp(x, y, temp0m, chunk_x, chunk_y) result = interpData(chunk_x, chunk_y, z, interp, xEast, ySouth, xWest, yNorth, res) grid2 += list(result)[start:end]
instead of
interp = rbfInterp(x,y,temp0m,xi[(l*i):(l*(i+1))],yi[(l*i):(l*(i+1))]) grid2 = grid2 + list(interpData(xi[(l*i):(l*(i+1))],yi[(l*i):(l*(i+1))],z,interp,xEast,ySouth,xWest,yNorth,res))[(l*i):(l*(i+1))]
At least try to provide meaningful names for your variables. Expressions like
tmp2[tmp1]
are unacceptable.Avoid overly clever things like
append = t.append; ...; append(...)
, as you gain nothing by this indirection.t.append(...)
directly.In a function call, never put a space between the function name and the parens.
plot(x, y, temp0m, grid2)
instead ofplot (x,y,temp0m,grid2)
append(None)
instead ofappend (None)
Most of Python is using the convention that variable names should use
snake_case
notcamelCase
. While consistency is more important than following any specific style, I suggest you adopt this common style.Some variable names like
chunkSize
are misleading – this is the number of chunks, not their size.
Once we have cleaned up the code, we can investigate where performance could be increased. This is done mostly by avoiding unnecessary recalculations, but also by finding more efficient expressions.
For example, Rbf(x, y, temp0m, function='linear')
will always produce the same value as the parameters never change. Therefore, calculate it once outside of any loops. Actually, you can remove the whole rbfInterp
function.
Inside interpData
, you translate and rescale the data: (yi[i]-ySouth)/res
. If you do this outside of an explicit loop, then numpy can do this more efficiently: translated_yi = (yi - ySouth)/res
.
The whole meshgrid
business is obfuscating which calculation you are actually performing:
# prepare an interpolation function for this data
interpolation_function = Rbf(x, y, temp0m, function='linear')
# the actual coordinates
yis = np.linspace(ySouth + 0.1, yNorth - 0.1, ny)
xis = np.linspace(xEast + 0.1, xWest - 0.1, nx)
# the grid positions in the data file
yis_normalized = ((yis - ySouth)/res).round(0).astype(np.int)
xis_normalized = ((xis - xEast)/res + 1).round(0).astype(np.int)
grid = np.zeros((ny, nx))
for i in range(ny):
heights = linecache.getline('dem_final2.txt', yis_normalized[i]).split(' ')
for j in range(nx):
height = float(height[xis_normalized[j]])
result = height * z + interpolation_function(xis[j], yis[i])
grid[i][j] = result if data_item > -10 else np.nan
Some important points:
- We fetch and parse each line once per line, instead of once per grid column.
- We factor out repeated calculations outside of the loops
- We avoid any confusing reshaping by writing the results directly into the correct data structure.
- We try to avoid explicit loops by using numpy transformations
The purpose of the code is now much more obvious.
However, the usage of linecache
is quite suspicious. We read the lines in sequential order, so caching will be of no use → read the file manually. You should also consider reading it into a numpy array, as the file in this form should only be using a few MB of memory:
heights = np.loadtxt('dem_final2.txt')
...
result = heights[yis_normalized[i]][xis_normalized[j]] * z + ...
Explore related questions
See similar questions with these tags.