I would like to reclassify a raster file from a raster with 10 classes to a raster with 8 classes using Python, GDAL and/or numpy. The classes are represented as integers. I have tried following the steps from this post Reclassifying rasters using GDAL and Python? , the numpy.equal doc and also gdal_calc doc. However, to no avail.
The raster file to be reclassified has integer values ranging from 0 to 11 and also include values 100 and 255. The following show the reclass (from value : to value):
nodata : 4, 0 : 4, 1 : 1, 2 : 2, 3 : 3, 4 : 3, 5 : 4, 6 : 5, 7 : 5, 8 : 6, 9 : 7, 10 : 8, 100 : nodata, 255 : nodata,
What I have been able to do is select the raster file to be reclassified using tkinter.FileDialog and get the raster info such as geotransform, and pixel size with reclass = gdal.Open(raster, GA_ReadOnly).
How do I go about solving the above.
It might be worth mentioning that the rasters to be reclassified can be fairly large in some cases (500mb to 5gb).
-
There is another example on the GeoExamples Blogbennos– bennos2015年09月16日 07:53:15 +00:00Commented Sep 16, 2015 at 7:53
-
@bennos, tried the script on the blog but it returns a memory error when unpacking the array.PyMapr– PyMapr2015年09月17日 08:25:21 +00:00Commented Sep 17, 2015 at 8:25
-
I suggest you discuss this problem with Roger Veciana i Rovira, the author of the post, as he knows his code better than I do and maybe knows how to resolve the issuebennos– bennos2015年09月17日 08:36:34 +00:00Commented Sep 17, 2015 at 8:36
-
Changing the input raster from 16Bit unsigned to 8Bit unsigned solved the memory issue. However, it takes about the same amount of time to reclassify as the dmh126 script below.PyMapr– PyMapr2015年09月17日 09:52:28 +00:00Commented Sep 17, 2015 at 9:52
11 Answers 11
Instead of doing the reclassification as a double for loop described by dmh126, do it using np.where
:
# reclassification
lista[np.where( lista < 200 )] = 1
lista[np.where((200 < lista) & (lista < 400)) ] = 2
lista[np.where((400 < lista) & (lista < 600)) ] = 3
lista[np.where((600 < lista) & (lista < 800)) ] = 4
lista[np.where( lista > 800 )] = 5
On an array of 6163 by 3537 pixels (41.6mb) the classification is done in 1.59 seconds, where it takes 12min 41s using the double for loop. In total just a speedup of 478x.
Bottomline, never use a double for loop using numpy
-
2Thanks for the hint, but I think that will lead into a problem if the input classes overlap with the output classes. I don't want my new value to be changed by the next rule.etrimaille– etrimaille2016年05月11日 06:20:37 +00:00Commented May 11, 2016 at 6:20
-
@Gustry - Just ran into that problem here.relima– relima2016年08月29日 23:20:38 +00:00Commented Aug 29, 2016 at 23:20
-
2So check my answer below : gis.stackexchange.com/questions/163007/…etrimaille– etrimaille2016年08月31日 06:50:28 +00:00Commented Aug 31, 2016 at 6:50
Here's a basic example using rasterio and numpy:
import rasterio as rio
import numpy as np
with rio.open('~/rasterio/tests/data/rgb1.tif') as src:
# Read the raster into a (rows, cols, depth) array,
# dstack this into a (depth, rows, cols) array,
# the sum along the last axis (~= grayscale)
grey = np.mean(np.dstack(src.read()), axis=2)
# Read the file profile
srcprof = src.profile.copy()
classes = 5
# Breaks is an array of the class breaks: [ 0. 51. 102. 153. 204.]
breaks = (np.arange(classes) / float(classes)) * grey.max()
# classify the raster
classified = np.sum(np.dstack([(grey < b) for b in breaks]), axis=2).reshape(1, 400, 400).astype(np.int32)
# Update the file opts to one band
srcprof.update(count=1, nodata=None, dtype=classified.dtype)
with rio.open('/tmp/output.tif', 'w', **srcprof) as dst:
# Write the output
dst.write(classified)
-
Useful answer that works fine! Any idea how can the classification be reversed? I mean lower values in the bin getting high re-classified values instead, which will be reduced as the bin increases. So instead of having a product going from 1 to 5, we have the reverse from 5 to 1. Thanks!Alex K– Alex K2020年08月06日 18:26:58 +00:00Commented Aug 6, 2020 at 18:26
Just to complete the answer from @Mattijn, I think that will lead into a problem if the input classes overlap with the output classes. I don't want my new value to be changed by the next rule.
I don't know if I loose speed but I should do a deep copy :
list_dest = lista.copy()
list_dest[np.where( lista < 0 )] = 0
list_dest[np.where((0 <= lista) & (lista <= 1)) ] = 1
list_dest[np.where((1 < lista) & (lista <= 5)) ] = 2
list_dest[np.where( 5 < lista )] = 3
Here you have a simple python script for reclassification, I wrote it and it works for me:
from osgeo import gdal
driver = gdal.GetDriverByName('GTiff')
file = gdal.Open('/home/user/workspace/raster.tif')
band = file.GetRasterBand(1)
lista = band.ReadAsArray()
# reclassification
for j in range(file.RasterXSize):
for i in range(file.RasterYSize):
if lista[i,j] < 200:
lista[i,j] = 1
elif 200 < lista[i,j] < 400:
lista[i,j] = 2
elif 400 < lista[i,j] < 600:
lista[i,j] = 3
elif 600 < lista[i,j] < 800:
lista[i,j] = 4
else:
lista[i,j] = 5
# create new file
file2 = driver.Create( 'raster2.tif', file.RasterXSize , file.RasterYSize , 1)
file2.GetRasterBand(1).WriteArray(lista)
# spatial ref system
proj = file.GetProjection()
georef = file.GetGeoTransform()
file2.SetProjection(proj)
file2.SetGeoTransform(georef)
file2.FlushCache()
Just change the ranges.
I hope it will help.
-
3You should close
file2
either withdel file2
orfile2 = None
to make sure it gets written to disk..FlushCache()
only influences GDALs internal tile cache.Kersten– Kersten2015年09月15日 13:30:21 +00:00Commented Sep 15, 2015 at 13:30 -
@dmh126, I modified the ranges and the script works. However, it is not very quick (quick being debatable). The input raster was about 120mb and took about 15 min to complete.With the aid of a commercial remote sensing package it takes seconds. Any recommendations on decreasing processing time?PyMapr– PyMapr2015年09月17日 08:23:43 +00:00Commented Sep 17, 2015 at 8:23
-
I think that multithreading can help. You can try to use all of your cores, there is a question gis.stackexchange.com/questions/162978/…dmh126– dmh1262015年09月17日 09:07:48 +00:00Commented Sep 17, 2015 at 9:07
-
Doesn't make sense to use a double for loop, see answer belowMattijn– Mattijn2016年01月21日 08:43:33 +00:00Commented Jan 21, 2016 at 8:43
-
Right, the double loop and per-element reclassification is the slowest of all possible ways to do this. Use the powerful parts of numpy like ufuncs: docs.scipy.org/doc/numpy-1.10.1/reference/ufuncs.html.sgillies– sgillies2016年05月27日 15:57:07 +00:00Commented May 27, 2016 at 15:57
In some cases, numpy digitize can be useful to quickly go from ranges to bins.
import rasterio
import numpy as np
with rasterio.open('my_raster.tif') as src:
array = src.read()
profile = src.profile
bins = np.array([-1.,-0.7,-0.4, 0.2, 1])
inds = np.digitize(array, bins)
with rasterio.open('output_raster.tif', 'w', **profile) as dst:
dst.write(inds)
Here is another Rasterio approach that I hacked together using the Rasterio Cookbook and @Mattijn's answer.
import rasterio
import numpy as np
with rasterio.open('input_raster.tif') as src:
# Read as numpy array
array = src.read()
profile = src.profile
# Reclassify
array[np.where(array == 0)] = 4
array[np.where(array == 2)] = 1
# and so on ...
with rasterio.open('output_raster.tif', 'w', **profile) as dst:
# Write to disk
dst.write(array)
Apply a dictionary directly to array: https://stackoverflow.com/questions/16992713/translate-every-element-in-numpy-array-according-to-key
source_raster = rio.open(r'source_raster.tif')
my_dict = {source_raster.nodata: 4,
0:4,
1:1,
2:2,
3:3,
4:3,
5:4,
6:5,
7:5,
8:6,
9:7,
10:8,
100: source_raster.nodata,
255: source_raster.nodata}
source_array = source_raster.read(1)
target_array = np.vectorize(my_dict.__getitem__)(source_array)
profile = source_raster.profile()
with rio.open(r'target_raster.tif', 'w', **profile) as dest:
dest.write(target_array, 1)
If the source raster is too large for reading into memory, use rasterio windows to serialize the process. Read a windowed portion from the source raster, apply the dict and write the window-content to the target file.
-
1I liked the idea of using
np.vectorize
but it turns outfor key, value in reclass_dct.items(): array[array == key] = value
is a fair amount faster.PyMapr– PyMapr2024年05月14日 14:55:48 +00:00Commented May 14, 2024 at 14:55
With raster RGB color table support :
import numpy as np
from osgeo import gdal
path_inDs = "/data/OCS_2016.extract.tif"
path_outDs = "/data/OCS_2016.postpython.tif"
driver = gdal.GetDriverByName('GTiff')
file = gdal.Open(path_inDs)
if file is None:
print ('Could not open image file')
sys.exit(1)
band = file.GetRasterBand(1)
lista = band.ReadAsArray()
# reclassification
lista[np.where(lista == 31)] = 16
# create new file
file2 = driver.Create(path_outDs, file.RasterXSize , file.RasterYSize , 1, gdal.GPI_RGB)
file2.GetRasterBand(1).WriteArray(lista)
# spatial ref system
proj = file.GetProjection()
georef = file.GetGeoTransform()
meta = file.GetMetadata()
colors = file.GetRasterBand(1).GetRasterColorTable()
file2.SetProjection(proj)
file2.SetGeoTransform(georef)
file2.SetMetadata(meta)
file2.GetRasterBand(1).SetRasterColorTable(colors)
file2.FlushCache()
del file2
A slightly different alternative could be the following:
import numpy as np
from osgeo import gdal
original = gdal.Open('**PATH**\\origianl_raster.tif')
# read the original file
band = original.GetRasterBand(1) # assuming that the file has only one band
band_array = band.ReadAsArray()
#create a new array with reclassified values
new_array = np.where(band_array == np.nan, 4,
np.where(band_array == 0, 4,
np.where(band_array == 1, 1,
np.where(band_array == 2, 2,
np.where(band_array == 3, 3,
np.where(band_array == 4, 3,
np.where(band_array == 5, 4,
np.where(band_array == 6, 5,
np.where(band_array == 7, 5,
np.where(band_array == 8, 6,
np.where(band_array == 9, 7,
np.where(band_array == 10, 8,
np.where(band_array == 100, np.nan, np.nan)))))))))))))
# the last line also includes values equal to 255, as they are the only ones left
# create and save reclassified raster as a new file
outDs = gdal.GetDriverByName('GTiff').Create("**PATH**\\reclassified_raster.tif", original.RasterXSize, original.RasterYSize, 1, gdal.GDT_Float32)
outBand = outDs.GetRasterBand(1)
outBand.WriteArray(new_array)
outDs.SetGeoTransform(original.GetGeoTransform())
outDs.SetProjection(original.GetProjection())
# flush cache
outDs.FlushCache()
This script plays with numpy.where (https://docs.scipy.org/doc/numpy/reference/generated/numpy.where.html): in all the steps apart from the last one, instead of returning a value when the condition is not satisfied, it returns another np.where. And it keeps going until all possible values of the raster are considered.
Here's an approach for arbitrary reclassification of integer rasters that avoids using a million calls to np.where
. Rasterio bits taken from @Aaron's answer:
import rasterio
import numpy as np
# Build a "lookup array" where the index is the original value and the value
# is the reclassified value. Setting all of the reclassified values is cheap
# because the memory is only allocated once for the lookup array.
lookup = np.arange(255, dtype=np.uint8)
lookup[5] = 10
lookup[6] = 100
lookup[7] = 200
with rasterio.open('input_raster.tif') as src:
# Read as numpy array
array = src.read()
profile = src.profile
# Reclassify in a single operation using broadcasting
array = lookup[array]
with rasterio.open('output_raster.tif', 'w', **profile) as dst:
# Write to disk
dst.write(array)
I combined @dmh126, @mattijn, and @etrimaille way
import numpy as np
from osgeo import gdal
driver = gdal.GetDriverByName('GTiff')
file = gdal.Open('example.tif')
band = file.GetRasterBand(1)
lista = band.ReadAsArray()
# reclassification
list_dest = lista.copy()
list_dest[np.where( lista < 0 )] = 0
list_dest[np.where((0 <= lista) & (lista <= 1)) ] = 1
list_dest[np.where((1 < lista) & (lista <= 5)) ] = 2
list_dest[np.where( 5 < lista )] = 3
# create new file
file2 = driver.Create( 'example_reclassed.tif', file.RasterXSize , file.RasterYSize , 1)
file2.GetRasterBand(1).WriteArray(list_dest)
# spatial ref system
proj = file.GetProjection()
georef = file.GetGeoTransform()
file2.SetProjection(proj)
file2.SetGeoTransform(georef)
file2.FlushCache()