12

I would like to reclassify a raster file from a raster with 10 classes to a raster with 8 classes using Python, GDAL and/or numpy. The classes are represented as integers. I have tried following the steps from this post Reclassifying rasters using GDAL and Python? , the numpy.equal doc and also gdal_calc doc. However, to no avail.

The raster file to be reclassified has integer values ranging from 0 to 11 and also include values 100 and 255. The following show the reclass (from value : to value):

nodata : 4, 0 : 4, 1 : 1, 2 : 2, 3 : 3, 4 : 3, 5 : 4, 6 : 5, 7 : 5, 8 : 6, 9 : 7, 10 : 8, 100 : nodata, 255 : nodata,

What I have been able to do is select the raster file to be reclassified using tkinter.FileDialog and get the raster info such as geotransform, and pixel size with reclass = gdal.Open(raster, GA_ReadOnly).

How do I go about solving the above.

It might be worth mentioning that the rasters to be reclassified can be fairly large in some cases (500mb to 5gb).

ThomasG77
31.7k1 gold badge56 silver badges96 bronze badges
asked Sep 15, 2015 at 12:23
4
  • There is another example on the GeoExamples Blog Commented Sep 16, 2015 at 7:53
  • @bennos, tried the script on the blog but it returns a memory error when unpacking the array. Commented Sep 17, 2015 at 8:25
  • I suggest you discuss this problem with Roger Veciana i Rovira, the author of the post, as he knows his code better than I do and maybe knows how to resolve the issue Commented Sep 17, 2015 at 8:36
  • Changing the input raster from 16Bit unsigned to 8Bit unsigned solved the memory issue. However, it takes about the same amount of time to reclassify as the dmh126 script below. Commented Sep 17, 2015 at 9:52

11 Answers 11

21

Instead of doing the reclassification as a double for loop described by dmh126, do it using np.where:

# reclassification 
lista[np.where( lista < 200 )] = 1
lista[np.where((200 < lista) & (lista < 400)) ] = 2
lista[np.where((400 < lista) & (lista < 600)) ] = 3
lista[np.where((600 < lista) & (lista < 800)) ] = 4
lista[np.where( lista > 800 )] = 5

On an array of 6163 by 3537 pixels (41.6mb) the classification is done in 1.59 seconds, where it takes 12min 41s using the double for loop. In total just a speedup of 478x.

Bottomline, never use a double for loop using numpy

answered Jan 21, 2016 at 8:56
3
  • 2
    Thanks for the hint, but I think that will lead into a problem if the input classes overlap with the output classes. I don't want my new value to be changed by the next rule. Commented May 11, 2016 at 6:20
  • @Gustry - Just ran into that problem here. Commented Aug 29, 2016 at 23:20
  • 2
    So check my answer below : gis.stackexchange.com/questions/163007/… Commented Aug 31, 2016 at 6:50
11

Here's a basic example using rasterio and numpy:

import rasterio as rio
import numpy as np
with rio.open('~/rasterio/tests/data/rgb1.tif') as src:
 # Read the raster into a (rows, cols, depth) array,
 # dstack this into a (depth, rows, cols) array,
 # the sum along the last axis (~= grayscale)
 grey = np.mean(np.dstack(src.read()), axis=2)
 # Read the file profile
 srcprof = src.profile.copy()
classes = 5
# Breaks is an array of the class breaks: [ 0. 51. 102. 153. 204.]
breaks = (np.arange(classes) / float(classes)) * grey.max()
# classify the raster
classified = np.sum(np.dstack([(grey < b) for b in breaks]), axis=2).reshape(1, 400, 400).astype(np.int32)
# Update the file opts to one band
srcprof.update(count=1, nodata=None, dtype=classified.dtype)
with rio.open('/tmp/output.tif', 'w', **srcprof) as dst:
 # Write the output
 dst.write(classified)
answered May 27, 2016 at 15:42
1
  • Useful answer that works fine! Any idea how can the classification be reversed? I mean lower values in the bin getting high re-classified values instead, which will be reduced as the bin increases. So instead of having a product going from 1 to 5, we have the reverse from 5 to 1. Thanks! Commented Aug 6, 2020 at 18:26
9

Just to complete the answer from @Mattijn, I think that will lead into a problem if the input classes overlap with the output classes. I don't want my new value to be changed by the next rule.

I don't know if I loose speed but I should do a deep copy :

list_dest = lista.copy()
list_dest[np.where( lista < 0 )] = 0
list_dest[np.where((0 <= lista) & (lista <= 1)) ] = 1
list_dest[np.where((1 < lista) & (lista <= 5)) ] = 2
list_dest[np.where( 5 < lista )] = 3
answered May 11, 2016 at 6:25
8

Here you have a simple python script for reclassification, I wrote it and it works for me:

from osgeo import gdal
driver = gdal.GetDriverByName('GTiff')
file = gdal.Open('/home/user/workspace/raster.tif')
band = file.GetRasterBand(1)
lista = band.ReadAsArray()
# reclassification
for j in range(file.RasterXSize):
 for i in range(file.RasterYSize):
 if lista[i,j] < 200:
 lista[i,j] = 1
 elif 200 < lista[i,j] < 400:
 lista[i,j] = 2
 elif 400 < lista[i,j] < 600:
 lista[i,j] = 3
 elif 600 < lista[i,j] < 800:
 lista[i,j] = 4
 else:
 lista[i,j] = 5
# create new file
file2 = driver.Create( 'raster2.tif', file.RasterXSize , file.RasterYSize , 1)
file2.GetRasterBand(1).WriteArray(lista)
# spatial ref system
proj = file.GetProjection()
georef = file.GetGeoTransform()
file2.SetProjection(proj)
file2.SetGeoTransform(georef)
file2.FlushCache()

Just change the ranges.

I hope it will help.

Aaron
52k30 gold badges161 silver badges326 bronze badges
answered Sep 15, 2015 at 12:46
5
  • 3
    You should close file2 either with del file2 or file2 = None to make sure it gets written to disk. .FlushCache() only influences GDALs internal tile cache. Commented Sep 15, 2015 at 13:30
  • @dmh126, I modified the ranges and the script works. However, it is not very quick (quick being debatable). The input raster was about 120mb and took about 15 min to complete.With the aid of a commercial remote sensing package it takes seconds. Any recommendations on decreasing processing time? Commented Sep 17, 2015 at 8:23
  • I think that multithreading can help. You can try to use all of your cores, there is a question gis.stackexchange.com/questions/162978/… Commented Sep 17, 2015 at 9:07
  • Doesn't make sense to use a double for loop, see answer below Commented Jan 21, 2016 at 8:43
  • Right, the double loop and per-element reclassification is the slowest of all possible ways to do this. Use the powerful parts of numpy like ufuncs: docs.scipy.org/doc/numpy-1.10.1/reference/ufuncs.html. Commented May 27, 2016 at 15:57
3

In some cases, numpy digitize can be useful to quickly go from ranges to bins.

import rasterio
import numpy as np
with rasterio.open('my_raster.tif') as src: 
 array = src.read()
 profile = src.profile
 bins = np.array([-1.,-0.7,-0.4, 0.2, 1]) 
 inds = np.digitize(array, bins)
with rasterio.open('output_raster.tif', 'w', **profile) as dst:
 dst.write(inds)
answered May 9, 2019 at 6:12
2

Here is another Rasterio approach that I hacked together using the Rasterio Cookbook and @Mattijn's answer.

import rasterio
import numpy as np
with rasterio.open('input_raster.tif') as src: 
 # Read as numpy array
 array = src.read()
 profile = src.profile
 # Reclassify
 array[np.where(array == 0)] = 4 
 array[np.where(array == 2)] = 1
 # and so on ... 
with rasterio.open('output_raster.tif', 'w', **profile) as dst:
 # Write to disk
 dst.write(array)
answered Mar 29, 2018 at 5:06
1

Apply a dictionary directly to array: https://stackoverflow.com/questions/16992713/translate-every-element-in-numpy-array-according-to-key

source_raster = rio.open(r'source_raster.tif')
my_dict = {source_raster.nodata: 4, 
0:4, 
1:1,
2:2,
3:3,
4:3, 
5:4,
6:5,
7:5,
8:6,
9:7,
10:8,
100: source_raster.nodata,
255: source_raster.nodata}
source_array = source_raster.read(1)
target_array = np.vectorize(my_dict.__getitem__)(source_array)
profile = source_raster.profile()
with rio.open(r'target_raster.tif', 'w', **profile) as dest:
 dest.write(target_array, 1)

If the source raster is too large for reading into memory, use rasterio windows to serialize the process. Read a windowed portion from the source raster, apply the dict and write the window-content to the target file.

answered May 1, 2024 at 15:30
1
  • 1
    I liked the idea of using np.vectorize but it turns out for key, value in reclass_dct.items(): array[array == key] = value is a fair amount faster. Commented May 14, 2024 at 14:55
0

With raster RGB color table support :

import numpy as np
from osgeo import gdal
path_inDs = "/data/OCS_2016.extract.tif"
path_outDs = "/data/OCS_2016.postpython.tif"
driver = gdal.GetDriverByName('GTiff')
file = gdal.Open(path_inDs)
if file is None:
 print ('Could not open image file')
 sys.exit(1)
band = file.GetRasterBand(1)
lista = band.ReadAsArray()
# reclassification
lista[np.where(lista == 31)] = 16
# create new file
file2 = driver.Create(path_outDs, file.RasterXSize , file.RasterYSize , 1, gdal.GPI_RGB)
file2.GetRasterBand(1).WriteArray(lista)
# spatial ref system
proj = file.GetProjection()
georef = file.GetGeoTransform()
meta = file.GetMetadata()
colors = file.GetRasterBand(1).GetRasterColorTable()
file2.SetProjection(proj)
file2.SetGeoTransform(georef)
file2.SetMetadata(meta)
file2.GetRasterBand(1).SetRasterColorTable(colors)
file2.FlushCache()
del file2
answered Jan 23, 2020 at 16:18
0

A slightly different alternative could be the following:

import numpy as np
from osgeo import gdal
original = gdal.Open('**PATH**\\origianl_raster.tif')
# read the original file
band = original.GetRasterBand(1) # assuming that the file has only one band
band_array = band.ReadAsArray()
#create a new array with reclassified values
new_array = np.where(band_array == np.nan, 4, 
 np.where(band_array == 0, 4,
 np.where(band_array == 1, 1,
 np.where(band_array == 2, 2,
 np.where(band_array == 3, 3,
 np.where(band_array == 4, 3, 
 np.where(band_array == 5, 4,
 np.where(band_array == 6, 5,
 np.where(band_array == 7, 5,
 np.where(band_array == 8, 6, 
 np.where(band_array == 9, 7,
 np.where(band_array == 10, 8,
 np.where(band_array == 100, np.nan, np.nan))))))))))))) 
 # the last line also includes values equal to 255, as they are the only ones left
# create and save reclassified raster as a new file
outDs = gdal.GetDriverByName('GTiff').Create("**PATH**\\reclassified_raster.tif", original.RasterXSize, original.RasterYSize, 1, gdal.GDT_Float32)
outBand = outDs.GetRasterBand(1)
outBand.WriteArray(new_array)
outDs.SetGeoTransform(original.GetGeoTransform())
outDs.SetProjection(original.GetProjection())
# flush cache
outDs.FlushCache()

This script plays with numpy.where (https://docs.scipy.org/doc/numpy/reference/generated/numpy.where.html): in all the steps apart from the last one, instead of returning a value when the condition is not satisfied, it returns another np.where. And it keeps going until all possible values of the raster are considered.

answered Mar 11, 2020 at 10:54
0

Here's an approach for arbitrary reclassification of integer rasters that avoids using a million calls to np.where. Rasterio bits taken from @Aaron's answer:

import rasterio
import numpy as np
# Build a "lookup array" where the index is the original value and the value
# is the reclassified value. Setting all of the reclassified values is cheap 
# because the memory is only allocated once for the lookup array.
lookup = np.arange(255, dtype=np.uint8)
lookup[5] = 10
lookup[6] = 100
lookup[7] = 200
with rasterio.open('input_raster.tif') as src:
 # Read as numpy array
 array = src.read()
 profile = src.profile
 # Reclassify in a single operation using broadcasting
 array = lookup[array]
with rasterio.open('output_raster.tif', 'w', **profile) as dst:
 # Write to disk
 dst.write(array)
answered Mar 25, 2021 at 18:21
0

I combined @dmh126, @mattijn, and @etrimaille way

import numpy as np
from osgeo import gdal
driver = gdal.GetDriverByName('GTiff')
file = gdal.Open('example.tif')
band = file.GetRasterBand(1)
lista = band.ReadAsArray()
# reclassification
list_dest = lista.copy()
list_dest[np.where( lista < 0 )] = 0
list_dest[np.where((0 <= lista) & (lista <= 1)) ] = 1
list_dest[np.where((1 < lista) & (lista <= 5)) ] = 2
list_dest[np.where( 5 < lista )] = 3
# create new file
file2 = driver.Create( 'example_reclassed.tif', file.RasterXSize , file.RasterYSize , 1)
file2.GetRasterBand(1).WriteArray(list_dest)
# spatial ref system
proj = file.GetProjection()
georef = file.GetGeoTransform()
file2.SetProjection(proj)
file2.SetGeoTransform(georef)
file2.FlushCache()
answered Oct 21, 2022 at 23:27

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.