I have a raster image with 3 bands. I would like to convert this image to a CSV file where each row is one pixel and each column is one band, so that I can easily see the three values of each pixel.
This is how I have tried to do it:
import rasterio
import rasterio.features
import rasterio.warp
from matplotlib import pyplot
from rasterio.plot import show
import pandas as pd
import numpy as np
img = rasterio.open("01032020.tif")
show(img, 0)
# read image
array = img.read()
# create np array
array = np.array(array)
# create pandas df
dataset = pd.DataFrame({'Column1': [array[0]], 'Column2': [array[1]], 'Column3': [array[2]]})
dataset
And also like this:
dataset = pd.DataFrame({'Column1': [array[0, :, :]], 'Column2': [array[1, :, :]], 'Column3': [array[2, :, :]]})
But I'm getting something weird, like this table: (screenshot not included)
I have also tried:
index = [i for i in range(0, len(array[0]))]
dataset = pd.DataFrame({'Column1': array[0], 'Column2': array[1],'Column3': array[2]},index=index)
dataset
But then I only get as many rows as the image has rows, and it's still not right: (screenshot not included)
What am I doing wrong?
My goal
Get a single pandas table (DataFrame) where each row is a pixel and there are 3 columns, one for each band.
3 Answers
Quick solution
pd.DataFrame(array.reshape([3,-1]).T)
Explanation
- Take the array of shape (3, x, y) and flatten out the 2nd and 3rd dimensions. From the NumPy docs: "One shape dimension can be -1. In this case, the value is inferred from the length of the array and remaining dimensions."
reshaped_array = array.reshape([3,-1])
- Transpose the array to get an array of shape (x*y, 3)
transposed_array = reshaped_array.T
- Build the DataFrame
pd.DataFrame(transposed_array)
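A quick way to convince yourself that each row really is one pixel is to run the same reshape and transpose on a tiny synthetic array. This is a minimal sketch; the helper name `bands_to_dataframe` and the test array are made up for illustration:

```python
import numpy as np
import pandas as pd


def bands_to_dataframe(img):
    """Turn a (bands, rows, cols) array into a DataFrame with one row per pixel."""
    n_bands = img.shape[0]
    # (bands, rows, cols) -> (bands, rows*cols) -> (rows*cols, bands)
    return pd.DataFrame(img.reshape([n_bands, -1]).T)


# synthetic "image": 3 bands, 2 rows, 3 columns
array = np.arange(18).reshape(3, 2, 3)
df = bands_to_dataframe(array)
print(df.shape)              # (6, 3)
print(df.iloc[4].tolist())   # pixel at row 1, column 1 -> [4, 10, 16]

# reshaping straight to (-1, 3) without the transpose would mix bands:
# row 4 would be [12, 13, 14], all taken from the third band
```

Pixels come out in row-major order, so DataFrame row `i` is image row `i // cols`, column `i % cols`.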
- Thank you for your answer. Is there any way to preserve the coordinates? – ReutKeller, Oct 30, 2020 at 10:14
- You will need one or two extra columns that store the index/indices of the original image. I think that's for a new question. -> gis.stackexchange.com/questions/ask – StefanBrand_EOX, Oct 30, 2020 at 10:32
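A minimal sketch of the extra index columns suggested in the comment above, using `np.indices` to get each pixel's row and column in the original image. The helper `pixels_with_indices` is hypothetical and the array is synthetic; with a real file, the row/col pairs could then be converted to map coordinates with rasterio (for example `src.xy(row, col)`).

```python
import numpy as np
import pandas as pd


def pixels_with_indices(img):
    """One row per pixel: its row/col in the image, then one column per band."""
    n_bands, n_rows, n_cols = img.shape
    # np.indices returns two (rows, cols) grids: the row index and the column index of every pixel
    row_idx, col_idx = np.indices((n_rows, n_cols))
    df = pd.DataFrame(img.reshape(n_bands, -1).T,
                      columns=[f"band{i + 1}" for i in range(n_bands)])
    # ravel() uses the same row-major (C) order as reshape, so the rows line up
    df.insert(0, "row", row_idx.ravel())
    df.insert(1, "col", col_idx.ravel())
    return df


array = np.arange(18).reshape(3, 2, 3)   # synthetic 3-band, 2x3 image
df = pixels_with_indices(array)
print(df.iloc[4].tolist())   # [1, 1, 4, 10, 16]
```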
Or another simple solution, using NumPy's ravel():
import rasterio as rio
src = rio.open('myraster.tif')
# number of bands
src.count
3
# read bands
array = src.read()
# convert to a DataFrame
import pandas as pd
df = pd.DataFrame()
df['band1'] = array[0].ravel()
df['band2'] = array[1].ravel()
df['band3'] = array[2].ravel()
df.head(2)
          band1  band2  band3
0           250    249    254
1           250    249    254
df.tail(2)  # last two rows
          band1  band2  band3
78609002    190    182    180
78609003    190    186    174
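The same ravel() approach generalizes to any number of bands with a dict comprehension, and `DataFrame.to_csv` then writes the CSV file the question asks for. A sketch with a synthetic 4-band array standing in for `src.read()`; `raster_array_to_df` is a hypothetical helper name, and a `StringIO` buffer stands in for a real path such as `"pixels.csv"`:

```python
import io

import numpy as np
import pandas as pd


def raster_array_to_df(array):
    """Build band1..bandN columns from a (bands, rows, cols) array with ravel()."""
    return pd.DataFrame({f"band{i + 1}": array[i].ravel() for i in range(array.shape[0])})


# synthetic 4-band, 2x2 image standing in for src.read()
array = np.arange(16).reshape(4, 2, 2)
df = raster_array_to_df(array)

# write the CSV; index=False drops the pandas row index from the file
buf = io.StringIO()
df.to_csv(buf, index=False)
print(buf.getvalue().splitlines()[:2])   # ['band1,band2,band3,band4', '0,4,8,12']
```

With a real raster this becomes `raster_array_to_df(src.read()).to_csv("pixels.csv", index=False)`.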
Or
You can check it here: http://shreshai.blogspot.com/
The implementation works for a multiband raster and also keeps the coordinates:
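import numpy as np
import rasterio

RASTER_PATH = "myraster.tif"  # placeholder: path to your raster

with rasterio.open(RASTER_PATH) as src:
    # read image, shape (bands, rows, cols)
    image = src.read()
    # transform image: one row per pixel, one column per band
    bands, rows, cols = np.shape(image)
    # reshape to (bands, rows*cols) and transpose; reshaping straight to
    # (rows*cols, bands) would mix values from different bands in one row
    image1 = image.reshape(bands, rows * cols).T
    print(np.shape(image1))
    # bounding box of image
    l, b, r, t = src.bounds
    # resolution of image (x, y)
    res = src.res

# meshgrid of X and Y: upper-left corner of each pixel
# (assumes a north-up raster with no rotation; shift by half a pixel for centers)
x = l + np.arange(cols) * res[0]
y = t - np.arange(rows) * res[1]
X, Y = np.meshgrid(x, y)
print(np.shape(X))
# flatten X and Y (row-major, the same pixel order as image1)
newX = X.flatten()
newY = Y.flatten()
print(np.shape(newX))
# join XY and Z information
export = np.column_stack((newX, newY, image1))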
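If the raster may be rotated, or you want pixel centers rather than corners, the coordinates can be computed from the six affine coefficients of the geotransform instead of from the bounds. The sketch below is plain NumPy/pandas under that assumption; `pixel_table` is a hypothetical helper, and with rasterio you would pass the coefficients of `t = src.transform` as `t.a, t.b, t.c, t.d, t.e, t.f`.

```python
import numpy as np
import pandas as pd


def pixel_table(image, a, b, c, d, e, f):
    """x, y of each pixel center plus one column per band.

    a..f are affine coefficients: x = a*col + b*row + c, y = d*col + e*row + f.
    """
    bands, rows, cols = image.shape
    row_idx, col_idx = np.indices((rows, cols))
    # +0.5 moves from the pixel's upper-left corner to its center
    colc = col_idx.ravel() + 0.5
    rowc = row_idx.ravel() + 0.5
    data = {"x": a * colc + b * rowc + c,
            "y": d * colc + e * rowc + f}
    for i in range(bands):
        data[f"band{i + 1}"] = image[i].ravel()
    return pd.DataFrame(data)


# synthetic 2-band, 2x2 raster: 10 m pixels, north-up, upper-left corner at (500000, 4100000)
image = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
df = pixel_table(image, 10.0, 0.0, 500000.0, 0.0, -10.0, 4100000.0)
print(df)
```

Calling `df.to_csv(...)` on the result gives one line per pixel with its coordinates and band values.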