I was wondering if there's a way to read a Sentinel-2 zipfile directly into memory through python without extracting the whole thing or selected contents.
I figured since GDAL does have a driver for the S2 format, it should be somehow possible to open the zipfile in python and then read the bands into arrays separately.
Another approach I was thinking about is the zipfile
module in python. I came as far as accessing the single files in the archive, but I haven't managed yet to read them into memory or pass them over to GDAL.
Edit:
Sorry, I forgot to mention I'm generally looking for a platform-independent solution
Edit2:
Also, I intend to stay away from ESA's snappy
Solution:
Alright, after the helpful tips I got here I finally implemented the following solution. I ended up installing gdal 2.1.3 from conda-forge
inside a new environment to bypass any dependency issues. There's still an issue with the UTF encoding using vsizip
which can be fixed setting the corresponding environment variable as discribed in this issue on github.
#!/usr/bin/env python
import gdal
from zipfile import ZipFile
import re
import os
## vsizip bugfix
os.environ['CPL_ZIP_ENCODING'] = 'UTF-8'
#choose tile
tile = 'T36PXV'
tileREXP = re.compile(r'.*(%s).*B(02|03|04|05|06|07|08|8A|11|12).jp2$' %tile)
# sort bands 2 - 12 with 8A after 8
bndIX = [1,2,3,4,5,6,7,9,10,8]
with open('S2_FILELIST.txt') as flist:
for filename in flist:
with ZipFile(filename,'r') as zfile:
bands = [x for x in zfile.namelist() if re.match(tileREXP,x)]
bands.sort()
bands = [x for (y,x) in zip(bndIX,bands)]
for b in bands:
ds = gdal.Open('/vsizip/%s/%s' %(filename,b))
arr = ds.ReadAsArray()
# DO STUFF
ds = None
-
You can mount the zip file. Many tools can do that. A lot Linux Distributions have that option preinstalled (e.g. Ubuntu). Tools for Linux: avfs, fuse-zip, gvfs, ... I know there are tools for Windows too, but I cannot name them.pLumo– pLumo2017年04月20日 14:43:18 +00:00Commented Apr 20, 2017 at 14:43
1 Answer 1
GDAL>= 2.1 is able to open the Sentinel-2 .zip
file directly. For instance (from the SENTINEL2 driver documentation page):
$ gdalinfo S2A_OPER_PRD_MSIL1C_PDMC_20150818T101440_R022_V20150813T102406_20150813T102406.zip
Then you could convert subdataset into a VRT
file (without extracting any .jp2
file). This is an useful example from the driver documentation:
$ python -c "import sys; from osgeo import gdal; ds = gdal.Open(sys.argv[1]); open(sys.argv[2], 'wb').write(ds.GetMetadata('xml:VRT')[0].encode('utf-8'))" \
SENTINEL2_L1C:S2A_OPER_MTD_SAFL1C_PDMC_20150818T101440_R022_V20150813T102406_20150813T102406.xml:10m:EPSG_32632 10m.vrt
Note: The .xml
metadata file can be read directly from the .zip
file thanks to the vsizip virtual file system: /vsizip/S2XXXXX.zip/metadata.xml
-
Yes, I mentioned that GDAL has the driver ... but I need to read the files with the GDAL module in python ... I don't want to do anything command line ... otherwhise I could as well extract the files through python and then read them ... but my plan would be to minimize the I/O by zipping directly into memory OR reading the entire dataset with the gdal module in Python if possible. Maybe theres a pythonic version of what you're suggesting, opening the dataset and then creating a vrt and reading it as arrayVal– Val2017年04月20日 15:24:03 +00:00Commented Apr 20, 2017 at 15:24
-
I've just added further useful info extracted from the driver documentation. I think that it's a good starting point.Antonio Falciano– Antonio Falciano2017年04月20日 16:57:46 +00:00Commented Apr 20, 2017 at 16:57
-
I successfully ran an example analogous to the one you posted above from the driver documentation. But this example expects the zipfile to be extracted entirely. You mentioned the
vsizip
system. This is also working fine for me through commandline (I can easily callgdalinfo
et al.) but when I want to go through python, I can't really manage. I come as far as opening thexml
metadata file to retrieve any kind of info, but I can't seem to actually access the imagery. If I use avsizip
path all the way to ajp2
band, I'm receiving an error (while it's working through command line)Val– Val2017年04月21日 08:06:51 +00:00Commented Apr 21, 2017 at 8:06 -
Something like
ds = gdal.open('/vsizip/path/to/S2XXXXX.zip/path/to/file')
should work. Is the above Python example working for you? IMHO the solution is there.Antonio Falciano– Antonio Falciano2017年04月21日 08:55:48 +00:00Commented Apr 21, 2017 at 8:55 -
Ok, it does seem to be a version issue ... the gdal I was using with anaconda was 2.1.0 ... once I did the cumbersome upgrade to the conda-forge version 2.1.3, I could indeed read the bands separately out of the zipfile. In light of this, I will delete my previous commentVal– Val2017年04月21日 13:43:09 +00:00Commented Apr 21, 2017 at 13:43
Explore related questions
See similar questions with these tags.