When I use gdal_array.SaveArray() to create a raster, the newly created dataset appears to stay open in Python, preventing other processes from working with it. For instance, consider the following (super minimal) code:
>>> a = np.arange(300).reshape((3, 10, 10))
>>> gdal_array.SaveArray(a, "test.tif")
<osgeo.gdal.Dataset; proxy of <Swig Object of type 'GDALDatasetShadow *' at 0x0458B968> >
>>> os.rename("test.tif", "test2.tif")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
WindowsError: [Error 32] The process cannot access the file because it is being used by another process
I am similarly prevented from moving, renaming, or deleting the file directly from Windows Explorer, with the message "The action can't be completed because the file is open in python.exe". More importantly, I can't open the file in some visualization/processing programs that I use. Once I exit Python, the file is "released" and I can manipulate it to my heart's content.
It doesn't seem like the file has a name associated with it, so I can't close it as I would a raster that I had specifically opened:
>>> dir()
['__builtins__', '__doc__', '__name__', '__package__', 'a', 'gdal', 'gdal_array', 'np', 'os']
What causes this behavior? Is there a way to call SaveArray() such that it doesn't keep the file open after writing it? Or a way to close the file from within Python?
In case it's important, my Python bindings are from gdal 1.11.1 for Python 2.7.8 on Windows 7.
1 Answer
Edit based on the comments below:
Assigning the result of the gdal_array.SaveArray(a, "test.tif") call to a variable gives you an osgeo.gdal.Dataset object that can be managed as per the gotchas below. Using the above example, this should work:
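import os
import numpy as np
from osgeo import gdal_array
a = np.arange(300).reshape((3, 10, 10))
ds = gdal_array.SaveArray(a, "test.tif")  # keep a reference to the returned Dataset
ds = None  # dereference the Dataset so the file is saved and closed
os.rename("test.tif", "test2.tif")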
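As for why the unassigned call keeps the file open, one plausible explanation (an assumption, not something confirmed in this thread) is that the interactive interpreter stores the last displayed result in the builtin name `_`, which does not show up in dir(). The sketch below uses only the standard library; `Result` and `show_like_repl` are hypothetical names, and `Result` merely stands in for the returned Dataset.

```python
import sys

try:
    import builtins                      # Python 3
except ImportError:
    import __builtin__ as builtins       # Python 2


class Result(object):
    """Hypothetical stand-in for the Dataset returned by SaveArray()."""


def show_like_repl(value):
    # The interactive prompt hands each expression result to sys.displayhook,
    # which prints its repr and (by default) binds it to builtins._
    sys.displayhook(value)
    return getattr(builtins, "_", None)


r = Result()
held = show_like_repl(r)           # the displayed object is now referenced by "_"
held_after = show_like_repl(None)  # a None result (like os.rename's) leaves "_" alone
```

If that is what happens here, the dataset would stay referenced until another non-None result is displayed, whereas assigning the result to a variable displays nothing and leaves `_` untouched.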
Check out the Python gotchas documentation: https://trac.osgeo.org/gdal/wiki/PythonGotchas
Specifically:
Saving and closing datasets/datasources
To save and close GDAL raster datasets or OGR vector datasources, the object needs to be dereferenced, such as setting it to None, a different value, or deleting the object. If there are more than one copies of the dataset or datasource object, then each copy needs to be dereferenced.
For example, creating and saving a raster dataset:
>>> from osgeo import gdal
>>> driver = gdal.GetDriverByName('GTiff')
>>> dst_ds = driver.Create('new.tif', 10, 15)
>>> band = dst_ds.GetRasterBand(1)
>>> arr = band.ReadAsArray()  # raster values are all zero
>>> arr[2, 4:] = 50  # modify some data
>>> band.WriteArray(arr)  # raster file still unmodified
>>> band = None  # dereference band to avoid gotcha described previously
>>> dst_ds = None  # save, close
The last dereference to the raster dataset writes the data modifications and closes the raster file. WriteArray(arr) does not write the array to disk, unless the GDAL block cache is full (typically 40 MB).
With some drivers, raster datasets can be intermittently saved without closing using FlushCache(). Similarly, vector datasets can be saved using SyncToDisk(). However, neither of these methods guarantee that the data are written to disk, so the preferred method is to deallocate as shown above.
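The point about multiple copies can be illustrated without GDAL. The following is only an analogy under stated assumptions: `FakeDataset` and `demo` are hypothetical names, the class mimics an object that closes its file when it is destroyed, and the immediate cleanup relies on CPython's reference counting.

```python
import os
import shutil
import tempfile


class FakeDataset(object):
    """Hypothetical stand-in for gdal.Dataset: keeps a file open until destroyed."""

    def __init__(self, path):
        self._fh = open(path, "wb")

    def __del__(self):
        # Called when the last reference goes away (immediately in CPython).
        self._fh.close()


def demo():
    folder = tempfile.mkdtemp()
    src = os.path.join(folder, "test.tif")
    dst = os.path.join(folder, "test2.tif")
    try:
        ds = FakeDataset(src)
        fh = ds._fh   # raw handle, kept only so its state can be inspected
        copy = ds     # a second reference to the same object

        ds = None     # one reference remains, so the file is still open
        still_open = not fh.closed

        copy = None   # last reference dropped: __del__ runs and closes the file
        closed_now = fh.closed

        os.rename(src, dst)  # safe now that nothing holds the file open
        renamed = os.path.exists(dst)
        return still_open, closed_now, renamed
    finally:
        shutil.rmtree(folder, ignore_errors=True)


print(demo())
```

Setting only one of the two names to None is not enough; the file is released when the last reference goes away, which matches the advice above to dereference every copy.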
I understand that it's important to close a referenced dataset in order to save changes. But in my specific case, there is no referenced dataset (at least at the scope of my script; I'm sure SaveArray() does something like this under the hood). The question is, if I'm using SaveArray(), how do I close a dataset I don't have an object for?
– Joe, commented May 18, 2015 at 22:12
You might try assigning the gdal_array.SaveArray() call to a variable and then calling dir(variable) on that to see the available methods. It looks like gdal_array is opening the dataset, and I suspect you need to either set that variable to None before the os.rename call or call a method to close it. Link to the OpenArray source (called via gdal_array): gdal.org/python/osgeo.gdal_array-pysrc.html#OpenArray
– Jay Laura, commented May 19, 2015 at 2:11
It honestly never occurred to me to assign the SaveArray() call to a variable. Saving the array with e.g. b = gdal_array.SaveArray(a, "test.tif") creates the new dataset with b as a standard gdal.Dataset object, which can be handled and closed as normal. If you want to edit that detail into your answer, I'll accept it so you can have the points; if not, I can self-answer as suggested above with everything I've figured out.
– Joe, commented May 19, 2015 at 14:35
del a does indeed do the trick. This isn't ideal, though, as I sometimes use SaveArray() in an interactive session (inelegant, I know, but effective) and want to check my results before deleting the data. Nevertheless, for want of a better solution, if you want to post that as an answer with some explanation, I'll accept it.