I'm writing a little GIS tool in Python and I want to display shapefiles. I've successfully gotten matplotlib working using this answer, but it is very slow and heavy to load any shapefile that happens to be over 5mb. The code follows:
if len(shpFilePath) > 0:
try:
shp = shapefile.Reader(shpFilePath)
plt.figure()
try:
for shape in shp.shapeRecords():
x = [i[0] for i in shape.shape.points[:]]
y = [i[1] for i in shape.shape.points[:]]
plt.plot(x, y)
plt.show(1)
except AssertionError:
self.error_popup('Error', 'Shapefile does not contain points.',
'Check feature type and try again')
except shapefile.ShapefileException:
self.error_popup('Error', 'Shapefile not found.', 'Ensure that the path is correct and try again.')
Is there anything I can do to improve the speed and the memory that this consumes? I'm sure the problem lies in the creation of the x and y lists.
I loaded a 10mb shapefile yesterday and it ended up using nearly 1gb of RAM just to display it. I've been looking at other libraries but it seems like matplotlib is the most frequently recommended one.
4 Answers 4
This is not a problem of Matplotlib but your script and the module you use for reading shapefiles
1) You know that there are points in the geometries of the Polygon shapefile thus eliminate try... except
2) you load and read the shapefile twice for x and y (memory)
for shape in shp.shapeRecords():
xy = [i for i in shape.shape.points[:]]
x = [i[0] for i in xy]
y = [i[1] for i in xy]
or directly
for shape in shp.shapeRecords():
xy = [i for i in shape.shape.points[:]]
x,y = zip(*[(j[0],j[1]) for j in xy])
3) You can also use the Geo_interface (look at Plot shapefile with matplotlib)
for shape in shp.shapeRecords():
poly = shape.shape.__geo_interface__
print(poly)
{'type': 'Polygon', 'coordinates': (((203602.55736766502, 89867.47994546698), (204061.86095852466, 89822.92064187612), (203983.02526755622, 89322.48538616339), (203684.82069737124, 89031.13609345393), (203280.35932631575, 89260.78788888374), (203184.3854416585, 89624.11759508614), (203602.55736766502, 89867.47994546698)),)}
And you have the GeoJSON representation of the geometry (Polygon). You can plot the Polygon as in the reference
The LinearRing of the Polygon
x,y = zip(*[(i[0],i[1]) for i in poly['coordinates'][0]])
And the nodes of the Polygon
4) The problem of Pyshp (shapefile) is that it loads the complete shapefile into memory and if the shapefile is too big...
You can use a generator (read the layer one feature by one feature)
def records(filename):
# generator
reader = shapefile.Reader(filename)
for sr in reader.shapeRecords():
geom = sr.shape.__geo_interface__
yield geom
features = records("a_polygon.shp")
features.next()
{'type': 'Polygon', 'coordinates': (((203602.55736766502, 89867.47994546698), (204061.86095852466, 89822.92064187612), (203983.02526755622, 89322.48538616339), (203684.82069737124, 89031.13609345393), (203280.35932631575, 89260.78788888374), (203184.3854416585, 89624.11759508614), (203602.55736766502, 89867.47994546698)),)}
Or directly
shapes = shapefile.Reader('a_polygon.shp')
shapes.iterShapes().next().__geo_interface__
{'type': 'Polygon', 'coordinates': (((203602.55736766502, 89867.47994546698), (204061.86095852466, 89822.92064187612), (203983.02526755622, 89322.48538616339), (203684.82069737124, 89031.13609345393), (203280.35932631575, 89260.78788888374), (203184.3854416585, 89624.11759508614), (203602.55736766502, 89867.47994546698)),)}
5) Or use a Python module that directly uses generators/iterators :Fiona
import fiona
shapes = fiona.open("a_polygon.shp")
first = shapes.next() # for for feature in shapes
print(first)
{'geometry': {'type': 'Polygon', 'coordinates': [[(203602.55736766502, 89867.47994546698), (204061.86095852466, 89822.92064187612), (203983.02526755622, 89322.48538616339), (203684.82069737124, 89031.13609345393), (203280.35932631575, 89260.78788888374), (203184.3854416585, 89624.11759508614), (203602.55736766502, 89867.47994546698)]]}, 'type': 'Feature', 'id': '0', 'properties': OrderedDict([(u'id', None)])}
print(first['geometry']['coordinates']
[[(203602.55736766502, 89867.47994546698), (204061.86095852466, 89822.92064187612), (203983.02526755622, 89322.48538616339), (203684.82069737124, 89031.13609345393), (203280.35932631575, 89260.78788888374), (203184.3854416585, 89624.11759508614), (203602.55736766502, 89867.47994546698)]]
-
Thank you for the awesome response! It hadn't occurred to me that I was loading the data twice. The information about using a generator was helpful as well.Ross Wardrup– Ross Wardrup2016年07月20日 20:35:33 +00:00Commented Jul 20, 2016 at 20:35
-
A good answer, but bad advice on #1.
try/except
doesn't add much overhead, and it is never a good idea to run without exception handlingMawg– Mawg2016年09月22日 12:55:09 +00:00Commented Sep 22, 2016 at 12:55
You can use geopandas for plotting as discussed in this answer.
You can also use pyshp as in following code
from descartes import PolygonPatch
import shapefile
sf=shapefile.Reader('shapefile')
fig = plt.figure()
ax = fig.gca()
for poly in sf.shapes():
poly_geo=poly.__geo_interface__
ax.add_patch(PolygonPatch(poly_geo, fc='#ffffff', ec='#000000', alpha=0.5, zorder=2 ))
ax.axis('scaled')
plt.show()
Using geopandas
, the code would look like:
import geopandas
df = geopandas.read_file(shpFilePath)
df.plot()
and this should also be faster (at least starting from geopandas 0.3.0)
Dowloading the shape file repository of Berlin, here are some of the solutions.
joris
This solution only takes 20s to execute.
import geopandas
import matplotlib.pyplot as plt
df = geopandas.read_file('berlin-latest-free.shp/gis_osm_landuse_a_free_1.dbf')
df.plot()
plt.savefig('test2.png')
It looks a bit as if the projection was not done at all or not done correctly.
ldocao
https://gis.stackexchange.com/a/152331/70242 - takes super long to execute
import matplotlib.pyplot as plt
import shapefile as shp # Requires the pyshp package
sf = shp.Reader('berlin-latest-free.shp/gis_osm_landuse_a_free_1.dbf')
plt.figure(dpi=300)
for shape in sf.shapeRecords():
x = [i[0] for i in shape.shape.points[:]]
y = [i[1] for i in shape.shape.points[:]]
plt.plot(x, y)
plt.savefig('test.png')