3

I have a Postgres DB with a lot of lines that are too long and so now I want to split those lines into smaller ones.

This is why I started writing a little python script for that job. Basically it fetches my lines like this:

cursor.execute("SELECT ogc_fid, id, ST_AsGeoJSON(wkb_geometry) as geometry, ST_Length(wkb_geometry) as length, height FROM cont OFFSET " + str(offset) + " LIMIT " + str(limit) )

Here's my first problem. In order to further work with the data I have to select the geometry as JSON. I'll come to that later.

Now I iterate over the lines and make something like this:

sqlCommand = "INSERT INTO " + table + " (ogc_fid, id, wkb_geometry, height) VALUES(" + ogc_fid + ", " + row_id + ", ST_Line_Substring(ST_GeomFromGeoJSON(\'" + geometry + "\'), " + str(rangeFrom) + ", " + str(min(rangeFrom + stepWidth, 1)) + "), " + str(height) + ");"
cursor.execute(sqlCommand)

So if I didn't use ST_AsGeoJSON in the select, I couldn't append it to my command cause it can only append a string.

That way my whole script is very slow. It always has to convert my geometries into JSON and then back from JSON.

Is there a datatype that python can handle?

Is there a more efficient way of dealing with geometry fields?

I heard of GeoDjango. Could that help here?

PolyGeo
65.5k29 gold badges115 silver badges349 bronze badges
asked May 14, 2015 at 8:18
4
  • 1
    Why not do it in a single query? Commented May 14, 2015 at 8:47
  • What Jakub said. Do it directly in Postgres. It isn't at all clear why you would want all the overhead of JSON for this. If you must do it in Python, use a library that deals with geometries directly, such as shapely of fiona. Commented May 14, 2015 at 9:17
  • Well... I tried doing it as a Postgres function, but as it would then all be one big transaction, it fails as there are too many queries. Basically what I need to do is to figure out if a line is too long and if so, I need to cut it into pieces. I know that the JSON part is slowing everything down like crazy... that's why I'd like to get rid of it... I'll have a look at shapely Commented May 14, 2015 at 13:42
  • There is no way of doing it directly as a postgres function but forcing it not to do everything in one transaction but to write the data after some loops, right? Commented May 14, 2015 at 13:45

1 Answer 1

4

You could try using geoalchemy2. Personally I would look at using the Object Relational Mapper (ORM) model for working with your data, for example:

from sqlalchemy import create_engine
# Enter your database connection below
engine = create_engine('postgresql://gis:gis@localhost/gis', echo=True)
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column, Integer, Float
from geoalchemy2 import Geometry
Base = declarative_base()
class Cont(Base):
 __tablename__ = "cont"
 ogc_fid = Column(Integer, primary_key=True)
 id = Column(Integer)
 wkb_geometry = Column(Geometry("LINESTRING"))
 height = Column(Float)
 # Other database columns here

From there you can setup a session and easily query the database:

from sqlalchemy.orm import sessionmaker
Session = sessionmaker(bind=engine)
session = Session()

And write a query to get back the lines you want to update:

query = session.query(Cont).filter(Cont.geom.ST_Length() > tolerance)

Queries are iterators, so it's easy to loop over them and work with the data, especially as geoalchemy2 has inbuilt shapely integration.

from geoalchemy2.shape import from_shape, to_shape
new_rows = []
for row in query:
 shapely_geometry = to_shape(row.wkb_geometry)
 # ... some operations to create `new_geom`
 new_rows.append(
 Cont(
 ogc_fid=row.ogc_fid,
 id=row.id,
 wkb_geometry=new_geom.wkt, #shapely geometries have a wkt property
 height=row.height
 )
 )

Then you can commit all your rows at once:

session.add_all(new_rows)
session.commit()

Or commit by chunk by splitting your new rows into chunks (credit):

for chunk in (new_rows[i:i+chunksize] for i in range(0, len(new_rows), chunksize)):
 session.add_all(chunk)
 session.commit()
answered May 15, 2015 at 4:55
3
  • you are super awesome!!!! thank you so much for your help!!!! Commented May 17, 2015 at 18:10
  • One more question... So, once I have a row, I'd like to figure out it's length. shapely_geometry.ST_Length() didn't did the trick for me. And once I have it I'd need to run ST_Line_Substring to split the line. Sorry to bother you, but my python skills are very limited... would be so super awesome if you might tell me how to do those two commands!!! Commented Jun 2, 2015 at 11:32
  • I managed to figure it out! What took 2 weeks before took 6hours now :) thanks again!!!! Commented Jun 3, 2015 at 6:06

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.