I'm working with a database table that has more than 1 million records, and a full load of this table takes about 8 hours because it contains geographic data (the table is a feature class). The solution I see is to run the load in parallel. I already tried concurrent.futures, but for some reason that approach cannot open the table's workspace. The next option is the multiprocessing library, but I can't figure out how to fit it into this project. How can I use threads or processes here? The important thing is that the work runs in parallel.
import arcpy
from tqdm import tqdm

# Connection is this project's own database helper (its import is not shown).


def divide_data(query, batch_size):
    """Yield the query's result set in batches of batch_size rows."""
    con = Connection.connect()
    cur = con.cursor()
    try:
        cur.execute(query)
        while True:
            batch = cur.fetchmany(batch_size)
            if not batch:
                break
            yield batch
    finally:
        # try/finally guarantees cleanup even if the consumer stops iterating early
        cur.close()
        con.close()
def insert_table(target_fc, query, camp, batch_size=10000):
    """Insert the query results into the target feature class in batches."""
    with arcpy.da.InsertCursor(target_fc, camp) as cursor:
        for batch in divide_data(query, batch_size):
            for r in tqdm(batch, desc='Insert lines'):
                # column 0 is the id, column 1 the geometry as WKB;
                # the remaining columns are plain attributes matching camp
                row = [r[0], arcpy.FromWKB(r[1].read())]
                row.extend(r[2:len(camp)])
                try:
                    cursor.insertRow(row)
                except Exception as e:
                    print(e)
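For the parallel part of the question: arcpy insert cursors and DBMS connections cannot be shared across processes, so a workable multiprocessing pattern is to split the table on an integer key, give each worker its own connection and its own scratch feature class, and merge the pieces at the end with Append. Below is a minimal sketch of that pattern, not a tested implementation: it assumes an integer id column and a WKB geom column, and TARGET_FC, SQL, the chunk bounds, and make_part_fc are illustrative placeholders, while Connection is the same helper used above.

import multiprocessing as mp
import os

import arcpy

FIELDS = ["id", "SHAPE@"]                    # extend with the real attribute fields
TARGET_FC = r"C:\data\output.gdb\fc"         # hypothetical target feature class
SQL = "SELECT id, geom FROM source_table WHERE id >= %s AND id < %s"  # assumed


def make_part_fc(i):
    """Hypothetical helper: empty scratch feature class with TARGET_FC's schema."""
    name = f"part_{i}"
    arcpy.management.CreateFeatureclass(
        arcpy.env.scratchGDB, name,
        geometry_type="POINT",               # match the real geometry type
        template=TARGET_FC,
        spatial_reference=arcpy.Describe(TARGET_FC).spatialReference)
    return os.path.join(arcpy.env.scratchGDB, name)


def copy_range(args):
    """Worker: copy one id range into its own scratch feature class."""
    lo, hi, out_fc = args
    con = Connection.connect()               # one connection per process
    cur = con.cursor()
    try:
        cur.execute(SQL, (lo, hi))
        with arcpy.da.InsertCursor(out_fc, FIELDS) as cursor:
            for r in cur:
                # r[1].read() assumes the driver returns a LOB-like WKB object
                cursor.insertRow([r[0], arcpy.FromWKB(r[1].read())])
    finally:
        cur.close()
        con.close()
    return out_fc


if __name__ == "__main__":
    # Chunk boundaries would come from SELECT min(id), max(id) on the real table.
    bounds = [(0, 250_000), (250_000, 500_000),
              (500_000, 750_000), (750_000, 1_000_001)]
    jobs = [(lo, hi, make_part_fc(i)) for i, (lo, hi) in enumerate(bounds)]
    with mp.Pool(processes=4) as pool:       # tune to the machine's cores
        parts = pool.map(copy_range, jobs)
    for part in parts:                       # merge the pieces into the target
        arcpy.management.Append(part, TARGET_FC, "NO_TEST")

Merging with Append at the end means no two processes ever write to the same feature class, which is typically what fails when concurrent.futures workers try to share one workspace.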
What happens if you use pass in lieu of cursor.insertRow(row)?
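A minimal way to run that experiment, assuming query, arcpy, and the functions above are in scope: time one batch with the geometry conversion kept but the insert removed. If this is still slow, the bottleneck is the fetch and WKB conversion rather than the InsertCursor.

import time

start = time.perf_counter()
for batch in divide_data(query, 10000):     # query: the same SELECT as above
    for r in batch:
        _ = arcpy.FromWKB(r[1].read())      # keep the geometry-conversion cost
        pass                                # in lieu of cursor.insertRow(row)
    break                                   # one batch is enough for a timing
print(f"one 10k batch without inserts: {time.perf_counter() - start:.1f} s")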
Performance is going to be influenced by a number of factors: which database is being used, whether a spatial index is present on the table, the number of vertices in each feature,... I just created a million-row point table in PostgreSQL 11 via SQL in 22 seconds (including adding the spatial index), then queried it into a WKT list in 22 seconds, then inserted it back into a second table in 121 seconds.
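For context, that benchmark corresponds to roughly this kind of set-based SQL (shown here through psycopg2; the pts table, the DSN, and the random-point geometry are illustrative, not the commenter's actual script, and PostGIS is assumed to be installed):

import psycopg2

con = psycopg2.connect("dbname=test")       # hypothetical DSN
cur = con.cursor()
cur.execute("""
    CREATE TABLE pts AS
    SELECT n AS id,
           ST_SetSRID(ST_MakePoint(random() * 360 - 180,
                                   random() * 180 - 90), 4326) AS geom
    FROM generate_series(1, 1000000) AS n;
""")
cur.execute("CREATE INDEX pts_geom_idx ON pts USING gist (geom);")
con.commit()
cur.close()
con.close()

The point of the comparison is that set-based SQL keeps all million rows server-side, while a row-by-row insertRow loop pays per-row client overhead, which goes a long way toward explaining hours versus minutes.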