0

I'm working with a table in the database that has more than 1 million records and a total load of this table takes about 8 hours because it has geographic data, the table is a featureclass. Basically, the solution I see is to apply threads, I already tried using concurrente.futures, but for some reason, this solution cannot open the table folder, the next solution is to use the multiprocessing library, but I am not able to understand how to add it to this project. Or how can I use the threads in this? The important thing would be to run in parallel.

def divide_data(query, batch_size):
 con = Connection.connect()
 cur = con.cursor()
 cur.execute(query)
 while True:
 batch = cur.fetchmany(batch_size)
 if not batch:
 break
 yield batch
 cur.close()
 con.close()
def insert_table(input, query, camp, batch_size=10000):
 with arcpy.da.InsertCursor(input, camp) as cursor:
 for batch in divide_data(query, batch_size):
 for r in tqdm(batch, desc='Insert lines'):
 id = r[0]
 WKB = r[1].read()
 row = [id, arcpy.FromWKB(WKB)]
 for i in range(2, len(camp)):
 row.append(r[i])
 try:
 cursor.insertRow(row)
 except Exception as e:
 print(e)
 pass
 del cursor
Vince
20.5k16 gold badges49 silver badges65 bronze badges
asked Mar 30, 2023 at 19:27
3
  • 2
    ArcGIS Desktop objects are not thread safe, you will not be able declare more than one InsertCursor on a feature class/table as the first will write lock the data. The only exception is an enterprise geodatabase. Commented Mar 31, 2023 at 2:50
  • 3
    Eight hours to load a million records? That's probably seven and three quarters hours too long (in a single thread solution). There's a lot missing here, including an indication of how long the processing takes with a pass in lieu of cursor.insertRow(row), which database is being used, whether a spatial index is present on the table, the number of vertices in each feature,... I just created a million row point table in PostgreSQL 11 via SQL in 22 seconds (including adding the spatial index), then queried it into a WKT list in 22 sec, then inserted it back into a second table in 121 sec Commented Mar 31, 2023 at 3:02
  • 1
    The same load code with a GIST index on the geometry column increased runtime to 132 sec. Commented Mar 31, 2023 at 3:07

0

Know someone who can answer? Share a link to this question via email, Twitter, or Facebook.

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.