I'm working with a database table that has more than 1 million records, and a full load of this table takes about 8 hours because it contains geographic data (the table is a feature class). The solution I see is to run the load in parallel. I already tried concurrent.futures, but for some reason that approach cannot open the table's workspace. The next option is the multiprocessing library, but I can't figure out how to fit it into this project. How can I use threads or processes here? The important thing is that the work runs in parallel.
import arcpy
from tqdm import tqdm

# Connection is this project's own database helper (its import is not shown).


def divide_data(query, batch_size):
    """Yield the query's result set in batches of batch_size rows."""
    con = Connection.connect()
    cur = con.cursor()
    try:
        cur.execute(query)
        while True:
            batch = cur.fetchmany(batch_size)
            if not batch:
                break
            yield batch
    finally:
        # try/finally guarantees cleanup even if the consumer stops iterating early
        cur.close()
        con.close()
def insert_table(target_fc, query, camp, batch_size=10000):
    """Insert the query results into the target feature class in batches."""
    with arcpy.da.InsertCursor(target_fc, camp) as cursor:
        for batch in divide_data(query, batch_size):
            for r in tqdm(batch, desc='Insert lines'):
                # column 0 is the id, column 1 the geometry as WKB;
                # the remaining columns are plain attributes matching camp
                row = [r[0], arcpy.FromWKB(r[1].read())]
                row.extend(r[2:len(camp)])
                try:
                    cursor.insertRow(row)
                except Exception as e:
                    print(e)
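For the parallel part of the question: arcpy insert cursors and DBMS connections cannot be shared across processes, so a workable multiprocessing pattern is to split the table on an integer key, give each worker its own connection and its own scratch feature class, and merge the pieces at the end with Append. Below is a minimal sketch of that pattern, not a tested implementation: it assumes an integer id column and a WKB geom column, and TARGET_FC, SQL, the chunk bounds, and make_part_fc are illustrative placeholders, while Connection is the same helper used above.

import multiprocessing as mp
import os

import arcpy

FIELDS = ["id", "SHAPE@"]                    # extend with the real attribute fields
TARGET_FC = r"C:\data\output.gdb\fc"         # hypothetical target feature class
SQL = "SELECT id, geom FROM source_table WHERE id >= %s AND id < %s"  # assumed


def make_part_fc(i):
    """Hypothetical helper: empty scratch feature class with TARGET_FC's schema."""
    name = f"part_{i}"
    arcpy.management.CreateFeatureclass(
        arcpy.env.scratchGDB, name,
        geometry_type="POINT",               # match the real geometry type
        template=TARGET_FC,
        spatial_reference=arcpy.Describe(TARGET_FC).spatialReference)
    return os.path.join(arcpy.env.scratchGDB, name)


def copy_range(args):
    """Worker: copy one id range into its own scratch feature class."""
    lo, hi, out_fc = args
    con = Connection.connect()               # one connection per process
    cur = con.cursor()
    try:
        cur.execute(SQL, (lo, hi))
        with arcpy.da.InsertCursor(out_fc, FIELDS) as cursor:
            for r in cur:
                # r[1].read() assumes the driver returns a LOB-like WKB object
                cursor.insertRow([r[0], arcpy.FromWKB(r[1].read())])
    finally:
        cur.close()
        con.close()
    return out_fc


if __name__ == "__main__":
    # Chunk boundaries would come from SELECT min(id), max(id) on the real table.
    bounds = [(0, 250_000), (250_000, 500_000),
              (500_000, 750_000), (750_000, 1_000_001)]
    jobs = [(lo, hi, make_part_fc(i)) for i, (lo, hi) in enumerate(bounds)]
    with mp.Pool(processes=4) as pool:       # tune to the machine's cores
        parts = pool.map(copy_range, jobs)
    for part in parts:                       # merge the pieces into the target
        arcpy.management.Append(part, TARGET_FC, "NO_TEST")

Merging with Append at the end means no two processes ever write to the same feature class, which is typically what fails when concurrent.futures workers try to share one workspace.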
What happens if you use pass in lieu of cursor.insertRow(row)?
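A minimal way to run that experiment, assuming query, arcpy, and the functions above are in scope: time one batch with the geometry conversion kept but the insert removed. If this is still slow, the bottleneck is the fetch and WKB conversion rather than the InsertCursor.

import time

start = time.perf_counter()
for batch in divide_data(query, 10000):     # query: the same SELECT as above
    for r in batch:
        _ = arcpy.FromWKB(r[1].read())      # keep the geometry-conversion cost
        pass                                # in lieu of cursor.insertRow(row)
    break                                   # one batch is enough for a timing
print(f"one 10k batch without inserts: {time.perf_counter() - start:.1f} s")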
Performance is going to be influenced by a number of factors: which database is being used, whether a spatial index is present on the table, the number of vertices in each feature,... I just created a million-row point table in PostgreSQL 11 via SQL in 22 seconds (including adding the spatial index), then queried it into a WKT list in 22 seconds, then inserted it back into a second table in 121 seconds.
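For context, that benchmark corresponds to roughly this kind of set-based SQL (shown here through psycopg2; the pts table, the DSN, and the random-point geometry are illustrative, not the commenter's actual script, and PostGIS is assumed to be installed):

import psycopg2

con = psycopg2.connect("dbname=test")       # hypothetical DSN
cur = con.cursor()
cur.execute("""
    CREATE TABLE pts AS
    SELECT n AS id,
           ST_SetSRID(ST_MakePoint(random() * 360 - 180,
                                   random() * 180 - 90), 4326) AS geom
    FROM generate_series(1, 1000000) AS n;
""")
cur.execute("CREATE INDEX pts_geom_idx ON pts USING gist (geom);")
con.commit()
cur.close()
con.close()

The point of the comparison is that set-based SQL keeps all million rows server-side, while a row-by-row insertRow loop pays per-row client overhead, which goes a long way toward explaining hours versus minutes.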