I'm attempting to update a numeric field using an `arcpy.da.UpdateCursor`. The update applies to a feature layer and a table that participate in a one-to-many relationship. The feature layer is quite large, with 1,219,495 polygon records, and the table is even larger at 4,735,679 rows.

The update should only be applied when certain conditions are met: if the child table's `Provider` field has a value of "Optimum", that row needs to be updated, and if the parent feature's `MaxDownload` field is 1,000, that feature needs to be updated. I run the queries first via `SearchCursor` and save the relevant IDs that satisfy them. Then, in an edit session, I update the relevant fields with an `UpdateCursor`, using a `where_clause` to select the OIDs I need.
import datetime as dt

import arcpy


def log_it(msg):
    print(f"[{dt.datetime.now()}]: {msg}")
    return


def query_str_in_list(field_name, values):
    vals = "'" + "', '".join(values) + "'"
    return f"{field_name} IN ({vals})"


def query_int_in_list(field_name, values):
    vals = ", ".join([str(v) for v in values])
    return f"{field_name} IN ({vals})"


def main():
    start_time = dt.datetime.now()
    print(f"Starting script at {start_time}")

    arcpy.env.workspace = r"C:\dev\arcgis\NYS-DPS\NysDpsBroadbandMap2022円-06-01_Analysis2022円-06-01_Analysis\NysDpsBroadbandMap.gdb"

    global_ids = []
    provider_oids = []
    with arcpy.da.SearchCursor(
        "StandardProviders",
        ["ParentGlobalID", "OBJECTID"],
        where_clause="Provider = 'Optimum'",
    ) as search_cursor:
        for row in search_cursor:
            global_ids.append(row[0])
            provider_oids.append(row[1])

    polygon_oids = [
        row[0]
        for row in arcpy.da.SearchCursor(
            "StandardSummaryPolygons",
            ["OBJECTID"],
            where_clause=query_str_in_list("GlobalID", global_ids)
            + " AND MaxDownload = 1000",
        )
    ]

    log_it(
        f"Beginning to edit {len(provider_oids)} Provider Records and {len(polygon_oids)} Polygon Records"
    )

    with arcpy.da.Editor(arcpy.env.workspace) as edit_session:
        with arcpy.da.UpdateCursor(
            "StandardSummaryPolygons",
            ["OID@", "MaxDownload"],
            where_clause=query_int_in_list("OBJECTID", polygon_oids),
        ) as polygon_update_cursor:
            for idx, row in enumerate(polygon_update_cursor):
                log_it(f"{idx} Polygon Iterations")
                oid = row[0]
                if oid in polygon_oids:
                    polygon_update_cursor.updateRow([oid, 940])

        with arcpy.da.UpdateCursor(
            "StandardProviders",
            ["OID@", "Download"],
            where_clause=query_int_in_list("OBJECTID", provider_oids),
        ) as provider_update_cursor:
            for idx, row in enumerate(provider_update_cursor):
                log_it(f"{idx} Provider Iterations")
                oid = row[0]
                if oid in provider_oids:
                    provider_update_cursor.updateRow([oid, 940])

    end_time = dt.datetime.now()
    duration = end_time - start_time
    print(f"Finished at {end_time} after executing for {duration}")
    return


if __name__ == "__main__":
    main()
The script is running quite slowly, and the polygon feature class appears to be the root cause: each iteration of its update cursor takes roughly 2 seconds. Iterating through the table gives the performance I would expect.
Here's a log snippet:
## Parent Table Log (Polygon Features)
Starting script at 2022年07月20日 14:13:51.937132
[2022年07月20日 14:13:59.535107]: Beginning to edit 276149 Provider Records and 212815 Polygon Records
[2022年07月20日 14:13:59.801700]: 0 Polygon Iterations
[2022年07月20日 14:14:01.015486]: 1 Polygon Iterations
[2022年07月20日 14:14:02.363565]: 2 Polygon Iterations
[2022年07月20日 14:14:03.780736]: 3 Polygon Iterations
## Child Table Log (Table Rows)
[2022年07月20日 14:19:10.037031]: 6740 Provider Iterations
[2022年07月20日 14:19:10.037031]: 6741 Provider Iterations
[2022年07月20日 14:19:10.038029]: 6742 Provider Iterations
[2022年07月20日 14:19:10.038029]: 6743 Provider Iterations
Am I simply running into physical limits due to the volume of the data? Are there any performance gains that could be implemented here?
## 1 Answer
I believe this was a case of a corrupted file geodatabase. While interacting with @Vince in the comments, we uncovered that the first ~1700 records were performing normally. After that, things went haywire.
I tried to copy/paste the layers into a new file geodatabase, which has helped in the past. That did not work here. I went back to the FGDB source and got a fresh copy of the data, and performance returned to normal.
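For anyone trying the same recovery path, here is a minimal sketch of copying the two datasets into a brand-new file geodatabase with arcpy. The output folder and geodatabase name are hypothetical, and whether related objects such as the relationship class come along should be verified against your own data.

```python
import os

import arcpy

src_gdb = r"C:\dev\arcgis\NYS-DPS\NysDpsBroadbandMap.gdb"  # hypothetical source path
out_folder = r"C:\dev\arcgis\NYS-DPS"                      # hypothetical output folder
out_name = "NysDpsBroadbandMap_fresh.gdb"                  # hypothetical name

# Create an empty file geodatabase to receive the copies.
arcpy.management.CreateFileGDB(out_folder, out_name)
dst_gdb = os.path.join(out_folder, out_name)

# Copy the parent feature class and the child table into the new FGDB.
for name in ("StandardSummaryPolygons", "StandardProviders"):
    arcpy.management.Copy(os.path.join(src_gdb, name), os.path.join(dst_gdb, name))
```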
FWIW, this FGDB has been "Compressed" then later "Uncompressed", which probably complicates things.
- Compress/Uncompress shouldn't make a difference. Just Compress might make it faster (lower I/O load). Compact and/or Reindex might be able to correct this sort of issue. – Vince, Jul 20, 2022 at 20:25
- I had to Uncompress since Compressed makes an FGDB read-only. I'll have to try Compact; that makes a lot of sense. A reindex did not seem to help. – vitale232, Jul 20, 2022 at 20:44
- Compact won't help a brand-new FGDB downloaded from AGOL. – Vince, Jul 20, 2022 at 21:41
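For completeness, here is a rough sketch of the maintenance steps discussed in the comments above (Uncompress, Compact, and rebuilding an attribute index). The geodatabase path, field, and index name are assumptions rather than details from the original data.

```python
import os

import arcpy

gdb = r"C:\dev\arcgis\NYS-DPS\NysDpsBroadbandMap.gdb"  # hypothetical path
fc = os.path.join(gdb, "StandardSummaryPolygons")

# A compressed file geodatabase is read-only, so uncompress it before editing.
arcpy.management.UncompressFileGeodatabaseData(gdb)

# Compact defragments the FGDB, which can help after heavy edit/delete cycles.
arcpy.management.Compact(gdb)

# Reindex by dropping and re-adding an attribute index
# ("MaxDownload_IDX" is a hypothetical index name).
if any(idx.name == "MaxDownload_IDX" for idx in arcpy.ListIndexes(fc)):
    arcpy.management.RemoveIndex(fc, "MaxDownload_IDX")
arcpy.management.AddIndex(fc, ["MaxDownload"], "MaxDownload_IDX")
```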
Comments on the question:

- The `IN` list could be slowing down the query; the existing `updateRow` logic would let you go without it. You could see a 50% performance improvement if the source is switched from a network share to a local disk. If you're assigning a constant, using a selection set and a Calculate Field might be faster.
- If I remove the `where_clause` from the `UpdateCursor`, I get the expected performance on the first 1,728 records. Then it starts to slow down to 2s again. Quite strange.
- [...] the `SearchCursor` that's in the example above. Would iterating over all of the rows in an `UpdateCursor` work?
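Since a constant value is being assigned, here is a minimal sketch of the selection-set-plus-Calculate-Field idea from the first comment above. The filters are simplified (they don't reproduce the parent/child GlobalID check from the original script), so treat this as a starting point rather than a drop-in replacement.

```python
import arcpy

arcpy.env.workspace = r"C:\dev\arcgis\NYS-DPS\NysDpsBroadbandMap.gdb"  # hypothetical path

# Restrict a layer/view to the rows that need the new value, then let
# Calculate Field write the constant in a single geoprocessing call.
# NOTE: the where clauses below omit the GlobalID IN (...) relationship
# filter that the original script builds from its SearchCursor.
polygons = arcpy.management.MakeFeatureLayer(
    "StandardSummaryPolygons", "polygons_lyr", where_clause="MaxDownload = 1000"
)
arcpy.management.CalculateField(polygons, "MaxDownload", "940", expression_type="PYTHON3")

providers = arcpy.management.MakeTableView(
    "StandardProviders", "providers_view", where_clause="Provider = 'Optimum'"
)
arcpy.management.CalculateField(providers, "Download", "940", expression_type="PYTHON3")
```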