2

I'm attempting to update a numeric field using an arcpy.da.UpdateCursor. I'm applying the update to a Feature Layer and Table, which participate in a One to Many Relationship. The Feature layer is quite large, with 1,219,495 Polygon records. The table has even more records at 4,735,679 rows.

The update should only be applied if certain conditions are met. If the Child Table's Provider field has a value of "Optimum", it needs updated. If the Parent Feature's MaxDownload field is 1,000, it needs updated. I'm doing the queries first via SearchCursor, and saving the relevant IDs that satisfy the query. Then, in an edit session, I'm attempting to update the relevant fields using a where_clause to select the OIDs I need.

import datetime as dt
import arcpy
def log_it(msg):
 print(f"[{dt.datetime.now()}]: {msg}")
 return
def query_str_in_list(field_name, values):
 vals = "'" + "', '".join(values) + "'"
 return f"{field_name} IN ({vals})"
def query_int_in_list(field_name, values):
 vals = ", ".join([str(v) for v in values])
 return f"{field_name} IN ({vals})"
def main():
 start_time = dt.datetime.now()
 print(f"Starting script at {start_time}")
 arcpy.env.workspace = r"C:\dev\arcgis\NYS-DPS\NysDpsBroadbandMap2022円-06-01_Analysis2022円-06-01_Analysis\NysDpsBroadbandMap.gdb"
 global_ids = []
 provider_oids = []
 with arcpy.da.SearchCursor(
 "StandardProviders",
 ["ParentGlobalID", "OBJECTID"],
 where_clause="Provider = 'Optimum'",
 ) as search_cursor:
 for row in search_cursor:
 global_ids.append(row[0])
 provider_oids.append(row[1])
 polygon_oids = [
 row[0]
 for row in arcpy.da.SearchCursor(
 "StandardSummaryPolygons",
 ["OBJECTID"],
 where_clause=query_str_in_list("GlobalID", global_ids)
 + " AND MaxDownload = 1000",
 )
 ]
 log_it(
 f"Beginning to edit {len(provider_oids)} Provider Records and {len(polygon_oids)} Polygon Records"
 )
 with arcpy.da.Editor(arcpy.env.workspace) as edit_session:
 with arcpy.da.UpdateCursor(
 "StandardSummaryPolygons",
 ["OID@", "MaxDownload"],
 where_clause=query_int_in_list("OBJECTID", polygon_oids),
 ) as polygon_update_cursor:
 for idx, row in enumerate(polygon_update_cursor):
 log_it(f"{idx} Polygon Iterations")
 oid = row[0]
 if oid in polygon_oids:
 polygon_update_cursor.updateRow([oid, 940])
 with arcpy.da.UpdateCursor(
 "StandardProviders",
 ["OID@", "Download"],
 where_clause=query_int_in_list("OBJECTID", provider_oids),
 ) as provider_update_cursor:
 for idx, row in enumerate(provider_update_cursor):
 log_it(f"{idx} Provider Iterations")
 oid = row[0]
 if oid in provider_oids:
 provider_update_cursor.updateRow([oid, 940])
 end_time = dt.datetime.now()
 duration = end_time - start_time
 print(f"Finished at {end_time} after executing for {duration}")
 return
if __name__ == "__main__":
 main()

The script is going quite slow, and it appears that the Polygon Feature Class is the root cause. When iterating through the cursor, each iteration is taking 2 seconds. Iterating through the table gives the performance I would expect.

Here's a log snippet:

## Parent Table Log (Polygon Features)
Starting script at 2022年07月20日 14:13:51.937132
[2022年07月20日 14:13:59.535107]: Beginning to edit 276149 Provider Records and 212815 Polygon Records
[2022年07月20日 14:13:59.801700]: 0 Polygon Iterations
[2022年07月20日 14:14:01.015486]: 1 Polygon Iterations
[2022年07月20日 14:14:02.363565]: 2 Polygon Iterations
[2022年07月20日 14:14:03.780736]: 3 Polygon Iterations
## Child Table Log (Table Rows)
[2022年07月20日 14:19:10.037031]: 6740 Provider Iterations
[2022年07月20日 14:19:10.037031]: 6741 Provider Iterations
[2022年07月20日 14:19:10.038029]: 6742 Provider Iterations
[2022年07月20日 14:19:10.038029]: 6743 Provider Iterations

Am I simply running into physical limits due to the volume of the data? Are there any performance gains that could be implemented here?

asked Jul 20, 2022 at 18:25
6
  • The IN list could be slowing down the query; the existing updateRow logic would let you go without it. You could see a 50% performance improvement if the source is switched from a network share to a local disk. If you're assigning a constant, using a selection set and a Calculate Field might be faster. Commented Jul 20, 2022 at 18:42
  • Interestingly, if I remove the where_clause from the UpdateCursor, I get the expected performance on the first 1,728 records. Then it starts to slow down to 2s again. Quite strange. Commented Jul 20, 2022 at 18:58
  • Thanks for the tip. I think this led me to my answer. Commented Jul 20, 2022 at 19:43
  • If the IN list is 6700+ features, the SQL query is likely to be really inefficient (Oracle won't let the number of terms exceed 2000). Commented Jul 20, 2022 at 20:33
  • The production data actually lives in an AGOL Hosted Feature Layer. In the script, I have everything batched into chunks of 1,000 to try and avoid REST issues, including the list comprehension SearchCursor that's in the example above. Would iterating over all of the rows in an UpdateCursor work? Commented Jul 20, 2022 at 21:14

1 Answer 1

1

I believe this was a case of a corrupted file geodatabase. While interacting with @Vince in the comments, we uncovered that the first ~1700 records were performing normally. After that, things went haywire.

I tried to copy/paste the layers into a new file geodatabase, which has helped in the past. That did not work here. I went back to the FGDB source and got a fresh copy of the data, and performance returned to normal.

FWIW, this FGDB has been "Compressed" then later "Uncompressed", which probably complicates things.

answered Jul 20, 2022 at 19:48
3
  • Compress/Uncompress shouldn't make a difference. Just Compress might make it faster (lower I/O load). Compact and/or Reindex might be able to correct this sort of issue. Commented Jul 20, 2022 at 20:25
  • I had to Uncompress since Compressed makes a fgbd read-only. I'll have to try Compact, that makes a lot of sense. A reindex did not seem to help. Commented Jul 20, 2022 at 20:44
  • Compact won't help a brand-new FGDB downloaded from AGOL. Commented Jul 20, 2022 at 21:41

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.