I'm attempting to update a numeric field using an `arcpy.da.UpdateCursor`. The update applies to a feature layer and a table that participate in a one-to-many relationship. The feature layer is quite large, with 1,219,495 polygon records, and the table is even larger at 4,735,679 rows.

The update should only be applied when certain conditions are met: if the child table's `Provider` field has a value of "Optimum", that row needs to be updated, and if the parent feature's `MaxDownload` field is 1,000, that feature needs to be updated. I run the queries first via `SearchCursor` and save the relevant IDs that satisfy them. Then, in an edit session, I update the relevant fields with an `UpdateCursor`, using a `where_clause` to select the OIDs I need.
import datetime as dt

import arcpy


def log_it(msg):
    print(f"[{dt.datetime.now()}]: {msg}")
    return


def query_str_in_list(field_name, values):
    vals = "'" + "', '".join(values) + "'"
    return f"{field_name} IN ({vals})"


def query_int_in_list(field_name, values):
    vals = ", ".join([str(v) for v in values])
    return f"{field_name} IN ({vals})"


def main():
    start_time = dt.datetime.now()
    print(f"Starting script at {start_time}")

    arcpy.env.workspace = r"C:\dev\arcgis\NYS-DPS\NysDpsBroadbandMap2022円-06-01_Analysis2022円-06-01_Analysis\NysDpsBroadbandMap.gdb"

    global_ids = []
    provider_oids = []
    with arcpy.da.SearchCursor(
        "StandardProviders",
        ["ParentGlobalID", "OBJECTID"],
        where_clause="Provider = 'Optimum'",
    ) as search_cursor:
        for row in search_cursor:
            global_ids.append(row[0])
            provider_oids.append(row[1])

    polygon_oids = [
        row[0]
        for row in arcpy.da.SearchCursor(
            "StandardSummaryPolygons",
            ["OBJECTID"],
            where_clause=query_str_in_list("GlobalID", global_ids)
            + " AND MaxDownload = 1000",
        )
    ]

    log_it(
        f"Beginning to edit {len(provider_oids)} Provider Records and {len(polygon_oids)} Polygon Records"
    )

    with arcpy.da.Editor(arcpy.env.workspace) as edit_session:
        with arcpy.da.UpdateCursor(
            "StandardSummaryPolygons",
            ["OID@", "MaxDownload"],
            where_clause=query_int_in_list("OBJECTID", polygon_oids),
        ) as polygon_update_cursor:
            for idx, row in enumerate(polygon_update_cursor):
                log_it(f"{idx} Polygon Iterations")
                oid = row[0]
                if oid in polygon_oids:
                    polygon_update_cursor.updateRow([oid, 940])

        with arcpy.da.UpdateCursor(
            "StandardProviders",
            ["OID@", "Download"],
            where_clause=query_int_in_list("OBJECTID", provider_oids),
        ) as provider_update_cursor:
            for idx, row in enumerate(provider_update_cursor):
                log_it(f"{idx} Provider Iterations")
                oid = row[0]
                if oid in provider_oids:
                    provider_update_cursor.updateRow([oid, 940])

    end_time = dt.datetime.now()
    duration = end_time - start_time
    print(f"Finished at {end_time} after executing for {duration}")
    return


if __name__ == "__main__":
    main()
The script is running quite slowly, and the polygon feature class appears to be the root cause: each iteration of its update cursor takes roughly 2 seconds. Iterating through the table gives the performance I would expect.
Here's a log snippet:
## Parent Table Log (Polygon Features)
Starting script at 2022年07月20日 14:13:51.937132
[2022年07月20日 14:13:59.535107]: Beginning to edit 276149 Provider Records and 212815 Polygon Records
[2022年07月20日 14:13:59.801700]: 0 Polygon Iterations
[2022年07月20日 14:14:01.015486]: 1 Polygon Iterations
[2022年07月20日 14:14:02.363565]: 2 Polygon Iterations
[2022年07月20日 14:14:03.780736]: 3 Polygon Iterations
## Child Table Log (Table Rows)
[2022年07月20日 14:19:10.037031]: 6740 Provider Iterations
[2022年07月20日 14:19:10.037031]: 6741 Provider Iterations
[2022年07月20日 14:19:10.038029]: 6742 Provider Iterations
[2022年07月20日 14:19:10.038029]: 6743 Provider Iterations
Am I simply running into physical limits due to the volume of the data? Are there any performance gains that could be implemented here?
## 1 Answer
I believe this was a case of a corrupted file geodatabase. While interacting with @Vince in the comments, we uncovered that the first ~1700 records were performing normally. After that, things went haywire.
I tried to copy/paste the layers into a new file geodatabase, which has helped in the past. That did not work here. I went back to the FGDB source and got a fresh copy of the data, and performance returned to normal.
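For anyone trying the same recovery path, here is a minimal sketch of copying the two datasets into a brand-new file geodatabase with arcpy. The output folder and geodatabase name are hypothetical, and whether related objects such as the relationship class come along should be verified against your own data.

```python
import os

import arcpy

src_gdb = r"C:\dev\arcgis\NYS-DPS\NysDpsBroadbandMap.gdb"  # hypothetical source path
out_folder = r"C:\dev\arcgis\NYS-DPS"                      # hypothetical output folder
out_name = "NysDpsBroadbandMap_fresh.gdb"                  # hypothetical name

# Create an empty file geodatabase to receive the copies.
arcpy.management.CreateFileGDB(out_folder, out_name)
dst_gdb = os.path.join(out_folder, out_name)

# Copy the parent feature class and the child table into the new FGDB.
for name in ("StandardSummaryPolygons", "StandardProviders"):
    arcpy.management.Copy(os.path.join(src_gdb, name), os.path.join(dst_gdb, name))
```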
FWIW, this FGDB has been "Compressed" then later "Uncompressed", which probably complicates things.
- Compress/Uncompress shouldn't make a difference. Just Compress might make it faster (lower I/O load). Compact and/or Reindex might be able to correct this sort of issue. – Vince, Jul 20, 2022 at 20:25
- I had to Uncompress since Compressed makes an FGDB read-only. I'll have to try Compact; that makes a lot of sense. A reindex did not seem to help. – vitale232, Jul 20, 2022 at 20:44
- Compact won't help a brand-new FGDB downloaded from AGOL. – Vince, Jul 20, 2022 at 21:41
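For completeness, here is a rough sketch of the maintenance steps discussed in the comments above (Uncompress, Compact, and rebuilding an attribute index). The geodatabase path, field, and index name are assumptions rather than details from the original data.

```python
import os

import arcpy

gdb = r"C:\dev\arcgis\NYS-DPS\NysDpsBroadbandMap.gdb"  # hypothetical path
fc = os.path.join(gdb, "StandardSummaryPolygons")

# A compressed file geodatabase is read-only, so uncompress it before editing.
arcpy.management.UncompressFileGeodatabaseData(gdb)

# Compact defragments the FGDB, which can help after heavy edit/delete cycles.
arcpy.management.Compact(gdb)

# Reindex by dropping and re-adding an attribute index
# ("MaxDownload_IDX" is a hypothetical index name).
if any(idx.name == "MaxDownload_IDX" for idx in arcpy.ListIndexes(fc)):
    arcpy.management.RemoveIndex(fc, "MaxDownload_IDX")
arcpy.management.AddIndex(fc, ["MaxDownload"], "MaxDownload_IDX")
```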
Comments on the question:

- The `IN` list could be slowing down the query; the existing `updateRow` logic would let you go without it. You could see a 50% performance improvement if the source is switched from a network share to a local disk. If you're assigning a constant, using a selection set and a Calculate Field might be faster.
- If I remove the `where_clause` from the `UpdateCursor`, I get the expected performance on the first 1,728 records. Then it starts to slow down to 2s again. Quite strange.
- [...] the `SearchCursor` that's in the example above. Would iterating over all of the rows in an `UpdateCursor` work?
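Since a constant value is being assigned, here is a minimal sketch of the selection-set-plus-Calculate-Field idea from the first comment above. The filters are simplified (they don't reproduce the parent/child GlobalID check from the original script), so treat this as a starting point rather than a drop-in replacement.

```python
import arcpy

arcpy.env.workspace = r"C:\dev\arcgis\NYS-DPS\NysDpsBroadbandMap.gdb"  # hypothetical path

# Restrict a layer/view to the rows that need the new value, then let
# Calculate Field write the constant in a single geoprocessing call.
# NOTE: the where clauses below omit the GlobalID IN (...) relationship
# filter that the original script builds from its SearchCursor.
polygons = arcpy.management.MakeFeatureLayer(
    "StandardSummaryPolygons", "polygons_lyr", where_clause="MaxDownload = 1000"
)
arcpy.management.CalculateField(polygons, "MaxDownload", "940", expression_type="PYTHON3")

providers = arcpy.management.MakeTableView(
    "StandardProviders", "providers_view", where_clause="Provider = 'Optimum'"
)
arcpy.management.CalculateField(providers, "Download", "940", expression_type="PYTHON3")
```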