Before I start, I should mention that I have never worked in Python before.
I am trying to process a large feature-class-to-feature-class update: 3-million-plus rows with 70 columns. I first tried an UpdateCursor nested with a SearchCursor, but performance was extremely slow, around 1 record per minute. I switched to an UpdateCursor with a dictionary, which hugely improved performance, but as I am restricted to 32-bit Python I can only process around 600,000 rows at a time. To get around this I am splitting the feature class into smaller temporary feature classes, running the dictionary update on each of these, and then deleting the temporary feature class and the dictionary before looping back through.
This works perfectly for the first iteration, but then gives a memory error when building the dictionary on the second iteration. I have tried forcing garbage collection to clear the issue, but that didn't work.
I can only guess that there are still references open to the dictionary. Any insights or suggestions would be most welcome.
Here is the offending section of code (please excuse the print statements; they are only there for testing):
import arcpy  # imports live at the top of the full script; shown here so the snippet is self-contained
import gc

# start changes
# split the changes into smaller feature classes
lyr = arcpy.mapping.Layer(sourceFc)
arcpy.SelectLayerByAttribute_management(lyr, "CLEAR_SELECTION")

# collect the OIDs of all changed records
fList = list()
with arcpy.da.SearchCursor(lyr, "OID@", "Change_Type = 'C'") as cursor:
    for row in cursor:
        fList.append(row[0])

# split the OID list into groups and process each group in turn
listGroup = listSplit(fList, outputNum)
for x in listGroup:
    lyr.setSelectionSet("NEW", x)
    arcpy.CopyFeatures_management(lyr, outputFCName)  # + str(nums))

    # process changes: build a lookup for this chunk, then push it into the target
    upDict = {r[0]: r[1:] for r in arcpy.da.SearchCursor(outputFCName, fields)}
    print "Dictionary Created"
    with arcpy.da.UpdateCursor(targetFc, fields) as updateRows:
        for updateRow in updateRows:
            keyValue = updateRow[0]
            if keyValue in upDict:
                for n in range(1, len(fields)):
                    updateRow[n] = upDict[keyValue][n - 1]
                ucount = ucount + 1
                updateRows.updateRow(updateRow)

    # clean up before the next iteration
    del upDict
    gc.collect()
    arcpy.Delete_management(outputFCName)
    print "Processed " + str(ucount) + " records."
- upDict.clear() will reset the contents. – klewis, Feb 21, 2019 at 20:48
- Just reduce the size of the chunk. – FelixIP, Feb 21, 2019 at 21:59
- @klewis already tried that, but thanks for commenting. – Magus, Feb 22, 2019 at 10:18
- @FelixIP reducing the chunk size down to 250,000 records per iteration allows it to complete. I increased this until I found the size at which it fails (500,000). Maybe you could tell me why that makes a difference when the larger chunk size processed fine for one iteration? Is Python slow to release references/memory, or does the memory overflow because of the extra object overhead? – Magus, Feb 22, 2019 at 11:41
1 Answer
@FelixIP – Just reduce the size of the chunk.

Reducing the chunk size down to 250,000 records per iteration allows it to complete. I increased this until I found the size at which it fails (500,000).
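For anyone hitting the same wall, here is a rough sketch of the pattern that worked: the OID list is walked in 250,000-record chunks, and the dictionary is explicitly cleared (as klewis suggested) before the next chunk's dictionary is built. Variable names such as sourceFc, targetFc, outputFCName and fields are as in the question; the exact wiring of the loop below is a simplified approximation, not the production script:

import gc
import arcpy

CHUNK_SIZE = 250000  # largest chunk that completed reliably in 32-bit Python

lyr = arcpy.mapping.Layer(sourceFc)
arcpy.SelectLayerByAttribute_management(lyr, "CLEAR_SELECTION")

# collect the OIDs of all changed records once
oids = [row[0] for row in arcpy.da.SearchCursor(lyr, "OID@", "Change_Type = 'C'")]

ucount = 0
for start in range(0, len(oids), CHUNK_SIZE):
    chunk = oids[start:start + CHUNK_SIZE]
    lyr.setSelectionSet("NEW", chunk)
    arcpy.CopyFeatures_management(lyr, outputFCName)

    # build the lookup for this chunk only
    upDict = {r[0]: r[1:] for r in arcpy.da.SearchCursor(outputFCName, fields)}

    with arcpy.da.UpdateCursor(targetFc, fields) as updateRows:
        for updateRow in updateRows:
            keyValue = updateRow[0]
            if keyValue in upDict:
                for n in range(1, len(fields)):
                    updateRow[n] = upDict[keyValue][n - 1]
                ucount = ucount + 1
                updateRows.updateRow(updateRow)

    # release this chunk's memory before the next pass builds a new dictionary
    upDict.clear()
    del upDict
    gc.collect()
    arcpy.Delete_management(outputFCName)

print "Processed " + str(ucount) + " records."

The key point is that the dictionary only ever holds one chunk's worth of rows and is cleared and deleted before the next dictionary comprehension allocates a new one.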