
Before I start, I should mention that I have never worked in Python before this.

I am trying to process a large feature-class-to-feature-class update: 3 million plus rows with 70 columns. I first tried an UpdateCursor with a nested SearchCursor, but performance was extremely slow, around 1 record per minute. I switched to an UpdateCursor with a dictionary, which hugely improved performance, but because I am restricted to 32-bit Python I can only process around 600,000 rows at a time. To get around this I am splitting the feature class into smaller temporary feature classes, running the dictionary update on each of these, then deleting the temporary feature class and the dictionary and looping back through.

This works perfectly for the first iteration, but then raises a memory error on the dictionary on the second iteration. I have tried forcing garbage collection to clear the issue, but that didn't work.

I can only guess that there are still references open to the dictionary. Any insights and suggestions would be most welcome.
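
For what it's worth, a quick way to check that guess is with the standard library. This is only a diagnostic sketch (it would sit just before the del upDict in the code below), using calls that ship with Python:

import gc
import sys

# A local dictionary that nothing else points at should report a refcount of 2 here:
# the local name itself plus the temporary reference created by the getrefcount call.
print sys.getrefcount(upDict)

# List what still refers to the dictionary (normally just the enclosing frame or
# module namespace); anything beyond that is what keeps it alive across iterations.
for referrer in gc.get_referrers(upDict):
    print type(referrer)

gc.collect()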

Here is the offending section of code (please excuse the print statements; they are only there for testing):

# start changes
# split changes into smaller feature classes
lyr = arcpy.mapping.Layer(sourceFc)
arcpy.SelectLayerByAttribute_management(lyr, "CLEAR_SELECTION")

# collect the OIDs of all changed rows
fList = list()
with arcpy.da.SearchCursor(lyr, "OID@", "Change_Type = 'C'") as cursor:
    for row in cursor:
        fList.append(row[0])

listGroup = listSplit(fList, outputNum)
for x in listGroup:
    # select this group of OIDs and copy it to a temporary feature class
    lyr.setSelectionSet("NEW", x)
    arcpy.CopyFeatures_management(lyr, outputFCName)  # + str(nums))

    # process changes: build an OID-keyed dictionary of the copied rows
    upDict = {r[0]: r[1:] for r in arcpy.da.SearchCursor(outputFCName, fields)}
    print "Dictionary Created"

    with arcpy.da.UpdateCursor(targetFc, fields) as updateRows:
        for updateRow in updateRows:
            keyValue = updateRow[0]
            if keyValue in upDict:
                for n in range(1, len(fields)):
                    updateRow[n] = upDict[keyValue][n - 1]
                ucount = ucount + 1
                updateRows.updateRow(updateRow)

    # drop the dictionary and the temporary feature class before the next pass
    del upDict
    gc.collect()
    arcpy.Delete_management(outputFCName)
    print "Processed " + str(ucount) + " records."
asked Feb 21, 2019 at 20:18
  • upDict.clear() will reset the contents. Commented Feb 21, 2019 at 20:48
  • Just reduce the size of chunk. Commented Feb 21, 2019 at 21:59
  • @klewis already tried that, but thanks for commenting. Commented Feb 22, 2019 at 10:18
  • @FelixIP reducing the chunk size down to 250,000 records per iteration allows it to complete; I increased it until I found the size at which it fails (500,000). Could you tell me why that makes a difference when the larger chunk size processed fine on the first iteration? Is Python slow to release references/memory, or does memory overflow because of the extra object overhead? Commented Feb 22, 2019 at 11:41

1 Answer


@FelixIP - Just reduce the size of chunk

Reducing the chunk size down to 250,000 records per iteration allows it to complete. I increased it until I found the size at which it fails (500,000).
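
In practice that just means feeding a smaller value into the chunking step and, per the comment above, clearing the dictionary in place before deleting it. A rough sketch, reusing the variables and the hypothetical listSplit helper from the question:

chunkSize = 250000  # 500,000 fails on the second pass; 250,000 completes
for x in listSplit(fList, chunkSize):
    lyr.setSelectionSet("NEW", x)
    arcpy.CopyFeatures_management(lyr, outputFCName)
    upDict = {r[0]: r[1:] for r in arcpy.da.SearchCursor(outputFCName, fields)}
    # ... run the UpdateCursor loop against targetFc exactly as in the question ...
    upDict.clear()  # release the dictionary contents in place
    del upDict
    arcpy.Delete_management(outputFCName)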

answered Feb 25, 2019 at 14:27
