
I am trying to use multiprocessing to help me run a tool that clips a feature class and sums an attribute in it. The tool needs to iterate through about 24,000 rows of a feature class while doing this. I can run it without multiprocessing, but it takes a long time (over 20 hours). Because of that, I am trying to use multiprocessing to get it to run faster, as I will likely have to use this type of tool again on larger sets of data. However, when I run it, it finishes, but there is no output, and when I look at the shapefiles created during the run, the update cursor has not modified them. I am hoping someone can help me code it better so that it works.

import arcpy
import os
import multiprocessing

#this toolbox is full of tools I've made, I use one of them later
arcpy.ImportToolbox(
    r'C:\Users\joe.chestnut\Documents\PythonScripts\JoesTools.pyt')
#geodatabase that my files are all in
arcpy.env.workspace = r"D:\Joe.Chestnut\Dallas\Dallas.gdb"

#This is the actual function that I need to run.
def ClipCount(stuffpassed):
    #ranges are based on the OID; they stratify the data so that it can be multiprocessed
    ranges = stuffpassed[0]
    #this is my file with about 25,000 polygon features (service areas for census blocks in Dallas)
    inputfile = stuffpassed[1]
    #this file is the census blocks in Dallas, and has some job data in it
    inputfile2 = stuffpassed[2]
    print(ranges)
    print("Running ClipCount")
    arcpy.env.workspace = r"D:\Joe.Chestnut\Dallas"
    i, j = ranges[0], ranges[1]
    #this copies my file into a folder and creates 4 different copies of it,
    #one for each of the four processes to run on
    infile = arcpy.management.CopyFeatures(
        inputfile, 'layer{0}'.format(i),
        """OID >= {0} AND OID <= {1}""".format(i, j))
    print(infile)
    #makes a layer out of my census blocks so that they can be geoprocessed
    cliplayer = arcpy.management.MakeFeatureLayer(inputfile2, "clipmedallas")
    print(cliplayer)
    #this cursor runs through each of the rows of the service areas file, and for each one
    #clips the census block file. Using a search cursor on the clipped output, it then sums
    #the values in the 'newpop' field; the summed values are written into the 'jobcount'
    #field of the service area file using the update cursor.
    with arcpy.da.UpdateCursor(infile, ["Name", "jobcount"]) as cursor:
        print("update cursor running")
        for row in cursor:
            value = row[0]
            field = "Name"
            exp = field + " = '" + value + "'"
            infile3 = arcpy.MakeFeatureLayer_management(infile, "infile3")
            rowselect = arcpy.SelectLayerByAttribute_management(infile3, "NEW_SELECTION", exp)
            clipedlyr = arcpy.LEHDClipper(cliplayer, rowselect, "in_memory", r"in_memory\clipedlyr")
            with arcpy.da.SearchCursor(clipedlyr, "newpop") as newcursor:
                stationpop = []
                for srow in newcursor:
                    value = float(srow[0])
                    stationpop.append(value)
                print(stationpop)
            totalpop = 0
            for num in stationpop:
                totalpop = totalpop + num
            print(totalpop)
            row[1] = totalpop
            cursor.updateRow(row)
            arcpy.Delete_management(clipedlyr)

#This is the main function where I give the inputs for ClipCount and then actually run the multiprocessing pool.
def main():
    ranges = [[1, 6000], [6001, 12000], [12001, 18000], [18001, 24000]]
    inputfile = r'D:\Joe.Chestnut\Dallas\Dallas.gdb\gtfs\BlockAccess'
    inputfile2 = r'D:\Joe.Chestnut\Dallas\Dallas.gdb\gtfs\Dallas_1clip'
    stuffpassed = [ranges, inputfile, inputfile2]
    pool = multiprocessing.Pool(processes=4, initializer=None, maxtasksperchild=1)
    result = pool.map_async(ClipCount, stuffpassed)
    pool.close()
    pool.join()
    print(result)

if __name__ == '__main__':
    main()

When I run this, I get the following result:

<multiprocessing.pool.MapResult object at 0x1467CAD0>

It also creates four copies of inputfile as shapefiles. Apart from this, though, I can't tell if it is doing anything.

Does anyone know what I am doing wrong?

How do I get the actual update cursor to run with the multiprocessing?

asked Nov 2, 2017 at 20:39
  • Thanks! I fixed that, but it still doesn't function properly. Commented Nov 2, 2017 at 21:00
  • I don't know your data, but why are you not using Intersect? No need for iteration and clipping. Commented Nov 2, 2017 at 21:03
  • The polygons in the first input file are all overlapping. They are service areas made in Network Analyst, so I need to know what is underneath each of them. Intersect will output everything that's underneath any of them, instead of what is underneath each one separately. (I think) Commented Nov 2, 2017 at 21:19
  • First - the result looks like that because you aren't returning anything from the processing function; what are you even expecting back? Second - I tend to just use regular map, not async. Third - I might be wrong, but I do not think it is safe to update the same feature class from multiple instances of the task. Fourth - can you use the Identity analysis function instead? Fifth - if that still doesn't work, use the built-in geometry methods disjoint, clip, and intersect instead of the tools. Commented Nov 2, 2017 at 22:54
  • I'm 95% sure Intersect will work even if the polygons in the first input overlap. Commented Nov 3, 2017 at 8:06
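Picking up the commenter's suggestion to return something from the worker and use a plain map: below is a minimal, arcpy-free sketch of that pattern, assuming Python 3. The worker name clip_count, the string stand-ins for the feature class paths, and the returned tuple are all illustrative, not the real geoprocessing.

```python
import multiprocessing

def clip_count(args):
    # Stand-in for the real ClipCount worker: unpack one job and
    # return a value so the parent process can collect real results.
    (start, end), inputfile, inputfile2 = args
    # ... clipping and summing would happen here ...
    return (start, end, end - start + 1)  # e.g. number of OIDs covered

if __name__ == '__main__':
    ranges = [[1, 6000], [6001, 12000], [12001, 18000], [18001, 24000]]
    # One (range, inputfile, inputfile2) triple per worker call.
    jobs = [(r, 'BlockAccess', 'Dallas_1clip') for r in ranges]
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(clip_count, jobs)  # blocks until all jobs finish
    print(results)
```

Unlike map_async, pool.map hands back the workers' return values directly, so results is an ordinary list you can inspect instead of a MapResult object.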

2 Answers


My answer is not about resolving your multiprocessing issue but about the general approach to counting overlapping parts. I advised a method using spatial join in this answer to How to remove duplicate features with the same geometry in ArcMap?, which uses the 'ARE_IDENTICAL' operator with field mapping. In summary, if you self-union your input, it will keep all overlapping parts; a spatial join will then find the identical ones and tally them.

answered Nov 3, 2017 at 1:36

Your multiprocessing call is all wrong, which is the crux (or at least the first part) of your issue. I have no idea how you're getting four copies of the input file, since it seems you should only be making three, one for each element of your malformed stuffpassed list.

map_async makes a call for every element of the argument list, so your current code only ever runs ClipCount three times (once for each element of your current stuffpassed list). It is also failing because your where clause asks for OIDs between two "ranges" that are lists, not numbers. So, how to fix this:

stuffpassed is not at all correctly formatted for a map_async call. If you want to keep the way you unpack the variables inside ClipCount, you need stuffpassed to be a list of lists, with each element of the outer list holding ALL of the variables you wish to pass to one call of ClipCount.

This means stuffpassed needs to start empty, and then you need to loop through every element of ranges to populate it.

    stuffpassed = list()
    for rng in ranges:
        stuffpassed.append([rng, inputfile, inputfile2])

Now your multiprocessing call will, at least, be getting the variables it needs and that you think you're sending.
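To make that concrete, here is a small arcpy-free sketch contrasting what map sees with the original flat list versus the corrected list of lists. The recorded calls stand in for worker invocations, and the string paths are placeholders:

```python
ranges = [[1, 6000], [6001, 12000], [12001, 18000], [18001, 24000]]
inputfile = 'BlockAccess'    # stand-ins for the real feature class paths
inputfile2 = 'Dallas_1clip'

calls = []

def fake_clip_count(stuffpassed):
    # Record what each worker invocation would receive.
    calls.append(stuffpassed)

# Original (malformed): map iterates over the outer list, so the worker
# runs three times -- once with ranges, once with each file path.
for item in [ranges, inputfile, inputfile2]:
    fake_clip_count(item)
print(len(calls))   # 3

# Corrected: one [range, inputfile, inputfile2] triple per worker call.
calls = []
stuffpassed = [[r, inputfile, inputfile2] for r in ranges]
for item in stuffpassed:
    fake_clip_count(item)
print(len(calls))   # 4
print(calls[0])     # [[1, 6000], 'BlockAccess', 'Dallas_1clip']
```

In the malformed case the first "worker" gets the whole ranges list and the other two get bare path strings, which is why the OID where clause blows up; in the corrected case every call gets one range plus both paths.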

I'm also fairly certain CopyFeatures doesn't return anything, and that you cannot set infile that way, but I could be wrong about that.

answered Nov 9, 2017 at 20:08
