
I am interested in learning how to use the full extent of the multicore processing power available on a desktop computer. ArcGIS states that background geoprocessing allows the user to use multiple cores; however, tasks essentially have to wait in line for the previous task to complete.

Has anyone developed parallel or multithreaded geoprocessing methods in Arc/Python? Are there hardware bottlenecks that prevent multicore processing on individual tasks?

I found an example on Stack Overflow that caught my interest, although it is not a geoprocessing example:

from multiprocessing import Pool

import numpy

numToFactor = 976

def isFactor(x):
    """Return (x, numToFactor // x) if x divides numToFactor, else None."""
    result = None
    div = numToFactor // x  # integer division (the original was Python 2)
    if div * x == numToFactor:
        result = (x, div)
    return result

if __name__ == '__main__':
    pool = Pool(processes=4)
    possibleFactors = range(1, int(numpy.floor(numpy.sqrt(numToFactor))) + 1)
    print('Checking', list(possibleFactors))
    result = pool.map(isFactor, possibleFactors)
    cleaned = [x for x in result if x is not None]
    print('Factors are', cleaned)
asked Jun 26, 2012 at 17:03
  • In my Arc experience, it almost always boils down to either 1) splitting your data into {number of cores} chunks, processing, and reassembling, or 2) reading everything into memory and letting x API handle the threading. Note that this is not meant to discourage. Commented Jun 26, 2012 at 18:35
  • Thanks valveLondon. Perhaps newer Ivy Bridge technology and the Kepler GPU will allow for more sophisticated processing approaches. Commented Jun 27, 2012 at 3:04
  • Here's a link to a useful blog about Python multiprocessing from a product engineer on Esri's Analysis and Geoprocessing team: blogs.esri.com/esri/arcgis/2011/08/29/multiprocessing Commented Jun 27, 2012 at 4:23

2 Answers


Here is an example of a multicore arcpy script. The process is very CPU-intensive so it scales very well: Porting Avenue code for Producing Building Shadows to ArcPy/Python for ArcGIS Desktop?

Some more general info in this answer: Can concurrent processes be run in a single model?

answered Jun 26, 2012 at 17:10

In my experience, the biggest problem is managing stability. If you do six weeks of processing in a single night, you will also have six weeks' worth of inexplicable errors and bugs.

An alternative approach is to develop standalone scripts that can run independently and fail without causing problems:

  • Split the data into chunks that a single core can process in <20 minutes (tasks).
  • Build a standalone arcpy script that can process a single task and is as simple as possible (worker).
  • Develop a mechanism to run tasks. Many pre-existing Python solutions exist; alternatively, you can make your own with a simple queue.
  • Write some code to verify that tasks have been completed. This could be as simple as checking that an output file has been written.
  • Merge data back together.
answered Jun 26, 2012 at 22:14
  • I've found that this approach, which can include using the multiprocessing module, is a good one. Some extensions, such as Spatial Analyst, don't work very well if you have multiple copies of the same function running simultaneously, so something like what you describe, which allows a user-controlled form of queuing (i.e., avoids scheduling those tasks at the same time, or avoids using the same geodatabase at once for file-locking reasons), is going to be best. Commented Jul 3, 2012 at 19:33
