I am interested in learning how to use the full multicore processing power available on a desktop computer. Arc states that background geoprocessing lets the user take advantage of multiple cores; however, tasks essentially have to wait in line for the previous task to complete.
Has anyone developed parallel or multithreaded geoprocessing methods in Arc/Python? Are there hardware bottlenecks that prevent multicore processing on individual tasks?
I found an example on Stack Overflow that caught my interest, although it is not a geoprocessing example:
    from multiprocessing import Pool
    import numpy

    numToFactor = 976

    def isFactor(x):
        # Return (x, numToFactor // x) if x divides numToFactor, else None.
        result = None
        div = numToFactor // x  # integer division under both Python 2 and 3
        if div * x == numToFactor:
            result = (x, div)
        return result

    if __name__ == '__main__':
        pool = Pool(processes=4)
        possibleFactors = range(1, int(numpy.floor(numpy.sqrt(numToFactor))) + 1)
        print('Checking {}'.format(list(possibleFactors)))
        # Each candidate divisor is tested in one of the four worker processes.
        result = pool.map(isFactor, possibleFactors)
        cleaned = [x for x in result if x is not None]
        print('Factors are {}'.format(cleaned))
2 Answers
Here is an example of a multicore arcpy script. The process is very CPU-intensive, so it scales very well: "Porting Avenue code for Producing Building Shadows to ArcPy/Python for ArcGIS Desktop?"
Some more general info is in this answer: "Can concurrent processes be run in a single model?"
In my experience the biggest problem is managing stability. If you do six weeks of processing in a single night you will also have six weeks of inexplicable errors and bugs.
An alternative approach is to develop standalone scripts that can run independently and fail without causing problems:
- Split the data into chunks that a single core can process in <20 minutes (tasks).
- Build a standalone arcpy script that can process a single task and is as simple as possible (worker).
- Develop a mechanism to run tasks. Many existing Python solutions are available. Alternatively, you can make your own with a simple queue.
- Write some code to verify that tasks have been completed. This could be as simple as checking that an output file has been written.
- Merge data back together.
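The steps above could be sketched roughly as follows. This is only a minimal illustration, not the answerer's actual code: the names `run_task` and `run_all` are hypothetical, and the worker command is assumed to be a standalone script that takes an input path and an output path as its two arguments. Each task runs in its own Python process, so a crash in one worker does not affect the others, and a task only counts as completed if its output file exists.

```python
import os
import subprocess
from multiprocessing.pool import ThreadPool


def run_task(worker_cmd, task_input, task_output):
    """Run one worker in its own process and verify its output."""
    # Each worker is a separate interpreter, so a crash affects only this task.
    returncode = subprocess.call(list(worker_cmd) + [task_input, task_output])
    # Verification: a clean exit code AND the expected output file.
    ok = returncode == 0 and os.path.exists(task_output)
    return task_input, ok


def run_all(worker_cmd, tasks, max_workers=4):
    """Run (input, output) task pairs, at most max_workers at a time.

    Returns the inputs of the tasks that failed, so they can be retried.
    """
    # The threads only wait on child processes; the heavy work is in the workers.
    pool = ThreadPool(processes=max_workers)
    try:
        results = pool.map(lambda t: run_task(worker_cmd, t[0], t[1]), tasks)
    finally:
        pool.close()
        pool.join()
    return [task_input for task_input, ok in results if not ok]
```

With a hypothetical worker script `worker.py`, a call might look like `run_all([sys.executable, 'worker.py'], [('chunk_01.shp', 'out_01.shp'), ...])` (after `import sys`). Whatever comes back in the failed list can be rerun before the merge step.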
I've found that this approach, which can include using the multiprocessing module, is a good one. Some extensions, such as Spatial Analyst, don't work very well if you have multiple copies of the same function running simultaneously. Something like what you describe, which allows for a user-controlled form of queuing (i.e., it avoids scheduling those tasks at the same time, or avoids using the same geodatabase at once for file-locking reasons), is going to be best. – nicksan, Jul 3, 2012 at 19:33
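One simple way to get the kind of user-controlled queuing described in this comment is to group tasks by the geodatabase they touch, then hand each group to a single worker that processes it sequentially. The sketch below is an assumption about how that could look, not something from the thread; `group_by_workspace` and the example paths are hypothetical.

```python
from collections import OrderedDict


def group_by_workspace(tasks):
    """Group (workspace, task) pairs so each workspace gets one serial queue.

    Tasks in the same group share a geodatabase and should run one after
    another; different groups can safely run in parallel.
    """
    groups = OrderedDict()
    for workspace, task in tasks:
        groups.setdefault(workspace, []).append(task)
    return list(groups.values())


tasks = [
    ('a.gdb', 'slope_tile_1'),
    ('b.gdb', 'slope_tile_2'),
    ('a.gdb', 'slope_tile_3'),
]
queues = group_by_workspace(tasks)
```

Each inner list in `queues` can then be passed to one worker process, so no two processes open the same geodatabase at the same time.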