
I have been tinkering with the multiprocessing module to handle some data processing in ArcMap 10.3.

I have created a script which works fine if I run it in IDLE. I see all my cores max out in Task Manager and the code completes without error.

If I wire this script up as a Script Tool in ArcToolbox, it throws a weird error:

Could not find file: from multiprocessing.forking import main; main().mxd

Reading various threads, I regularly see that the script has to be "run out of process". So I unticked the "Run Python script in process" check box in the script properties dialog and expected the code to run, but it does not: I get a 000714 error, and I have not changed the code.

So my question is simply: can one create a script that uses the multiprocessing module and run it from ArcToolbox? From my limited playing around, it seems I cannot, and it's a technique that can only be run from IDLE?


I have finally got it working, with everyone's help. I decided to fully document my simple example, and bring together the trip-ups I had to overcome, in a document on the ESRI GeoNet website. Like Luke's answer, I hope this provides a starting point for anyone trying to create a Python script tool that utilises multiprocessing. The document is titled Create a script tool that uses multiprocessing.

PolyGeo
asked Mar 26, 2015 at 23:35
  • ArcGIS is still a single-threaded application, and Esri objects are not thread safe. That doesn't mean you can't multiprocess, but all objects must stay on their own thread. You can serialize objects and pass them as XML, strings, etc. to each thread, but sub-threads can't, for example, access 'map'. Commented Mar 27, 2015 at 2:18
  • @Hornbydd, do you have access to ArcGIS Pro? Useful info on multiprocessing there: blogs.esri.com/esri/arcgis/2015/02/06/…. For ArcGIS - blogs.esri.com/esri/arcgis/2011/08/29/multiprocessing and blogs.esri.com/esri/arcgis/2012/09/26/… Commented Mar 27, 2015 at 7:20
  • @MichaelMiles-Stimson, I'm not doing anything fancy; in fact my code is very similar to the second example in the second link that Alex Tereshenkov gave. I've yet to see (maybe I have missed it?) a statement that says you cannot create a script using multiprocessing and run it as a script tool from ArcToolbox. I just see statements that say it must be run out of process, which frankly doesn't mean much to me. So can multiprocessing scripts using arcpy be run from ArcToolbox or not? I think the answer is going to be no? :( Commented Mar 27, 2015 at 10:23
  • You can use multiprocessing. Just make sure each process has its own separate scratch and work space, and use multiprocessing.set_executable to set python.exe (or pythonw.exe) as the executable instead of ArcMap. Commented Mar 27, 2015 at 10:24
  • Forgot to mention... Make sure all code is either in a function/class or inside an if __name__ == "__main__": block, as top-level code will be executed each time the module imports itself. Commented Mar 27, 2015 at 10:48

1 Answer

Yes, you can run multiprocessing child processes from a toolbox script. Below is some code to demonstrate in a Python Toolbox (*.pyt).

There are a number of "gotchas". Some (but not all) will be applicable to Python script tools in a binary toolbox (*.tbx), but I only use Python Toolboxes these days so have not tested.

Some "gotchas"/tips:

  • Make sure each child process has its own workspace for temporary files (particularly if using Spatial Analyst or coverage tools), and only use the main process to allocate the work, gather up the results and write out the final results. That way any writing to a final dataset is done by one process, avoiding locking or conflicts;
  • Only pass pickleable objects, such as strings, lists, numbers;
  • Any functions you want to run as a child process MUST be in an importable module (with a different filename to the .pyt), not the .pyt itself. Otherwise you'll get PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed;
  • Make sure the code that executes the multiprocessing is in an importable module (with a different filename to the .pyt), not the .pyt itself. Otherwise you'll get AssertionError: main_name not in sys.modules, main_name
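The "only pass pickleable objects" tip can be checked before any work is submitted. The following is a minimal sketch, not part of the original answer: `is_picklable` and `check_task_args` are hypothetical helper names, and a lambda stands in for any object that cannot be pickled.

```python
import pickle


def is_picklable(obj):
    """Return True if obj can be serialized with pickle.dumps.

    multiprocessing pickles task arguments to send them to child
    processes, so anything that fails here will fail there too.
    """
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


def check_task_args(args):
    """Return the positions of the arguments that cannot be pickled."""
    return [i for i, arg in enumerate(args) if not is_picklable(arg)]


# Hypothetical inputs: a file path, a number and a list are fine,
# a lambda is not.
good_args = ['C:/data/dem.tif', 5, [1, 2, 3]]
bad_args = ['C:/data/dem.tif', lambda x: x]
```

Running `check_task_args` on the argument list before calling `pool.apply_async` gives a clearer failure than a pickling error raised from inside the pool. Passing file paths as strings rather than arcpy objects keeps the list clean.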

Applies to *.py scripts in a binary toolbox (*.tbx):

  • Make sure the code that parses the script parameters is protected by an if __name__ == '__main__': block.
  • You can keep the functions you want to call in the .py script, but the script needs to import itself.
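As a rough illustration of the first point, here is a minimal sketch of a script whose parameter parsing only happens when it is run directly, so importing it from a child process has no side effects. It is an assumption-laden stand-in, not the original code: `split_multivalue` and `run` are hypothetical names, and `sys.argv` replaces `arcpy.GetParameterAsText(0)` so the sketch runs without arcpy.

```python
import sys


def split_multivalue(text):
    """Split a ';'-delimited parameter string into a list, skipping empty parts."""
    return [part for part in text.split(';') if part]


def run(rasters):
    """Placeholder for the code that builds the pool and dispatches the work."""
    return list(rasters)


if __name__ == '__main__':
    # Only reached when the file is executed as a script.
    # A child process that imports this module skips this block.
    raw = sys.argv[1] if len(sys.argv) > 1 else ''
    run(split_multivalue(raw))
```

Everything above the guard is safe to import, which is what the child processes need in order to find the worker functions.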

Example code

Python Toolbox

# test_multiprocessing.pyt
import os, sys
import multiprocessing
import arcpy

# Any functions you want to run as a child process MUST be in
# an importable module. A *.pyt is not importable by python
# Otherwise you'll get
# PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
# Also make sure the code that _does_ the multiprocessing is in an importable module
# Otherwise you'll get
# AssertionError: main_name not in sys.modules, main_name
from test_multiprocessing_functions import execute

class Toolbox(object):
    def __init__(self):
        '''Define toolbox properties (the toolbox name is the .pyt filename).'''
        self.label = 'Test Multiprocessing'
        self.alias = 'multiprocessing'

        # List of tool classes associated with this toolbox
        self.tools = [TestTool]

class TestTool(object):
    def __init__(self):
        self.label = 'Test Multiprocessing'
        self.description = 'Test Multiprocessing Tool'
        self.canRunInBackground = True
        self.showCommandWindow = False

    def isLicensed(self):
        return True

    def updateParameters(self, parameters):
        return

    def updateMessages(self, parameters):
        return

    def getParameterInfo(self):
        '''parameter definitions for GUI'''
        return [arcpy.Parameter(displayName='Input Rasters',
                                name='in_rasters',
                                datatype='DERasterDataset',
                                parameterType='Required',
                                direction='Input',
                                multiValue=True)]

    def execute(self, parameters, messages):
        # Make sure the code that _does_ the multiprocessing is in an importable module, not a .pyt
        # Otherwise you'll get
        # AssertionError: main_name not in sys.modules, main_name
        rasters = parameters[0].valueAsText.split(';')
        for raster in rasters:
            messages.addMessage(raster)

        execute(*rasters)

Importable Python Module (can also be used as a script tool in a binary toolbox (.tbx))

#test_multiprocessing_functions.py
# - Always run in foreground - unchecked
# - Run Python script in process - checked
import os, sys, tempfile
import multiprocessing
import arcpy
from arcpy.sa import *

def execute(*rasters):
    for raster in rasters:
        arcpy.AddMessage(raster)

    #Set multiprocessing exe in case we're running as an embedded process, i.e ArcGIS
    #get_install_path() uses a registry query to figure out 64bit python exe if available
    multiprocessing.set_executable(os.path.join(get_install_path(), 'pythonw.exe'))

    #Create a pool of workers, keep one cpu free for surfing the net.
    #Let each worker process only handle 10 tasks before being restarted (in case of nasty memory leaks)
    pool = multiprocessing.Pool(processes=multiprocessing.cpu_count() - 1, maxtasksperchild=10)

    # Simplest multiprocessing is to map an iterable (i.e. a list of things to process) to a function
    # But this doesn't allow you to handle exceptions in a single process
    ##output_rasters = pool.map(worker_function, rasters)

    # Use apply_async instead so we can handle exceptions gracefully
    jobs={}
    for raster in rasters:
        jobs[raster]=pool.apply_async(worker_function, [raster]) # args are passed as a list

    for raster,result in jobs.iteritems():
        try:
            result = result.get()
            arcpy.AddMessage(result)
        except Exception as e:
            arcpy.AddWarning('{}\n{}'.format(raster, repr(e)))

    pool.close()
    pool.join()

def get_install_path():
    ''' Return 64bit python install path from registry (if installed and registered),
        otherwise fall back to current 32bit process install path.
    '''
    if sys.maxsize > 2**32: return sys.exec_prefix #We're running in a 64bit process

    #We're 32 bit so see if there's a 64bit install
    path = r'SOFTWARE\Python\PythonCore\2.7'

    from _winreg import OpenKey, QueryValue
    from _winreg import HKEY_LOCAL_MACHINE, KEY_READ, KEY_WOW64_64KEY

    try:
        with OpenKey(HKEY_LOCAL_MACHINE, path, 0, KEY_READ | KEY_WOW64_64KEY) as key:
            return QueryValue(key, "InstallPath").strip(os.sep) #We have a 64bit install, so return that.
    except: return sys.exec_prefix #No 64bit, so return 32bit path

def worker_function(in_raster):
    ''' Make sure you pass a filepath to raster, NOT an arcpy.sa.Raster object'''

    ## Example "real" work (untested)
    ## Make a unique scratch workspace
    #scratch = tempfile.mkdtemp()
    #out_raster = os.path.join(scratch, os.path.basename(in_raster))
    #arcpy.env.workspace = scratch
    #arcpy.env.scratchWorkspace=scratch
    #ras = Raster(in_raster)
    #result = Con(IsNull(ras), FocalStatistics(ras), ras)
    #result.save(out_raster)
    #del ras, result
    #return out_raster # leave calling script to clean up tempdir.
    # could also pass out_raster in as an arg,
    # but you'd have to ensure no other child processes
    # are writing to that dir when the current
    # child process is...

    # Do some "fake" work
    import time, random
    time.sleep(random.randint(0,20)/10.0) #sleep for a bit to simulate work
    return in_raster[::-1] #Return a reversed version of what was passed in

if __name__=='__main__':
    # import current script to avoid:
    # PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
    import test_multiprocessing_functions

    rasters = arcpy.GetParameterAsText(0).split(';')
    for raster in rasters:
        arcpy.AddMessage(raster)

    test_multiprocessing_functions.execute(*rasters)
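The commented-out "real" work above leaves cleaning up each child's temporary directory to the calling script. Below is a minimal sketch of one way that could look, without arcpy. The helper names and the `mp_scratch_` prefix are assumptions introduced here (the original calls `tempfile.mkdtemp()` with no prefix); the prefix is only there so the cleanup never deletes a directory it did not create.

```python
import os
import shutil
import tempfile

# Hypothetical prefix used to recognise directories created by this script.
SCRATCH_PREFIX = 'mp_scratch_'


def make_scratch_dir():
    """Create a unique scratch directory for one child process."""
    return tempfile.mkdtemp(prefix=SCRATCH_PREFIX)


def cleanup_scratch_dirs(output_paths):
    """Delete the scratch directory that holds each output path.

    Only directories whose name starts with SCRATCH_PREFIX are removed.
    Returns the list of directories that were deleted.
    """
    removed = []
    for path in output_paths:
        scratch = os.path.dirname(os.path.abspath(path))
        if os.path.basename(scratch).startswith(SCRATCH_PREFIX) and os.path.isdir(scratch):
            shutil.rmtree(scratch, ignore_errors=True)
            removed.append(scratch)
    return removed
```

In the main process, the paths returned by `result.get()` would first be copied or merged into the final dataset, and only then passed to `cleanup_scratch_dirs`.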
answered Mar 30, 2015 at 2:58
  • Luke, thanks for the code template. I've implemented your structure as a script tool, not as a Python toolbox (.pyt), and it does not work. If I untick "run in process" it instantly bombs with a 000714 error, way too quickly for it to have even begun to run the code, so I'm not sure if that is a bug in 10.3. I don't think it's the code that is the problem. So I'm going to create a pyt version and get that running (which I expect it will). I'm fast coming to the conclusion that you can't run the multiprocessing module in a script tool, but you can in, say, a pyt or just IDLE. Commented Mar 30, 2015 at 11:33
  • @Hornbydd, I've edited the example code and added a bit to the 2nd code block which works fine for me when run as a script tool in a tbx toolbox. Note: I'm running 10.2.2 so perhaps there is an issue in 10.3? Commented Mar 30, 2015 at 22:05
  • @Hornbydd I can replicate your 000714 error running as a script tool in a .tbx if I have a syntax error in the code. Make sure your script runs outside of ArcMap. Commented Mar 30, 2015 at 23:47
  • @Hornbydd re. 000714 see also gis.stackexchange.com/q/108478/2856 Commented Mar 31, 2015 at 0:42
  • @MrKingsley I haven't used this code in years; I use concurrent.futures or occasionally dask for multiprocessing, and work more with geopandas/rasterio than arcpy. That said, a colleague uses multiprocessing in ArcGIS Pro 2.9 successfully. Suggest you ask a new question with a minimal runnable example. Commented Jun 19, 2024 at 22:27
