I have a list of files that need to be preprocessed using just one command before being mosaicked together. This preprocessing command uses third-party software via system call to write to a geoTIFF. I wanted to use multi-threading so that the individual files can be pre-processed at the same time and then, once all individual files are processed, the results can be mosaicked together.
I have never used multi-threading/parallel processing before, and after hours of searching on the internet, I still have no clue what the best, simplest way to go about this is.
Basically, something like this:
files_list = [...]  # list of .tif files that need to be mosaicked together but first need to be individually pre-processed
for tif_file in files_list:
    # kick the pre-processing step out to the system, but don't wait for it to finish before moving on to pre-process the next tif_file
# wait for all tiffs in files_list to finish pre-processing
# then mosaic them together
How could I achieve this?
- What is the output of the pre-processing? (Open AI - Opting Out, Sep 22, 2016 at 15:45)
- Any reason this task should be parallelized? Doing these files one after another would definitely be much faster (except in a few special cases) due to the overhead Python has for multithreading. (Tomasz Plaskota, Sep 22, 2016 at 15:49)
- @PeterWood The output of the pre-processing step is a set of geoTIFFs that I need to mosaic together. (user20408, Sep 22, 2016 at 15:51)
- Are the geoTIFFs files or in memory? (Open AI - Opting Out, Sep 22, 2016 at 15:52)
- @TomaszPlaskota Well, the purpose was to make the code faster, hah. Can you explain in more detail? How do you know that is the case? Thx (user20408, Sep 22, 2016 at 15:52)
2 Answers
See the multiprocessing documentation.
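from multiprocessing import Pool

def main():
    # Up to 8 files are pre-processed at the same time.
    with Pool(processes=8) as pool:
        # map() blocks until every file in files_list has been pre-processed.
        pool.map(pre_processing_command, files_list)
    # All pre-processing has finished, so the mosaic step can run.
    mosaic()

if __name__ == '__main__':
    main()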
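The snippet above leaves pre_processing_command undefined. A minimal sketch of what it might look like, assuming the third-party software is started as a command-line program: the real tool is not named in the question, so a hypothetical stand-in (the Python interpreter copying the file) is used here, and output_path_for, preprocess_all, and the "_pre" naming scheme are illustrative assumptions as well.

```python
import subprocess
import sys
import tempfile
from multiprocessing import Pool
from pathlib import Path

# Hypothetical stand-in for the third-party tool: copies input to output
# with the Python interpreter, so the sketch runs without extra software.
COPY_SNIPPET = "import shutil, sys; shutil.copyfile(sys.argv[1], sys.argv[2])"


def output_path_for(tif_file):
    """Name the pre-processed file next to its input (assumed naming scheme)."""
    path = Path(tif_file)
    return str(path.with_name(path.stem + "_pre" + path.suffix))


def pre_processing_command(tif_file):
    """Run the external tool on one file and return the output path."""
    out_file = output_path_for(tif_file)
    # check=True raises CalledProcessError if the tool exits with a non-zero status.
    subprocess.run([sys.executable, "-c", COPY_SNIPPET, tif_file, out_file], check=True)
    return out_file


def preprocess_all(files_list, workers=4):
    """Pre-process every file in parallel; returns only after all are done."""
    with Pool(processes=workers) as pool:
        return pool.map(pre_processing_command, files_list)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        inputs = []
        for i in range(3):
            tile = Path(tmp) / f"tile_{i}.tif"
            tile.write_bytes(b"fake tiff data")  # placeholder content, not a real geoTIFF
            inputs.append(str(tile))
        outputs = preprocess_all(inputs)
        print(outputs)  # the mosaic step would consume these paths
```

Because pool.map returns the workers' return values in input order, the list of output paths can be handed straight to the mosaic step.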
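If you need to use multiple processor cores, you should use multiprocessing. In the simplest case you can use something like:
from multiprocessing import Process

def process_function(tif_file):
    ...  # your processing code here

if __name__ == '__main__':
    processes = []
    for tif_file in files_list:
        p = Process(target=process_function, args=(tif_file,))  # args must be a tuple
        p.start()
        processes.append(p)
    for p in processes:
        p.join()  # wait for every file, not just the last one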
Take care: starting one process per file can exhaust the machine's resources when the list is long. You can look here and here for solutions to the problem.
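One simple way to keep the number of simultaneous processes bounded is to start them in batches. The following is only a sketch under that assumption, not necessarily what the linked solutions propose; chunked, run_in_batches, and max_procs are hypothetical names, and process_function is a placeholder for the real pre-processing call.

```python
import os
from multiprocessing import Process


def chunked(items, size):
    """Split items into consecutive batches of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def run_in_batches(target, items, max_procs=None):
    """Run target(item) in separate processes, at most max_procs at a time."""
    max_procs = max_procs or os.cpu_count() or 1
    for batch in chunked(items, max_procs):
        procs = [Process(target=target, args=(item,)) for item in batch]
        for p in procs:
            p.start()
        for p in procs:
            p.join()  # finish the whole batch before starting the next
        failed = [p for p in procs if p.exitcode != 0]
        if failed:
            raise RuntimeError(f"{len(failed)} worker(s) exited with an error")


def process_function(tif_file):
    # Placeholder: the real pre-processing call would go here.
    print("processing", tif_file)


if __name__ == "__main__":
    files_list = [f"tile_{i}.tif" for i in range(5)]  # hypothetical file names
    run_in_batches(process_function, files_list, max_procs=2)
    print("all files pre-processed; ready to mosaic")
```

Each batch waits for its slowest file, so a multiprocessing Pool, which hands a new file to a worker as soon as it is free, will usually keep the cores busier.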
You can also use threading.Thread, but threads run Python code on only one processor core at a time because they are restricted by the Global Interpreter Lock (GIL).
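Since the pre-processing in the question is an external program started by a system call, a thread that is only waiting for that program does not hold the GIL, so a thread pool may be enough here. A minimal sketch, assuming the tool can be launched with subprocess.run; the command below is a hypothetical stand-in (the Python interpreter echoing the file name), and run_tool and preprocess_with_threads are illustrative names.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_tool(tif_file):
    """Run one external command and return what it printed."""
    # Hypothetical stand-in command; replace with the real tool's arguments.
    cmd = [sys.executable, "-c", "import sys; print('done', sys.argv[1])", tif_file]
    completed = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return completed.stdout.strip()


def preprocess_with_threads(files_list, max_workers=4):
    """Run the tool on every file, with at most max_workers running at once."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map() yields results in input order; list() waits for all of them.
        return list(executor.map(run_tool, files_list))


if __name__ == "__main__":
    results = preprocess_with_threads(["a.tif", "b.tif", "c.tif"])
    print(results)  # mosaicking can start once this line is reached
```

The max_workers argument also caps how many copies of the external program run at the same time.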