
Background

When NumPy is linked against multithreaded implementations of BLAS (like MKL or OpenBLAS), the computationally intensive parts of a program run on multiple cores (sometimes all cores) automatically.

This is bad when:

  • you are sharing resources
  • you know of a better way to parallelize your program.

In these cases it is reasonable to restrict the number of threads used by MKL/OpenBLAS to 1, and parallelize your program manually.
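
For the second point, a minimal sketch of what manual, coarse-grained parallelism might look like: one worker process per matrix, where each worker would ideally run with BLAS restricted to a single thread (for example via the context manager below). The pool size and matrix shapes are made up for illustration.

    from multiprocessing import Pool

    import numpy

    def one_svd(matrix):
        # each worker handles one whole matrix; BLAS inside the worker
        # should be limited to one thread so the workers do not oversubscribe
        return numpy.linalg.svd(matrix, compute_uv=False)

    if __name__ == '__main__':
        matrices = [numpy.random.rand(300, 300) for _ in range(8)]
        with Pool(processes=4) as pool:
            singular_values = pool.map(one_svd, matrices)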

My solution below involves loading the libraries at runtime and calling the corresponding C functions from Python.

Questions

  1. Are there any best/better practices for solving this problem?
  2. What are the pitfalls of my approach?
  3. Please comment on code quality in general.

Example of use

    import numpy

    # this uses however many threads MKL/OpenBLAS uses
    result = numpy.linalg.svd(matrix)

    # this uses one thread
    with single_threaded(numpy):
        result = numpy.linalg.svd(matrix)

Implementation

  1. Imports and definitions

    import subprocess
    import re
    import sys
    import os
    import glob
    import warnings
    import ctypes
    MKL = 'mkl'
    OPENBLAS = 'openblas'
    
  2. Class BLAS, abstracting a BLAS library with methods to get and set the number of threads:

    class BLAS:
        def __init__(self, cdll, kind):
            if kind not in (MKL, OPENBLAS):
                raise ValueError(f'kind must be {MKL} or {OPENBLAS}, got {kind} instead.')
            self.kind = kind
            self.cdll = cdll
            if kind == MKL:
                self.get_n_threads = cdll.MKL_Get_Max_Threads
                self.set_n_threads = cdll.MKL_Set_Num_Threads
            else:
                self.get_n_threads = cdll.openblas_get_num_threads
                self.set_n_threads = cdll.openblas_set_num_threads
    
  3. Function get_blas, returning a BLAS object given an imported NumPy module, or None if neither MKL nor OpenBLAS is found.

    def get_blas(numpy_module):
        LDD = 'ldd'
        LDD_PATTERN = r'^\t(?P<lib>.*{}.*) => (?P<path>.*) \(0x.*$'
        NUMPY_PATH = os.path.join(numpy_module.__path__[0], 'core')
        MULTIARRAY_PATH = glob.glob(os.path.join(NUMPY_PATH, 'multiarray.*so'))[0]
        ldd_result = subprocess.run(
            args=[LDD, MULTIARRAY_PATH],
            check=True,
            stdout=subprocess.PIPE,
            universal_newlines=True
        )
        output = ldd_result.stdout
        if MKL in output:
            kind = MKL
        elif OPENBLAS in output:
            kind = OPENBLAS
        else:
            return None
        pattern = LDD_PATTERN.format(kind)
        match = re.search(pattern, output, flags=re.MULTILINE)
        if match:
            lib = ctypes.CDLL(match.groupdict()['path'])
            return BLAS(lib, kind)
        else:
            return None
    
  4. Context manager single_threaded, which takes an imported NumPy module, sets the number of threads to 1 on enter, and restores the previous value on exit (a small verification sketch follows this list).

    class single_threaded:
        def __init__(self, numpy_module):
            self.blas = get_blas(numpy_module)

        def __enter__(self):
            if self.blas is not None:
                self.old_n_threads = self.blas.get_n_threads()
                self.blas.set_n_threads(1)
            else:
                warnings.warn(
                    'No MKL/OpenBLAS found, assuming NumPy is single-threaded.'
                )

        def __exit__(self, *args):
            if self.blas is not None:
                self.blas.set_n_threads(self.old_n_threads)
                if self.blas.get_n_threads() != self.old_n_threads:
                    message = (
                        f'Failed to reset {self.blas.kind} '
                        f'to {self.old_n_threads} threads (previous value).'
                    )
                    raise RuntimeError(message)
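
As a quick sanity check, a small sketch (assuming the snippets above are defined in the same module and a supported BLAS is found) that prints the thread count before, inside, and after the context manager:

    import numpy

    blas = get_blas(numpy)
    matrix = numpy.random.rand(200, 200)

    if blas is not None:
        print(blas.kind, 'threads before:', blas.get_n_threads())

    with single_threaded(numpy):
        if blas is not None:
            print('threads inside:', blas.get_n_threads())  # expected: 1
        numpy.linalg.svd(matrix)

    if blas is not None:
        print('threads after:', blas.get_n_threads())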
    
asked Nov 1, 2018 at 16:50
Comments

  • Could you add the definitions for MKL and OPENBLAS, please? (Commented Nov 1, 2018 at 16:57)
  • To comment on my recent edit (changing "multiarray*.so" to "multiarray.*so" in get_blas): this was done to avoid a collision with multiarray_tests.architecture_info.so, which is sometimes present in numpy/core. (Commented Nov 1, 2018 at 17:35)

1 Answer


What are the pitfalls of my approach?

The functions that set the maximum number of threads change global state inside the respective libraries, which makes the context manager not thread-safe. If you call NumPy functions from several threads, the threads that do not use your context manager may also end up running the single-threaded implementation.

AFAIK there is no good way to solve this. The OpenBLAS FAQ even states that you should disable OpenBLAS's own threading if your application uses threads itself.
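
To make the race concrete, here is a hedged sketch (assuming the single_threaded class from the question is defined in the same module): while one thread is inside the context manager, a second thread that never asked for single-threaded BLAS is also affected, because the thread count lives in the shared library, not in the Python thread.

    import threading

    import numpy

    def default_threads_svd(matrix):
        # never enters single_threaded, yet may still run on one BLAS thread
        # if the other thread happens to be inside the context manager
        numpy.linalg.svd(matrix)

    def single_threaded_svd(matrix):
        with single_threaded(numpy):
            numpy.linalg.svd(matrix)

    matrix = numpy.random.rand(500, 500)
    threads = [
        threading.Thread(target=default_threads_svd, args=(matrix,)),
        threading.Thread(target=single_threaded_svd, args=(matrix,)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()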

answered Apr 17, 2020 at 6:37
