ViennaCL is a free, open-source linear algebra library for computations on many-core architectures (GPUs, MIC) and multi-core CPUs. The library is written in C++ and supports CUDA, OpenCL, and OpenMP compute backends, including switching between them at runtime.
The highlights of the latest 1.7.x release family are:
A Python wrapper named PyViennaCL is also available.