bcolz provides columnar, chunked data containers that can be
compressed either in-memory or on-disk. Column storage allows for
efficient table queries, as well as for cheap column addition and
removal. It is based on NumPy, which it uses
as the standard data container to communicate with bcolz objects, and
it also comes with import/export facilities to/from
HDF5/PyTables tables and pandas
dataframes.
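To make the idea of chunked, compressed column storage more concrete, here is a minimal sketch that uses only the Python standard library. It is not how bcolz is implemented and it does not use the bcolz API: `ToyColumnTable`, `compress_column`, `decompress_column` and `CHUNK_LEN` are hypothetical names invented for this illustration, and `zlib` stands in for the compressor. The point it tries to show is why adding or removing a column is cheap when each column is stored independently.

```python
import zlib
from array import array

CHUNK_LEN = 4096  # items per chunk (arbitrary choice for this sketch)


def compress_column(values, typecode="d"):
    """Split values into fixed-size chunks and compress each one separately."""
    chunks = []
    for start in range(0, len(values), CHUNK_LEN):
        raw = array(typecode, values[start:start + CHUNK_LEN]).tobytes()
        chunks.append(zlib.compress(raw))
    return chunks


def decompress_column(chunks, typecode="d"):
    """Rebuild the full column by decompressing the chunks in order."""
    out = array(typecode)
    for chunk in chunks:
        out.frombytes(zlib.decompress(chunk))
    return out.tolist()


class ToyColumnTable:
    """Hypothetical stand-in for a column store; not the bcolz ctable API."""

    def __init__(self):
        self.columns = {}  # column name -> list of compressed chunks

    def add_column(self, name, values):
        # Only the new column is written; existing columns are untouched.
        self.columns[name] = compress_column(values)

    def remove_column(self, name):
        # Dropping a column is a dictionary deletion, not a table rewrite.
        del self.columns[name]

    def read_column(self, name):
        return decompress_column(self.columns[name])


table = ToyColumnTable()
table.add_column("x", [float(i) for i in range(10000)])
table.add_column("y", [float(i * 2) for i in range(10000)])
table.remove_column("x")
```

In a row-oriented layout the same two operations would require rewriting every record, which is the cost that column storage avoids.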
bcolz objects are compressed by default, not only to reduce
memory/disk storage, but also to improve I/O speed. The compression
process is carried out internally by Blosc, a
high-performance, multithreaded meta-compressor that is optimized for
binary data (although it works just fine with text data too).
bcolz can also use numexpr
internally (it does so by default if it detects that numexpr is
installed) to accelerate many vector and query operations (although it
can fall back to pure NumPy for these too). numexpr optimizes memory
usage and uses multithreading for the computations, which makes
evaluation fast. Combined with the disk-based, compressed
carray/ctable containers, this allows out-of-core computations to be
performed efficiently and, most importantly, transparently.
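The out-of-core idea can be sketched as evaluating a query one chunk at a time, so that only a single decompressed chunk per column is held in memory at any moment. The example below is an illustration under stated assumptions, not bcolz or numexpr code: `make_chunks`, `iter_chunks` and `count_where` are hypothetical helpers, `zlib` stands in for the compressor, the chunks are kept in a Python list rather than on disk, and a plain Python predicate takes the place of a compiled expression.

```python
import zlib
from array import array

CHUNK_LEN = 1000  # items per chunk (arbitrary choice for this sketch)


def make_chunks(values):
    """Compress a column as a list of independent fixed-size chunks."""
    return [zlib.compress(array("d", values[i:i + CHUNK_LEN]).tobytes())
            for i in range(0, len(values), CHUNK_LEN)]


def iter_chunks(chunks):
    """Yield one decompressed chunk at a time instead of the whole column."""
    for blob in chunks:
        chunk = array("d")
        chunk.frombytes(zlib.decompress(blob))
        yield chunk


def count_where(x_chunks, y_chunks, predicate):
    """Count rows satisfying predicate(x, y), processing chunk by chunk."""
    matches = 0
    for xs, ys in zip(iter_chunks(x_chunks), iter_chunks(y_chunks)):
        matches += sum(1 for x, y in zip(xs, ys) if predicate(x, y))
    return matches


x = make_chunks([float(i) for i in range(5000)])
y = make_chunks([float(i % 10) for i in range(5000)])
hits = count_where(x, y, lambda a, b: a < 100 and b > 7)
```

Because each chunk is decompressed, consumed and discarded in turn, peak memory in this sketch depends on the chunk size rather than on the length of the columns.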
Just to whet your appetite, here is an example with real data, where
bcolz is already fulfilling the promise of accelerating memory I/O by
using compression: