Friday, February 13, 2009
Generic programming and out-of-the core computation
STXXL is a C++ library for extra large data sets. The library allows to write generic programs, with an idiom typical of STL, and to leverage the power of out-of-the-core computations for massive data-sets. I played with the library and I loved it.
- You have the STXXL containers, which are similar to STL containers. Anyway, an STXXL container can be mapped to (multiple) external file(s) in a transparent way. Files may have different implementations that might be bases on various file systems or even remote storage interface. Examples of containers are:
- vectors: example: stxxl::vector
v(64 * int64(1024 * 1024)); - map: example: stxxl::map
- queue: example: stxxl:queue
- stacks and priority queues
- You can access the containers with STXXL Iterators and the library is taking care of all I/O operation, supporting large files up to dozens of terabytes even if they are mapped on multiple physical disks.
- unsigned int big_size = 1024 * 1024 * 2 * 16 * 16;
typedef stxxl::vectorvec_big;
vec_big my_vec(big_size);
vec_big::iterator big_it = my_vec.begin(); - You can use a collection of STXXL Algorithms, similar to STL ones but for external memory:
- stxxl::generate(v.begin(), v.end(), stxxl::random_number32(), 4);
- stxxl::for_each_m(v.begin(), v.end(), square
(), 4); - ...
- A particular mention goes to sorting algorithms for external memory:
- typedef stxxl::vector
vector_type;
const stxxl::int64 n_records =
stxxl::int64(384) * stxxl::int64(1024 * 1024) / sizeof(my_type);
vector_type v(n_records);
stxxl::random_number32 rnd;
for (vector_type::size_type i = 0; i < v.size(); i++)
v[i]._key = 1 + (rnd() % 0xfffffff); stxxl::sort(v.begin(), v.end(), cmp(), memory_to_use); - This paper describes the library Stxxl : Standard Template Library for XXL Data Sets ; R. Dementiev, L. Kettner, P. Sanders "STXXL: standard template library for XXL data sets" Software: Practice and Experience Volume 38, Issue 6, Pages 589-637, May 2008 DOI: 10.1002/spe.844. See also the technical report.
- Here you have a collection of example codes; performance are pretty good, and some stxxl users have reported that stxxl sort is much faster than UNIX sort. In addition, i found this paper which reports very good performance (Breadth first search on massive graphs ). Do not forget to consult the STXXL forum if you need any help.
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment
[フレーム]