stdgpu: Efficient STL-like Data Structures on the GPU

Features | Examples | Getting Started | Contributing | License | Contact

Features

stdgpu is an open-source library providing generic GPU data structures for fast and reliable data management.

  • Lightweight C++17 library with minimal dependencies
  • CUDA, OpenMP, and HIP (experimental) backends
  • Familiar STL-like GPU containers
  • High-level, agnostic container functions like insert(begin, end), to write shared C++ code
  • Low-level, native container functions like find(key), to write custom CUDA kernels, etc.
  • Interoperability with thrust GPU algorithms

Instead of providing yet another ecosystem, stdgpu is designed to be a lightweight container library. Previous libraries such as thrust, VexCL, ArrayFire or Boost.Compute focus on the fast and efficient implementation of various algorithms and only operate on contiguously stored data. stdgpu follows an orthogonal approach and focuses on fast and reliable data management to enable the rapid development of more general and flexible GPU algorithms just like their CPU counterparts.

At its heart, stdgpu offers the following GPU data structures and containers:

  • atomic & atomic_ref: Atomic primitive types and references
  • bitset: Space-efficient bit array
  • deque: Dynamically sized double-ended queue
  • queue & stack: Container adapters
  • unordered_map & unordered_set: Hashed collection of unique keys and key-value pairs
  • vector: Dynamically sized contiguous array

In addition, stdgpu also provides further commonly used helper functionality in algorithm, bit, contract, cstddef, execution, functional, iterator, limits, memory, mutex, numeric, ranges, type_traits, utility.

Examples

To reliably perform complex tasks on the GPU, stdgpu offers flexible interfaces that can be used both in agnostic code, e.g. via the algorithms provided by thrust, and in native code, e.g. in custom CUDA kernels.

For instance, stdgpu is extensively used in SLAMCast, a scalable live telepresence system, to implement real-time, large-scale 3D scene reconstruction as well as real-time 3D data streaming between a server and an arbitrary number of remote clients.

Agnostic code. In the context of SLAMCast, a simple task is the integration of a range of updated blocks into the duplicate-free set of queued blocks for data streaming, which can be expressed in a single range insert:

#include <stdgpu/cstddef.h>             // stdgpu::index_t
#include <stdgpu/iterator.h>            // stdgpu::make_device
#include <stdgpu/unordered_set.cuh>     // stdgpu::unordered_set

class stream_set
{
public:
    void
    add_blocks(const short3* blocks,
               const stdgpu::index_t n)
    {
        set.insert(stdgpu::make_device(blocks),
                   stdgpu::make_device(blocks + n));
    }

    // Further functions

private:
    stdgpu::unordered_set<short3> set;
    // Further members
};
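
For readers coming from the CPU side, the range insert above can be compared with its std::unordered_set counterpart. The following host-only sketch is not part of stdgpu and makes several assumptions: block3, block3_hash and host_stream_set are hypothetical names, and block3 merely stands in for CUDA's short3 so that the code compiles without a GPU toolchain. It only illustrates the duplicate-free semantics of inserting a range.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_set>

// Hypothetical host stand-in for CUDA's short3 (assumption, not stdgpu API).
struct block3
{
    short x, y, z;
};

inline bool
operator==(const block3& a, const block3& b)
{
    return a.x == b.x && a.y == b.y && a.z == b.z;
}

// Hypothetical hash: pack the three 16-bit coordinates into one 64-bit key.
struct block3_hash
{
    std::size_t
    operator()(const block3& b) const
    {
        const std::uint64_t key = (static_cast<std::uint64_t>(static_cast<std::uint16_t>(b.x)) << 32)
                                | (static_cast<std::uint64_t>(static_cast<std::uint16_t>(b.y)) << 16)
                                |  static_cast<std::uint64_t>(static_cast<std::uint16_t>(b.z));
        return std::hash<std::uint64_t>{}(key);
    }
};

// CPU analogue of stream_set, built on std::unordered_set.
class host_stream_set
{
public:
    // Range insert: duplicates within the range and duplicates of already
    // stored blocks are dropped by the set itself.
    void
    add_blocks(const block3* blocks,
               const std::size_t n)
    {
        assert(blocks != nullptr || n == 0);
        set.insert(blocks, blocks + n);
    }

    std::size_t
    size() const
    {
        return set.size();
    }

private:
    std::unordered_set<block3, block3_hash> set;
};
```

Adding the blocks (0,0,0), (1,0,0), (0,0,0) to an empty host_stream_set leaves two entries, since the repeated block is stored only once.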

Native code. More complex operations such as the creation of the duplicate-free set of updated blocks or other algorithms can be implemented natively, e.g. in custom CUDA kernels with stdgpu's CUDA backend enabled:

#include <stdgpu/cstddef.h>             // stdgpu::index_t
#include <stdgpu/unordered_map.cuh>     // stdgpu::unordered_map
#include <stdgpu/unordered_set.cuh>     // stdgpu::unordered_set

__global__ void
compute_update_set(const short3* blocks,
                   const stdgpu::index_t n,
                   const stdgpu::unordered_map<short3, voxel*> tsdf_block_map,
                   stdgpu::unordered_set<short3> mc_update_set)
{
    // Global thread index
    stdgpu::index_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    short3 b_i = blocks[i];

    // Neighboring candidate blocks for the update
    short3 mc_blocks[8]
    = {
        short3(b_i.x - 0, b_i.y - 0, b_i.z - 0),
        short3(b_i.x - 1, b_i.y - 0, b_i.z - 0),
        short3(b_i.x - 0, b_i.y - 1, b_i.z - 0),
        short3(b_i.x - 0, b_i.y - 0, b_i.z - 1),
        short3(b_i.x - 1, b_i.y - 1, b_i.z - 0),
        short3(b_i.x - 1, b_i.y - 0, b_i.z - 1),
        short3(b_i.x - 0, b_i.y - 1, b_i.z - 1),
        short3(b_i.x - 1, b_i.y - 1, b_i.z - 1),
    };

    for (stdgpu::index_t j = 0; j < 8; ++j)
    {
        // Only consider existing neighbors
        if (tsdf_block_map.contains(mc_blocks[j]))
        {
            mc_update_set.insert(mc_blocks[j]);
        }
    }
}
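
To make the logic of the kernel easier to follow and to test without a GPU, it can be mirrored sequentially on the host. The sketch below is an illustration under stated assumptions rather than stdgpu code: block3, voxel, block3_hash, block_map, block_set and compute_update_set_host are hypothetical names, the standard containers replace the stdgpu ones, and the eight candidates are enumerated with a bit mask, which yields the same set of offsets as the explicit table in a different order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <unordered_set>

// Hypothetical host stand-ins for short3 and voxel (assumptions).
struct block3
{
    short x, y, z;
};

struct voxel
{
    float tsdf;
};

inline bool
operator==(const block3& a, const block3& b)
{
    return a.x == b.x && a.y == b.y && a.z == b.z;
}

// Hypothetical hash: pack the three 16-bit coordinates into one 64-bit key.
struct block3_hash
{
    std::size_t
    operator()(const block3& b) const
    {
        const std::uint64_t key = (static_cast<std::uint64_t>(static_cast<std::uint16_t>(b.x)) << 32)
                                | (static_cast<std::uint64_t>(static_cast<std::uint16_t>(b.y)) << 16)
                                |  static_cast<std::uint64_t>(static_cast<std::uint16_t>(b.z));
        return std::hash<std::uint64_t>{}(key);
    }
};

using block_map = std::unordered_map<block3, voxel*, block3_hash>;
using block_set = std::unordered_set<block3, block3_hash>;

// Sequential counterpart of the kernel: the loop over i replaces the
// one-thread-per-block mapping together with its "i >= n" bounds check.
inline void
compute_update_set_host(const block3* blocks,
                        const std::size_t n,
                        const block_map& tsdf_block_map,
                        block_set& mc_update_set)
{
    assert(blocks != nullptr || n == 0);

    for (std::size_t i = 0; i < n; ++i)
    {
        const block3 b_i = blocks[i];

        // The block itself plus its 7 neighbors in negative x/y/z direction:
        // bit 0 of mask selects the x offset, bit 1 the y offset, bit 2 the z offset.
        for (int mask = 0; mask < 8; ++mask)
        {
            const block3 candidate{static_cast<short>(b_i.x - (mask & 1)),
                                   static_cast<short>(b_i.y - ((mask >> 1) & 1)),
                                   static_cast<short>(b_i.z - ((mask >> 2) & 1))};

            // Only consider existing neighbors
            if (tsdf_block_map.count(candidate) != 0)
            {
                mc_update_set.insert(candidate);
            }
        }
    }
}
```

Unlike the kernel, which receives the containers by value, the host version takes the output set by reference so that the caller sees the inserted blocks. A reference implementation of this kind can also serve as a baseline when checking the results of the GPU version.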

More examples can be found in the examples directory.

Getting Started

stdgpu requires a C++17 compiler as well as minimal backend dependencies and can be easily built and integrated into your project via CMake.

More guidelines as well as a comprehensive introduction to the design and API of stdgpu can be found in the documentation.

Contributing

For detailed information on how to contribute, see the Contributing section in the documentation.

License

Distributed under the Apache 2.0 License. See LICENSE for more information.

If you use stdgpu in one of your projects, please cite the following publications:

stdgpu: Efficient STL-like Data Structures on the GPU

@UNPUBLISHED{stotko2019stdgpu,
 author = {Stotko, P.},
 title = {{stdgpu: Efficient STL-like Data Structures on the GPU}},
 year = {2019},
 month = aug,
 note = {arXiv:1908.05936},
 url = {https://arxiv.org/abs/1908.05936}
}

SLAMCast: Large-Scale, Real-Time 3D Reconstruction and Streaming for Immersive Multi-Client Live Telepresence

@article{stotko2019slamcast,
 author = {Stotko, P. and Krumpen, S. and Hullin, M. B. and Weinmann, M. and Klein, R.},
 title = {{SLAMCast: Large-Scale, Real-Time 3D Reconstruction and Streaming for Immersive Multi-Client Live Telepresence}},
 journal = {IEEE Transactions on Visualization and Computer Graphics},
 volume = {25},
 number = {5},
 pages = {2102--2112},
 year = {2019},
 month = may
}

Contact

Patrick Stotko - stotko@cs.uni-bonn.de
