Generic container wrapper type with a default underlying buffer (2nd rev)

Question 1

This is a revision of the previous question. I was told to ask an additional question with more context for the modified code to be reviewed.

The changes:

changed from struct to class (for semantic purposes) and made the member variables private;
added a public member function get_container to allow access to the only data member that the user code needs;
added a private type alias value_type.

My general aim is to keep things as simple as appropriate. This means I prefer not to use some features (e.g. inheritance, CRTP, etc.) and this may lead to slightly less convenient API which is fine (in my opinion).

Here is the modified code:

template <class Container, std::size_t DefaultBufferCapacity>
requires ( std::same_as<typename Container::allocator_type, std::pmr::polymorphic_allocator<typename Container::value_type>> )
class [[ nodiscard ]] buffer_resource_container
{
public:
 buffer_resource_container( ) noexcept ( noexcept( Container { &buffer_resource } ) ) = default;
 auto& get_container( this auto& self ) noexcept { return self.container; }
private:
 using value_type = Container::value_type;
 alignas ( value_type ) std::array<std::byte, DefaultBufferCapacity * sizeof( value_type )> buffer;
 std::pmr::monotonic_buffer_resource buffer_resource { std::data( buffer ), std::size( buffer ) };
 Container container { &buffer_resource };
};
template <class ValueType, std::size_t DefaultBufferCapacity>
using buffer_resource_vector = buffer_resource_container<std::pmr::vector<ValueType>, DefaultBufferCapacity>;
template <class ValueType, class Allocator>
[[ nodiscard ]] std::errc inline
reserve_capacity( std::vector<ValueType, Allocator>& vec, const std::size_t new_capacity ) noexcept
{
 try
 {
 vec.reserve( new_capacity );
 }
 catch ( const std::bad_alloc& )
 {
 return std::errc::not_enough_memory;
 }
 catch ( const std::length_error& )
 {
 return std::errc::value_too_large;
 }
 catch ( const std::exception& )
 {
 return std::errc::resource_unavailable_try_again;
 }
 return std::errc { };
}

A good demonstrative case in which I actually use the above code is shown below:

#include <concepts>
#include <memory_resource>
#include <array>
#include <vector>
#include <system_error>
#include <new>
#include <stdexcept>
#include <exception>
#include <chrono>
#include <expected>
#include <span>
#include <string_view>
#include <iterator>
#include <algorithm>
#include <iostream>
#include <cstddef>
template <class Container, std::size_t DefaultBufferCapacity>
requires ( std::same_as<typename Container::allocator_type, std::pmr::polymorphic_allocator<typename Container::value_type>> )
class [[ nodiscard ]] buffer_resource_container
{
public:
 buffer_resource_container( ) noexcept ( noexcept( Container { &buffer_resource } ) ) = default;
 auto& get_container( this auto& self ) noexcept { return self.container; }
private:
 using value_type = Container::value_type;
 alignas ( value_type ) std::array<std::byte, DefaultBufferCapacity * sizeof( value_type )> buffer;
 std::pmr::monotonic_buffer_resource buffer_resource { std::data( buffer ), std::size( buffer ) };
 Container container { &buffer_resource };
};
template <class ValueType, std::size_t DefaultBufferCapacity>
using buffer_resource_vector = buffer_resource_container<std::pmr::vector<ValueType>, DefaultBufferCapacity>;
template <class ValueType, class Allocator>
[[ nodiscard ]] std::errc inline
reserve_capacity( std::vector<ValueType, Allocator>& vec, const std::size_t new_capacity ) noexcept
{
 try
 {
 vec.reserve( new_capacity );
 }
 catch ( const std::bad_alloc& )
 {
 return std::errc::not_enough_memory;
 }
 catch ( const std::length_error& )
 {
 return std::errc::value_too_large;
 }
 catch ( const std::exception& )
 {
 return std::errc::resource_unavailable_try_again;
 }
 return std::errc { };
}
namespace
{
[[ nodiscard ]] bool
try_initialize_timezone_database( ) noexcept
{
 bool is_initialized { };
 try
 {
 [[ maybe_unused ]] const auto& tzdb_list { std::chrono::get_tzdb_list( ) };
 is_initialized = true;
 }
 catch ( const std::runtime_error& )
 { }
 return is_initialized;
}
constexpr auto timezone_names_default_count { 1000uz };
[[ nodiscard ]] std::expected<const std::span<const std::string_view>, std::errc>
retrieve_all_timezone_names( const bool is_tzdb_initialized ) noexcept
{
 if ( is_tzdb_initialized == false ) return std::unexpected { std::errc::state_not_recoverable };
 const auto& tzdb { std::chrono::get_tzdb( ) };
 const auto timezone_names_count { tzdb.zones.size( ) + tzdb.links.size( ) };
 static buffer_resource_vector<std::string_view, timezone_names_default_count> brv { };
 auto& timezone_names_buffer { brv.get_container( ) };
 const auto err_code { reserve_capacity( timezone_names_buffer, timezone_names_count ) };
 if ( err_code != std::errc { } ) return std::unexpected { err_code };
 const auto populate_timezone_names_buffer
 {
 [ &tzdb, inserter = std::back_inserter( timezone_names_buffer ) ]( ) noexcept
 {
 std::ranges::transform( tzdb.zones, inserter, &std::chrono::time_zone::name );
 std::ranges::transform( tzdb.links, inserter, &std::chrono::time_zone_link::name );
 }
 };
 populate_timezone_names_buffer( );
 return timezone_names_buffer;
}
}
const auto is_tzdb_initialized { try_initialize_timezone_database( ) };
const auto all_timezone_names { retrieve_all_timezone_names( is_tzdb_initialized ) };
int main( )
{
 if ( is_tzdb_initialized == false )
 {
 std::cout << "The timezone database could not be initialized\n";
 return 1;
 }
 if ( all_timezone_names.has_value( ) == false )
 {
 std::cout << "An error occurred during buffer allocation\n";
 return 1;
 }
 const auto timezone_names { *all_timezone_names };
}

Looking at the above demo program, I think that this container wrapper is a good solution for storing a limited number of std::string_views without doing a dynamic allocation and still having a fallback to dynamic allocation if the number of timezone names exceeds the default count (i.e. 1000) in the future.

Any suggestions for improvements? Or any potential downsides?

Question 2

A vector-like type that uses a static buffer up to a given size then falls back to some other allocation strategy once that size is exceeded—that is, a vector with a "fairly large small size optimization"—is not a terrible idea. But this is absolutely not the way to go about it. This class serves no purpose beyond a standard vector with a polymorphic memory resource, and is in fact worse in just about every way (non-copy/movable, won’t work with any algorithms or range ops, memory can’t be reused, etc.).

Question 3

But I’m honestly at a loss as to why any of this is necessary. If all you want is the list of time zone names, then just get_tzdb().zones | views::transform([](auto& tz) { return tz.name(); }) whenever you want it. There is no sense in copying the list to another container... let alone a barely functional crippled container. Wrap that in a noexcept function that returns a failed std::expected when get_tzdb() fails, and that more or less satisfies all your design requirements.

Question 4

@indi I see. Good points in my opinion. However the reason for storing them in another container is nothing more than making their use easier. I can use an index and get any name I want from the container.

Question 5

You can use an index with the view, too.

Question 6

About API convenience

[...] I prefer not to use some features (e.g. inheritance, CRTP, etc.) and this may lead to slightly less convenient API which is fine (in my opinion).

Making your buffer_resource_container appear like a regular STL container is not just convenience, it also makes it possible to use your class as a drop-in replacement for existing containers types. Consider this code:

std::vector<int> foo;
foo.push_back(42);
std::sort(foo.begin(), foo.end());
...

Ideally, I'd only have to change the first line to:

buffer_resource_vector<int, 10> foo;

But with your code I also have to change any line that actually uses foo:

foo.get_container().push_back(42);
std::sort(foo.get_container().begin(), foo.get_container().end());
...

Sure, you can store the result of get_container() in a reference and use that, which could avoid some of the changes, but with a drop-in replacement it wouldn't be necessary at all.

However, if it is acceptable for you to just have a get_container() member function, then that is fine.

Is it suitable for your use case?

I think your class certainly has some value. However, in the context of loading the timezone database, I don't think it is very useful. This is something you would only be doing once in your program, and avoiding one memory allocation is not a worthwhile performance optimization. The hardcoded default size of 1000 entries is very likely either too small, so it will require an allocation anyway, or too big, which wastes memory.

`retrieve_all_timezone_names()` is unsafe

This function is unsafe for several reasons. First, you can just call it with true as an argument, even if you didn't initialize the timezone database. Furthermore, it might be accidentally called twice. The first time it will work fine, but the second time it will add all timezone database entries again to the static brv. However, this might cause the std::pmr::vector in it to resize its allocation, causing it to move its elements in memory (whether that is still inside the monotonic buffer resource or not doesn't matter!). This invalidates the std::span returned by the first call to retrieve_all_timezone_names().

I recommend you just have one function to retrieve the timezone names, which is structured like so:

std::expected<const std::span<const std::string_view>, std::errc>
retrieve_all_timezone_names() noexcept
{
 static buffer_resource_vector<std::string_view, timezone_names_default_count> brv;
 static std::expected<const std::span<const std::string_view>, std::errc>
 timezone_names = [&]{
 try {
 std::chrono::get_tzdb_list();
 } except(const std::runtime_error&) {
 return std::unexpected{std::errc::state_not_recoverable};
 }
 const auto& tzdb{std::chrono::get_tzdb()};
 const auto timezone_names_count{tzdb.zones.size() + tzdb.links.size()};
 auto& timezone_names_buffer{brv.get_container()};
 const auto err_code{reserve_capacity(timezone_names_buffer, timezone_names_count)};
 if (err_code != std::errc{})
 return std::unexpected {err_code};
 auto inserter = std::back_inserter(timezone_names_buffer);
 std::ranges::transform(tzdb.zones, inserter, &std::chrono::time_zone::name);
 std::ranges::transform(tzdb.links, inserter, &std::chrono::time_zone_link::name);
 
 return timezone_names_buffer;
 }();
 return timezone_names;
}

This uses a static variable inside a function. Its initialization is guaranteed to be done atomically at the first function call (so it's thread-safe). To do this complex initialization an immediately invoked lambda expression is used. This prevents this function from being called in an unsafe way.

Question 7

"The hardcoded default size of 1000 entries is very likely either too small..." It's currently sufficient since the number of time zones is 597 on the platform I'm targeting. There is the possibility that this will gradually increase over the next few years. Thus I made the buffer have empty space for around 403 more entries.

Question 8

Also how about keeping the call to try_initialize_timezone_database in the global scope but using the global is_tzdb_initialized directly inside retrieve_all_timezone_names (instead of passing it to the function which as you said, can result in programming errors)? I mean is it guaranteed that is_tzdb_initialized will be initialized before the call to retrieve_all_timezone_names?

Question 9

So you currently waste space for 403 entries. The overhead for a single heap allocation is much less than that. I really don't see any benefit for using a monotonic buffer resource here. Global variables are initialized in the order in which they appear in the source file, however static variables in functions are initialized the first time the function they are in is called.

G. Sliepen G. Sliepen 68.7k3 gold badges74 silver badges179 bronze badges · Accepted Answer · 2024-04-30 21:20:15Z

About API convenience

[...] I prefer not to use some features (e.g. inheritance, CRTP, etc.) and this may lead to slightly less convenient API which is fine (in my opinion).

Making your buffer_resource_container appear like a regular STL container is not just convenience, it also makes it possible to use your class as a drop-in replacement for existing containers types. Consider this code:

std::vector<int> foo;
foo.push_back(42);
std::sort(foo.begin(), foo.end());
...

Ideally, I'd only have to change the first line to:

buffer_resource_vector<int, 10> foo;

But with your code I also have to change any line that actually uses foo:

foo.get_container().push_back(42);
std::sort(foo.get_container().begin(), foo.get_container().end());
...

Sure, you can store the result of get_container() in a reference and use that, which could avoid some of the changes, but with a drop-in replacement it wouldn't be necessary at all.

However, if it is acceptable for you to just have a get_container() member function, then that is fine.

Is it suitable for your use case?

I think your class certainly has some value. However, in the context of loading the timezone database, I don't think it is very useful. This is something you would only be doing once in your program, and avoiding one memory allocation is not a worthwhile performance optimization. The hardcoded default size of 1000 entries is very likely either too small, so it will require an allocation anyway, or too big, which wastes memory.

`retrieve_all_timezone_names()` is unsafe

This function is unsafe for several reasons. First, you can just call it with true as an argument, even if you didn't initialize the timezone database. Furthermore, it might be accidentally called twice. The first time it will work fine, but the second time it will add all timezone database entries again to the static brv. However, this might cause the std::pmr::vector in it to resize its allocation, causing it to move its elements in memory (whether that is still inside the monotonic buffer resource or not doesn't matter!). This invalidates the std::span returned by the first call to retrieve_all_timezone_names().

I recommend you just have one function to retrieve the timezone names, which is structured like so:

std::expected<const std::span<const std::string_view>, std::errc>
retrieve_all_timezone_names() noexcept
{
 static buffer_resource_vector<std::string_view, timezone_names_default_count> brv;
 static std::expected<const std::span<const std::string_view>, std::errc>
 timezone_names = [&]{
 try {
 std::chrono::get_tzdb_list();
 } except(const std::runtime_error&) {
 return std::unexpected{std::errc::state_not_recoverable};
 }
 const auto& tzdb{std::chrono::get_tzdb()};
 const auto timezone_names_count{tzdb.zones.size() + tzdb.links.size()};
 auto& timezone_names_buffer{brv.get_container()};
 const auto err_code{reserve_capacity(timezone_names_buffer, timezone_names_count)};
 if (err_code != std::errc{})
 return std::unexpected {err_code};
 auto inserter = std::back_inserter(timezone_names_buffer);
 std::ranges::transform(tzdb.zones, inserter, &std::chrono::time_zone::name);
 std::ranges::transform(tzdb.links, inserter, &std::chrono::time_zone_link::name);
 
 return timezone_names_buffer;
 }();
 return timezone_names;
}

This uses a static variable inside a function. Its initialization is guaranteed to be done atomically at the first function call (so it's thread-safe). To do this complex initialization an immediately invoked lambda expression is used. This prevents this function from being called in an unsafe way.

"The hardcoded default size of 1000 entries is very likely either too small..." It's currently sufficient since the number of time zones is 597 on the platform I'm targeting. There is the possibility that this will gradually increase over the next few years. Thus I made the buffer have empty space for around 403 more entries.
Also how about keeping the call to try_initialize_timezone_database in the global scope but using the global is_tzdb_initialized directly inside retrieve_all_timezone_names (instead of passing it to the function which as you said, can result in programming errors)? I mean is it guaranteed that is_tzdb_initialized will be initialized before the call to retrieve_all_timezone_names?
So you currently waste space for 403 entries. The overhead for a single heap allocation is much less than that. I really don't see any benefit for using a monotonic buffer resource here. Global variables are initialized in the order in which they appear in the source file, however static variables in functions are initialized the first time the function they are in is called.

Stack Exchange Network

Generic container wrapper type with a default underlying buffer (2nd rev)

1 Answer 1

About API convenience

Is it suitable for your use case?

`retrieve_all_timezone_names()` is unsafe

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Generic container wrapper type with a default underlying buffer (2nd rev)

1 Answer 1

About API convenience

Is it suitable for your use case?

retrieve_all_timezone_names() is unsafe

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions

`retrieve_all_timezone_names()` is unsafe