As an exercise to dive into the C++ standard library functions, particularly in <algorithm>
, I've decided to write a simple Dictionary class that attempts to utilize as many standard library functions as possible. I would really appreciate feedback on:
- General standard library usage: Am I using the correct functions, or are there some that could make this smoother?
- Throwing exceptions: I haven't really written programs that throw exceptions, so tips on my usage of them here would be great
- Any other recommendations are welcome
dictionary.hpp
#ifndef _DICTIONARY_HPP
#define _DICTIONARY_HPP
#include <vector>
#include <iostream>
#include <algorithm>
namespace dev {
template <typename K, typename V>
class Dict {
private:
std::vector<std::pair<K, V>> dict;
public:
Dict() {}
// Construct a dictionary from the passed key value pairs
Dict(std::vector<std::pair<K, V>> data) { dict = data; }
// Construct a dictionary from a vector of keys and a vector of items. Must be the same length.
Dict(const std::vector<K>& keys, const std::vector<V>& values) {
if (!(keys.size() == values.size())) throw std::length_error("Keys and Values must be the same sized vector.");
std::transform(keys.begin(), keys.end(), values.begin(), std::back_inserter(dict),
[](const auto& key, const auto& value) {
return std::pair<K, V>(key, value);
});
}
// Removes all the elements from the dictionary
void Clear(void) { dict.clear(); }
// Returns a copy of the dictionary
Dict Copy(void) {
std::vector<std::pair<K, V>> copy;
copy.reserve(dict.size());
std::copy(dict.begin(), dict.end(), std::back_inserter(copy));
return copy;
}
// Returns a dictionary with the specified keys and value
static Dict<K, V> FromKeys(std::vector<K> keys, V value) {
Dict<K, V> result;
std::for_each(keys.begin(), keys.end(),
[&](const K& key) {
result.Add(key, value);
});
return result;
}
// Returns a value of the specified key
V Get(K key) {
auto it = std::find_if(dict.begin(), dict.end(),
[&](const auto& pair) {
return pair.first == key;
});
if (it != dict.end()) {
return it->second;
}
throw std::out_of_range("Key not found.");
}
// Returns a vector containing a pair for each key and value
std::vector<std::pair<K, V>> Items(void) { return dict; }
// Returns a vector containing the dict's keys
std::vector<K> Keys(void) {
std::vector<K> keys;
keys.reserve(dict.size());
std::transform(dict.begin(), dict.end(), std::back_inserter(keys),
[](const auto& pair) {
return pair.first;
});
return keys;
}
// Removes the element with the specified key
void Pop(K key) {
auto it = std::find_if(dict.begin(), dict.end(),
[&](const auto& pair) {
return pair.first == key;
});
if (it != dict.end()) {
dict.erase(it);
dict.shrink_to_fit();
}
}
// Returns the value of the specified key. If the key does not exist: insert the key, with the specified value
V SetDefault(K key, V defaultValue) {
auto it = std::find_if(dict.begin(), dict.end(),
[&](const auto& pair) {
return pair.first == key;
});
if (it != dict.end()) {
return it->second;
}
dict.emplace_back(key, defaultValue);
return defaultValue;
}
// Updates the dictionary with the specified key-value pairs
void Update(std::vector<std::pair<K, V>> items) {
dict = std::move(items);
dict.shrink_to_fit();
}
// Returns a vector of all the values in the dictionary
std::vector<V> Values(void) {
std::vector<V> values;
values.reserve(dict.size());
std::transform(dict.begin(), dict.end(), std::back_inserter(values),
[](const auto& pair) {
return pair.second;
});
return values;
}
// Adds a key value pair to the dict, if key already exists then overwrite previous value
void Add(K key, V value) {
auto it = std::find_if(dict.begin(), dict.end(),
[&](const auto& pair) {
return pair.first == key;
});
if (it != dict.end()) {
it->second = value;
} else {
dict.emplace_back(key, value);
}
}
// Like FromKeys, but accepts iterators instead
template <typename InputIterator>
void Insert(InputIterator start, InputIterator end, const V& value) {
dict.reserve(dict.size() + std::distance(start, end));
std::for_each(start, end, [&](const K& key) {
dict.emplace_back(key, value);
});
}
// Prints each key value pair to the console
void Print(void) {
std::for_each(dict.begin(), dict.end(),
[](const auto& pair) {
std::cout << "(" << pair.first << ", " << pair.second << ")" << std::endl;
});
}
};
}
#endif // _DICTIONARY_HPP
main.cpp (example usage)
#include <iostream>
#include "dictionary.hpp"
int main(void) {
dev::Dict<char, int> dict;
dict.Add('a', 1);
dict.Add('b', 2);
dict.Add('c', 3);
dict.Add('d', 4);
dict.Add('e', 5);
dev::Dict<char, int> dict2 = dict.Copy();
dict.Print();
std::cout << std::endl;
dict2.Print();
auto dict3 = dev::Dict<std::string, int>::FromKeys({"abc", "def"}, 1);
std::cout << std::endl;
dict3.Print();
dict2.Update(dict.Items());
dev::Dict<char, int> dict4(dict.Keys(), dict.Values());
dict4.SetDefault('e', 6);
std::cout << dict.Get('a') << std::endl;
return 0;
}
Makefile (for those interested)
compile:
@-g++ -std=c++2b -Wall -Wextra -Wpedantic main.cpp -o main
run:
@-./main
clean:
@-rm main
all: compile run clean
# @- hides make command output (still allows program output)
-
2\$\begingroup\$ I can see the value of this exercise, but to clarify, you're aware of std::map, and simply want to implement your own dictionary / map using as many standard library functions as possible ... without just using std::map? \$\endgroup\$Eric Angle– Eric Angle2025年02月06日 17:24:45 +00:00Commented Feb 6 at 17:24
-
\$\begingroup\$ @EricAngle Yep exactly, just an exercise to familiarize myself with some STL functions. \$\endgroup\$Ben A– Ben A2025年02月06日 20:49:21 +00:00Commented Feb 6 at 20:49
-
\$\begingroup\$ You can also try to make your methods constexpr to enable creating and working with dictionary at compile-time. Also you should check if your keys have types that are comparable. \$\endgroup\$kiner_shah– kiner_shah2025年02月07日 08:01:01 +00:00Commented Feb 7 at 8:01
4 Answers 4
I'm assuming the intention is to provide functionality similar to std::map
(albeit with different performance characteristics). If so, it would help users if you used the same public member names as the standard containers; that makes it easier to use generic code that doesn't need to know the type of container it's applied to.
We're missing some includes:
#include <exception> // std::length_error, std::out_of_range
#include <iterator> // std::back_inserter
#include <utility> // std::pair
The test program is missing an include of <string>
.
There's no need for private:
at the start of a class, since classes default to private access.
Instead of assigning members in constructors, prefer to use the initialiser list:
Dict(std::vector<std::pair<K, V>> data)
dict{std::move(data)}
{ }
Note also the use of std::move()
so that we're not copying data
twice.
Instead of !(a == b)
we can use the !=
operator for clarity:
if (keys.size() != values.size()) {
throw std::length_error("Keys and Values must be the same sized vector.");
}
It's good practice to always use braces for such conditions, even those that are a single statement.
The std::ranges
versions of standard algorithms are simpler to use when we're passing whole containers:
std::ranges::transform(keys, values, std::back_inserter(dict),
std::make_pair<const K&,const V&>);
In C++, we don't need to write (void)
for an empty argument list as we do in C. Just write ()
.
I don't see any value in the Copy()
function that does exactly the same as the compiler-generated copy constructor.
Similarly, Update()
seems to duplicate the compiler-generated copy-assignment.
In Get()
(which a standard container would call at()
), the std::ranges
algorithm simplifies the code, since it can use a projection:
auto it = std::ranges::find(dict, key, &std::pair<K,V>::first);
In fact, this expression turns up so frequently it's probably worth writing a private accessor for it.
Since Get()
shouldn't be modifying the container, we should declare it const
. It's also a good idea to pass the key as a const-ref, since we don't know how expensive it is to copy:
V Get(const K& key) const {
The function name SetDefault
is misleading; it's kind of a "get or set".
Print()
is inflexible, as it can only write to standard output stream, not to the log stream or to a file. We normally write this with a signature like
friend std::ostream &operator(std::ostream&, const Dict&);
It flushes output unnecessarily often - replace that std::endl
with plain '\n'
.
The main()
function is fine as far as it goes. It would be better if it exercised all the functionality, and returned EXIT_FAILURE
if any of the results fail to match what's expected (i.e. make it a set of self-checking unit tests).
Makefile is strange - I don't expect it to suppress errors except perhaps from the clean
target. It's also unhelpful to hide the commands being run.
We can lean much more on the built-in Make rules:
.PHONY: run
run: main
./$<
.PHONY: clean
clean:
$(RM) main
.DELETE_ON_ERROR:
For reference, here's my modified version of your code and some minimal changes to the main()
(not full conversion to unit tests):
#ifndef _DICTIONARY_HPP
#define _DICTIONARY_HPP
#include <algorithm>
#include <exception>
#include <iterator>
#include <ostream>
#include <utility>
#include <vector>
#include <functional>
namespace dev {
template <typename K, typename V>
class Dict
{
using element_type = std::pair<K,V>;
std::vector<element_type> dict = {};
// From C++23 onwards, we could use "this auto& self" as
// argument to a single definition.
auto find(const K& key)
{
return std::ranges::find(dict, key, &element_type::first);
}
auto find(const K& key) const
{
return std::ranges::find(dict, key, &element_type::first);
}
public:
Dict() {}
// Construct a dictionary from the passed key value pairs
Dict(std::vector<element_type> data)
: dict {std::move(data)}
{ }
// Construct a dictionary from a vector of keys and a vector
// of items. Must be the same length.
Dict(std::vector<K> const& keys, const std::vector<V>& values)
{
if (keys.size() != values.size()) {
throw std::length_error("Keys and Values must be the same sized vector.");
}
std::ranges::transform(keys, values, std::back_inserter(dict),
std::make_pair<K const&,V const&>);
}
// Removes all the elements from the dictionary
void clear()
{
dict.clear();
}
// Returns a dictionary with the specified keys and value
static Dict<K, V> from_keys(std::vector<K> const& keys, V value) {
Dict<K, V> result;
result.insert(keys.begin(), keys.end(), value);
return result;
}
// Returns a value of the specified key
V const& at(K const& key) const
{
if (auto const it = find(key); it != dict.end()) {
return it->second;
}
throw std::out_of_range("Key not found.");
}
// Returns a vector containing a pair for each key and value
std::vector<element_type> const& items() const
{
return dict;
}
// Returns a vector containing the dict's keys
std::vector<K> keys() const
{
std::vector<K> keys;
keys.reserve(dict.size());
std::ranges::transform(dict, std::back_inserter(keys), &element_type::first);
return keys;
}
// Removes the element with the specified key
void erase(K const& key)
{
if (auto const it = find(key); it != dict.end()) {
dict.erase(it);
dict.shrink_to_fit();
}
}
// Returns the value of the specified key.
// If the key does not exist: insert the key, with the specified value
V const& get_or_assign(K const& key, V const& defaultValue)
{
if (auto const it = find(key); it != dict.end()) {
return it->second;
}
dict.emplace_back(key, defaultValue);
return defaultValue;
}
// Returns a vector of all the values in the dictionary
std::vector<V> values() const
{
std::vector<V> values;
values.reserve(dict.size());
std::ranges::transform(dict, std::back_inserter(values), &element_type::second);
return values;
}
// Adds a key value pair to the dict, if key already exists then overwrite previous value
void insert_or_update(K const& key, V const& value)
{
if (auto const it = find(key); it != dict.end()) {
it->second = value;
} else {
dict.emplace_back(key, value);
}
}
// Like FromKeys, but accepts iterators instead
template<typename InputIterator>
void insert(InputIterator start, InputIterator end, V const& value)
{
dict.reserve(dict.size() + std::distance(start, end));
std::for_each(start, end, [&](K const& key) {
dict.emplace_back(key, value);
});
}
friend std::ostream& operator<<(std::ostream& out, Dict const& dict)
{
for (auto const& [key, val]: dict.dict) {
out << key << ": " << val << "," << '\n';
}
return out;
}
};
}
#endif // _DICTIONARY_HPP
#include "dictionary.hpp"
#include <iostream>
int main()
{
dev::Dict<char, int> dict;
dict.insert_or_update('a', 1);
dict.insert_or_update('b', 2);
dict.insert_or_update('c', 3);
dict.insert_or_update('d', 4);
dict.insert_or_update('e', 5);
dev::Dict<char, int> dict2 = dict;
std::cout << dict << '\n'
<< dict2 << '\n';
auto dict3 = dev::Dict<std::string, int>::from_keys({"abc", "def"}, 1);
std::cout << dict3 << '\n';
dict2 = dict.items();
dev::Dict<char, int> dict4(dict.keys(), dict.values());
dict4.get_or_assign('e', 6);
std::cout << dict.at('a') << '\n';
}
-
\$\begingroup\$ is
std::make_pair<const K&,const V&>
an addressable function? Unfortunately I think you have to wrap it \$\endgroup\$Caleth– Caleth2025年02月06日 10:24:11 +00:00Commented Feb 6 at 10:24 -
\$\begingroup\$ I'm not sure - it works for me on GCC, but perhaps it's not sufficiently portable. \$\endgroup\$Toby Speight– Toby Speight2025年02月06日 10:58:23 +00:00Commented Feb 6 at 10:58
General standard library usage: Am I using the correct functions, or are there some that could make this smoother?
std::flat_map
is a container adaptor that provides the functionality of an associative container over two sequence containers (sorted keys container mapped to the values container). std::flat_map
was added to the standard library with C++23.
Note - Keep in mind that some of the points below apply to multiple functions despite being brought up just once.
#ifndef _DICTIONARY_HPP
#define _DICTIONARY_HPP
Do not use identifiers reserved for the implementation. The standard mandates that identifiers that contains a double underscore
__
or begins with an underscore followed by an uppercase letter is reserved for the implementation. Additionally, global namespace identifiers that begin with an underscore also reserved. See [lex.name]/3.Differentiate preprocessor identifiers to reduce the chance of collisions. Don't just name the guard after the filename. Include a key and a good differentiator. Some coding standards will use the physical or logical path combined with project name and/or a prefix/suffix.
- Bloomberg: INCLUDED_ prefix with package path
- Google: Source path with underscore suffix
#include <vector>
#include <iostream>
#include <algorithm>
- Include what you use. Don't rely on transitive includes. Missing
<iterator>
,<stdexcept>
,<string>
,<utility>
.
Dict() {}
- Use
=default
when you have to be explicit about using default semantics. - Functions that should not throw should be marked
noexcept
.
Dict(std::vector<std::pair<K, V>> data) { dict = data; }
- Prefer initialization to assignment in constructors.
- For "in" parameters, pass cheaply-copied types by value and others by reference-to-
const
. If you insist on passing the vector by value, then move constructdict
with the local copy ofdata
. - Declare single-argument constructors
explicit
.
Dict(const std::vector<K>& keys, const std::vector<V>& values) {
if (!(keys.size() == values.size())) throw std::length_error("Keys and Values must be the same sized vector.");
std::transform(keys.begin(), keys.end(), values.begin(), std::back_inserter(dict),
[](const auto& key, const auto& value) {
return std::pair<K, V>(key, value);
});
}
- Reserve capacity when the destination capacity is known. Avoids the need to reallocate every power of 2 (clang/gcc) or 1.5 (msvc) insertions.
- Use the range variants of the standard algorithms. C++20 added algorithms to the
std::ranges
namespace that don't require you to destructure ranges into iterator pairs. - Be consistent with scoping single-line statements. Other instances of single-statement blocks are braced.
Dict Copy(void) {
std::vector<std::pair<K, V>> copy;
copy.reserve(dict.size());
std::copy(dict.begin(), dict.end(), std::back_inserter(copy));
return copy;
}
- In a function declaration, an empty parameter list and a single unnamed parameter of non-dependent type
void
are equivalent. C considers an empty parameter list as taking anything, with a singlevoid
indicating nothing. C++ considers an empty parameter list as nothing. Explicitly statingvoid
is not necessary. - Member functions that do not modify the object should be marked with
const
. - Use the copy constructor and copy assignment operator. Do you really need a standalone copy function? The only underlying type,
std::vector
, is a value type. The implicitly generated copy and move operations behave correctly with value types. You don't even have to do anything for the compiler to generate these operations (Rule of Zero). If you did want a standalone copy function, you can return*this
, taking advantage of the implicitly generated copy constructor. - Functions results that should not be discarded should be annotated with
[[nodiscard]]
.
V Get(K key) {
auto it = std::find_if(dict.begin(), dict.end(),
[&](const auto& pair) {
return pair.first == key;
});
if (it != dict.end()) {
return it->second;
}
throw std::out_of_range("Key not found.");
}
- Use the explicit object parameter (deducing-
this
) to provideconst
and reference overloads. - Use the projection parameter to convert elements to keys.
std::ranges::find
has an overload that takes a value and projection. - Return a reference to the found value. If using deducing-
this
, returnauto
instead and let the compiler determine the correct return type based on the context. - Like the parameters for
std::vector<K, V>
above, you really don't know ifK
orV
is cheap or expensive to copy. SinceK
is an in-parameter, pass it as reference-to-const
. - Consider limiting the scope of the returned iterator to the
if
block. C++17 added the ability to put an init statement in anif
statement, before the condition.
std::vector<K> Keys(void) {
std::vector<K> keys;
keys.reserve(dict.size());
std::transform(dict.begin(), dict.end(), std::back_inserter(keys),
[](const auto& pair) { return pair.first; });
return keys;
}
- The ranges library provides view adaptors to solve common problems. Slicing a collection of tuple-like objects can be accomplished using
std::views::keys
andstd::ranges::to<std::vector>
. There is alsostd::views::values
forDict<K, V>::Values()
.
void Pop(K key) { ... }
Erase
seems like a better name.- Instead of forcing the shrink, provide the user with the ability to shrink when they want to shrink.
V SetDefault(K key, V defaultValue) { ... }
GetOrDefault
would be a better name.- Return a reference to the value that is found or the value after it is inserted.
template <typename InputIterator>
void Insert(InputIterator start, InputIterator end, const V& value) {
dict.reserve(dict.size() + std::distance(start, end));
std::for_each(start, end,
[&](const K& key) { dict.emplace_back(key, value); });
}
- Constrain template parameters. The assumed meaning of a template argument is fundamental to the interface of a template. Using a concept improves both documentation and error-handling for the template.
- You pass over the range [
start
,end
] twice in this function.InputIterator
s do not provide multipass support. The correct iterator category would beForwardIterator
. - Consider providing an overload that takes a range and specializes whether that range is sized or not.
- Should duplicates be allowed to exist?
Insert
doesn't check if a duplicate key already exists. - Consider returning whether the operation and the iterator to the inserted Key-Value pair.
Any other recommendations are welcome
- If you want to increase interoperability with the standard library and code designed to be used with the standard library, follow the requirements for Container, Reversible Container, Associative Container (sans node related requirements).
- If your
Dict
sorted pairs by key, you could use binary search instead of linear search. - Rather than tie keys and values together as pairs in a single container, use one container for keys and one container for values. Your
Dict
class would be responsible for keeping the two containers synchronized. It would greatly simplify constructing theDict
. Any time you actually need to tie Keys and Values together, you can usestd::views::zip
. Key searching becomes cache efficient as cache lines are filled with only keys.Keys()
andValues()
are simple to return.
None of your constructors use the member initialiser, so they first construct an empty vector then assign or insert into it.
Your parameters are somewhat inconsistent: some are by-value and some are by-const-ref. You also only accept std::vector
s, but there are many other things you could accept in basically the same way.
Related to that, everything you return is by-value.
None of your functions are const
qualified.
std::for_each
is worse for iterating over ranges of pairs that a range-for loop, because in the latter you can directly use structured bindings. In the former, you have to take the pair
as a parameter.
void Print(void) {
for (auto& [key, value] : dict) {
std::cout << "(" << key << ", " << value << ")" << std::endl;
}
}
Insert
is misleadingly templated, and the template parameter should be constrained. An InputIterator
is not necessarily multi-pass, so the distance
call can consume the values. You should also consider letting end
be anything that is a sentinel for start
template <std::input_iterator InputIterator, std::sentinel_for<InputIterator> Sentinel>
requires std::constructible_from<V, std::iter_value_t<InputIterator>>
void Insert(InputIterator start, Sentinel end, const V& value) {
if constexpr(std::forward_iterator<InputIterator>) {
dict.reserve(dict.size() + std::distance(start, end));
}
std::ranges::for_each(start, end, [&](const K& key) {
dict.emplace_back(key, value);
});
}
SetDefault
is a weird name for what it does. std::map
calls this try_emplace
.
Am I using the correct functions, or are there some that could make this smoother?
There's a lot of places where you are doing things longhand that can be simplified with range adaptors, as std::vector
can be constructed from a range.
Dict(const std::vector<K>& keys, const std::vector<V>& values) : dict(std::from_range, std::views::zip(keys, values)) {}
Dict<K, V> FromKeys(std::vector<K> keys, V value) {
std::vector<std::pair<K, V>> dict(std::from_range, std::views::zip(keys, std::views::repeat(value)));
return dict;
}
Dict Copy(void) {
return dict;
}
std::vector<K> Keys(void) {
return { std::from_range, dict | std::views::keys };
}
-
\$\begingroup\$ "
std::for_each
is worse for iterating over ranges of pairs that a range-for loop, because in the latter you can use structured bindings." Named algorithms are always better than nakedfor
loops—yes, even rangefor
loops—because they express intent (and have other benefits, too). You can use structured bindings withstd::for_each
(orstd::ranges::for_each()
) just fine. \$\endgroup\$indi– indi2025年02月11日 03:02:41 +00:00Commented Feb 11 at 3:02 -
\$\begingroup\$ @indi Oh really? While that is generally true,
std::for_each
is the exception, with the reasoning given here which is talking about C#, but the point is laregely language-independant.std::ranges::for_each
expresses exactly the same intent as a ranged-for \$\endgroup\$Caleth– Caleth2025年02月11日 09:16:00 +00:00Commented Feb 11 at 9:16 -
\$\begingroup\$ Admittedly
std::for_each
might be preferable if you already havefirst, last
as two separate objects \$\endgroup\$Caleth– Caleth2025年02月11日 10:55:46 +00:00Commented Feb 11 at 10:55 -
\$\begingroup\$ "Oh really?" Yes, really. "
std::ranges::for_each
expresses exactly the same intent as a ranged-for" Oh, really? Well, then this quiz should be easy for you to solve: Given a ranger
that has 10 elements, and assuming no errors (exceptions or termination), without knowing the operation being done on each element ("...
"), exactly how many times is the operation run in case a)for_each(r, [](auto x) { ... })
; and case b)for (auto x : r) { ... }
? (Answer: a) 10. b) 🤷🏼.) \$\endgroup\$indi– indi2025年02月11日 16:44:39 +00:00Commented Feb 11 at 16:44 -
\$\begingroup\$
for_each(r, ...)
promises that (assuming no errors), the operation...
will be run for every element inr
. Every. ... Element. ... Always. Rangefor
does not.for_each()
promises that nothing local to the calling scope will be affected unless it is explicitly passed by reference. Rangefor
does not.for_each()
promises that when it is done, control will always be passed to the next statement (or sequenced-after evaluation). Rangefor
does not. This is "exactly the same"? \$\endgroup\$indi– indi2025年02月11日 16:44:44 +00:00Commented Feb 11 at 16:44
You already got a lot of comments. Let me say something about performance:
dict.shrink_to_fit()
potentially slows down code a lot. Imagine for example the case where you remove one value and then immediately add another one. First you (maybe) copy the array over into a smaller memory block, and then back into a larger one.std::vector
doubles the array size when appending a value if there is not enough space, so now, after all the copying, you might use much more memory than you would if you hadn’t shrunk the container at all!Instead, consider shrinking if you see that
size()
is less than half thecapacity()
.Obviously you know about
std::map
andstd::unordered_map
. The latter is a hash table, which has O(1) insert and lookup. Your vector-based implementation has O(1) insert but O(n) lookup. If you kept your vector sorted, you could have O(log n) insert and lookup. It’s a bit more work, butstd::binary_search()
gets you 99% of the way there.