So I wrote this class template
for a custom container. Basically, it stores elements and their occurences. When an element is pushed into the container, if it already exists, all it does is increment its counter.
#pragma once
#include <map>
#include <string>
//
// Template class for a tocc, or Thing Occurence Counter Container
//
template <class T>
class tocc
{
public:
tocc();
//
// Increments the count of that item or creates a new pair if it doesn't exist
//
void push(const T& item)
{
// If no the item isn't found, create new pair with a count of 1
if(_tocc.find(item) == _tocc.end())
_tocc.insert(std::pair<T, long int>(item, 1));
else
_tocc.at(item)++;
}
//
// Decrements the count of that item and erases it if the count reaches 0
//
void pop(const T& item)
{
// If the item is found
if(_tocc.find(item) != _tocc.end())
(_tocc.at(item) <= 1) ? _tocc.erase(item) : _tocc.at(item)--;
// Do nothing if the item isn't found
}
//
// Gets the count of a particular item
// Returns the count if the item exists, 0 otherwise
//
long int getCount(const T& item)
{
if(_tocc.find(item) == _tocc.end())
return 0;
return _tocc.at(item);
}
//
// Returns true if the item is present in the map, false otherwise
//
bool contains(const T& item)
{
return _tocc.find(item) != _tocc.end();
}
private:
std::map<T, long int> _tocc;
};
Here I put all the whole code in a single "file" for simplicity's sake, though I know it is best practice to separate files into headers and source files (which I normally do).
I'd like to know what can be improved about this implementation.
2 Answers 2
The rules of five/three/zero
Usually, the existence of a single constructor indicates that the others should get implemented too (or explicitly forbidden/deleted). This is called the rule of five (or three, depending on the standard). However, there is only one member in your class, _tocc
, and it has well-defined constructors for all the usual cases.
Here, we should follow the rule of zero: declare and define no constructors at all, as the default constructors will do the right thing. See also CppCoreGuidelines C.20.
Use already known data
While there is no bug in your code, there are some optimization flaws. Let's have a look at tocc<int>::push
, compiled via gcc 9.2 -S -O3
(CompilerExplorer; I had to [[gnu::noinline]]
to keep the assembly sane).
tocc<int>::push(int const&):
sub rsp, 24
mov rdx, QWORD PTR [rdi+16]
mov ecx, DWORD PTR [rsi]
test rdx, rdx
je .L31
lea r8, [rdi+8]
mov rax, rdx
mov rsi, r8
jmp .L32
.L48:
mov rsi, rax
mov rax, QWORD PTR [rax+16]
test rax, rax
je .L33
.L32:
cmp DWORD PTR [rax+32], ecx
jge .L48
mov rax, QWORD PTR [rax+24]
test rax, rax
jne .L32
.L33:
cmp rsi, r8
je .L31
cmp DWORD PTR [rsi+32], ecx
jg .L31
mov rax, r8
jmp .L36
.L50:
mov rax, rdx
mov rdx, QWORD PTR [rdx+16]
test rdx, rdx
je .L49
.L36:
cmp ecx, DWORD PTR [rdx+32]
jle .L50
mov rdx, QWORD PTR [rdx+24]
test rdx, rdx
jne .L36
.L49:
cmp r8, rax
je .L41
cmp ecx, DWORD PTR [rax+32]
jl .L41
add QWORD PTR [rax+40], 1
add rsp, 24
ret
.L31:
mov rsi, rsp
mov DWORD PTR [rsp], ecx
mov QWORD PTR [rsp+8], 1
call std::pair<std::_Rb_tree_iterator<std::pair<int const, long> >, bool> std::_Rb_tree<int, std::pair<int const, long>, std::_Select1st<std::pair<int const, long> >, std::less<int>, std::allocator<std::pair<int const, long> > >::_M_emplace_unique<std::pair<int, long> >(std::pair<int, long>&&)
add rsp, 24
ret
.L41:
mov edi, OFFSET FLAT:.LC0
call std::__throw_out_of_range(char const*)
That's a lot of conditional jumps. 11 jumps depend on test
or cmp
. However, we only have a single comparison. What happens here?
Well, first of all, a call to map<T>::at
isn't free. It always has some additional boundary check, which prevents us from undefined behaviour but exchanges this boon for some additional code and an potential exception.
However, if we're in the second branch in push
, then we already know that there is an element! After all, we found it beforehand:
void push(const T& item)
{
if(_tocc.find(item) == _tocc.end())
_tocc.insert(std::pair<T, long int>(item, 1));
else
_tocc.at(item)++; // find() did not return end()!
}
Instead of at
, we should use the iterator for several reasons:
- we already have the element at hand,
- we're guaranteed to have an element, so the boundary check in
at()
is not necessary, - we don't need to search the element a second time and therefore stay \$\mathcal O(1)\$ instead of \$\mathcal O(\log n)\$
So let's use the iterator instead and let's replace std::pair<T, long int>
with std::make_pair
while we're at it:
void push(const T& item)
{
const auto it = _tocc.find(item);
if(it == _tocc.end()) {
_tocc.insert(std::make_pair(item, 1));
} else {
it->second++;
}
}
What's the new assembly?
tocc<int>::push(int const&):
mov rax, QWORD PTR [rdi+16]
mov edx, DWORD PTR [rsi]
test rax, rax
je .L31
lea rsi, [rdi+8]
mov rcx, rsi
jmp .L32
.L44:
mov rcx, rax
mov rax, QWORD PTR [rax+16]
test rax, rax
je .L33
.L32:
cmp DWORD PTR [rax+32], edx
jge .L44
mov rax, QWORD PTR [rax+24]
test rax, rax
jne .L32
.L33:
cmp rsi, rcx
je .L31
cmp DWORD PTR [rcx+32], edx
jle .L36
.L31:
sub rsp, 24
lea rsi, [rsp+8]
mov DWORD PTR [rsp+8], edx
mov DWORD PTR [rsp+12], 1
call std::pair<std::_Rb_tree_iterator<std::pair<int const, long> >, bool> std::_Rb_tree<int, std::pair<int const, long>, std::_Select1st<std::pair<int const, long> >, std::less<int>, std::allocator<std::pair<int const, long> > >::_M_emplace_unique<std::pair<int, int> >(std::pair<int, int>&&)
add rsp, 24
ret
.L36:
add QWORD PTR [rcx+40], 1
ret
Only 6 conditional jumps, only ~55% the original amount. However, keep in mind that the reduction of asm instructions was not the goal of this section. Instead, we re-used already known values and didn't repeat ourselves (only one find()
call).
The same holds for pop()
's usage of map::erase()
, which can also take an iterator instead of a Key
, but that's left as an exercise.
Documentation and (internal) comments
Good job on the comments! However, keep in mind that Doxygen and other programs use special syntax to discern documentation comments and implementation comments.
Naming
The sole member of our class almost has the same name as your class. This makes it somewhat confusing, as we use _tocc
to actually implement tocc
. Naming is hard, though, and I cannot come up with a nicer name; counter
, key_counter
or key_counter_map
don't have the same ring to it, although the latter is the most descriptive variant.
Interface
At the moment, a user must know the name of all items in the tocc
to check their count afterwards. An iterator interface would be tremendously helpful.
We could even reuse std::map::const_iterator
, if you only want constant iteration:
using iterator_type = std::map<T, long int>::iterator_type;
iterator_type begin() { return _tocc.begin(); }
...
Your push
class has a big chunk of code and does two lookups in the map. It can be replaced with one line of code:
++_tocc[item];
since operator[]
will add a key/value pair if it does not exist, and default initialize the value (0 in the case of long int
).
pop
will do three lookups if the item is found. This can be reduced to one by saving the result of the _tocc.find
call and using the returned iterator in four places.
getCount
will do two lookups, and can also save the result of the find
call and use the iterator.
NeithergetCount
nor contains
modify the tocc
object, so they should be declared const
member functions.
std::multiset
, perhaps we should use the same public interface? That would make it easier to switch between this class and the standard one. \$\endgroup\$