1
\$\begingroup\$

So I wrote this class template for a custom container. Basically, it stores elements and their occurences. When an element is pushed into the container, if it already exists, all it does is increment its counter.

#pragma once
#include <map>
#include <string>
//
// Template class for a tocc, or Thing Occurence Counter Container
//
template <class T>
class tocc
{
public:
 tocc();
 //
 // Increments the count of that item or creates a new pair if it doesn't exist
 //
 void push(const T& item)
 {
 // If no the item isn't found, create new pair with a count of 1
 if(_tocc.find(item) == _tocc.end())
 _tocc.insert(std::pair<T, long int>(item, 1));
 else
 _tocc.at(item)++;
 }
 //
 // Decrements the count of that item and erases it if the count reaches 0
 //
 void pop(const T& item)
 {
 // If the item is found
 if(_tocc.find(item) != _tocc.end())
 (_tocc.at(item) <= 1) ? _tocc.erase(item) : _tocc.at(item)--;
 // Do nothing if the item isn't found
 }
 //
 // Gets the count of a particular item
 // Returns the count if the item exists, 0 otherwise
 //
 long int getCount(const T& item)
 {
 if(_tocc.find(item) == _tocc.end())
 return 0;
 return _tocc.at(item);
 }
 //
 // Returns true if the item is present in the map, false otherwise
 //
 bool contains(const T& item)
 {
 return _tocc.find(item) != _tocc.end();
 }
private:
 std::map<T, long int> _tocc;
};

Here I put all the whole code in a single "file" for simplicity's sake, though I know it is best practice to separate files into headers and source files (which I normally do).

I'd like to know what can be improved about this implementation.

asked Feb 20, 2020 at 13:22
\$\endgroup\$
1
  • \$\begingroup\$ Since we're implementing a std::multiset, perhaps we should use the same public interface? That would make it easier to switch between this class and the standard one. \$\endgroup\$ Commented Feb 20, 2020 at 13:33

2 Answers 2

4
\$\begingroup\$

The rules of five/three/zero

Usually, the existence of a single constructor indicates that the others should get implemented too (or explicitly forbidden/deleted). This is called the rule of five (or three, depending on the standard). However, there is only one member in your class, _tocc, and it has well-defined constructors for all the usual cases.

Here, we should follow the rule of zero: declare and define no constructors at all, as the default constructors will do the right thing. See also CppCoreGuidelines C.20.

Use already known data

While there is no bug in your code, there are some optimization flaws. Let's have a look at tocc<int>::push, compiled via gcc 9.2 -S -O3 (CompilerExplorer; I had to [[gnu::noinline]] to keep the assembly sane).

tocc<int>::push(int const&):
 sub rsp, 24
 mov rdx, QWORD PTR [rdi+16]
 mov ecx, DWORD PTR [rsi]
 test rdx, rdx
 je .L31
 lea r8, [rdi+8]
 mov rax, rdx
 mov rsi, r8
 jmp .L32
.L48:
 mov rsi, rax
 mov rax, QWORD PTR [rax+16]
 test rax, rax
 je .L33
.L32:
 cmp DWORD PTR [rax+32], ecx
 jge .L48
 mov rax, QWORD PTR [rax+24]
 test rax, rax
 jne .L32
.L33:
 cmp rsi, r8
 je .L31
 cmp DWORD PTR [rsi+32], ecx
 jg .L31
 mov rax, r8
 jmp .L36
.L50:
 mov rax, rdx
 mov rdx, QWORD PTR [rdx+16]
 test rdx, rdx
 je .L49
.L36:
 cmp ecx, DWORD PTR [rdx+32]
 jle .L50
 mov rdx, QWORD PTR [rdx+24]
 test rdx, rdx
 jne .L36
.L49:
 cmp r8, rax
 je .L41
 cmp ecx, DWORD PTR [rax+32]
 jl .L41
 add QWORD PTR [rax+40], 1
 add rsp, 24
 ret
.L31:
 mov rsi, rsp
 mov DWORD PTR [rsp], ecx
 mov QWORD PTR [rsp+8], 1
 call std::pair<std::_Rb_tree_iterator<std::pair<int const, long> >, bool> std::_Rb_tree<int, std::pair<int const, long>, std::_Select1st<std::pair<int const, long> >, std::less<int>, std::allocator<std::pair<int const, long> > >::_M_emplace_unique<std::pair<int, long> >(std::pair<int, long>&&)
 add rsp, 24
 ret
.L41:
 mov edi, OFFSET FLAT:.LC0
 call std::__throw_out_of_range(char const*)

That's a lot of conditional jumps. 11 jumps depend on test or cmp. However, we only have a single comparison. What happens here?

Well, first of all, a call to map<T>::at isn't free. It always has some additional boundary check, which prevents us from undefined behaviour but exchanges this boon for some additional code and an potential exception.

However, if we're in the second branch in push, then we already know that there is an element! After all, we found it beforehand:

void push(const T& item)
{
 if(_tocc.find(item) == _tocc.end()) 
 _tocc.insert(std::pair<T, long int>(item, 1));
 else
 _tocc.at(item)++; // find() did not return end()!
}

Instead of at, we should use the iterator for several reasons:

  • we already have the element at hand,
  • we're guaranteed to have an element, so the boundary check in at() is not necessary,
  • we don't need to search the element a second time and therefore stay \$\mathcal O(1)\$ instead of \$\mathcal O(\log n)\$

So let's use the iterator instead and let's replace std::pair<T, long int> with std::make_pair while we're at it:

void push(const T& item)
{
 const auto it = _tocc.find(item);
 if(it == _tocc.end()) { 
 _tocc.insert(std::make_pair(item, 1));
 } else {
 it->second++;
 }
}

What's the new assembly?

tocc<int>::push(int const&):
 mov rax, QWORD PTR [rdi+16]
 mov edx, DWORD PTR [rsi]
 test rax, rax
 je .L31
 lea rsi, [rdi+8]
 mov rcx, rsi
 jmp .L32
.L44:
 mov rcx, rax
 mov rax, QWORD PTR [rax+16]
 test rax, rax
 je .L33
.L32:
 cmp DWORD PTR [rax+32], edx
 jge .L44
 mov rax, QWORD PTR [rax+24]
 test rax, rax
 jne .L32
.L33:
 cmp rsi, rcx
 je .L31
 cmp DWORD PTR [rcx+32], edx
 jle .L36
.L31:
 sub rsp, 24
 lea rsi, [rsp+8]
 mov DWORD PTR [rsp+8], edx
 mov DWORD PTR [rsp+12], 1
 call std::pair<std::_Rb_tree_iterator<std::pair<int const, long> >, bool> std::_Rb_tree<int, std::pair<int const, long>, std::_Select1st<std::pair<int const, long> >, std::less<int>, std::allocator<std::pair<int const, long> > >::_M_emplace_unique<std::pair<int, int> >(std::pair<int, int>&&)
 add rsp, 24
 ret
.L36:
 add QWORD PTR [rcx+40], 1
 ret

Only 6 conditional jumps, only ~55% the original amount. However, keep in mind that the reduction of asm instructions was not the goal of this section. Instead, we re-used already known values and didn't repeat ourselves (only one find() call).

The same holds for pop()'s usage of map::erase(), which can also take an iterator instead of a Key, but that's left as an exercise.

Documentation and (internal) comments

Good job on the comments! However, keep in mind that Doxygen and other programs use special syntax to discern documentation comments and implementation comments.

Naming

The sole member of our class almost has the same name as your class. This makes it somewhat confusing, as we use _tocc to actually implement tocc. Naming is hard, though, and I cannot come up with a nicer name; counter, key_counter or key_counter_map don't have the same ring to it, although the latter is the most descriptive variant.

Interface

At the moment, a user must know the name of all items in the tocc to check their count afterwards. An iterator interface would be tremendously helpful.

We could even reuse std::map::const_iterator, if you only want constant iteration:

using iterator_type = std::map<T, long int>::iterator_type;
iterator_type begin() { return _tocc.begin(); }
...
answered Feb 20, 2020 at 19:34
\$\endgroup\$
2
\$\begingroup\$

Your push class has a big chunk of code and does two lookups in the map. It can be replaced with one line of code:

++_tocc[item];

since operator[] will add a key/value pair if it does not exist, and default initialize the value (0 in the case of long int).

pop will do three lookups if the item is found. This can be reduced to one by saving the result of the _tocc.find call and using the returned iterator in four places.

getCount will do two lookups, and can also save the result of the find call and use the iterator.

NeithergetCount nor contains modify the tocc object, so they should be declared const member functions.

answered Feb 20, 2020 at 20:52
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.