This is an implementation of Huffman Coding that works on ASCII values. I simplified main
to show an example of user input to the program. I only removed handling of non-printable ASCII characters, as that is not something I need reviewed. Initially, I tried to use std::priority_queue<std::unique_ptr<node>>
, but you cannot easily remove these pointers due to std::priority_queue::top
returning a const reference, so I had to use std::shared_ptr
. More or less, I just want to know if there are any glaring issue with how I used the standard library or guideline violations.
#include <iostream>
#include <map>
#include <memory>
#include <queue>
#include <string>
#include <unordered_map>
#include <utility>
#include <optional>
#include <sstream>
/// node
/// node in huffman tree
struct node {
node(char c, int freq, std::shared_ptr<node> left, std::shared_ptr<node> right)
: c {c}
, freq {freq}
, left {std::move(left)}
, right {std::move(right)}
{}
node(int freq, std::shared_ptr<node> left, std::shared_ptr<node> right)
: freq {freq}
, left {std::move(left)}
, right {std::move(right)}
{}
std::optional<char> c;
int freq;
std::shared_ptr<node> left;
std::shared_ptr<node> right;
};
/// build_huffman_codings
/// traverses a huffman tree and finds encodings for all characters
/// \param root root of huffman tree or subtree
/// \param accumulator prefix of huffman code gathered before root
/// \param codings mapping of character to its final encoding
void build_huffman_codings(const std::shared_ptr<node>& root,
std::map<char, std::string>& codings,
std::string accumulator = "") {
// leaf node adds to codings
if (!root->left && !root->right) {
codings[root->c.value()] = accumulator;
return;
}
// left branch
if (root->left) {
build_huffman_codings(root->left, codings, accumulator + "0");
}
// right branch
if (root->right) {
build_huffman_codings(root->right, codings, accumulator + "1");
}
}
/// huffman
/// compute huffman codings for given frequencies of characters
/// \param freq mapping of character and frequencies
/// \return mapping of character to a binary representation
std::map<char, std::string> huffman(const std::map<char, int>& freq) {
// pre-allocate nodes
std::vector<std::shared_ptr<node>> nodes;
for (const auto& pair : freq) {
nodes.emplace_back(std::make_shared<node>(pair.first, pair.second, nullptr, nullptr));
}
// compare freq of node pointers
auto compare_greater = [] (const auto& p1, const auto& p2) {
return p1->freq > p2->freq;
};
// priority queue holds nodes in increasing order by frequency
std::priority_queue<std::shared_ptr<node>,
std::vector<std::shared_ptr<node>>,
decltype(compare_greater)>
queue {compare_greater, std::move(nodes)};
const std::size_t size = queue.size();
// repeat size - 1 times
for (std::size_t i = 1; i < size; ++i) {
// remove first two nodes
std::shared_ptr<node> x = queue.top();
queue.pop();
std::shared_ptr<node> y = queue.top();
queue.pop();
// add new node
queue.emplace(std::make_shared<node>(x->freq + y->freq, x, y));
}
std::map<char, std::string> codings;
build_huffman_codings(queue.top(), codings);
return codings;
}
int main() {
// store character with its frequency
std::map<char, int> frequencies;
// example user input - real implementation handles non-printable ascii
std::stringstream freq_txt {
"a 5\n"
"b 25\n"
"c 7\n"
"d 15\n"
"e 4\n"
"f 12\n"
};
char c;
int f;
while (freq_txt) {
freq_txt >> c;
freq_txt >> f;
frequencies[c] = f;
}
// huffman code table
const auto codings = huffman(frequencies);
for (const auto& pair : codings) {
std::cout << pair.first << ' ' << pair.second << '\n';
}
}
1 Answer 1
At first glance, I see a a lot of good usage of the standard library. So, the remarks I have are rather small:
- in huffman(), you create a vector
nodes
, as you know exactly the amount of elements to store into it, I would reserve it. This reduces memory usage slightly and improves performance. (nodes.reserve(freq.size())
) - In the same function, you don't seem to have handling for
size == 0
, most likely because of the simplification? - a small optimization in the for-loop, you can use
std::move()
onx
andy
. - In build_huffman_condings, you have
std::string accumulator =""
. It's better to write= std::string{}
as this doesn't requirestrlen
to be called.
On the more high level, I am wondering about the used datastructures, especially std::map
. I know std::unordered_map
ain't an ideal hash-map, though, it does perform better than std::map
if you don't need ordering. I don't see in your code that need, hence I would recommend replacing it. (You can also use abseil ... for a better implementation, or as the values are small, boosts flat_map)
In short, your usage of the standard library looks fine.
-
\$\begingroup\$ Thanks for responding. I used map over unordered_map because my calling code needed to output the table with ascii values in order. \$\endgroup\$Brady Dean– Brady Dean2020年04月26日 12:59:24 +00:00Commented Apr 26, 2020 at 12:59
-
\$\begingroup\$ Even than, if you know the characters, depending on the length of the text, it might be beneficial to use hash_map and at the end loop over all characters and locate. Though that all depends on the use cases. \$\endgroup\$JVApen– JVApen2020年04月26日 15:52:41 +00:00Commented Apr 26, 2020 at 15:52
-
\$\begingroup\$ Since
queue.top()
returns aconst shared_ptr&
- and even if I madex
andy
const, should I still usemove
on them? I'm getting warnings to not move const variables. I'm assuming it will handle the reference counts correctly and the warning may not apply in this situation. \$\endgroup\$Brady Dean– Brady Dean2020年04月27日 02:08:36 +00:00Commented Apr 27, 2020 at 2:08 -
\$\begingroup\$ You indeed make a copy of the shared_ptr when creating variable x, which is needed. However, when passing it along to the new node, you could move it. \$\endgroup\$JVApen– JVApen2020年04月27日 10:54:45 +00:00Commented Apr 27, 2020 at 10:54