Proper practice using enums in header files

Question 1

Long time coder, but very very new to C++. I'm trying to put together my first real-sized project and figuring out what I need to change about my script-kiddie style of C++ to get there. Below is my working code.

#include <math.h>
#include <vector>
#include <iostream>
#include <stdexcept>
enum class Metric { Cosine, Euclidean};
double cosine(std::vector<double> A, std::vector<double> B, bool similarity){
 double mul = 0.0;
 double d_a = 0.0;
 double d_b = 0.0;
 std::vector<double>::iterator A_iter = A.begin();
 std::vector<double>::iterator B_iter = B.begin();
 for ( ; A_iter != A.end(); A_iter++ , B_iter++ ) {
 mul += *A_iter * *B_iter;
 d_a += *A_iter * *A_iter;
 d_b += *B_iter * *B_iter;
 }
 if (d_a == 0.0f || d_b == 0.0f) {
 throw std::logic_error("Vectors must have a length greater than zero");
 }
 double cos_similarity = mul / sqrt(d_a * d_b);
 if (similarity == false) {
 return acos(cos_similarity) / M_PI;
 } else {
 return cos_similarity;
 }
}
double euclidean(std::vector<double> A, std::vector<double>B, bool similarity) {
 double sq_dist = 0.0;
 std::vector<double>::iterator A_iter = A.begin();
 std::vector<double>::iterator B_iter = B.begin();
 for ( ; A_iter != A.end(); A_iter++, B_iter++) {
 sq_dist += pow(*A_iter - *B_iter, 2);
 }
 if (similarity == false) {
 return sqrt(sq_dist);
 } else {
 return 1.0/sqrt(sq_dist);
 }
}
double distance(std::vector<double> A, std::vector<double> B, Metric metric, bool similarity = false) {
 if (A.size() != B.size()) {
 throw std::logic_error("A and B must be the same size!");
 }
 if (A.size() < 1) {
 throw std::logic_error("Empty vectors belong to no metric space");
 }
 switch (metric) {
 case Metric::Cosine:
 return cosine(A, B, similarity);
 case Metric::Euclidean:
 return euclidean(A, B, similarity);
 }
}
double similarity(std::vector<double> A, std::vector<double> B, Metric metric) {
 return distance(A, B, metric, true);
}

My first question is around exactly how I should construct the header file. Specifically curious about the right way to handle the enum. Secondly, any general feedback, things that I could clean up, cut out, etc... would be greatly appreciated.

Question 2

Can you explain what your code is supposed to do? As there's no main(), there's very little to go on. See How to Ask for some advice to give your question a better title.

Question 3

@TobySpeight I appreciate the comment, and while feedback on general aspects of this code are appreciated, the majority of this is just math, which I'm not looking for help with. Unfortunately there's not a specific aspect of this from a performance perspective that I'm looking to improve. This is really a question of just using best practices. Any main here would orient feedback toward an aspect of this code that I'm not particularly curious about. There's an interesting meta question here, but as it stand I really believe I've asked an appropriate question.

Question 4

How to Ask says "State what your code does in your title, not your main concerns about it." That's what I'm asking you to to do. A main() would be nice (for example, if it included test cases), but it's not required.

Question 5

@TobySpeight, honestly this probably belongs on meta, but if I had titled my question: "calculating cosine and euclidean distance" it would be a complete bait and switch. It's probably just because it's a pretty math-centric question, but a title like that would almost certainly have ensured that I got a worse response and worse answers to the code that I actually wanted reviewed.

Question 6

Prefer C++ headers

If you #include <cmath> instead of <math.h>, the mathematical functions will be safely and unambiguously in the std namespace.

Pass read-only parameters by const reference

Instead of copying the input vectors, it's usually more efficient to pass const references to them:

double cosine(const std::vector<double>& A, const std::vector<double>& B, bool similarity);
double euclidean(const std::vector<double>& A, const std::vector<double>& B, bool similarity);
double distance(const std::vector<double>& A, const std::vector<double>& B,
 Metric metric, bool similarity = false);

Use `auto` to deduce types

Instead of writing

std::vector<double>::iterator A_iter = A.begin();
std::vector<double>::iterator B_iter = B.begin();

We can ask the compiler to deduce the correct type:

auto A_iter = A.begin();
auto B_iter = B.begin();

This also makes it easier if you ever want to change the type of A and B (perhaps even to a template argument type). In fact, by changing to const references, we already changed the type (to std::vector<double>::const_iterator), but auto isolates us from that.

Use the right kind of exception

You could be more specific, and use std::invalid_argument for the error type here:

if (d_a == 0.0 || d_b == 0.0) {
 throw std::invalid_argument("Vectors must have a magnitude greater than zero");
}

I've changed the wording above, because length usually means the number of elements when we're using standard collections.

I also changed the zero constant to be a double, the same as d_a and d_b, since it will get promoted for the comparison anyway. Only use float when you really need the reduced precision.

Don't compare booleans against `false`

This is hard to read:

if (similarity == false) {
 // negative case
} else {
 // positive case
}

You can re-write it as

if (similarity) {
 // positive case
} else {
 // negative case
}

This will normally produce identical code, but the cognitive overhead is reduced.

Avoid non-standard extensions

Although many implementations define a constant called M_PI, this isn't specified by any standard. Thankfully it's easy to make your own value, using the knowledge that tan(1⁄4π) is 1:

static const auto pi = 4*std::atan(1);

Avoid `std::pow(x, 2)`

(削除) To square a number, it's simpler and more efficient to multiply it by itself. pow() is very general (handling non-integer powers), and usually works by multiplication of logarithms. (削除ここまで)

Even though std::pow() has overloads for integer powers, I find that for squaring, it's easier to read a simple multiplication rather than a function call. It's worth introducing a temporary to avoid that:

 //sq_dist += pow(*A_iter - *B_iter, 2);
 auto dist = *A_iter - *B_iter;
 sq_dist += dist * dist;

Don't fall off the end of non-void functions

Here, we're assuming that we're always passed a valid enum constant:

switch (metric) {
 case Metric::Cosine:
 return cosine(A, B, similarity);
 case Metric::Euclidean:
 return euclidean(A, B, similarity);
}
// implicit return

Robust code checks that we don't flow out of the switch:

switch (metric) {
 case Metric::Cosine:
 return cosine(A, B, similarity);
 case Metric::Euclidean:
 return euclidean(A, B, similarity);
}
throw std::invalid_argument("Invalid metric");

Invalid enum values shouldn't happen, but it costs little to be defensive against a mistaken cast.

Design issues

It's not clear whether cosine() and euclidean() are intended to be part of the public interface. If they are not, then give them internal linkage (e.g. by use of the anonymous namespace, or with the static keyword), and consider moving the similarity parameter out of them.

I prefer to avoid booleans that change the meaning of functions like that; see if you agree when you see my full example.

Here's a full version with the above issues all addressed (and assuming that cosine() and euclidean() are intended to be internal detail).

Declarations

This is what I'd put in the header file (let's call it similarity.h):

#ifndef SIMILARITY_H
#define SIMILARITY_H
#include <vector>
enum class Metric { Cosine, Euclidean };
double distance(const std::vector<double>&, const std::vector<double>&, Metric);
double similarity(const std::vector<double>&, const std::vector<double>&, Metric);
#endif

Implementation

Having included the header, here's the C++ file:

#include "similarity.h"
#include <cmath>
#include <stdexcept>
namespace {
 double cosine_distance(const std::vector<double>& a, const std::vector<double>& b)
 {
 double mul = 0.0;
 double d_a = 0.0;
 double d_b = 0.0;
 auto ia = a.begin();
 auto ib = b.begin();
 for (; ia != a.end(); ia++, ib++) {
 mul += *ia * *ib;
 d_a += *ia * *ia;
 d_b += *ib * *ib;
 }
 if (d_a == 0.0 || d_b == 0.0) {
 throw std::invalid_argument("Vectors must have a magnitude greater than zero");
 }
 return mul / std::sqrt(d_a * d_b);
 }
 double euclidean_distance(const std::vector<double>& a, const std::vector<double>& b)
 {
 double sq_dist = 0.0;
 auto ia = a.begin();
 auto ib = b.begin();
 for (; ia != a.end(); ia++, ib++) {
 auto dist = *ia - *ib;
 sq_dist += dist * dist;
 }
 return std::sqrt(sq_dist);
 }
}
double distance(const std::vector<double>& a, const std::vector<double>& b, Metric metric)
{
 if (a.size() != b.size())
 throw std::invalid_argument("a and b must be the same size!");
 if (a.size() < 1)
 throw std::invalid_argument("Empty vectors belong to no metric space");
 switch (metric) {
 case Metric::Cosine:
 return cosine_distance(a, b);
 case Metric::Euclidean:
 return euclidean_distance(a, b);
 }
 throw std::invalid_argument("Invalid metric");
}
double similarity(const std::vector<double>& a, const std::vector<double>& b, Metric metric)
{
 static const auto pi = 4*std::atan(1);
 switch (metric) {
 case Metric::Cosine:
 return std::acos(distance(a, b, metric)) / pi;
 case Metric::Euclidean:
 return 1.0 / distance(a, b, metric);
 }
 throw std::invalid_argument("Invalid metric");
}

Note that I include our own header first. That's a helpful approach, to improve our chances of discovering if it accidentally depends on something it doesn't explicitly include.

Question 7

There is nothing wrong with std::pow(x, 2), it has overloads for integer exponents. As you can see here, the compiler will happily optimize that to a single multiplication, even at just -O1.

Question 8

Thanks for the heads-up there @G.Sliepen. I'd complete overlooked that, as a former C programmer.

Question 9

You should not copy all functions into your header file. Pick the functions you need to link with other classes in your application.

For example, if you don't need the cosine function elsewhere, leave it in your source file. However, I believe you will need to define similarity in your header file, otherwise the rest of your application can't compute the similarity.

A possibility would look like:

#ifndef MY_HEADER_FILE
#define MY_HEADER_FILE
enum class Metric { Cosine, Euclidean};
// Add your comments
double euclidean(std::vector<double> A, std::vector<double>B, bool similarity);
// Add your comments
double similarity(std::vector<double> A, std::vector<double> B, Metric metric); 
#endif

There is nothing wrong how you define the enum. You may also want to wrap everything in a struct, that's how OO should work.

Question 10

Functions that are used internally but not mentioned in the header file ought to be declared static so they really are private to that translation unit.

Question 11

Here, we're using an enum solely to select between different implementations. If we're not using it for anything else, then it may make more sense to pass a class object that provides those implementations.

We start with an abstract class that defines the interface:

#include <vector>
class metric_space
{
public:
 virtual double distance(const std::vector<double>&,
 const std::vector<double>&) const = 0;
 virtual double similarity(const std::vector<double>&,
 const std::vector<double>&) const = 0;
 virtual ~metric_space() = default;
};

The virtual ... = 0 are what make this class an abstract one. We can't create objects of this class, but only derive from it.

We add a virtual destructor, which is a good practice for all classes intended to be used as base classes. It ensures that the correct sub-class destructor is executed when a base-class reference is used. I'll not go into more details, as you can read more about this in any good C++ book or web-site.

Now we define a pair of classes that derive from this interface. This is very straightforward; we'll use the override keyword to ensure we get error messages if we fail to exactly match the base-class method signatures.

class cosine_metric_space : public metric_space
{
public:
 double distance(const std::vector<double>&,
 const std::vector<double>&) const override;
 double similarity(const std::vector<double>&,
 const std::vector<double>&) const override;
};
class euclidean_metric_space : public metric_space
{
public:
 double distance(const std::vector<double>&,
 const std::vector<double>&) const override;
 double similarity(const std::vector<double>&,
 const std::vector<double>&) const override;
};

That's the public header complete. It looks a bit more complicated than the API that we previously had, but it's still quite usable. Now, we need an instance of one or the other metric space classes instead of an enum value, and we use the method call syntax (e.g. space.distance(a, b)) rather than passing another value.

The implementation is where we have a real win, even here with just two methods and two different metric spaces (as soon as we increase that number, the benefits increase dramatically).

We start with the required headers, and a private utility function for some shared functionality (I'm basing this code from my already-improved version in my other answer:

#include "similarity.h"
#include <cmath>
#include <stdexcept>
namespace {
 void verify_metric_vectors(const std::vector<double>& a,
 const std::vector<double>& b) {
 if (a.size() != b.size())
 throw std::invalid_argument("a and b must be the same size!");
 if (a.size() < 1)
 throw std::invalid_argument("Empty vectors belong to no metric space");
 }
}

Then the implementations of the first class's methods:

// Cosine metric space
 
double cosine_metric_space::distance(const std::vector<double>& a,
 const std::vector<double>& b) const
{
 verify_metric_vectors(a, b);
 double mul = 0.0;
 double d_a = 0.0;
 double d_b = 0.0;
 auto ia = a.begin();
 auto ib = b.begin();
 for (; ia != a.end(); ia++, ib++) {
 mul += *ia * *ib;
 d_a += *ia * *ia;
 d_b += *ib * *ib;
 }
 if (d_a == 0.0 || d_b == 0.0) {
 throw std::invalid_argument("Vectors must have a magnitude greater than zero");
 // or perhaps we should remove this test, and just let the
 // division yield the correct ±∞ here?
 }
 return mul / std::sqrt(d_a * d_b);
}
double cosine_metric_space::similarity(const std::vector<double>& a,
 const std::vector<double>& b) const
{
 static const auto pi = 4*std::atan(1);
 return std::acos(distance(a, b)) / pi;
}

Followed by the second class's methods:

// Euclidean metric space
double euclidean_metric_space::distance(const std::vector<double>& a,
 const std::vector<double>& b) const
{
 verify_metric_vectors(a, b);
 double sq_dist = 0.0;
 auto ia = a.begin();
 auto ib = b.begin();
 for (; ia != a.end(); ia++, ib++) {
 auto dist = *ia - *ib;
 sq_dist += dist * dist;
 }
 return std::sqrt(sq_dist);
}
double euclidean_metric_space::similarity(const std::vector<double>& a,
 const std::vector<double>& b) const
{
 return 1.0 / distance(a, b);
}

See how the two implementations are now no longer entwined with each other, and we can reason about each one individually? And if we added a third, we could do that without any risk of breaking the existing ones.

Toby Speight Toby Speight 87.1k14 gold badges104 silver badges322 bronze badges · Accepted Answer · 2017-03-28 13:16:05Z

Prefer C++ headers

If you #include <cmath> instead of <math.h>, the mathematical functions will be safely and unambiguously in the std namespace.

Pass read-only parameters by const reference

Instead of copying the input vectors, it's usually more efficient to pass const references to them:

double cosine(const std::vector<double>& A, const std::vector<double>& B, bool similarity);
double euclidean(const std::vector<double>& A, const std::vector<double>& B, bool similarity);
double distance(const std::vector<double>& A, const std::vector<double>& B,
 Metric metric, bool similarity = false);

Use `auto` to deduce types

Instead of writing

std::vector<double>::iterator A_iter = A.begin();
std::vector<double>::iterator B_iter = B.begin();

We can ask the compiler to deduce the correct type:

auto A_iter = A.begin();
auto B_iter = B.begin();

This also makes it easier if you ever want to change the type of A and B (perhaps even to a template argument type). In fact, by changing to const references, we already changed the type (to std::vector<double>::const_iterator), but auto isolates us from that.

Use the right kind of exception

You could be more specific, and use std::invalid_argument for the error type here:

if (d_a == 0.0 || d_b == 0.0) {
 throw std::invalid_argument("Vectors must have a magnitude greater than zero");
}

I've changed the wording above, because length usually means the number of elements when we're using standard collections.

I also changed the zero constant to be a double, the same as d_a and d_b, since it will get promoted for the comparison anyway. Only use float when you really need the reduced precision.

Don't compare booleans against `false`

This is hard to read:

if (similarity == false) {
 // negative case
} else {
 // positive case
}

You can re-write it as

if (similarity) {
 // positive case
} else {
 // negative case
}

This will normally produce identical code, but the cognitive overhead is reduced.

Avoid non-standard extensions

Although many implementations define a constant called M_PI, this isn't specified by any standard. Thankfully it's easy to make your own value, using the knowledge that tan(1⁄4π) is 1:

static const auto pi = 4*std::atan(1);

Avoid `std::pow(x, 2)`

(削除) To square a number, it's simpler and more efficient to multiply it by itself. pow() is very general (handling non-integer powers), and usually works by multiplication of logarithms. (削除ここまで)

Even though std::pow() has overloads for integer powers, I find that for squaring, it's easier to read a simple multiplication rather than a function call. It's worth introducing a temporary to avoid that:

 //sq_dist += pow(*A_iter - *B_iter, 2);
 auto dist = *A_iter - *B_iter;
 sq_dist += dist * dist;

Don't fall off the end of non-void functions

Here, we're assuming that we're always passed a valid enum constant:

switch (metric) {
 case Metric::Cosine:
 return cosine(A, B, similarity);
 case Metric::Euclidean:
 return euclidean(A, B, similarity);
}
// implicit return

Robust code checks that we don't flow out of the switch:

switch (metric) {
 case Metric::Cosine:
 return cosine(A, B, similarity);
 case Metric::Euclidean:
 return euclidean(A, B, similarity);
}
throw std::invalid_argument("Invalid metric");

Invalid enum values shouldn't happen, but it costs little to be defensive against a mistaken cast.

Design issues

It's not clear whether cosine() and euclidean() are intended to be part of the public interface. If they are not, then give them internal linkage (e.g. by use of the anonymous namespace, or with the static keyword), and consider moving the similarity parameter out of them.

I prefer to avoid booleans that change the meaning of functions like that; see if you agree when you see my full example.

Here's a full version with the above issues all addressed (and assuming that cosine() and euclidean() are intended to be internal detail).

Declarations

This is what I'd put in the header file (let's call it similarity.h):

#ifndef SIMILARITY_H
#define SIMILARITY_H
#include <vector>
enum class Metric { Cosine, Euclidean };
double distance(const std::vector<double>&, const std::vector<double>&, Metric);
double similarity(const std::vector<double>&, const std::vector<double>&, Metric);
#endif

Implementation

Having included the header, here's the C++ file:

#include "similarity.h"
#include <cmath>
#include <stdexcept>
namespace {
 double cosine_distance(const std::vector<double>& a, const std::vector<double>& b)
 {
 double mul = 0.0;
 double d_a = 0.0;
 double d_b = 0.0;
 auto ia = a.begin();
 auto ib = b.begin();
 for (; ia != a.end(); ia++, ib++) {
 mul += *ia * *ib;
 d_a += *ia * *ia;
 d_b += *ib * *ib;
 }
 if (d_a == 0.0 || d_b == 0.0) {
 throw std::invalid_argument("Vectors must have a magnitude greater than zero");
 }
 return mul / std::sqrt(d_a * d_b);
 }
 double euclidean_distance(const std::vector<double>& a, const std::vector<double>& b)
 {
 double sq_dist = 0.0;
 auto ia = a.begin();
 auto ib = b.begin();
 for (; ia != a.end(); ia++, ib++) {
 auto dist = *ia - *ib;
 sq_dist += dist * dist;
 }
 return std::sqrt(sq_dist);
 }
}
double distance(const std::vector<double>& a, const std::vector<double>& b, Metric metric)
{
 if (a.size() != b.size())
 throw std::invalid_argument("a and b must be the same size!");
 if (a.size() < 1)
 throw std::invalid_argument("Empty vectors belong to no metric space");
 switch (metric) {
 case Metric::Cosine:
 return cosine_distance(a, b);
 case Metric::Euclidean:
 return euclidean_distance(a, b);
 }
 throw std::invalid_argument("Invalid metric");
}
double similarity(const std::vector<double>& a, const std::vector<double>& b, Metric metric)
{
 static const auto pi = 4*std::atan(1);
 switch (metric) {
 case Metric::Cosine:
 return std::acos(distance(a, b, metric)) / pi;
 case Metric::Euclidean:
 return 1.0 / distance(a, b, metric);
 }
 throw std::invalid_argument("Invalid metric");
}

Note that I include our own header first. That's a helpful approach, to improve our chances of discovering if it accidentally depends on something it doesn't explicitly include.

There is nothing wrong with std::pow(x, 2), it has overloads for integer exponents. As you can see here, the compiler will happily optimize that to a single multiplication, even at just -O1.
Thanks for the heads-up there @G.Sliepen. I'd complete overlooked that, as a former C programmer.

Stack Exchange Network

Proper practice using enums in header files

3 Answers 3

Prefer C++ headers

Pass read-only parameters by const reference

Use `auto` to deduce types

Use the right kind of exception

Don't compare booleans against `false`

Avoid non-standard extensions

Avoid `std::pow(x, 2)`

Don't fall off the end of non-void functions

Design issues

Declarations

Implementation

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Proper practice using enums in header files

3 Answers 3

Prefer C++ headers

Pass read-only parameters by const reference

Use auto to deduce types

Use the right kind of exception

Don't compare booleans against false

Avoid non-standard extensions

Avoid std::pow(x, 2)

Don't fall off the end of non-void functions

Design issues

Declarations

Implementation

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions

Use `auto` to deduce types

Don't compare booleans against `false`

Avoid `std::pow(x, 2)`