I am a C++ programmer with limited experience.
Supposing I want to use an STL map to store and manipulate some data, I would like to know if there is any meaningful difference (including in performance) between these two approaches:
Choice 1:
map<int, pair<string, bool> >
Choice 2:
struct Ente {
    string name;
    bool flag;
};
map<int, Ente>
Specifically, is there any overhead in using a struct instead of a simple pair?
Choice 1 is OK for small "used only once" things. Essentially, std::pair is still a struct.
As a comment pointed out, choice 1 will lead to really ugly code somewhere down the rabbit hole, like thing.second->first.second->second, and no one really wants to decipher that.
Choice 2 is better for everything else, because it is easier to read what the meaning of the things in the map are. It is also more flexible if you want to change the data (for example when Ente suddenly needs another flag). Performance should not be an issue here.
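To illustrate the flexibility point, here is a minimal sketch assuming Ente later gains a second flag; the field name archived and the helper is_flagged are hypothetical. Code that reads only name and flag keeps compiling unchanged, whereas with choice 1 the pair would have to become a nested pair or a tuple, and every .second access would need revisiting.

```cpp
#include <cassert>
#include <map>
#include <string>

// Ente after a (hypothetical) later requirement added a second flag.
struct Ente {
    std::string name;
    bool flag = false;
    bool archived = false;   // newly added field
};

// Written before "archived" existed; still correct afterwards,
// because it only touches fields by name.
bool is_flagged(const std::map<int, Ente>& m, int k) {
    auto it = m.find(k);
    return it != m.end() && it->second.flag;
}
```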
Performance:
It depends.
In your particular case there will be no performance difference because the two will be similarly laid out in memory.
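A quick way to check the "similarly laid out" claim for the two choices is a pair of static_asserts. This is a sketch: the standard does not spell out every detail of std::pair's layout, so these assertions describe what mainstream implementations do rather than a guarantee. The aliases Choice1Value and Choice2Value are just illustrative names.

```cpp
#include <string>
#include <utility>

struct Ente {
    std::string name;
    bool flag;
};

using Choice1Value = std::pair<std::string, bool>;  // map<int, pair<string, bool>>
using Choice2Value = Ente;                          // map<int, Ente>

// Both hold a std::string followed by a bool, so on mainstream
// implementations they end up with the same size and alignment.
static_assert(sizeof(Choice1Value) == sizeof(Choice2Value), "same size expected");
static_assert(alignof(Choice1Value) == alignof(Choice2Value), "same alignment expected");
```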
In a very specific case (if you were using an empty struct as one of the data members), std::pair<> could potentially make use of the Empty Base Optimization (EBO) and have a smaller size than the equivalent struct. And a smaller size generally means higher performance:
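#include <iostream>
#include <string>
#include <tuple>
#include <utility>

struct Empty {};
struct Thing { std::string name; Empty e; };

int main() {
    std::cout << sizeof(std::string) << "\n";                    // baseline
    std::cout << sizeof(std::tuple<std::string, Empty>) << "\n"; // tuple can apply EBO
    std::cout << sizeof(std::pair<std::string, Empty>) << "\n";  // pair stores both as members
    std::cout << sizeof(Thing) << "\n";                          // plain struct, same as pair
}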
Prints: 32, 32, 40, 40 on ideone.
Note: I am not aware of any implementation that actually uses the EBO trick for regular pairs; however, it is commonly used for tuples.
Readability:
Apart from micro-optimizations, however, a named structure is more ergonomic.
I mean, map[k].first is not that bad, while get<0>(map[k]) is barely intelligible. Contrast that with map[k].name, which immediately indicates what we are reading.
It's all the more important when the types are convertible to one another, since swapping them inadvertently becomes a real concern.
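Here is a small sketch of that swap hazard, using a hypothetical inventory entry of (item id, quantity) where both fields are int. With the pair, nothing but convention says which member is the quantity; with the struct, the field name does, so a mix-up is visible at the point of use.

```cpp
#include <cassert>
#include <utility>

// Hypothetical entry: (item id, quantity). Both are int, so reading
// the wrong member compiles without complaint.
using PairEntry = std::pair<int, int>;

struct Entry {
    int item_id;
    int quantity;
};

int total_quantity_pair(const PairEntry& a, const PairEntry& b) {
    return a.second + b.second;      // was it .first or .second? the type cannot tell you
}

int total_quantity_struct(const Entry& a, const Entry& b) {
    return a.quantity + b.quantity;  // a swap here would stand out in review
}
```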
You might also want to read about structural vs. nominal typing. Ente is a specific type that can only be operated on by things that expect an Ente, whereas anything that can operate on std::pair<std::string, bool> can operate on it... even when the std::string or bool does not contain what it expects, because std::pair has no semantics associated with it.
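A minimal sketch of the structural vs. nominal point, with hypothetical helpers describe_pair and describe_ente. describe_pair happily accepts a pair that actually holds a file path and an "exists" flag, while describe_ente only accepts an Ente: a differently named struct with the identical shape (the hypothetical FileStatus below) is rejected at compile time.

```cpp
#include <cassert>
#include <string>
#include <utility>

struct Ente {
    std::string name;
    bool flag;
};

// An unrelated record that happens to have the same shape as Ente.
struct FileStatus {
    std::string path;
    bool exists;
};

// Structural: accepts ANY pair<string, bool>, whatever it means.
std::string describe_pair(const std::pair<std::string, bool>& p) {
    return p.second ? p.first + " (flagged)" : p.first;
}

// Nominal: only an Ente is accepted; describe_ente(FileStatus{...})
// does not compile even though the fields line up.
std::string describe_ente(const Ente& e) {
    return e.flag ? e.name + " (flagged)" : e.name;
}
```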
Maintenance:
In terms of maintenance, pair is the worst: you cannot add a field.
tuple fares better in that regard: as long as you append the new field, all existing fields are still accessed by the same index. That is as inscrutable as before, but at least you don't need to go and update them.
struct is the clear winner. You can add fields wherever you feel like it.
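A sketch of the tuple vs. struct maintenance difference, with hypothetical record types. A generic reader written against the two-field tuple keeps working after a third field is appended, because index 0 still means the same thing; with the struct, a field can be inserted anywhere and reads by name are unaffected (positional brace initialization would still need updating).

```cpp
#include <cassert>
#include <string>
#include <tuple>

// Version 1 of a hypothetical record.
using RecordV1 = std::tuple<std::string, bool>;
// Version 2 appends a field: indices 0 and 1 keep their meaning.
using RecordV2 = std::tuple<std::string, bool, int>;

// Generic reader written against V1; still correct for V2.
template <typename Record>
std::string name_of(const Record& r) {
    return std::get<0>(r);
}

// With a struct, the new field can go anywhere; access is by name.
struct RecordStruct {
    int priority;        // inserted at the front
    std::string name;
    bool flag;
};
```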
In conclusion:
- pair is the worst of both worlds,
- tuple may have a slight edge in a very specific case (empty type),
- use struct.
Note: if you use getters, you can apply the empty base trick yourself without clients having to know about it, as in struct Thing : Empty { std::string name; }. This is why encapsulation is the next topic you should concern yourself with.
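A sketch of that note, assuming Empty stands in for some stateless type (say, a policy or allocator). Thing privately inherits from Empty so the compiler can apply the empty base optimization, and clients go through getters so they never depend on the trick. On the mainstream compilers I am aware of, sizeof(Thing) then equals sizeof(std::string), but that is an observation, not a guarantee.

```cpp
#include <cassert>
#include <string>
#include <utility>

struct Empty {};   // stand-in for a stateless policy/allocator type

// Private inheritance lets the empty base share Thing's address,
// so it typically adds no size; the getters hide that detail.
class Thing : private Empty {
public:
    explicit Thing(std::string name) : name_(std::move(name)) {}
    const std::string& name() const { return name_; }
    const Empty& empty() const { return *this; }   // base is reachable via a getter
private:
    std::string name_;
};
```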
Revolver_Ocelot (Mar 24, 2017): You cannot use EBO for pairs if you are following the Standard. Elements of pair are stored in members first and second; there is no place for Empty Base Optimisation to kick in.
Matthieu M. (Mar 24, 2017): @Revolver_Ocelot: Well, you cannot write a C++ pair that would use EBO, but a compiler could provide a built-in. Since those are supposed to be members, however, it may be observable (checking their addresses, for example), in which case it would not be conforming.
underscore_d (Jan 1, 2019): C++20 adds [[no_unique_address]], which enables the equivalent of EBO for members.
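A sketch of that C++20 attribute applied to the earlier Thing example (Plain and Compact are illustrative names). On GCC and Clang, Compact is typically the same size as std::string, while Plain needs extra space for the empty member plus padding. MSVC, as far as I know, ignores the standard spelling for ABI reasons and offers [[msvc::no_unique_address]] instead, so check what your compiler does.

```cpp
#include <cassert>
#include <string>

struct Empty {};

// Plain member: Empty still occupies at least one byte, plus padding.
struct Plain {
    std::string name;
    Empty e;
};

// C++20: the attribute allows the empty member to overlap other storage.
struct Compact {
    std::string name;
    [[no_unique_address]] Empty e;
};
```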
Pair shines the most when used as the return type of a function together with destructuring, either via std::tie or via C++17 structured bindings. Using std::tie:
#include <map>
#include <tuple>

struct Ente {/*...*/};

int main() {
    std::map<int, Ente> map;
    auto inserted_position = map.end();
    auto was_inserted = false;
    std::tie(inserted_position, was_inserted) = map.emplace(1, Ente{});
    if (!was_inserted) {
        // key 1 was already present; handle that case
    }
}
Using C++17's structured binding:
#include <map>

struct Ente {/*...*/};

int main() {
    std::map<int, Ente> map;
    auto [inserted_position, was_inserted] = map.emplace(1, Ente{});
    if (!was_inserted) {
        // key 1 was already present; handle that case
    }
}
An example of bad std::pair (or std::tuple) usage would be something like this:
#include <cstdint>
#include <string>
#include <tuple>

using player_data = std::tuple<std::string, std::uint64_t, double>;
player_data player{};
/* ... */
auto health = std::get<2>(player);
/* ... */
because when reading std::get<2>(player) it is not clear what is stored at index 2. Readability, making it obvious to the reader what the code is doing, is important. This is much more readable:
#include <cstdint>
#include <string>

struct player_data
{
    std::string name;
    std::uint64_t player_id;
    double current_health;
};
player_data player{};
/* ... */
auto health = player.current_health;
/* ... */
In general, you should think of std::pair and std::tuple as ways to return more than one object from a function. The rule of thumb that I use (and have seen many others use as well) is that objects returned in a std::tuple or std::pair are only "related" within the context of a call to the function that returns them, or in the context of a data structure that links them together (e.g. std::map uses std::pair as its element type). If the relationship exists elsewhere in your code, you should use a struct.
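One illustration of that rule of thumb from the standard library: std::minmax_element returns a std::pair of iterators, because the smallest and largest elements are only related by having been found in the same pass. The wrapper min_and_max below is a hypothetical helper that just dereferences them.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Returns {smallest, largest}; the two values are "related" only
// because one call computed them together. Requires a non-empty v.
std::pair<int, int> min_and_max(const std::vector<int>& v) {
    assert(!v.empty());
    auto [lo, hi] = std::minmax_element(v.begin(), v.end());
    return {*lo, *hi};
}
```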
Related sections of the Core Guidelines:
"std::pair is a struct." std::pair is a template. std::pair<string, bool> is a struct.
pair is entirely devoid of semantics. Nobody reading your code (including you in the future) will know that e.first is the name of something unless you explicitly point it out. I firmly believe that pair was a very poor and lazy addition to std, and that when it was conceived nobody thought "but some day, everybody is going to use this for everything that is two things, and nobody will know what anybody's code means". map iterators aren't a valid exception. ("first" = key and "second" = value... really, std? Really?)
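Since C++17, structured bindings offer a partial remedy for the map case: the loop can bind first and second to local names of your choosing. A minimal sketch, with count_flagged as a hypothetical helper:

```cpp
#include <cassert>
#include <map>
#include <string>

struct Ente {
    std::string name;
    bool flag;
};

// Counts flagged entries. The structured binding gives the map's
// "first" (key) and "second" (value) meaningful local names.
int count_flagged(const std::map<int, Ente>& m) {
    int count = 0;
    for (const auto& [id, ente] : m) {   // id = key, ente = value
        if (ente.flag) ++count;
    }
    return count;
}
```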