Function for splitting an integer into smaller values

Question 1

This is the function I've implemented in an answer to this question. I've tried to make this as simple and idiomatic as possible, using C++11. Could this still be improved in any way?

#include <algorithm>
#include <cstdint>
#include <iostream>
#include <iterator> 
#include <vector>
typedef std::vector<int> SplitValues;
SplitValues splitter(std::int32_t number)
{
 SplitValues values(8);
 for (auto& iter : values)
 {
 iter = static_cast<int>(number & 0xF);
 number >>= 4;
 }
 return values;
}
int main()
{
 std::int32_t number = 432214123;
 SplitValues values = splitter(number);
 std::reverse_copy(values.begin(), values.end(),
 std::ostream_iterator<int>(std::cout, " "));
}

Test run

Questions:

Is SplitValues an appropriate name for a typedef, or does it sound more like a variable?
If I pass the hard-coded number instead of the number variable, will it still be treated as the same type as the parameter?

Question 2

The first problem I see is that you're doing a right shift on a signed integer type (std::int32_t).

If a negative value is passed, the result will be implementation defined. When you're going to treat something as a collection of bits instead of treating the whole thing together as a number, you usually want to use an unsigned type, not signed.

I'd also prefer to get rid of the magic (and inter-related) numbers sprinkled throughout the code, such as the 8, 4 and the 0xf. These three are inter-related--the 8 being the number of 4-bit fields contained in a single 32-bit item, and 0xf being a mask with 4 bits set. I'd prefer to specify only one of these values in one place, and compute the others from that one.

As to the specific questions you've asked:

I don't think SplitValues does a good job of indicating a type. This has been a problem for a long time, and I don't think I have a silver bullet to finally put it to rest. In this case, I'd rather just avoid it completely (about which, more below).
If you pass a literal to a function instead of passing a variable, that literal will start with some type--in this case, since it's digits and can fit in an int, its type will be int. That will then be converted to the function's parameter type (std::int32_t). In this case, at least on a typical compiler where int is a 32-bit type, that "conversion" will really be a NOP because the source and target types are basically identical.

As to how to avoid the problem with having to define names, C++ has two mechanisms that can help quite a bit: templates (especially function templates) and auto. Both of these can deduce a result type based on input types, to avoid having to explicitly specify types throughout much of the code.

I'd also generally prefer to write splitter as a generic algorithm. While we're at it, its name should be changed from a noun to a verb. Non-functor classes should have nouns for names, but functors and functions do things, so their names should normally be verbs.

template <classs T, class OutIt>
void split(T number, OutIt result) {
 unsigned bits = 4;
 unsigned mask = (1<<bits) - 1;
 unsigned iter_count = sizeof(T) * CHAR_BIT / bits;
 for (unsigned i=0; i<iter_count; i++) {
 *result++ = number & mask;
 number >>= bits;
 }
}

This part is a little longer than the code in your answer, but most of that extra length stems from specifying the number of bits, then computing the mask and iteration count from that shift value instead of having interrelated magic numbers throughout.

With this code, we can change the type of the input (e.g., to a 64-bit unsigned long long) or the shift count (e.g., to 8 bits) independently of each other, without having to patch the rest of the code to compensate.

Although it's a rather minor point, all else being equal, I prefer to avoid creating variables with default values, then filling in the real values later. This is fairly innocuous when the values involved are at least initialized as they are in a vector, but it's still better to just initialize values when possible.

That leads us to the calling code. Since we changed the function to look like a standard algorithm, we need to change the calling code to suit.

int main() { 
 std::deque<int> values;
 split(432214123, std::front_inserter(values));
 std::copy(values.begin(), values.end(), std::ostream_iterator<int>(std::cout, " "));
}

Of course, there are a lot of other ways this job could be handled, but I guess that's enough for the moment anyway.

Question 3

So I should be using std::uint32_t here, then?

Question 4

@Jamal: Yes (probably anyway).

Question 5

Should I still cast to an int, or should it be a signed type from <cstdint>?

Question 6

@Jamal: Doesn't really matter--you're grabbing only 4 bits at a time, and you're guaranteed that every standard integer type can represent at least 7 bits worth of non-negative numbers.

Question 7

@Jamal - no, you should not use std::uint32_t here. Use unsigned long, which will exist on all systems and will always have at least 32 bits. There is no need to restrict your code to systems that have a built-in 32-bit integral type.

Question 8

I don't think that splitting into a vector is a right way to go. I'd rather opt for an operator[], along the lines of

struct Splitter {
 unsigned long value;
 Splitter(unsigned long value): value(value) {}
 const unsigned long operator[](int i) const {
 return (value >> (i*4)) & 0x0f;
 }
};

with all necessary bells and whistles added upon request.

Jerry Coffin Jerry Coffin 34.1k4 gold badges77 silver badges144 bronze badges · Accepted Answer · 2014-06-18 04:37:22Z

The first problem I see is that you're doing a right shift on a signed integer type (std::int32_t).

If a negative value is passed, the result will be implementation defined. When you're going to treat something as a collection of bits instead of treating the whole thing together as a number, you usually want to use an unsigned type, not signed.

I'd also prefer to get rid of the magic (and inter-related) numbers sprinkled throughout the code, such as the 8, 4 and the 0xf. These three are inter-related--the 8 being the number of 4-bit fields contained in a single 32-bit item, and 0xf being a mask with 4 bits set. I'd prefer to specify only one of these values in one place, and compute the others from that one.

As to the specific questions you've asked:

I don't think SplitValues does a good job of indicating a type. This has been a problem for a long time, and I don't think I have a silver bullet to finally put it to rest. In this case, I'd rather just avoid it completely (about which, more below).
If you pass a literal to a function instead of passing a variable, that literal will start with some type--in this case, since it's digits and can fit in an int, its type will be int. That will then be converted to the function's parameter type (std::int32_t). In this case, at least on a typical compiler where int is a 32-bit type, that "conversion" will really be a NOP because the source and target types are basically identical.

As to how to avoid the problem with having to define names, C++ has two mechanisms that can help quite a bit: templates (especially function templates) and auto. Both of these can deduce a result type based on input types, to avoid having to explicitly specify types throughout much of the code.

I'd also generally prefer to write splitter as a generic algorithm. While we're at it, its name should be changed from a noun to a verb. Non-functor classes should have nouns for names, but functors and functions do things, so their names should normally be verbs.

template <classs T, class OutIt>
void split(T number, OutIt result) {
 unsigned bits = 4;
 unsigned mask = (1<<bits) - 1;
 unsigned iter_count = sizeof(T) * CHAR_BIT / bits;
 for (unsigned i=0; i<iter_count; i++) {
 *result++ = number & mask;
 number >>= bits;
 }
}

This part is a little longer than the code in your answer, but most of that extra length stems from specifying the number of bits, then computing the mask and iteration count from that shift value instead of having interrelated magic numbers throughout.

With this code, we can change the type of the input (e.g., to a 64-bit unsigned long long) or the shift count (e.g., to 8 bits) independently of each other, without having to patch the rest of the code to compensate.

Although it's a rather minor point, all else being equal, I prefer to avoid creating variables with default values, then filling in the real values later. This is fairly innocuous when the values involved are at least initialized as they are in a vector, but it's still better to just initialize values when possible.

That leads us to the calling code. Since we changed the function to look like a standard algorithm, we need to change the calling code to suit.

int main() { 
 std::deque<int> values;
 split(432214123, std::front_inserter(values));
 std::copy(values.begin(), values.end(), std::ostream_iterator<int>(std::cout, " "));
}

Of course, there are a lot of other ways this job could be handled, but I guess that's enough for the moment anyway.

Should I still cast to an int, or should it be a signed type from <cstdint>?
@Jamal: Doesn't really matter--you're grabbing only 4 bits at a time, and you're guaranteed that every standard integer type can represent at least 7 bits worth of non-negative numbers.
@Jamal - no, you should not use std::uint32_t here. Use unsigned long, which will exist on all systems and will always have at least 32 bits. There is no need to restrict your code to systems that have a built-in 32-bit integral type.

Stack Exchange Network

Function for splitting an integer into smaller values

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Function for splitting an integer into smaller values

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions