De/Serialize uint32_t from/to ASCII to use it in Arduino code

Question 1

Despite the intention is to use it in a C++ project I put the c tag also because there is nothing C++-only (except references :)) in the code (I did not compile it by a C compiler though).

Preliminaries

I'm working on an automation system the main function of which is to control positional table. The table has 5 degrees of freedom each of which has a stepper motor attached to. Also I have the Arduino Nano board to control the motors. The communication between Arduino and PC is carried out by means of commands. Some commands contain numbers. An example of a command :

mz>100

where

m - means move
z - degree of freedom (motion along the z axis)
> - means clockwise
100 - number of steps

These are some objectives useful to better understand the context:

Other numeric fields are also positive so uint32_t.
I want to be able to easily write and debug commands manually so ASCII.
Also a number can only be at the end (the last token) of a command.

Signatures

Serialization :

bool sr( uint32_t number, uint8_t* buff, const uint8_t size, uint8_t& pos)

number - number to serialize
buff - buffer to serialize to
size - size of buff
pos - number of used bytes to store the result

The role of pos is to be able to use this function in a cascaded maner so having an overload for other types: sr( number, ..., pos ); sr( enum, ..., pos ); ...

Deserialization is similar :

bool dsr( uint32_t& number, const uint8_t* buff, const uint8_t size, uint8_t pos )

pos - initial character of the number in buff

Review Context

This code is intended for a -std=c++17 compiler.

Code

#define MAX_UINT32 0xffffffffU
#define MAX_UINT32_STR "4294967295"
#define BUFF_SIZE 20
bool sr( uint32_t number,
 uint8_t* buff,
 const uint8_t size,
 uint8_t& pos )
{
 uint32_t maxDivisor = 1;
 for( ; number / maxDivisor >= 10; maxDivisor *= 10 ) { }
 for( uint32_t divisor = maxDivisor; divisor > 0; divisor /= 10 )
 {
 if( pos >= size )
 return false;
 uint8_t digit = (number / divisor);
 number = number % divisor;
 buff[pos] = digit + '0';
 pos++;
 }
 return true;
}
bool dsr( uint32_t& number,
 const uint8_t* buff,
 const uint8_t size,
 uint8_t pos )
{
 number = 0;
 uint32_t maxNumberOfTens = MAX_UINT32 / 10;
 uint32_t maxLastDigit = MAX_UINT32 % 10;
 for( ; pos < size; pos++ )
 {
 uint8_t digit = buff[pos];
 if( digit < '0' or digit > '9' )
 return false;
 digit -= '0';
 if( number > maxNumberOfTens )
 return false;
 if( number == maxNumberOfTens and digit > maxLastDigit )
 return false;
 number = number * 10 + digit;
 }
 return true;
}

Test

Tested both in C++ and Arduino :

bool compare( const char* string, const uint8_t* buff, const uint8_t size )
{
 if( size != strlen( string ) )
 return false;
 for( uint8_t i = 0; i < size; ++i )
 if( buff[i] != string[i] )
 return false;
 return true;
}
void testSR( uint32_t input, const char* output )
{
 uint8_t buff[BUFF_SIZE];
 uint8_t pos = 0;
 assert( sr( input, buff, BUFF_SIZE, pos ) );
 assert( pos == strlen(output) );
 assert( compare( output, buff, pos ) );
}
void testDSR( const char* input, const uint32_t output, const bool expected )
{
 uint32_t result;
 uint8_t buff[BUFF_SIZE] = { 0 };
 if( not expected )
 {
 assert( not dsr( result, buff, BUFF_SIZE, 0 ) );
 }
 else
 {
 uint8_t i = 0;
 for( ; i < strlen( input ); ++i )
 buff[i] = input[i];
 assert( dsr( result, buff, i, 0 ) );
 assert( result == output );
 }
}

int main()
{
 testSR( 0U, "0" );
 testSR( MAX_UINT32, MAX_UINT32_STR );
 testSR( 9, "9" );
 testSR( 10, "10" );
 testSR( 999, "999" );
 testSR( 10000, "10000" );
 testSR( 1234567890, "1234567890" );
 testSR( 1234567890, "1234567890" );
 testDSR( "0", 0, true );
 testDSR( MAX_UINT32_STR, MAX_UINT32, true );
 testDSR( "9", 9, true );
 testDSR( "10", 10, true );
 testDSR( "999", 999, true );
 testDSR( "10000", 10000, true );
 testDSR( "1234567890", 1234567890, true );
 testDSR( "-1", 0, false );
 testDSR( "100a100", 100100, false );
 testDSR( "12345678901", 0, false );
 testDSR( "4294967296", 4294967295, false );
}

The sr and dsr functions are compiled to ~ 1.5 KB so not so much :)

Any help will be appreciated ! Thanks in advance !

Question 2

"C++-only (except references)" ===> That makes it non-C code.

Question 3

@Harith, agreed. I think it transforms to C very easy. I also think it can be useful for those who searches for the same in C. Honestly while searching for a solution I found much of C code which wasn't complete in one way or another. If you insist I can remove the c tag :)

Question 4

Could you please explain why any of the standard C++ functions were not chosen?

Question 5

@Incomputable what are those ? If you're talking about the strto... familiy I think I could. What others standard functions did you mean ? Could you provide the names ? Note, I want to use it both at the MC and at the PC levels. More over, I want to serialize other data such as structs and therefore it is better IMHO to have similar signatures, for example. And one more thing, I want it be exactly uint32_t not long or unsigned.

Question 6

@LRDPRDX there is also <charconv>, functions like from_chars and to_chars.

Question 7

design of Public API

bool sr( ... )

That is the Public API that you chose to define. That's what you're telling potential callers about your single responsibility. No comments, and hardly any name, not even as much as ser. A cryptic 2-character identifier can be a good local variable name when surrounded by enough context. But put yourself in the shoes of some poor maintenance engineer who is staring at an ld diagnostic about the global symbol sr not being found. WTF? This is just the wrong name for a public interface, it is too short.

As an engineer who is considering calling into this library routine, how would I know what promise it makes, how could I verify that it did the Right Thing? You haven't described it at a high level at all. It simply does what the low level instructions do, that's the entire contract offered to a caller, it's up to caller to verify those instructions work as intended.

Even if you're not going to run Doxygen, you might offer a /// sentence of explanation, just in case someone else runs it.

 const uint8_t size,
 uint8_t& pos )

Rather than uint8_t, please just call it a size_t. What's relevant is that it's a buffer offset, not that it fits within a byte. Your C++ compiler can readily prove that BUFF_SIZE constrains its range.

On which topic, rather than use a second language (C preprocessor), please write constexpr for such constants.

The choice of 20 is a bit arbitrary, with no comment explaining it. It seems indirectly related to a specification for "3-char + digits" commands. I was expecting 3 + 7 == 10 bytes allocated for a valid command. But maybe commands allow optional whitespace which should be skipped over? Or maybe some components in this system are willing to process a buffer containing an invalid command, which they then reject? In which case we'd want to know about max size of invalid command. The specification seems too loose to be able to effectively reason about correctness of the code.

I'm sad that we're missing some Review Context, like the #include's.

buffer bounds

The (buff, size, pos) tuple isn't used consistently between serialize and deserialize. It makes sense on the serialize side. When deserializing it is surprising to see size < 20, which describes end-of-number (since there's no NUL terminator).

Consider using a NUL terminator. Alternatively, discuss the design rationale in a comment. I'm concerned the lack of symmetry in the API may lead to maintenance engineers accidentally introducing code defects.

buffer description

Consider removing the size parameter, and storing it at the beginning of each buffer.

log base 10

 for( ; number / maxDivisor >= 10; maxDivisor *= 10 ) { }

Would you please Extract Helper? Then we could separately /// document, and unit test, this fragment.

 uint8_t digit = (number / divisor);
 number = number % divisor;

Naïvely this suggests doing the same division operation twice. I imagine the compiler optimized one of them away? If https://godbolt.org says otherwise, you may wish to rephrase this. Similarly, some implementors choose repeated multiplies instead of slow divides when computing logarithm.

We don't NUL terminate the buffer?!? Ok. Then the pos position is really more of a len length.

branch mis-predictions

 if( digit < '0' or digit > '9' )

We don't like isdigit()? OK. Do consider using it, as it is table-driven, therefore it never causes an Intel CPU to mis-predict.

test suite

Thank you kindly for including automated tests. They increase our confidence in the target code.

I like the serialization test, and its edge cases. However, it only tests Happy Path. It does not e.g. offer a too-small buffer and verify a False return. It could also offer a few more cases near 2^32. Not sure why we test 1234567890 twice.

I do like the not expected clause for deserialize testing.

Roundtripping through serialize / deserialize, and looping through generated test values, would be a natural extension to the existing "constant" tests.

This code achieves its design goals. It does not appear to suffer from UB.

I would be willing to delegate or accept maintenance tasks on it.

Question 8

Instead of the helper function, I'd suggest using the standard div() that "computes both the quotient and the remainder of the division of the numerator x by the denominator y."

Question 9

First of all, thank you very much for your answer. Second of all, I should say that I developed this in the context of my project so some things are dependend on the environment I use there. Here is my response. design of public API. Documentation is, of course, to be done. I should have given some contract indeed: one of the points is that buff cannot be NULL. uint8_t as a holder for the position in the buffer. I use this lib (developed also for my project primarily): github.com/LRDPRDX/DataLinkSerialProtocol . About macros instead of constexpr - agreed also...contd

Question 10

I think if I had chosen constexpr constants I would have had to put them in some namespace and so on. I just wanted to focus on the serialization algorithm. Again - I agree with this point. About the buff size: 20 is just a "random" number. Just long enough to store everything :)

Question 11

log base 10. Yes I think I could have used a better approach. I thought about it: at least I could have used subtraction to eliminate the first division (using the resutl of the second division). Yes I don't need to terminate the string with NUL. I didn't understand your statement about the pos at the end of this seciotn, sorry. branch misprediction. Using the isdigit. Yes, probably I should have. test suite. I agree with everything here. Thank you.

Question 12

buffer bounds. I haven't used it yet in the field (I mean my project) and maybe you're right about this. Let me think a bit. Maybe indeed it would be better to store the length in front of the number string.

J_H J_H 41.4k3 gold badges38 silver badges157 bronze badges · Answer 1 · 2024-02-26 23:05:05Z

design of Public API

bool sr( ... )

That is the Public API that you chose to define. That's what you're telling potential callers about your single responsibility. No comments, and hardly any name, not even as much as ser. A cryptic 2-character identifier can be a good local variable name when surrounded by enough context. But put yourself in the shoes of some poor maintenance engineer who is staring at an ld diagnostic about the global symbol sr not being found. WTF? This is just the wrong name for a public interface, it is too short.

As an engineer who is considering calling into this library routine, how would I know what promise it makes, how could I verify that it did the Right Thing? You haven't described it at a high level at all. It simply does what the low level instructions do, that's the entire contract offered to a caller, it's up to caller to verify those instructions work as intended.

Even if you're not going to run Doxygen, you might offer a /// sentence of explanation, just in case someone else runs it.

 const uint8_t size,
 uint8_t& pos )

Rather than uint8_t, please just call it a size_t. What's relevant is that it's a buffer offset, not that it fits within a byte. Your C++ compiler can readily prove that BUFF_SIZE constrains its range.

On which topic, rather than use a second language (C preprocessor), please write constexpr for such constants.

The choice of 20 is a bit arbitrary, with no comment explaining it. It seems indirectly related to a specification for "3-char + digits" commands. I was expecting 3 + 7 == 10 bytes allocated for a valid command. But maybe commands allow optional whitespace which should be skipped over? Or maybe some components in this system are willing to process a buffer containing an invalid command, which they then reject? In which case we'd want to know about max size of invalid command. The specification seems too loose to be able to effectively reason about correctness of the code.

I'm sad that we're missing some Review Context, like the #include's.

buffer bounds

The (buff, size, pos) tuple isn't used consistently between serialize and deserialize. It makes sense on the serialize side. When deserializing it is surprising to see size < 20, which describes end-of-number (since there's no NUL terminator).

Consider using a NUL terminator. Alternatively, discuss the design rationale in a comment. I'm concerned the lack of symmetry in the API may lead to maintenance engineers accidentally introducing code defects.

buffer description

Consider removing the size parameter, and storing it at the beginning of each buffer.

log base 10

 for( ; number / maxDivisor >= 10; maxDivisor *= 10 ) { }

Would you please Extract Helper? Then we could separately /// document, and unit test, this fragment.

 uint8_t digit = (number / divisor);
 number = number % divisor;

Naïvely this suggests doing the same division operation twice. I imagine the compiler optimized one of them away? If https://godbolt.org says otherwise, you may wish to rephrase this. Similarly, some implementors choose repeated multiplies instead of slow divides when computing logarithm.

We don't NUL terminate the buffer?!? Ok. Then the pos position is really more of a len length.

branch mis-predictions

 if( digit < '0' or digit > '9' )

We don't like isdigit()? OK. Do consider using it, as it is table-driven, therefore it never causes an Intel CPU to mis-predict.

test suite

Thank you kindly for including automated tests. They increase our confidence in the target code.

I like the serialization test, and its edge cases. However, it only tests Happy Path. It does not e.g. offer a too-small buffer and verify a False return. It could also offer a few more cases near 2^32. Not sure why we test 1234567890 twice.

I do like the not expected clause for deserialize testing.

Roundtripping through serialize / deserialize, and looping through generated test values, would be a natural extension to the existing "constant" tests.

This code achieves its design goals. It does not appear to suffer from UB.

I would be willing to delegate or accept maintenance tasks on it.

Instead of the helper function, I'd suggest using the standard div() that "computes both the quotient and the remainder of the division of the numerator x by the denominator y."
First of all, thank you very much for your answer. Second of all, I should say that I developed this in the context of my project so some things are dependend on the environment I use there. Here is my response. design of public API. Documentation is, of course, to be done. I should have given some contract indeed: one of the points is that buff cannot be NULL. uint8_t as a holder for the position in the buffer. I use this lib (developed also for my project primarily): github.com/LRDPRDX/DataLinkSerialProtocol . About macros instead of constexpr - agreed also...contd
I think if I had chosen constexpr constants I would have had to put them in some namespace and so on. I just wanted to focus on the serialization algorithm. Again - I agree with this point. About the buff size: 20 is just a "random" number. Just long enough to store everything :)
log base 10. Yes I think I could have used a better approach. I thought about it: at least I could have used subtraction to eliminate the first division (using the resutl of the second division). Yes I don't need to terminate the string with NUL. I didn't understand your statement about the pos at the end of this seciotn, sorry. branch misprediction. Using the isdigit. Yes, probably I should have. test suite. I agree with everything here. Thank you.
buffer bounds. I haven't used it yet in the field (I mean my project) and maybe you're right about this. Let me think a bit. Maybe indeed it would be better to store the length in front of the number string.

Stack Exchange Network

De/Serialize uint32_t from/to ASCII to use it in Arduino code

Preliminaries

Signatures

Review Context

Code

Test

1 Answer 1

design of Public API

buffer bounds

buffer description

log base 10

branch mis-predictions

test suite

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

De/Serialize uint32_t from/to ASCII to use it in Arduino code

Preliminaries

Signatures

Review Context

Code

Test

1 Answer 1

design of Public API

buffer bounds

buffer description

log base 10

branch mis-predictions

test suite

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions