Despite the intention is to use it in a C++ project I put the c
tag also because there is nothing C++-only (except references :)) in the code (I did not compile it by a C compiler though).
Preliminaries
I'm working on an automation system the main function of which is to control positional table. The table has 5 degrees of freedom each of which has a stepper motor attached to. Also I have the Arduino Nano board to control the motors. The communication between Arduino and PC is carried out by means of commands. Some commands contain numbers. An example of a command :
mz>100
where
m
- means movez
- degree of freedom (motion along the z axis)>
- means clockwise100
- number of steps
These are some objectives useful to better understand the context:
- Other numeric fields are also positive so
uint32_t
. - I want to be able to easily write and debug commands manually so ASCII.
- Also a number can only be at the end (the last token) of a command.
Signatures
Serialization :
bool sr( uint32_t number, uint8_t* buff, const uint8_t size, uint8_t& pos)
number
- number to serializebuff
- buffer to serialize tosize
- size ofbuff
pos
- number of used bytes to store the result
The role of pos
is to be able to use this function in a cascaded maner so having an overload for other types: sr( number, ..., pos ); sr( enum, ..., pos ); ...
Deserialization is similar :
bool dsr( uint32_t& number, const uint8_t* buff, const uint8_t size, uint8_t pos )
pos
- initial character of the number inbuff
Review Context
This code is intended for a -std=c++17 compiler.
Code
#define MAX_UINT32 0xffffffffU
#define MAX_UINT32_STR "4294967295"
#define BUFF_SIZE 20
bool sr( uint32_t number,
uint8_t* buff,
const uint8_t size,
uint8_t& pos )
{
uint32_t maxDivisor = 1;
for( ; number / maxDivisor >= 10; maxDivisor *= 10 ) { }
for( uint32_t divisor = maxDivisor; divisor > 0; divisor /= 10 )
{
if( pos >= size )
return false;
uint8_t digit = (number / divisor);
number = number % divisor;
buff[pos] = digit + '0';
pos++;
}
return true;
}
bool dsr( uint32_t& number,
const uint8_t* buff,
const uint8_t size,
uint8_t pos )
{
number = 0;
uint32_t maxNumberOfTens = MAX_UINT32 / 10;
uint32_t maxLastDigit = MAX_UINT32 % 10;
for( ; pos < size; pos++ )
{
uint8_t digit = buff[pos];
if( digit < '0' or digit > '9' )
return false;
digit -= '0';
if( number > maxNumberOfTens )
return false;
if( number == maxNumberOfTens and digit > maxLastDigit )
return false;
number = number * 10 + digit;
}
return true;
}
Test
Tested both in C++ and Arduino :
bool compare( const char* string, const uint8_t* buff, const uint8_t size )
{
if( size != strlen( string ) )
return false;
for( uint8_t i = 0; i < size; ++i )
if( buff[i] != string[i] )
return false;
return true;
}
void testSR( uint32_t input, const char* output )
{
uint8_t buff[BUFF_SIZE];
uint8_t pos = 0;
assert( sr( input, buff, BUFF_SIZE, pos ) );
assert( pos == strlen(output) );
assert( compare( output, buff, pos ) );
}
void testDSR( const char* input, const uint32_t output, const bool expected )
{
uint32_t result;
uint8_t buff[BUFF_SIZE] = { 0 };
if( not expected )
{
assert( not dsr( result, buff, BUFF_SIZE, 0 ) );
}
else
{
uint8_t i = 0;
for( ; i < strlen( input ); ++i )
buff[i] = input[i];
assert( dsr( result, buff, i, 0 ) );
assert( result == output );
}
}
int main()
{
testSR( 0U, "0" );
testSR( MAX_UINT32, MAX_UINT32_STR );
testSR( 9, "9" );
testSR( 10, "10" );
testSR( 999, "999" );
testSR( 10000, "10000" );
testSR( 1234567890, "1234567890" );
testSR( 1234567890, "1234567890" );
testDSR( "0", 0, true );
testDSR( MAX_UINT32_STR, MAX_UINT32, true );
testDSR( "9", 9, true );
testDSR( "10", 10, true );
testDSR( "999", 999, true );
testDSR( "10000", 10000, true );
testDSR( "1234567890", 1234567890, true );
testDSR( "-1", 0, false );
testDSR( "100a100", 100100, false );
testDSR( "12345678901", 0, false );
testDSR( "4294967296", 4294967295, false );
}
The sr
and dsr
functions are compiled to ~ 1.5 KB so not so much :)
Any help will be appreciated ! Thanks in advance !
1 Answer 1
design of Public API
bool sr( ... )
That is the Public API that you chose to define.
That's what you're telling potential callers about your
single responsibility.
No comments, and hardly any name, not even as much as ser
.
A cryptic 2-character identifier can be a good
local variable name when surrounded by enough context.
But put yourself in the shoes of some poor maintenance
engineer who is staring at an ld
diagnostic
about the global symbol sr
not being found.
WTF?
This is just the wrong name for a public interface,
it is too short.
As an engineer who is considering calling into this library routine, how would I know what promise it makes, how could I verify that it did the Right Thing? You haven't described it at a high level at all. It simply does what the low level instructions do, that's the entire contract offered to a caller, it's up to caller to verify those instructions work as intended.
Even if you're not going to run Doxygen,
you might offer a ///
sentence of explanation,
just in case someone else runs it.
const uint8_t size,
uint8_t& pos )
Rather than uint8_t
, please just call it a size_t
.
What's relevant is that it's a buffer offset,
not that it fits within a byte.
Your C++ compiler can readily prove that BUFF_SIZE constrains its range.
On which topic, rather than use a second language (C preprocessor),
please write constexpr
for such constants.
The choice of 20
is a bit arbitrary, with no comment explaining it.
It seems indirectly related to a specification for "3-char + digits"
commands.
I was expecting 3 + 7 == 10
bytes allocated for a valid command.
But maybe commands allow optional whitespace which
should be skipped over?
Or maybe some components in this system
are willing to process a buffer containing an invalid command,
which they then reject?
In which case we'd want to know about max size of invalid command.
The specification seems too loose to be able to
effectively reason about correctness of the code.
I'm sad that we're missing some Review Context, like the #include's.
buffer bounds
The (buff, size, pos) tuple isn't used consistently
between serialize and deserialize.
It makes sense on the serialize side.
When deserializing it is surprising to see size < 20
,
which describes end-of-number (since there's no NUL terminator).
Consider using a NUL terminator. Alternatively, discuss the design rationale in a comment. I'm concerned the lack of symmetry in the API may lead to maintenance engineers accidentally introducing code defects.
buffer description
Consider removing the size
parameter,
and storing it at the beginning of each buffer.
log base 10
for( ; number / maxDivisor >= 10; maxDivisor *= 10 ) { }
Would you please Extract Helper?
Then we could separately ///
document, and unit test, this fragment.
uint8_t digit = (number / divisor);
number = number % divisor;
Naïvely this suggests doing the same division operation twice. I imagine the compiler optimized one of them away? If https://godbolt.org says otherwise, you may wish to rephrase this. Similarly, some implementors choose repeated multiplies instead of slow divides when computing logarithm.
We don't NUL terminate the buffer?!? Ok.
Then the pos
position is really more of a len
length.
branch mis-predictions
if( digit < '0' or digit > '9' )
We don't like isdigit()? OK. Do consider using it, as it is table-driven, therefore it never causes an Intel CPU to mis-predict.
test suite
Thank you kindly for including automated tests. They increase our confidence in the target code.
I like the serialization test, and its edge cases. However, it only tests Happy Path. It does not e.g. offer a too-small buffer and verify a False return. It could also offer a few more cases near 2^32. Not sure why we test 1234567890 twice.
I do like the not expected
clause for deserialize testing.
Roundtripping through serialize / deserialize, and looping through generated test values, would be a natural extension to the existing "constant" tests.
This code achieves its design goals. It does not appear to suffer from UB.
I would be willing to delegate or accept maintenance tasks on it.
-
2\$\begingroup\$ Instead of the helper function, I'd suggest using the standard div() that "computes both the quotient and the remainder of the division of the numerator x by the denominator y." \$\endgroup\$Madagascar– Madagascar2024年02月27日 09:38:58 +00:00Commented Feb 27, 2024 at 9:38
-
\$\begingroup\$ First of all, thank you very much for your answer. Second of all, I should say that I developed this in the context of my project so some things are dependend on the environment I use there. Here is my response. design of public API. Documentation is, of course, to be done. I should have given some contract indeed: one of the points is that
buff
cannot be NULL.uint8_t
as a holder for the position in the buffer. I use this lib (developed also for my project primarily): github.com/LRDPRDX/DataLinkSerialProtocol . About macros instead ofconstexpr
- agreed also...contd \$\endgroup\$LRDPRDX– LRDPRDX2024年02月27日 16:10:04 +00:00Commented Feb 27, 2024 at 16:10 -
\$\begingroup\$ I think if I had chosen
constexpr
constants I would have had to put them in some namespace and so on. I just wanted to focus on the serialization algorithm. Again - I agree with this point. About the buff size:20
is just a "random" number. Just long enough to store everything :) \$\endgroup\$LRDPRDX– LRDPRDX2024年02月27日 16:13:46 +00:00Commented Feb 27, 2024 at 16:13 -
\$\begingroup\$ log base 10. Yes I think I could have used a better approach. I thought about it: at least I could have used subtraction to eliminate the first division (using the resutl of the second division). Yes I don't need to terminate the string with NUL. I didn't understand your statement about the
pos
at the end of this seciotn, sorry. branch misprediction. Using theisdigit
. Yes, probably I should have. test suite. I agree with everything here. Thank you. \$\endgroup\$LRDPRDX– LRDPRDX2024年02月27日 16:20:22 +00:00Commented Feb 27, 2024 at 16:20 -
\$\begingroup\$ buffer bounds. I haven't used it yet in the field (I mean my project) and maybe you're right about this. Let me think a bit. Maybe indeed it would be better to store the length in front of the number string. \$\endgroup\$LRDPRDX– LRDPRDX2024年02月27日 16:30:56 +00:00Commented Feb 27, 2024 at 16:30
c
tag :) \$\endgroup\$strto...
familiy I think I could. What others standard functions did you mean ? Could you provide the names ? Note, I want to use it both at the MC and at the PC levels. More over, I want to serialize other data such as structs and therefore it is better IMHO to have similar signatures, for example. And one more thing, I want it be exactlyuint32_t
notlong
orunsigned
. \$\endgroup\$<charconv>
, functions likefrom_chars
andto_chars
. \$\endgroup\$