I'm implementing a communications protocol in an embedded system and the protocol uses a subset of the ASN.1 BER (Basic Encoding Rules).
In a nutshell, this class simply represents a TLV triplet (Tag, Length, Value) in which the Tag is some label identifying the type of data, the Length is the length of the Value (in bytes) and the Value is simply a set of bytes.
Because it's going to be running on an embedded system, it will not be using any of the following:
- exceptions
- Runtime Type Identification (RTTI)
- most of the Standard Template Library (STL)
- any of the Boost library components
- C++14 (the compiler's not yet smart enough)
Additionally, there are some restrictions built into the project which are important for understanding the context of this class:
- all Tag values fit in a single 8-bit byte
- the maximum message size is 32k; the user of this class must assure this
- minimizing total memory footprint is very important
- assuring adequate performance (speed) is important
- the underlying microprocessor may be 16-bit, or 32-bit or 64-bit
- the underlying microprocessor may be little-endian or big-endian
- there is a macro (not shown) that correctly defines
SHORT_IS_MORE_THAN_TWO_BYTES
for the target architecture. - the
Ber
class will not be derived from (which is why the destructor is not virtual)
I'm interested in a general review, but particularly for clarity and performance of the code. The test code is only for convenience on a computer; the embedded system has no iostream
support.
bertest.cpp
#include <iostream>
#include <iomanip>
#include "ber.h"
std::ostream& operator<<(std::ostream &out, const Ber &b)
{
unsigned short len = b.totallength(b.msglength());
out << "totallength = " << std::dec << len << '\n';
for (unsigned short i=0; i < len; ++i) {
out << std::setfill('0') << std::setw(2) <<
std::hex << (unsigned short)b[i] << ' ';
if (0xf == (i & 0xf)) out << '\n';
}
return out;
}
int main()
{
uint8_t m1[]{ 71, 72, 73, 74, 75 };
Ber message(0x22, sizeof(m1), m1);
std::cout << message << '\n';
Ber message2('e');
Ber message3 = Ber(0x60) + message;
std::cout << message3 << '\n';
for (int i=0; i < 100; ++i) {
message2 += message;
std::cout << message2 << '\n';
}
std::cout << "tag for message2 is " << (int)message2[0] << "\n";
std::cout << "out-of-bounds byte for message2 is " << (int)message2[1000] << "\n";
}
ber.h
#ifndef BER_H
#define BER_H
/*!
* Basic Encoding Rules class for TLV (Tag, Length, Value) messages
*/
class Ber
{
public:
//! constructor
Ber(uint8_t tag, unsigned short length=0, const uint8_t *message=nullptr);
//! copy constructor
Ber(const Ber& b);
//! move constructor
Ber(Ber &&b);
//! destructor
~Ber();
//! returns the number of bytes required to encode the length
static unsigned short encodedlen(unsigned short n);
//! returns the total number of bytes required to store a Ber object with the indicated message length
static unsigned short totallength(unsigned short len);
//! create a new Ber object from an existing one
Ber operator=(const Ber& rhs);
//! append the passed Ber data to the existing Ber object
Ber &operator+=(const Ber& rhs);
//! return the length of the message for this Ber object
unsigned short msglength() const;
//! return the byte at the indicated offset or NUL if out of range
uint8_t operator[](size_t idx) const;
private:
//! encodes and stores the length at **ptr and updates and returns pointer
static uint8_t *lencode(uint8_t **ptr, unsigned short length);
//! internal data pointer
uint8_t * _data;
};
inline Ber operator+(Ber lhs, const Ber& rhs)
{
lhs += rhs;
return lhs;
}
#endif
ber.cpp
#include <algorithm>
#include "ber.h"
Ber::Ber(uint8_t tag, unsigned short length, const uint8_t *message)
: _data(new uint8_t[totallength(length)])
{
uint8_t *ptr = _data;
*ptr++ = tag;
std::copy(message, message+length, lencode(&ptr, length));
}
Ber::Ber(const Ber& b)
: _data(new uint8_t[totallength(b.msglength())])
{
std::copy(b._data, b._data+totallength(b.msglength()), _data);
}
Ber::Ber(Ber &&b)
: _data(nullptr)
{
std::swap(_data, b._data);
}
Ber::~Ber() {
delete _data;
}
uint8_t *Ber::lencode(uint8_t **ptr, unsigned short length)
{
switch(encodedlen(length)) {
case 1:
*(*ptr)++ = (length & 0x7fu);
break;
case 2:
*(*ptr)++ = 0x81u;
*(*ptr)++ = (length & 0xffu);
break;
case 3:
*(*ptr)++ = 0x82u;
*(*ptr)++ = ((length >> 8) & 0xffu);
*(*ptr)++ = (length & 0xffu);
break;
}
return *ptr;
}
unsigned short Ber::encodedlen(unsigned short n)
{
if (n <= 0x7fu)
return 1;
if (n <= 0xffu)
return 2;
#if SHORT_IS_MORE_THAN_TWO_BYTES
// assert(sizeof(unsigned short)>2);
if (n <= 0xffffu)
return 3;
return 0; // error -- encoded length can't be > 3
#else
// assert(sizeof(unsigned short)==2);
return 3;
#endif
}
unsigned short Ber::totallength(unsigned short len)
{
return 1u+len+encodedlen(len);
}
Ber Ber::operator=(const Ber& rhs)
{
return Ber(rhs);
}
Ber& Ber::operator+=(const Ber& rhs)
{
unsigned short datalen = msglength();
unsigned short rhslen = totallength(rhs.msglength());
unsigned short mynewlen = totallength(datalen+rhslen);
uint8_t *data_copy(new uint8_t[mynewlen]);
uint8_t *ptr = data_copy;
*ptr++ = *_data;
uint8_t *dataptr = &_data[1+encodedlen(datalen)];
ptr = std::copy(dataptr, dataptr + datalen, lencode(&ptr, datalen+rhslen));
std::copy(rhs._data, rhs._data+rhslen, ptr);
std::swap(data_copy, _data);
delete data_copy;
return *this;
}
unsigned short Ber::msglength() const
{
unsigned short len = _data[1] & 0x7f;
if (_data[1] != len) {
switch(len) {
case 2:
len = (_data[2] << 8) | _data[3];
break;
case 1:
len = _data[2];
}
}
return len;
}
uint8_t Ber::operator[](size_t idx) const
{
return (idx > totallength(msglength())) ? '0円' : _data[idx];
}
3 Answers 3
- You said
Ber
will not be derived from. Consider writingclass Ber final{...}
to make inheriting fromBer
a compilation error. - Remove useless comments such as
//! constructor
. Either people already know that from looking at the code or they do not understand the comment. - Flesh out existing comments such as
//! append the passed Ber data to the existing Ber object
foroperator +=
. What is Ber data? The message? If you append data from oneBer
-object to another, did you effectively concatenate the messages or replace them? If it is not totally obvious what a user defined operator does it is better to use a named member function instead. Ber(uint8_t tag, unsigned short length=0, const uint8_t *message=nullptr);
allows calling it likeBer ber(42, 8);
which should be a compilation error since one should not have a nonzero length andnullptr
as the message. You can fix that with minimal code duplication by writing 2 constructors,Ber(uint8_t tag, unsigned short length, const uint8_t *message);
andBer(uint8_t tag) : Ber(tag, 0, nullptr){}
.(削除) YourMy bad. Thanks to glampert for pointing it out.operator +
behaves different than normal which is bad. Usually you have an expression such asa = b + c;
wherea
gets a value andb
andc
are not changed. Your operator, however, changesb
, so you could just writeb + c;
which looks like it has no effect, but instead it does the same asb += c;
. (削除ここまで)new []
goes together withdelete []
, you only usedelete
in your destructor. Consider usingstd::unique_ptr<uint8_t[]>
to manage the memory for you.- Replace
// assert(sizeof(unsigned short)>2);
with something likestatic_assert(sizeof(unsigned short)>2, "Macro SHORT_IS_MORE_THAN_TWO_BYTES failed");
. lencode
will have a less scary and error prone implementation if you change the declaration ofptr
touint8 *&ptr
.
I did not look at encodedlen
, msglength
and totallength
in detail, but feel that it can be expressed simpler. Also without really checking I have a feeling that if you append a message and then make a copy, the appended message is gone.
-
\$\begingroup\$
operator +
will work properly. If you look closely, the first param is passed by value, so only the local copy is changed. \$\endgroup\$glampert– glampert2014年10月19日 01:52:11 +00:00Commented Oct 19, 2014 at 1:52 -
\$\begingroup\$ The comments are intended at least in part for automatic documentation generation using Doxygen which is configured to complain about undocumented public members; hence the short comments such as
//! constructor
. \$\endgroup\$Edward– Edward2014年10月22日 02:21:55 +00:00Commented Oct 22, 2014 at 2:21
You have used an instance of Yoda notation in your test code (inside operator <<
):
if (0xf == (i & 0xf)) out << '\n';
This is an arcane habit that should be avoided.
You also have uses of C-style casts in your test code. Don't use those. Use the appropriate C++ cast operators, i.e: static_cast
, reinterpret_cast
, etc.
Names starting with _
. Take a read on this thread. _data
is not technically illegal, but I would just avoid using that notation altogether.
Your use of new
and delete
is wrong! Data allocated with new
is freed with delete
. Data allocated with new[]
is freed with delete[]
. Those are two different functions. In some runtimes, you should get a crash if trying to free data allocated with new[]
using delete
. In your case, however, you should just switch to a smart pointer and don't bother managing memory by hand.
Personal suggestion:
I find tightlypacked
names a lot harder to read. Your methods encodedlen()
, totallength()
and others would be easier on the eyes if written using camelCase
or snake_case
notation:
encoded_length();
total_length();
etc...
As per request in the comments, here is one way to adjust operator +=
to use std::unique_ptr
. This assumes that _data
is also a unique_ptr
:
Ber& Ber::operator+=(const Ber& rhs)
{
unsigned short datalen = msglength();
unsigned short rhslen = totallength(rhs.msglength());
unsigned short mynewlen = totallength(datalen + rhslen);
std::unique_ptr<uint8_t[]> data_copy(new uint8_t[mynewlen]);
uint8_t * ptr = data_copy.get();
*ptr++ = *_data;
uint8_t *dataptr = &_data[1 + encodedlen(datalen)];
ptr = std::copy(dataptr, dataptr + datalen, lencode(&ptr, datalen + rhslen));
std::copy(rhs._data.get(), rhs._data.get() + rhslen, ptr);
std::swap(data_copy, _data);
return *this;
}
If there are any silly mistakes in there I apologise, this was not compiled. The changes are minimal, since unique_ptr
overloads the subscript operator []
and the dereference operator *
, so that remains unchanged. If you need to grab the underlaying pointer to the base of the array, the get()
method is provided. Cleanup is automatic once leaving the scope. So the explicit call to delete
is removed.
-
\$\begingroup\$ Good call on
delete[]
-- it was an error I overlooked. As for the "arcane habit," putting the constant before a==
operator avoids a silent but erroneousx = 5
whenx == 5
was intended. IMHO, it should be encouraged rather than avoided. As for a smart pointer, I had usedstd::unique_ptr
but found it unwieldy within theoperator+=
. Maybe you could show how you'd write it? I could be overlooking something. \$\endgroup\$Edward– Edward2014年10月19日 04:03:14 +00:00Commented Oct 19, 2014 at 4:03 -
\$\begingroup\$ @Edward, If at the current date your compiler is not emitting a warning for something like
if (x = 5)
, then it is time to re-evaluate your choice;)
. Both GCC and Clang will warn for that in the default warning level. \$\endgroup\$glampert– glampert2014年10月19日 04:11:52 +00:00Commented Oct 19, 2014 at 4:11 -
\$\begingroup\$ I'll see to add an example with
unique_ptr
. \$\endgroup\$glampert– glampert2014年10月19日 04:12:15 +00:00Commented Oct 19, 2014 at 4:12 -
2\$\begingroup\$ I don't see any reason to avoid Yoda notation, and you haven't provided any justification for doing so. \$\endgroup\$200_success– 200_success2014年10月19日 08:47:34 +00:00Commented Oct 19, 2014 at 8:47
-
2\$\begingroup\$ @glampert: Sure,
<=
comparisions are something different, I use Yoda strictly for equality or non-equality (and most often withmemcmp
orstrcmp
). \$\endgroup\$user52292– user522922014年10月19日 13:59:17 +00:00Commented Oct 19, 2014 at 13:59
Not much left to review
Your static totallength
and encodedlen
looks like helpers that should in my opinion be private
, have a bit different names (already pointed out in other review) and there could possibly be non-static public
members taking no parametr for this.
A bit about the design, you have mentioned:
- the underlying microprocessor may be 16-bit, or 32-bit or 64-bit
- the underlying microprocessor may be little-endian or big-endian
How are you going to solve that? What I am missing is some operator <<
(or push_back
or append
) that would be used for building the packet. That could requiere a whole different class if you really need to minimize the total memory footprint, but I would not care for one more int
or size_t
member called e.g. position
.
Packet& operator << (unsigned v) {
reserve(4);
_data[position++] = v >> 24;
_data[position++] = v >> 16;
_data[position++] = v >> 8;
_data[position++] = v;
}
Note: Maybe I did not get it right and BER is not designed for complex packets, but for simple tagged values. In that case, all could be done in constructor.
Here you can see I have renamed your class from Ber
to Packet
, while using some reserve
method that would handle possible reallocation. The data is always encoded as big-endian (and packed), independent on the MCU architecture. (Note: Some prefer to use data_
instead of _data
, but the later is allowed within classes, forbidden in global space).
The same could be done for reading, possibly creating two other classes: PacketReader
and PacketWriter
, but you can have it both embedded in the Packet
(or class Ber
).
The way you have it now, you have to create all the data at once (while handling the big/little endian) and pass to constructor:
Ber(uint8_t tag, unsigned short length=0, const uint8_t *message=nullptr)
What I am proposing is to have it this way:
auto pkt = Packet(MSG_TAG) << (byte)data1 << (word)data2 << (uint)data3;
auto pkt2 = Packet(MSG_TAG, 1+2+4) << ....the same as above - space reserved ahead
auto pkt3 = Packet::construct(MSG_TAG, (byte)data1, ....);
Much easier to work with (the last one needs variadic templates, may not be accessible in your compiler).
Finally, my proposals for naming:
Ber -> Packet
totallength -> total_length or full_length
msglength -> length (most used I think) or payload_length / data_length or message_length
lencode -> encode_length
encodedlen -> length_size (well, this one is hard to properly name)
-
\$\begingroup\$ Your guess about the
Ber
class only handling simple tagged values is correct. Its only (possibly) multibyte value responsibility is the length encoding. Other classes are responsible for creating the message portion, and so are responsible for the endian-ness of their data. As for naming, I actually originally hadmsglength
calledlength
but found that the ambiguity led to misuse. Alsoencodedlen
was originallylenlen
which was awful. It is hard to properly name! I like your other naming suggestions and will probably use them. Thanks! \$\endgroup\$Edward– Edward2014年10月19日 13:11:50 +00:00Commented Oct 19, 2014 at 13:11 -
\$\begingroup\$ The question here is which one you use more? full_length or data_length? One of them could be shortened, but if it is not clear, use longer name for both. ... still I would think again about the coding-responsibility. It would be nice to have it inside (and make it real packet) with all the formatting I have suggested (
operator <<
). \$\endgroup\$user52292– user522922014年10月19日 13:19:30 +00:00Commented Oct 19, 2014 at 13:19
operator +=
is creating some packet chain? It seems to produce[tag1][len1][load1][tag2][len2][load2]
. Did I get it right? \$\endgroup\$Ber
instances. So it would be[tag1][len1+len2][msg1][tag2][len2][msg2]
. Typical use would be to start with an emptyBer
instance (just a tag) and then add multipleBer
instances to it. \$\endgroup\$