After reading this article, I've decided to take a crack at implementing a "safe_strcpy" function that would remove some of the errors that the author speaks about. Obviously, my safe_strcpy()
function is inspired by his. Below, I've created a simple class that houses a std::array
of one of the many different kinds of strings. My safe_strcpy()
functions rely on always null-terminating the array given.
I have a few worries about my implementation:
The lines with
out[N - 1] = static_cast<T>(0)
seem wrong when the underlying character type iswchar_t
. In this sense, I want the array to terminate with something to the effect ofL"0"
. Will this work? If not, how can I fix it?Should I be concerned with any performance issues or any "gotchas" that I didn't think of?
Is this a viable solution to the
safe_strcpy
conundrum? In other words, would you use this class in production code? What kind of improvements or extensions should this class use?
#include <cstdlib>
#include <cstdio>
#include <cstring>
#include <array>
#include <iostream>
#include <typeinfo>
using namespace std;
template <size_t N, class T = char>
struct char_array
{
typedef T value_type;
typedef const T const_value_type;
typedef T& reference;
typedef const T& const_reference;
typedef std::array<T, N> container;
typedef typename container::size_type size_type;
typedef typename container::iterator iterator;
typedef typename container::const_iterator const_iterator;
typedef typename container::reverse_iterator reverse_iterator;
typedef typename container::const_reverse_iterator const_reverse_iterator;
/* Constructor */
char_array()
{
static_assert((is_same<unsigned char, T>::value || is_same<char, T>::value ||
is_same<wchar_t, T>::value || is_same<signed char, T>::value ||
is_same<char16_t, T>::value || is_same<char32_t, T>::value) &&
N > 0,
"char_array initialized with invalid type or size");
}
/* Destructor */
~char_array() {}
/* Copy constructor and assignment */
char_array(const char_array<N,T> &other) = default;
char_array& operator=(const char_array<N,T> &other) = default;
/* Move constructor and assignment */
char_array(char_array<N,T> &&other) { buf_ = std::move(other.buf_); }
char_array& operator=(char_array<N,T> &&other)
{
buf_ = std::move(other.buf_);
return *this;
}
/* Delegate members of internal container */
iterator begin() { return buf_.begin(); }
const_iterator begin() const { return buf_.begin(); }
iterator end() { return buf_.end(); }
const_iterator end() const { return buf_.end(); }
const_iterator cbegin() const { return buf_.cbegin(); }
const_iterator cend() const { return buf_.cend(); }
reverse_iterator rbegin() { return buf_.rbegin(); }
const_reverse_iterator rbegin() const { return buf_.rbegin(); }
reverse_iterator rend() { return buf_.rend(); }
const_reverse_iterator rend() const { return buf_.rend(); }
const_reverse_iterator crbegin() const { return buf_.crbegin(); }
const_reverse_iterator crend() const { return buf_.crend(); }
reference operator[](size_type i) { return buf_[i]; }
const_reference operator[](size_type i) const { return buf_[i]; }
size_type size() const { return buf_.size(); }
size_type max_size() const { return buf_.max_size(); }
bool empty() const { return buf_.empty(); }
reference at(size_type n) { return buf_.at(n); }
const_reference at(size_type n) const { return buf_.at(n); }
reference front(size_type n) { return buf_.front(n); }
const_reference front(size_type n) const { return buf_.front(n); }
reference back(size_type n) { return buf_.back(n); }
const_reference back(size_type n) const { return buf_.back(n); }
value_type* data() { return buf_.data(); }
const value_type* data() const { return buf_.data(); }
void fill(const value_type &val) { buf_.fill(val); }
void swap(char_array &x) { buf_.swap(x.buf_); }
private:
container buf_;
};
/*
Allow "template magic" to automatically deduce
the size of the given array. NOTE: this only works with
statically-allocated arrays (e.g. arrays with a size known
at compile time)
@param[in] - out - a reference to the array
@param[in] - src - the source string
Usage:
char buf[10];
safe_strcpy(buf, "This is a string > 10 chars");
*/
template <class T, size_t N>
void safe_strcpy(T (&out)[N], const T *src)
{
static_assert((is_same<wchar_t, T>::value || is_same<signed char, T>::value ||
is_same<unsigned char, T>::value || is_same<char, T>::value ||
is_same<char16_t, T>::value || is_same<char32_t, T>::value) &&
N > 0, "Incompatible type for safe_strcpy");
memcpy(out, src, N * sizeof(T));
out[N - 1] = static_cast<T>(0);
}
template <class T, size_t N, class S>
void safe_strcpy(char_array<N,T> &out, const S *src)
{
static_assert(is_same<T, S>::value, "Source content type different from char_array destination");
memcpy(out.data(), src, N * sizeof(T));
out[N - 1] = static_cast<T>(0);
}
template <class Os, size_t N, class T>
Os& operator<<(Os &os, const char_array<N,T> &arr)
{
for (const auto &c : arr) os << c;
return os;
}
int main()
{
wchar_t buf[10];
safe_strcpy(buf, L"really long string that's greater than 10 freaking characters!!!!");
wcout << buf << "\n";
char_array<10> buffer;
safe_strcpy(buffer, "This is a string longer than 10 chars");
cout << "Max size is: " << buffer.max_size() << "\n";
cout << buffer << "\n";
}
3 Answers 3
Break out your type trait
You have this large amount of is_same
checks in a couple places. Break it out in its own trait, and call it something like is_char
:
template <class T> is_char : std::false_type { };
template <> is_char<wchar_t> : std::true_type { };
template <> is_char<char> : std::true_type { };
template <> is_char<signed char> : std::true_type { };
...
What way, in your char_array
, you can just have:
static_assert(is_char<T>::value, "invalid type, expected a char");
static_assert(N > 0, "invalid size, expected N > 0");
Furthermore, those asserts don't need to be in the constructor, they can be in the body of the class.
Lots of code Repetition
Pretty much all of your char_array
code is just duplicating what std::array
does. What does char_array
actually give you? It costs you aggregate-initialization, and that's not nothing. Ultimateyl, it's really just a std::array<>
with some extra conditions on T
and N
.
But we can just use those same conditions to SFINAE the safe_strcpy
:
template <class T, size_t N, class S,
class = std::enable_if_t<is_char<T>::value &&
(N > 0) &&
is_same<T, S>::value>>
void safe_strcpy(std::array<T,N> &out, const S *src)
{
memcpy(out.data(), src, N * sizeof(T));
out[N - 1] = 0;
}
And now we have no need for char_array
at all.
If you really want to keep the type, I'd just inherit:
template <size_t N, class T=char>
struct char_array : std::array<T, N>
{
static_assert(is_char<T>::value, "invalid type, expected a char");
static_assert(N > 0, "invalid size, expected N > 0");
};
memcpy to std::copy
Instead of copying raw bytes, and having to remember to multiply, you can just use std::copy
- which will under-the-hood do the same thing anyway, but will be less error-prone. So instead of:
memcpy(out.data(), src, N * sizeof(T));
Write:
std::copy(src, src+N, out.data());
-
\$\begingroup\$ These points are good, but unfortunately your versions of the code still all overrun the source. \$\endgroup\$Ben Voigt– Ben Voigt2015年12月22日 16:44:06 +00:00Commented Dec 22, 2015 at 16:44
-
\$\begingroup\$ @BenVoigt That's already covered in a different review. \$\endgroup\$Barry– Barry2015年12月22日 16:57:53 +00:00Commented Dec 22, 2015 at 16:57
Gotcha
There is a potential for illegal access. While the code guarantees that the destination buffer is not overflown, it will however access beyond the source buffer if it is smaller than the destination.
memcpy
is just a wrong tool here.Will I use it?
Given a very limited use case of this approach (static buffers only) and a template size specialization (a serious potential for code bloat) I will probably stay with the good old
strncpy
. Rumors of its un-safety are highly exaggerated.
-
\$\begingroup\$ You're right about your first point. What would you recommend instead? \$\endgroup\$Bizkit– Bizkit2015年12月22日 01:23:19 +00:00Commented Dec 22, 2015 at 1:23
-
3\$\begingroup\$ strncpy is absolutely disastrously unsafe. The design is just horrible. When an overflow happens, you are left with something that isn't a C string. \$\endgroup\$gnasher729– gnasher7292015年12月22日 11:20:01 +00:00Commented Dec 22, 2015 at 11:20
One hard cold truth is that C-Strings cannot be fixed: not knowing the size of the buffer (either inbound or outbound) is a recipe for failure, and the band-aids (strncpy
, strlcpy
, ...) are just attempting to patch the symptoms.
Another hard cold truth is that wchar_t
is broken by design as it is non-portable (16 bits on Windows and 32 bits on Unix). Therefore you should not use it at all and instead rely on explicitly sized types char16_t
and char32_t
.
Your own code is subject to run-time issues as a result:
- you never guarantee that
src
is not null, at the very least anassert
would be useful and the documentation should reflect it, some compilers also have specific annotations (gcc's__attribute__((nonnull(2)))
for example) - you never check that
src
is long enough and blindly copyN
bytes, even though in general there is no guarantee that the source is longer than the destination
And has a number of design limitations:
- you do not allow copying piecemeal (useful to concatenate...)
- you only allow copying into statically allocated buffer
- you treat the strings as a meaningless sequence of bytes, and truncate without regard for its encoding (but then,
std::string
does so too...)
If you cannot afford std::string
, or wish to design an alternative where buffer ownership can be external (so as to use fixed-sized buffer), then I advise using a class approach and simply make it compatible with C-Strings for interoperability:
- by allowing construction from a C-String
- by keeping it NUL-terminated for cheap conversion into a C-String
And then, this class can easily implement safe copy!
Last note: std::string
may not be as costly as you think it is, if you are avoiding it for performance reason, measure first...
A rough example (untested!) of a safer alternative to C-String:
//
// Fixed-Capacity String
//
// It never allocates or re-allocates, but instead uses the
// externally provided buffer.
//
template <typename T>
class fcstring {
public:
// Construction
explicit fcstring(T* cstring):
_capacity(cstring ? std::strlen(cstring) : 0),
_size(_capacity),
_buffer(cstring)
{}
template <size_t N>
explicit fcstring(T (&str)[N]):
_capacity(N-1),
_size(N-1),
_buffer(str)
{}
// "buffer" is assumed to already point at a valid C-String
// of size "size" (not counting the terminating NUL-byte).
// "capacity" may be additionally provided if the buffer is known
// to be larger than "size", it then represents the number of "T"
// than the "buffer" may contain, not counting the NUL-byte.
fcstring(T* buffer, size_t size, size_t capacity = 0):
_capacity(capacity ? capacity : size),
_size(size),
_buffer(buffer)
{}
// Copy
void assign(fcstring const& other) {
size_t const copied = std::min(_capacity, other._size);
std::memcpy(_buffer, other._buffer, copied);
_buffer[copied] = T();
_size = copied;
}
// Conversion to C-String
// /!\ Be mindful that any embedded NUL character will
// /!\ result in a truncated output
T const* c_str() const { return _size ? _buffer : ""; }
private:
size_t _capacity;
size_t _size;
T* _buffer;
}; // class fcstring
L"0"
" You probably meantL"0円"
. \$\endgroup\$wchar
. For example:null_terminator<T>::value
is either"0円"
orL"0円"
depending on whether T ischar
orwchar
, respectively. \$\endgroup\$;)
. I think something in the lines ofsafe_strcpy(char * dest, size_t destMax, const char * src, size_t scrLen)
is already a big improvement, without 100 lines of template boilerplate involved. But then again, strcopying in C++ is pretty rare.std::string
covers 99% of the cases for you already! \$\endgroup\$use namespace
in a header file! stackoverflow.com/questions/1452721/… \$\endgroup\$