I am dealing with some older C style APIs, like Posix execve
that take a char**
. In the rest of my code, I prefer to use a fairly modern C++ style, so I have vector of std::string
. Is this the best way to handle the glue between them?
char ** strlist(std::vector<std::string> &input) {
char** result = new char*[input.size()];
result[input.size()] = nullptr;
for (int i=0; i<input.size(); i++) {
char *temp = new char[input[i].size()];
strcpy(temp, input[i].c_str());
result[i] = temp;
}
return result;
}
bool del_strlist(char ** strings, int limit=1024) {
bool finished = false;
for (int i=0; i<limit; i++) {
char *temp = strings[i];
if (temp == nullptr) {
finished = true;
break;
}
delete temp;
}
delete strings;
return !finished;
}
6 Answers 6
Since you already have a std::vector<std::string>
, it's much simpler to let that own the memory, and build a parallel std::vector<char*>
which just keeps pointers into the original strings. The only difficulty is ensuring this isn't used after the owning vector goes out of scope.
The simplest implementation is something like:
std::vector<char*> strlist(std::vector<std::string> &input) {
std::vector<char*> result;
// remember the nullptr terminator
result.reserve(input.size()+1);
std::transform(begin(input), end(input),
std::back_inserter(result),
[](std::string &s) { return s.data(); }
);
result.push_back(nullptr);
return result;
}
where the caller is responsible for managing the lifetimes of the two vectors. You get the required char **
by calling result.data()
.
If you want to avoid the potential lifetime problems, you can write a class to own both vectors for you so their lifetimes are synchronized.
Note that I deliberately haven't taken the vector by const ref, because execve
takes an array of pointers to non-const char, but it doesn't mutate the contents so there's no need to make a copy. This lack of const-safety is common in old C interfaces.
PS. I didn't force myself to use transform
when a simple loop would do hoping for some magic optimization. I used transform
instead of a simple loop because it more clearly expressed the semantics of my code.
A "simple" loop is a powerful multipurpose language construct which could do anything. By comparison transform
is a simple, special-purpose algorithm, and much easier to reason about as a consequence.
Not least, since I had to take a non-const reference to the argument despite having no intention of mutating it, anything that makes it easier to see at a glance that no mutation is possible, is helpful.
-
4\$\begingroup\$ I much prefer this approach. One minor point - I think you need to
push_back
anullptr
at the end. Also - if you go down the class path, you'll need to disable or do the right thing for copy construction and assignment. \$\endgroup\$GuyRT– GuyRT2018年10月10日 16:06:34 +00:00Commented Oct 10, 2018 at 16:06 -
\$\begingroup\$ Good points both! \$\endgroup\$Useless– Useless2018年10月10日 16:36:25 +00:00Commented Oct 10, 2018 at 16:36
-
\$\begingroup\$ Oh - another thing. I think you should use
c_str()
instead ofdata()
(which isn't guaranteed to null terminate). \$\endgroup\$GuyRT– GuyRT2018年10月10日 16:42:45 +00:00Commented Oct 10, 2018 at 16:42 -
1\$\begingroup\$ ... and all known implementations null-terminated the result of
data()
before C++11. \$\endgroup\$Martin Bonner supports Monica– Martin Bonner supports Monica2018年10月10日 16:56:38 +00:00Commented Oct 10, 2018 at 16:56 -
2\$\begingroup\$ @nurettin, the null pointer is there because the users need it to delimit the array (it's the interface of
exec()
family of functions). \$\endgroup\$Toby Speight– Toby Speight2018年10月10日 17:16:24 +00:00Commented Oct 10, 2018 at 17:16
I can see a few potential problems here:
- Since you allocated a
char*
array ofinput.size()
elements,result[input.size()]
is out of bounds. - Similarly,
std::string
'ssize()
is the number of characters - it doesn't include the trailing0円
needed for C-style strings. So everystrcpy
here risks doing a buffer overflow (risks, because it is possible for C++std::strings
to contain a null in the middle, terminating thestrcpy
mid way). - You have set a limit on the number of elements of
strings
youdelete
, but thendelete strings
irrespective of whether that limit was breached. This risks a memory leak if the limit was exceeded. - You use the array version of new
new <type>[<size>];
This means you need to use the array version of deletedelete [] <object>
(Note the square brackets).
Different approach
The approach in the question is somewhat procedural. My aim is to use RAII to facilitate ease of use, as the code currently can leak memory and relies on programmer to free it.
Usage cases
Lets first have a look at the usage:
Create cstring style array from
std::string
s.Sink the created array into
exec
family of functionsWait until child process exits
Reclaim the memory
Now, it clearly looks like constructor and destructor calls, as they operate on the same data, possibly even unmodified.
Code
Here is rough sketch of the class I had in mind:
class owned_cstrings {
std::vector<char*> cstring_array;
public:
owned_cstrings(const std::vector<std::string>& source) :
cstring_array(source.size())
{
std::transform(source.begin(), source.end(), cstring_array.begin(), [](const auto& elem_str) {
char* buffer = new char[elem_str.size() + 1];
std::copy(elem_str.begin(), elem_str.end(), buffer);
buffer[elem_str.size()] = 0;
return buffer;
});
cstring_array.push_back(nullptr);
}
owned_cstrings(const owned_cstrings& other) = delete;
owned_cstrings& operator=(const owned_cstrings& other) = delete;
owned_cstrings(owned_cstrings&& other) = default;
owned_cstrings& operator=(owned_cstrings&& other) = default;
char** data() {
return cstring_array.data();
}
~owned_cstrings() {
for (char* elem : cstring_array) {
delete[] elem;
}
}
};
Design decisions
The code above would've been much more dangerous footgun without some thought put into it. First of all, it is not copyable, although it could do deep copy, I believe it is not intended. Not doing deep copy will result in more than one delete, which is disastrous. Second, the data access is somewhat limited, because the only usage case covered is sinking into exec
family of functions.
Demo
#include <vector>
#include <string>
#include <algorithm>
class owned_cstrings {
std::vector<char*> cstring_array;
public:
owned_cstrings(const std::vector<std::string>& source) :
cstring_array(source.size())
{
std::transform(source.begin(), source.end(), cstring_array.begin(), [](const auto& elem_str) {
char* buffer = new char[elem_str.size() + 1];
std::copy(elem_str.begin(), elem_str.end(), buffer);
buffer[elem_str.size()] = 0;
return buffer;
});
cstring_array.push_back(nullptr);
}
owned_cstrings(const owned_cstrings& other) = delete;
owned_cstrings& operator=(const owned_cstrings& other) = delete;
owned_cstrings(owned_cstrings&& other) = default;
owned_cstrings& operator=(owned_cstrings&& other) = default;
char** data() {
return cstring_array.data();
}
~owned_cstrings() {
for (char* elem : cstring_array) {
delete[] elem;
}
}
};
#include <iostream>
template <typename T>
std::ostream& operator<<(std::ostream& os, const std::vector<T>& v) {
if (v.empty()) {
return os;
}
os << v.front();
for (std::size_t i = 1; i < v.size(); ++i) {
os << ' ' << v[i];
}
return os;
}
std::ostream& operator<<(std::ostream& os, char** words) {
while (*words) {
os << *words++ << ' ';
}
return os;
}
int main() {
std::vector<std::string> words = { "What", "a", "beautiful", "world" };
std::cout << words << '\n';
owned_cstrings cstring_words(words);
std::cout << cstring_words.data() << '\n';
}
-
1\$\begingroup\$ Once you have a class like this, it's a pity not to make the
std::vector<std::string>
useless by providing insertion/deletion capabilities. \$\endgroup\$papagaga– papagaga2018年10月10日 07:35:09 +00:00Commented Oct 10, 2018 at 7:35 -
\$\begingroup\$ I'm a bit unsure what you mean. I believe you meant some sort modification capabilities into the class? In that case, I would start from
cstring
class, as that would let them combine it in their own way. Anyway, please write down the idea and I'll try to implement/discuss it later this evening. \$\endgroup\$Incomputable– Incomputable2018年10月10日 07:38:18 +00:00Commented Oct 10, 2018 at 7:38 -
\$\begingroup\$ I meant very simply to give
owned_cstrings
apush_back
method, so you don't need to create astd::vector<std::string>
before converting it into aowned_cstrings
. \$\endgroup\$papagaga– papagaga2018年10月10日 10:12:29 +00:00Commented Oct 10, 2018 at 10:12 -
\$\begingroup\$ @papagaga, perhaps range constructor along with initializer list would fit better? \$\endgroup\$Incomputable– Incomputable2018年10月10日 12:44:35 +00:00Commented Oct 10, 2018 at 12:44
-
\$\begingroup\$
new char[]
withdelete p
? \$\endgroup\$StoryTeller - Unslander Monica– StoryTeller - Unslander Monica2018年10月10日 17:14:46 +00:00Commented Oct 10, 2018 at 17:14
Firstly, there's a memory leak possible, because strlist
performs two allocations using new[]
. If the first succeeds, but the second throws std::bad_alloc
, then we have no reference to *result
with which to delete[]
it.
The answer by Incomputable shows how the interface can be greatly improved.
I have a minor point that might be overlooked when reading that answer, and an efficiency improvement that can be incorporated into the original or into the proposed RAII object.
The minor point is that strlist()
requires a reference to a mutable vector for no good reason - the signature should be
char ** strlist(const std::vector<std::string> &input);
// ^^^^^
The efficiency improvement is that we know the total storage requirement for all the strings at the start of the function/constructor, so we can make a single allocation and place all our strings within that block instead of making separate allocations for each string to be accessed. See example code below.
From C++11 onwards, we could go further, and make our object be a view object, simply storing pointers to the data()
of the input strings (which would now need to be mutable - consider pass-by-value, and call it using std::move()
where that's useful).
Finally, is there a good reason that this should work only with std::vector
and not with other containers?
Single-allocation method
Here's how to make two passes over input
to save making several small allocations. I'm keeping (nearly) the original interface to make the changes more obvious, but I really recommend you create a type to ensure that the memory management is automatic.
#include <cstring>
#include <string>
#include <vector>
char *const *strlist(const std::vector<std::string>& input)
{
char **result = new char*[input.size() + 1];
std::size_t storage_size = 0;
for (auto const& s: input) {
storage_size += s.size() + 1;
}
try {
char *storage = new char[storage_size];
char *p = storage;
char **q = result;
for (auto const& s: input) {
*q++ = std::strcpy(p, s.c_str());
p += s.size() + 1;
}
*q = nullptr; // terminate the list
return result;
}
catch (...) {
delete[] result;
throw;
}
}
void del_strlist(char *const *strings)
{
// First string is the allocated storage
delete[] strings[0];
delete[] strings;
}
#include <iostream>
int main()
{
std::vector<std::string> args{ "/bin/ls", "ls", "-l" };
auto v = strlist(args);
for (auto p = v; *p; ++p) {
std::cout << '\'' << *p << "'\n";
}
del_strlist(v);
}
Single-allocation method, with smart pointer
We can use std::unique_ptr
to hold our data, if we don't mind using a custom deleter:
#include <cstring>
#include <memory>
#include <string>
#include <vector>
auto strlist(const std::vector<std::string>& input)
{
static auto const deleter = [](char**p) {
// First string is the allocated storage
delete[] p[0];
delete[]p;
};
std::unique_ptr<char*[], decltype(deleter)>
result{new char*[input.size() + 1], deleter};
// Ensure that destructor is safe (in case next 'new[]' fails)
result[0] = nullptr;
std::size_t storage_size = 0;
for (auto const& s: input) {
storage_size += s.size() + 1;
}
char *p = result[0] = new char[storage_size];
char **q = result.get();
for (auto const& s: input) {
*q++ = std::strcpy(p, s.c_str());
p += s.size() + 1;
}
*q = nullptr; // terminate the list
return result;
}
#include <iostream>
int main()
{
std::vector<std::string> args{ "/bin/ls", "ls", "-l" };
auto v = strlist(args);
for (auto p = v.get(); *p; ++p) {
std::cout << '\'' << *p << "'\n";
}
}
You should see that this is a bit simpler to write and to use than my first version.
P.S. Both demos compile with g++ -std=c++17 -Wall -Wextra -Wwrite-strings -Wno-parentheses -Wpedantic -Warray-bounds -Weffc++
and run under Valgrind with no warnings or errors, in case that's not already assumed.
-
\$\begingroup\$ Agreed. I want to draw more focus towards
std::string_view
. If you don't need manage the memory of a string (read only and probably most size-preserving operations) should work on astring_view
instead of astring&
. \$\endgroup\$WorldSEnder– WorldSEnder2018年10月10日 14:47:09 +00:00Commented Oct 10, 2018 at 14:47 -
\$\begingroup\$ May or may not be appropriate in this case - if we accept
std::string_view
as input, we'll need to copy the contents, in order to get the terminating NUL. So it's a case of swings and roundabouts here. \$\endgroup\$Toby Speight– Toby Speight2018年10月10日 14:51:47 +00:00Commented Oct 10, 2018 at 14:51 -
\$\begingroup\$ if this were rust you could probably safely get away with temporarily replacing that zero byte in just for the duration of the c call. The same could be done here but it's questionable practice :D \$\endgroup\$WorldSEnder– WorldSEnder2018年10月10日 14:55:04 +00:00Commented Oct 10, 2018 at 14:55
-
\$\begingroup\$ It should be noted however that if you were to pass this to the exec() family you must terminate it with a NULL pointer. \$\endgroup\$Pryftan– Pryftan2018年10月14日 18:00:51 +00:00Commented Oct 14, 2018 at 18:00
-
\$\begingroup\$ @Pryftan - that's what the
*q = nullptr
is for, immediately before thereturn
. I've even commented it to explain! \$\endgroup\$Toby Speight– Toby Speight2018年10月15日 07:37:24 +00:00Commented Oct 15, 2018 at 7:37
There are some implicit problematic assumptions...
You must not forget to call
del_strlist
, and call it exactly once.You clearly have to transport some data via unknown means from
strlist
call to the point wheredel_strlist
takes place, namely the size of the vector.The user has to know how each string was allocated. As your
char**
data structure is read/write, someone outside even might change an element to another, and try to usemalloc
/free
.
All this data aggregation/"use the correct methods"-requirements strongly suggest to write a class.
You could avoid these problems by providing a class
constructed from a either a std::vector<std::string>
or a const char**
, which would store the data in a local char**
(and automatically deallocate it in the destructor). This already solves problem #1.
Then you could add a size_t
member to your class containing the array size - this solves problem #2 - now the destructor knows the array size.
To tackle #3, you could even add some operator[]
s which are able to access or even replace a char*
at an index (and reallocate the memory in a correct way).
If you provide an operator char**()
which returns the internal (private) member in which the allocated strings are stored, you may use this class anywhere where a char**
is expected :)
If you need the stored data as C++ vector again, you might want to add a std::vector<std::string> get() const
member to this class.
Functions like
execve
actually take achar* const *
, so an array ofconst
pointers, to mutable strings. Sostrlist
could returnchar* const *
instead.strlist
should take theinput
argument by const reference, because it does not modify it.char** result = new char*[input.size() + 1];
is needed, because an additional element (the trailing null pointer) is appended. (array[3]
is an array of 3 items, with indices 0, 1, 2.)Similarily, the strings need to allocated as
char *temp = new char[input[i].size() + 1];
, becausestd::string::size
(orstd::string::length
) does not count the trailing 0 byte of the string.std::strcpy
is needed, except if ausing
declaration was used before. In the C++ headers like<cstring>
, the C functions are in thestd
namespace.in
del_strlist
,delete[]
needs to be used two times instead ofdelete
, because the memory was allocated usingoperator new[]
.Because
del_strlist
is supposed to always finish (return false), it may be better to throw an exception if it does not, so that the user does not need to check the return value.
To avoid needing to have a deleting function del_strlist
, it may be better to have a class that contains the allocated array, and deallocates it in the destructor. For example:
class strlist {
private:
std::vector<char*> cstrings_;
public:
template<typename It>
strlist(It begin, It end) { // does not need std::vector as input
for(It it = begin; it != end; ++it) {
const std::string& string = *it;
char* cstring = new char[string.length() + 1];
std::strcpy(cstring, string.c_str());
cstrings_.push_back(cstring);
}
cstrings_.push_back(nullptr);
}
~strlist() {
for(char* cstring : cstrings_)
if(cstring != nullptr) delete[] cstring;
}
char* const * data() const {
return cstring_.data();
}
};
And used like:
void f() {
std::vector<std::string> original_strings = ....
strlist cstrings(original_strings.begin(), original_strings.end());
execve("program", cstrings.data(), nullptr);
// no need to call function to delete cstrings
}
Some older C APIs may take char*
arrays as input, but not actually modify the array contents. In that case it is simpler to not copy new strings, but take pointers to strings in the original array, using const_cast
. For example:
std::vector<char*> strlist(const std::vector<std::string>& strings) {
std::vector<char*> cstrings;
for(const std::string& string : strings)
cstrings.push_back(const_cast<char*>(string.c_str()));
cstrings.push_back(nullptr);
return cstrings;
}
But this is a hack, and mostly useful only to gain performance / save memory.
del_strlist
returns a boolean? What is the result used for? \$\endgroup\$vector<string>
) rather than trying to maintain a parallelvector<string>
\$\endgroup\$char**
you pass? Do the pointers need to remain valid throughout, or can they be discarded after the call? \$\endgroup\$