This is sort of the complementary question to How to best protect from 0 passed to std::string parameters?. Basically, I'm trying to figure out whether there is a way to have the compiler warn me if a code path would unconditionally try to call std::string
's char*
constructor using NULL
.
Run time checks are all well and good, but for a case like:
std::string get_first(const std::string& foo) {
if (foo.empty()) return 0; // Or NULL, or nullptr
return foo.substr(0, 1);
}
it's annoying that, even though the code is guaranteed to fail if that code path is exercised, and the system header is usually annotated with the precondition saying that the pointer must not be null, this still passes compilation under gcc
, clang
, etc., even with -std=c++11 -Wall -Wextra -pedantic -Werror
. I can block the specific case of 0
on gcc
with -Werror=zero-as-null-pointer-constant
, but that doesn't help with NULL
/nullptr
and it's sort of tackling the related but dissimilar problem. The major issue is that a programmer can make this mistake with 0
, NULL
or nullptr
and not notice it if the code path isn't exercised.
Is it possible to force this check to be compile time, covering an entire code base, without nonsense like replacing std::string
with a special subclass throughout the code?
2 Answers 2
There are different approaches, depending on whether it should work on all compilers, in very restricted circumstances and with some side-effects, or only on some, but in exchange far more broadly:
Adding overloads which complain when used. There is
[[deprecated]]
since C++11 which will complain at the immediate call-site, as long as it's not suppressed, like normally in a system-header.GCC and CLANG provide a better suited custom attribute,
__attribute__((error("message")))
, which will always break the build if the function is used and name the call-site.The problem with adding overloads accepting all those things which could be nullpointer-literals, is that it might confuse other template's SFINAE, thus breaking code, and cannot catch an argument already of type
char*
which unfortunately just happens to be a nullpointer:// added overload, common lines: template<class X, class = typename std::enable_if<std::is_arithmetic<X>() && !std::is_pointer<X>()> // best on GCC / clang: string(X) __attribute__((error("nullpointer"))); // best on others: [[deprecated("UB")]] string(X) /* no definition, link breaks */;
The preferable alternative is marking the argument as non-null, and letting the compilers optimizer figure it out. You are asking for and heeding such warnings, right?
That's only an option in GCC and CLANG, but it avoids the disadvantage of additional overloads and catches all cases the compiler can figure out, which means it works better with more optimization.basic_string(const CharT* s, size_type count, const Allocator& alloc = Allocator()) __attribute__((nonnull)); basic_string(const CharT* s, const Allocator& alloc = Allocator()) __attribute__((nonnull));
Generally, one can ask GCC / clang whether it can determine the value of any expression at compile-time, using
__builtin_constant_p
.
It wouldn't be done at compile time, but it would be fairly easy to define a string type that handled this more gracefully than (at least most implementations of) std::string
currently does.
The basis would be the fact that NULL
has type int
and nullptr
has type nullptr_t
. As such, it would be fairly easy to overload on these types, and have them fail "fast and loud" when passed the wrong types of arguments:
#include <iostream>
#include <fstream>
#include <algorithm>
#include <iterator>
#include <vector>
#include <numeric>
#include <string>
class test_string {
std::vector<char> data;
public:
test_string(nullptr_t) { throw std::runtime_error("Bad input data"); }
test_string(int) { throw std::runtime_error("Bad input data"); }
test_string(void *) { throw std::runtime_error("Bad input data"); }
test_string(char const *s) { size_t length = strlen(s); data.resize(length); std::copy_n(s, length, &data[0]); }
size_t length() const { return data.size(); }
friend std::ostream &operator<<(std::ostream &os, test_string const &s) {
std::copy(s.data.begin(), s.data.end(), std::ostream_iterator<char>(os));
return os;
}
};
int main() {
// These will all immediately throw if uncommmented:
//test_string x(NULL);
//test_string y(nullptr);
//test_string z(0);
// But this still works fine:
test_string a("This is a string");
std::cout << a;
}
Doing this with std::string
might be a little more difficult. The problem is that std::string
already has quite a few overloads of its constructor. The overload for nullptr_t
should always be safe (the only thing of that type is nullptr
, and you clearly never want that).
Having an overload to take int
in place of each charT const *
might be more difficult. The problem is that some of the overloads already distinguish the intent based on (for example) the position of the pointer vs. a count indicating the length. You'd have to look pretty carefully to be sure you were catching invocation with NULL, but not interfering with an existing (legitimate) use that takes (for example) a char
and a size_t
. I'm not convinced it's impossible, but it would take some care (and you might end up having to live with leaving a few cases that could still slip through).
To be very practical, however, this would probably have to be a modification to std::string
though--having it as a separate class leads to some fairly serious impracticality (at least in my opinion).
-
NULL
can be any nullpointer-constant which can be implicitly converted to any pointer type, not only anint
. And your code is covering those other possibilites reasonably well. Still, you depend on being called directly with the constant.Deduplicator– Deduplicator2016年02月12日 03:59:54 +00:00Commented Feb 12, 2016 at 3:59 -
@Deduplicator: Yes, if you do something like
char *x=NULL; std::string y(x);
, there's not much that can be done, at least on the basis of type. As far as the integer literal having a type other thanint
: we don't need an exact match, just a better match than a conversion from integer to pointer (though I'm not at all sure what's above is quite enough to ensure that).Jerry Coffin– Jerry Coffin2016年02月12日 05:29:05 +00:00Commented Feb 12, 2016 at 5:29 -
` test_string(nullptr_t) = delete;` should be even better.alfC– alfC2023年02月14日 23:57:37 +00:00Commented Feb 14, 2023 at 23:57
string
's requirements). You can't really hope for a guarantee of...anything after that. As far as I can see, nearly the only reasonable approach is to ensure against that happening in the first place.0
/NULL
/nullptr
, we're invoking said behavior so directly that compilers with headers that mention the restriction should be able to see that we've violated the constraints of the function. I'm sort of hoping the compiler can catch the problem before we actually run and invoke said undefined behavior.std::string
for constructors takingnullptr_t
andint
parameters in place ofcharT const *
. It would at least be easy to make these fail "fast and noisy", such as throwingstd::invalid_argument
from the ctor, so no other code could ever see such a string. OTOH,string
already has lots of constructors, so it would take some care to be sure these couldn't be invoked for valid inputs. Thenullptr_t
overload should be safe though.