So it's well known that C does not have any array bounds checking when accessing memory. Nowadays, if you access myArray[7] when you initialised it as int myArray[3], your program will get a segfault and crash thanks to protected memory.
Now, if you have an argument in a function such as myFunc(int *yourArray), but you know you need at least 8 slots in the array, is it possible to check beforehand whether yourArray[7] is illegal, in order to throw a custom error:
"Sorry, yourArray is too small for this function. We need 8 ints of space."
rather than
"Segmentation fault."
5 Answers
Checking array bounds like you want is implementation specific, because buffer overflow is an example of undefined behavior (and this explains why UB can be really bad).
It is also an undecidable problem in general. You can easily show that statically finding (by static program analysis, e.g. of the C++ source code, without actually running the program) every buffer overflow is equivalent to the halting problem. Read also about Rice's theorem.
However, several (partial) practical tools exist (notably on Linux):
- You could add assert or static_assert-s in your code, and/or runtime checks.
- You might find and use a static code analyzer à la Frama-C (it works for C code currently).
- You could customize your GCC compiler using MELT.
- You should compile your code with all warnings & debug info, e.g. g++ -Wall -Wextra -g if using GCC.
- You might run your program with valgrind, at least for tests.
- You could use the address sanitizer, e.g. add -fsanitize=address to your compilation flags (when testing).
- Notably in C (and sometimes in C++) it is a good convention to pass both array pointers and their size (like e.g. snprintf(3) or strncmp(3) do). In C, you might also use flexible array members in struct and store the flexible array's size inside the struct.
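The last suggestion could look roughly like the sketch below. It assumes C99 or later; the type int_buf and its helper functions are hypothetical names made up for illustration, not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical type: the length travels together with the data. */
struct int_buf {
    size_t len;   /* number of usable elements in data[] */
    int data[];   /* C99 flexible array member */
};

/* Allocate a zero-initialised buffer of len ints; returns NULL on failure. */
static struct int_buf *int_buf_new(size_t len)
{
    /* Refuse sizes whose byte count would overflow size_t. */
    if (len > (SIZE_MAX - sizeof(struct int_buf)) / sizeof(int))
        return NULL;
    struct int_buf *b = calloc(1, sizeof *b + len * sizeof(int));
    if (b != NULL)
        b->len = len;
    return b;
}

/* Returns 1 if index i is valid for b, 0 otherwise. */
static int int_buf_has(const struct int_buf *b, size_t i)
{
    return b != NULL && i < b->len;
}

/* Checked read: aborts via assert instead of reading out of bounds. */
static int int_buf_get(const struct int_buf *b, size_t i)
{
    assert(int_buf_has(b, i));
    return b->data[i];
}
```

A function that needs 8 ints can then test int_buf_has(b, 7) and report its own error message before touching the data. Keep in mind that assert is compiled out when NDEBUG is defined, so int_buf_get only protects debug builds.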
BTW C and C++ pointer arithmetic abilities make finding buffer overflows even harder.
In C++11 you had better avoid plain arrays and raw pointers and use standard containers and smart pointers.
- The word statically is key: The reduction to the halting problem only works if we want to rule out all programs that go out of bounds, but no more, and without running the program. If run-time checks are permitted, the problem is almost trivial, it just has runtime overhead (and a quite large one, for a naive solution). Likewise, it is easy to reject all programs that go out of bounds as well as some that don't (the hard part is not rejecting practically useful ones). (user7043, Aug 19, 2015 at 20:16)
- The only thing I'd add: In C, always pass the array's length along with the array itself (a la strcpy vs strncpy), for the same reason you'd use a container instead of a raw array in C++. (Ixrec, Aug 19, 2015 at 20:28)
- @Ixrec: You know strcpy vs strncpy is an atrociously bad example? strncpy does not copy strings, it copies a maximum of n non-0 bytes from a source-string and 0-pads to n bytes. And anyway, if you provide a buffer length, you must either be ok with truncation (often very much not the case), or you need a way to signal failure. (Deduplicator, Aug 19, 2015 at 22:43)
- @Ixrec: strlcpy is considered a "safer for some scenarios" version of strcpy, strncpy is just a very specialized completely different tool. Though yes, there are many who just don't know that, which makes it an even worse example. (Deduplicator, Aug 19, 2015 at 22:47)
- @Deduplicator You're right, clearly I need to brush up on my C. Basile picked a better example anyway so at least the actual goal of the comment was achieved. (Ixrec, Aug 19, 2015 at 22:48)
The answer is really fairly simple: if you want safety, use something that actually provides it--and that's not C, and not raw C-style arrays.
Without departing too far from the basic style of C and raw arrays, you can use C++ and an std::vector with [i] replaced by .at(i), and get bounds checking.
Using std::vector instead makes most of the problems with arrays easy. You can check the current size of the vector with its .size() member function. Most of the time, you don't need to do that though, because when you want to add something to it you just use its .push_back() member function.
At least in theory, you can sort of do most of the same sorts of things in C, but doing so gets relatively ugly. Although it's not terribly difficult to define a wrapper that (for example) puts a pointer and a current allocation size into a struct, you have to define functions to do all the manipulation on it, and even then you have to live with the fact that existing code won't know how to use it or deal with it. I've done this a few times, and if you need it badly enough you can make it work--but I long ago decided it just wasn't worth the pain.
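To give an idea of how much manipulation code such a wrapper needs, here is a minimal sketch in C. The names IntArray, int_array_init, int_array_get, int_array_set and int_array_free are hypothetical and chosen only for this example; a real wrapper would probably also want resizing and append operations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical wrapper: a heap block plus the number of ints it holds. */
typedef struct {
    int *items;
    size_t size;
} IntArray;

/* Allocate size zeroed ints; on failure (or size 0) the array is left empty. */
static bool int_array_init(IntArray *a, size_t size)
{
    assert(a != NULL);
    a->items = (size > 0) ? calloc(size, sizeof *a->items) : NULL;
    a->size = (a->items != NULL) ? size : 0;
    return a->items != NULL;
}

/* Checked read: returns false instead of reading out of range. */
static bool int_array_get(const IntArray *a, size_t i, int *out)
{
    assert(a != NULL && out != NULL);
    if (i >= a->size)
        return false;
    *out = a->items[i];
    return true;
}

/* Checked write: returns false instead of writing out of range. */
static bool int_array_set(IntArray *a, size_t i, int value)
{
    assert(a != NULL);
    if (i >= a->size)
        return false;
    a->items[i] = value;
    return true;
}

/* Release the block and reset the wrapper to the empty state. */
static void int_array_free(IntArray *a)
{
    assert(a != NULL);
    free(a->items);
    a->items = NULL;
    a->size = 0;
}
```

Every access now goes through a function call and an error check at the call site, which is the ugliness mentioned above, and any existing code that expects a plain int * still has to be handed a.items directly, with no checking.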
A function that receives a pointer does not know the length of the corresponding array. You must pass it in as a parameter yourself, explicitly:
void myFunc(int *yourArray, size_t yourArrayLen)
Once you've done that, throwing an error is trivial.
Of course, this still leaves the possibility that your caller might give you the wrong length. You can't really prevent that without either:
- implementing a custom data type to store arrays and then making sure the length stays in-sync with the true length at all times using encapsulation, or
- allowing static arrays only, e.g. void myFunc(int (*yourArray)[8]);
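The pointer-to-array form could be used along these lines. This is only a sketch: sum8 is a hypothetical function invented for the example, and the check it gives you happens at compile time, so a caller can still defeat it with a cast.

```c
#include <assert.h>
#include <stddef.h>

/* The parameter type records the required length: exactly 8 ints. */
static int sum8(int (*yourArray)[8])
{
    assert(yourArray != NULL);
    int total = 0;
    /* sizeof sees the whole array here, not just a pointer. */
    size_t n = sizeof *yourArray / sizeof (*yourArray)[0];
    for (size_t i = 0; i < n; i++)
        total += (*yourArray)[i];
    return total;
}

/* Example caller: pass the address of the array, not the array itself. */
static int demo(void)
{
    int ok[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    /* int small[3]; sum8(&small); would be diagnosed by the compiler
       as an incompatible pointer type. */
    return sum8(&ok);
}
```

The cost is that the function accepts arrays of exactly 8 elements and nothing else, and it cannot be used with a block whose size is only known at run time without a cast.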
There is no way in C(++) to get the length of an array from a pointer to its first element. (There are platform-specific functions like _msize in MSVCRT, but those only work on malloc'ed pointers.)
What's typically done when passing arrays to functions is to pass the length along with the pointer so that bounds-checking can be done at runtime.
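#include <stdio.h>

#define LEN 8  /* assumed array size for this example */

void myFunc(int* yourArray, int length)
{
    /* Reject arrays that are too small before touching yourArray[7]. */
    if (length < 8)
    {
        puts("Sorry, yourArray is too small for this function. We need 8 ints of space.");
        return;
    }
    // ...
}

void caller()
{
    int arr[LEN];
    myFunc(arr, LEN);
}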
- I cannot find any guarantee that malloc and the rest won't ever provide more space than requested. (Deduplicator, Aug 19, 2015 at 23:05)
- It's in fact common due to rounding, but you have to realize that such extra space (when present) is always uninitialized. (MSalters, Aug 20, 2015 at 10:31)
Use a custom wrapper for malloc (or write your own) that keeps additional information about the blocks it allocates. The one I use adds a few "guard bytes" to every allocation, stores the length of the allocation at a[-1], and checks the guard bytes and other things upon deallocation.
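A minimal sketch of that idea follows. It is not the wrapper described above: the names guarded_malloc, guarded_size, guarded_check and guarded_free, the guard length and the fill pattern are all assumptions made for illustration, and the length is kept in a small header in front of the block rather than literally at a[-1]. It assumes C11 for max_align_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_LEN  8       /* hypothetical number of guard bytes */
#define GUARD_BYTE 0xA5    /* hypothetical fill pattern */

/* Header placed in front of the user block; the union keeps the user
   data suitably aligned for any object type. */
typedef union {
    size_t len;            /* bytes requested by the caller */
    max_align_t align;
} block_header;

/* Allocate len bytes followed by GUARD_LEN guard bytes. */
static void *guarded_malloc(size_t len)
{
    if (len > SIZE_MAX - sizeof(block_header) - GUARD_LEN)
        return NULL;
    block_header *h = malloc(sizeof *h + len + GUARD_LEN);
    if (h == NULL)
        return NULL;
    h->len = len;
    unsigned char *user = (unsigned char *)(h + 1);
    memset(user + len, GUARD_BYTE, GUARD_LEN);
    return user;
}

/* Recover the requested length from the header in front of p. */
static size_t guarded_size(const void *p)
{
    assert(p != NULL);
    return ((const block_header *)p - 1)->len;
}

/* Returns 1 if the guard bytes are intact, 0 if something wrote past the end. */
static int guarded_check(const void *p)
{
    const unsigned char *guard = (const unsigned char *)p + guarded_size(p);
    for (size_t i = 0; i < GUARD_LEN; i++)
        if (guard[i] != GUARD_BYTE)
            return 0;
    return 1;
}

/* Verify the guard bytes, then release the whole block. */
static void guarded_free(void *p)
{
    if (p == NULL)
        return;
    assert(guarded_check(p));
    free((block_header *)p - 1);
}
```

With something like this, a function such as myFunc could compare guarded_size(yourArray) against 8 * sizeof(int) before indexing, but only if every pointer it receives really came from guarded_malloc and points at the start of the block.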
- That will only help if no array ever uses only part of a memory block... (Deduplicator, Aug 19, 2015 at 23:06)
- If you are in the habit of allocating chunks of memory then using parts of it in ad hoc ways that lose contact with the original pointer, you need better methodology. (ddyer, Aug 20, 2015 at 6:47)
- If you mean I forget when/where I should free what, or don't keep track of any other data I need later, you are trivially correct. But that doesn't seem to be the way you use it, which means you are simply wrong. (Deduplicator, Aug 22, 2015 at 1:47)
- The OP's request is to help prevent trashing arrays allocated by C. The first step is to avoid bad practices such as losing track of what part of the program "owns" the memory in question. Writing code that uses an anonymous chunk of memory it is passed is laying landmines. (ddyer, Aug 22, 2015 at 23:35)