Checking array size in C/C++ to avoid segmentation faults

Question 1

So it's well known that C does not do any array bounds checking when accessing memory. Nowadays, if you access myArray[7] when you declared it as int myArray[3], your program may get a segfault and crash thanks to memory protection.

Now, if a function takes an argument such as myFunc(int *yourArray), but you know you need at least 8 slots in the array, is it possible to check beforehand whether yourArray[7] is illegal, in order to throw a custom error:

"Sorry, yourArray is too small for this function. We need 8 ints of space."

rather than

"Segmentation fault."

Question 2

You might want to check out std::array

Question 3

Regarding the close vote: I believe this is on-topic both here and at SO. On SO, the answer would probably be that C/C++ does or does not have support for this. Here, the answer is about whether such a feature is conceptually possible, and if not, how you could achieve the same goal.

Question 4

@Ixrec: other way around. C does not have support for it, and C++ does.

Question 5

Checking array bounds like you want is implementation specific, because buffer overflow is an example of undefined behavior (and this explains why UB can be really bad).

It is also an undecidable problem in general. You can easily show that statically finding every buffer overflow (by static program analysis, e.g. of the C++ source code, without actually running the program) is equivalent to the halting problem. See also Rice's theorem.

However, several (partial) practical tools exist (notably on Linux):

- You could add assert or static_assert checks to your code, and/or other runtime checks.
- You might find and use a static code analyzer such as Frama-C (it currently works on C code).
- You could customize your GCC compiler using MELT.
- You should compile your code with all warnings and debug info, e.g. g++ -Wall -Wextra -g if using GCC.
- You might run your program under valgrind, at least for tests.
- You could use the address sanitizer, e.g. by adding -fsanitize=address to your compilation flags (when testing).
- Notably in C (and sometimes in C++), it is a good convention to pass both array pointers and their size (as e.g. snprintf(3) or strncmp(3) do). In C, you might also use a flexible array member in a struct and store the flexible array's size inside the struct.
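As a minimal sketch of the last point, assuming C99 or later: the struct below keeps its element count next to a flexible array member, so every access can be checked against it. The names int_array, int_array_new, int_array_get and int_array_set are hypothetical, made up for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical type: the element count travels with the data. */
struct int_array {
    size_t len;   /* number of ints in data[] */
    int data[];   /* C99 flexible array member */
};

/* Allocate an int_array with room for len ints, zero-initialized.
   Returns NULL on overflow of the size computation or on allocation failure. */
struct int_array *int_array_new(size_t len)
{
    if (len > (SIZE_MAX - sizeof(struct int_array)) / sizeof(int))
        return NULL;
    struct int_array *a = calloc(1, sizeof *a + len * sizeof(int));
    if (a != NULL)
        a->len = len;
    return a;
}

/* Bounds-checked read: returns false instead of reading past the end. */
bool int_array_get(const struct int_array *a, size_t i, int *out)
{
    if (i >= a->len)
        return false;
    *out = a->data[i];
    return true;
}

/* Bounds-checked write: returns false instead of writing past the end. */
bool int_array_set(struct int_array *a, size_t i, int value)
{
    if (i >= a->len)
        return false;
    a->data[i] = value;
    return true;
}
```

A function such as myFunc could then take a struct int_array * and start with if (a->len < 8), reporting the custom error from the question instead of risking the access.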

BTW, pointer arithmetic in C and C++ makes finding buffer overflows even harder.

In C++11 and later, you'd better avoid plain arrays and raw pointers and use standard containers and smart pointers.

Question 6

The word statically is key: the reduction to the halting problem only works if we want to rule out all programs that go out of bounds, but no more, and without running the program. If run-time checks are permitted, the problem is almost trivial; it just has runtime overhead (quite a large one, for a naive solution). Likewise, it is easy to reject all programs that go out of bounds as well as some that don't (the hard part is not rejecting practically useful ones).
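To make the run-time alternative concrete, here is a naive sketch, assuming the length is passed alongside the pointer: every index goes through a hypothetical checked_index helper that reports a custom error and aborts instead of performing the out-of-bounds access. The per-access comparison is the overhead mentioned above.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: validate an index against a known length at run time.
   On failure it prints a custom message and aborts, instead of letting the
   out-of-bounds access happen (which may segfault or silently corrupt memory). */
size_t checked_index(size_t i, size_t len, const char *what)
{
    if (i >= len) {
        fprintf(stderr, "index %zu out of bounds for %s (length %zu)\n", i, what, len);
        abort();
    }
    return i;
}

/* Sums the first n elements; every access pays for one comparison. */
long sum_first(const int *a, size_t len, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[checked_index(i, len, "a")];
    return total;
}
```

Smarter schemes hoist the check out of the loop (here, a single n <= len test would suffice), which is exactly the kind of optimization that makes the naive version expensive by comparison.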

Question 7

The only thing I'd add: In C, always pass the array's length along with the array itself (a la strcpy vs strncpy), for the same reason you'd use a container instead of a raw array in C++.

Question 8

@Ixrec: You know strcpy vs strncpy is an atrociously bad example? strncpy does not copy strings; it copies at most n non-NUL bytes from a source string and NUL-pads to n bytes. And anyway, if you provide a buffer length, you must either be OK with truncation (often very much not the case), or you need a way to signal failure.
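A small C sketch of the strncpy behavior described in the comment above; strncpy_demo is a hypothetical name, and the buffer sizes (4 and 8) are arbitrary. With a long source the destination ends up without a terminating '\0'; with a short one the rest of the buffer is zero-padded.

```c
#include <string.h>

/* Demonstrates what strncpy actually does with fixed-size destinations. */
void strncpy_demo(char long_dst[4], char short_dst[8])
{
    /* Source longer than n: exactly 4 bytes are copied and NO terminating
       '\0' is written, so long_dst is not a valid C string afterwards. */
    strncpy(long_dst, "hello", 4);

    /* Source shorter than n: the string is copied and the remainder of the
       buffer is filled with '\0' bytes, up to n bytes in total. */
    strncpy(short_dst, "hi", 8);
}
```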

Question 9

@Ixrec: strlcpy is considered a "safer for some scenarios" version of strcpy, strncpy is just a very specialized completely different tool. Though yes, there are many who just don't know that, which makes it an even worse example.
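For comparison, a hypothetical my_strlcpy in the spirit of BSD strlcpy (which is not part of ISO C, so the sketch does not rely on it). It always NUL-terminates when the buffer is non-empty and returns the source length, which gives the caller a way to detect truncation, i.e. the "signal failure" option mentioned in the previous comment.

```c
#include <string.h>

/* Hypothetical strlcpy-style copy. Copies at most size-1 bytes, always
   NUL-terminates when size > 0, and returns strlen(src): a return value
   >= size tells the caller that the result was truncated. */
size_t my_strlcpy(char *dst, const char *src, size_t size)
{
    size_t src_len = strlen(src);
    if (size > 0) {
        size_t n = src_len < size - 1 ? src_len : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return src_len;
}
```

Usage: if (my_strlcpy(buf, src, sizeof buf) >= sizeof buf) then the copy was truncated and the caller can treat it as an error.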

Question 10

@Deduplicator You're right, clearly I need to brush up on my C. Basile picked a better example anyway so at least the actual goal of the comment was achieved.

Question 11

The answer is really fairly simple: if you want safety, use something that actually provides it--and that's not C, and not raw C-style arrays.

Without departing too far from the basic style of C and raw arrays, you can use C++ and an std::vector with [i] replaced by .at(i), and get bounds checking.

Using std::vector instead makes most of the problems with arrays easy. You can check the current size of the vector with its .size() member function. Most of the time, you don't need to do that though, because when you want to add something to it you just use its .push_back() member function.

At least in theory, you can sort of do most of the same sorts of things in C, but doing so gets relatively ugly. Although it's not terribly difficult to define a wrapper that (for example) puts a pointer and a current allocation size into a struct, you have to define functions to do all the manipulation on it, and even then you have to live with the fact that existing code won't know how to use it or deal with it. I've done this a few times, and if you need it badly enough you can make it work--but I long ago decided it just wasn't worth the pain.
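A minimal sketch of such a wrapper, assuming a simple growable array of ints; struct int_vec and its functions are hypothetical names. It shows both the idea (pointer, size and capacity kept together, with a checked int_vec_at analogous to std::vector::at) and the drawback mentioned above: every operation needs its own function.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical growable int array: a pointer plus its current size and
   allocated capacity, kept together in one struct. */
struct int_vec {
    int *data;
    size_t size;      /* elements in use */
    size_t capacity;  /* elements allocated */
};

/* Append value, doubling the allocation when full; false on allocation
   failure. (Overflow of new_cap * sizeof(int) is ignored for brevity.) */
bool int_vec_push(struct int_vec *v, int value)
{
    if (v->size == v->capacity) {
        size_t new_cap = v->capacity ? v->capacity * 2 : 4;
        int *p = realloc(v->data, new_cap * sizeof *p);
        if (p == NULL)
            return false;
        v->data = p;
        v->capacity = new_cap;
    }
    v->data[v->size++] = value;
    return true;
}

/* Bounds-checked read, the analogue of std::vector::at. */
bool int_vec_at(const struct int_vec *v, size_t i, int *out)
{
    if (i >= v->size)
        return false;
    *out = v->data[i];
    return true;
}

/* Release the storage and reset the struct to the empty state. */
void int_vec_free(struct int_vec *v)
{
    free(v->data);
    v->data = NULL;
    v->size = v->capacity = 0;
}
```

Existing code that expects a plain int * can still be given v.data and v.size, but nothing stops it from writing past v.size, which is the "existing code won't know how to use it" problem.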

Question 12

A function that receives a pointer does not know the length of the corresponding array. You must pass it in explicitly as a parameter yourself:

```
void myFunc(int *yourArray, size_t yourArrayLen);
```

Once you've done that, throwing an error is trivial.

Of course, this still leaves the possibility that your caller might give you the wrong length. You can't really prevent that without either:

- implementing a custom data type to store arrays, and then using encapsulation to make sure the length stays in sync with the true length at all times, or
- allowing fixed-size arrays only, e.g.:
```
void myFunc(int (*yourArray)[8]);
```
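A short sketch of the fixed-size option; sum8 is a hypothetical stand-in for myFunc. Because the parameter has type int (*)[8], the array length is part of the type, so the function can even recover it with sizeof, and passing the address of an int[3] is diagnosed by the compiler (as a warning or an error, depending on the compiler and flags).

```c
#include <stddef.h>

/* The parameter only accepts the address of an array of exactly 8 ints,
   e.g. int a[8]; sum8(&a);  whereas int small[3]; sum8(&small); is an
   incompatible-pointer-type diagnostic at compile time. */
long sum8(int (*arr)[8])
{
    long total = 0;
    size_t n = sizeof *arr / sizeof (*arr)[0];  /* 8, known at compile time */
    for (size_t i = 0; i < n; i++)
        total += (*arr)[i];
    return total;
}
```

The trade-off is that callers must pass &arr rather than arr, and arrays of any other length (or dynamically sized ones) don't fit this signature.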

Question 13

There is no way in C(++) to get the length of an array from a pointer to its first element. (There are platform-specific functions like _msize in MSVCRT, but those only work on malloced pointers.)

What's typically done when passing arrays to functions is to pass the length along with the pointer so that bounds-checking can be done at runtime.

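```
#include <stddef.h>
#include <stdio.h>

#define LEN 8 // example length; any value >= 8 passes the check

void myFunc(int *yourArray, size_t length)
{
    if (length < 8)
    {
        puts("Sorry, yourArray is too small for this function. We need 8 ints of space.");
        return;
    }
    // ... yourArray[0] through yourArray[7] can now be accessed safely
}

void caller(void)
{
    int arr[LEN] = {0};
    myFunc(arr, LEN);
}
```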

Question 14

I cannot find any guarantee that malloc and the rest won't ever provide more space than requested.

Question 15

It's in fact common due to rounding, but you have to realize that such extra space (when present) is always uninitialized.

Question 16

Use a custom wrapper for malloc (or write your own allocator) that keeps additional information about the blocks it allocates. The one I use adds a few "guard bytes" to every allocation, stores the length of the allocation at a[-1] (just before the returned pointer), and checks the guard bytes and other things upon deallocation.
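A minimal sketch of a wrapper along those lines, assuming C11 (for max_align_t). dbg_malloc, dbg_size and dbg_free, the guard length and the guard byte value are all hypothetical choices, not the answerer's actual code. The size is stored in a header just before the returned pointer, and dbg_free reports whether the guard bytes after the block were overwritten.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_BYTE 0xAB  /* hypothetical fill pattern */
#define GUARD_LEN 8      /* hypothetical number of guard bytes */

/* Header padded to max_align_t so the user pointer stays suitably aligned. */
typedef union {
    size_t size;
    max_align_t align;
} dbg_header;

/* Layout: [header][size user bytes][GUARD_LEN guard bytes] */
void *dbg_malloc(size_t size)
{
    if (size > SIZE_MAX - sizeof(dbg_header) - GUARD_LEN)
        return NULL;
    unsigned char *raw = malloc(sizeof(dbg_header) + size + GUARD_LEN);
    if (raw == NULL)
        return NULL;
    ((dbg_header *)raw)->size = size;
    unsigned char *user = raw + sizeof(dbg_header);
    memset(user + size, GUARD_BYTE, GUARD_LEN);
    return user;
}

/* Length of a block returned by dbg_malloc, read from the header at p[-1]. */
size_t dbg_size(const void *p)
{
    return ((const dbg_header *)p - 1)->size;
}

/* Frees the block; returns false if any guard byte was overwritten. */
bool dbg_free(void *p)
{
    if (p == NULL)
        return true;
    unsigned char *user = p;
    size_t size = dbg_size(p);
    bool intact = true;
    for (size_t i = 0; i < GUARD_LEN; i++)
        if (user[size + i] != GUARD_BYTE)
            intact = false;
    free(user - sizeof(dbg_header));
    return intact;
}
```

This only catches small overruns that land in the guard bytes, and only at free time; a write far past the block, or a read, goes unnoticed, which is why tools like valgrind or the address sanitizer are still worth using.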

Question 17

That will only help if no array ever uses only part of a memory block...

Question 18

If you are in the habit of allocating chunks of memory then using parts of it in ad hoc ways that lose contact with the original pointer, you need better methodology.

Question 19

If you mean that I might forget when and where to free what, or fail to keep track of other data I need later, you are trivially correct. But that doesn't seem to be the way you use it, which means you are simply wrong.

Question 20

The OP's request is for help preventing the trashing of arrays allocated in C. The first step is to avoid bad practices such as losing track of what part of the program "owns" the memory in question. Writing code that uses an anonymous chunk of memory it is passed is laying landmines.

score 9 · Accepted Answer · 2015-08-19 20:11:24Z

