Implementing realloc in C

Question 1

I have an assignment to implement realloc in C, This's my code please review it.

void *my_realloc(void *ptr, size_t len)
{
 void *real;
 real = malloc(len);
 memset(real, 0, len);
 if (real)
 memcpy(real, ptr, len);
 free(ptr);
 return (real);
}

Question 2

I'd say that's not possible, since you cannot know the size of an allocated area by just providing the pointer. The assignment is pointless, unless you also override malloc and free so you can store allocated size, for instance in the first bytes of the block. But reimplementing realloc alone is impossible.

Question 3

Did you read the realloc specification before starting to write code? That should have been your first step.

Question 4

Does your assignment including reimplementing malloc, free and calloc? Is your realloc meant to work on a block returned by the platform's malloc? If so, the answers below tell you you're stuck!

Question 5

The code leaves a bad first impression because of how it is formatted. In particular, the extra whitespace between the return type and the function name, and that between the variable's type and its name, looks odd.
Unless you are writing code that needs to be compatible with pre-C99 compilers, you should break the habit of declaring variables at the top of the block and initializing them later. Instead, postpone the declaration of variables until the point when they can be initialized (to the extent possible). This helps to minimize bugs caused by uninitialized variables.
I recommend always using curly braces to create block scope after if statements, even if you currently only have one statement there. This minimizes the potential for serious bugs when you later go back and add some additional logic.
The return value of malloc must be checked immediately after the call to malloc! If malloc fails, it will return a null pointer, and any attempt to use a null pointer results in undefined behavior. Therefore, you need to move the call to memset to after you check the result of malloc, otherwise you risk undefined behavior.
In fact, you do not need to call memset at all. If you check the C language specification for information on the realloc function, you will see that it does not initialize the additional memory in the case where the new size is larger than the old size.
You are breaking the realloc specification in several other ways, too. If your goal is to simulate/reinvent realloc, then you absolutely must ensure that your implementation behaves in exactly the same way as realloc.

In fact, realloc is really a complete memory management subsystem wrapped in a single function—it does everything that malloc and free do—and more—making its semantics very complicated and confusing. Here's a summary of the relevant requirements for realloc from the standard:
- If the requested block size is smaller than the original size of the block, realloc either frees the unwanted memory at the end of the block and returns the input pointer unchanged, or allocates a new block of the appropriate size, frees the original block, and returns a pointer to this newly-allocated block.
- If the requested block size is larger than the original size of the block, realloc may allocate an expanded block at a new address and copy the contents of the original block into the new location. In this case, a pointer to the expanded block is returned, and the extended portion of the block is left uninitialized. Or, if possible, it may expand the original block in-place and return the input pointer unchanged.
- If realloc cannot satisfy a request to expand a block, it returns a null pointer and does not free the original block. (realloc will always succeed when you request to shrink a block.)
- If the input pointer is null, then realloc behaves exactly as if you had called malloc(size), returning a pointer to the newly-allocated block of the requested size, or a null pointer if the request could not be satisfied.
- If the requested size is 0 and the input pointer is non-null, then realloc behaves exactly as if you had called free(ptr), and always returns a null pointer.
- If the input pointer is null and the requested size is 0, then the result is undefined!
Whew! To be correct, your code needs to implement all of these conditions, precisely as documented.

Conversely, if you are not simulating realloc (and that's perfectly fine—maybe you're reinventing the wheel because you want different behavior in your program), then you need to use a different name so that other programmers are not fooled into thinking that it is identical to realloc and won't expect it to behave in the same way.
In fact, as you can see from the list of requirements above, it is actually impossible to reinvent realloc as required by the specification, since you do not have insider knowledge about how the standard library implements memory management. You need to be able to determine the original size of the memory block pointed to by ptr, and there is no portable way to do so.

Why do you need this information? Two reasons. First, you need to know whether the caller is requesting to shrink or expand the memory block so that you can follow the correct semantics. Second, you need to know how many bytes to copy when you call memcpy—right now, you are copying past the end of the original buffer, which is undefined behavior!

The first problem could be solved by altering the semantics of our function so that it only grows the size of a block, renaming it something like GrowBlock or ExpandMemory to make it clear that it is not identical to realloc. Unfortunately, there would be no reliable way to assert this requirement that len is greater than or equal to the current size of the memory block pointed to by ptr within the body of the function, so documentation alone would have to suffice, which is a very weak guarantee. Even more unfortunately, this won't solve the second problem—we still have no way to correctly call memcpy!

Therefore, our only real option is to modify the function's signature to accept the original size of ptr as a parameter (in addition to the desired new size that you already accept as a parameter).

All of that considered, here's how I'd write it:

// This function is similar to realloc() and implements the same semantics,
// except that the caller must explicitly pass the original size of the
// memory block pointed to by 'ptr', as well as the desired size.
void *my_realloc(void *ptr, size_t originalLength, size_t newLength)
{
 // Note that because we cannot rely on implementation details of the standard library,
 // we can never grow a block in place like realloc() can. However, it is semantically
 // equivalent to allocate a new block of the appropriate size, copy the original data
 // into it, and return a pointer to that new block. In fact, realloc() is allowed to
 // do this, as well. So we lose a possible performance optimization (that is rarely
 // possible in practice anyway), but correctness is still ensured, and the caller
 // never need be the wiser.
 // 
 // Likewise, we cannot actually shrink a block of memory in-place, so we either
 // have to return the block unchanged (which is legal, because a block of memory
 // is always allowed to be *larger* than necessary), or allocate a new smaller
 // block, copy the portion of the original data that will fit, and return a
 // pointer to this new shrunken block. The latter would actually be slower,
 // so we'll avoid doing this extra work when possible in the current implementation.
 // (You might need to change this if memory pressure gets out of control.)
 if (newLength == 0)
 {
 free(ptr);
 return NULL;
 }
 else if (!ptr)
 {
 return malloc(newLength);
 }
 else if (newLength <= originalLength)
 {
 return ptr;
 }
 else
 {
 assert((ptr) && (newLength > originalLength));
 void *ptrNew = malloc(newLength);
 if (ptrNew)
 {
 memcpy(ptrNew, ptr, originalLength);
 free(ptr);
 }
 return ptrNew;
 }
}

Indeed, this looks quite complicated—and it it isn't even as complicated as a real implementation of realloc! It's a good lesson about why realloc is actually bad design. You should not write massive multi-purpose functions, and you certainly shouldn't implement an entire subsystem in a single function! Instead, break the important behaviors out into separate functions. In this case, you might have four separate functions that allocate a block (AllocateMemory), free a block (FreeMemory), expand a block (ExpandMemory), and shrink a block (ShrinkMemory). This division of labor makes the implementation(s) much easier to write, easier to reason about, easier to include error-checking, and therefore easier to maintain and less likely to contain bugs. I realize your assignment was to write realloc, but there is a broader (perhaps inadvertent) lesson here that you should not miss.

Question 6

Comments are not for extended discussion; this conversation has been moved to chat.

Question 7

This answer has been selected as the winner of Best of Code Review 2016 — Best Newcomer (answer).

Question 8

Wow, thanks for the recognition and the donation of your own hard-earned reputation, @200!

Question 9

When realloc is called with a smaller than the original size then there is no requirement at all that the same pointer should be returned. Depending on the implementation, a different pointer may very well be returned. It's even possible to fail.

Question 10

Yes, that's true, @gnasher. I said as much: "we cannot actually shrink a block of memory in-place, so we either have to return the block unchanged (which is legal, because a block of memory is always allowed to be larger than necessary), or allocate a new smaller block, copy the portion of the original data that will fit, and return a pointer to this new shrunken block. The latter would actually be slower, so we'll avoid doing this extra work when possible in the current implementation." Oh, I see what you're talking about. That bullet point says the same pointer is returned. I'll fix it.

Question 11

Generally, malloc, realloc and free are all part of the same library. One of the things this allows is some 'behind the scenes' meta-data chicanery.

For example if you wanted to call malloc(16), the memory library might allocate 20 bytes of space, with the first 4 bytes containing the length of the allocation and then returning a pointer to 4 bytes past the start of the block.

When calling free, the library will subtract 4 bytes from the pointer passed in, to find the pointer to the length of the allocation.

realloc can also use this trick to find out how long the original allocation was.

So, in answer to your question, you can't only implement realloc, you must implement malloc and free also.

Also, if you have an original K&R, I believe you would find some realloc source in there too.

Question 12

The metadata is not a portable solution. C does not specify that "the library will subtract 4 bytes from the pointer passed in".

Question 13

Who said anything about portable? Notice, I said 'might' allocate. Of course the C specification doesn't specify this particular trick, but many implementations DO use this idea to include memory allocation meta-data. For example, the debug build of Microsoft's malloc includes things like, which file/line allocated the memory, so if you forget to free it, at program exit, it will list all the code that have hanging allocations. A particular library will implement it in it's own way, the only portable bit is the interface.

Question 14

I just checked chapter 8 of K&R and their implementation of malloc DOES use this trick, but their meta-data is a pointer (to the next block in the chain) and an unsigned (size of the block). ime.usp.br/~pf/Kernighan-Ritchie/C-Programming-Ebook.pdf

Question 15

As an example, MacOS X and iOS have an implementation that rounds up the size to the next multiple of 16 and returns a block of that size, without any overhead and no metadata stored near the block.

Question 16

You are checking if malloc succeeded before calling memcpy, but not before calling memset.

You are copying len bytes from ptr to the new memory location, but don't know how many bytes ptr actually points to. It probably points to less, this being a major reason why one would want to call realloc in the first place.

Question 17

Here are some simple annotations to highlight a major problem:

real = malloc(len); // This may or may not succeed. On failure, real is NULL
memset(real, 0, len); // This uses real on the assumption that it succeeded!
if (real) // Here, you acknowledge that it may have failed, even though you already used it!
 memcpy(real, ptr, len);

Question 18

There's a major bug: When realloc fails and returns a null pointer, the original pointer must be untouched. And certainly must not be free'd. So when realloc fails, the caller at least has the original data. Without that guarantee, realloc is basically useless.

Cody Gray Cody GrayCody Gray 4,56719 silver badges30 bronze badges · Accepted Answer · 2016-12-28 12:51:12Z

The code leaves a bad first impression because of how it is formatted. In particular, the extra whitespace between the return type and the function name, and that between the variable's type and its name, looks odd.
Unless you are writing code that needs to be compatible with pre-C99 compilers, you should break the habit of declaring variables at the top of the block and initializing them later. Instead, postpone the declaration of variables until the point when they can be initialized (to the extent possible). This helps to minimize bugs caused by uninitialized variables.
I recommend always using curly braces to create block scope after if statements, even if you currently only have one statement there. This minimizes the potential for serious bugs when you later go back and add some additional logic.
The return value of malloc must be checked immediately after the call to malloc! If malloc fails, it will return a null pointer, and any attempt to use a null pointer results in undefined behavior. Therefore, you need to move the call to memset to after you check the result of malloc, otherwise you risk undefined behavior.
In fact, you do not need to call memset at all. If you check the C language specification for information on the realloc function, you will see that it does not initialize the additional memory in the case where the new size is larger than the old size.
You are breaking the realloc specification in several other ways, too. If your goal is to simulate/reinvent realloc, then you absolutely must ensure that your implementation behaves in exactly the same way as realloc.

In fact, realloc is really a complete memory management subsystem wrapped in a single function—it does everything that malloc and free do—and more—making its semantics very complicated and confusing. Here's a summary of the relevant requirements for realloc from the standard:
- If the requested block size is smaller than the original size of the block, realloc either frees the unwanted memory at the end of the block and returns the input pointer unchanged, or allocates a new block of the appropriate size, frees the original block, and returns a pointer to this newly-allocated block.
- If the requested block size is larger than the original size of the block, realloc may allocate an expanded block at a new address and copy the contents of the original block into the new location. In this case, a pointer to the expanded block is returned, and the extended portion of the block is left uninitialized. Or, if possible, it may expand the original block in-place and return the input pointer unchanged.
- If realloc cannot satisfy a request to expand a block, it returns a null pointer and does not free the original block. (realloc will always succeed when you request to shrink a block.)
- If the input pointer is null, then realloc behaves exactly as if you had called malloc(size), returning a pointer to the newly-allocated block of the requested size, or a null pointer if the request could not be satisfied.
- If the requested size is 0 and the input pointer is non-null, then realloc behaves exactly as if you had called free(ptr), and always returns a null pointer.
- If the input pointer is null and the requested size is 0, then the result is undefined!
Whew! To be correct, your code needs to implement all of these conditions, precisely as documented.

Conversely, if you are not simulating realloc (and that's perfectly fine—maybe you're reinventing the wheel because you want different behavior in your program), then you need to use a different name so that other programmers are not fooled into thinking that it is identical to realloc and won't expect it to behave in the same way.
In fact, as you can see from the list of requirements above, it is actually impossible to reinvent realloc as required by the specification, since you do not have insider knowledge about how the standard library implements memory management. You need to be able to determine the original size of the memory block pointed to by ptr, and there is no portable way to do so.

Why do you need this information? Two reasons. First, you need to know whether the caller is requesting to shrink or expand the memory block so that you can follow the correct semantics. Second, you need to know how many bytes to copy when you call memcpy—right now, you are copying past the end of the original buffer, which is undefined behavior!

The first problem could be solved by altering the semantics of our function so that it only grows the size of a block, renaming it something like GrowBlock or ExpandMemory to make it clear that it is not identical to realloc. Unfortunately, there would be no reliable way to assert this requirement that len is greater than or equal to the current size of the memory block pointed to by ptr within the body of the function, so documentation alone would have to suffice, which is a very weak guarantee. Even more unfortunately, this won't solve the second problem—we still have no way to correctly call memcpy!

Therefore, our only real option is to modify the function's signature to accept the original size of ptr as a parameter (in addition to the desired new size that you already accept as a parameter).

All of that considered, here's how I'd write it:

// This function is similar to realloc() and implements the same semantics,
// except that the caller must explicitly pass the original size of the
// memory block pointed to by 'ptr', as well as the desired size.
void *my_realloc(void *ptr, size_t originalLength, size_t newLength)
{
 // Note that because we cannot rely on implementation details of the standard library,
 // we can never grow a block in place like realloc() can. However, it is semantically
 // equivalent to allocate a new block of the appropriate size, copy the original data
 // into it, and return a pointer to that new block. In fact, realloc() is allowed to
 // do this, as well. So we lose a possible performance optimization (that is rarely
 // possible in practice anyway), but correctness is still ensured, and the caller
 // never need be the wiser.
 // 
 // Likewise, we cannot actually shrink a block of memory in-place, so we either
 // have to return the block unchanged (which is legal, because a block of memory
 // is always allowed to be *larger* than necessary), or allocate a new smaller
 // block, copy the portion of the original data that will fit, and return a
 // pointer to this new shrunken block. The latter would actually be slower,
 // so we'll avoid doing this extra work when possible in the current implementation.
 // (You might need to change this if memory pressure gets out of control.)
 if (newLength == 0)
 {
 free(ptr);
 return NULL;
 }
 else if (!ptr)
 {
 return malloc(newLength);
 }
 else if (newLength <= originalLength)
 {
 return ptr;
 }
 else
 {
 assert((ptr) && (newLength > originalLength));
 void *ptrNew = malloc(newLength);
 if (ptrNew)
 {
 memcpy(ptrNew, ptr, originalLength);
 free(ptr);
 }
 return ptrNew;
 }
}

Indeed, this looks quite complicated—and it it isn't even as complicated as a real implementation of realloc! It's a good lesson about why realloc is actually bad design. You should not write massive multi-purpose functions, and you certainly shouldn't implement an entire subsystem in a single function! Instead, break the important behaviors out into separate functions. In this case, you might have four separate functions that allocate a block (AllocateMemory), free a block (FreeMemory), expand a block (ExpandMemory), and shrink a block (ShrinkMemory). This division of labor makes the implementation(s) much easier to write, easier to reason about, easier to include error-checking, and therefore easier to maintain and less likely to contain bugs. I realize your assignment was to write realloc, but there is a broader (perhaps inadvertent) lesson here that you should not miss.

Comments are not for extended discussion; this conversation has been moved to chat.
This answer has been selected as the winner of Best of Code Review 2016 — Best Newcomer (answer).
Wow, thanks for the recognition and the donation of your own hard-earned reputation, @200!
When realloc is called with a smaller than the original size then there is no requirement at all that the same pointer should be returned. Depending on the implementation, a different pointer may very well be returned. It's even possible to fail.
Yes, that's true, @gnasher. I said as much: "we cannot actually shrink a block of memory in-place, so we either have to return the block unchanged (which is legal, because a block of memory is always allowed to be larger than necessary), or allocate a new smaller block, copy the portion of the original data that will fit, and return a pointer to this new shrunken block. The latter would actually be slower, so we'll avoid doing this extra work when possible in the current implementation." Oh, I see what you're talking about. That bullet point says the same pointer is returned. I'll fix it.

Stack Exchange Network

Implementing realloc in C

5 Answers 5

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Implementing realloc in C

5 Answers 5

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions