As an exercise I've implemented malloc and free in Python as a first-fit free list, as described here. It tracks which blocks are free in a doubly linked list sorted by the address of the first byte of each block, and it keeps the size of each allocated block in a dictionary. The time complexity is O(N) for both malloc and free, where N is the number of free blocks.
from typing import Optional


class Block():
    def __init__(self, size: int, address: int):
        self.address: int = address
        self.size = size
        self.prev: Optional[Block] = None
        self.next: Optional[Block] = None

    def __str__(self):
        return f"(start {self.address}, size {self.size})"

    def __repr__(self):
        return f"(start {self.address}, size {self.size})"


class Heap():
    def __init__(self, size: int):
        self.size = size
        self.head: Optional[Block] = Block(size, 0)
        self.allocation_headers: dict[int, int] = {}

    def num_free_blocks(self) -> int:
        block = self.head
        total = 0
        while block:
            total += 1
            block = block.next
        return total

    def free_size(self) -> int:
        block = self.head
        total = 0
        while block:
            total += block.size
            block = block.next
        return total

    def total_size(self) -> int:
        free = self.free_size()
        allocated = sum(self.allocation_headers.values())
        return allocated + free

    def malloc(self, size: int) -> int:
        if size <= 0:
            raise Exception
        # Walk the free list, which is kept sorted by address
        block = self.head
        while block:
            if block.size >= size:
                self.allocation_headers[block.address] = size
                return_address = block.address
                if block.size == size:  # remove the block
                    if block.prev:
                        block.prev.next = block.next
                    if block.next:
                        block.next.prev = block.prev
                    if self.head == block:
                        self.head = block.next
                else:  # make the block smaller
                    block.address += size
                    block.size -= size
                return return_address
            block = block.next
        raise Exception

    def free(self, ptr: int):
        """
        Take an address pointer ptr and free it.
        All we need to do to free is to add an element to the free list.
        """
        free_size = self.allocation_headers[ptr]
        del self.allocation_headers[ptr]
        prev = None
        block = self.head
        while block and block.address < ptr:
            prev = block
            block = block.next
        new_block = Block(free_size, ptr)
        if not self.head:
            self.head = new_block
        if prev:
            prev.next = new_block
            new_block.prev = prev
        if block:
            block.prev = new_block
            new_block.next = block
        if self.head == block:
            self.head = new_block
        # coalesce next
        if new_block.next and new_block.next.address == (new_block.address +
                                                         new_block.size):
            new_block.size += new_block.next.size
            new_block.next = new_block.next.next
            if new_block.next:
                new_block.next.prev = new_block
        # coalesce prev
        if new_block.prev and new_block.address == (new_block.prev.address +
                                                    new_block.prev.size):
            new_block.prev.size += new_block.size
            new_block.prev.next = new_block.next
            if new_block.prev.next:
                new_block.prev.next.prev = new_block.prev
Example usage:
def test_case_1():
    heap = Heap(1000)
    assert (heap.total_size() == 1000)
    assert (heap.num_free_blocks() == 1)
    assert (heap.free_size() == 1000)
    a = heap.malloc(100)
    assert (heap.num_free_blocks() == 1)
    assert (heap.free_size() == 900)
    assert (a == 0)
    b = heap.malloc(500)
    assert (heap.num_free_blocks() == 1)
    assert (heap.free_size() == 400)
    assert (b == 100)
    try:
        heap.malloc(950)
    except:
        pass
    heap.free(b)
    assert (heap.num_free_blocks() == 1)
    assert (heap.free_size() == 900)
    heap.free(a)
    try:
        heap.free(a)
    except:
        pass
    heap.malloc(950)
    print("Test case 1 succeeded!")
I believe this implementation is correct, as I've fuzzed it with hundreds of millions of random inputs and all of these conditions were maintained:
def perform_checks(self):
    forward_blocks: list[Block] = []
    forward_addresses: list[int] = []
    reverse_blocks: list[Block] = []
    reverse_addresses: list[int] = []
    tail = None
    block = self.head
    forward_length = 0
    forward_free = 0
    last_address = -1
    while block:  # zoom to the end
        assert (block.address > last_address)
        forward_addresses.append(block.address)
        forward_blocks.append(block)
        forward_length += 1
        forward_free += block.size
        if not block.next:
            tail = block
        last_address = block.address
        block = block.next
    reverse_length = 0
    reverse_free = 0
    last_address = 1000000000
    if self.head is not None:
        assert (tail is not None)
    while tail:
        assert (tail.address < last_address)
        reverse_blocks.append(tail)
        reverse_addresses.append(tail.address)
        reverse_length += 1
        reverse_free += tail.size
        last_address = tail.address
        tail = tail.prev
    assert (
        forward_length == reverse_length
    ), f"Forward length of {forward_length}, Reverse length of {reverse_length}"
    reverse_blocks.reverse()
    assert (forward_blocks == reverse_blocks)
    assert (forward_free == reverse_free)
    assert (self.total_size() == self.size
            ), f"Total size {self.total_size()}, size {self.size}"
    assert (forward_length == len(forward_addresses))
However, I would like suggestions on improving the simplicity of the code.
The time complexity and block fragmentation could be improved by switching to a best-fit red/black tree, but let's ignore that. Assuming a flat left-to-right heap, how can we improve this while maintaining the O(N) time complexity? Could the code be dramatically cleaned up by using a Python built-in data structure?
1 Answer
Trivial DRY: __repr__() should simply return __str__(). If in future they diverge, then so be it.
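One way to sketch that deduplication, on a trimmed-down copy of the question's Block class, is to alias one method to the other so the two can never drift apart:

```python
class Block:
    def __init__(self, size: int, address: int):
        self.address = address
        self.size = size

    def __str__(self):
        return f"(start {self.address}, size {self.size})"

    # Delegate: repr(b) and str(b) now share one implementation.
    __repr__ = __str__
```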
Type hint nit: in self.address: int = address, maybe we don't need the int hint? Since address very helpfully offers it. And yes, definitely keep the hint in the signature. It is most visible and most helpful there.
In general, I am loving your optional type hints, keep it up! I imagine mypy would be happy with such an input file.
If I could criticize the whole Optional[Block] thing for a moment: maybe have your machinery always allocate a 1-byte block, so there is always something in there? (Yeah, I know, we leaked one byte of storage. We'll get over it.)
self.allocation_headers: dict[int, int] = {}
I confess I do not understand the meaning of that identifier. It seems to me that self.allocation would be the natural name for such a mapping.
I have no idea what the usage patterns on num_free_blocks() and free_size() are. If, in some realistic workload, profiling reveals they are called "a lot", consider making them take O(1) constant time. That is, we might choose to have the mutators maintain statistics which could be immediately returned.
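As a hypothetical sketch of that idea (the class and method names here are mine, not from the question): malloc and free would each report what they did, and the getters just read the counters.

```python
class HeapStats:
    """Sketch: mutators maintain running totals so the getters are O(1)."""

    def __init__(self, size: int):
        self.size = size
        self._free_bytes = size
        self._free_blocks = 1  # the heap starts as one big free block

    def on_malloc(self, size: int, block_removed: bool) -> None:
        # malloc() would call this after carving `size` bytes out of a block.
        self._free_bytes -= size
        if block_removed:
            self._free_blocks -= 1

    def on_free(self, size: int, blocks_merged: int) -> None:
        # free() would call this after inserting (and possibly coalescing).
        self._free_bytes += size
        self._free_blocks += 1 - blocks_merged

    def free_size(self) -> int:        # O(1), was O(N)
        return self._free_bytes

    def num_free_blocks(self) -> int:  # O(1), was O(N)
        return self._free_blocks
```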
But let's return to the current O(n) linear implementation. The two functions are nearly identical. I am a little bit sad that we don't have some convenient iterator or other helper that they could rely on.
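A generator method is one such helper; a minimal sketch, with a Block class pared down to just what the iterator needs:

```python
from typing import Iterator, Optional


class Block:
    def __init__(self, size: int, address: int):
        self.size = size
        self.address = address
        self.next: Optional["Block"] = None


class Heap:
    def __init__(self, size: int):
        self.head: Optional[Block] = Block(size, 0)

    def blocks(self) -> Iterator[Block]:
        """Yield every block on the free list, head to tail."""
        block = self.head
        while block:
            yield block
            block = block.next

    def num_free_blocks(self) -> int:
        return sum(1 for _ in self.blocks())

    def free_size(self) -> int:
        return sum(block.size for block in self.blocks())
```

Both functions collapse to one-line folds over the same traversal.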
I am reading malloc.
if size <= 0:
I am reading the spec:
    If the size of the space requested is 0, the behavior is implementation-defined: the value returned shall be either a null pointer or a unique pointer.
Recommend you name your function something other than malloc if you're not keen to support malloc semantics.
Also, we can do much better than raising a generic Exception. Go to the trouble of subclassing it so the caller will see a diagnostic error specific to your module. In particular, the caller's except clause should be able to focus on just an AllocationError.
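For instance, a small sketch of such a subclass and a validation helper (check_request is a name I've invented for illustration):

```python
class AllocationError(Exception):
    """Raised when this heap cannot satisfy a request."""


def check_request(size: int) -> None:
    # malloc() would call this instead of `raise Exception`.
    if size <= 0:
        raise AllocationError(f"invalid allocation size: {size}")
```

Callers can then write `except AllocationError:` and let genuinely unexpected exceptions propagate.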
This is correct:
    if block.size == size:  # remove the block
But consider relaxing that equality criterion. Suppose that the caller is creating random strings with lengths between 10 and 20 characters that must be allocated. If we discretize allocations to, say, multiples of 4, then the if would find more opportunities to remove a block. In deployed systems, discretizing to logarithmic boundaries tends to be more practical.
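The multiple-of-4 variant is a one-liner; malloc would round each request through it before searching the free list:

```python
def quantize(size: int, quantum: int = 4) -> int:
    """Round a request up to the nearest multiple of `quantum`."""
    return -(-size // quantum) * quantum  # ceiling division, then scale
```

So requests of 10, 11, and 12 bytes all become 12-byte blocks, and a freed 12-byte block exactly fits any of them.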
In free there is a while loop and several ifs. Please push them down into a helper function which computes prev and block and does some mutation.
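The search half of free, for example, extracts cleanly; this is a sketch (the function name is mine), shown as a free function over a minimal Block so it stands alone:

```python
from typing import Optional, Tuple


class Block:
    def __init__(self, size: int, address: int):
        self.size = size
        self.address = address
        self.prev: Optional["Block"] = None
        self.next: Optional["Block"] = None


def find_insertion_point(head: Optional[Block],
                         ptr: int) -> Tuple[Optional[Block], Optional[Block]]:
    """Return (prev, block): the neighbours between which a freed block
    at address `ptr` belongs, keeping the list sorted by address."""
    prev, block = None, head
    while block and block.address < ptr:
        prev, block = block, block.next
    return prev, block
```

free() then becomes: look up the size, call the helper, splice in the new block, coalesce.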
Thank you for providing automated unit tests.
Please express the +inf constant as 1_000_000_000 or int(1e9).
Please improve PEP-8 readability, perhaps by running the source through black.
"I've fuzzed it with various random inputs"
Potential off-by-one errors abound. Consider making hypothesis part of your test suite. Certainly it has taught me amazing things I never would have believed before.
Overall?
This is good code that achieves its design goals.
I would be happy to accept or delegate maintenance tasks for it.
Coalescing allocated chunks, at logarithmic granularity, seems the biggest opportunity for combating the somewhat ugly O(n) "linear cost with size of free list" overhead. There is a rich literature describing slab, buddy, and related allocators, likely far beyond the interests of this project.
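For a taste of the buddy-style sizing mentioned above: a buddy allocator serves every request from a power-of-two size class, which a sketch like this computes.

```python
def buddy_size(size: int) -> int:
    """Round a positive request up to the next power of two,
    as a buddy allocator's size classes would."""
    return 1 << max(size - 1, 0).bit_length()
```

With only log2(heap size) distinct classes, freed blocks become interchangeable within a class, at the cost of internal fragmentation of up to half of each block.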
Pick an example workload, benchmark these library functions, describe the results, and see where you'd like to go from there!
- "What a phenomenal review, thank you very much @J_H! All of these suggestions are great. The num_free_blocks(), total_size(), and free_size() methods were intended only for testing that the linked list and dictionary maintained sanity in the unit tests; I should've noted this in the code. I did run the code through YAPF using the default config, which may deviate from PEP 8, and I should've added a docstring to every function/method. The hypothesis library looks very interesting, I'll try it out." — Xander Dunn, Feb 1, 2023 at 17:55
- "The idea of discretizing the allocations is super interesting. Are we capturing this case with the else block (else: # make the block smaller)? Here we don't remove a block, but we do break up a larger block into the allocated portion and the leftover free space. I believe the buddy allocation method still has a malloc time complexity of O(N). Next I'll try the red/black tree implementation of the free list, which should be O(log N) for both malloc() and free()." — Xander Dunn, Feb 1, 2023 at 18:01