malloc & free

The post does walkthrough of a basic malloc/free implementation. It won’t contain all the optimizations and concerns which the excellent Douglas Lea’s Malloc aka dlmalloc covers. But the aim is to highlight the underlying system calls and internal structures which drives these memory management APIs.

brk and sbrk system calls

Resizing the heap (for allocating or freeing memory) is all about getting the kernel to adjust its idea of the process’s program break – which is essentially the current limit of the heap. brk() and sbrk() system calls allow to manipulate the program break, thereby creating or freeing space on the heap.

#include <unistd.h>

void *
brk(const void *addr);

void *
sbrk(int incr);

In short, brk() sets the program break to the input addr. sbrk() accepts offset and increments the current program break by incr. It returns the location of the last program break i.e. start of the incremented heap segment.
You can get the current program break by using sbrk(0) – something we’ll be using in the discussed implementation below.

How malloc & free works

With the brk/sbrk introduced, we can now start thinking about what else is needed for a malloc/free implementation. For this, let’s start with their APIs:

void *
malloc(size_t size);

void
free(void *ptr);

A naive implementation

Let’s start with malloc – it returns a pointer to a block of memory of size bytes. A naive implementation just increments the program break.

void *malloc(size_t size) {
    return sbrk(size);
}

This as you can imagine is horribly inefficient, as the program break always goes up and is never reset to lower positions. As the process proceeds, there will be heap segments (between &end and program break) that gets freed up (holes). But the naive implementation above makes no use of those holes. In fact, a free() here doesn’t make any sense, as the malloc never utilises the space freed up.

Free list

It makes sense for the free() to capture the deallocated memory chunks for subsequent use by the malloc(). The chunks can be of different sizes, and we can find a chunk which fits the requested size.
malloc’ing a free chunk means it’s not available for further allocation requests, until it is freed again. We define the following doubly linked list struct which resolves these requirements (calling it free list):

typedef struct free_l {
  size_t chunk_size;
  struct free_l* previous;
  struct free_l* next;
} free_l;

free_l* head = NULL;

A malloc’ed segment looks like:

note that,

(actual) memory size allocated = 
memory size requested (argument of `malloc`) + sizeof(size_t)

When the allocated segment is freed, the “memory size requested” portion is used to store the previous and next free list entries.

Extending further on how this might work. Say we have a free list entry f_entry which is suitable to serve the malloc request, we need to do the following:

  • remove the f_entry from the free list
  • return &(f_entry->previous), which is the memory addr at which chunk_size ends and can be used by user.

The Implementation

Let’s look at the malloc implementation. This section breakdowns the code. If you prefer to just read the complete code, have a look at the gist
Starting with the case when free list is empty:

static const size_t CHUNK_SIZE_SIZE = sizeof(size_t);
// size needed for bookeeping requirement via free list entry
static const size_t FREE_L_SIZE = CHUNK_SIZE_SIZE + 2 * sizeof(void*);

void* malloc(size_t size) {
  // blob = free list entry 'header' + requested memory
  size_t blob_size = size + FREE_L_SIZE;

  if (head == NULL) {
    // free list is empty
    free_l* entry = sbrk(blob_size);
    if (entry == NULL) {
      return NULL;
    }
    entry->chunk_size = blob_size - CHUNK_SIZE_SIZE;
    entry->previous = entry->next = NULL;
    return &entry->previous;  // skip 'chunk_size'
  }
  // rest of the code is left for brevity
}

Notice the two static variables there. CHUNK_SIZE_SIZE is what it takes to store the chunk_size; While in a free list, the size taken for the record keeping is two pointers + 1 size_t variable, which reflects in FREE_L_SIZE.
The rest of the code is about adjusting the page break, setting the values in free list entry and returning the address for the user.

Next we look at the case when the free list has some entries, and can use be used to allocate the memory. This would involve traversing the free list and finding the chunk which best fits the request. This is called best-fit. Other algorithms like first-fit can also be used.

// best fit strategy
free_l* best_candidate = NULL;
long int best_gap = LONG_MAX;
for (free_l* curr = head; curr != NULL; curr = curr->next) {
    int hole_sz = curr->chunk_size - size;
    if (hole_sz < best_gap && hole_sz >= 0) {
        best_gap = hole_sz;
        best_candidate = curr;
    }
}

If we found a best_candidate here, we pop it off the free list, and return the &best_candidate->previous as the address from which the requested size is allocated.
When there’s no candidate found to serve the request (e.g. the requested size doesn’t fit any free chunk), we need to increment the current heap limit (via brk/sbrk), and return the appropriate address.

if (best_candidate == NULL) {
    // no slot available big enough to fit `size`. Adjust brk
    free_l* new_slot = sbrk(blob_size);
    if (new_slot == NULL) {
        return NULL;
    }
    new_slot->chunk_size = blob_size - CHUNK_SIZE_SIZE;

    return &new_slot->previous;
}

pop_from_free_list(best_candidate);
return &best_candidate->previous;

The implementation for pop_from_free_list is given:

void pop_from_free_list(free_l* entry) {
  if (entry->previous == NULL) {
    // entry is first in the list, adjust the head to next entry
    head = entry->next;
  } else {
    entry->previous->next = entry->next;
  }

  if (entry->next != NULL) {
    entry->next->previous = entry->previous;
  }
}

The following is the implementation for free(). It does a couple of things:

  • return the chunk to the free list
  • if the freed memory corresponds to program break, it can be adjusted.
void free(void* ptr) {
  if (ptr == NULL) {
    return;
  }

  // return slot to the free list
  ptr -= CHUNK_SIZE_SIZE;
  free_l* entry_ptr = (free_l*)ptr;
  printf("entry location to free: %10p\n", ptr);
  entry_ptr->previous = NULL;
  entry_ptr->next = head;
  head = entry_ptr;
}

The full gist has an additional optimization in free() to lower the program break if memory is freed from the top of heap.

Conclusion

Here are some notable optimizations/concerns dlmalloc has (over the naive implementation above):

  • it differentiates between large requests (>=512 bytes) and small requests (<=64 bytes), and has various optimizations around it. Example: The free list entry additionally maintains a pair of pointers to navigate to the previous/next “large size” chunk. Or having a caching allocator (which maintains pools of quickly recycled chunks) for serving small requests.
  • memory tagging to differentiate between allocated/de-allocated chunks, which is for debugging purposes.
  • free() coalesces consecutive chunks into a single free list entry.
  • when utilizing an entry, malloc() serves the request with the exact memory required and creates a new/modified entry for the leftover memory space.

Resources