malloc & free
2022 May 07
See all posts
malloc & free
malloc & free
The post does walkthrough of a basic malloc/free implementation. It
won't contain all the optimizations and concerns which the excellent Douglas
Lea's Malloc aka dlmalloc covers. But the aim is to
highlight the underlying system calls and internal
structures which drives these memory management APIs.
brk and sbrk system calls
Resizing
the heap (for allocating or freeing memory) is all about getting the
kernel to adjust its idea of the process's program
break – which is essentially the current limit of the heap.
brk()
and sbrk()
system calls allow to
manipulate the program break, thereby creating or freeing space on the
heap.
#include <unistd.h>
void *
brk(const void *addr);
void *
sbrk(int incr);
In short, brk()
sets the program break to the input
addr. sbrk()
accepts offset and increments the current
program break by incr
. It returns the location of the last
program break i.e. start of the incremented heap segment.
You can get the current program break by using sbrk(0)
–
something we'll be using in the discussed implementation below.
How malloc & free works
With the brk/sbrk introduced, we can now start thinking about what
else is needed for a malloc/free implementation. For this, let's start
with their APIs:
void *
malloc(size_t size);
void
free(void *ptr);
A naive implementation
Let's start with malloc – it returns a pointer to a block of memory
of size
bytes. A naive implementation just increments the
program break.
void *malloc(size_t size) {
return sbrk(size);
}
This as you can imagine is horribly inefficient, as the program break
always goes up and is never reset to lower positions. As the process
proceeds, there will be heap segments (between &end
and
program break) that gets freed up (holes). But the naive implementation
above makes no use of those holes. In fact, a free()
here
doesn't make any sense, as the malloc never utilises the space freed
up.
Free list
It makes sense for the free()
to capture the deallocated
memory chunks for subsequent use by the malloc()
. The
chunks can be of different sizes, and we can find a chunk which fits the
requested size.
malloc'ing a free chunk means it's not available for further allocation
requests, until it is freed again. We define the following doubly linked
list struct which resolves these requirements (calling it free
list):
typedef struct free_l {
size_t chunk_size;
struct free_l* previous;
struct free_l* next;
} free_l;
free_l* head = NULL;
A malloc'ed segment looks like:
note that,
(actual) memory size allocated =
memory size requested (argument of `malloc`) + sizeof(size_t)
When the allocated segment is freed, the "memory size requested"
portion is used to store the previous
and next
free list entries.
Extending further on how this might work. Say we have a free list
entry f_entry
which is suitable to serve the malloc
request, we need to do the following: - remove the f_entry
from the free list - return &(f_entry->previous)
,
which is the memory addr at which chunk_size
ends and can
be used by user.
The Implementation
Let's look at the malloc implementation. This section breakdowns the
code. If you prefer to just read the complete code, have
a look at the gist
Starting with the case when free list is empty:
static const size_t CHUNK_SIZE_SIZE = sizeof(size_t);
// size needed for bookeeping requirement via free list entry
static const size_t FREE_L_SIZE = CHUNK_SIZE_SIZE + 2 * sizeof(void*);
void* malloc(size_t size) {
// blob = free list entry 'header' + requested memory
size_t blob_size = size + FREE_L_SIZE;
if (head == NULL) {
// free list is empty
free_l* entry = sbrk(blob_size);
if (entry == NULL) {
return NULL;
}
entry->chunk_size = blob_size - CHUNK_SIZE_SIZE;
entry->previous = entry->next = NULL;
return &entry->previous; // skip 'chunk_size'
}
// rest of the code is left for brevity
}
Notice the two static variables there. CHUNK_SIZE_SIZE
is what it takes to store the chunk_size
; While in a free
list, the size taken for the record keeping is two pointers + 1
size_t
variable, which reflects in
FREE_L_SIZE
.
The rest of the code is about adjusting the page break, setting the
values in free list entry and returning the address for the user.
Next we look at the case when the free list has some entries, and can
use be used to allocate the memory. This would involve traversing the
free list and finding the chunk which best fits the request. This is
called best-fit. Other algorithms like
first-fit can also be used.
// best fit strategy
free_l* best_candidate = NULL;
long int best_gap = LONG_MAX;
for (free_l* curr = head; curr != NULL; curr = curr->next) {
int hole_sz = curr->chunk_size - size;
if (hole_sz < best_gap && hole_sz >= 0) {
best_gap = hole_sz;
best_candidate = curr;
}
}
If we found a best_candidate
here, we pop it off the
free list, and return the &best_candidate->previous
as the address from which the requested size is allocated.
When there's no candidate found to serve the request (e.g. the requested
size doesn't fit any free chunk), we need to increment the current heap
limit (via brk/sbrk), and return the appropriate address.
if (best_candidate == NULL) {
// no slot available big enough to fit `size`. Adjust brk
free_l* new_slot = sbrk(blob_size);
if (new_slot == NULL) {
return NULL;
}
new_slot->chunk_size = blob_size - CHUNK_SIZE_SIZE;
return &new_slot->previous;
}
pop_from_free_list(best_candidate);
return &best_candidate->previous;
The implementation for pop_from_free_list
is given:
void pop_from_free_list(free_l* entry) {
if (entry->previous == NULL) {
// entry is first in the list, adjust the head to next entry
head = entry->next;
} else {
entry->previous->next = entry->next;
}
if (entry->next != NULL) {
entry->next->previous = entry->previous;
}
}
The following is the implementation for free()
. It does
a couple of things: - return the chunk to the free list - if the freed
memory corresponds to program break, it can be adjusted.
void free(void* ptr) {
if (ptr == NULL) {
return;
}
// return slot to the free list
ptr -= CHUNK_SIZE_SIZE;
free_l* entry_ptr = (free_l*)ptr;
printf("entry location to free: %10p\n", ptr);
entry_ptr->previous = NULL;
entry_ptr->next = head;
head = entry_ptr;
}
The full
gist has an additional optimization in free()
to lower
the program break if memory is freed from the top of heap.
Conclusion
Here are some notable optimizations/concerns
dlmalloc has (over the naive implementation above): -
it differentiates between large requests (>=512 bytes) and small
requests (<=64 bytes), and has various optimizations around it.
Example: The free list entry additionally maintains a pair of pointers
to navigate to the previous/next "large size" chunk. Or having a caching
allocator (which maintains pools of quickly recycled chunks) for serving
small requests. - memory tagging to differentiate between
allocated/de-allocated chunks, which is for debugging purposes. -
free()
coalesces consecutive chunks into a single free list
entry. - when utilizing an entry, malloc()
serves the
request with the exact memory required and creates a new/modified entry
for the leftover memory space.
Resources
malloc & free
2022 May 07 See all postsmalloc & free
The post does walkthrough of a basic malloc/free implementation. It won't contain all the optimizations and concerns which the excellent Douglas Lea's Malloc aka dlmalloc covers. But the aim is to highlight the underlying system calls and internal structures which drives these memory management APIs.
brk and sbrk system calls
Resizing the heap (for allocating or freeing memory) is all about getting the kernel to adjust its idea of the process's program break – which is essentially the current limit of the heap.
brk()
andsbrk()
system calls allow to manipulate the program break, thereby creating or freeing space on the heap.In short,
brk()
sets the program break to the input addr.sbrk()
accepts offset and increments the current program break byincr
. It returns the location of the last program break i.e. start of the incremented heap segment.You can get the current program break by using
sbrk(0)
– something we'll be using in the discussed implementation below.How malloc & free works
With the brk/sbrk introduced, we can now start thinking about what else is needed for a malloc/free implementation. For this, let's start with their APIs:
A naive implementation
Let's start with malloc – it returns a pointer to a block of memory of
size
bytes. A naive implementation just increments the program break.This as you can imagine is horribly inefficient, as the program break always goes up and is never reset to lower positions. As the process proceeds, there will be heap segments (between
&end
and program break) that gets freed up (holes). But the naive implementation above makes no use of those holes. In fact, afree()
here doesn't make any sense, as the malloc never utilises the space freed up.Free list
It makes sense for the
free()
to capture the deallocated memory chunks for subsequent use by themalloc()
. The chunks can be of different sizes, and we can find a chunk which fits the requested size.malloc'ing a free chunk means it's not available for further allocation requests, until it is freed again. We define the following doubly linked list struct which resolves these requirements (calling it free list):
A malloc'ed segment looks like:
note that,
When the allocated segment is freed, the "memory size requested" portion is used to store the
previous
andnext
free list entries.Extending further on how this might work. Say we have a free list entry
f_entry
which is suitable to serve the malloc request, we need to do the following: - remove thef_entry
from the free list - return&(f_entry->previous)
, which is the memory addr at whichchunk_size
ends and can be used by user.The Implementation
Let's look at the malloc implementation. This section breakdowns the code. If you prefer to just read the complete code, have a look at the gist
Starting with the case when free list is empty:
Notice the two static variables there.
CHUNK_SIZE_SIZE
is what it takes to store thechunk_size
; While in a free list, the size taken for the record keeping is two pointers + 1size_t
variable, which reflects inFREE_L_SIZE
.The rest of the code is about adjusting the page break, setting the values in free list entry and returning the address for the user.
Next we look at the case when the free list has some entries, and can use be used to allocate the memory. This would involve traversing the free list and finding the chunk which best fits the request. This is called best-fit. Other algorithms like first-fit can also be used.
If we found a
best_candidate
here, we pop it off the free list, and return the&best_candidate->previous
as the address from which the requested size is allocated.When there's no candidate found to serve the request (e.g. the requested size doesn't fit any free chunk), we need to increment the current heap limit (via brk/sbrk), and return the appropriate address.
The implementation for
pop_from_free_list
is given:The following is the implementation for
free()
. It does a couple of things: - return the chunk to the free list - if the freed memory corresponds to program break, it can be adjusted.The full gist has an additional optimization in
free()
to lower the program break if memory is freed from the top of heap.Conclusion
Here are some notable optimizations/concerns dlmalloc has (over the naive implementation above): - it differentiates between large requests (>=512 bytes) and small requests (<=64 bytes), and has various optimizations around it. Example: The free list entry additionally maintains a pair of pointers to navigate to the previous/next "large size" chunk. Or having a caching allocator (which maintains pools of quickly recycled chunks) for serving small requests. - memory tagging to differentiate between allocated/de-allocated chunks, which is for debugging purposes. -
free()
coalesces consecutive chunks into a single free list entry. - when utilizing an entry,malloc()
serves the request with the exact memory required and creates a new/modified entry for the leftover memory space.Resources