
the memory problem

exploring different solutions for dealing with memory.

Context

Memory has always been one of the main issues in the world of programming. it can completely change the shape of your program and force you to reconsider your entire design from top to bottom. over the years developers have come up with many solutions, tricks and helpers to minimize these problems as much as possible, and we are about to discover some of the most remarkable and creative ones, which not only make dealing with memory much faster and easier, but are sometimes even better than handling memory manually yourself.

Historical Context

in the early days, there was no such thing as dynamic memory decided at runtime. people wrote programs and explicitly decided ahead of time how much memory was needed, and all of it was pre-allocated statically before the program ran.

around 1960 one of the most influential programming languages appeared: Lisp. this language came up with a lot of revolutionary ideas, and one of them was the garbage collector.

C came much later, and its standard library brought malloc(), which became the standard way of requesting memory at runtime, giving developers direct control over the memory they use.

of course, this came with its own set of problems, which would be partially handled in C++, and later on by other languages in many different ways.

Overall i wanted to share some of the important concepts that I've learned about memory.

Techniques

Manual Handling [ C ]

Of course, this is the most classical approach of all, and the strategy is pretty simple: allocate the memory you need at the time you need it, and the moment the lifetime of that object is over, free/delete that memory.

this strategy completely trusts and relies on the developer to avoid issues such as memory leaks and use-after-free. it does have a bright side beyond the fantasy of being a control freak: the lifetime of each object in your program is defined explicitly and clearly, and you get full control over it.
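to make the allocate-then-free pattern concrete, here is a minimal sketch in C++ using malloc/free. the `Player` type and the `player_create` / `player_destroy` helpers are made-up names for illustration, not part of any library.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical record type used only for this illustration.
struct Player {
    char name[16];
    int  health;
};

// Allocate a Player on the heap; the caller now owns it and must free it.
Player* player_create(const char* name, int health) {
    Player* p = static_cast<Player*>(std::malloc(sizeof(Player)));
    if (p == nullptr) return nullptr;              // malloc can fail
    std::strncpy(p->name, name, sizeof(p->name) - 1);
    p->name[sizeof(p->name) - 1] = '\0';           // always terminate
    p->health = health;
    return p;
}

// End the object's lifetime explicitly, and null the caller's pointer so
// an accidental later use is a null dereference, not a silent
// use-after-free.
void player_destroy(Player*& p) {
    std::free(p);
    p = nullptr;
}
```

forgetting the `player_destroy` call is a leak, and calling it too early is a use-after-free waiting to happen; nothing in the language stops either one.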

Memory Tracking [ Electric Fence / Purify / Valgrind ]

with all the issues that come with manual handling, people started creating memory trackers that can tell you exactly where you might be leaking memory. many of them work as a modified or intercepted version of malloc that records and checks every allocation.

these can help detect memory leaks and catch a use-after-free, and with their help you can check whether objects stay within their intended lifetime.
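the core idea of wrapping malloc can be sketched in a few lines. this is a toy under my own assumptions, not how Electric Fence, Purify or Valgrind are actually implemented; `Tracker`, `tracked_malloc`, `tracked_free` and `leak_count` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <set>

// Toy tracker: remembers every live pointer handed out by tracked_malloc.
struct Tracker {
    std::set<void*> live;       // allocations that were not freed yet
    std::size_t bad_frees = 0;  // frees of unknown or already-freed pointers
};

void* tracked_malloc(Tracker& t, std::size_t size) {
    void* p = std::malloc(size);
    if (p != nullptr) t.live.insert(p);
    return p;
}

void tracked_free(Tracker& t, void* p) {
    if (p == nullptr) return;
    if (t.live.erase(p) == 0) {
        ++t.bad_frees;          // double free or foreign pointer: record it,
        return;                 // and do not pass it on to free()
    }
    std::free(p);
}

// Anything still in `live` at shutdown was leaked.
std::size_t leak_count(const Tracker& t) { return t.live.size(); }
```

real tools go much further (recording the call site of each allocation, guarding freed pages, checking every read and write), but the bookkeeping starts from the same place.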

Memory Pool/Arena [ C / Game Engines / Zig ]

this one is my favorite of all time. this technique was created not only as a solution to the problem of handling memory itself, but also as an optimization. because calling malloc() for every object would be too expensive for many high-performance applications and video games, developers came up with this brilliant technique: allocate an entire arena of memory up front, put all of your data inside it, then free it all at once. not only does it save resources, it also prevents memory leaks, since you won't have to worry about freeing each allocation.
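a minimal sketch of the simplest kind of arena (a "bump" allocator), assuming a fixed capacity chosen up front. the `Arena` class and its method names are hypothetical, and a production arena would usually add growth, debugging aids and so on.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Minimal bump arena: one big malloc, many cheap allocations, one free.
class Arena {
public:
    explicit Arena(std::size_t capacity)
        : base_(static_cast<unsigned char*>(std::malloc(capacity))),
          capacity_(base_ ? capacity : 0), used_(0) {}
    ~Arena() { std::free(base_); }   // releases every object at once
    Arena(const Arena&) = delete;
    Arena& operator=(const Arena&) = delete;

    // Returns nullptr when the arena is full. `align` must be a power of two.
    void* alloc(std::size_t size,
                std::size_t align = alignof(std::max_align_t)) {
        std::uintptr_t cur = reinterpret_cast<std::uintptr_t>(base_) + used_;
        std::uintptr_t aligned =
            (cur + align - 1) & ~(static_cast<std::uintptr_t>(align) - 1);
        std::size_t start = used_ + static_cast<std::size_t>(aligned - cur);
        if (start > capacity_ || size > capacity_ - start) return nullptr;
        used_ = start + size;        // "allocating" is just moving an offset
        return base_ + start;
    }

    void reset() { used_ = 0; }      // drop everything, keep the block
    std::size_t used() const { return used_; }

private:
    unsigned char* base_;
    std::size_t capacity_;
    std::size_t used_;
};
```

note that there is no per-object free at all: individual objects only go away when the arena is reset or destroyed.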

Some variants include:

but it does have some limitations. for example, if i wanted to create an object with its own short lifetime, then i know for sure that i shouldn't be putting it in an arena.

for games it's definitely the go-to choice, because they usually have something like a fixed maximum number of mobs that could spawn, so they allocate the memory for that maximum number of mobs up front.

an arena is also a poor fit if you are short on memory and need to free individual objects to make room for others of a completely different size, if you have a very large allocation that exceeds the size of your arena, or if you want to end one object's lifetime early, because you won't be able to free that object on its own.

there are also alignment and memory efficiency issues, because arenas tend to be large, tend to lean into a single alignment pattern, and don't allow freeing memory and re-using it for another object. there is also increased complexity: the many variants lead to more complex decision making, and an arena can make it more difficult to define each object's lifetime.

Reference Counting

the idea behind it is very simple: keep track of the references to an object, and if nothing references it anymore, delete the object. you can find many examples of that, such as std::shared_ptr in C++, which keeps a count of every owner of the object, starting at 1.
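here is a small sketch of the count going up and down with std::shared_ptr. the `Texture` type and the `refcount_demo` function are hypothetical names made up for the example; `use_count()` is the real standard library call.

```cpp
#include <cassert>
#include <memory>

// Hypothetical type whose destructor records that it ran.
struct Texture {
    explicit Texture(bool* destroyed) : destroyed_(destroyed) {}
    ~Texture() { *destroyed_ = true; }
    bool* destroyed_;
};

// Walks the count through 1 -> 2 -> 1 -> 0 and returns true if the
// object was destroyed exactly when the last owner went away.
bool refcount_demo() {
    bool destroyed = false;
    {
        std::shared_ptr<Texture> a = std::make_shared<Texture>(&destroyed);
        if (a.use_count() != 1) return false;   // first owner: count is 1
        {
            std::shared_ptr<Texture> b = a;     // second owner: count is 2
            if (a.use_count() != 2) return false;
        }                                       // b dies: back to 1
        if (a.use_count() != 1 || destroyed) return false;
    }                                           // a dies: 0, object deleted
    return destroyed;
}
```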

it does have a few pitfalls. if two objects point to each other, they will never be freed, so you should be careful not to fall into that situation, for example with a doubly linked list built on reference counting. it also comes with a slight overhead.
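the cycle problem is easy to reproduce. in this sketch (the `Node` type, the `live_nodes` counter and the function name are all hypothetical) the same two-node list is built twice: once with an owning back link, and once with a std::weak_ptr back link, which is the usual way out in C++.

```cpp
#include <cassert>
#include <memory>

static int live_nodes = 0;              // Node objects currently alive

struct Node {
    Node() { ++live_nodes; }
    ~Node() { --live_nodes; }
    std::shared_ptr<Node> next;         // owning forward link
    std::shared_ptr<Node> prev_strong;  // owning back link: makes a cycle
    std::weak_ptr<Node>   prev_weak;    // non-owning back link: no cycle
};

// Builds a two-node doubly linked list and returns how many nodes are
// still alive after every outside owner is gone.
int nodes_left_after_scope(bool use_weak_back_link) {
    live_nodes = 0;
    std::weak_ptr<Node> observer;       // watches `first` without owning it
    {
        std::shared_ptr<Node> first = std::make_shared<Node>();
        std::shared_ptr<Node> second = std::make_shared<Node>();
        first->next = second;
        if (use_weak_back_link) second->prev_weak = first;
        else                    second->prev_strong = first;
        observer = first;
    }
    int left = live_nodes;              // measured before any cleanup
    // Break the cycle by hand so the demo itself does not leak for real.
    if (std::shared_ptr<Node> f = observer.lock()) f->next.reset();
    return left;
}
```

with the owning back link both nodes keep each other's count at 1 forever; with the weak one the list is torn down as soon as the last outside owner goes away.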

RAII (Resource Acquisition Is Initialization) [C++]

This one is one of the most advanced so far. the idea is to tie each resource to the lifetime of an object: the resource is acquired when the object is created and released automatically when the object is destroyed, typically when it goes out of scope. and it's not used just for memory, it also works for any type of resource that needs to be released, such as file handles and locks. this concept was introduced in C++, and played an important role in shaping the ownership model in Rust as well.
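a minimal sketch of the pattern, using a heap buffer as the resource. `Buffer`, `open_buffers` and `buffers_open_after_failure` are hypothetical names; the same shape applies to file handles or locks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <stdexcept>

static int open_buffers = 0;   // Buffer objects currently holding memory

// RAII wrapper: the constructor acquires the memory and the destructor
// releases it, so the buffer cannot be forgotten by its scope.
class Buffer {
public:
    explicit Buffer(std::size_t n)
        : data_(static_cast<char*>(std::malloc(n))) {
        if (data_ == nullptr) throw std::bad_alloc();
        ++open_buffers;
    }
    ~Buffer() { std::free(data_); --open_buffers; }
    Buffer(const Buffer&) = delete;             // exactly one owner
    Buffer& operator=(const Buffer&) = delete;
    char* data() { return data_; }
private:
    char* data_;
};

// Even when the scope is left early through an exception, the destructor
// still runs during stack unwinding and the buffer is released.
int buffers_open_after_failure() {
    try {
        Buffer b(64);
        b.data()[0] = 'x';
        throw std::runtime_error("simulated failure");
    } catch (const std::runtime_error&) {
        // b has already been destroyed by the time we get here
    }
    return open_buffers;
}
```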

for cons:
- hidden control flow.
- same as arenas, if your scope happens to be too large then objects that should've been freed or destroyed a long time ago would continue to live on silently.
- construction and destruction could possibly fail, and if that is not handled properly it could lead to further issues.

Garbage Collection [ Lisp ]

this one is the most widespread among languages. the idea is basically to have a background mechanism within the language's runtime that keeps track of every single object and frees it once it is unreachable.
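to show what "unreachable" means in practice, here is a toy mark-and-sweep collector. it is only one of many GC designs and is heavily simplified: the `Object` and `Heap` types are hypothetical, and roots are registered by hand, whereas a real runtime finds them itself (stack, globals, registers).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy heap object: may reference other objects, like fields would.
struct Object {
    std::vector<Object*> refs;
    bool marked = false;
};

class Heap {
public:
    Heap() = default;
    Heap(const Heap&) = delete;
    Heap& operator=(const Heap&) = delete;
    ~Heap() { for (Object* o : objects_) delete o; }

    Object* allocate() {
        objects_.push_back(new Object());
        return objects_.back();
    }
    void add_root(Object* o) { roots_.push_back(o); }
    void clear_roots() { roots_.clear(); }

    // Marks everything reachable from the roots, then frees the rest.
    // Returns the number of objects freed.
    std::size_t collect() {
        std::vector<Object*> stack(roots_);
        while (!stack.empty()) {                 // mark phase
            Object* o = stack.back();
            stack.pop_back();
            if (o->marked) continue;             // already visited
            o->marked = true;
            for (Object* r : o->refs) stack.push_back(r);
        }
        std::vector<Object*> survivors;          // sweep phase
        std::size_t freed = 0;
        for (Object* o : objects_) {
            if (o->marked) { o->marked = false; survivors.push_back(o); }
            else           { delete o; ++freed; }
        }
        objects_.swap(survivors);
        return freed;
    }
    std::size_t size() const { return objects_.size(); }

private:
    std::vector<Object*> objects_;
    std::vector<Object*> roots_;
};
```

because liveness is decided by reachability and not by counting, two objects that only point at each other are collected just fine.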

the downside is that it introduces runtime overhead and tends to be unpredictable (depending on the GC model/design). there are many languages that use this: JavaScript, Java, Go, Python, Ruby, Lisp, etc.

Ownership and Borrowing Model [ Rust ]

each allocated block of memory has an owner, and when the owner dies the memory dies with it. only the owner can access that memory, unless another part of the program borrows it or has its ownership transferred to it.

this one is also advanced, because it's enforced by the compiler, which directly means that you have to think VERY carefully about the lifetime of your objects, and in many cases adjust the design of your program for its own safety.
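to keep the examples in one language, here is a rough C++ approximation of the same ideas using std::unique_ptr. this is an analogy only: Rust checks these rules at compile time, while C++ happily compiles code that touches a moved-from pointer. `Image`, `area`, `consume` and `ownership_demo` are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct Image { int width; int height; };

// Borrowing: the callee may look at the image but does not own it.
int area(const Image& img) { return img.width * img.height; }

// Ownership transfer: the callee takes the image, and the memory is
// freed when `img` goes out of scope at the end of this function.
int consume(std::unique_ptr<Image> img) { return img->width; }

// Returns true if ownership moved as expected.
bool ownership_demo() {
    std::unique_ptr<Image> owner(new Image{4, 3});
    if (area(*owner) != 12) return false;  // borrow: owner is unchanged
    if (!owner) return false;
    int w = consume(std::move(owner));     // owner gives the image away
    // In Rust, using `owner` after the move is a compile error.
    // In C++ it compiles; the moved-from unique_ptr is simply null.
    return w == 4 && owner == nullptr;
}
```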

Pluggable Memory Allocators [ Zig / C++17 ]

Alexander Stepanov introduced the allocator as a template parameter in the C++ STL, allowing containers to use different allocation strategies chosen at compile time, but for a long time custom allocators were awkward to use in practice (early implementations were allowed to assume that all allocators of a given type were interchangeable), until much later on when std::pmr was introduced in C++17.

languages like Zig and Odin, on the other hand, built this idea into their foundation and standard library from the get-go.

the idea is simple: instead of hard-coding specific function/API calls for allocating memory in your program, you create objects called "Allocator" which you can replace at any point in your program.

ex: i could make a general purpose allocator that works like malloc, and either keep it as it is or add a debugging mode where it warns me about leaks during testing. or i could use a page_allocator, which allocates memory in chunks of the OS's page size.

this allows you to use the same interface across the entire program without having to worry much about allocation, and only later decide which allocator is best for each case with only minor modifications. it also lets you switch behavior at runtime and pick whichever allocation technique fits best.
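a minimal sketch of the idea with a hand-rolled interface. every name here (`Allocator`, `MallocAllocator`, `CountingAllocator`, `make_counter`) is hypothetical; this is neither Zig's API nor std::pmr, although std::pmr::memory_resource plays a similar role in C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Code asks an Allocator for memory instead of calling malloc directly.
class Allocator {
public:
    virtual ~Allocator() {}
    virtual void* alloc(std::size_t size) = 0;
    virtual void  dealloc(void* p) = 0;
};

// General purpose strategy: forwards to malloc/free.
class MallocAllocator : public Allocator {
public:
    void* alloc(std::size_t size) override { return std::malloc(size); }
    void  dealloc(void* p) override { std::free(p); }
};

// Debugging strategy: wraps any other allocator and counts allocations
// that were not given back yet.
class CountingAllocator : public Allocator {
public:
    explicit CountingAllocator(Allocator& inner) : inner_(inner), live_(0) {}
    void* alloc(std::size_t size) override {
        void* p = inner_.alloc(size);
        if (p != nullptr) ++live_;
        return p;
    }
    void dealloc(void* p) override {
        if (p != nullptr) --live_;
        inner_.dealloc(p);
    }
    long live() const { return live_; }
private:
    Allocator& inner_;
    long live_;
};

// Library code only sees the interface, so the caller picks the strategy.
int* make_counter(Allocator& a, int start) {
    int* p = static_cast<int*>(a.alloc(sizeof(int)));
    if (p != nullptr) *p = start;
    return p;
}
```

`make_counter` never changes when the strategy does: during testing you hand it the counting wrapper, in release the plain one, and an arena-backed allocator could be slotted in the same way.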

Notes:

there is something strange about memory: there doesn't seem to have been a reliable API or function similar to malloc for retrieving memory from the operating system before Lisp, or for a long time after it, and Lisp, for context, was the first language to implement a garbage collector.

i'm not really sure about the reason for it, but it seems that garbage collection very much existed before manual management? or at least manual memory management was very different back then compared to what it became after C, and to what it is today.

Conclusion:

these are mostly techniques i have come across during my 2 years of programming, and i might not have covered everything.

what i presented so far is just my own experience with memory. "the memory problem" is actually much more complicated than just how developers allocate and free memory, and it extends to more complex concepts like virtual memory, caches, mmap, copy-on-write, OOM...