Python and Memory Management, Part 2: Heap Memory

Ja'far Khakpour

In the previous part, I talked about how Python objects are laid out in memory. In this part we will look at the memory regions a Python process uses and how Python allocates and deallocates memory.

Python's Memory Sections: Where Do Objects Live?

When you run a Python program, the operating system allocates several memory regions. On Linux, you can use the following script to see the regions of your Python process (save it as a file and run it):

#!/usr/bin/env python3
import os, sys

def main():
    pid = os.getpid()
    # /proc/<pid>/maps lists every memory region mapped into this process
    with open(f"/proc/{pid}/maps") as f:
        for line in f:
            line = line.strip()
            parts = line.split()
            if len(parts) < 5:
                continue

            # The first column is the "start-end" address range in hex
            s, e = [int(x, 16) for x in parts[0].split('-')]
            print(f"{line:70}")

            # Check whether any tracked object's address falls inside this region
            for name, obj in tracked.items():
                if s <= id(obj) < e:
                    print(f"  >>>>> Found: {name} ({id(obj):#x}) in this block")

# Objects to track (named `tracked` to avoid shadowing the builtin `vars`)
tracked = {
    "True": True,
    "small number": 4,
    "large number": 2**123,
    "small string": "a",
    "large string": "a"*2000,
    "int class": int,
    "sys module": sys,
    "main function": main,
}

if __name__ == "__main__":
    main()

This script reads the process's memory map from /proc/<pid>/maps. It prints each memory block and reports whether any of the objects in our tracking dictionary are stored inside it. These memory blocks come in different types:

1. Executable Code Region (Read-Only)

This is the code (instruction) section of the program. The OS maps the binary as read-only, private blocks (r--p permissions), some of which are executable (r-xp):

00400000-0041f000 r--p 00000000 08:11 545993 /usr/bin/python3.14
0041f000-006d2000 r-xp 0001f000 08:11 545993 /usr/bin/python3.14
006d2000-00945000 r--p 002d2000 08:11 545993 /usr/bin/python3.14
...
7f3d6c6ab000-7f3d6c6d1000 r--p 00000000 08:11 541316 /usr/lib/x86_64-linux-gnu/libc.so.6
7f3d6c6d1000-7f3d6c827000 r-xp 00026000 08:11 541316 /usr/lib/x86_64-linux-gnu/libc.so.6
...

These sections belong to the Python binary itself, or to libraries (shared objects) loaded by the Python interpreter.

2. Data Segment (Read-Write Data)

What we see here is a memory block from the Python executable which contains True, 4 (an immutable small integer that Python preallocates at startup), and the type int. These objects live in the data segment of the Python executable on the file system:

00946000-00a85000 rw-p 00545000 08:11 545993 /usr/bin/python3.14
  >>>>> Found: True (0x95a0a0) in this block
  >>>>> Found: small number (0xa5bb08) in this block
  >>>>> Found: int class (0x958cc0) in this block
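You can observe this preallocation from Python code itself. As a small sketch: in CPython, small integers in the range [-5, 256] are cached singletons, while large integers are constructed fresh each time (the exact cache range is a CPython implementation detail, not part of the language):

```python
a = 4
b = 2 + 2
print(a is b)   # True in CPython: small ints in [-5, 256] are preallocated singletons

# Large integers are built at runtime, so equal values are distinct objects
big1 = int("123456789012345678901234567890")
big2 = int("123456789012345678901234567890")
print(big1 == big2)  # True: same value
print(big1 is big2)  # False: different objects
```

This is why our script finds the number 4 inside the Python executable's data segment, while 2**123 shows up in a dynamically allocated region.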

3. Heap Region (Dynamic Allocation)

This is the OS-managed heap of the process. Objects allocated directly with C's malloc are placed here. In general, objects that require more than 512 bytes of memory end up here, bypassing Python's small-object allocator.

1039c000-10449000 rw-p 00000000 00:00 0                                  [heap]
  >>>>> Found: large string (0x10406840) in this block

4. Anonymous Memory Mappings (Python's pymalloc)

These are memory blocks Python uses for storing data in RAM, organized into structures called arenas. Arenas have a specific structure and make managing small objects (< 512 bytes) cheaper than going through the OS-managed heap: they are pre-allocated memory spaces that Python reserves for future use. (There are posts and series that explain Python's memory structure, arenas, and pool-based allocation in detail, and why this design is efficient.)
You can find a lot of object types here:

7f75e62bf000-7f75e6400000 rw-p 00000000 00:00 0                       
  >>>>> Found: small string (0x7f75e62ec730) in this block
  >>>>> Found: main function (0x7f75e62d85e0) in this block
  >>>>> Found: large number (0x7f75e67e3f30) in this block
  >>>>> Found: sys module (0x7f75e67a2ca0) in this block
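You can check object sizes against the 512-byte cutoff with sys.getsizeof. A rough sketch (getsizeof reports the object's own memory footprint, which is what the small-object allocator's threshold applies to):

```python
import sys

small = "a"           # tiny footprint: served by pymalloc from an arena
large = "a" * 2000    # well over 512 bytes: handled by the C allocator (heap)

print(sys.getsizeof(small))         # a few dozen bytes
print(sys.getsizeof(large))         # more than 2000 bytes
print(sys.getsizeof(large) > 512)   # True
```

This matches what the memory map showed: the large string landed in [heap], while the small string sat in an anonymous pymalloc mapping.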

5. Stack Region (Function Calls)

Used for the C function call stack (not Python objects!). Python objects are never stored here, only C frames and local C variables.

7fff20372000-7fff20394000 rw-p 00000000 00:00 0 [stack]

6. Kernel Interface Regions

These are OS-related regions. They allow the application to talk to the kernel faster, without a full context switch, and they are out of this post's scope.

7fff203b8000-7fff203bc000 r--p 00000000 00:00 0 [vvar]
7fff203bc000-7fff203be000 r-xp 00000000 00:00 0 [vdso]

Now we have an idea of how Python allocates memory for a variable. In the next section we will talk about how these variables are garbage collected.

How Python Manages Garbage Collection with Reference Counting

We already know that every Python object has a counter tracking how many references point to it. You may remember PyObject from the first part of this series:

// CPython's PyObject structure
typedef struct _object {
    Py_ssize_t ob_refcnt;    // Reference count field
    PyTypeObject *ob_type;   // Type pointer
    // ... type-specific data
} PyObject;

Let's play with this concept a little:

import sys
import weakref

class Node:
    name = None
    child = None
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return f"Node: {self.name}"

a = Node('a')          # refcount of the 'a' Node = 1
b = Node('b')
a_ref = weakref.ref(a) # a weak reference keeps track of the object but does not increase its refcount
b_ref = weakref.ref(b)
a.child = b
b.child = a            # refcount of the 'a' Node = 2 (the name `a` plus `b.child`)

print("All refs a, b, and a_ref assigned", f"{sys.getrefcount(a_ref())=}") # 2 + 1 for the getrefcount argument = 3
del a
print("After variable a deleted", f"{sys.getrefcount(a_ref())=}") # refcount = 1 (only b.child remains)
del b
print("After both variables deleted", f"{sys.getrefcount(a_ref())=}") # still 1: the cycle keeps the object alive!
print("After both variables deleted", f"{sys.getrefcount(b_ref())=}") # same for b

What is happening here? Even after deleting a and b, we still see references pointing to the objects. Add these two lines to the bottom of the above code to see what is still referring to a:

import gc # Garbage Collector Interface Module
print(f"Referres to a: {gc.get_referrers(a_ref())}")

These references are the a.child -> b and b.child -> a links. We have deleted the name a, but the reference b.child still exists, so the 'a' Node stays alive, and for the same reason the 'b' Node stays in memory. This is a reference cycle: the refcounts can never drop to zero on their own.

How does Python handle such cases? Python periodically scans objects in memory and marks them as reachable or unreachable. The algorithm starts from roots whose existence Python is sure about (such as global and local variables), marks every object they refer to as reachable, and repeats this lookup for the newly reachable objects. At the end, every object not marked reachable is garbage collected.

Running this scan after every assignment, or every time a refcount hits zero, would be far too expensive. Instead, the interpreter runs it once the number of container-object allocations (minus deallocations) crosses a threshold:

import gc

print(f"GC generations: {gc.get_threshold()}") # Default: (700, 10, 10), but you can change them

There are three numbers:

  • Generation 0: new objects that have not been scanned for garbage collection yet. They are scanned once the allocation counter passes 700
  • Generation 1: objects that survived one collection. They are scanned again after every 10 generation-0 collections
  • Generation 2: objects that survived multiple collections. They are scanned after every 10 generation-1 collections (by default)

This partitioning is based on the generational hypothesis: young objects are more likely to become garbage quickly, while old objects that survive tend to stay alive longer.
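You can inspect and tune these thresholds at runtime with the gc module (the numbers below are illustrative, not recommendations):

```python
import gc

print(gc.get_threshold())  # the defaults, typically (700, 10, 10)
print(gc.get_count())      # current per-generation counters

# Raise generation 0's threshold: fewer collection pauses, more memory held
gc.set_threshold(1000, 15, 15)
print(gc.get_threshold())  # (1000, 15, 15)
```

Setting the first threshold to 0 disables automatic collection entirely, which some latency-sensitive programs do, calling gc.collect() manually at convenient moments instead.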

Now you know why, after deleting a and b in the Python code above, the weak references were still able to access the objects: the garbage collector's cycle-detection pass had not run yet, so before the threshold was reached, the a.child and b.child references were still in memory, and a_ref and b_ref could still reach the objects.

Let's force a garbage collection and watch the cycle get collected:

import gc
import sys
import weakref

class Node:
    child = None

a = Node()
b = Node()
a_ref = weakref.ref(a) # a_ref() returns the object if it is still in memory, or None once it has been freed
b_ref = weakref.ref(b)
a.child = b
b.child = a

print("All refs a, b, and a_ref assigned", f"{sys.getrefcount(a_ref())=}")
del a
print("After variable a deleted", f"{sys.getrefcount(a_ref())=}")
del b
print("After both variables deleted", f"{sys.getrefcount(b_ref())=}")

n_collected = gc.collect() # force a full collection of generations 0, 1 and 2
print(f"collected {n_collected} objects")
print(f"{a_ref()=} {b_ref()=}") # both are None now: the cycle has been collected

The GIL's Role in Garbage Collection

The Global Interpreter Lock (GIL) is crucial for thread-safe garbage collection, particularly for reference counting. Without the GIL, concurrent refcount updates could race, corrupt memory, and break garbage collection. Note that the GIL only makes individual interpreter-level operations (like a single refcount increment) atomic; it does not make compound Python operations atomic.
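For example, even with the GIL, a read-modify-write like counter += 1 can be interleaved between threads, so user code still needs its own synchronization. A minimal sketch using threading.Lock:

```python
import threading

counter = 0
lock = threading.Lock()

def bump(n):
    global counter
    for _ in range(n):
        # The lock is needed: `counter += 1` is a read-modify-write
        # sequence, not a single atomic operation under the GIL
        with lock:
            counter += 1

threads = [threading.Thread(target=bump, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 400000
```

Without the lock, updates can be lost when a thread is preempted between reading and writing counter; the GIL protects the interpreter's internals (like refcounts), not your program's invariants.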

I hope you enjoyed reading this post. Now we know how Python (CPython, to be specific) decides to collect a memory slot and free it. Python's memory management is a fascinating blend of simplicity (everything is an object) and sophistication (multiple allocators, immortal objects, generational GC). By understanding these internals, you can understand your Python code's behavior better than before.