Python - Learning the weakref Module Through Examples

Last updated on Jul 28, 2024 by Amo Chen

Python is a programming language equipped with a garbage collection (GC) mechanism, which is an automated memory management system. It reclaims memory that the program no longer uses, preventing the problems caused by steadily shrinking available memory, such as program errors or processes that fail to run.

The GC mechanism is designed to lessen the burden on developers. In contrast, languages like C require manual memory deallocation (see the free() function), and failing to release memory often leads to memory leaks. By relying on GC, developers can worry less about memory management, which improves development efficiency and reduces the likelihood of errors.

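Python (specifically CPython, the reference implementation) implements its garbage collection mechanism primarily with a technique known as reference counting, supplemented by a cyclic collector that handles reference cycles.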

Environment

  • Python 3

Reference Counting

The basic functioning of Python’s reference counting is that each object instance has a counter. When an object is referenced, the instance’s counter increases by 1; when it is no longer referenced, the counter decreases by 1. When the counter reaches zero, the GC mechanism reclaims the object and releases its memory.

Consider the following code, a1 = A(): when a1 references A(), the counter for A() becomes 1, so A() will not be reclaimed by the GC.

import gc

class A: pass

a1 = A()

Here, a1 is known as the referrer, and A() is the referent.

referrer_referent.png

To find out the referrers and referents of an object, you can use two functions from the gc module, gc.get_referrers() and gc.get_referents():
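As a minimal sketch of those two gc functions, the snippet below adds a list named holder (a hypothetical extra referrer, not part of the original example) and checks that gc.get_referrers() and gc.get_referents() report the relationship in both directions. The exact contents of the returned lists vary between Python versions, so the sketch only tests for membership.

```python
import gc

class A:
    pass

a1 = A()
holder = [a1]  # hypothetical extra referrer, added only for illustration

# Objects that refer directly to the A() instance.
referrers = gc.get_referrers(a1)
holder_is_referrer = any(r is holder for r in referrers)
print('holder refers to a1:', holder_is_referrer)

# Objects that the list refers to directly.
referents = gc.get_referents(holder)
a1_is_referent = any(r is a1 for r in referents)
print('a1 is a referent of holder:', a1_is_referent)
```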

Getting the Reference Count

To get the reference count, you can use sys.getrefcount(), as shown in the following example:

import sys

class A: pass

a1 = A()

print('After `a1 = A()` =>', sys.getrefcount(a1))

a2 = a1

print('After `a2 = a1` =>', sys.getrefcount(a1))

a2 = None

print('After `a2 = None` =>', sys.getrefcount(a1))

The output for the above script shows how the counter changes with reference manipulation. We see the counter for A() reaching 3 after two references, then returning to 2 when the reference a2 is removed:

After `a1 = A()` => 2
After `a2 = a1` => 3
After `a2 = None` => 2

You might wonder why a1 = A() has a reference count of 2. Using sys.getrefcount() introduces a temporary reference, which accounts for an additional +1 in the count. The official documentation notes that the count returned is generally one higher than expected, so this peculiarity should be kept in mind:

Return the reference count of the object. The count returned is generally one higher than you might expect because it includes the (temporary) reference as an argument to getrefcount().

You can confirm this by executing the following code in the Python interpreter:

>>> class A:
...     def __del__(self):
...         print('Garbage collected!', self)
...
>>> A()
<__main__.A object at 0x10153c550>
>>> _
<__main__.A object at 0x10153c550>
>>> 3
Garbage collected! <__main__.A object at 0x10153c550>
>>>

In the code above, class A defines a __del__ method that prints a message when an instance is about to be destroyed. When we create an instance of A() without assigning it, you might assume it would be collected immediately; however, it is not. This is because the interactive interpreter stores the previous result in _, which acts as a hidden referrer. Once _ is rebound to a new value such as 3, A() is collected immediately, demonstrating that an unreferenced instance is indeed collected as soon as nothing refers to it. When an instance is referenced by a single variable, its initial reference count is 1.

Alternatively, you can achieve the same outcome using the following Python script:

class A:
    def __del__(self):
        print('Garbage collected!', self)

A()

print('---end---')

This script produces the following output, indicating that A() is collected before the end of the program:

Garbage collected! <__main__.A object at 0x1049cbfa0>
---end---

Strong Reference

In Python’s C API, a strong reference is one that retains the object referred to by the code holding the reference. You take a strong reference by calling Py_INCREF() and release it with Py_DECREF().

By default, Python references are strong: each new reference to an object increases its reference count by 1, and each reference that goes away decreases it by 1. However, strong references can lead to memory leaks in applications such as caches. Consider the following code, where we define an Image class, create three variables a, b, and c that each reference an Image instance, and add them to CACHE. Even after we rebind b to None, you will notice that Image('b') still exists in CACHE:

from pprint import pprint

class Image:
    def __init__(self, key):
        self.key = key
        self.body = f'body of {key}'

    def __del__(self):
        print('Garbage collected!', self.key, self)

CACHE = {}

def main():
    a = Image('a')
    CACHE[a.key] = a

    b = Image('b')
    CACHE[b.key] = b

    c = Image('c')
    CACHE[c.key] = c

    b = None
    print('CACHE:')
    pprint(CACHE)
    print('--- end of main() ---')

if __name__ == '__main__':
    main()
    print('--- end ---')

The output shows that b remains in CACHE because CACHE itself is a referrer. Simply setting b = None does not remove b from CACHE, and thus, a, b, c are only collected at the end of the script:

CACHE:
{'a': <__main__.Image object at 0x103443fa0>,
 'b': <__main__.Image object at 0x103443d00>,
 'c': <__main__.Image object at 0x1034ac850>}
--- end of main() ---
--- end ---
Garbage collected! b <__main__.Image object at 0x103443d00>
Garbage collected! c <__main__.Image object at 0x1034ac850>
Garbage collected! a <__main__.Image object at 0x103443fa0>

This scenario, where memory is not reclaimed due to strong references, can lead to memory leaks.

To release memory here, we need to remove all references to Image('b') <__main__.Image object at 0x103443d00>:

class Image:
    def __init__(self, key):
        self.key = key
        self.body = f'body of {key}'

    def __del__(self):
        print('Garbage collected!', self.key, self)

CACHE = {}

def main():
    a = Image('a')
    CACHE[a.key] = a

    b = Image('b')
    CACHE[b.key] = b

    c = Image('c')
    CACHE[c.key] = c

    del CACHE[b.key]
    b = None
    print('--- end of main() ---')

if __name__ == '__main__':
    main()
    print('--- end ---')

This change results in the following output, where Image('b') <__main__.Image object at 0x104d88850> is collected earlier:

Garbage collected! b <__main__.Image object at 0x104d88850>
--- end of main() ---
--- end ---
Garbage collected! a <__main__.Image object at 0x104d54550>
Garbage collected! c <__main__.Image object at 0x104d88a00>

Weak Reference

Now that we understand strong references, we can turn to “weak references”.

In scenarios like the one above, where an object needs to be collected successfully without a strong reference keeping it alive, weak references are useful since they do not increase the reference count:

>>> import sys, weakref
>>> class A: pass
...
>>> a = A()
>>> sys.getrefcount(a)
2
>>> b = weakref.ref(a)
>>> sys.getrefcount(a)
2
>>> c = weakref.ref(a)
>>> sys.getrefcount(a)
2

Using weak references is straightforward: use weakref.ref(<referent>). Thus, the prior CACHE example can be adapted with weakref.ref():

import weakref

from pprint import pprint

class Image:
    def __init__(self, key):
        self.key = key
        self.body = f'body of {key}'

    def __del__(self):
        print('Garbage collected!', self.key, self)

CACHE = {}

def main():
    a = Image('a')
    CACHE[a.key] = weakref.ref(a)

    b = Image('b')
    CACHE[b.key] = weakref.ref(b)

    c = Image('c')
    CACHE[c.key] = weakref.ref(c)

    b = None
    print('CACHE:')
    pprint(CACHE)
    print('--- end of main() ---')

if __name__ == '__main__':
    main()
    print('--- end ---')

The output shows that when we set b to None, CACHE indicates b is <weakref at 0x102c65d60; dead>, meaning the reference is broken, and Image('b') <__main__.Image object at 0x102bc3d00> has been collected:

Garbage collected! b <__main__.Image object at 0x102bc3d00>
CACHE:
{'a': <weakref at 0x102c4b810; to 'Image' at 0x102bc3fa0>,
 'b': <weakref at 0x102c65d60; dead>,
 'c': <weakref at 0x102c2cdb0; to 'Image' at 0x102c239d0>}
--- end of main() ---
Garbage collected! a <__main__.Image object at 0x102bc3fa0>
Garbage collected! c <__main__.Image object at 0x102c239d0>
--- end ---

Using weakref, when checking if a key in CACHE is usable, you must call the value and check if it returns None:

cache_ref = CACHE['b']
value = cache_ref()
if value is None:
    print('No cache')
else:
    print('Got cache', value)
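weakref.ref() also accepts an optional callback that is invoked when the referent is about to be finalized. The sketch below uses it to evict dead entries from a plain dict; the helper names put(), get(), and _evict() are hypothetical and exist only for this illustration. The immediate eviction shown in the comments relies on CPython's reference counting.

```python
import weakref

class Image:
    def __init__(self, key):
        self.key = key

CACHE = {}

def put(image):
    """Store a weak reference whose callback evicts the entry (hypothetical helper)."""
    key = image.key

    def _evict(ref, key=key):
        # Only drop the entry if it still holds this exact weak reference.
        if CACHE.get(key) is ref:
            del CACHE[key]

    CACHE[key] = weakref.ref(image, _evict)

def get(key):
    """Return the cached object, or None if it is missing or already collected."""
    ref = CACHE.get(key)
    return ref() if ref is not None else None

a = Image('a')
b = Image('b')
put(a)
put(b)

b = None  # on CPython the callback fires as soon as the count reaches zero

print(sorted(CACHE))   # ['a']
print(get('a') is a)   # True
print(get('b'))        # None
```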

WeakValueDictionary

The previous examples using weakref.ref() leave the key in place in CACHE even after its referent has been collected. To get automatic cleanup, you can use a WeakValueDictionary, which removes an item automatically once no strong references to its value remain:

import weakref

from pprint import pprint

class Image:
    def __init__(self, key):
        self.key = key
        self.body = f'body of {key}'

    def __del__(self):
        print('Garbage collected!', self.key, self)

CACHE = weakref.WeakValueDictionary()

def main():
    a = Image('a')
    CACHE[a.key] = a

    b = Image('b')
    CACHE[b.key] = b

    c = Image('c')
    CACHE[c.key] = c

    b = None
    print('CACHE:')
    pprint(list(CACHE.items()))
    print('--- end of main() ---')

if __name__ == '__main__':
    main()
    print('--- end ---')

This output shows that setting b to None removes it from CACHE automatically since there are no strong references:

Garbage collected! b <__main__.Image object at 0x100f5c940>
CACHE:
[('a', <__main__.Image object at 0x100ef3d00>),
 ('c', <__main__.Image object at 0x100f9f3a0>)]
--- end of main() ---
Garbage collected! a <__main__.Image object at 0x100ef3d00>
Garbage collected! c <__main__.Image object at 0x100f9f3a0>
--- end ---
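Unlike a dict of weakref.ref() objects, a WeakValueDictionary hands back the object itself, so a lookup needs no extra call. A minimal sketch, assuming CPython's immediate reclamation when the last strong reference disappears:

```python
import weakref

class Image:
    def __init__(self, key):
        self.key = key

cache = weakref.WeakValueDictionary()

a = Image('a')
b = Image('b')
cache['a'] = a
cache['b'] = b

del b  # the last strong reference to Image('b') is gone

hit = cache.get('a')   # the Image object itself, not a weakref
miss = cache.get('b')  # the entry has been removed, so .get() returns None

print(hit is a)            # True
print(miss)                # None
print(list(cache.keys()))  # ['a']
```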

Similar to WeakValueDictionary, there’s also WeakKeyDictionary, which removes items when the key has no strong references. The weakref module also provides WeakSet and WeakMethod, among other utilities suitable for your needs.
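The following sketch exercises WeakKeyDictionary, WeakSet, and WeakMethod together. The Session class and the variable names are hypothetical, chosen only for illustration, and the counts in the comments assume CPython, where the object is reclaimed as soon as its last strong reference goes away.

```python
import weakref

class Session:
    """Hypothetical class used only for this illustration."""

    def __init__(self, name):
        self.name = name

    def ping(self):
        return f'pong from {self.name}'

metadata = weakref.WeakKeyDictionary()  # entries vanish when the key dies
active = weakref.WeakSet()              # members vanish when they die

s1 = Session('s1')
s2 = Session('s2')
metadata[s1] = {'hits': 1}
metadata[s2] = {'hits': 5}
active.add(s1)
active.add(s2)

# A weak reference to a bound method; calling it returns the method or None.
method_ref = weakref.WeakMethod(s2.ping)
print(method_ref()())  # pong from s2

del s2

print(len(metadata), len(active))  # 1 1
print(method_ref())                # None
```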

Circular Reference

Lastly, let’s discuss circular references. The following example illustrates a classic circular reference, created by the attribute a in class A, which points back to the instance itself:

import sys

class A:
    def __init__(self, key):
        self.a = self
        self.key = key

    def __del__(self):
        print('Garbage collected!', self.key)

a = A('a')
print("A's refcount:", sys.getrefcount(a))
a = None

print('---end---')

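An instance of a class with a circular reference starts with a reference count of 2 (sys.getrefcount() reports 3 because of its temporary reference): one from the variable a and one from its own attribute self.a. Setting a = None only lowers the count to 1, never to 0, so reference counting alone cannot reclaim the object. In this script it is collected only when the script ends, as shown by the output: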

A's refcount: 3
---end---
Garbage collected! a

In diagram form, it looks like this:

ref_self.png

In programs with many such classes, objects can stay in memory far longer than intended, which behaves like a memory leak, unless you understand how strong and weak references work.
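As a side note, CPython also ships a cyclic collector in the gc module that can reclaim such cycles, although it runs at moments you do not control. The sketch below is only an illustration: Node and probe are hypothetical names, and automatic collection is disabled temporarily so that the "before" check is deterministic.

```python
import gc
import weakref

class Node:
    def __init__(self):
        self.me = self  # the instance refers to itself

gc.disable()  # keep the automatic collector out of the way for this demo

n = Node()
probe = weakref.ref(n)  # lets us observe the object without keeping it alive
n = None

# The count dropped to 1, not 0, so the object is still alive.
alive_before = probe() is not None
print('alive before gc.collect():', alive_before)

gc.collect()  # the cyclic collector finds and frees the unreachable cycle
print('alive after gc.collect():', probe() is not None)

gc.enable()
```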

This issue can be resolved using weakref.proxy(). Simply use weakref.proxy() to reference self in the attribute:

import sys
import weakref

class A:
    def __init__(self, key):
        self.a = weakref.proxy(self)
        self.key = key

    def __del__(self):
        print('Garbage collected!', self.key)

a = A('a')
print("A's refcount:", sys.getrefcount(a))
a = None

print('---end---')

This revised script ensures that a is correctly collected when set to None, as seen in this output:

A's refcount: 2
Garbage collected! a
---end---
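A proxy behaves like the referent for attribute access, but once the referent has been collected, any use of the proxy raises ReferenceError. A minimal sketch of that behavior, assuming CPython's immediate reclamation (the variable names are illustrative):

```python
import weakref

class A:
    def __init__(self, key):
        self.key = key

a = A('a')
p = weakref.proxy(a)

print(p.key)  # attribute access is forwarded to the referent

a = None  # the last strong reference is gone

try:
    p.key
    proxy_alive = True
except ReferenceError:
    proxy_alive = False
    print('The referent has been collected')
```

Code that may outlive the referent should therefore be prepared to catch ReferenceError, or use weakref.ref() and check for None instead.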

Weak Reference Objects and __slots__

Instances of ordinary user-defined classes support weak references, but if a class uses __slots__, you must include __weakref__ in the slots to enable them, like so:

class A:
    __slots__ = ['id', '__weakref__']
    def __init__(self, id):
        self.id = id
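To see the difference, the sketch below compares a slotted class without __weakref__ against one that includes it. NoWeak and WithWeak are hypothetical names used only for this comparison.

```python
import weakref

class NoWeak:
    __slots__ = ['id']  # no '__weakref__' slot

    def __init__(self, id):
        self.id = id

class WithWeak:
    __slots__ = ['id', '__weakref__']

    def __init__(self, id):
        self.id = id

no_weak_error = None
try:
    weakref.ref(NoWeak(1))
except TypeError as exc:
    # Without the '__weakref__' slot, weak references are rejected.
    no_weak_error = exc
    print('TypeError:', exc)

obj = WithWeak(2)
r = weakref.ref(obj)
print(r() is obj)  # True
```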

Conclusion

Understanding weakref is not straightforward. Like many, I navigated through numerous articles and videos. Only through experimenting with examples and writing this post did I grasp weakref’s significance. Used correctly, it aids Python’s GC mechanism in effectively preventing memory leaks.

That’s it!

Happy Coding!

References

sys — System-specific parameters and functions

weakref — Weak references

python: what is weakref? (intermediate - advanced) anthony explains #366