Increased memory usage with mimalloc #135153

Open
@nascheme

This was originally going to be titled "with free-threaded build". However, based on some investigation, the increased memory usage appears to be largely due to the use of mimalloc rather than to free-threading-specific behavior. This issue therefore focuses on mimalloc causing increased memory usage.

Here are some techniques I used to investigate this, which might be useful to others. Note that these instructions are specific to Linux.

  • In Include/internal/mimalloc/mimalloc/types.h, uncomment the MI_TRACK_VALGRIND define. This allows the "massif" tool of Valgrind to trace the mimalloc allocations and deallocations. Also note that you must compile with GCC, not Clang.
  • In Objects/mimalloc/alloc.c, comment out the line containing the comment "reallocation still fits and not more than 50% waste". This makes mimalloc always shrink an object on a downward resize, even if the decrease is less than 50% of the original size. This is not a major cause of extra memory usage, but it reduces the gap between pymalloc and mimalloc, since pymalloc always resizes downwards.
  • Run your benchmark script with valgrind --tool=massif ./python <script>. This creates a massif.out.NNN file, which is a text report of the allocations and where they happened.
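The massif.out.NNN file can be rendered with the ms_print tool that ships with Valgrind, but for a quick comparison between builds it can be handy to pull out just the peak snapshot. The following is a minimal sketch, not part of the original investigation: the function names are hypothetical, and it only assumes the plain key=value snapshot fields (snapshot, time, mem_heap_B, mem_heap_extra_B, mem_stacks_B) that massif writes.

```python
def parse_snapshots(massif_text):
    """Collect the per-snapshot counters from the text of a massif.out file."""
    wanted = ("time", "mem_heap_B", "mem_heap_extra_B", "mem_stacks_B")
    snapshots = []
    for raw in massif_text.splitlines():
        key, sep, value = raw.strip().partition("=")
        if not sep:
            continue  # separator lines and heap-tree detail lines
        if key == "snapshot":
            snapshots.append({"snapshot": int(value)})
        elif snapshots and key in wanted:
            snapshots[-1][key] = int(value)
    return snapshots


def peak_snapshot(snapshots):
    """Return the snapshot with the largest heap (useful bytes plus overhead).

    Raises ValueError if the list is empty.
    """
    return max(
        snapshots,
        key=lambda s: s.get("mem_heap_B", 0) + s.get("mem_heap_extra_B", 0),
    )


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            peak = peak_snapshot(parse_snapshots(f.read()))
        total = peak.get("mem_heap_B", 0) + peak.get("mem_heap_extra_B", 0)
        print(f"peak at snapshot {peak['snapshot']}: {total / 1e6:.1f} MB")
```

Running it as a script with the path of a massif.out.NNN file prints the peak heap size, which can then be compared across the pymalloc and mimalloc builds.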

As an alternative to "massif", you can also use memray. To make it run with the free-threaded build, you need to comment out one line (the definition of Py_LIMITED_API in src/memray/_memray/inject.cpp), since the free-threaded build does not currently support the limited API. You have to rely on --trace-python-allocators as well as --native because mimalloc allocates a 1 GB region right at process startup, but you really want to see the results of tracing Python allocations anyway.

Here is a run of pyperformance done by Thomas Wouters, showing memory usage of benchmarks compared to a baseline.
[Image: pyperformance results]

Based on the pyperformance memory usage statistics, bm_tomli_loads is a benchmark that uses quite a lot of extra memory under the free-threaded build compared with the default build. To make it easier to run, I wrote a simpler version of the benchmark that uses the standard-library tomllib module instead. It shows a similar amount of extra memory usage.

bench_tomllib_mem.py.txt

Some additional scripts, to dump out memory usage stats:

memory_stats.py.txt
count_used_pages.py.txt

Statistics on RSS (resident set size) when running this script under various Python builds:

Build                      RSS (kB)   Increase
Default build, pymalloc    131,192    -
Default build, mimalloc    196,796    1.5x
FT build, mimalloc         213,408    1.6x
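The "Increase" column is simply each build's RSS divided by the pymalloc baseline. As a small worked check using only the numbers from the table (the function name is hypothetical):

```python
# RSS figures in kB, copied from the table above.
RSS_KB = {
    "Default build, pymalloc": 131_192,
    "Default build, mimalloc": 196_796,
    "FT build, mimalloc": 213_408,
}


def increase_over(baseline_key, rss_kb):
    """Return {build: (ratio to baseline, extra kB over baseline)}."""
    base = rss_kb[baseline_key]
    return {build: (kb / base, kb - base) for build, kb in rss_kb.items()}


if __name__ == "__main__":
    for build, (ratio, extra) in increase_over("Default build, pymalloc", RSS_KB).items():
        print(f"{build:26s} {ratio:.2f}x  (+{extra:,} kB)")
```

Expressing the difference in absolute kB as well as a ratio makes it easier to line these figures up with the heap totals reported by massif.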

Note that to run the default build with mimalloc, you must set PYTHONMALLOC=mimalloc in the environment. Otherwise, pymalloc is used, even if mimalloc is available in the build. The FT build always uses mimalloc.
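When comparing builds, it helps to print the allocator setting and the RSS from inside the process at the end of the run. The sketch below is an illustration rather than the attached memory_stats.py: the function names are hypothetical, and it assumes Linux, where /proc/self/status contains a VmRSS line reported in kB.

```python
import os


def parse_vm_rss_kb(status_text):
    """Extract the VmRSS value (in kB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None


def current_rss_kb():
    """Return this process's RSS in kB, or None where /proc is unavailable."""
    try:
        with open("/proc/self/status") as f:
            return parse_vm_rss_kb(f.read())
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    # PYTHONMALLOC is only reported here; it must be set before the interpreter starts.
    print("PYTHONMALLOC =", os.environ.get("PYTHONMALLOC", "(unset)"))
    print("RSS (kB):", current_rss_kb())
```

For example, the script could be run once as ./python script.py and once as PYTHONMALLOC=mimalloc ./python script.py on the default build to produce the first two rows of such a comparison.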

Using the massif reports, we can examine where the memory is being allocated from. Below is a summary for three different builds.

Mem (MB)  Allocation call path, by build

default, pymalloc:
    67.5  PyUnicode_New, _PyUnicodeWriter_PrepareInternal
     0.2  PyUnicode_New, others
    16.8  _PyBytes_Resize
     3.9  new_keys_object
    88.5  sub-total
    90.3  total heap usage reported by massif

default, mimalloc:
    67.5  PyUnicode_New, _PyUnicodeWriter_PrepareInternal
    16.8  _PyBytes_FromSize
    37.8  PyUnicode_New, unicode_subscript
     8.5  new_keys_object
   130.6  sub-total
   136.5  total heap usage reported by massif

free-threaded, mimalloc:
    67.5  PyUnicode_New, _PyUnicodeWriter_PrepareInternal
    16.8  _PyBytes_FromSize
    47.7  PyUnicode_New, unicode_subscript
     8.7  new_keys_object
   140.7  sub-total
   147.0  total heap usage reported by massif

Based on these results, I suspect that mimalloc can waste considerably more memory than pymalloc when many small allocations are made, because most of the extra memory usage appears to come from the unicode_subscript calls. For the pymalloc build, that memory must be accounted for under the "PyUnicode_New, others" category (massif doesn't show the unicode_subscript call explicitly). More work is needed to confirm this theory. Another theory is that mimalloc does not release freed memory back to the OS as aggressively as pymalloc does.
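To make the attribution concrete, the massif figures can be compared per call site. The sketch below uses only the numbers from the summary above; the short category keys are made up for illustration, and pairing pymalloc's "PyUnicode_New, others" with unicode_subscript follows the assumption stated in the previous paragraph rather than anything massif reports directly.

```python
# Heap usage in MB per call site, taken from the massif summary above.
PYMALLOC = {"unicode_writer": 67.5, "unicode_subscript": 0.2,
            "bytes": 16.8, "new_keys_object": 3.9}
MIMALLOC = {"unicode_writer": 67.5, "unicode_subscript": 37.8,
            "bytes": 16.8, "new_keys_object": 8.5}
FT_MIMALLOC = {"unicode_writer": 67.5, "unicode_subscript": 47.7,
               "bytes": 16.8, "new_keys_object": 8.7}


def extra_by_site(baseline, other):
    """Return the per-site difference in MB and each site's share of the total extra."""
    delta = {site: other[site] - baseline[site] for site in baseline}
    total = sum(delta.values())
    share = {site: (d / total if total else 0.0) for site, d in delta.items()}
    return delta, share


if __name__ == "__main__":
    for name, build in (("default, mimalloc", MIMALLOC),
                        ("free-threaded, mimalloc", FT_MIMALLOC)):
        delta, share = extra_by_site(PYMALLOC, build)
        print(name)
        for site in delta:
            print(f"  {site:18s} {delta[site]:+6.1f} MB  {share[site]:6.1%}")
```

Under that assumption, the unicode_subscript site accounts for the large majority of the extra heap in both mimalloc builds, with new_keys_object making up the rest.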


Labels: interpreter-core, performance, type-bug
