Skip to content

GH-50083: [C++] Access mimalloc through dynamically-resolved symbols#41128

Open
pitrou wants to merge 1 commit into
apache:mainfrom
pitrou:mimalloc-dynamic-symbols
Open

GH-50083: [C++] Access mimalloc through dynamically-resolved symbols#41128
pitrou wants to merge 1 commit into
apache:mainfrom
pitrou:mimalloc-dynamic-symbols

Conversation

@pitrou

@pitrou pitrou commented Apr 10, 2024

Copy link
Copy Markdown
Member

Rationale for this change

The memray memory profiler works by interposing certain dynamic symbols in the profiled process to replace them with their own functions that will collect memory allocation data. It will currently, to the best of my knowledge, only recognize system C calls such malloc, mmap...

When a third-party allocator like mimalloc or jemalloc is being used, such that Arrow does by default, memray does not see the logical allocation calls made through these allocator's APIs (because they are not interposed), but only the raw memory reservations that they issue using system routines.

This can lead people using memray to think that a given Arrow workload (or any workload using such allocators, really) that an inordinate amount of memory is being used, while the reported memory mostly represents non-committed virtual memory that the allocator keeps for performance reasons. Concrete example in GH-40301: we allocate a number of 1kiB buffers from mimalloc, but memray sees a similar number of 64MiB calls to mmap.

We discussed how to enhance memray such as to account for the corresponding logical allocations, and we came to the conclusion that it requires that Arrow exposes API calls that can be dynamically interposed. Since we typically build against a static libmimalloc.a, the mimalloc symbols cannot be exposed (at least, I cannot seem to get this to work on Ubuntu). This means we need to define our own symbols wrapping the mimalloc APIs.

What changes are included in this PR?

Define public, interposable symbols that redirect into the mimalloc APIs that we use.

Are these changes tested?

Not for now. We could probably test them, at least on Linux, by compiling an almost trivial shared library and interposing it using LD_PRELOAD.

Are there any user-facing changes?

No. There should not be any noticeable performance regression, except perhaps on memory pool micro-benchmarks.

@github-actions

Copy link
Copy Markdown

Thanks for opening a pull request!

If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

In the case of PARQUET issues on JIRA the title also supports:

PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

See also:

@pitrou pitrou force-pushed the mimalloc-dynamic-symbols branch from 6a53c52 to b40f113 Compare April 10, 2024 15:38
@pitrou

pitrou commented Apr 10, 2024

Copy link
Copy Markdown
Member Author

@github-actions crossbow submit wheel-manylinux-2-28*

@github-actions

This comment was marked as outdated.

@pitrou

pitrou commented Apr 10, 2024

Copy link
Copy Markdown
Member Author

We can see the difference in the code generated for freeing mimalloc memory:

  • before, there is a direct call to mi_free:
00000000012e5900 <_ZN5arrow18BaseMemoryPoolImplINS_12_GLOBAL__N_117MimallocAllocatorEE4FreeEPhll>:
 12e5900:	55                   	push   %rbp
 12e5901:	48 3b 35 00 fb 47 00 	cmp    0x47fb00(%rip),%rsi        # 1765408 <_ZN5arrow11memory_pool8internal14zero_size_areaE@@Base-0x91b8>
 12e5908:	48 89 e5             	mov    %rsp,%rbp
 12e590b:	41 54                	push   %r12
 12e590d:	49 89 d4             	mov    %rdx,%r12
 12e5910:	53                   	push   %rbx
 12e5911:	48 89 fb             	mov    %rdi,%rbx
 12e5914:	74 09                	je     12e591f <_ZN5arrow18BaseMemoryPoolImplINS_12_GLOBAL__N_117MimallocAllocatorEE4FreeEPhll+0x1f>
 12e5916:	48 89 f7             	mov    %rsi,%rdi
 12e5919:	67 e8 21 af 16 00    	addr32 call 1450840 <mi_free>
 12e591f:	f0 4c 29 63 48       	lock sub %r12,0x48(%rbx)
 12e5924:	5b                   	pop    %rbx
 12e5925:	41 5c                	pop    %r12
 12e5927:	5d                   	pop    %rbp
 12e5928:	c3                   	ret    
  • after, there is an indirect call (the call*) through the jump table to arrow_mi_free:
00000000012e5b00 <_ZN5arrow18BaseMemoryPoolImplINS_12_GLOBAL__N_117MimallocAllocatorEE4FreeEPhll>:
 12e5b00:	55                   	push   %rbp
 12e5b01:	48 3b 35 c0 f8 47 00 	cmp    0x47f8c0(%rip),%rsi        # 17653c8 <_ZN5arrow11memory_pool8internal14zero_size_areaE@@Base-0x91f8>
 12e5b08:	48 89 e5             	mov    %rsp,%rbp
 12e5b0b:	41 54                	push   %r12
 12e5b0d:	49 89 d4             	mov    %rdx,%r12
 12e5b10:	53                   	push   %rbx
 12e5b11:	48 89 fb             	mov    %rdi,%rbx
 12e5b14:	74 09                	je     12e5b1f <_ZN5arrow18BaseMemoryPoolImplINS_12_GLOBAL__N_117MimallocAllocatorEE4FreeEPhll+0x1f>
 12e5b16:	48 89 f7             	mov    %rsi,%rdi
 12e5b19:	ff 15 51 6e 48 00    	call   *0x486e51(%rip)        # 176c970 <arrow_mi_free@@Base+0x47bad0>
 12e5b1f:	f0 4c 29 63 48       	lock sub %r12,0x48(%rbx)
 12e5b24:	5b                   	pop    %rbx
 12e5b25:	41 5c                	pop    %r12
 12e5b27:	5d                   	pop    %rbp
 12e5b28:	c3                   	ret    

@pitrou

pitrou commented Apr 10, 2024

Copy link
Copy Markdown
Member Author

Also, I do not see any significant and robust regressions in our nano-benchmarks when running them locally.

@pitrou

pitrou commented Apr 10, 2024

Copy link
Copy Markdown
Member Author

Interestingly, the binary wheels built in the Crossbow builds above use a different indirection code (a direct call to arrow_mi_free@plt instead of an indirect call*):

0000000001476850 <_ZN5arrow18BaseMemoryPoolImplINS_12_GLOBAL__N_117MimallocAllocatorEE4FreeEPhll>:
 1476850:	55                   	push   %rbp
 1476851:	48 89 d5             	mov    %rdx,%rbp
 1476854:	53                   	push   %rbx
 1476855:	48 89 fb             	mov    %rdi,%rbx
 1476858:	48 83 ec 08          	sub    $0x8,%rsp
 147685c:	48 3b 35 05 5c 9d 01 	cmp    0x19d5c05(%rip),%rsi        # 2e4c468 <_ZN5arrow11memory_pool8internal14zero_size_areaE@@Base-0x10898>
 1476863:	74 08                	je     147686d <_ZN5arrow18BaseMemoryPoolImplINS_12_GLOBAL__N_117MimallocAllocatorEE4FreeEPhll+0x1d>
 1476865:	48 89 f7             	mov    %rsi,%rdi
 1476868:	e8 13 56 02 ff       	call   49be80 <arrow_mi_free@plt>
 147686d:	f0 48 29 6b 48       	lock sub %rbp,0x48(%rbx)
 1476872:	48 83 c4 08          	add    $0x8,%rsp
 1476876:	5b                   	pop    %rbx
 1476877:	5d                   	pop    %rbp
 1476878:	c3                   	ret    

I've checked that LD_PRELOAD still works to interpose these functions (arrow_mi_free etc.).

@github-actions github-actions Bot added the Status: stale-warning Issues and PRs flagged as stale which are due to be closed if no indication otherwise label Nov 18, 2025
@thisisnic

Copy link
Copy Markdown
Member

Thank you for your contribution. Unfortunately, this
pull request has been marked as stale because it has had no activity in the past 365 days. Please remove the stale label
or comment below, or this PR will be closed in 14 days. Feel free to re-open this if it has been closed in error. If you
do not have repository permissions to reopen the PR, please tag a maintainer.

@github-actions github-actions Bot removed the Status: stale-warning Issues and PRs flagged as stale which are due to be closed if no indication otherwise label Jan 26, 2026
@pitrou pitrou force-pushed the mimalloc-dynamic-symbols branch from b40f113 to ee6537d Compare June 2, 2026 12:49
@pitrou

pitrou commented Jun 2, 2026

Copy link
Copy Markdown
Member Author

@github-actions crossbow submit wheel-manylinux-2-28*

@pitrou pitrou force-pushed the mimalloc-dynamic-symbols branch from ee6537d to ba3c4f7 Compare June 2, 2026 12:52
@github-actions

This comment was marked as outdated.

@pitrou pitrou force-pushed the mimalloc-dynamic-symbols branch from ba3c4f7 to f414851 Compare June 2, 2026 12:55
@pitrou

pitrou commented Jun 2, 2026

Copy link
Copy Markdown
Member Author

@github-actions crossbow submit wheel-manylinux-2-28*

@github-actions

github-actions Bot commented Jun 2, 2026

Copy link
Copy Markdown

Revision: f414851

Submitted crossbow builds: ursacomputing/crossbow @ actions-f5c1edacb2

Task Status
wheel-manylinux-2-28-cp310-cp310-amd64 GitHub Actions
wheel-manylinux-2-28-cp310-cp310-arm64 GitHub Actions
wheel-manylinux-2-28-cp311-cp311-amd64 GitHub Actions
wheel-manylinux-2-28-cp311-cp311-arm64 GitHub Actions
wheel-manylinux-2-28-cp312-cp312-amd64 GitHub Actions
wheel-manylinux-2-28-cp312-cp312-arm64 GitHub Actions
wheel-manylinux-2-28-cp313-cp313-amd64 GitHub Actions
wheel-manylinux-2-28-cp313-cp313-arm64 GitHub Actions
wheel-manylinux-2-28-cp313-cp313t-amd64 GitHub Actions
wheel-manylinux-2-28-cp313-cp313t-arm64 GitHub Actions
wheel-manylinux-2-28-cp314-cp314-amd64 GitHub Actions
wheel-manylinux-2-28-cp314-cp314-arm64 GitHub Actions
wheel-manylinux-2-28-cp314-cp314t-amd64 GitHub Actions
wheel-manylinux-2-28-cp314-cp314t-arm64 GitHub Actions

@pitrou

pitrou commented Jun 2, 2026

Copy link
Copy Markdown
Member Author

Unfortunately, interposition is not possible anymore with the wheels built in the Crossbow build above. We see the call instruction uses a hardcoded target address:

0000000000b89070 <_ZN5arrow18BaseMemoryPoolImplINS_12_GLOBAL__N_117MimallocAllocatorEE4FreeEPhll.lto_priv.0>:
  b89070:       55                      push   %rbp
  b89071:       48 89 d5                mov    %rdx,%rbp
  b89074:       53                      push   %rbx
  b89075:       48 89 fb                mov    %rdi,%rbx
  b89078:       48 83 ec 08             sub    $0x8,%rsp
  b8907c:       48 3b 35 cd 47 de 01    cmp    0x1de47cd(%rip),%rsi        # 296d850 <_ZN5arrow11memory_pool8internal14zero_size_areaE@@Base-0x3870>
  b89083:       74 08                   je     b8908d <_ZN5arrow18BaseMemoryPoolImplINS_12_GLOBAL__N_117MimallocAllocatorEE4FreeEPhll.lto_priv.0+0x1d>
  b89085:       48 89 f7                mov    %rsi,%rdi
  b89088:       e8 d3 ff ff ff          call   b89060 <arrow_mi_free>
  b8908d:       f0 48 29 6b 48          lock sub %rbp,0x48(%rbx)
  b89092:       48 83 c4 08             add    $0x8,%rsp
  b89096:       5b                      pop    %rbx
  b89097:       5d                      pop    %rbp
  b89098:       c3                      ret

@pitrou pitrou force-pushed the mimalloc-dynamic-symbols branch from f414851 to cfd2971 Compare June 2, 2026 14:21
@pitrou

pitrou commented Jun 2, 2026

Copy link
Copy Markdown
Member Author

@github-actions crossbow submit wheel-manylinux-2-28*

@github-actions

github-actions Bot commented Jun 2, 2026

Copy link
Copy Markdown

Revision: cfd2971

Submitted crossbow builds: ursacomputing/crossbow @ actions-b477fba900

Task Status
wheel-manylinux-2-28-cp310-cp310-amd64 GitHub Actions
wheel-manylinux-2-28-cp310-cp310-arm64 GitHub Actions
wheel-manylinux-2-28-cp311-cp311-amd64 GitHub Actions
wheel-manylinux-2-28-cp311-cp311-arm64 GitHub Actions
wheel-manylinux-2-28-cp312-cp312-amd64 GitHub Actions
wheel-manylinux-2-28-cp312-cp312-arm64 GitHub Actions
wheel-manylinux-2-28-cp313-cp313-amd64 GitHub Actions
wheel-manylinux-2-28-cp313-cp313-arm64 GitHub Actions
wheel-manylinux-2-28-cp313-cp313t-amd64 GitHub Actions
wheel-manylinux-2-28-cp313-cp313t-arm64 GitHub Actions
wheel-manylinux-2-28-cp314-cp314-amd64 GitHub Actions
wheel-manylinux-2-28-cp314-cp314-arm64 GitHub Actions
wheel-manylinux-2-28-cp314-cp314t-amd64 GitHub Actions
wheel-manylinux-2-28-cp314-cp314t-arm64 GitHub Actions

@pitrou

pitrou commented Jun 2, 2026

Copy link
Copy Markdown
Member Author

I selectively added the -fsemantic-interposition flag for the MemoryPool implementation code and it seems it made the mimalloc symbols in the PyArrow wheels interposable again.

This can be seen for example by the arrow_mi_free call going through the PLT (whatever that means exactly :-)):

0000000000b89fc0 <_ZN5arrow18BaseMemoryPoolImplINS_12_GLOBAL__N_117MimallocAllocatorEE4FreeEPhll.lto_priv.0>:
  b89fc0:       55                      push   %rbp
  b89fc1:       48 89 d5                mov    %rdx,%rbp
  b89fc4:       53                      push   %rbx
  b89fc5:       48 89 fb                mov    %rdi,%rbx
  b89fc8:       48 83 ec 08             sub    $0x8,%rsp
  b89fcc:       48 3b 35 9d 78 de 01    cmp    0x1de789d(%rip),%rsi        # 2971870 <_ZN5arrow11memory_pool8internal14zero_size_areaE@@Base-0x38d0>
  b89fd3:       74 08                   je     b89fdd <_ZN5arrow18BaseMemoryPoolImplINS_12_GLOBAL__N_117MimallocAllocatorEE4FreeEPhll.lto_priv.0+0x1d>
  b89fd5:       48 89 f7                mov    %rsi,%rdi
  b89fd8:       e8 23 a7 7b ff          call   344700 <arrow_mi_free@plt>
  b89fdd:       f0 48 29 6b 48          lock sub %rbp,0x48(%rbx)
  b89fe2:       48 83 c4 08             add    $0x8,%rsp
  b89fe6:       5b                      pop    %rbx
  b89fe7:       5d                      pop    %rbp
  b89fe8:       c3                      ret

(and I also checked manually with LD_PRELOAD)

@pitrou

pitrou commented Jun 2, 2026

Copy link
Copy Markdown
Member Author

@github-actions crossbow submit wheelcp314*

@pitrou pitrou added the CI: Extra: C++ Run extra C++ CI label Jun 2, 2026
@github-actions

github-actions Bot commented Jun 2, 2026

Copy link
Copy Markdown

Revision: cfd2971

Submitted crossbow builds: ursacomputing/crossbow @ actions-275e51b993

Task Status
wheel-macos-monterey-cp314-cp314-amd64 GitHub Actions
wheel-macos-monterey-cp314-cp314-arm64 GitHub Actions
wheel-macos-monterey-cp314-cp314t-amd64 GitHub Actions
wheel-macos-monterey-cp314-cp314t-arm64 GitHub Actions
wheel-manylinux-2-28-cp314-cp314-amd64 GitHub Actions
wheel-manylinux-2-28-cp314-cp314-arm64 GitHub Actions
wheel-manylinux-2-28-cp314-cp314t-amd64 GitHub Actions
wheel-manylinux-2-28-cp314-cp314t-arm64 GitHub Actions
wheel-musllinux-1-2-cp314-cp314-amd64 GitHub Actions
wheel-musllinux-1-2-cp314-cp314-arm64 GitHub Actions
wheel-musllinux-1-2-cp314-cp314t-amd64 GitHub Actions
wheel-musllinux-1-2-cp314-cp314t-arm64 GitHub Actions
wheel-windows-cp314-cp314-amd64 GitHub Actions
wheel-windows-cp314-cp314t-amd64 GitHub Actions

@pitrou pitrou changed the title EXPERIMENT: [C++] Access mimalloc through dynamically-resolved symbols GH-50083: [C++] Access mimalloc through dynamically-resolved symbols Jun 2, 2026
@pitrou pitrou marked this pull request as ready for review June 2, 2026 16:35
@pitrou pitrou force-pushed the mimalloc-dynamic-symbols branch from cfd2971 to 28f7069 Compare June 2, 2026 16:42
@pitrou

pitrou commented Jun 2, 2026

Copy link
Copy Markdown
Member Author

@pablogsal This PR is ready. Do you want to take a look?

@github-actions github-actions Bot removed the CI: Extra: C++ Run extra C++ CI label Jun 2, 2026
@pitrou pitrou added the CI: Extra: C++ Run extra C++ CI label Jun 2, 2026
@pitrou

pitrou commented Jun 3, 2026

Copy link
Copy Markdown
Member Author

@ursabot please benchmark lang=C++

@rok

rok commented Jun 3, 2026

Copy link
Copy Markdown
Member

Benchmark runs are scheduled for commit 28f7069. Watch https://buildkite.com/apache-arrow and https://conbench.arrow-dev.org for updates. A comment will be posted here when the runs are complete.

@conbench-apache-arrow

Copy link
Copy Markdown

Thanks for your patience. Conbench analyzed the 3 benchmarking runs that have been run so far on PR commit 28f7069.

There were 8 benchmark results indicating a performance regression:

The full Conbench report has more details.

@pitrou

pitrou commented Jun 8, 2026

Copy link
Copy Markdown
Member Author

@github-actions crossbow submit -g cpp

@pitrou pitrou requested a review from kou June 8, 2026 08:27
@github-actions

This comment was marked as outdated.

@pitrou

pitrou commented Jun 11, 2026

Copy link
Copy Markdown
Member Author

@zanmato1984 Would you like to take a look?

@zanmato1984

Copy link
Copy Markdown
Contributor

Yes, looking.

@zanmato1984 zanmato1984 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments inline.

Also, I see the conbench showing visible performance regression, is this expected and acceptable?

Comment thread cpp/src/arrow/CMakeLists.txt

#include "arrow/memory_pool.h"
#include "arrow/result.h"
#include "arrow/util/config.h"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I noticed only the system memory pool benchmarks were enabled otherwise, as ARROW_MIMALLOC and ARROW_JEMALLOC were not defined.

Comment thread cpp/src/arrow/CMakeLists.txt Outdated
Comment thread cpp/src/arrow/memory_pool_internal.h Outdated
}
}

void TestReleaseUnused() {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this test really helping on the code changed in this PR?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's exercising the interposition for mi_collect.

@github-actions github-actions Bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Jun 12, 2026
@pitrou

pitrou commented Jun 15, 2026

Copy link
Copy Markdown
Member Author

Also, I see the conbench showing visible performance regression, is this expected and acceptable?

TouchArea is an allocation micro-benchmark, so I wouldn't care too much. The 5 other benchmark regressions seem a bit random and only concern a couple parameter sets, there doesn't seem to be any systemic performance regression.

@pitrou pitrou force-pushed the mimalloc-dynamic-symbols branch from 28f7069 to fe2a4e3 Compare June 15, 2026 09:49
@pitrou

pitrou commented Jun 15, 2026

Copy link
Copy Markdown
Member Author

@github-actions crossbow submit -g cpp

@github-actions github-actions Bot removed the CI: Extra: C++ Run extra C++ CI label Jun 15, 2026
@github-actions

Copy link
Copy Markdown

Revision: fe2a4e3

Submitted crossbow builds: ursacomputing/crossbow @ actions-34d517b649

Task Status
example-cpp-minimal-build-static GitHub Actions
example-cpp-minimal-build-static-system-dependency GitHub Actions
example-cpp-tutorial GitHub Actions
test-build-cpp-fuzz GitHub Actions
test-conda-cpp GitHub Actions
test-conda-cpp-valgrind GitHub Actions
test-debian-13-cpp-amd64 GitHub Actions
test-debian-13-cpp-i386 GitHub Actions
test-debian-experimental-cpp-gcc-15 GitHub Actions
test-fedora-42-cpp GitHub Actions
test-ubuntu-22.04-cpp GitHub Actions
test-ubuntu-22.04-cpp-bundled GitHub Actions
test-ubuntu-22.04-cpp-emscripten GitHub Actions
test-ubuntu-22.04-cpp-no-threading GitHub Actions
test-ubuntu-24.04-cpp GitHub Actions
test-ubuntu-24.04-cpp-bundled-offline GitHub Actions
test-ubuntu-24.04-cpp-gcc-13-bundled GitHub Actions
test-ubuntu-24.04-cpp-gcc-14 GitHub Actions
test-ubuntu-24.04-cpp-minimal-with-formats GitHub Actions
test-ubuntu-24.04-cpp-thread-sanitizer GitHub Actions

@pitrou

pitrou commented Jun 15, 2026

Copy link
Copy Markdown
Member Author

@github-actions crossbow submit wheelcp312*

@github-actions

Copy link
Copy Markdown

Revision: fe2a4e3

Submitted crossbow builds: ursacomputing/crossbow @ actions-48f2689d41

Task Status
wheel-macos-monterey-cp312-cp312-amd64 GitHub Actions
wheel-macos-monterey-cp312-cp312-arm64 GitHub Actions
wheel-manylinux-2-28-cp312-cp312-amd64 GitHub Actions
wheel-manylinux-2-28-cp312-cp312-arm64 GitHub Actions
wheel-musllinux-1-2-cp312-cp312-amd64 GitHub Actions
wheel-musllinux-1-2-cp312-cp312-arm64 GitHub Actions
wheel-windows-cp312-cp312-amd64 GitHub Actions

@pitrou

pitrou commented Jun 15, 2026

Copy link
Copy Markdown
Member Author

I think I addressed your review comments @zanmato1984 , do you want to take another look at this PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants