GH-50083: [C++] Access mimalloc through dynamically-resolved symbols#41128
GH-50083: [C++] Access mimalloc through dynamically-resolved symbols#41128pitrou wants to merge 1 commit into
Conversation
|
Thanks for opening a pull request! If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project. Then could you also rename the pull request title in the following format? or In the case of PARQUET issues on JIRA the title also supports: See also: |
6a53c52 to
b40f113
Compare
|
@github-actions crossbow submit wheel-manylinux-2-28* |
This comment was marked as outdated.
This comment was marked as outdated.
|
We can see the difference in the code generated for freeing mimalloc memory:
|
|
Also, I do not see any significant and robust regressions in our nano-benchmarks when running them locally. |
|
Interestingly, the binary wheels built in the Crossbow builds above use a different indirection code (a direct call to I've checked that |
|
Thank you for your contribution. Unfortunately, this |
b40f113 to
ee6537d
Compare
|
@github-actions crossbow submit wheel-manylinux-2-28* |
ee6537d to
ba3c4f7
Compare
This comment was marked as outdated.
This comment was marked as outdated.
ba3c4f7 to
f414851
Compare
|
@github-actions crossbow submit wheel-manylinux-2-28* |
|
Revision: f414851 Submitted crossbow builds: ursacomputing/crossbow @ actions-f5c1edacb2 |
|
Unfortunately, interposition is not possible anymore with the wheels built in the Crossbow build above. We see the |
f414851 to
cfd2971
Compare
|
@github-actions crossbow submit wheel-manylinux-2-28* |
|
Revision: cfd2971 Submitted crossbow builds: ursacomputing/crossbow @ actions-b477fba900 |
|
I selectively added the This can be seen for example by the (and I also checked manually with |
|
@github-actions crossbow submit wheelcp314* |
|
Revision: cfd2971 Submitted crossbow builds: ursacomputing/crossbow @ actions-275e51b993 |
cfd2971 to
28f7069
Compare
|
@pablogsal This PR is ready. Do you want to take a look? |
|
@ursabot please benchmark lang=C++ |
|
Benchmark runs are scheduled for commit 28f7069. Watch https://buildkite.com/apache-arrow and https://conbench.arrow-dev.org for updates. A comment will be posted here when the runs are complete. |
|
Thanks for your patience. Conbench analyzed the 3 benchmarking runs that have been run so far on PR commit 28f7069. There were 8 benchmark results indicating a performance regression:
The full Conbench report has more details. |
|
@github-actions crossbow submit -g cpp |
This comment was marked as outdated.
This comment was marked as outdated.
|
@zanmato1984 Would you like to take a look? |
|
Yes, looking. |
zanmato1984
left a comment
There was a problem hiding this comment.
Some comments inline.
Also, I see the conbench showing visible performance regression, is this expected and acceptable?
|
|
||
| #include "arrow/memory_pool.h" | ||
| #include "arrow/result.h" | ||
| #include "arrow/util/config.h" |
There was a problem hiding this comment.
Because I noticed only the system memory pool benchmarks were enabled otherwise, as ARROW_MIMALLOC and ARROW_JEMALLOC were not defined.
| } | ||
| } | ||
|
|
||
| void TestReleaseUnused() { |
There was a problem hiding this comment.
Is this test really helping on the code changed in this PR?
There was a problem hiding this comment.
It's exercising the interposition for mi_collect.
|
28f7069 to
fe2a4e3
Compare
|
@github-actions crossbow submit -g cpp |
|
Revision: fe2a4e3 Submitted crossbow builds: ursacomputing/crossbow @ actions-34d517b649 |
|
@github-actions crossbow submit wheelcp312* |
|
Revision: fe2a4e3 Submitted crossbow builds: ursacomputing/crossbow @ actions-48f2689d41 |
|
I think I addressed your review comments @zanmato1984 , do you want to take another look at this PR? |
Rationale for this change
The memray memory profiler works by interposing certain dynamic symbols in the profiled process to replace them with their own functions that will collect memory allocation data. It will currently, to the best of my knowledge, only recognize system C calls such
malloc,mmap...When a third-party allocator like mimalloc or jemalloc is being used, such that Arrow does by default, memray does not see the logical allocation calls made through these allocator's APIs (because they are not interposed), but only the raw memory reservations that they issue using system routines.
This can lead people using memray to think that a given Arrow workload (or any workload using such allocators, really) that an inordinate amount of memory is being used, while the reported memory mostly represents non-committed virtual memory that the allocator keeps for performance reasons. Concrete example in GH-40301: we allocate a number of 1kiB buffers from mimalloc, but memray sees a similar number of 64MiB calls to
mmap.We discussed how to enhance memray such as to account for the corresponding logical allocations, and we came to the conclusion that it requires that Arrow exposes API calls that can be dynamically interposed. Since we typically build against a static
libmimalloc.a, the mimalloc symbols cannot be exposed (at least, I cannot seem to get this to work on Ubuntu). This means we need to define our own symbols wrapping the mimalloc APIs.What changes are included in this PR?
Define public, interposable symbols that redirect into the mimalloc APIs that we use.
Are these changes tested?
Not for now. We could probably test them, at least on Linux, by compiling an almost trivial shared library and interposing it using
LD_PRELOAD.Are there any user-facing changes?
No. There should not be any noticeable performance regression, except perhaps on memory pool micro-benchmarks.