perf: add experimental support for using mimalloc allocator #404

Open
wincent wants to merge 7 commits into main from wincent/mimalloc

wincent commented Sep 2, 2022 (edited)

Vendoring from microsoft/mimalloc, specifically the v2.1.7 tag (originally the v2.0.6 tag).

(Note: There have been a few releases since then — see the diff from v2.1.7 to v3.2.8 — so this may be worth reevaluating.)

mimalloc is a simple, performance-focused allocator that is easy to drop in as a replacement for malloc() and friends, as described in its README. To avoid bringing in a dependency on CMake, we just build the static.c version. Sadly, the performance delta (see numbers below) is not a clear win; the numbers are a bit all over the place. This probably isn't that surprising, because most of the heavy memory allocation in Command-T is already micro-managed internally (simply, and with little overhead) using big slabs allocated with mmap(). Nevertheless, parking this here as a possible idea.

I added a script to pull down the release archive and dump it into a directory, because I don't want to use a submodule for this (people installing a Vim plugin from a Git repo shouldn't have to know/worry about whether it needs or uses submodules). Space on disk for this set of files (some of which are obviously redundant in our context) is:

du -sh lua/wincent/commandt/lib/vendor/github/microsoft
4.8M lua/wincent/commandt/lib/vendor/github/microsoft

As it is not clear whether this is going to be a great idea or not, it only takes effect if you call make with USE_MIMALLOC set. You can verify that it actually is overriding the standard malloc() etc. calls by running a command with MIMALLOC_VERBOSE set, which will cause it to print some extra info:

env MIMALLOC_VERBOSE=1 TIMES=1 bin/benchmarks/scanner.lua

Impact (unfortunately, a bit inconclusive) on scanner and matcher benchmarks follows. Note that numbers shouldn't be compared across machines because they were produced at different times (for example, the M3 numbers are from a different version of the OS, and the branch was rebased, compared with the other machines).

On mid-2015 MacBook Pro

These numbers are all over the map due to thermal throttling.

                       best       avg       sd       +/-       p     (best)      (avg)       (sd)       +/-       p
           buffer   0.04094   0.04178  0.00278   [-0.6%]          (0.04100)  (0.04186)  (0.00287)   [-0.6%]
             file   0.30707   0.31436  0.02486   [-1.0%]    0.05  (0.30735)  (0.31473)  (0.02499)   [-1.0%]    0.05
             find   0.05827   0.06678  0.01162   [+1.5%]    0.05  (0.92013)  (0.93752)  (0.04453)   [-1.0%]   0.025
              git   0.05163   0.06000  0.01115   [+3.3%]  0.0005  (1.00993)  (1.02469)  (0.04072)   [-0.7%]   0.025
               rg   0.06419   0.07229  0.01203   [+3.8%]   0.005  (1.61018)  (1.66326)  (0.08803)   [+0.3%]
         watchman   0.01095   0.01121  0.00068   [+0.2%]          (1.16830)  (1.17605)  (0.01835)   [+0.6%]   0.005
            total   0.54387   0.56643  0.04391   [+0.4%]          (5.09873)  (5.15811)  (0.15328)   [-0.1%]

                       best       avg       sd       +/-       p     (best)      (avg)       (sd)       +/-       p
     pathological   0.44648   0.48275  0.19826  [-10.0%]    0.01  (0.44705)  (0.48350)  (0.19793)  [-10.0%]    0.01
        command-t   0.41205   0.44292  0.21658   [+3.8%]   0.005  (0.41255)  (0.44364)  (0.21681)   [+3.8%]   0.005
chromium (subset)   2.75724   2.99017  0.47925   [-1.3%]          (0.51232)  (0.55960)  (0.17228)   [-1.5%]
 chromium (whole)   3.18933   3.63241  0.64392   [-0.7%]          (0.41821)  (0.49571)  (0.14853)   [-0.3%]    0.05
       big (400k)   4.90155   5.51271  1.20748   [-1.0%]          (0.65297)  (0.74723)  (0.23045)   [-4.5%]    0.05
            total  11.74815  13.06097  2.16866   [-1.2%]          (2.47007)  (2.72968)  (0.54795)   [-2.8%]   0.025

M1 MacBook Pro

                       best       avg       sd       +/-       p     (best)      (avg)       (sd)       +/-       p
           buffer   0.04407   0.05368  0.01123   [-1.4%]   0.025  (0.04433)  (0.05413)  (0.01150)   [-1.6%]   0.025
             file   0.20902   0.21428  0.01060   [+1.0%]    0.01  (0.20902)  (0.21511)  (0.01219)   [+1.1%]   0.005
             find   0.02687   0.03006  0.01015   [+3.9%]    0.05  (0.63141)  (0.64156)  (0.03483)   [+0.7%]    0.05
              git   0.02693   0.02995  0.00980   [+2.2%]          (0.71734)  (0.72825)  (0.04266)   [-0.4%]
               rg   0.02916   0.03318  0.01136   [+2.9%]          (0.90193)  (0.91710)  (0.07157)   [+1.4%]   0.005
         watchman   0.01100   0.01156  0.00165   [-0.7%]          (1.18802)  (1.21274)  (0.13422)   [+1.5%]   0.005
            total   0.36119   0.37272  0.03632   [+1.1%]          (3.71713)  (3.76889)  (0.18577)   [+0.9%]   0.005

                       best       avg       sd       +/-       p     (best)      (avg)       (sd)       +/-       p
     pathological   0.28526   0.29636  0.08356   [-4.0%]   0.025  (0.28527)  (0.29647)  (0.08343)   [-4.0%]   0.025
        command-t   0.23759   0.24616  0.07356   [+1.6%]          (0.23760)  (0.24618)  (0.07354)   [+1.6%]
chromium (subset)   1.56761   1.58469  0.03655   [-0.3%]          (0.41376)  (0.42040)  (0.02032)   [-0.4%]
 chromium (whole)   1.87180   1.88726  0.06174   [-0.4%]   0.025  (0.31695)  (0.32809)  (0.03497)   [+0.4%]
       big (400k)   2.90455   2.92204  0.07185   [-0.2%]          (0.48384)  (0.50533)  (0.07608)   [-0.0%]
            total   6.88851   6.93650  0.15002   [-0.4%]   0.025  (1.74550)  (1.79647)  (0.14517)   [-0.5%]

M3 MacBook Pro

                       best       avg       sd       +/-       p     (best)      (avg)       (sd)       +/-       p
           buffer   0.01255   0.01400  0.00409   [+2.0%]          (0.01260)  (0.01447)  (0.00635)   [-3.3%]
             file   0.14749   0.15026  0.00629  [+38.1%]  0.0005  (0.14843)  (0.15115)  (0.00626)  [+37.9%]  0.0005
             find   0.20783   0.27306  0.12796  [+15.8%]  0.0005  (1.13360)  (1.38588)  (0.55490)  [+15.3%]  0.0005
              git   0.21748   0.25155  0.10398  [+13.0%]  0.0005  (1.17693)  (1.40937)  (0.54965)   [+9.1%]  0.0005
               rg   0.20640   0.26983  0.12977  [+12.2%]  0.0005  (1.55310)  (1.78037)  (0.55921)   [+6.9%]  0.0005
         watchman   0.01813   0.01980  0.00287   [+6.1%]  0.0005  (1.19740)  (1.21007)  (0.02198)   [-0.2%]
            total   0.81542   0.97850  0.33560  [+17.1%]  0.0005  (5.23262)  (5.95132)  (1.66475)   [+8.7%]  0.0005

                       best       avg       sd       +/-       p     (best)      (avg)       (sd)       +/-       p
     pathological   0.21079   0.22604  0.10943   [+4.8%]   0.025  (0.21107)  (0.22640)  (0.10972)   [+4.7%]   0.025
        command-t   0.16694   0.17164  0.04923   [-0.6%]          (0.16716)  (0.17228)  (0.05253)   [-0.5%]
chromium (subset)   1.35310   1.36239  0.02010   [+0.1%]          (0.28797)  (0.29255)  (0.01108)   [+0.3%]
 chromium (whole)   1.11148   1.11599  0.01258   [+0.3%]    0.01  (0.12167)  (0.12478)  (0.00828)   [-0.2%]
       big (400k)   1.67454   1.68249  0.05630   [+0.6%]  0.0005  (0.18195)  (0.18487)  (0.00876)   [+0.0%]
            total   4.52863   4.55855  0.15573   [+0.5%]    0.01  (0.97644)  (1.00087)  (0.12712)   [+1.0%]

Ryzen 5950X Arch Linux

                       best       avg       sd       +/-       p     (best)      (avg)       (sd)       +/-       p
           buffer   0.02465   0.02544  0.01098   [-0.4%]          (0.02467)  (0.02546)  (0.01099)   [-0.5%]
             file   0.09906   0.09948  0.00124   [-0.1%]          (0.09943)  (0.09995)  (0.00130)   [-0.2%]
             find   0.01852   0.01885  0.00084   [+0.5%]          (0.25137)  (0.25430)  (0.00762)   [+0.1%]
              git   0.01718   0.01811  0.00210   [+0.6%]          (0.22095)  (0.22468)  (0.01156)   [-0.6%]
               rg   0.01748   0.01792  0.00105   [+0.5%]          (0.60575)  (0.61077)  (0.01562)   [-0.1%]
         watchman   0.00178   0.00186  0.00033   [-5.6%]          (0.02282)  (0.02717)  (0.02826)  [-11.5%]
            total   0.17975   0.18165  0.01018   [-0.0%]          (1.23025)  (1.24233)  (0.04061)   [-0.4%]    0.05

                       best       avg       sd       +/-       p     (best)      (avg)       (sd)       +/-       p
     pathological   0.26186   0.27703  0.10940   [-4.4%]  0.0005  (0.26196)  (0.27715)  (0.10946)   [-4.4%]  0.0005
        command-t   0.19271   0.20058  0.05044   [-3.0%]  0.0005  (0.19279)  (0.20065)  (0.05047)   [-3.0%]  0.0005
chromium (subset)   1.83627   1.89158  0.25631   [-3.8%]    0.01  (0.45977)  (0.49985)  (0.21028)  [-15.7%]   0.005
 chromium (whole)   1.36877   1.38916  0.06031   [+2.6%]  0.0005  (0.12129)  (0.12530)  (0.01659)   [-0.4%]
       big (400k)   2.39053   2.43636  0.11813   [+1.8%]  0.0005  (0.19600)  (0.20396)  (0.02644)   [-0.1%]
            total   6.09256   6.19472  0.33431   [-0.2%]          (1.24139)  (1.30690)  (0.25114)   [-7.5%]   0.005

wincent and others added 7 commits August 13, 2024 18:19
Fixes:
```
luajit: ...and-t/bin/benchmarks/../../lua/wincent/commandt/init.lua:199: attempt to call field 'nvim_buf_is_valid' (a nil value)
```
Fixes:
```
luajit: ...and-t/bin/benchmarks/../../lua/wincent/commandt/init.lua:244: attempt to index field 'scanners' (a nil value)
```
Vendoring from:
- https://github.com/microsoft/mimalloc
and specifically:
- https://github.com/microsoft/mimalloc/releases/tag/v2.0.6
The .prettierignore change is because there are a couple of things in
the Markdown files that Prettier doesn't like.
The clang-format thing comes from a tip here:
- https://stackoverflow.com/a/57272592/2103996
Should prevent CI failures like this one:
- https://github.com/wincent/command-t/actions/runs/2979207632 
Wasn't needed on clang, but is needed with gcc:
 /usr/bin/ld: mimalloc-override.o: relocation R_X86_64_TPOFF32
 against `recurse' can not be used when making a shared object;
 recompile with -fPIC
I can't see a changelog or release notes in the repo, so here is the
diff:
- microsoft/mimalloc@v2.0.6...v2.1.7 

wincent commented Aug 13, 2024

Quick test of Hoard, for comparison:

brew tap emeryberger/hoard
brew install --HEAD emeryberger/hoard/libhoard
make clean
make
hoard bin/benchmarks/matcher.lua

Results (relative to wincent/mimalloc branch) on M3:

Summary of cpu time and (wall time):
                       best       avg       sd       +/-       p     (best)      (avg)       (sd)       +/-       p
     pathological   0.20645   0.21815  0.07995   [-3.6%]   0.025  (0.20715)  (0.21876)  (0.08035)   [-3.5%]   0.025
        command-t   0.16663   0.17294  0.05643   [+0.7%]          (0.16724)  (0.17352)  (0.05677)   [+0.7%]
chromium (subset)   1.34275   1.35172  0.02076   [-0.8%]  0.0005  (0.28418)  (0.28908)  (0.01675)   [-1.2%]   0.005
 chromium (whole)   1.10651   1.11530  0.02674   [-0.1%]          (0.12181)  (0.12475)  (0.01076)   [-0.0%]
       big (400k)   1.66873   1.68029  0.03942   [-0.1%]          (0.18046)  (0.18403)  (0.01414)   [-0.5%]
            total   4.49797   4.53841  0.14236   [-0.4%]    0.05  (0.96567)  (0.99015)  (0.11602)   [-1.1%]    0.05
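
For anyone reading the +/- column: it looks like the relative change in the avg column against the baseline run, but that is an assumption, since the benchmark harness's actual formula isn't shown here. A minimal sketch under that assumption (`percent_change` is a hypothetical helper, not part of Command-T):

```c
#include <assert.h>

/* Relative change of `candidate` versus `baseline`, in percent.
 * Negative means the candidate run took less time.
 * Assumed formula; the real benchmark script may compute this differently. */
static double percent_change(double baseline, double candidate) {
    assert(baseline != 0.0);
    return (candidate - baseline) / baseline * 100.0;
}
```

Feeding in the chromium (subset) cpu averages (1.36239 for mimalloc on M3, 1.35172 for Hoard) gives roughly -0.8%, in line with the table. The pathological row works out to about -3.5% against the reported -3.6%, so the harness may well be using a slightly different statistic or baseline.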
