Hey!
Thought I'd share my small attempt at syscall golf: lsr. It's a version of ls that uses io_uring to batch as many syscalls as possible, to the point that I am opening / reading / closing /etc/localtime, /etc/passwd, etc. through io_uring. The syscall reduction is large (848 syscalls versus 30,396 for ls -al on 10,000 files), and it is also faster than several other ls replacements. Benchmarks for time and number of syscalls are summarized below.
By far the biggest missing feature of io_uring for this program is readlink. lsr is able to do massive amounts of batching of statx, but every symlink it encounters requires its own additional readlink syscall. This doesn't show up in my benchmarks because the test directory contains only plain files.
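To make the per-symlink cost concrete, here is a minimal sketch in Python. It is not lsr's code and does not use io_uring; `list_with_link_targets` is a hypothetical helper that only illustrates that resolving each link target takes one separate readlink call.

```python
import os
import tempfile


def list_with_link_targets(path):
    """Return (entries, readlink_calls) for the directory at `path`.

    `entries` maps each name to its symlink target, or None for
    anything that is not a symlink.
    """
    entries = {}
    readlink_calls = 0
    with os.scandir(path) as it:
        for entry in it:
            if entry.is_symlink():
                # One extra call per symlink to learn where it points.
                entries[entry.name] = os.readlink(entry.path)
                readlink_calls += 1
            else:
                entries[entry.name] = None
    return entries, readlink_calls


# Demo: three plain files and two symlinks in a scratch directory.
with tempfile.TemporaryDirectory() as d:
    for name in ("a", "b", "c"):
        open(os.path.join(d, name), "w").close()
    os.symlink("a", os.path.join(d, "link_to_a"))
    os.symlink("b", os.path.join(d, "link_to_b"))
    entries, calls = list_with_link_targets(d)

print(f"{len(entries)} entries, {calls} readlink calls")
```

In a directory of plain files the counter stays at zero, which is why this cost is invisible in the benchmarks.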
A couple of io_uring related things to know:
- Single threaded
- Queue size = 256
- Flags = IORING_SETUP_CLAMP | IORING_SETUP_SUBMIT_ALL | IORING_SETUP_COOP_TASKRUN | IORING_SETUP_SINGLE_ISSUER | IORING_SETUP_DEFER_TASKRUN
- Notably I'm not using SQPOLL. That would have some benefit particularly for large directories. There is no reason in particular I'm not using it.
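As a rough illustration of what a queue size of 256 implies for batching: assuming each file needs one statx submission entry and a single submission carries at most 256 entries, the number of submissions for n files is at least ceil(n / 256). This is back-of-the-envelope arithmetic under that assumption, not a description of how lsr actually schedules its submissions; `min_submissions` is a hypothetical helper.

```python
import math

QUEUE_SIZE = 256  # queue size stated in the list above


def min_submissions(n_files, queue_size=QUEUE_SIZE):
    """Lower bound on submissions, assuming one queue entry per file."""
    return math.ceil(n_files / queue_size)


for n in (10, 100, 1_000, 10_000):
    print(f"n={n:>6}: at least {min_submissions(n)} submission(s)")
```

The total syscall count of a run will be higher than this bound, since reading the directory, writing output, and setup also cost syscalls.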
Thanks for all the work on io_uring, it's a pleasure to use!
Time
Data gathered with hyperfine on a directory of n plain files.
| Program | n=10 | n=100 | n=1,000 | n=10,000 |
|---|---|---|---|---|
| lsr -al | 372.6 μs | 634.3 μs | 2.7 ms | 22.1 ms |
| busybox ls -al | 403.8 μs | 1.1 ms | 3.5 ms | 32.5 ms |
| ls -al | 1.4 ms | 1.7 ms | 4.7 ms | 38.0 ms |
| eza -al | 2.9 ms | 3.3 ms | 6.6 ms | 40.2 ms |
| lsd -al | 2.1 ms | 3.5 ms | 17.0 ms | 153.4 ms |
| uutils ls -al | 2.9 ms | 3.6 ms | 11.3 ms | 89.6 ms |
Syscalls
Data gathered with strace -c on a directory of n plain files.
| Program | n=10 | n=100 | n=1,000 | n=10,000 |
|---|---|---|---|---|
| lsr -al | 20 | 28 | 105 | 848 |
| busybox ls -al | 84 | 410 | 2,128 | 20,383 |
| ls -al | 405 | 675 | 3,377 | 30,396 |
| eza -al | 319 | 411 | 1,320 | 10,364 |
| lsd -al | 508 | 1,408 | 10,423 | 100,512 |
| uutils ls -al | 445 | 986 | 6,397 | 10,005 |
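For a quick sense of scale, the n=10,000 column can be turned into syscalls per file and a ratio against lsr. The sketch below only does arithmetic on the numbers in the table above; the variable names are illustrative.

```python
# Syscall counts at n=10,000, copied from the table above.
N = 10_000
syscalls_10k = {
    "lsr -al": 848,
    "busybox ls -al": 20_383,
    "ls -al": 30_396,
    "eza -al": 10_364,
    "lsd -al": 100_512,
    "uutils ls -al": 10_005,
}

# Average syscalls spent per directory entry.
per_file = {prog: count / N for prog, count in syscalls_10k.items()}

# How many times more syscalls each program makes than lsr.
baseline = syscalls_10k["lsr -al"]
vs_lsr = {prog: count / baseline for prog, count in syscalls_10k.items()}

for prog in syscalls_10k:
    print(f"{prog:<16} {per_file[prog]:8.4f} per file  {vs_lsr[prog]:7.1f}x lsr")
```

Read this way, lsr averages well under one syscall per file, while every other program in the table spends at least one per file.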