While experimenting with GNU parallel I found that the following commands all hang (with CPU usage gradually decreasing) on a Fedora 41 VM with 8 GB of RAM. Is this expected behaviour?
parallel --halt now,fail=1 --timeout 2s --memfree 30G echo ::: a b c
parallel --halt now,fail=1 --timeout 2s --memsuspend 30G echo ::: a b c
parallel --timeout 2s --memsuspend 30G echo ::: a b c
parallel --timeout 2s --memfree 30G echo ::: a b c
I'd have expected at least the first or second command to actually time out and exit with error code 3. An strace log shows that parallel is basically spinning: it continuously reads /proc/meminfo via an awk subprocess, which is in line with the expected behaviour (memfreescript), even though polling every second seems pretty wasteful.
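For reference, the polling can be watched from a second terminal along these lines (a sketch; the pgrep pattern is an assumption about how the process shows up on your system):

# Watch parallel fork awk and open /proc/meminfo roughly once per second.
strace -f -e trace=execve,openat -p "$(pgrep -of parallel)" 2>&1 |
    grep -E 'execve.*awk|/proc/meminfo'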
Why does it allow --memfree and --memsuspend values much greater than physical RAM?
Could someone also clarify this section of the manual for --memfree? Does it mean the youngest running job would be killed?
If the jobs take up very different amount of RAM, GNU parallel will only start as many as there is memory for. If less than size bytes are free, no more jobs will be started. If less than 50% size bytes are free, the youngest job will be killed (as per --term-seq), and put back on the queue to be run later.
The kill_youngster_if_not_enough_mem code seems relevant, but I don't quite grasp it in relation to the rest of the GNU parallel codebase.
parallel --version
GNU parallel 20241222
uname -a
Linux host 6.11.4-301.fc41.x86_64 #1 SMP PREEMPT_DYNAMIC Sun Oct 20 15:02:33 UTC 2024 x86_64 GNU/Linux
1 Answer
continuously reading /proc/meminfo via an awk subprocess, which is in line with the expected behaviour (memfreescript), even though polling every second seems pretty wasteful.
It only does this on GNU/Linux. On HP-UX it calls vmstat instead of reading /proc/meminfo, and on FreeBSD it calls sysctl. The idea is to have a single awk script that gives the same output no matter the OS.
It does this every second because we need to know whether the amount of free memory has changed. We assume there are other things running on the machine, so GNU Parallel is not the only program affecting memory usage; we therefore cannot predict what the memory usage will be one second from now.
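Conceptually, the once-a-second probe looks something like the shell sketch below. This is not GNU parallel's actual Perl code, and the exact /proc/meminfo fields summed here are an approximation of what its awk script computes:

# Approximation of the free-memory probe that runs every second on Linux.
while true; do
    free_bytes=$(awk '/^(MemFree|Buffers|Cached):/ { sum += $2 } END { print sum * 1024 }' /proc/meminfo)
    # parallel compares this against the --memfree value: below it, no new
    # jobs are started; below 50% of it, the youngest job is killed and requeued.
    sleep 1
done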
Why does it allow --memfree and --memsuspend values much greater than physical RAM?
GNU Parallel does not rigorously check every single option for sane values or combinations. And as you have discovered, it is not sane to use a value greater than physical RAM.
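If you want such a guard yourself, a small wrapper can reject impossible values before invoking parallel. This is a Linux-specific sketch, not anything parallel provides:

# Refuse a --memfree threshold larger than physical RAM.
want_kb=$((30 * 1024 * 1024))                               # 30G in KiB
total_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)   # physical RAM in KiB
if [ "$want_kb" -gt "$total_kb" ]; then
    echo "--memfree ${want_kb}K exceeds physical RAM (${total_kb}K); it can never be satisfied" >&2
    exit 1
fi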
Could someone also clarify this section of the manual for --memfree? Does it mean the youngest running job would be killed?
Yes. So why kill the youngest? Assume all jobs need 60% of RAM to complete: if you killed the oldest, the jobs would keep pre-empting one another and none would ever complete. Killing the youngest instead lets the longest-running job keep its progress and eventually finish.
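As an illustration (the values are made up and mycmd is a placeholder): with --memfree 1G, a drop below 512M free triggers the kill, and --term-seq controls how the youngest job is taken down before being requeued:

# Keep at least 1G free; if free RAM falls below 512M (50% of 1G), the youngest
# job receives TERM, then TERM again, then KILL at the given millisecond delays,
# and is put back on the queue.
parallel --memfree 1G --term-seq TERM,200,TERM,100,KILL,25 mycmd ::: arg1 arg2 arg3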
Thank you for taking the time to answer. Even with a sane value in my case, though, say
parallel --halt now,fail=1 --timeout 2s --memfree 5G echo ::: a b c
the program still hangs, since --timeout applies per command/job and 5G of RAM is still unavailable most of the time. A "global" timeout is probably best accomplished with timeout from coreutils, like timeout 2s <parallel_usage>? – Somniar, Jun 1 at 6:53
@Somniar: I assume that as long as the specified amount of memory is not free, echo will not be started and therefore timeout will not take effect. – Cyrus, Jun 1 at 10:39
@Somniar Normal values of --memfree are on the order of 10-30% of the RAM that is free before starting. If you go much higher, you risk only being able to run a single job, or none at all, which seems to be what you are experiencing. --timeout is only activated when a job is started, so yes: a global timeout is better done with coreutils timeout. – Ole Tange, Jun 1 at 13:26
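For completeness, the global-timeout approach suggested in the comments would look like this (coreutils timeout exits with status 124 when the limit fires):

# Wall-clock limit on the whole parallel invocation rather than per job.
timeout 2s parallel --halt now,fail=1 --memfree 5G echo ::: a b c
echo "exit status: $?"   # 124 if the timeout fired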