GNU parallel out, pararun in
Another third-party tool removed from my toolbox. And not just some small bauble, but a 16k-line Perl script that often gave me headaches: GNU parallel.
I actually made three (3) replacements because I was incredibly bored:
pararun
: a wrapper around xargs -P to provide features like exit on job error or progress reporting.
pararun -m
: the same wrapper using make -j (POSIX-2024) as command runner.
pararun_portable
: an exercise in futility, more precisely a portable and standalone (without my portability shim) version of the previous script, using only the shell job control and a FIFO.
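To give an idea of what "a wrapper around xargs -P" boils down to, here is a minimal sketch. It is not the actual pararun script: the name mini_pararun and its two positional arguments are made up for illustration, and it has none of the error handling or progress reporting mentioned above.

```shell
# Minimal sketch, NOT the actual pararun script.
# mini_pararun JOBS CMD: run CMD once per stdin line, at most JOBS at a time.
mini_pararun() {
    # Newlines become NULs so that -0 passes each line verbatim as "$1";
    # the trailing "sh" fills $0 of the child shell.
    tr '\n' '\0' | xargs -0 -P "$1" -n 1 sh -c "$2" sh
}

# Usage: print the .png name that each .bmp would be converted to
printf '%s\n' 'a.bmp' 'b c.bmp' | mini_pararun 2 'echo "${1%.*}.png"'
```

Since the jobs run concurrently, the output lines can come out in any order.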
The resulting syntax is in my opinion much clearer, since it's just an explicit sh command. These days, I much prefer the now familiar "quoting hell" to "automagic, DSL and many-ways-to-do-stuff hell", especially when I have to debug it. Worse is truly better, in some cases.
$ bmps=$(find dir/ -type f -name '*.bmp')
$ echo "$bmps" | parallel magick {} {.}.png
$ echo "$bmps" | pararun 'magick "$1" "${1%.*}".png'
Parallel's --keep-order flag can be easily emulated via the JOBNUM variable (equiv. to parallel's {#}) made available to the command:
$ parallel -j4 -k sleep {}\; echo {} ::: 2 1 4 3
$ printf '%s\n' 2 1 4 3 | pararun -j4 'sleep $1; echo $JOBNUM $1' | sort -k1n | cut -d' ' -f2-
In fact, here's the final wrapper: paramap
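The tag, sort, and cut trick can be tried without pararun at all. The sketch below is only an approximation under stated assumptions: tagged_run is a hypothetical helper (not paramap), it numbers the input lines with awk to stand in for JOBNUM, and it only handles single-word items without quotes or backslashes.

```shell
# Hypothetical helper, not the author's paramap.
# Assumption: input lines are single words without quotes or backslashes.
tagged_run() {
    # awk prefixes each line with its number; xargs hands both to sh,
    # which stores the number in JOBNUM and leaves the item in "$1".
    awk '{ print NR, $0 }' |
        xargs -P "$1" -n 2 sh -c 'JOBNUM=$1; shift; '"$2" sh
}

# Jobs finish out of order (0, 1, 2); sort and cut restore the input order
printf '%s\n' 2 0 1 |
    tagged_run 3 'sleep "$1"; echo "$JOBNUM $1"' |
    sort -k1n | cut -d' ' -f2-
```

The numeric sort on the first field puts the lines back in submission order, and cut then drops the tag.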
The comparison stops here because, yes, parallel has many more features, especially its remote job distribution via ssh, which would be a lot more work to duplicate correctly.
Or does it? Using the make backend with GNU make (and bmake, if I understand their job token pool thing correctly) brings something I have always wanted: jobserver orchestration for nested instances, to never let a CPU core go unused.
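For the curious, a make-based runner can be sketched as a makefile generated on the fly and piped into make -f -. This is a guess at the general shape, not the actual pararun -m: mini_make_run and the PARA_CMD variable are made-up names, and the sketch assumes items are plain words (no whitespace, quotes, '$' or '#').

```shell
# Hypothetical sketch of a make-based runner, not the actual pararun -m.
# Assumption: input lines are plain words (no whitespace, quotes, '$' or '#').
mini_make_run() {
    # One phony target per input line; "all" comes first so it is the default
    # goal, and every later "all: jobN" line adds one more prerequisite.
    # The command travels through the environment (PARA_CMD) so make never
    # has to parse it; "$$" is make's escape for a literal "$".
    awk '
        BEGIN { print "all:" }
        { printf "all: job%d\n.PHONY: job%d\njob%d:\n\t@sh -c \"$$PARA_CMD\" sh %s\n", NR, NR, NR, $0 }
    ' | PARA_CMD=$2 make -s -j "$1" -f -
}

# Usage: run three jobs, at most two at a time
printf '%s\n' a b c | mini_make_run 2 'echo "item=$1"'
```

Passing the command through the environment sidesteps most of the makefile escaping, at the cost of one extra sh per job.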
Worth mentioning that this backend has a drawback: make - only starts working after stdin is closed. This "work" includes parsing and dependency resolution, which means a small startup overhead that depends on the number of jobs. Said overhead isn't massive, but it's there:
$ seq 1000 | time pararun ''
pararun ''  0.21s user 0.12s system 177% cpu 0.186 total
$ seq 1000 | time pararun -m ''
pararun -m ''  1.49s user 0.77s system 186% cpu 1.211 total