GNU parallel out, pararun in
Another third-party tool removed from my toolbox. And not just any small bauble: a 16k-line Perl script that often gave me headaches, GNU parallel.
I actually made three (3) replacements because I was incredibly bored:
- pararun: a wrapper around xargs -P to provide features like exiting on job error or progress reporting.
- pararun -m: the same wrapper, using make -j (POSIX-2024) as the command runner.
- pararun_portable: an exercise in futility; more precisely, a portable and standalone (without my portability shim) version of the previous script, using only shell job control and a FIFO.
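For the curious, here is a minimal sketch of the xargs idea. It is not the actual pararun, just a hypothetical minirun function that hands each argument to sh -c as "$1" and stops launching jobs after the first failure. It assumes an xargs with -0 and -P (GNU findutils and the BSDs have both) and relies on the documented GNU xargs behaviour of stopping when a command exits with status 255. Progress reporting is left out.

```shell
# minirun: hypothetical, minimal pararun-like wrapper (not the real script).
# Usage: minirun [-j N] 'sh command using "$1"' arg...
# The body is a subshell, so its variables don't leak into the caller.
minirun() (
    njobs=4
    if [ "$1" = -j ]; then njobs=$2; shift 2; fi
    cmd=$1; shift
    [ $# -eq 0 ] && exit 0
    # One NUL-terminated argument per job; each job sees it as "$1".
    # The command string travels as $0 of the inner sh and is eval'd in a
    # subshell, so a plain 'exit' inside it cannot bypass the check.
    # Any failure is mapped to status 255, which tells xargs to stop
    # launching new jobs (GNU xargs then exits with status 124).
    printf '%s\0' "$@" |
        xargs -0 -n 1 -P "$njobs" sh -c '( eval "$0" ) || exit 255' "$cmd"
)
```

Passing the command as $0 of the inner shell keeps the user's string intact: nothing is spliced into the sh -c script, so there is no extra quoting layer on top of the one you already write.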
The resulting syntax is in my opinion much clearer, since it's just an explicit sh command. These days, I much prefer the now familiar "quoting hell" to "automagic, DSL and many-ways-to-do-stuff hell", especially when I have to debug it. Worse is truly better, in some cases.
$ parallel magick {} {.}.png ::: *.bmp
$ pararun 'magick "$1" "${1%.*}".png' *.bmp
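The "${1%.*}" in the pararun version is plain POSIX parameter expansion: % removes the shortest suffix matching the pattern, so only the last extension goes, while %% removes the longest. A quick check in any POSIX shell (the file name is just an example):

```shell
# Pretend the job received this file name as its first argument.
set -- holiday.2024.bmp
printf '%s\n' "${1%.*}.png"    # shortest suffix removed: holiday.2024.png
printf '%s\n' "${1%%.*}.png"   # longest suffix removed:  holiday.png
```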
Parallel's --keep-order flag is easily emulated with the JOBNUM variable (equivalent to parallel's {#}), which is made available to the command:
$ parallel -j4 -k sleep {}\; echo {} ::: 2 1 4 3
$ pararun -j4 'sleep $1; echo $JOBNUM $1' 2 1 4 3 | sort -k1n | cut -d' ' -f2-
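One possible way for an xargs-based wrapper to provide JOBNUM (a guess, not necessarily how pararun does it) is to feed xargs number/argument pairs and export the number before running the command. The sketch below is a hypothetical minirun_n under the same assumptions as before: an xargs with -0 and -P, and GNU xargs stopping on exit status 255.

```shell
# minirun_n: hypothetical variant that numbers jobs from 1 in argument order.
# Usage: minirun_n [-j N] 'sh command using "$1" and $JOBNUM' arg...
minirun_n() (
    njobs=4
    if [ "$1" = -j ]; then njobs=$2; shift 2; fi
    cmd=$1; shift
    [ $# -eq 0 ] && exit 0
    # Emit "number, argument" pairs, NUL-separated; -n 2 gives one pair per job.
    n=0
    for arg do
        n=$((n + 1))
        printf '%d\0%s\0' "$n" "$arg"
    done |
        # Inside each job: export the number, drop it, then run the command
        # with the original argument back in "$1".
        xargs -0 -n 2 -P "$njobs" sh -c \
            'JOBNUM=$1; export JOBNUM; shift; ( eval "$0" ) || exit 255' "$cmd"
)
```

Usage mirrors the keep-order trick above: `minirun_n -j4 'sleep $1; echo $JOBNUM $1' 2 1 4 3 | sort -k1n | cut -d' ' -f2-`.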
In fact, here's the final wrapper: paramap
The comparison stops here because, yes, parallel has many more features, especially its remote job distribution via ssh, which would be a lot more work to duplicate correctly.
Or does it? Using the make backend with GNU make (and bmake, if I understand their job token pool thing correctly) brings something I have always wanted: jobserver orchestration for nested instances, so that no CPU core is ever left unused.
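To make the make backend more concrete, here is a hypothetical minimake sketch, assuming GNU make. It is not the real pararun -m: it writes one phony target per argument into a makefile, pipes it to make -j -f -, and passes the command and arguments through the environment so that neither make nor the shell has to re-parse them. Since make reads the whole makefile from stdin before building anything, no job starts until the generator is done. It does not attempt the jobserver sharing between nested instances, nor progress reporting.

```shell
# minimake: hypothetical sketch of a make-based runner (assumes GNU make,
# which hands environment variables to recipes unchanged).
# Usage: minimake [-j N] 'sh command using "$1" and $JOBNUM' arg...
minimake() (
    njobs=4
    if [ "$1" = -j ]; then njobs=$2; shift 2; fi
    MINI_CMD=$1; export MINI_CMD; shift
    [ $# -eq 0 ] && exit 0

    # Export each argument as MINI_ARG_<n> and collect the target names.
    # Fine for a sketch; huge argument lists would bloat every job's environment.
    n=0 targets=
    for arg do
        n=$((n + 1))
        eval "MINI_ARG_$n=\$arg"
        export "MINI_ARG_$n"
        targets="$targets job$n"
    done

    # Generate the makefile and feed it to make on stdin (-f -).
    # '$$' is make's escape for a literal '$' handed to the recipe shell.
    # Without -k, make stops starting new jobs after the first failure.
    {
        printf 'all:%s\n.PHONY: all%s\n' "$targets" "$targets"
        i=1
        while [ "$i" -le "$n" ]; do
            printf 'job%d:\n\t@JOBNUM=%d sh -c "$$MINI_CMD" sh "$$MINI_ARG_%d"\n' \
                "$i" "$i" "$i"
            i=$((i + 1))
        done
    } | make -j "$njobs" -f -
)
```

Going through the environment avoids a second quoting layer: an argument like `c$d` reaches the job verbatim, where splicing it into the makefile text would require escaping both `$` for make and quotes for the shell.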
Worth mentioning: this backend has a drawback. make only starts working after stdin is closed, and this "work" includes parsing and dependency resolution, which means a small startup overhead depending on the number of jobs. Said overhead isn't massive, but it's there:
$ seq 1000 | time pararun ''
pararun '' 0.21s user 0.12s system 177% cpu 0.186 total
$ seq 1000 | time pararun -m ''
pararun -m '' 1.49s user 0.77s system 186% cpu 1.211 total