Speeding up cljqalbum
Part 2 of my cljq adventures, this time focusing on one of my end-user applications: the music (album) query tools I use with rymscrap.
To make a long story short, everything went fine: I was able to carry out the migration simply by swapping the command called via find -exec {} \; and got this:
```
$ jqalbum 'has_genre("Sludge Metal") and year < 2000'
/home/user/Music/Acid Bath/(1994) When the Kite String Pops/album.json
/home/user/Music/Acid Bath/(1996) Paegan Terrorism Tactics/album.json
…
$ cljqalbum '(and (has-genre "Sludge Metal") (< year 2000))'
/home/user/Music/Acid Bath/(1994) When the Kite String Pops/album.json
/home/user/Music/Acid Bath/(1996) Paegan Terrorism Tactics/album.json
…
```
But as for the performance…
```
$ jq --version; sbcl --version
jq-1.7.1
SBCL 2.5.2
$ hyperfine 'jqalbum '\''has_genre("Sludge Metal") and year < 2000'\''' \
    'cljqalbum '\''(and (has-genre "Sludge Metal") (< year 2000))'\'''
Benchmark 1: jqalbum 'has_genre("Sludge Metal") and year < 2000'
  Time (mean ± σ):      3.531 s ±  0.028 s    [User: 1.949 s, System: 1.573 s]
  Range (min … max):    3.490 s …  3.570 s    10 runs

Benchmark 2: cljqalbum '(and (has-genre "Sludge Metal") (< year 2000))'
  Time (mean ± σ):      8.373 s ±  0.055 s    [User: 3.444 s, System: 4.973 s]
  Range (min … max):    8.302 s …  8.456 s    10 runs

Summary
  jqalbum 'has_genre("Sludge Metal") and year < 2000' ran
    2.37 ± 0.02 times faster than cljqalbum '(and (has-genre "Sludge Metal") (< year 2000))'
```
Unacceptable. But I don't cry uncle that easily, so I started investigating the two suspects that immediately came to mind:
- Simply the repeated cost of launching SBCL, reloading the bundled image, and compiling the album query file and the query itself for each small JSON file.
- Maybe the implementation of those short query functions (album.jq vs. album.lisp), as I know cl-ppcre (the de facto standard CL regexp library) can be a tad slow. There's nothing I can do about that without ditching regexps, which would be both painful and unfair in a benchmarking context.
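To put the first suspect into rough terms, here is a back-of-the-envelope cost model. It is a sketch under simplifying assumptions of my own: N album.json files, a fixed startup cost t_s per launch (starting SBCL, loading the image, compiling album.lisp and the query), a per-file query cost t_q, and find's own cost ignored. None of these symbols are measured values.

```latex
\begin{aligned}
T_{\text{per-file}} &= N\,(t_s + t_q) \\
T_{\text{single}}   &= t_s + N\,t_q \\
\frac{T_{\text{per-file}}}{T_{\text{single}}}
  &= \frac{N\,(t_s + t_q)}{t_s + N\,t_q}
  \;\xrightarrow[N \to \infty]{}\; 1 + \frac{t_s}{t_q}
\end{aligned}
```

So if startup dominates the per-file work (the hypothesis here), moving to a single process should win by roughly t_s/t_q, and that gain grows with the size of the collection.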
So I decided to at least solve the first one by wrapping everything inside a single Lisp process. I didn't even have to reimplement find, as uiop:launch-program gives me popen-like process spawning.
```lisp
(declaim (optimize (speed 3) (debug 0) (safety 0)))

(require "asdf")
(asdf:load-system "q3cpma-json-utils")
(ql:quickload '("com.inuoe.jzon" "cl-ppcre" "iterate") :silent t)

(defpackage #:cljqalbum
  (:use #:cl #:iterate)
  (:export #:toplevel))
(in-package #:cljqalbum)

;; Query shorthands: ? returns every match of PATH in JSON, ?1 only the first.
(defmacro ? (json &rest path)
  `(q3cpma-json:query ,json ',path))
(defmacro ?1 (json &rest path)
  `(car (q3cpma-json:query ,json ',path)))

;; The current album's parsed JSON, dynamically rebound for each file.
(defparameter $ nil)

;; Album query helpers (album.lisp), loaded from this file's directory.
(load (merge-pathnames "album" *load-truename*))

(defun toplevel ()
  (destructuring-bind (form-str &optional (dir (uiop:native-namestring
                                                (merge-pathnames "Music/" (user-homedir-pathname)))))
      (uiop:command-line-arguments)
    ;; A single find process whose output is read as a stream.
    (let ((find-process (uiop:launch-program
                         `("find" "-L" ,dir "-type" "f" "-name" "album.json")
                         :output :stream))
          ;; Read and compile the query once, in our package, not once per file.
          (form-fun (let ((*package* (find-package :cljqalbum)))
                      (compile nil `(lambda () ,(read-from-string form-str))))))
      (iter (for line = (read-line (uiop:process-info-output find-process) nil nil))
            (while line)
            (let (($ (com.inuoe.jzon:parse (uiop:parse-native-namestring line))))
              (when (funcall form-fun)
                (write-line line)))))))
```
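For readers less at home in Lisp, here is a rough Python analog of the same structure. It is a sketch, not a port. One find process is read as a stream (subprocess.Popen standing in for uiop:launch-program), the query is compiled once, and the compiled query is then evaluated against each parsed album.json. The album schema (genres and year keys), the has_genre helper, and the use of a Python expression as the query language are all hypothetical; the real helpers live in album.lisp.

```python
import json
import subprocess
from typing import Any, Callable, Iterator

Album = dict[str, Any]


def make_query(expr: str) -> Callable[[Album], bool]:
    """Compile EXPR once and return a predicate over one parsed album.

    Hypothetical album schema: {"genres": [...], "year": 1994}.
    """
    code = compile(expr, "<query>", "eval")  # parsed and compiled a single time

    def run(album: Album) -> bool:
        # Names the query may use, bound to the current album (the role $ plays in toplevel).
        env = {
            "year": album.get("year"),
            "has_genre": lambda genre: genre in album.get("genres", []),
        }
        # Empty builtins: the query only sees the names in env.
        return bool(eval(code, {"__builtins__": {}}, env))

    return run


def matching_albums(root: str, query: Callable[[Album], bool]) -> Iterator[str]:
    """Yield the path of every album.json under ROOT for which QUERY is true."""
    # One find process, its stdout consumed line by line as a stream (popen-style).
    proc = subprocess.Popen(
        ["find", "-L", root, "-type", "f", "-name", "album.json"],
        stdout=subprocess.PIPE,
        text=True,
    )
    assert proc.stdout is not None
    try:
        for line in proc.stdout:
            path = line.rstrip("\n")
            with open(path, encoding="utf-8") as f:
                album = json.load(f)  # parsed in-process: no per-file interpreter launch
            if query(album):
                yield path
    finally:
        proc.stdout.close()
        proc.wait()
```

With this in place, a command-line wrapper only needs to print every path from matching_albums(root, make_query(sys.argv[1])). The Lisp version binds $ dynamically, while this sketch passes the album explicitly. The eval call is only tolerable because the query comes from the user's own command line, which is also true of read-from-string plus compile; neither is a sandbox.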
Let's see the result with both SBCL and Clozure CL:
```
$ hyperfine 'jqalbum '\''has_genre("Sludge Metal") and year < 2000'\''' \
    'sbcl … '\''(and (has-genre "Sludge Metal") (< year 2000))'\''' \
    'ccl … '\''(and (has-genre "Sludge Metal") (< year 2000))'\'''
Benchmark 1: jqalbum 'has_genre("Sludge Metal") and year < 2000'
  Time (mean ± σ):      3.530 s ±  0.027 s    [User: 1.923 s, System: 1.597 s]
  Range (min … max):    3.474 s …  3.560 s    10 runs

Benchmark 2: sbcl … '(and (has-genre "Sludge Metal") (< year 2000))'
  Time (mean ± σ):     396.8 ms ±   2.9 ms    [User: 327.8 ms, System: 94.0 ms]
  Range (min … max):   393.4 ms … 402.4 ms    10 runs

Benchmark 3: ccl … '(and (has-genre "Sludge Metal") (< year 2000))'
  Time (mean ± σ):      1.298 s ±  0.006 s    [User: 1.243 s, System: 0.082 s]
  Range (min … max):    1.292 s …  1.308 s    10 runs

Summary
  sbcl … '(and (has-genre "Sludge Metal") (< year 2000))' ran
    3.27 ± 0.03 times faster than ccl … '(and (has-genre "Sludge Metal") (< year 2000))'
    8.90 ± 0.09 times faster than jqalbum 'has_genre("Sludge Metal") and year < 2000'
```
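As a sanity check on how to read these summaries: the speedup is the ratio of the two means, and the ± is consistent with first-order error propagation, where the two relative standard deviations add in quadrature. I'm assuming this is what hyperfine computes. The snippet only reproduces the summary lines from the means and σ reported above.

```python
import math


def relative_speed(slow_mean: float, slow_sd: float,
                   fast_mean: float, fast_sd: float) -> tuple[float, float]:
    """Return (ratio, ratio_sd): how many times faster the fast command ran.

    First-order error propagation for a quotient, assuming independent
    measurements: relative standard deviations add in quadrature.
    """
    ratio = slow_mean / fast_mean
    ratio_sd = ratio * math.hypot(slow_sd / slow_mean, fast_sd / fast_mean)
    return ratio, ratio_sd


# Means and σ in seconds, copied from the benchmark output above.
jq_ratio, jq_sd = relative_speed(3.530, 0.027, 0.3968, 0.0029)
ccl_ratio, ccl_sd = relative_speed(1.298, 0.006, 0.3968, 0.0029)
print(f"{jq_ratio:.2f} ± {jq_sd:.2f}")    # prints "8.90 ± 0.09"
print(f"{ccl_ratio:.2f} ± {ccl_sd:.2f}")  # prints "3.27 ± 0.03"
```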
Alright, already crushing jq by a factor of 9! Now I'm kind of obligated to go all in and try "executable images" (the runtime bundled with a dumped image into one fat binary of around 40 MB):
```
$ sbcl --no-userinit --load ~/.local/lib/quicklisp/setup.lisp --script cljq/make-cljqalbum.lisp cljqalbum.sbcl
$ ccl --no-init --load ~/.local/lib/quicklisp/setup.lisp --load cljq/make-cljqalbum.lisp --eval '(uiop:quit)' -- cljqalbum.ccl
$ hyperfine 'jqalbum '\''has_genre("Sludge Metal") and year < 2000'\''' \
    'cljqalbum.sbcl '\''(and (has-genre "Sludge Metal") (< year 2000))'\''' \
    'cljqalbum.ccl '\''(and (has-genre "Sludge Metal") (< year 2000))'\'''
Benchmark 1: jqalbum 'has_genre("Sludge Metal") and year < 2000'
  Time (mean ± σ):      3.541 s ±  0.041 s    [User: 1.957 s, System: 1.574 s]
  Range (min … max):    3.480 s …  3.618 s    10 runs

Benchmark 2: cljqalbum.sbcl '(and (has-genre "Sludge Metal") (< year 2000))'
  Time (mean ± σ):      57.9 ms ±   1.5 ms    [User: 41.5 ms, System: 41.5 ms]
  Range (min … max):    55.4 ms …  62.8 ms    50 runs

Benchmark 3: cljqalbum.ccl '(and (has-genre "Sludge Metal") (< year 2000))'
  Time (mean ± σ):     176.8 ms ±   2.7 ms    [User: 151.0 ms, System: 51.7 ms]
  Range (min … max):   174.2 ms … 183.3 ms    16 runs

Summary
  cljqalbum.sbcl '(and (has-genre "Sludge Metal") (< year 2000))' ran
    3.06 ± 0.09 times faster than cljqalbum.ccl '(and (has-genre "Sludge Metal") (< year 2000))'
   61.18 ± 1.71 times faster than jqalbum 'has_genre("Sludge Metal") and year < 2000'
```
The jq version is now thoroughly steamrolled, not merely crushed. Mission accomplished. As for the second suspect, I could try replacing cl-ppcre with the experimental one-more-re-nightmare, but I'm not very confident about the potential gains (and it doesn't have full POSIX ERE support yet).
This illustrates the main reason why you should build tools as libraries instead of as binaries with their own interpreter: you can easily turn the first into the second via a wrapper, but not the other way around. Well, you can, but then you pay the binary's startup overhead on every call (for jq, this seems fairly high, apparently because of the registration of its builtins) plus the cost of serializing and parsing data at the boundary.
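To make the argument concrete, here is a minimal Python sketch of the library-first layout, with hypothetical names and album schema throughout. The query logic is a set of plain functions any program can import and call in-process. The command-line tool is only a thin wrapper that handles argv and files. Going the other way, a program would have to spawn the CLI and parse its text output for every call, which is exactly the overhead described above.

```python
import argparse
import json
from typing import Any, Iterable, Iterator, Optional

Album = dict[str, Any]


# --- Library layer: importable, reusable in a long-running process ---

def has_genre(album: Album, genre: str) -> bool:
    """True if the (hypothetical) "genres" list of ALBUM contains GENRE."""
    return genre in album.get("genres", [])


def filter_albums(albums: Iterable[tuple[str, Album]],
                  genre: str, before: int) -> Iterator[str]:
    """Yield the path of each (path, album) pair with GENRE and year < BEFORE.

    Albums without a "year" key never match.
    """
    for path, album in albums:
        if has_genre(album, genre) and album.get("year", before) < before:
            yield path


# --- Binary layer: a thin CLI wrapper around the library ---

def main(argv: Optional[list[str]] = None) -> int:
    parser = argparse.ArgumentParser(description="Filter album.json files (sketch).")
    parser.add_argument("genre")
    parser.add_argument("before", type=int)
    parser.add_argument("paths", nargs="*")
    args = parser.parse_args(argv)

    def load(paths: list[str]) -> Iterator[tuple[str, Album]]:
        # Parse each file lazily so only one album is held in memory at a time.
        for path in paths:
            with open(path, encoding="utf-8") as f:
                yield path, json.load(f)

    for path in filter_albums(load(args.paths), args.genre, args.before):
        print(path)
    return 0
```

main is a dozen lines anyone could rewrite. filter_albums, on the other hand, can be called from inside another process without any startup or serialization cost, which is what loading album.lisp into a single Lisp process achieves above.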
A small 2025-03-29 edit to compare the original jq with two popular drop-in replacements: jaq and gojq.
```
$ hyperfine 'JQ=jq jqalbum '\''has_genre("Sludge Metal") and year < 2000'\''' \
    'JQ=gojq jqalbum '\''has_genre("Sludge Metal") and year < 2000'\''' \
    'JQ=jaq jqalbum '\''has_genre("Sludge Metal") and year < 2000'\'''
Benchmark 1: JQ=jq jqalbum 'has_genre("Sludge Metal") and year < 2000'
  Time (mean ± σ):      3.389 s ±  0.018 s    [User: 1.888 s, System: 1.496 s]
  Range (min … max):    3.362 s …  3.424 s    10 runs

Benchmark 2: JQ=gojq jqalbum 'has_genre("Sludge Metal") and year < 2000'
  Time (mean ± σ):      3.407 s ±  0.018 s    [User: 0.889 s, System: 2.475 s]
  Range (min … max):    3.377 s …  3.435 s    10 runs

Benchmark 3: JQ=jaq jqalbum 'has_genre("Sludge Metal") and year < 2000'
  Time (mean ± σ):      2.512 s ±  0.014 s    [User: 0.945 s, System: 1.566 s]
  Range (min … max):    2.489 s …  2.532 s    10 runs

Summary
  JQ=jaq jqalbum 'has_genre("Sludge Metal") and year < 2000' ran
    1.35 ± 0.01 times faster than JQ=jq jqalbum 'has_genre("Sludge Metal") and year < 2000'
    1.36 ± 0.01 times faster than JQ=gojq jqalbum 'has_genre("Sludge Metal") and year < 2000'
```