World Playground Deceit.net

Speeding up cljqalbum


Part 2 of my cljq adventures, this time around one of my end-user applications: the music (album) query tools I use with rymscrap.

To make it short, everything went fine: I was able to migrate simply by swapping the command called via find -exec {} ; and got this:

$ jqalbum   'has_genre("Sludge Metal") and year < 2000'
/home/user/Music/Acid Bath/(1994) When the Kite String Pops/album.json
/home/user/Music/Acid Bath/(1996) Paegan Terrorism Tactics/album.json
…
$ cljqalbum '(and (has-genre "Sludge Metal") (< year 2000))'
/home/user/Music/Acid Bath/(1994) When the Kite String Pops/album.json
/home/user/Music/Acid Bath/(1996) Paegan Terrorism Tactics/album.json
…
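The find -exec pattern mentioned above can be sketched roughly as follows. This is only an illustration, not the actual jqalbum or cljqalbum script: the query_albums helper is hypothetical, and grep -q stands in for the real jq/cljq filter so the sketch runs without either tool installed.

```shell
# Hypothetical helper, not the actual jqalbum: run a predicate command once per
# album.json and print the paths for which it exits with status 0.
query_albums() {
    dir=$1
    shift
    # "$@" is the predicate command; find substitutes each file path for {}.
    find -L "$dir" -type f -name album.json -exec "$@" {} \; -print
}

# Demo on a throwaway tree, with grep -q standing in for the real filter.
demo=$(mktemp -d)
mkdir -p "$demo/a" "$demo/b"
printf '{"genres": ["Sludge Metal"]}\n' > "$demo/a/album.json"
printf '{"genres": ["Ambient"]}\n' > "$demo/b/album.json"
matches=$(query_albums "$demo" grep -q 'Sludge Metal')
printf '%s\n' "$matches"
rm -rf "$demo"
```

With this shape, swapping the query tool only means changing the command passed after the directory, which is presumably why the migration was painless.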

But as for the performance…

$ jq --version; sbcl --version
jq-1.7.1
SBCL 2.5.2
$ hyperfine 'jqalbum '\''has_genre("Sludge Metal") and year < 2000'\''' \
            'cljqalbum '\''(and (has-genre "Sludge Metal") (< year 2000))'\'''
Benchmark 1: jqalbum 'has_genre("Sludge Metal") and year < 2000'
  Time (mean ± σ):      3.531 s ±  0.028 s    [User: 1.949 s, System: 1.573 s]
  Range (min … max):    3.490 s …  3.570 s    10 runs

Benchmark 2: cljqalbum '(and (has-genre "Sludge Metal") (< year 2000))'
  Time (mean ± σ):      8.373 s ±  0.055 s    [User: 3.444 s, System: 4.973 s]
  Range (min … max):    8.302 s …  8.456 s    10 runs

Summary
  jqalbum 'has_genre("Sludge Metal") and year < 2000' ran
    2.37 ± 0.02 times faster than cljqalbum '(and (has-genre "Sludge Metal") (< year 2000))'

Unacceptable. But I don't cry uncle this easily, so I started investigating the two suspects that immediately came to mind:

  1. Simply the repeated cost of launching SBCL, reloading the bundled image, compiling the album query file and the query itself for each small JSON.
  2. Maybe the implementation of those short query functions (album.jq vs album.lisp), since I know that cl-ppcre, the de facto standard CL regexp library, can be a tad slow. Nothing I can do about it without ditching regexps, and that'd be both painful and unfair in the context of benchmarking.
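To make the first suspect concrete, here is a minimal sketch that counts how many child processes find starts for the same set of files. It uses a throwaway directory and sh -c 'echo launched' as a stand-in for the real query command (both are assumptions made for the demo); the "+" form is only an analogue of "one process handles every file", not what jqalbum or cljqalbum actually do.

```shell
# Throwaway tree with five album.json files (the contents do not matter here).
tree=$(mktemp -d)
for i in 1 2 3 4 5; do
    mkdir "$tree/$i"
    echo '{}' > "$tree/$i/album.json"
done

# "\;" starts one child process per file; each child prints one line.
per_file=$(find "$tree" -name album.json -exec sh -c 'echo launched' sh {} \; | wc -l | tr -d ' ')
# "+" passes all the paths (as many as fit) to a single child process.
batched=$(find "$tree" -name album.json -exec sh -c 'echo launched' sh {} + | wc -l | tr -d ' ')

echo "per-file launches: $per_file, batched launches: $batched"
rm -rf "$tree"
```

On this tiny tree the first form launches five children and the second only one; with a full music library and a runtime that has to load an image and compile a query at every start, that difference is what the first suspect is about.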

So I decided to at least solve the first one and wrap everything inside a single Lisp process. Didn't even have to re-implement find, as uiop:launch-program gives me popen-like process spawning.

(declaim (optimize (speed 3) (debug 0) (safety 0)))

(require "asdf")
(asdf:load-system "q3cpma-json-utils")
(ql:quickload '("com.inuoe.jzon" "cl-ppcre" "iterate") :silent t)

(defpackage #:cljqalbum (:use #:cl #:iterate) (:export #:toplevel))
(in-package #:cljqalbum)

(defmacro ?  (json &rest path) `(q3cpma-json:query ,json ',path))
(defmacro ?1 (json &rest path) `(car (q3cpma-json:query ,json ',path)))
(defparameter $ nil)

(load (merge-pathnames "album" *load-truename*))

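(defun toplevel ()
  ;; Arguments: the query form as a string, then an optional root directory
  ;; (defaults to ~/Music/).
  (destructuring-bind (form-str &optional (dir (uiop:native-namestring
                                                (merge-pathnames "Music/" (user-homedir-pathname)))))
      (uiop:command-line-arguments)
    ;; Spawn find once and stream its output instead of re-implementing it.
    (let ((find-process (uiop:launch-program `("find" "-L" ,dir "-type" "f" "-name" "album.json")
                                             :output :stream))
          ;; Read the query in this package and compile it once for all files.
          (form-fun (let ((*package* (find-package :cljqalbum)))
                      (compile nil `(lambda () ,(read-from-string form-str))))))
      (iter (for line = (read-line (uiop:process-info-output find-process) nil nil))
        (while line)
        ;; Bind the special variable $ to the parsed JSON of the current file.
        (let (($ (com.inuoe.jzon:parse (uiop:parse-native-namestring line))))
          (when (funcall form-fun)
            (write-line line)))))))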

Let's see the result with both SBCL and Clozure CL:

$ hyperfine 'jqalbum '\''has_genre("Sludge Metal") and year < 2000'\''' \
            'sbcl … '\''(and (has-genre "Sludge Metal") (< year 2000))'\''' \
            'ccl  … '\''(and (has-genre "Sludge Metal") (< year 2000))'\'''
Benchmark 1: jqalbum 'has_genre("Sludge Metal") and year < 2000'
  Time (mean ± σ):      3.530 s ±  0.027 s    [User: 1.923 s, System: 1.597 s]
  Range (min … max):    3.474 s …  3.560 s    10 runs

Benchmark 2: sbcl … '(and (has-genre "Sludge Metal") (< year 2000))'
  Time (mean ± σ):     396.8 ms ±   2.9 ms    [User: 327.8 ms, System: 94.0 ms]
  Range (min … max):   393.4 ms … 402.4 ms    10 runs

Benchmark 3: ccl  … '(and (has-genre "Sludge Metal") (< year 2000))'
  Time (mean ± σ):      1.298 s ±  0.006 s    [User: 1.243 s, System: 0.082 s]
  Range (min … max):    1.292 s …  1.308 s    10 runs

Summary
  sbcl … '(and (has-genre "Sludge Metal") (< year 2000))' ran
    3.27 ± 0.03 times faster than ccl … '(and (has-genre "Sludge Metal") (< year 2000))'
    8.90 ± 0.09 times faster than jqalbum 'has_genre("Sludge Metal") and year < 2000'

Alright, already crushing jq by a factor of 9! Now I'm kind of obligated to go all in and try with "executable images" (fat bundling of the runtime with a dumped image; around 40 MB):

$ sbcl --no-userinit --load ~/.local/lib/quicklisp/setup.lisp --script cljq/make-cljqalbum.lisp cljqalbum.sbcl
$ ccl --no-init --load ~/.local/lib/quicklisp/setup.lisp --load cljq/make-cljqalbum.lisp --eval '(uiop:quit)' -- cljqalbum.ccl
$ hyperfine 'jqalbum '\''has_genre("Sludge Metal") and year < 2000'\''' \
            'cljqalbum.sbcl '\''(and (has-genre "Sludge Metal") (< year 2000))'\''' \
            'cljqalbum.ccl  '\''(and (has-genre "Sludge Metal") (< year 2000))'\'''
Benchmark 1: jqalbum 'has_genre("Sludge Metal") and year < 2000'
  Time (mean ± σ):      3.541 s ±  0.041 s    [User: 1.957 s, System: 1.574 s]
  Range (min … max):    3.480 s …  3.618 s    10 runs

Benchmark 2: cljqalbum.sbcl '(and (has-genre "Sludge Metal") (< year 2000))'
  Time (mean ± σ):      57.9 ms ±   1.5 ms    [User: 41.5 ms, System: 41.5 ms]
  Range (min … max):    55.4 ms …  62.8 ms    50 runs

Benchmark 3: cljqalbum.ccl  '(and (has-genre "Sludge Metal") (< year 2000))'
  Time (mean ± σ):     176.8 ms ±   2.7 ms    [User: 151.0 ms, System: 51.7 ms]
  Range (min … max):   174.2 ms … 183.3 ms    16 runs

Summary
  cljqalbum.sbcl '(and (has-genre "Sludge Metal") (< year 2000))' ran
    3.06 ± 0.09 times faster than cljqalbum.ccl  '(and (has-genre "Sludge Metal") (< year 2000))'
   61.18 ± 1.71 times faster than jqalbum 'has_genre("Sludge Metal") and year < 2000'

jq is now thoroughly steamrolled, not merely crushed. Mission accomplished. As for the second suspect (the regexp library), I could try to replace cl-ppcre with the experimental one-more-re-nightmare, but I'm not super confident about the potential wins (and it doesn't have full POSIX ERE support yet).


Edit (2025-03-29): a quick comparison of the original jq with two popular drop-in replacements, jaq and gojq.

$ hyperfine 'JQ=jq   jqalbum '\''has_genre("Sludge Metal") and year < 2000'\''' \
            'JQ=gojq jqalbum '\''has_genre("Sludge Metal") and year < 2000'\''' \
            'JQ=jaq  jqalbum '\''has_genre("Sludge Metal") and year < 2000'\'''
Benchmark 1: JQ=jq   jqalbum 'has_genre("Sludge Metal") and year < 2000'
  Time (mean ± σ):      3.389 s ±  0.018 s    [User: 1.888 s, System: 1.496 s]
  Range (min … max):    3.362 s …  3.424 s    10 runs

Benchmark 2: JQ=gojq jqalbum 'has_genre("Sludge Metal") and year < 2000'
  Time (mean ± σ):      3.407 s ±  0.018 s    [User: 0.889 s, System: 2.475 s]
  Range (min … max):    3.377 s …  3.435 s    10 runs

Benchmark 3: JQ=jaq  jqalbum 'has_genre("Sludge Metal") and year < 2000'
  Time (mean ± σ):      2.512 s ±  0.014 s    [User: 0.945 s, System: 1.566 s]
  Range (min … max):    2.489 s …  2.532 s    10 runs

Summary
  JQ=jaq  jqalbum 'has_genre("Sludge Metal") and year < 2000' ran
    1.35 ± 0.01 times faster than JQ=jq   jqalbum 'has_genre("Sludge Metal") and year < 2000'
    1.36 ± 0.01 times faster than JQ=gojq jqalbum 'has_genre("Sludge Metal") and year < 2000'
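The JQ=… prefix in the commands above suggests that jqalbum picks its jq implementation from an environment variable. The real script is not shown, so the following is only a guess at that convention; the jq_impl helper is hypothetical.

```shell
# Assumed convention, not taken from the real jqalbum script: use $JQ if it is
# set and non-empty, otherwise fall back to plain jq.
jq_impl() {
    printf '%s\n' "${JQ:-jq}"
}

# Each call runs in a subshell so the demo does not touch the caller's $JQ.
default_impl=$(unset JQ; jq_impl)
override_impl=$(JQ=gojq; jq_impl)
echo "default: $default_impl, override: $override_impl"
```

A wrapper written this way would then run the selected implementation wherever it would otherwise call jq directly, which keeps the three benchmark commands identical apart from the variable.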