Fast SEQUENCE iteration in Common Lisp
If you don't know what sequences are in CL, here's the gist of it: either a
linked list or a vector. The main idea being that some operations like element search, comparison or deletion should
have the same interface for both those types.
Basically, sequences are a band-aid over the lack of real iterator protocol (yes, even
considering the mildy supported extensible sequences thing). But well, that's what we have to work with and it's not so bad really; we could be in
barren C-ountry after all.
One of the problems of sequences is that if you want to iterate over these while supporting the
conventional keywords (:start, :end, :key) without writing
entirely separate (and gnarly) versions, there are only two unified interfaces provided by ANSI:
elt+length+loop(ordo): naïve and simple, what's actually used under the hood byiterate'sin-sequencedriver. Also quadratic (thus extremely slow) on lists.reduce: even simpler as it handles all the keywords for you (even:from-end) and much faster since any non-toy implementation internally dispatches over the actual sequence type. Still not ideal as it has the overhead of repeatfuncallof your closure and you don't have access to the often needed internal raw element (without:keyapplied) or iteration index.
So when I made a naïve max-elt (for this years' Advent of Code) then was forced to rewrite it to use reduce, I said to myself I must build that Lovecraftian macro
to handle the required nested monomorphization
… so here's the beast:
;; Support :FROM-END ? Would add yet another level of monomorphization though... (defmacro do-sequence ((var seq &key (start 0) end key with-index with-raw-elt) &body body) "Iterate on a sequence with the usual keywords available. :WITH-INDEX (resp. :WITH-RAW-ELT) takes an unevaluated symbol to use as index (resp. element without :KEY applied). NB: since iteration is done via `(LOOP ... :DO (LOCALLY ,@BODY)), said body can use RETURN and contain declarations." (once-only (seq start end key) (macrolet ((list-impl (has-key-p has-end-p) `(let ((ivar (or with-index (gensym "IDX"))) (rvar (or with-raw-elt (gensym "RAW")))) (declare (ignorable ivar rvar)) `(loop ,@(when (not ,has-key-p) `(:with ,rvar)) ;; Silence SBCL/CCL warnings ,@(when (or ,has-end-p with-index) `(:for ,ivar :of-type (integer 0 #.array-dimension-limit) :from ,start ,@(when ,has-end-p `(:below ,end)))) ,@(if ,has-key-p `(:for ,rvar :in ,seq :for ,var := (funcall ,key ,rvar)) `(:for ,var :in ,seq)) :do (locally ,@body)))) (vec-impl (has-key-p) `(let ((ivar (or with-index (gensym "IDX"))) (rvar (or with-raw-elt (gensym "RAW")))) (declare (ignorable ivar rvar)) `(loop ,@(when (not ,has-key-p) `(:with ,rvar)) ;; Silence SBCL/CCL warnings :for ,ivar :of-type (integer 0 #.array-dimension-limit) :from ,start :below (or ,end (length ,seq)) ,@(if ,has-key-p `(:for ,rvar := (aref ,seq ,ivar) :for ,var := (funcall ,key ,rvar)) `(:for ,var := (aref ,seq ,ivar))) :do (locally ,@body))))) `(etypecase ,seq (list (cond ((and ,key ,end) ,(list-impl t t)) (,key ,(list-impl t nil)) (,end ,(list-impl nil t)) (t ,(list-impl nil nil)))) ((simple-array * (*)) ;; Same impl. as VECTOR, monomorphize for performance (if ,key ,(vec-impl t) ,(vec-impl nil))) (vector (if ,key ,(vec-impl t) ,(vec-impl nil)))))))
Except for the de facto standard once-only macro, it is ANSI CL
compliant. You can copy it under the Zlib license, if you wish so.
Now, for some benchmarks! Here I compare the aforementioned max-elt core loop
written with reduce then with do-sequence against large sequences of
different types; needing both the index and raw element somewhat complicates the result and
illustrate why I included :with-index and :with-raw-elt.
(defun max-elt-reduce (seq pred &key (start 0) end key) (let* ((i start) (maxi start) (max (elt seq 0))) (declare (type (integer 0 #.array-dimension-limit) i maxi)) (reduce (if key ;; Hoist NIL key checking (lambda (kmax el) (incf i) (let ((kel (funcall key el))) (if (funcall pred kel kmax) (progn (setf maxi i max el) kel) kmax))) (lambda (kmax el) (incf i) (if (funcall pred el kmax) (progn (setf maxi i max el) el) kmax))) seq :start (+ start 1) :end end :initial-value (if key (funcall key max) max)) (values max maxi))) (defun max-elt-do-seq (seq pred &key (start 0) end key) (let* ((maxi start) (rmax (elt seq 0)) (max (if key (funcall key rmax) rmax))) (do-sequence (el seq :start start :end end :key key :with-index i :with-raw-elt r) (when (funcall pred el max) (setf maxi i rmax r max el))) (values rmax maxi))) (defconstant +len+ 5000000) (let ((l (make-sequence 'list +len+ :initial-element 0)) (sv (make-sequence 'vector +len+ :initial-element 0)) (fsv (make-array +len+ :element-type 'fixnum :initial-element 0)) (fv (make-array +len+ :element-type 'fixnum :initial-element 0 :adjustable t))) (format t "Test,~A~%" (lisp-implementation-type)) (loop :for (args test) :in `(((,l ,#'>) "LIST") ((,l ,#'> :start 100 :end ,(- +len+ 100) :key ,#'1+) "LIST (fiddly)") ((,sv ,#'>) "SIMPLE-VECTOR") ((,fsv ,#'>) "(SIMPLE-ARRAY FIXNUM)") ((,fv ,#'>) "(VECTOR FIXNUM)")) :do (let ((tref (nth-value 1 (measure (apply #'max-elt-reduce args)))) (tnew (nth-value 1 (measure (apply #'max-elt-do-seq args))))) (format t "~A,~,0F → ~,0F (~,0@F%)~%" test (* tref 1000) (* tnew 1000) (* 100 (- (/ tref tnew) 1))))))
And here are the results on some popular implementations (durations in ms):
| Test | SBCL | CCL | ECL | CLISP |
|---|---|---|---|---|
| LIST | 35 → 25 (+40%) | 119 → 63 (+89%) | 886 → 382 (+132%) | 314 → 172 (+83%) |
| LIST (fiddly) | 49 → 35 (+40%) | 148 → 74 (+101%) | 1113 → 588 (+89%) | 383 → 325 (+18%) |
| SIMPLE-VECTOR | 42 → 33 (+27%) | 143 → 84 (+69%) | 858 → 510 (+68%) | 338 → 238 (+42%) |
| (SIMPLE-ARRAY FIXNUM) | 41 → 34 (+21%) | 144 → 86 (+67%) | 897 → 484 (+85%) | 337 → 237 (+42%) |
| (VECTOR FIXNUM) | 52 → 35 (+49%) | 180 → 123 (+47%) | 866 → 519 (+67%) | 339 → 249 (+36%) |
Benchmarking details
- Versions used: SBCL 2.5.11, CCL 1.13, ECL 24.5.10, CLISP 2.49.92 (bytecode compiler used)
- Hardware: AMD 5900X, 64 GB DDR4
- Software: Gentoo Linux, linux 6.12, gcc 15.2, glibc 2.41
optimize:(speed 3) (safety 0) (debug 0), speedups are comparable without, but used for the sake of benchmarking.
Well, isn't that welcome? Both the performance and readability gains were worth the effort, in
my opinion; especially outside of SBCL which seems to have a particularly well optimized reduce.
Beware code bloat and compilation slowdown though, such aggressive monomorphization of
loop macros isn't free; so don't inline unless you intend on exploiting your
compiler's DCE pass.