CL iterate's COLLECT performance notice
I just cleared a "funny" misunderstanding I had about iterate so I thought I'd share it in case it also helps the imaginary person reading this blog, using CL and being similarly misguided.
First of all, iterate is probably the most important of my external systems ("package", for
non-Lispers), it is the only package (again, "namespace") I unconditionally use
simply because it replaces loop
that looks like it was added as-is to ANSI CL only to calm the weird tantrum of an Algol
fanatic who got lost and somehow ended up on the X3J13 committee. And for all the useful additions
too, but mainly because it actually looks like Lisp, yeah.
With that said, you'll probably understand how annoyed one would be my in shoes.
The misunderstanding was: in the documentation for the collect
clause, it is said that collecting items in reverse order is likely to be faster in most Common Lisp implementations
.
And somehow, what my brain started running with is that the result list was probably being
walked during each collect
, as if it expanded into something like (rplacd (last result) (list i))
. Which means that any performance-sensitive loop
that would normally be written
(iter (for i below 10) (collect i))
had to be transformed into
(nreverse (iter (for i below 10) (collect i at start)))
...Ugly, right? So during this week-end, I decided to simply see for myself:
CL-USER> (macroexpand-1 '(iter (for i below 10) (collect i)))
Okay, macroexpand
ed loop code is usually the kind of Lisp only a mother could
love, but even then it's pretty clear from the aptly named gensym
s that iterate is
smart enough to maintain a last cons pointer and keep append an O(1) operation.
Let me tell you, this was big weight off my shoulders! As a bonus, let us put the original claim to the test and compare performance of the two methods:
CL-USER> (prog1 (sb-ext:gc :full t) (time (iter (for i below 100000) (collect i)))) Evaluation took: 0.000 seconds of real time 0.000749 seconds of total run time (0.000633 user, 0.000116 system) 100.00% CPU 2,535,277 processor cycles 1,572,096 bytes consed CL-USER> (prog1 (sb-ext:gc :full t) (time (nreverse (iter (for i below 100000) (collect i at start))))) Evaluation took: 0.001 seconds of real time 0.000856 seconds of total run time (0.000000 user, 0.000856 system) 100.00% CPU 3,007,730 processor cycles 1,572,096 bytes consed
Huh! As expected, maintaining that end pointer isn't as expensive as that final nreverse
; at least here, on SBCL 2.4.10,
AMD Zen 3
So everything's alright in the world, right? Well, almost. My biggest bugbear with iterate
still interacts badly with this whole thing: collect
isn't available in the
prologue/epilogue code. So if wanted to do something like this:
(iter (for i in '(1 2 3)) (collect i) (finally (collect nil)))
to get (1 2 3 NIL)
, I'd need to mangle my pristine loop into this:
(iter (for i in '(1 2 3))
(collect i into res at start)
(finally (return (nreverse (cons nil res)))))
NB: even if collect
did work in there, I'd still need half the
workaround because a user-defined epilogue overrides the implicit return of the collected result,
hence the into res
thing.
Basically, iterate would need a pair of at-first
/at-last
clauses
to remediate this oversight; after all, it already has first-iteration-p
(which needs
a branch per loop iteration instead of doing proper code placement).
I still have to investigate if a potential switch to for would make the implementation of those ideas easy enough (and no code walker is nice).