World Playground Deceit.net

CL iterate's COLLECT performance notice


I just cleared a "funny" misunderstanding I had about iterate so I thought I'd share it in case it also helps the imaginary person reading this blog, using CL and being similarly misguided.

First of all, iterate is probably the most important of my external systems ("package", for non-Lispers), it is the only package (again, "namespace") I unconditionally use simply because it replaces loop that looks like it was added as-is to ANSI CL only to calm the weird tantrum of an Algol fanatic who got lost and somehow ended up on the X3J13 committee. And for all the useful additions too, but mainly because it actually looks like Lisp, yeah.

With that said, you'll probably understand how annoyed one would be my in shoes.

The misunderstanding was: in the documentation for the collect clause, it is said that collecting items in reverse order is likely to be faster in most Common Lisp implementations.

And somehow, what my brain started running with is that the result list was probably being walked during each collect, as if it expanded into something like (rplacd (last result) (list i)). Which means that any performance-sensitive loop that would normally be written

(iter (for i below 10)
  (collect i))

had to be transformed into

(nreverse
 (iter (for i below 10)
   (collect i at start)))

...Ugly, right? So during this week-end, I decided to simply see for myself:

CL-USER> (macroexpand-1 '(iter (for i below 10) (collect i)))
(LET* ((I NIL) (#:RESULT30 NIL) (#:END-POINTER31 NIL) (#:TEMP32 NIL))
  (BLOCK NIL
    (BLOCK #:ITERATE331
      (TAGBODY
        (PROGN (SETQ I -1))
       LOOP-TOP-NIL
        (PROGN
         (SETQ I (+ I 1))
         (IF (>= I 10)
             (GO LOOP-END-NIL))
         (PROGN
          (SETQ #:TEMP32 (LIST I))
          (SETQ #:END-POINTER31
                  (IF #:RESULT30
                      (SETF (CDR #:END-POINTER31) #:TEMP32)
                      (SETQ #:RESULT30 #:TEMP32)))
          #:RESULT30))
        (PROGN)
        (GO LOOP-TOP-NIL)
       LOOP-END-NIL
        (PROGN))
      #:RESULT30)))
T

Okay, macroexpanded loop code is usually the kind of Lisp only a mother could love, but even then it's pretty clear from the aptly named gensyms that iterate is smart enough to maintain a last cons pointer and keep append an O(1) operation.

Let me tell you, this was big weight off my shoulders! As a bonus, let us put the original claim to the test and compare performance of the two methods:

CL-USER> (prog1 (sb-ext:gc :full t)
               (time (iter (for i below 100000) (collect i))))
Evaluation took:
  0.000 seconds of real time
  0.000749 seconds of total run time (0.000633 user, 0.000116 system)
  100.00% CPU
  2,535,277 processor cycles
  1,572,096 bytes consed

NIL
CL-USER> (prog1 (sb-ext:gc :full t)
               (time (nreverse (iter (for i below 100000) (collect i at start)))))
Evaluation took:
  0.001 seconds of real time
  0.000856 seconds of total run time (0.000000 user, 0.000856 system)
  100.00% CPU
  3,007,730 processor cycles
  1,572,096 bytes consed

NIL

Huh! As expected, maintaining that end pointer isn't as expensive as that final nreverse; at least here, on SBCL 2.4.10, AMD Zen 3

So everything's alright in the world, right? Well, almost. My biggest bugbear with iterate still interacts badly with this whole thing: collect isn't available in the prologue/epilogue code. So if wanted to do something like this:

(iter (for i in '(1 2 3))
  (collect i)
  (finally (collect nil)))

to get (1 2 3 NIL), I'd need to mangle my pristine loop into this:

(iter (for i in '(1 2 3))
  (collect i into res at start)
  (finally (return (nreverse (cons nil res)))))

NB: even if collect did work in there, I'd still need half the workaround because a user-defined epilogue overrides the implicit return of the collected result, hence the into res thing.

Basically, iterate would need a pair of at-first/at-last clauses to remediate this oversight; after all, it already has first-iteration-p (which needs a branch per loop iteration instead of doing proper code placement).


I still have to investigate if a potential switch to for would make the implementation of those ideas easy enough (and no code walker is nice).