Emacs htmlize as batch script


Small follow-up to yesterday's post about implementing :codeblock, the tag that brings automatic syntax highlighting to my :pre.

What I looked at, using Common Lisp as the sample language:

  1. pygments: the "standard" choice. Avoided because its parser isn't good enough for my taste (lambda list keywords don't have their own color, docstrings look the same as strings, and the output is bloated, with a <span> for each whitespace/parenthesis run).
  2. chroma: faster, but same parsers, so same problems, and not packaged.
  3. syntect: not packaged; it's a library, so I'd need to write a simple CLI; its builtin HTML module always inlines CSS; and the Sublime syntax spec it uses didn't seem very good.
  4. tree-sitter: lots of elbow grease needed, and I'm not sure about the quality of the Clojure-derived CL grammar.

So I finally considered Emacs' batch mode. I knew I already had everything I needed there, but I feared the accumulated cost of starting the behemoth (even with --quick), which would have called for some caching if truly horrible.

After an hour or two of flailing around, I got this:

#!/usr/bin/emacs -x

;; Effectively disable GC: the process is short-lived, so collecting is wasted time
(setq gc-cons-threshold most-positive-fixnum)

;; Try to load htmlize from a likely path and bypass the heavy
;; package-initialize ritual; fall back to the package system otherwise
(let ((path (car (file-expand-wildcards "~/.emacs.d/elpa/htmlize-*/htmlize.el"))))
  (if path
      (load (file-name-sans-extension path))
    (package-initialize)
    (load-library "htmlize")))

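(with-temp-buffer
  ;; First argument is the major mode's name without "-mode", e.g. "lisp"
  (funcall (intern (concat (car argv) "-mode")))
  (buffer-disable-undo)
  (insert-file-contents "/dev/stdin")
  (let ((htmlize-convert-nonascii-to-entities nil)
        (htmlize-ignore-face-size nil)
        (htmlize-css-name-prefix "codeblock-")
        (htmlize-html-major-mode nil))
    ;; The non-nil second argument makes htmlize-buffer switch to its
    ;; output buffer, which the searches below rely on
    (htmlize-buffer (current-buffer) t))
  ;; Print only what sits between <pre> and </pre>; 6 is (length "</pre>")
  (princ (buffer-substring
          (search-forward "<pre>")
          (- (search-forward "</pre>") 6))))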
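
The last form is the only slightly cryptic part, so here's a minimal sketch of it in isolation. It assumes a made-up stand-in for htmlize's output (the HTML string and its class name are invented for illustration), and my/pre-contents is a hypothetical helper, not something the script defines:

```elisp
;; Hypothetical helper, not part of the script: return what sits between
;; the first "<pre>" and the following "</pre>" in HTML.
(defun my/pre-contents (html)
  (with-temp-buffer
    (insert html)
    (goto-char (point-min))
    (let* ((start (search-forward "<pre>"))   ; position just after "<pre>"
           (end (- (search-forward "</pre>")  ; position just after "</pre>"...
                   (length "</pre>"))))       ; ...moved back over the tag
      (buffer-substring start end))))

;; Made-up stand-in for htmlize output, only to exercise the helper.
(princ (my/pre-contents
        "<body>\n<pre>\n<span class=\"codeblock-keyword\">defun</span> f\n</pre>\n</body>"))
```

As for the full script: it takes the major mode's name without the -mode suffix as its first argument and reads the source from stdin. Assuming it's saved as an executable named htmlize-batch (a hypothetical name), a call would look like ./htmlize-batch lisp < foo.lisp.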

In the end, it's actually faster than pygmentize! Around 40 ms for 100 lines of CL vs. 55 ms. It started at 140 ms, but bypassing the package system brought that down to 50, and disabling the GC squeezed out the last 10.
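
To get a feel for what the GC part can buy, here's a rough sketch using benchmark-run, which returns the elapsed time, the number of GC runs and the time spent in GC. The workload and both helper names are hypothetical, and it allocates throwaway strings instead of running htmlize, so the numbers only show the mechanism, not my actual timings:

```elisp
(require 'benchmark)

;; Hypothetical workload: allocate many short strings to create garbage.
(defun my/churn ()
  (dotimes (_ 200000)
    (make-string 64 ?x)))

;; Hypothetical helper: run the workload once with gc-cons-threshold bound
;; to THRESHOLD and return benchmark-run's list of
;; (ELAPSED-SECONDS GC-RUNS GC-SECONDS).
(defun my/bench-with-threshold (threshold)
  (let ((gc-cons-threshold threshold))
    (benchmark-run 1 (my/churn))))

;; 800000 is Emacs' default threshold; most-positive-fixnum is what the
;; script uses to effectively turn collection off.
(dolist (threshold (list 800000 most-positive-fixnum))
  (princ (format "threshold %d: %S\n"
                 threshold
                 (my/bench-with-threshold threshold))))
```

Saved to a file (say gc-bench.el, name made up), it can be run with emacs -Q --batch -l gc-bench.el.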

I still added a *codeblock-use-pygmentize* knob for the hypothetical Vim peasant bipolar enough to use my CL SSG :^).