World Playground Deceit.net

PDF to CBZ for e-reader


A small post about the "correct" way to convert a PDF to CBZ. But first, why? In my case, I had just finished my novel and wanted to see whether the much-hyped House of Leaves was all that. Typesetting is pretty important in that novel, so an EPUB or even a text PDF isn't going to cut it, which is why I obtained it as a scanned PDF (i.e. one containing raster images).

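When I tried it on my e-reader (a Kobo Aura One running KOReader), each page took something like 10 to 20 seconds to turn. Unacceptable. Since I already had some knowledge from prior e-reader conversion scripts, I decided to precompute as much as possible (scaling, grayscale conversion, dithering) and save the result, so that the device has as little work as possible left to do on each page turn.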

For this recipe, you'll need to know your e-reader's display resolution (1404x1872 for the Aura One) and its grey level count (16 for anything even slightly modern). Then it's just a matter of using the right tools:

mkdir work && cd work
# Convert the PDF to images fitting the final dimensions; the zero-padded
# %04d names keep the *.ppm / *.png globs below in page order
mutool draw -o %04d.ppm -w 1404 -h 1872 '../(2000) House of Leaves.pdf'
# Convert to grayscale, apply a bit of sigmoidal contrast to clean up and
# quantize to 16 colors with proper dithering
ls *.ppm | pararun -p 'magick "$1" -colorspace gray -sigmoidal-contrast 7x50% \
    -dither FloydSteinberg -colors 16 -depth 8 -define png:compression-level=0 \
    "${1%.ppm}.png"'
# Finally compress the results
oxipng --ng -o3 -s *.png
# And zip it
zip '../(2000) House of Leaves.cbz' *.png
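
If pararun isn't at hand, the parallel conversion step can probably be done with xargs instead. This is only a sketch: it assumes that pararun runs the quoted command once per input line with that line as $1, that your xargs supports -0 and -P (GNU and BSD both do), and that ImageMagick 7 provides the magick binary. The convert_pages name is made up for the example.

```shell
# Hypothetical stand-in for the pararun step above, using xargs for parallelism.
# Run it from inside the work directory that holds the .ppm files.
convert_pages() {
    # One job per CPU core; fall back to 4 if the count can't be queried
    njobs=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 4)
    # NUL-separated names, one file per magick invocation
    printf '%s\0' *.ppm | xargs -0 -n 1 -P "$njobs" sh -c '
        magick "$1" -colorspace gray -sigmoidal-contrast 7x50% \
            -dither FloydSteinberg -colors 16 -depth 8 \
            -define png:compression-level=0 "${1%.ppm}.png"' _
}
# Usage: cd work && convert_pages
```

The magick arguments are unchanged from the recipe, so only the job scheduling differs.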

And there you are! Sure, the book went from 34 MB to 240 MB, but page turns are instantaneous and it's much easier to read thanks to the increased contrast. It's worth mentioning that this inspired me to add a sigmoidal contrast operator to pyvips this afternoon (I still have to write the sRGB (u8) -> scRGB (float) -> numpy -> sRGB (u8) codepath), and that I'm planning to add a way to import GIMP Curves in there too.
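
For the curious, here is roughly what the sigmoidal contrast curve does to grey values. This is a sketch based on my understanding of the formula in ImageMagick's documentation, not on the pyvips operator mentioned above: a logistic curve with contrast 7 centred on the 50% midpoint, rescaled so that 0 stays 0 and 1 stays 1. The sigmoid_curve helper is made up for the example, and the values are normalised to 0..1.

```shell
# Print input -> output for n evenly spaced grey levels.
# Arguments: contrast (default 7), midpoint (default 0.5), levels (default 16),
# mirroring the "7x50%" argument and the 16 grey levels used in the recipe.
sigmoid_curve() {
    awk -v c="${1:-7}" -v m="${2:-0.5}" -v n="${3:-16}" 'BEGIN {
        lo = 1 / (1 + exp(c * m))          # raw sigmoid at input 0
        hi = 1 / (1 + exp(c * (m - 1)))    # raw sigmoid at input 1
        for (i = 0; i < n; i++) {
            u = i / (n - 1)
            # rescale so the curve passes through (0,0) and (1,1)
            v = (1 / (1 + exp(c * (m - u))) - lo) / (hi - lo)
            printf "%.3f -> %.3f\n", u, v
        }
    }'
}
sigmoid_curve 7 0.5 16
```

Values below the midpoint get pushed towards black and values above it towards white, which is what cleans up the greyish paper and faded ink of a scan.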