http://qs321.pair.com?node_id=794918


in reply to Convert PDF to HTML (or JPEG)

For PDF to JPEG (or any other raster image format, such as PNG or TIFF), you can use Ghostscript to do the conversion:

$ gs -q -dBATCH -dNOPAUSE -sDEVICE=jpeg -dJPEGQ=88 -r150 -sOutputFile=img%d.jpg input.pdf

This creates one image per page of the PDF file (img1.jpg to imgN.jpg). -r sets the resolution in dpi (at 150dpi, an A4 page becomes a 1240x1754 pixel image), and -dJPEGQ sets the JPEG quality factor (0 to 100).
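To sanity-check those numbers: Ghostscript measures page sizes in PostScript points (1/72 inch), and A4 is 595x842 points, so the pixel size is roughly points * dpi / 72. A minimal sketch in plain sh and awk; the helper name px_size is made up for illustration, and its round-to-nearest behaviour is an assumption (Ghostscript's own rounding may differ by a pixel in some cases):

```shell
#!/bin/sh
# px_size WIDTH_PT HEIGHT_PT DPI: hypothetical helper, not part of Ghostscript.
# Prints the raster size (WxH in pixels) of a page given in PostScript points
# (1/72 inch) when rendered at DPI, rounded to the nearest pixel.
px_size() {
    awk -v w="$1" -v h="$2" -v r="$3" \
        'BEGIN { printf "%dx%d\n", int(w * r / 72 + 0.5), int(h * r / 72 + 0.5) }'
}

px_size 595 842 150   # A4 at 150dpi, prints: 1240x1754
px_size 595 842 600   # A4 at 600dpi (4x oversampled)
```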

Unfortunately, this doesn't do any anti-aliasing, so fonts typically look rather ragged. You can work around that by doing the anti-aliasing yourself: oversample while rendering from PDF to raster (e.g. by a factor of 4, i.e. 600dpi instead of 150dpi), then downsample with an appropriate filter.
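The arithmetic behind the oversampling trick is simple: render at the target dpi times the oversampling factor, then shrink by 100/factor percent. A small sketch (the helper name aa_params is hypothetical) that prints the matching gs -r value and convert -resize argument:

```shell
#!/bin/sh
# aa_params TARGET_DPI FACTOR: hypothetical helper for the oversampling trick.
#   render dpi = TARGET_DPI * FACTOR  (for gs -r)
#   resize pct = 100 / FACTOR         (for convert -resize)
aa_params() {
    target=$1
    factor=$2
    render=$((target * factor))
    # awk keeps the fraction for factors that don't divide 100 (3 -> 33.3333%).
    pct=$(awk -v f="$factor" 'BEGIN { printf "%g", 100 / f }')
    printf '%s\n' "-r$render -resize $pct%"
}

aa_params 150 4   # prints: -r600 -resize 25%
```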

ImageMagick's convert can be used for the latter. The complete sequence of steps would be:

$ gs -q -dBATCH -dNOPAUSE -sDEVICE=jpeg -dJPEGQ=88 -r600 -sOutputFile=img%d.jpg input.pdf
$ for img in img*.jpg ; do convert "$img" -filter Lanczos -resize 25% -quality 90 "out_$img" ; done
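For repeated use, the two-step gs/convert sequence can be wrapped in a small function. A minimal sketch, assuming a POSIX sh with gs and ImageMagick's convert on the PATH; the names pdf_to_aa_jpeg and run, the default arguments, and the DRY_RUN variable are hypothetical additions, while the gs and convert options are the ones used in the text:

```shell
#!/bin/sh
# run CMD ARGS...: hypothetical helper that executes the command, or only
# prints it when DRY_RUN is set to a non-empty value.
run() {
    if [ -n "$DRY_RUN" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# pdf_to_aa_jpeg INPUT.pdf [TARGET_DPI] [FACTOR]: hypothetical wrapper.
# Renders every page at TARGET_DPI * FACTOR (default 150 * 4 = 600dpi),
# then downsamples each page to 100/FACTOR percent with a Lanczos filter.
pdf_to_aa_jpeg() {
    input=$1
    target=${2:-150}
    factor=${3:-4}
    render=$((target * factor))
    pct=$(awk -v f="$factor" 'BEGIN { printf "%g", 100 / f }')

    # Step 1: rasterise at the oversampled resolution (img1.jpg ... imgN.jpg).
    run gs -q -dBATCH -dNOPAUSE -sDEVICE=jpeg -dJPEGQ=88 -r"$render" \
        -sOutputFile=img%d.jpg "$input" || return 1

    # Step 2: downsample each page into out_img1.jpg ... out_imgN.jpg.
    # Note: stale img*.jpg files from an earlier run in this directory
    # would be processed too.
    for img in img*.jpg; do
        [ -e "$img" ] || continue   # glob matched nothing (e.g. in dry-run mode)
        run convert "$img" -filter Lanczos -resize "$pct%" -quality 90 "out_$img" || return 1
    done
}
```

Calling pdf_to_aa_jpeg input.pdf should be equivalent to the two commands above; with DRY_RUN=1 it prints the gs command instead of running it (and the convert step is skipped unless img*.jpg files already exist).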

The resulting anti-aliased images out_img*.jpg would then have 150dpi resolution.

If you have ImageMagick's sister project GraphicsMagick installed instead (which doesn't pollute the /usr/bin namespace, since all its tools are subcommands of a single gm binary), the command would be gm convert ...
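If a script should work with either package installed, it can pick whichever command is available. A minimal sketch, assuming a POSIX shell; pick_convert is a hypothetical helper name, and preferring ImageMagick when both are present is an arbitrary choice:

```shell
#!/bin/sh
# pick_convert: hypothetical helper. Prefers ImageMagick's `convert`,
# otherwise falls back to GraphicsMagick's `gm convert`.
# Prints the command prefix to use, or fails if neither is installed.
pick_convert() {
    if command -v convert >/dev/null 2>&1; then
        echo convert
    elif command -v gm >/dev/null 2>&1; then
        echo "gm convert"
    else
        echo "neither ImageMagick nor GraphicsMagick found" >&2
        return 1
    fi
}

# Usage: $CONVERT is deliberately left unquoted so "gm convert" splits into two words.
# CONVERT=$(pick_convert) && $CONVERT img1.jpg -filter Lanczos -resize 25% -quality 90 out_img1.jpg
```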

(Those who hold a degree in signal processing, or have come across filter design in some other context, might want to take a look at the list of filters to choose from. When in doubt, stick with Lanczos or Kaiser for somewhat sharper results, or Gaussian or Cubic for somewhat softer ones.)
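Since the choice of filter is partly a matter of taste, it can help to downsample one oversampled page with each candidate and compare the results side by side. A sketch, assuming ImageMagick's convert and an input image in the current directory; compare_filters, the cmp_ output prefix and DRY_RUN are hypothetical:

```shell
#!/bin/sh
# compare_filters IMAGE [PERCENT]: hypothetical helper that downsamples one
# oversampled page with each of the suggested filters, writing
# cmp_<Filter>_<IMAGE> for each. Set DRY_RUN=1 to print the commands instead.
compare_filters() {
    img=$1
    pct=${2:-25}
    for filter in Lanczos Kaiser Gaussian Cubic; do
        set -- convert "$img" -filter "$filter" -resize "$pct%" -quality 90 "cmp_${filter}_$img"
        if [ -n "$DRY_RUN" ]; then
            printf '%s\n' "$*"
        else
            "$@" || return 1
        fi
    done
}
```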

Also, there's documentation (well hidden from daylight) at /usr/share/doc/ghostscript/Devices.htm, which explains the options available for the individual Ghostscript output devices. You usually need to install an extra package to get that file, e.g. ghostscript-doc on Debian/Ubuntu.