Scanning tools and software
Panasonic KV-SS55EX duplex grayscale 300 dpi scanner
Contex FSS 8300 COPY DSP large-format (E-size) scanner (it's good, yes?)
Contex WIDEimage imaging software
Adobe Capture 3.0
Adobe Acrobat (including Adobe Catalog)
libtiff v3.5.6
ImageMagick (and PerlMagick)
Kodak Imaging for Windows
TMSSequoia ScanFix 4.2
Notes
Skew
Document skew is a problem for OCR software because it substantially degrades
recognition. Although one source of skew is misaligned feeding in the
scanner, the most prevalent source is skew already present in the original
text, introduced by the printing process.
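A rough worked example of why even a small skew matters. It assumes a 300 dpi scan (the Panasonic's resolution) plus two illustrative assumptions of mine, an 8-inch printed line and 12-point line spacing. A line of width w skewed by an angle θ drifts vertically by

$$
d = w \tan\theta = (8\ \text{in} \times 300\ \text{dpi}) \tan 1^\circ \approx 2400 \times 0.01746 \approx 41.9\ \text{px},
$$

while the line pitch is only (12/72) in × 300 dpi = 50 px. So at 1° of skew the end of a line sits almost a full text line below its start, which plausibly explains why skew hurts line and word segmentation so much.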
Adobe Acrobat contains its own skew detection and correction engine.
It's not very good, at least at the correction; if you see a word with a
strange offset split in it, that is where it comes from. Skew correction isn't
as simple as rotating the text to un-skew it: in a bitonal image this causes
character distortion that interferes with recognition. There are various techniques
for working around this; unfortunately, the one Capture uses isn't very good. The
deskew engine in ScanFix is better, and I'll be running a few tests with it,
but I don't have a way to disable the internal Capture deskew, so there may be
some undesirable interaction.
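Neither Capture's nor ScanFix's detection algorithm is described here, so the following is only a sketch of one common detection technique, the projection-profile method, not what either product does. For each candidate angle, the black pixels are projected onto the rows of a rotated coordinate frame, and the angle whose row histogram is most sharply peaked wins, because text lines collapse into a few dense rows when the angle matches the skew. The function name `estimate_skew`, the ±5° search range and the 0.1° step are illustrative assumptions. Images are plain lists of rows of 0 (white) / 1 (black) pixels, and a positive result means text lines descend to the right (y grows downward).

```python
import math


def estimate_skew(image, max_angle=5.0, step=0.1):
    """Estimate document skew in degrees with a projection-profile search.

    image: list of rows, each a list of 0 (white) / 1 (black) pixels.
    Positive result: text lines descend to the right (y grows downward).
    """
    # Coordinates of every black pixel; white pixels don't affect the profile.
    points = [(x, y) for y, row in enumerate(image)
              for x, v in enumerate(row) if v]
    if not points:
        return 0.0

    n = int(round(max_angle / step))
    # Try small angles first so ties resolve to the smallest correction.
    candidates = sorted((i * step for i in range(-n, n + 1)), key=abs)

    best_angle, best_score = 0.0, -1
    for angle in candidates:
        t = math.radians(angle)
        c, s = math.cos(t), math.sin(t)
        profile = {}
        for x, y in points:
            # Row index in a frame rotated by `angle`: a text line following
            # y = y0 + x*tan(angle) maps to the constant value y0*cos(angle).
            r = int(round(y * c - x * s))
            profile[r] = profile.get(r, 0) + 1
        # Sum of squared row counts is largest when pixels pile into few rows.
        score = sum(k * k for k in profile.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle


# Usage on a synthetic page: three 200-pixel "text lines" skewed by 2 degrees.
slope = math.tan(math.radians(2.0))
page = [[0] * 200 for _ in range(60)]
for y0 in (10, 25, 40):
    for x in range(200):
        page[int(round(y0 + x * slope))][x] = 1
print(estimate_skew(page))  # should report an angle at or very near 2.0
```

On real scans the profile is noisier (headings, figures, ragged margins), so a practical detector would likely downsample first and refine the angle in a second, finer pass.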
My long-term plan for deskew (unless Capture 4.0 is substantially better) is
to convert all of the images to grayscale, detect the skew externally, and then
use true rotation to deskew them. Characters aren't deformed in grayscale,
since pixels can contain intermediate values instead of just on or
off. The grayscale conversion doesn't produce a substantially larger
image after compression, since the original source is still bitonal.
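A minimal sketch of why true rotation works better after grayscale conversion, using pure-Python images stored as lists of rows; a real pipeline would use an imaging library instead. The helpers `to_grayscale` and `rotate` are hypothetical names. `to_grayscale` maps a bitonal image (1 = black) to 8-bit values (0 = black, 255 = white). `rotate` uses inverse mapping with either nearest-neighbour sampling, which can only copy existing on/off pixels and so produces the jagged edges described above, or bilinear interpolation, which blends neighbouring pixels into intermediate gray levels. Positive angles rotate clockwise on screen (y grows downward).

```python
import math


def to_grayscale(bitonal):
    """Map a bitonal image (1 = black, 0 = white) to 8-bit gray (0 = black, 255 = white)."""
    return [[0 if v else 255 for v in row] for row in bitonal]


def _pixel(image, x, y, background):
    # Samples outside the image read as background (white paper).
    if 0 <= y < len(image) and 0 <= x < len(image[0]):
        return image[y][x]
    return background


def rotate(image, angle_deg, bilinear=True, background=255):
    """Rotate a grayscale image about its centre (positive = clockwise on screen).

    Inverse mapping: each output pixel looks up where it came from in the source.
    """
    h, w = len(image), len(image[0])
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    out = [[background] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = x - cx, y - cy
            sx = c * dx + s * dy + cx
            sy = -s * dx + c * dy + cy
            if bilinear:
                # Blend the four surrounding source pixels by distance.
                x0, y0 = math.floor(sx), math.floor(sy)
                fx, fy = sx - x0, sy - y0
                top = (_pixel(image, x0, y0, background) * (1 - fx)
                       + _pixel(image, x0 + 1, y0, background) * fx)
                bottom = (_pixel(image, x0, y0 + 1, background) * (1 - fx)
                          + _pixel(image, x0 + 1, y0 + 1, background) * fx)
                out[y][x] = int(round(top * (1 - fy) + bottom * fy))
            else:
                # Nearest neighbour: copy one existing pixel, so values stay on/off.
                out[y][x] = _pixel(image, round(sx), round(sy), background)
    return out


# Usage: a black 8x8 square on a white 20x20 page, rotated by 10 degrees.
square = [[1 if 6 <= x < 14 and 6 <= y < 14 else 0 for x in range(20)]
          for y in range(20)]
gray = to_grayscale(square)
nearest = rotate(gray, 10, bilinear=False)
smooth = rotate(gray, 10, bilinear=True)
print(sorted({v for row in nearest for v in row}))  # [0, 255]
print(any(0 < v < 255 for row in smooth for v in row))  # True
```

With this sign convention, a page whose lines descend to the right by a° would be deskewed with `rotate(gray, -a)`. Bilinear interpolation is used here only because it is the simplest interpolating filter; a production deskew would probably use a higher-quality filter from an existing library.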
Note that most of my .pdf files have a corresponding .tif file; these are the
pre-deskew images.