Scanning tools and software

Panasonic KV-SS55EX duplex grayscale 300dpi scanner
Contex FSS 8300 COPY DSP large-format (E-size) scanner (it's good, yes?)
Contex WIDEimage imaging software
Adobe Capture 3.0
Adobe Acrobat (including Adobe Catalog)
libtiff v3.5.6
ImageMagick (and Perl Magick)
Kodak Imaging for Windows 
TMSSequoia ScanFix 4.2


Notes

Skew

Document skew is a problem for OCR software; it substantially degrades recognition. Although one source of skew is misaligned feeding in the scanner, the most prevalent source is skew already present in the original text, introduced by the printing process.
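These notes don't say how Capture or ScanFix measure skew. One widely used technique is the projection-profile method. It tries a range of candidate angles and keeps the one at which the rows of ink line up most sharply. The Python sketch below assumes the page is already a bitonal bitmap held as a list of rows of 0/1 values (1 = black). The function name `estimate_skew` and its `max_angle`/`step` parameters are hypothetical and are not taken from any of the tools listed above.

```python
import math


def estimate_skew(bitmap, max_angle=5.0, step=0.1):
    """Estimate text skew (degrees) of a bitonal page by projection profiles.

    Hypothetical helper, not the algorithm used by Capture or ScanFix.
    bitmap: list of rows, each a list of 0 (white) / 1 (black) pixels.
    A positive result means text lines descend to the right (y grows down).
    """
    # Collect coordinates of all black pixels once.
    black = [(x, y) for y, row in enumerate(bitmap)
             for x, v in enumerate(row) if v]

    # Try angles in order of increasing magnitude so that, on a tie,
    # the smallest correction is kept.
    n = int(round(max_angle / step))
    angles = [0.0]
    for i in range(1, n + 1):
        angles += [i * step, -i * step]

    best_angle, best_score = 0.0, -1
    for angle in angles:
        t = math.tan(math.radians(angle))
        counts = {}
        for x, y in black:
            # Project the pixel along a line tilted by `angle`; pixels on
            # the same skewed text line land in the same bin.
            r = int(round(y - x * t))
            counts[r] = counts.get(r, 0) + 1
        # Sum of squares rewards tall, narrow peaks (well-aligned lines).
        score = sum(c * c for c in counts.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```

The angle resolution is limited to `step`, and the cost grows with the number of candidate angles times the number of black pixels, so a real tool would typically work on a downsampled page or refine the search around a coarse estimate.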

Adobe Acrobat contains its own skew detection and correction engine. It's not very good, at least at the correction. If you see a word split with a strange offset in the middle of it, that's where it comes from. Skew correction isn't as simple as rotating the text to un-skew it: in a bitonal image, rotation distorts the characters, and that interferes with recognition. There are various techniques for getting around this; unfortunately, the one Capture uses isn't very good. The deskew engine in ScanFix is better, and I'll be running a few tests with it, but I don't have a way to disable Capture's internal deskew, so there may be some undesirable interaction.
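To see why plain rotation hurts a bitonal image, consider the simplest approach: nearest-neighbour resampling. The Python sketch below uses a hypothetical `rotate_nearest` helper, which is not how Capture or ScanFix work internally, and the same 0/1 list-of-rows representation. Every output pixel must be either fully black or fully white, so slanted stroke edges are forced onto the pixel grid. The result is stair-steps and strokes that are locally thinned or thickened.

```python
import math


def rotate_nearest(bitmap, angle_deg):
    """Rotate a bitonal image about its centre with nearest-neighbour sampling.

    Hypothetical helper for illustration. Each output pixel copies exactly
    one input pixel, so the result stays strictly 0/1. With y pointing down,
    a positive angle turns the content clockwise as displayed.
    """
    h, w = len(bitmap), len(bitmap[0])
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    out = [[0] * w for _ in range(h)]  # areas rotated in from outside stay white
    for y in range(h):
        for x in range(w):
            # Inverse mapping: find the source position of this output pixel.
            dx, dy = x - cx, y - cy
            sx = cos_a * dx + sin_a * dy + cx
            sy = -sin_a * dx + cos_a * dy + cy
            # Snap to the nearest source pixel; no intermediate values exist.
            ix, iy = int(round(sx)), int(round(sy))
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = bitmap[iy][ix]
    return out
```

To undo a skew of s degrees in which the text descends to the right, rotate by -s. Applying this to a scanned character shows the jagged edges that give OCR trouble.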

My long-term plan for deskew (unless Capture 4.0 is substantially better) is to convert all of the images to grayscale, detect the skew externally, and then use true rotation to deskew them. Characters aren't deformed in grayscale, since pixels can contain intermediate values instead of just on or off. This grayscale conversion doesn't result in a substantially larger image after compression, since the original source is still bitonal.
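The grayscale plan can be sketched in Python as two steps. First, map the bitonal pixels to 8-bit gray. Then rotate with bilinear interpolation, so that pixels straddling a stroke edge take intermediate gray values instead of snapping to black or white. Bilinear interpolation is an assumption here: the notes only say "true rotation", and bicubic or other filters would also qualify. `to_gray` and `rotate_bilinear` are hypothetical names, and the skew angle is assumed to have been measured already by some external tool.

```python
import math


def to_gray(bitmap):
    """Map a bitonal image (1 = black ink) to 8-bit gray (0 = black, 255 = white)."""
    return [[0 if v else 255 for v in row] for row in bitmap]


def rotate_bilinear(gray, angle_deg, background=255):
    """Rotate an 8-bit grayscale image about its centre with bilinear interpolation.

    Hypothetical helper. With y pointing down, a positive angle turns the
    content clockwise as displayed. Edge pixels get intermediate grays.
    """
    h, w = len(gray), len(gray[0])
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)

    def sample(ix, iy):
        # Pixels outside the source image read as paper-white background.
        if 0 <= ix < w and 0 <= iy < h:
            return gray[iy][ix]
        return background

    out = [[background] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: source position of this output pixel.
            dx, dy = x - cx, y - cy
            sx = cos_a * dx + sin_a * dy + cx
            sy = -sin_a * dx + cos_a * dy + cy
            x0, y0 = math.floor(sx), math.floor(sy)
            fx, fy = sx - x0, sy - y0
            # Weighted average of the four surrounding source pixels.
            top = (1 - fx) * sample(x0, y0) + fx * sample(x0 + 1, y0)
            bot = (1 - fx) * sample(x0, y0 + 1) + fx * sample(x0 + 1, y0 + 1)
            out[y][x] = int(round((1 - fy) * top + fy * bot))
    return out
```

As with any rotation about the page centre, pass the negative of a skew in which the text descends to the right. Areas that rotate in from outside the page are filled with white via `background`.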

Note that each of my .pdf files usually has a corresponding .tif file; the .tif files are pre-deskew.


I buy LINC, PDP-8, PDP-12 and PDP-15 parts, systems and documentation. Write me at nabil@pdp-8.org.