Editor's review
This is a command-line application that lets you convert scanned PDF documents into editable text through OCR.
VeryPDF PDF to TXT OCR Converter is a command-line application that uses Optical Character Recognition (OCR) to convert PDF documents into editable TXT files. Adobe Acrobat is not required. It also supports a range of image formats, including TIFF, BMP, PNG, JPG, PCX and TGA. You can convert a single page, a range of pages or the complete document. Besides English, the tool can recognize several other languages, including German, French, Spanish and Italian. Encrypted and password-protected PDF files can also be processed easily. The layout of the source document is preserved after conversion. The quality of the OCR output depends largely on the quality of the scanned image and the clarity of its characters.
Some image preprocessing is therefore essential before the images are submitted for recognition. De-speckling and de-skewing are the key steps, and a general enhancement of contrast and brightness goes a long way toward improving the recognition rate. This matters because even a 5% failure rate adds up to a substantial amount of manual editing in a large document: assuming roughly 2,000 characters per page, a 100-page document would leave about 10,000 characters to correct. Some filters, particularly edge-enhancement filters, may also help. This additional processing calls for a suitable image editor, which you should keep in mind when planning your workflow. Overall, this is a handy tool if you regularly need to carry out large amounts of character recognition.
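To make the preprocessing advice more concrete, here is a minimal sketch of two of the steps mentioned above: de-speckling (using a 3x3 median filter) and contrast/brightness adjustment. It works on a grayscale image stored as a list of pixel rows with values from 0 to 255. The function names are hypothetical and have nothing to do with the VeryPDF tool itself. In practice you would use an image editor or imaging library for this, and de-skewing and edge enhancement are left out for brevity.

```python
# Hypothetical preprocessing helpers (not part of VeryPDF): a grayscale image
# is a list of rows, each row a list of 0-255 integer pixel values.
from statistics import median


def despeckle(img):
    """Remove isolated specks with a 3x3 median filter (edge pixels clamped)."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Collect the 3x3 neighbourhood, repeating border pixels at the edges.
            window = [
                img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            row.append(int(median(window)))
        out.append(row)
    return out


def stretch_contrast(img):
    """Linearly map the darkest pixel to 0 and the brightest pixel to 255."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:
        # A flat image has no contrast to stretch; return a copy unchanged.
        return [row[:] for row in img]
    return [[(p - lo) * 255 // (hi - lo) for p in row] for row in img]


def adjust_brightness(img, offset):
    """Add a constant offset to every pixel, clamping the result to 0-255."""
    return [[min(255, max(0, p + offset)) for p in row] for row in img]


def preprocess(img, brightness=0):
    """De-speckle, stretch contrast, then shift brightness, in that order."""
    return adjust_brightness(stretch_contrast(despeckle(img)), brightness)


if __name__ == "__main__":
    # A light-gray 5x5 "page" with a single dark speck in the middle.
    page = [[200] * 5 for _ in range(5)]
    page[2][2] = 0
    print(despeckle(page)[2][2])  # the speck is replaced by its neighbours' median
```

The median filter is a common choice for de-speckling because a single dark pixel surrounded by light ones never becomes the median of its neighbourhood, so it disappears while larger shapes such as character strokes survive better than they would under simple blurring.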