Saturday, October 3, 2009

Perform OCR with Google Docs – Turn Images Into Editable Documents

Google Docs can now perform OCR on digital images. You can upload an image containing typewritten or printed text (like a fax document or a scanned newspaper clipping) to your Google Docs account and it will turn that image into editable text.
In one example, Google Docs successfully extracted all the text from a scanned book page and converted it into an editable document.
The OCR feature is not yet part of the standard Google Docs interface, but you can use this sample form to upload scanned images to your Google account. The server will then automatically try to extract text from them, provided the image resolution is good and the text is written in a Latin character set.
The OCR feature can also extract text from noisy images (like this WSJ clipping), though the recognized text is not very accurate and the document formatting is lost (see conversion results).
If you are a developer, you can add the ocr=true parameter to your upload request, and Google Docs will automatically scan the image for text. You can also upload images to Google Docs without the OCR parameter, but in that case the image will be converted into a new Word document sans OCR.
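For developers, an OCR upload request might be assembled roughly as follows. This is a minimal Python sketch under stated assumptions, not Google's documented method: the feed URL, the `GData-Version` and `GoogleLogin` authorization headers, and the `build_upload_request` helper are assumptions modeled on the Documents List API of the time (which has since been retired). Only the `ocr=true` query parameter comes from the text above. The sketch builds the HTTP request but does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# ASSUMPTION: legacy Documents List API upload feed (now retired).
UPLOAD_FEED = "https://docs.google.com/feeds/default/private/full"


def build_upload_request(image_bytes: bytes, filename: str, auth_token: str,
                         ocr: bool = True,
                         content_type: str = "image/jpeg") -> Request:
    """Build (but do not send) a POST request that uploads an image.

    Hypothetical helper: when ocr is True, the ocr=true query parameter
    asks the server to extract text from the image.
    """
    url = UPLOAD_FEED
    if ocr:
        url = f"{UPLOAD_FEED}?{urlencode({'ocr': 'true'})}"
    headers = {
        "GData-Version": "3.0",                              # ASSUMPTION: protocol version header
        "Authorization": f"GoogleLogin auth={auth_token}",   # ASSUMPTION: auth scheme of the era
        "Content-Type": content_type,                        # MIME type of the uploaded image
        "Slug": filename,                                    # suggested title for the new document
    }
    return Request(url, data=image_bytes, headers=headers, method="POST")


if __name__ == "__main__":
    req = build_upload_request(b"...image bytes...", "book-page.jpg", "YOUR_TOKEN")
    print(req.get_method(), req.full_url)
```

Passing `ocr=False` produces the same request without the query parameter, which corresponds to the plain, non-OCR upload described above.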
Google Search also includes OCR features, but with a difference: Google Docs can extract text from images, while OCR in Google Search works only with scanned PDF files.

(Courtesy: Digital Inspiration)
