How to Run OCR on a PDF
When a paper document is scanned and saved as a PDF, the computer cannot tell the difference between your scanned page of text and a photograph, because the page is stored as an image. As a result, you cannot search for text on the page or select it to copy and paste. To make the text searchable and selectable, you must run optical character recognition (OCR) on the document. Adobe Acrobat Professional provides this feature, but the free Adobe Reader does not. If you do not have Acrobat Professional, other programs can also run OCR on a PDF document, and a web search will turn up several of them.
Things You'll Need
- Adobe Acrobat Professional
Run Optical Character Recognition (OCR) on a PDF Document
Launch Adobe Acrobat Professional. The OCR feature is not available through the web browser plug-in, so you must open the full program.
Open a PDF document whose text you cannot select to copy and paste. Such documents are usually produced by scanning a paper document and saving the scan in PDF format. (See Resources for a sample document if you want to practice with one.)
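If you have many PDFs and want to find the ones that need OCR, a rough check is whether a file contains any font resources, since an image-only scan normally has none. The Python sketch below is a heuristic of our own, not an Acrobat feature: the function name `has_text_layer` is hypothetical, and it assumes that a PDF with selectable text declares at least one `/Font`, either in plain text or inside a Flate-compressed stream. Encrypted PDFs or unusual encodings can fool it.

```python
import re
import zlib


def has_text_layer(pdf_bytes: bytes) -> bool:
    """Heuristic guess at whether a PDF contains selectable text.

    Assumption: pages with real text reference a font (/Font), while
    image-only scans do not. Fonts may be declared in plain text or
    inside Flate-compressed streams, so both are checked.
    """
    chunks = [pdf_bytes]
    # Capture the bytes between each 'stream' keyword and its 'endstream'.
    for match in re.finditer(rb"stream\r?\n(.*?)endstream", pdf_bytes, re.DOTALL):
        try:
            # decompressobj keeps the trailing end-of-line before 'endstream'
            # in .unused_data instead of raising an error.
            chunks.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            pass  # not Flate data (e.g. a JPEG-encoded scan); skip it
    return any(b"/Font" in chunk for chunk in chunks)
```

To check a file, pass it the raw bytes, for example `has_text_layer(open("scan.pdf", "rb").read())`, where `scan.pdf` is a placeholder name. A `False` result suggests the document is a good candidate for OCR.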
Run OCR on the document. In Adobe Acrobat Professional, click the "Document" menu, then select "OCR Text Recognition" and then click "Recognize Text Using OCR."
Choose the OCR options. After you click "Recognize Text Using OCR," a dialog box opens asking which pages you want to process. You can run OCR on the entire PDF file or restrict it to a few pages. After selecting the page range, click "OK," and Acrobat Professional begins recognizing the text on those pages.
Once OCR is complete, you can search for text and copy and paste it, just as you could in a PDF created from a Microsoft Word document. Note, however, that OCR is not perfect: it may misrecognize some words and miss some text entirely. It works best on sharp, clean images of text, which scanned documents do not always provide.
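Because OCR can misread characters (a common error is reading "m" as "rn"), an exact search for a word may miss it in the recognized text. If you export the OCR text and search it yourself, a similarity-based match can tolerate small errors. This minimal Python sketch uses the standard library's `difflib.get_close_matches`; the function name `fuzzy_find` and the 0.75 cutoff are illustrative assumptions, not Acrobat settings.

```python
import difflib
import re


def fuzzy_find(query: str, ocr_text: str, cutoff: float = 0.75) -> list[str]:
    """Return words in OCR output that closely resemble `query`.

    An exact search fails if OCR misread even one character of the word;
    a similarity cutoff (0.0 to 1.0) tolerates small recognition errors.
    """
    # Split the recognized text into distinct lowercase words.
    words = set(re.findall(r"\w+", ocr_text.lower()))
    # get_close_matches ranks candidates by difflib.SequenceMatcher ratio.
    return difflib.get_close_matches(query.lower(), words, n=5, cutoff=cutoff)
```

For example, `fuzzy_find("modern", "The rnodern scanner produced a clean page.")` returns `["rnodern"]`, even though the word "modern" never appears exactly. Lowering the cutoff catches worse misreadings but also admits more unrelated words.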