University of Texas Libraries

OCR Tools

An overview of a variety of OCR tools. Includes sample scripts.

Introduction to OCR

What is OCR?

Optical character recognition (OCR) is the electronic conversion of images of text into machine-encoded text, whether from a scanned document, a photo of a document, or another type of photo that includes text (for example, the subtitles in a film still).

Why use it?

OCR is a common and useful method of digitizing texts so that they can be electronically edited, searched, and manipulated. OCR-ed texts can be used to aid work in cognitive computing, machine translation, text-to-speech, and text mining.

Performing OCR on a text can be an important step in digitizing the document for use in your research, or in making its contents widely accessible to others. Once a document has been OCR-ed, its contents are available in a searchable, easily manipulated digital format and can be used for a variety of scholarly purposes.

