University of Texas Libraries

Introduction to Optical Character Recognition Workshop

This guide was created to support the Introduction to Optical Character Recognition DH workshop in Spring 2024: This workshop introduces the basics of optical character recognition (OCR), which allows for full-text searching and other types of text man


What is OCR?

OCR stands for optical character recognition, an automated method of creating machine-readable text. An OCR program interprets the text in a digital image and attempts to render it in a known alphabet. Often, you need to tell the OCR software which language(s) appear in the image, and sometimes you also need to help it understand the layout of the page (for example, the columns of text, photographs, and advertisements in a newspaper). Notable OCR-focused projects include Mapping Texts, Eighteenth Century Collections Online, and the Nusus Corpus.
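To make the two inputs described above concrete (the language or languages on the page, and a hint about its layout), here is a minimal sketch using the open-source Tesseract command-line tool. Tesseract is an assumption made for illustration; this guide does not prescribe a particular OCR program. The function names and file names are hypothetical. The `-l` and `--psm` (page segmentation mode) options are Tesseract's own.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, output_base, languages, page_seg_mode=3):
    """Build the argument list for one Tesseract run.

    languages: Tesseract language codes, e.g. ["eng"] or ["eng", "ara"].
    page_seg_mode: layout hint; 3 is fully automatic segmentation (the default),
    4 assumes a single column of text, 6 assumes a single uniform block of text.
    """
    if not languages:
        raise ValueError("Specify at least one language code, e.g. 'eng'.")
    return [
        "tesseract",
        image_path,
        output_base,               # Tesseract appends ".txt" to this name
        "-l", "+".join(languages),  # multiple languages are joined with "+"
        "--psm", str(page_seg_mode),
    ]


def run_ocr(image_path, output_base, languages, page_seg_mode=3):
    """Run Tesseract and return the path of the text file it writes."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("Tesseract does not appear to be installed.")
    command = build_tesseract_command(image_path, output_base, languages, page_seg_mode)
    subprocess.run(command, check=True)
    return output_base + ".txt"
```

For example, `run_ocr("newspaper_page.png", "newspaper_page", ["eng", "ara"])` would request OCR of a page containing English and Arabic text, provided Tesseract and both language data files are installed. Complex layouts such as multi-column newspapers may still need manual correction or a different segmentation mode.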

You can access the slides for this presentation at this link.

You can access the recording of this workshop at this link.

Middle Eastern Studies Librarian & History Coordinator

Dale J. Correa


This guide was developed by Dale J. Correa, Middle Eastern Studies Librarian & History Coordinator at the UT Libraries, in collaboration with Mercedes Morris and Talya Stanke, dual degree Middle Eastern Studies and iSchool students.

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.