Transcription is the process of copying text from one medium to another. In the Scanner Studio, it means copying text from digitized images into a machine-readable format (like .docx). This is the most manual method of creating machine-readable texts: users type what they read into a new document, usually following standard conventions of transcription.
Optical Character Recognition, or OCR, is an automated method of creating machine-readable texts. An OCR program will interpret the text on a digital image and attempt to render it in a known alphabet. Often, you need to tell the OCR software which language(s) appear in the digital image, and sometimes you also need to help the software understand the layout of the page (for example, the columns of text, photographs, and advertisements in a newspaper).
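As one illustration of specifying languages and layout, here is a minimal sketch in Python that assembles a command line for the open-source Tesseract OCR engine. Tesseract is an assumption made for this example, not necessarily the software available in the Scanner Studio, and the file names and the helper function build_tesseract_command are hypothetical. The -l option names the language(s) to recognize and --psm selects a page segmentation mode, which is how Tesseract is told what kind of layout to expect.

```python
def build_tesseract_command(image_path, output_base, languages, psm=3):
    """Return the argument list for a single Tesseract run.

    languages: Tesseract language codes, e.g. ["eng", "deu"].
    psm: page segmentation mode; 3 is Tesseract's default
         (fully automatic page segmentation).
    """
    if not languages:
        raise ValueError("specify at least one language code")
    return [
        "tesseract",
        image_path,               # the digitized page image
        output_base,              # Tesseract writes output_base + ".txt"
        "-l", "+".join(languages),  # several languages are joined with "+"
        "--psm", str(psm),
    ]


# Hypothetical file names, used only for illustration.
command = build_tesseract_command("page_001.png", "page_001", ["eng", "deu"])
print(" ".join(command))
```

If Tesseract is installed, the resulting list could be passed to subprocess.run to perform the recognition. Other OCR programs expose the same two choices (language and layout) through their own menus or settings.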
Handwritten Text Recognition, or HTR, is an automated method of creating machine-readable texts from handwritten sources and works much like OCR above. An HTR program renders handwritten text into machine-readable, "print" text, which is easier for researchers not only to read, but also to manipulate and explore with digital methods.
NLP stands for Natural Language Processing: a set of methods for programming computers to process and analyze large amounts of language data, such as the text of digital images that have been OCR'd or HTR'd. Because those images have been rendered machine-readable, they are ready for a computer to try to understand them. NLP software helps researchers explore their questions about context, organization, and language use.
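To give a sense of what exploring context and language use can look like, here is a minimal sketch in Python using only the standard library. It counts word frequencies and lists each occurrence of a keyword with its neighboring words (a simple concordance). The sample sentence and the function names are made up for illustration, and the tokenizer is deliberately simple (it handles unaccented English letters only); dedicated NLP software does considerably more.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def keyword_in_context(tokens, keyword, window=3):
    """Return each occurrence of keyword with up to `window` tokens on either side."""
    hits = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            hits.append((" ".join(left), token, " ".join(right)))
    return hits


# Stand-in for text produced by transcription, OCR, or HTR.
sample = "The ship left the harbor at dawn. By noon the ship had reached open water."

tokens = tokenize(sample)
frequencies = Counter(tokens)
print(frequencies.most_common(2))

for left, word, right in keyword_in_context(tokens, "ship"):
    print(f"{left} [{word}] {right}")
```

Word counts point to language use across a text, while the keyword-in-context lines show how a particular term is used each time it appears.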
We recommend that you consider these helpful tools based on the type of project you're working on (transcription, OCR/HTR, NLP, text analysis, or other methods).
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 Generic License.