University of Texas Libraries

Digital Humanities Workshops @PCL

Schedule and course content from Digital Humanities Workshops @PCL series

Fall 2019 Workshops

Fall 2019 Workshop Schedule

Wednesdays from 1:00-2:30pm

PCL Learning Lab 3, unless otherwise noted

9/11- Intro to Scraping and Cleaning Structured Data

In this workshop, participants will learn how to transform structured data from websites into spreadsheet format using (freeware platform). Since technology is not always perfect, participants will also learn the basics of using OpenRefine (open-source tool) to "clean" and structure imperfectly extracted data.

Instructor: Albert A. Palacios

This workshop is presented by the LLILAS Benson Digital Scholarship Series and Libraries Digital Humanities Workshop Series.

Workshop Instructions

9/18- Cleaning Unstructured Data

This workshop serves as an introduction to unstructured data and the use of regular expressions for cleaning data. We will cover the basic concepts of regex characters and how to write patterns for simple matches. 
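
As a rough illustration of the kind of pattern-based cleaning described above, here is a minimal Python sketch. It is not taken from the workshop materials: the helper names `clean_line` and `find_years` and the sample string are hypothetical, and the workshop itself may use a different tool or regex flavor.

```python
import re

def clean_line(text):
    """Collapse runs of whitespace into single spaces and trim both ends."""
    # \s+ matches one or more whitespace characters (spaces, tabs, newlines).
    return re.sub(r"\s+", " ", text).strip()

def find_years(text):
    """Return every standalone four-digit number found in the text."""
    # \b is a word boundary; \d{4} matches exactly four digits.
    return re.findall(r"\b\d{4}\b", text)

# Hypothetical messy input, made up for this example.
sample = "  Letter   dated\t1821,  received 1822 "
print(clean_line(sample))
print(find_years(sample))
```

The two patterns are deliberately simple: one normalizes spacing, the other pulls out a recognizable token, which are common first steps when tidying unstructured text.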

9/25- Organizing Images with Tropy

Tropy is free, open-source software that allows you to organize and describe photographs of research material. In this workshop you will learn the basics of using Tropy to organize your research photos. We will also discuss how you can apply descriptive metadata to your images for information retrieval and to aid in citations.

10/2- Optical Character Recognition (OCR) for Non-Roman Texts

Attendees will be introduced to the basics of optical character recognition (OCR), which allows for full-text searching and other types of text manipulation of a digitized document, with a particular focus on OCR for materials in languages other than English and in scripts other than Roman/Latin. OCR is fairly commonplace for English and other Roman-script languages like French or Spanish, but it does not work as seamlessly for languages such as Arabic, Hindi, or Chinese. This workshop will be an opportunity to explore an open-source OCR tool (Kraken) that has demonstrated success with some non-Roman scripts. The workshop will look at a few different non-Roman scripts; however, participants are encouraged to bring a digitized, highly legible sample text of interest.

This workshop series is presented by the LLILAS Benson Digital Scholarship Series and Digital Humanities Workshops @PCL Series.


This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 Generic License.