University of Texas Libraries

Digital Humanities Tools and Resources

Use this guide to learn about the field of Digital Humanities, software tools for humanist research, and resources to get started on new projects.

Text Analysis & Data Mining

Introduction

Computational text analysis tools allow scholars to read bodies of text in new ways, using methods such as machine learning to detect patterns in word frequency across texts. This approach, often called “distant reading,” includes techniques such as topic modeling. It can complement traditional close reading by surfacing patterns that a human reader might not notice without computational assistance, such as word usage, psychological tendencies, and language commonly associated with historical events. To explore what text analysis programs can do, see the list of commonly used tools below.
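Much distant reading starts from simple word counts. The example below is a minimal sketch of that first step in Python, using only the standard library. It lowercases a text, splits it into word tokens, drops common function words, and counts what remains. The tiny stopword list and the regular-expression tokenizer are simplifying assumptions; real projects usually use larger stopword lists and more careful tokenization.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption; real projects use much larger ones).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "was"}


def word_frequencies(text, top_n=5):
    """Return the top_n most frequent non-stopword tokens with their counts."""
    tokens = re.findall(r"[a-z']+", text.lower())  # crude word tokenizer
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(top_n)  # ties keep first-seen order


sample = ("It was the best of times, it was the worst of times, "
          "it was the age of wisdom, it was the age of foolishness")
print(word_frequencies(sample, top_n=3))  # [('times', 2), ('age', 2), ('best', 1)]
```

Tools like Voyant perform this kind of counting behind the scenes, across whole corpora and with interactive displays.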

Tools

 

  • Voyant
    A web-based, dashboard-style tool that allows users to upload a corpus and visualize patterns in various ways. For instance, users can explore colorful word clouds representing word frequency and use line graphs to see how specific words and phrases appear across texts.

Resources: Voyant - Getting Started
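Voyant's line graphs show how often a term appears in different parts of a text or corpus. The sketch below computes the kind of numbers such a graph might plot. It splits one text into consecutive, roughly equal segments and reports a term's relative frequency (occurrences per 1,000 tokens) in each segment. This illustrates the general idea under our own assumptions (segment count, tokenizer, per-1,000 scaling); it is not Voyant's code.

```python
import re


def term_trend(text, term, segments=4):
    """Relative frequency of `term` per 1,000 tokens in each of `segments`
    consecutive, roughly equal slices of the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    n = len(tokens)
    trend = []
    for i in range(segments):
        # Integer boundaries spread the tokens as evenly as possible.
        chunk = tokens[i * n // segments:(i + 1) * n // segments]
        hits = chunk.count(term.lower())
        trend.append(1000 * hits / len(chunk) if chunk else 0.0)
    return trend


text = "Whale sea whale ship land land sea whale"
print(term_trend(text, "whale"))  # [500.0, 500.0, 0.0, 500.0]
```

Plotting these values in order gives a simple trend line for the term.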


  • Mallet
    A Java-based machine learning toolkit run from the command line. Though installing and running it requires some technical skill, it can produce powerful results by generating “topics,” or lists of words that frequently appear together across a corpus.

Resources: Mallet Tutorial PDF, Mallet Tutorial Video
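MALLET itself is a command-line program, and the code below is not MALLET's code. It is a toy Python version of the technique MALLET uses for topic modeling: latent Dirichlet allocation (LDA) estimated by Gibbs sampling. Each word token is repeatedly reassigned to a topic, with probability proportional to how common that topic already is in the token's document and how common the word already is in that topic. The hyperparameter values (`alpha`, `beta`, `iterations`) and the four tiny documents are illustrative assumptions. Results depend on the random seed, so no fixed output is shown.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics=2, alpha=0.1, beta=0.01,
              iterations=200, top_n=3, seed=0):
    """Toy LDA via collapsed Gibbs sampling.

    docs is a list of token lists; returns the top_n words for each topic.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]                # tokens per (doc, topic)
    topic_word = [defaultdict(int) for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics                              # tokens per topic

    def update(d, w, k, delta):
        doc_topic[d][k] += delta
        topic_word[k][w] += delta
        topic_total[k] += delta

    # Start from a random topic for every token.
    z = []
    for d, doc in enumerate(docs):
        z.append([rng.randrange(num_topics) for _ in doc])
        for w, k in zip(doc, z[d]):
            update(d, w, k, +1)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, z[d][i], -1)  # take this token out of the counts
                # Unnormalized probability of each topic given all other assignments.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta) / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                z[d][i] = rng.choices(range(num_topics), weights=weights)[0]
                update(d, w, z[d][i], +1)  # put it back under its new topic

    # A topic is summarized by its most frequently assigned words.
    return [sorted(vocab, key=lambda w: topic_word[k][w], reverse=True)[:top_n]
            for k in range(num_topics)]


docs = [s.split() for s in [
    "whale ship sea harpoon whale sea",
    "sea ship whale harpoon ship",
    "court judge law verdict law",
    "judge trial court law verdict",
]]
for k, words in enumerate(lda_gibbs(docs)):
    print(f"topic {k}: {' '.join(words)}")
```

On real corpora, MALLET's optimized implementation is far faster and adds features such as hyperparameter optimization. This sketch only shows the mechanics.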

 

  • NLTK (Natural Language Toolkit)
    A Python library that allows users to analyze human language data using classification, tokenization, tagging, and more. Though it requires some technical expertise, it can be helpful for teaching and analysis.

Resources: NLTK Examples
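NLTK provides ready-made tools for tokenizing text and listing every occurrence of a word in context (for example, `nltk.word_tokenize` and the `concordance` method of `nltk.text.Text`). The sketch below shows what those two steps do, using only the standard library so it runs without installing NLTK or its downloadable data. It pairs a rough regular-expression tokenizer with a keyword-in-context (KWIC) listing. The regex and the `width` parameter are our own simplifying assumptions, and NLTK's tokenizers handle many cases this one does not.

```python
import re


def tokenize(text):
    """Split text into word tokens (allowing one apostrophe) and punctuation."""
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)


def concordance(tokens, keyword, width=3):
    """Keyword-in-context lines: `width` tokens on each side of every hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}")
    return lines


tokens = tokenize("Call me Ishmael. Some years ago, I went to sea; the sea was calm.")
print(tokens[:4])  # ['Call', 'me', 'Ishmael', '.']
for line in concordance(tokens, "sea"):
    print(line)
# I went to [sea] ; the sea
# sea ; the [sea] was calm .
```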

  • Gephi
    Open-source software for building colorful graphs and networks from textual data, revealing links between textual objects, social network patterns, and more. Gephi is easy to use and popular among humanists and social scientists alike.

Resources: Gephi Quick Start
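Gephi builds networks from node and edge tables, so a common preparatory step is turning text into an edge list. The sketch below assumes each document (here, a scene) has already been reduced to a list of terms such as character names. It counts how many documents each pair of terms shares and writes the result as CSV with `Source`, `Target`, and `Weight` columns. To our knowledge, Gephi's spreadsheet importer recognizes these headers, but check the Gephi documentation for your version. The sample scenes are invented for illustration.

```python
import csv
import io
from collections import Counter
from itertools import combinations


def cooccurrence_edges(documents):
    """Count the number of documents in which each pair of terms co-occurs."""
    edges = Counter()
    for doc in documents:
        terms = sorted(set(doc))  # unique terms, sorted so each pair has one order
        for a, b in combinations(terms, 2):
            edges[(a, b)] += 1
    return edges


def to_gephi_csv(edges):
    """Serialize an undirected, weighted edge list as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (a, b), weight in sorted(edges.items()):
        writer.writerow([a, b, weight])
    return buf.getvalue()


scenes = [["Ahab", "Starbuck", "Stubb"], ["Ahab", "Starbuck"], ["Ishmael", "Queequeg"]]
print(to_gephi_csv(cooccurrence_edges(scenes)))
# Source,Target,Weight
# Ahab,Starbuck,2
# Ahab,Stubb,1
# Ishmael,Queequeg,1
# Starbuck,Stubb,1
```

Saving this text to a `.csv` file gives a file you can load as an edge table in Gephi, where `Weight` can drive edge thickness in the layout.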

  • Beautiful Soup
    A Python library for web scraping that pulls data out of HTML and XML files. It lets you navigate and search a document through its parse tree.

Resources: Web scraping and parsing with Beautiful Soup 4 Introduction, Beautiful Soup Documentation
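The sketch below shows the parse-and-extract pattern that Beautiful Soup makes convenient. It uses Python's built-in `html.parser` module so it runs without installing anything. It collects the text of each paragraph and the target of each link. With Beautiful Soup installed, roughly the same result comes from `BeautifulSoup(html, "html.parser")` followed by `find_all("p")` with `get_text()` and `find_all("a", href=True)`. The class name `ParagraphLinkExtractor` and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class ParagraphLinkExtractor(HTMLParser):
    """Collect the text of each <p> element and the href of each <a> element."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self.links = []
        self._in_p = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._buffer = []
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            self.paragraphs.append("".join(self._buffer).strip())
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:  # ignore text outside paragraphs (headings, etc.)
            self._buffer.append(data)


html = """<html><body>
<h1>Letters</h1>
<p>Dear <a href="/people/smith">Mr. Smith</a>,</p>
<p>See the <a href="/archive/1851">1851 archive</a>.</p>
</body></html>"""

parser = ParagraphLinkExtractor()
parser.feed(html)
parser.close()
print(parser.paragraphs)  # ['Dear Mr. Smith,', 'See the 1851 archive.']
print(parser.links)       # ['/people/smith', '/archive/1851']
```

Beautiful Soup saves you from writing handler classes like this one and copes better with messy real-world markup.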

  • TAPoR Portal
    A portal to curated lists of text analysis tools commonly used in the field. Examples include lists of easy-to-use tools and of newer, still-developing tools that open up new research possibilities.

Resources: TAPoR Tutorial

  • R
    A programming language commonly used by humanists for statistical analysis and data visualization.

Resources: R Manuals, R Programming for Absolute Beginners

  • Python
    A high-level programming language widely used to build and run text analysis tools in the digital humanities.

Resources: Python Beginners Guide for Non-Program

Readings

Argamon, S., & Olsen, M. (2009). Words, patterns, and documents: Experiments in machine learning and text analysis. Digital Humanities Quarterly, 3(2). Retrieved from http://digitalhumanities.org:8081/dhq/vol/3/2/000041/000041.html

Brett, M. R. (2012). Topic modeling: A basic introduction. Journal of Digital Humanities, 2(1). Retrieved from http://journalofdigitalhumanities.org/2-1/topic-modeling-a-basic-introduction-by-megan-r-brett/

Chen, Ho, S.-Y., & Chang, C. (2023). A hierarchical topic analysis tool to facilitate digital humanities research. Aslib Journal of Information Management, 75(1), 1–19. https://doi.org/10.1108/AJIM-11-2021-0325

El-Hajj, H., Zamani, M., Büttner, J., Martinetz, J., Eberle, O., Shlomi, N., Siebold, A., Montavon, G., Müller, K.-R., Kantz, H., & Valleriani, M. (2022). An Ever-Expanding Humanities Knowledge Graph: The Sphaera Corpus at the Intersection of Humanities, Data Management, and Machine Learning. Datenbank-Spektrum, 22(2), 153–162. https://doi.org/10.1007/s13222-022-00414-1

Elliott, T., & Gillies, S. (2009). Digital geography and classics. Digital Humanities Quarterly, 3(1). Retrieved from http://www.digitalhumanities.org/dhq/vol/3/1/000031/000031.html

Gregory, I., Donaldson, C., Murrieta-Flores, P., & Rayson, P. (2015). Geoparsing, GIS, and textual analysis: Current developments in spatial humanities research. International Journal of Humanities and Arts Computing, 9(1), 1–14.

Sharma, Kumar, S., & Sharma, A. (2021). Literature and Cultural Studies Through Data Mining. ICFAI Journal of English Studies, 16(4), 119–125.

 

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 Generic License.