For a useful overview, please see the UT Libraries Digital Humanities LibGuide.
UTL has licensed access to:
Corpus of Historical American English (COHA) and Corpus of Global Web-based English (GloWbE). Additional information.
The Caselaw Access Project makes 360 years of case law freely available online as a machine-readable text corpus, digitized from the collections of the Harvard Law School Library. Here are some ways to access the data.
Here are some of the ways people have been using the data.
Chinese-English Parallel Corpora: TranslateFX researchers and linguists have developed these corpora, composed of aligned sentence pairs from quality bilingual texts, covering the financial and legal domains in Hong Kong.
Digitized Archives from Digital Libraries - from UIUC
Digital Humanities LibGuide - Datasets: humanities, social sciences, and government datasets.
Social Media corpora:
reddit APIs: Access data from posts, threads, comments, and users from reddit and subreddits.
Social Sciences Data LibGuide: subscription and free social science data resources.
Twitter Streaming APIs: public streams give access to the public data flowing through Twitter.
Yelp Fusion API: Access to business data, including location, photos, Yelp rating, price, hours, types of transactions.
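As a rough illustration of how one of the APIs above might be queried, here is a minimal Python sketch for the Yelp Fusion business search endpoint. It assumes you have registered for your own API key; the helper names (`build_search_request`, `summarize_businesses`, `search`) and the example search values are hypothetical and chosen only for this sketch. The field names follow the business search response as we understand it, so check the current Yelp documentation before relying on them. Fields such as hours and photos may only be available from the separate business details endpoint.

```python
import json
import urllib.parse
import urllib.request

# Business search endpoint of the Yelp Fusion API (v3).
YELP_SEARCH_URL = "https://api.yelp.com/v3/businesses/search"


def build_search_request(api_key, term, location, limit=5):
    """Build (but do not send) an authenticated search request."""
    query = urllib.parse.urlencode(
        {"term": term, "location": location, "limit": limit}
    )
    return urllib.request.Request(
        f"{YELP_SEARCH_URL}?{query}",
        # Yelp Fusion expects the API key as a bearer token.
        headers={"Authorization": f"Bearer {api_key}"},
    )


def summarize_businesses(payload):
    """Keep a few fields per business; missing fields become None or empty."""
    rows = []
    for biz in payload.get("businesses", []):
        address_lines = biz.get("location", {}).get("display_address", [])
        rows.append(
            {
                "name": biz.get("name"),
                "rating": biz.get("rating"),
                "price": biz.get("price"),
                "address": ", ".join(address_lines),
                "transactions": biz.get("transactions", []),
            }
        )
    return rows


def search(api_key, term, location, limit=5):
    """Send the request and return the summarized results (needs network)."""
    request = build_search_request(api_key, term, location, limit)
    with urllib.request.urlopen(request) as response:
        return summarize_businesses(json.load(response))
```

With a valid key, a call such as `search("YOUR_API_KEY", "coffee", "Boston, MA")` would return a list of dictionaries that can be written to CSV or loaded into a data frame for analysis. Keeping request construction separate from the network call makes the query easy to inspect before any quota is used.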
This work is licensed under a Creative Commons Attribution-NonCommercial 2.0 Generic License.