University of Texas Libraries
Natural Language Processing for Non-English Text
Tutorials and sample code for performing analysis on non-English texts using Natural Language Processing methods, including Python scripts to clean, analyze, and visualize text.
Corpora and Resources
Awesome-Chinese-NLP
Corpora, tools, and resources for NLP projects in Chinese.
Corpus for Finance (CoFIF)
Reference documents and reports from France’s 60 largest companies, covering 1995 to 2018.
Gallica
French books, academic journals, newspapers, sound recordings, and videos.
German-NLP
Corpora, tools, and resources for NLP projects in German.
Japanese Text Initiative
Classical Japanese literature.
Middle Eastern and North African Newspapers
Arabic newspapers from 1870 to 2019.
Project Gutenberg
eBooks in a variety of languages.
TS Corpus
Corpora, tools, and resources for NLP projects in Turkish.
Twitter API
Tweets, Direct Messages, users, and other Twitter resources are available to download and analyze.
Wikidata
Data dumps from all wikis in different languages.
This work is licensed under a Creative Commons Attribution-NonCommercial 2.0 Generic License.