University of Texas Libraries

Natural Language Processing for Non-English Text

Tutorials and sample code for performing analysis on non-English texts using Natural Language Processing methods, including Python scripts to clean, analyze, and visualize text.


Welcome!

This LibGuide contains resources for performing analysis on non-English texts using Natural Language Processing methods, including Python scripts to clean, analyze, and visualize text. It is a companion to the accompanying GitHub repository.

This tutorial is intentionally simple and introductory, and aims to offer users a jumping-off point for further exploration of NLP and computational tools in text analysis and linguistic research. Because NLP and text analysis work is so varied, there is no one-size-fits-all approach to writing code for these projects; as such, you should be prepared to write your own code for your own research. Tasks such as training models for named entity recognition were omitted from this tutorial for simplicity’s sake, but should not be skipped when working on your own project. Notes in the code mark places where such steps could be performed, to help guide your own work.
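As an illustration of the kind of cleaning and analysis step described above, here is a minimal sketch using only the Python standard library. It is not taken from the companion repository: the function names `clean_text` and `word_frequencies` and the Spanish sample sentence are hypothetical. It assumes that Unicode NFC normalization, case folding, and removing punctuation and digits are appropriate for your language, which will not hold for every project.

```python
import re
import unicodedata
from collections import Counter


def clean_text(text: str) -> str:
    """Hypothetical helper: normalize, case-fold, and strip punctuation and digits."""
    # NFC merges decomposed sequences (e.g. "i" + combining acute accent) into one character
    text = unicodedata.normalize("NFC", text)
    # casefold() is a more aggressive lowercase suited to non-English text (e.g. "ß" -> "ss")
    text = text.casefold()
    # In Python 3, \w matches Unicode letters, so accented letters survive this step
    text = re.sub(r"[^\w\s]", " ", text)
    # \w also matches digits and the underscore, so remove those separately
    text = re.sub(r"[\d_]", " ", text)
    return text


def word_frequencies(text: str) -> Counter:
    """Hypothetical helper: count whitespace-separated words after cleaning."""
    return Counter(clean_text(text).split())


sample = "¡El gato duerme! El perro no duerme; el gato sí."
freqs = word_frequencies(sample)
print(freqs.most_common(3))  # → [('el', 3), ('gato', 2), ('duerme', 2)]
```

Normalizing before filtering matters: without the NFC step, a decomposed character such as `i` followed by a combining acute accent would have its accent stripped as a non-word character, turning `sí` into `si`.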

What is Natural Language Processing?

Natural Language Processing (NLP) is a branch of artificial intelligence. Its goal is to make human, or natural, language understandable to a computer, which it does using syntactic analysis (the structure of sentences) and semantic analysis (their meaning). Although NLP algorithms can be very sophisticated, they do not always yield perfect results: each human language has its own conventions, and it can be difficult to express these rules in a form a machine can process. Despite these challenges, applying NLP to textual analysis can uncover important and interesting information and insights.
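To make the point about differing conventions concrete: splitting on whitespace is a reasonable first approximation of word tokenization for Spanish, but Japanese does not separate words with spaces, so the same approach returns the whole sentence as a single token. The sketch below uses hypothetical helper names and only the standard library. It contrasts the two languages and shows overlapping character bigrams as one crude, language-agnostic fallback; real projects would normally use a word segmenter designed for the language instead.

```python
import unicodedata


def whitespace_tokens(text: str) -> list[str]:
    """Hypothetical naive tokenizer: split on whitespace only."""
    return text.split()


def char_ngrams(text: str, n: int = 2) -> list[str]:
    """Hypothetical fallback: overlapping character n-grams, no word boundaries needed."""
    # Keep characters that are neither whitespace nor Unicode punctuation (categories P*)
    chars = [c for c in text
             if not c.isspace() and not unicodedata.category(c).startswith("P")]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


spanish = "El gato duerme en la casa."
japanese = "猫は家で寝ている。"  # "The cat is sleeping at home."

print(whitespace_tokens(spanish))   # ['El', 'gato', 'duerme', 'en', 'la', 'casa.']
print(whitespace_tokens(japanese))  # ['猫は家で寝ている。']
print(char_ngrams(japanese))        # ['猫は', 'は家', '家で', 'で寝', '寝て', 'てい', 'いる']
```

Note that `casa.` keeps its trailing period, which is one reason a cleaning step usually comes before tokenization.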

Created by

This LibGuide was created by Madeline Goebel as part of the UT Libraries' Global Studies Digital Projects Graduate Research Assistantship.

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 Generic License.