This LibGuide contains resources for performing analysis on non-English texts using Natural Language Processing (NLP) methods, including Python scripts to clean, analyze, and visualize text. It is a companion to the GitHub repository located here.
This tutorial is intentionally simple and introductory. It aims to offer users a jumping-off point for further exploration of NLP and computational tools for text analysis and linguistic research. Because NLP and text analysis work varies so widely, there is no one-size-fits-all approach to writing code for these projects; you should be prepared to write your own code for your own research. Tasks such as training models for named entity recognition were omitted from this tutorial for simplicity’s sake, but should not be skipped when working on your own project. Notes in the code mark the places where such steps could be performed, to help guide you in your own work.