University of Texas Libraries

Natural Language Processing for Non-English Text

Tutorials and sample code for performing analysis on non-English texts using Natural Language Processing methods, including Python scripts to clean, analyze, and visualize text.

Further Reading

Agarwal, Vaibhav, and Parteek Kumar. 2018. “UNLization of Punjabi Text for Natural Language Processing Applications.” Sadhana 43 (6): 1–1.

Azmi, Aqil M., Abdulaziz O. Al-Qabbany, and Amir Hussain. 2019. “Computational and Natural Language Processing Based Studies of Hadith Literature: A Survey.” Artificial Intelligence Review 52 (2): 1369–1414.

Gotti, Fabrizio, and Philippe Langlais. 2018. “From French Wikipedia to Erudit: A Test Case for Cross‐domain Open Information Extraction.” Computational Intelligence 34 (2): 420–39.

Jaśkiewicz, Grzegorz Jan. 2013. “Geolocalization of XIX Century Villages and Cities Mentioned in Geographical Dictionary of the Kingdom of Poland.” Computer Science 14 (3): 423.

Liu, Honglei, Yan Xu, Zhiqiang Zhang, Ni Wang, Yanqun Huang, Yanjun Hu, Zhenghan Yang, Rui Jiang, and Hui Chen. 2020. “A Natural Language Processing Pipeline of Chinese Free-Text Radiology Reports for Liver Cancer Diagnosis.” IEEE Access 8: 159110–19.

Moncla, Ludovic, Mauro Gaio, Thierry Joliveau, Yves-François Le Lay, Noémie Boeglin, and Pierre-Olivier Mazagol. 2019. “Mapping Urban Fingerprints of Odonyms Automatically Extracted from French Novels.” International Journal of Geographical Information Science 33 (12): 2477–97.

Onyenwe, Ikechukwu E., Mark Hepple, Uchechukwu Chinedu, and Ignatius Ezeani. 2018. “A Basic Language Resource Kit Implementation for the Igbo NLP Project.” ACM Transactions on Asian and Low-Resource Language Information Processing 17 (2): 1–23.

Oramas, Sergio, Luis Espinosa-Anke, Francisco Gómez, and Xavier Serra. 2018. “Natural Language Processing for Music Knowledge Discovery.” Journal of New Music Research 47 (4): 365–82.

Plaza-del-Arco, Flor Miriam, M. Dolores Molina-González, L. Alfonso Ureña-López, and M. Teresa Martín-Valdivia. 2021. “Comparing Pre-Trained Language Models for Spanish Hate Speech Detection.” Expert Systems with Applications 166 (March): 114120.

Schulz, Sarah, and Nora Ketschik. 2019. “From 0 to 10 Million Annotated Words: Part-of-Speech Tagging for Middle High German.” Language Resources and Evaluation 53 (4): 837–63.

Shrestha, Hewan, Chandramohan Dhasarathan, Shanmugam Munisamy, and Amudhavel Jayavel. 2020. “Natural Language Processing Based Sentimental Analysis of Hindi (SAH) Script an Optimization Approach.” International Journal of Speech Technology, July.

Spaiser, Viktoria, Thomas Chadefaux, Karsten Donnay, Fabian Russmann, and Dirk Helbing. 2017. “Communication Power Struggles on Social Media: A Case Study of the 2011–12 Russian Protests.” Journal of Information Technology & Politics 14 (2): 132–53.

Wevers, Melvin, and Jesper Verhoef. 2018. “Coca-Cola: An Icon of the American Way of Life. An Iterative Text Mining Workflow for Analyzing Advertisements in Dutch Twentieth-Century Newspapers.” Digital Humanities Quarterly 11 (4).

William, Andika, and Yunita Sari. 2020. “CLICK-ID: A Novel Dataset for Indonesian Clickbait Headlines.” Data in Brief 32 (August).

Yu, Liang-Chih, Wei-Cheng He, Wei-Nan Chien, and Yuen-Hsien Tseng. 2013. “Identification of Code-Switched Sentences and Words Using Language Modeling Approaches.” Mathematical Problems in Engineering 2013: 1–7.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.