University of Texas Libraries

LAS 384 Proseminar - Digital Humanities

Data Organization

Question

What type of Data, Archive, or Collections are you working with?

Answer the question via Slido.

Data Organization

There’s no one way to organize your data, but a consistent and descriptive file structure can save you time and money later. Use a system that makes sense to you so that keeping things in order becomes a habit instead of a chore.

Why bother?

  • Save time and money. Well-organized, easy-to-find files make your research more efficient. This is especially important if you are sharing active data with collaborators.
  • Increase your impact. In the long run, well-managed data are more discoverable, accessible, and reusable (including by your future self!). This will help increase the visibility and impact of your work.
  • Demonstrate integrity. Discoverable, accessible, and reusable data are fundamental to ensuring reproducible or replicable research.
  • It’s required. Most funding agencies now require a data management plan for grant proposals. Publishers increasingly insist on data sharing as a condition of publishing your work. Organized, reusable data help demonstrate compliance.

File Formats

Choosing file formats carefully helps avoid obsolescence. Use formats that are:

  • Non-proprietary, open, documented standards (e.g., .tif, .txt, .csv, .pdf)
  • Encoded with standard characters (e.g., ASCII, UTF-8)
  • Used commonly in your research community
  • Unencrypted
  • Uncompressed

Examples of open formats

  • Containers: TAR, GZIP, ZIP
  • Databases: XML, CSV
  • Geospatial: SHP, DBF, GeoTIFF, NetCDF
  • Moving images: MOV, MPEG, AVI, MXF, WMV
  • Sounds: WAVE, AIFF, MP3, MXF
  • Statistics: ASCII, DTA, POR, SAS, SAV
  • Still images: TIFF, JPEG 2000, PDF, PNG, GIF, BMP
  • Tabular data: CSV
  • Text: XML, PDF/A, HTML, ASCII, UTF-8
  • Web archive: WARC

For more information on recommended file formats, see the Library of Congress Recommended Formats Statement.
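The format criteria above can be partly checked by script. The following is a minimal sketch, not a complete validator: the allow-list `PREFERRED_EXTENSIONS` and the function names are hypothetical, chosen for illustration, and the only encoding test is whether a text file decodes as UTF-8 (ASCII is a subset of UTF-8, so ASCII files pass too).

```python
from pathlib import Path

# Hypothetical allow-list based on extensions named in this guide; adjust for your field.
PREFERRED_EXTENSIONS = {".tif", ".txt", ".csv", ".pdf", ".xml", ".png"}
TEXT_EXTENSIONS = {".txt", ".csv", ".xml"}


def is_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def format_problems(path: Path) -> list:
    """List reasons a file may not meet the guidelines (an empty list means none found)."""
    problems = []
    ext = path.suffix.lower()
    if ext not in PREFERRED_EXTENSIONS:
        problems.append(f"extension {ext or '(none)'} is not on the preferred list")
    # Only read the file if it exists and is expected to contain text.
    if ext in TEXT_EXTENSIONS and path.exists() and not is_utf8(path.read_bytes()):
        problems.append("text is not valid UTF-8")
    return problems
```

Calling `format_problems(Path("notes.docx"))` would flag the extension, while a UTF-8 encoded `.csv` file would return an empty list. A check like this cannot tell whether a format is well documented or common in your research community; those criteria still need human judgment.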

File Naming

Adopt a naming convention and use it throughout a project (or throughout your career). Consider including a README.txt file that explains your naming convention and any codes or abbreviations you use. File names should:

  • Describe the contents of the file, but not be overly long. Avoid generic names (like draft.doc; final2.xls) that can be hard to decipher and easily overwritten.
  • Include dates. Don’t rely on system dates, which can be misleading. Recommended formats look like: YYYYMMDD or YYYY-MM-DD.
  • Reserve 3-letter file extensions for application-specific codes (e.g., .jpg, .mov, .tif).
  • Not contain special characters such as " / \ : * ? < > [ ] & $. These have meaning in software and operating systems and can cause trouble.
  • Not contain spaces. These are problematic for some operating systems. Use underscores (file_name), dashes (file-name), or camel case (FileName) instead.
  • Be consistent.
  • Example: Project_instrument_location_date_time_version.ext
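A naming convention is easier to keep consistent when a small script builds and checks the names. The sketch below is one possible way to do that in Python. The function names, the `v01`-style version suffix, and the choice to replace forbidden characters with a dash are assumptions made for illustration, and the time component from the example pattern is left out for brevity.

```python
import re
from datetime import date

# Characters this guide says to avoid, plus any whitespace.
FORBIDDEN = re.compile(r'[\s"/\\:*?<>\[\]&$]')


def build_name(parts, day, version, ext):
    """Join descriptive parts, a YYYYMMDD date, and a version into one file name."""
    cleaned = [FORBIDDEN.sub("-", p.strip()) for p in parts]
    stamp = day.strftime("%Y%m%d")  # YYYYMMDD sorts chronologically
    return "_".join(cleaned + [stamp, f"v{version:02d}"]) + "." + ext.lstrip(".")


def name_problems(name):
    """Return a list of guideline violations found in a file name."""
    problems = []
    if FORBIDDEN.search(name):
        problems.append("contains a space or special character")
    if not re.search(r"\d{4}-?\d{2}-?\d{2}", name):
        problems.append("no YYYYMMDD or YYYY-MM-DD date")
    return problems
```

For example, `build_name(["LAS384", "interview", "Oaxaca"], date(2024, 3, 5), 2, "txt")` produces `LAS384_interview_Oaxaca_20240305_v02.txt`, and `name_problems("final draft.doc")` reports both a space and a missing date. The date check only looks for eight digits in the right shape; it does not confirm that they form a real calendar date.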

Organization Tools

Tools for Renaming Files

Bulk Rename Utility

A tool that can rename files in bulk. It is free for personal, non-commercial use (it is not open source) and runs only on Windows.

Bulk Rename Utility Tutorial

Renaming Function on Mac

Finder on macOS has a built-in rename feature: right-click a file to rename it, or select several files and right-click to rename them all at once.

Renamer - Batch File Renamer for Mac

A tool that can rename files in bulk. It runs only on Macs.

Getting Started
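If you prefer a script to a graphical tool, bulk renaming can also be sketched in a few lines of Python. This is an illustrative sketch only, not how any of the tools above work internally. The function names are hypothetical, the cleaning rule (replace runs of spaces and special characters with one underscore, lowercase the extension, keep accented letters) is an assumption, and the text after the last dot is assumed to be the extension. The plan is built first so you can review it before anything is renamed.

```python
import re
from pathlib import Path


def safe_name(name):
    """Replace runs of spaces and special characters with a single underscore."""
    stem, dot, ext = name.rpartition(".")
    if not dot:  # no extension present
        stem, ext = name, ""
    # \w keeps letters (including accented ones), digits, and underscores.
    stem = re.sub(r"[^\w-]+", "_", stem).strip("_")
    return f"{stem}.{ext.lower()}" if ext else stem


def plan_renames(folder):
    """Map each file in folder to its cleaned path, skipping names that are already clean."""
    plan = {}
    for path in sorted(Path(folder).iterdir()):
        new = safe_name(path.name)
        if path.is_file() and new != path.name:
            plan[path] = path.with_name(new)
    return plan


def apply_renames(plan):
    """Carry out the plan, refusing to overwrite a different existing file."""
    for old, new in plan.items():
        if new.exists() and not new.samefile(old):
            raise FileExistsError(f"{new} already exists; not overwriting")
        old.rename(new)
```

A cautious workflow is to print the result of `plan_renames("my_folder")`, check each proposed name, and only then call `apply_renames` on it. Work on a copy of the folder the first time.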


Tools for Cleaning Data

OpenRefine

OpenRefine is a powerful, free, open-source tool for working with messy data: cleaning it and transforming it from one format into another.

Introduction to OpenRefine

Python

Python is a programming language that is useful for data wrangling.

Data Wrangling with Python
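To give a sense of what data wrangling in Python can look like, here is a small sketch that uses only the standard library. The sample data and the `clean_rows` function are hypothetical and are not taken from the linked tutorial. The sketch trims stray whitespace from each cell and drops rows that become exact duplicates once cleaned.

```python
import csv
import io


def clean_rows(rows):
    """Trim whitespace, collapse internal runs of spaces, and drop exact duplicate rows."""
    seen = set()
    for row in rows:
        cleaned = tuple(" ".join(cell.split()) for cell in row)
        if cleaned not in seen:
            seen.add(cleaned)
            yield list(cleaned)


# Hypothetical sample standing in for a messy spreadsheet export.
raw = "title,place\n  Carta  1 ,Oaxaca\nCarta 1,Oaxaca\nCarta 2, Lima \n"
cleaned = list(clean_rows(csv.reader(io.StringIO(raw))))

# Write the cleaned rows back out as CSV text.
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(cleaned)
```

With a real file you would replace `io.StringIO(raw)` with `open("your_file.csv", newline="", encoding="utf-8")`. For larger or more complex cleaning jobs, OpenRefine or a dedicated data library may be a better fit.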


Tools for Organizing Data

Zotero

Zotero is a free option for citation management.

Getting Started

Tropy

Tropy is free, open-source software that allows you to organize and describe photographs of research material.

How to Use Tropy Guide

Recogito 

Recogito is an annotation tool for texts and images.

10 Minute Tutorial

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.