University of Texas Libraries

Scan Tech Studio

This guide provides orienting information and tutorials for the Digitization and Text Recognition Hub in the PCL Scholars Lab.

Research Data Management & Preservation

Make Organization a Habit

There’s no one way to organize your data, but a consistent and descriptive file structure can save you time and money later. Use a system that makes sense to you so that keeping things in order becomes a habit instead of a chore.

Why bother?

  • Save time and money. Well-organized, easy-to-find files make your research more efficient. This is especially important if you are sharing active data with collaborators.
  • Increase your impact. In the long run, well-managed data are more discoverable, accessible, and reusable (including by your future self!). This will help increase the visibility and impact of your work.
  • Demonstrate integrity. Discoverable, accessible, and reusable data are fundamental to ensuring reproducible or replicable research.
  • It’s required. Most funding agencies now require a data management plan for grant proposals. Publishers increasingly insist on data sharing as a condition of publishing your work. Organized, reusable data help demonstrate compliance.

File Formats

Choosing file formats carefully helps avoid obsolescence. Use formats that are:

  • Non-proprietary, open, documented standards (e.g., .tif, .txt, .csv, .pdf)
  • Encoded with standard characters (e.g., ASCII, UTF-8)
  • Used commonly in your research community
  • Unencrypted
  • Uncompressed
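Re-encoding a legacy text or CSV file to UTF-8 is one of the easier format fixes to script. The following is a minimal Python sketch, assuming the source file was saved in Windows-1252 (`cp1252`). That encoding is an assumption, so confirm what your files actually use before converting. `convert_to_utf8` and `is_valid_utf8` are hypothetical helper names, not part of any library.

```python
import tempfile
from pathlib import Path


def is_valid_utf8(path: Path) -> bool:
    """Return True if the file's bytes decode cleanly as UTF-8."""
    try:
        path.read_bytes().decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def convert_to_utf8(src: Path, dest: Path, source_encoding: str = "cp1252") -> bool:
    """Re-encode src from source_encoding to UTF-8 and write the result to dest.

    source_encoding is an ASSUMPTION; check what your files really use.
    Returns False (and writes nothing) if src already decodes as UTF-8,
    so files that are already fine are not double-encoded.
    newline="" keeps the original line endings unchanged.
    """
    if is_valid_utf8(src):
        return False
    with src.open("r", encoding=source_encoding, errors="strict", newline="") as f:
        text = f.read()
    with dest.open("w", encoding="utf-8", newline="") as f:
        f.write(text)
    return True


if __name__ == "__main__":
    # Demo on a throwaway file so nothing real is touched.
    with tempfile.TemporaryDirectory() as tmp:
        legacy = Path(tmp) / "notes_legacy.csv"
        legacy.write_bytes("id,name\n1,café\n".encode("cp1252"))
        fixed = Path(tmp) / "notes_utf8.csv"
        print(is_valid_utf8(legacy))           # False
        print(convert_to_utf8(legacy, fixed))  # True
        print(is_valid_utf8(fixed))            # True
```

A wrong guess about the source encoding does not always raise an error. For example, UTF-8 bytes read as `cp1252` decode into garbled characters such as "Ã©", so spot-check converted files.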

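Examples of recommended formats (most are open, though a few, such as WMV and the DTA, POR, SAS, and SAV statistics formats, are proprietary)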

  • Containers: TAR, GZIP, ZIP
  • Databases: XML, CSV
  • Geospatial: SHP, DBF, GeoTIFF, NetCDF
  • Moving images: MOV, MPEG, AVI, MXF, WMV
  • Sound: WAVE, AIFF, MP3, MXF
  • Statistics: plain text (ASCII), DTA, POR, SAS, SAV
  • Still images: TIFF, JPEG 2000, PDF, PNG, GIF, BMP
  • Tabular data: CSV
  • Text: XML, PDF/A, HTML, plain text (ASCII, UTF-8)
  • Web archive: WARC
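If you inherit an existing collection, a quick inventory of file extensions can show which files might need converting. The Python sketch below walks a folder and flags any extension that is not on a preferred list. The `PREFERRED` set is a hypothetical starting point loosely based on the examples above, and the extensions are approximations (GeoTIFFs, for instance, usually end in `.tif`). `audit_extensions` and the script name `audit_formats.py` are illustrative only. An extension is just a hint; it does not prove what format a file is actually in.

```python
import sys
from collections import Counter
from pathlib import Path

# HYPOTHETICAL preferred list, loosely based on the examples above.
# Edit it to match your project and your research community's norms.
PREFERRED = {
    ".tar", ".gz", ".zip",                            # containers
    ".xml", ".csv",                                   # databases / tabular
    ".shp", ".dbf", ".tif", ".tiff", ".nc",           # geospatial
    ".mov", ".mpg", ".mpeg", ".avi", ".mxf", ".wmv",  # moving images
    ".wav", ".aif", ".aiff", ".mp3",                  # sound
    ".dta", ".por", ".sav", ".sas7bdat",              # statistics
    ".jp2", ".pdf", ".png", ".gif", ".bmp",           # still images
    ".txt", ".html", ".htm", ".warc",                 # text / web archive
}


def audit_extensions(root: Path) -> tuple[Counter, list[Path]]:
    """Count the extensions of all files under root (recursively).

    Returns (counts, flagged), where flagged lists files whose lowercased
    extension is not in PREFERRED. Files with no extension count as "".
    """
    counts: Counter = Counter()
    flagged: list[Path] = []
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        ext = path.suffix.lower()
        counts[ext] += 1
        if ext not in PREFERRED:
            flagged.append(path)
    return counts, sorted(flagged)


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: python audit_formats.py FOLDER")  # hypothetical script name
    else:
        counts, flagged = audit_extensions(Path(sys.argv[1]))
        for ext, n in counts.most_common():
            print(f"{ext or '(none)'}\t{n}")
        for path in flagged:
            print("check:", path)
```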

For more information on recommended file formats, see the Library of Congress Recommended Formats Statement.

File Naming

Adopt a naming convention and use it throughout a project (or throughout your career). Consider including a README.txt file that explains your naming convention and any codes or abbreviations you use. File names should:

  • Describe the contents of the file, but not be overly long. Avoid generic names (like draft.doc; final2.xls) that can be hard to decipher and easily overwritten.
  • Include dates. Don’t rely on system dates, which can change when files are copied or modified. Use YYYYMMDD or YYYY-MM-DD (year first) so that file names sort in chronological order.
  • Reserve the file extension (usually three letters, e.g., .jpg, .mov, .tif) for identifying the file format or application.
  • Not contain special characters such as / \ : * ? " < > [ ] & $. These characters have special meanings in software and operating systems and can cause errors.
  • Not contain spaces, which can cause problems in some operating systems and command-line tools. Use underscores (file_name), dashes (file-name), or camel case (FileName) instead.
  • Be consistent.
  • Example: Project_instrument_location_date_time_version.ext
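A naming convention is easier to keep when a script builds and checks the names for you. The Python sketch below assumes the example pattern Project_instrument_location_date_time_version.ext, with underscores separating fields (so a field itself may not contain underscores; use dashes or camel case inside a field), dates as YYYYMMDD, times as 24-hour HHMM, and versions as v01, v02, and so on. `build_name` and `check_name` are hypothetical helpers, and these format choices are illustrative rather than a standard.

```python
import re
from datetime import datetime

# Characters the guidance above says to avoid, plus any whitespace.
FORBIDDEN = re.compile(r'[/\\:*?"<>\[\]&$\s]')
# A YYYY-MM-DD or YYYYMMDD date not embedded in a longer run of digits.
DATE = re.compile(r"(?<!\d)(\d{4}-\d{2}-\d{2}|\d{8})(?!\d)")


def build_name(project: str, instrument: str, location: str,
               when: datetime, version: int, ext: str) -> str:
    """Assemble Project_instrument_location_YYYYMMDD_HHMM_vNN.ext."""
    fields = [project, instrument, location]
    for field in fields:
        # Underscores are reserved as field separators in this sketch.
        if not field or FORBIDDEN.search(field) or "_" in field:
            raise ValueError(f"bad field {field!r}: empty, or has a space, "
                             "underscore, or special character")
    stamp = [when.strftime("%Y%m%d"), when.strftime("%H%M"), f"v{version:02d}"]
    return "_".join(fields + stamp) + "." + ext.lstrip(".").lower()


def check_name(name: str) -> list[str]:
    """Return a list of problems with a file name; an empty list means it passes."""
    problems = []
    stem, dot, _ext = name.rpartition(".")
    if not dot:
        problems.append("no file extension")
        stem = name
    if FORBIDDEN.search(name):
        problems.append("contains a space or special character")
    if not DATE.search(stem):
        problems.append("no YYYYMMDD or YYYY-MM-DD date")
    return problems


if __name__ == "__main__":
    name = build_name("Coral", "sonde", "PortA", datetime(2024, 3, 15, 14, 30), 2, "CSV")
    print(name)                            # Coral_sonde_PortA_20240315_1430_v02.csv
    print(check_name(name))                # []
    print(check_name("final draft 2.doc"))
    # ['contains a space or special character', 'no YYYYMMDD or YYYY-MM-DD date']
    # Year-first dates sort chronologically as plain text:
    print(sorted(["scan_2024-01-15.tif", "scan_2023-11-02.tif"]))
    # ['scan_2023-11-02.tif', 'scan_2024-01-15.tif']
```

The checks are deliberately simple: `check_name` confirms that a date-like string is present, not that it is a real calendar date.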

Tools and Resources

Don’t name your files one at a time! Use a free batch-renaming tool. Other helpful tools and services include:

  • OpenRefine is a powerful, free and open source tool for cleaning messy data.
  • Video tutorial for using OpenRefine:
  • Information Technology Services (ITS) offers virtual machine hosting for storing and managing data collections, as well as common good services like email and encryption, data security, and network access.
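If you are comfortable with a little scripting, Python's standard library can also batch-rename files. The sketch below is a minimal example, not a replacement for a dedicated renaming tool. It assumes you only want to turn runs of whitespace into underscores and drop the special characters listed under File Naming, and it only touches files directly inside one folder, not subfolders. `safe_name`, `batch_rename`, and the script name `safe_rename.py` are hypothetical. It defaults to a dry run so you can review the planned changes before anything is renamed.

```python
import re
import sys
from pathlib import Path

# Special characters the File Naming guidance warns against.
UNSAFE = re.compile(r'[/\\:*?"<>\[\]&$]')


def safe_name(name: str) -> str:
    """Turn runs of whitespace into one underscore and drop unsafe characters."""
    name = re.sub(r"\s+", "_", name.strip())
    return UNSAFE.sub("", name)


def batch_rename(folder: Path, dry_run: bool = True) -> list[tuple[str, str]]:
    """Plan (and, unless dry_run, perform) safe renames for files in folder.

    Only files directly inside folder are considered. Returns the list of
    (old, new) name pairs. A rename is skipped if the new name would be
    empty, hidden (leading "."), or collide with another file.
    """
    plan: list[tuple[str, str]] = []
    taken: set[str] = set()
    for path in sorted(folder.iterdir()):  # listing is complete before any rename
        if not path.is_file():
            continue
        new = safe_name(path.name)
        if new == path.name:
            continue
        target = path.with_name(new) if new and not new.startswith(".") else None
        if target is None or target.exists() or new in taken:
            print(f"skipping {path.name!r}: cannot safely rename to {new!r}")
            continue
        taken.add(new)
        plan.append((path.name, new))
        if not dry_run:
            path.rename(target)
    return plan


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: python safe_rename.py FOLDER [--apply]")
    else:
        apply = "--apply" in sys.argv[2:]
        for old, new in batch_rename(Path(sys.argv[1]), dry_run=not apply):
            print(f"{old!r} -> {new!r}")
```

Refusing collisions matters because two different names (for example, "a b.txt" and "a  b.txt") can map to the same safe name.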

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.