University of Texas Libraries

Scan Tech Studio (STS)

This guide provides orienting information and tutorials for the Digitization and Text Recognition Hub in the PCL Scholars Lab.

Research Data Management & Preservation

Make Organization a Habit

There’s no one way to organize your data, but a consistent and descriptive file structure can save you time and money later. Use a system that makes sense to you so that keeping things in order becomes a habit instead of a chore.

Why bother?

  • Save time and money. Well-organized, easy-to-find files make your research more efficient. This is especially important if you are sharing active data with collaborators.
  • Increase your impact. In the long run, well-managed data are more discoverable, accessible, and reusable (including by your future self!). This will help increase the visibility and impact of your work.
  • Demonstrate integrity. Discoverable, accessible, and reusable data are fundamental to ensuring reproducible or replicable research.
  • It’s required. Most funding agencies now require a data management plan for grant proposals. Publishers increasingly insist on data sharing as a condition of publishing your work. Organized, reusable data help demonstrate compliance.

File Formats

Choosing file formats carefully helps avoid obsolescence. Use formats that are:

  • Non-proprietary, open, documented standards (e.g., .tif, .txt, .csv, .pdf)
  • Encoded with standard characters (e.g., ASCII, UTF-8)
  • Used commonly in your research community
  • Unencrypted
  • Uncompressed
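
One way to check the character-encoding point above is to test whether a file's bytes decode as UTF-8 (ASCII is a subset of UTF-8, so plain ASCII files pass too). The following is a minimal sketch for illustration only; the function names are hypothetical and are not part of any tool this guide recommends.

```python
from pathlib import Path


def is_utf8_bytes(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def is_utf8_file(path: Path) -> bool:
    """Read a whole file into memory and test it (fine for small text files)."""
    return is_utf8_bytes(Path(path).read_bytes())
```

A file that fails this check may be stored in a legacy encoding such as Latin-1 and could be worth re-saving as UTF-8. A pass is a good sign rather than proof, since some legacy-encoded files happen to contain only ASCII characters.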

Examples of open formats

  • Containers: TAR, GZIP, ZIP
  • Databases: XML, CSV
  • Geospatial: SHP, DBF, GeoTIFF, NetCDF
  • Moving images: MOV, MPEG, AVI, MXF, WMV
  • Sounds: WAVE, AIFF, MP3, MXF
  • Statistics: ASCII, DTA, POR, SAS, SAV
  • Still images: TIFF, JPEG 2000, PDF, PNG, GIF, BMP
  • Tabular data: CSV
  • Text: XML, PDF/A, HTML, ASCII, UTF-8
  • Web archive: WARC

For more information on recommended file formats, see the Library of Congress Recommended Formats Statement.

File Naming

Adopt a naming convention and use it throughout a project (or throughout your career). Consider including a README.txt file that explains your naming convention and any codes or abbreviations you use. File names should:

  • Describe the contents of the file, but not be overly long. Avoid generic names (like draft.doc; final2.xls) that can be hard to decipher and easily overwritten.
  • Include dates. Don’t rely on system dates, which can be misleading. Recommended formats look like: YYYYMMDD or YYYY-MM-DD.
  • Reserve 3-letter file extensions for application-specific codes (e.g., .jpg, .mov, .tif).
  • Not contain special characters such as / \ : * ? " < > [ ] & $. These have meaning in software and operating systems and can cause trouble.
  • Not contain spaces. These are problematic for some operating systems. Use underscores (file_name), dashes (file-name), or camel case (FileName) instead.
  • Be consistent.
  • Example: Project_instrument_location_date_time_version.ext
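
The rules above can be sketched as a small helper that assembles a name in the example pattern and rejects spaces or special characters. This is an illustration under stated assumptions: the function names, the `hhmm` time format, and the zero-padded `v02` version style are hypothetical choices, not conventions prescribed by this guide.

```python
from datetime import datetime

# Characters the guide warns against, plus the space character.
FORBIDDEN = set('/\\:*?"<>[]&$ ')


def is_safe_name(name: str) -> bool:
    """True if the name contains no spaces or listed special characters."""
    return not any(ch in FORBIDDEN for ch in name)


def build_name(project: str, instrument: str, location: str,
               when: datetime, version: int, ext: str) -> str:
    """Join the parts as Project_instrument_location_date_time_version.ext."""
    parts = [
        project,
        instrument,
        location,
        when.strftime("%Y%m%d"),  # date as YYYYMMDD
        when.strftime("%H%M"),    # time as hhmm (assumed format)
        f"v{version:02d}",        # version as v01, v02, ... (assumed format)
    ]
    name = "_".join(parts) + "." + ext.lstrip(".")
    if not is_safe_name(name):
        raise ValueError(f"unsafe characters in file name: {name!r}")
    return name


print(build_name("SoilSurvey", "XRF", "Austin",
                 datetime(2024, 3, 5, 14, 30), 2, "csv"))
# prints SoilSurvey_XRF_Austin_20240305_1430_v02.csv
```

Whatever pattern you settle on, recording it in the README.txt file means collaborators can parse names the same way.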

Tools and Resources

Don’t name your files one at a time! Use a free batch-renaming tool:

  • OpenRefine is a powerful, free, open-source tool for cleaning messy data.
  • Video tutorial for using OpenRefine: http://openrefine.org/
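
For simple cases, a short script can also rename files in batches. The sketch below uses only the Python standard library (it is not OpenRefine and is not a tool this guide names); it replaces spaces and the special characters listed under File Naming with underscores and lowercases the extension. The function names are hypothetical, and the default is a dry run that only prints what would change.

```python
import os
import re
from pathlib import Path

# Runs of whitespace or the special characters listed under File Naming.
UNSAFE = re.compile(r'[/\\:*?"<>\[\]&$\s]+')


def clean_name(name: str) -> str:
    """Replace unsafe runs with one underscore; lowercase the extension."""
    stem, ext = os.path.splitext(name)
    cleaned = UNSAFE.sub("_", stem).strip("_") or "unnamed"
    return cleaned + ext.lower()


def plan_renames(folder: Path) -> list:
    """List (old, new) path pairs for files whose names would change."""
    plan = []
    for old in sorted(folder.iterdir()):
        if old.is_file():
            new = old.with_name(clean_name(old.name))
            if new != old:
                plan.append((old, new))
    return plan


def apply_renames(plan: list, dry_run: bool = True) -> None:
    """Print each planned rename; rename on disk only if dry_run is False."""
    for old, new in plan:
        if new.exists():
            print(f"SKIP   {old.name}: {new.name} already exists")
        elif dry_run:
            print(f"WOULD  {old.name} -> {new.name}")
        else:
            old.rename(new)
            print(f"DONE   {old.name} -> {new.name}")
```

A cautious workflow is to call `apply_renames(plan_renames(Path("my_folder")))` first, review the printed list, and only then repeat the call with `dry_run=False`. Working on a copy of the data is safer still.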

Make an Appointment

Please contact us for assistance with your project using this form.

Data Curation Tools

  • Humanities Data Curation Checklist
    Created by Adriana Cásarez, Spring 2020. A checklist guide for humanities researchers and liaison librarians on key considerations for making their data findable, accessible and clear to interested scholars and institutions.

  • Data Curation in the Texas Data Repository
    Created by Brenna Wheeler, Spring 2020 Capstone. A data curation workflow to improve the findability and reusability of datasets, localized to the Texas Data Repository.

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.