University of Texas Libraries

Data & Donuts

Managing Research Data: A Guide to Good Practice

September 14, 2018

Jessica Trelogan, Research Data Services, UT Libraries

Dealing with the mountains of digital data that can accumulate in the course of a research project can seem like a daunting process, especially if your work is collaborative or stretches over several years. Adopting a few key good habits early on can save you huge amounts of time, money, and frustration searching for things and recovering lost files. In this 1.5-hour workshop, you will get a general introduction to core data management concepts, practical tips for things like backups and file formats, and a wealth of information about tools and resources available to UT faculty, staff, and students. 

Presentation slides are here. 

Tools and Resources for UT Researchers

File Formats

Choosing file formats carefully helps avoid obsolescence. Use formats that are:

  • Non-proprietary, open, documented standards (e.g., .tif, .txt, .csv, .pdf)

  • Encoded with standard characters (e.g., ASCII, UTF-8)

  • Used commonly in your research community
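
As a minimal sketch of these recommendations, the Python below stores a small table as plain-text CSV encoded in UTF-8 and reads it back. The function names and the sample row are assumptions made up for illustration; only the standard library is used.

```python
import csv
import io


def to_csv_bytes(header, rows):
    """Serialize a table to CSV text, then encode it as UTF-8 bytes."""
    # newline="" stops StringIO from translating the line endings
    # that the csv writer emits.
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue().encode("utf-8")


def from_csv_bytes(data):
    """Decode UTF-8 bytes and parse them back into a list of rows."""
    text = data.decode("utf-8")
    return list(csv.reader(io.StringIO(text, newline="")))


# Hypothetical sample data with non-ASCII characters and an embedded comma.
data = to_csv_bytes(["site", "note"], [["A1", "café, 5 µm"]])
rows = from_csv_bytes(data)
```

The bytes can be saved with `pathlib.Path("notes.csv").write_bytes(data)` (a hypothetical file name). Stating the encoding explicitly, rather than relying on the operating system default, is what keeps the file readable on other machines.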

File Naming

Adopt a naming convention and use it throughout a project (or throughout your career). 

  • Describe the contents of the file without being overly long. Avoid generic names (like draft.doc or final2.xls), which are hard to decipher and easily overwritten.

  • Include dates. Don’t rely on system dates, which can be misleading. Recommended formats look like: YYYYMMDD or YYYY-MM-DD.

  • Reserve 3-letter file extensions for application-specific codes (e.g., .jpg, .mov, .tif).

  • Don't use special characters such as / \ : * ? " < > [ ] & $. These have special meaning in software and operating systems and can cause trouble.

  • Try to avoid spaces too. These are problematic for some operating systems. Use underscores (file_name), dashes (file-name), or camel case (FileName) instead.

  • Don’t rename your files one at a time! Use a free batch-renaming tool.
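
Batch renaming can also be scripted. The sketch below is one possible Python approach, not a tool recommended by this guide: `safe_name`, `plan_renames`, and `apply_renames` are hypothetical helper names. It builds names with a YYYYMMDD prefix, replaces spaces and the special characters listed above with underscores, and separates planning from renaming so the result can be reviewed first.

```python
import re
from datetime import date
from pathlib import Path

# Whitespace plus the special characters listed above.
_UNSAFE = re.compile(r'[/\\:*?"<>\[\]&$\s]+')


def safe_name(description, when, extension):
    """Build a name such as 20180914_soil_samples.csv."""
    cleaned = _UNSAFE.sub("_", description.strip()).strip("_")
    ext = extension.lstrip(".")
    suffix = f".{ext}" if ext else ""
    return f"{when:%Y%m%d}_{cleaned}{suffix}"


def plan_renames(folder, when):
    """Return (old_path, new_path) pairs without touching any file."""
    plan = []
    for path in sorted(Path(folder).iterdir()):
        # Skip sub-folders and hidden files.
        if path.is_file() and not path.name.startswith("."):
            new_name = safe_name(path.stem, when, path.suffix)
            plan.append((path, path.with_name(new_name)))
    return plan


def apply_renames(plan):
    """Carry out a reviewed plan, refusing to overwrite existing files."""
    for old, new in plan:
        if new.exists():
            raise FileExistsError(new)
        old.rename(new)


example = safe_name("soil samples: site A?", date(2018, 9, 14), ".csv")
```

The date is passed in explicitly rather than read from the file system, in line with the advice not to rely on system dates. Printing the output of `plan_renames` before calling `apply_renames` gives a chance to catch mistakes while they are still easy to undo.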

Data Clean-up

Data clean-up can be the biggest time sink of all. Budget time for it, and use reproducible steps to make your data tidy.

  • OpenRefine is a powerful tool for bulk edits of messy data. 
  • Try tidyr if you're working in R.
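
To illustrate what a reproducible clean-up step can look like, here is a small Python sketch using only the standard library. It trims stray whitespace, drops empty cells, and reshapes a wide table into one row per observation, similar in spirit to a wide-to-long reshape in tidyr. The function name, column names, and sample data are all invented for this example.

```python
import csv
import io


def wide_to_long(rows, id_column, name_column, value_column):
    """Turn one row per subject into one row per (subject, measurement)."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue
            value = (value or "").strip()
            if value == "":
                continue  # skip empty cells instead of keeping blank values
            tidy.append({
                id_column: row[id_column].strip(),
                name_column: column,
                value_column: value,
            })
    return tidy


# Made-up raw data: stray spaces and one missing measurement.
RAW = """plot,2016,2017
A, 4.1 ,4.4
B,3.9,
"""

rows = list(csv.DictReader(io.StringIO(RAW)))
tidy = wide_to_long(rows, "plot", "year", "yield")
```

Because every change lives in the script, the clean-up can be rerun on the untouched raw file at any time, and the script itself documents exactly what was done.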

 

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 2.0 Generic License.