University of Texas Libraries

Data Management

This guide is to help you prepare a data management plan.

Data Collection & Analysis

Creating data from scratch

  1. Test your plan
    • Think through your data collection strategy from start to finish, and consider a pilot run. A pilot will expose problems with your tools or instruments and confirm that you can process the data they produce.
  2. Automate
    • Avoid unnecessary data entry later on by using the built-in features of your capture devices to document as you go. Just make sure you understand, and keep track of, any preprocessing the device may be doing behind the scenes.
  3. Create snapshots
    • Be sure to keep secure and backed-up copies of your data in their rawest form (prior to cleaning or processing). You may also want to save snapshots of your datasets at various stages of processing.
  4. Ensure compliance
    • Make sure that your project complies with all applicable laws, regulations, and UT policies.
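The "Create snapshots" step can be partly automated. The sketch below is a minimal illustration, assuming your raw data is a single file on a local or network drive; the function name `snapshot_raw` and the `_raw_<timestamp>` naming pattern are hypothetical choices, not a UT or library standard. It copies the file under a timestamped name and removes write permission so the raw copy is not edited by accident.

```python
import shutil
import stat
from datetime import datetime, timezone
from pathlib import Path


def snapshot_raw(src: str, snapshot_dir: str) -> Path:
    """Copy a raw data file to a timestamped, read-only snapshot and return its path."""
    src_path = Path(src)
    dest_dir = Path(snapshot_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # A UTC timestamp keeps snapshot names sortable and unambiguous
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    # Hypothetical naming pattern: <name>_raw_<timestamp><extension>
    dest = dest_dir / f"{src_path.stem}_raw_{stamp}{src_path.suffix}"
    if dest.exists():
        # Never overwrite an existing snapshot
        raise FileExistsError(dest)
    shutil.copy2(src_path, dest)  # copy2 also tries to preserve file metadata such as timestamps
    # Clear the write bits so the raw copy is not edited by accident
    mode = dest.stat().st_mode
    dest.chmod(mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
    return dest
```

A read-only flag only guards against accidental edits; the snapshot folder itself should still live on secure, backed-up storage.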

Re-using data

  1. Know your source 
    • Find a repository of data relevant to your discipline.
    • re3data.org, a global registry of data repositories, can help you locate subject-specific data sources and determine whether they are appropriate for you.
  2. Cite data sources
    • Citing data sources is just as important as citing the journal articles, books, or other resources you use to produce your research. Citation allows other researchers to locate and repurpose data, promotes reproducibility, and lets you give and get credit for data products, which are increasingly viewed as scholarly output in their own right.
  3. Ask for guidance
    • Contact your subject specialist for help locating existing datasets relevant to your research question.

Keep your data secure

Save your raw data. This allows you to start over if something goes wrong, or to re-analyze the same dataset with different variables or protocols.

  • Consider saving snapshots of your data at a number of different stages (e.g., raw, cleaned up, subsetted).
  • Distinguish between these datasets in the file names and/or documentation.
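One way to distinguish snapshots in file names is to append a fixed stage label to every name. The sketch below is illustrative only: the labels `raw`, `cleaned`, and `subset` are taken from the examples above, while the helper names and the `<base>_<stage><ext>` pattern are hypothetical conventions you would adapt to your own project.

```python
from pathlib import Path

# Stage labels from the examples above; adapt them to your own processing steps
STAGES = ("raw", "cleaned", "subset")


def stage_filename(base: str, stage: str, ext: str = ".csv") -> str:
    """Build a file name that records which processing stage a dataset belongs to."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage {stage!r}; expected one of {STAGES}")
    return f"{base}_{stage}{ext}"


def stage_of(filename: str) -> str:
    """Recover the stage label from a name built by stage_filename."""
    stem = Path(filename).stem
    # Split on the last underscore only, so base names may contain underscores
    stage = stem.rsplit("_", 1)[-1]
    if stage not in STAGES:
        raise ValueError(f"no recognized stage in {filename!r}")
    return stage
```

For example, `stage_filename("plots2023", "cleaned")` gives `plots2023_cleaned.csv`. Rejecting unknown labels keeps the convention consistent across a project.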

Control versions

  • You can keep track of file versions with consistently applied naming conventions.
  • In projects that involve code or software development where there are frequent edits or multiple contributors, consider using a more elaborate version control system. Git is a popular choice, but your research community or lab may have a preferred environment.
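A naming convention works best when the next version number is found mechanically rather than by memory. The sketch below assumes a hypothetical `<base>_vNN<ext>` pattern (for example `survey_v02.csv`) and files kept in a single folder; it is not a replacement for a version control system such as Git.

```python
import re
from pathlib import Path


def next_version(folder: str, base: str, ext: str = ".csv") -> Path:
    """Return the path for the next version, e.g. survey_v03.csv after survey_v02.csv."""
    # Match only names of the form <base>_v<digits><ext>
    pattern = re.compile(rf"^{re.escape(base)}_v(\d+){re.escape(ext)}$")
    versions = [
        int(m.group(1))
        for p in Path(folder).iterdir()
        if (m := pattern.match(p.name))
    ]
    # Start at v01 when no earlier versions exist
    nxt = max(versions, default=0) + 1
    # Zero-padding keeps versions in order when names are sorted alphabetically
    return Path(folder) / f"{base}_v{nxt:02d}{ext}"
```

Zero-padded numbers (`v01`, `v02`, … `v10`) sort correctly in most file browsers, which is one reason to prefer them over `v1`, `v2`, `v10`.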

Back things up

  • Maintaining working copies of your data requires thoughtful consideration of hardware, redundant storage locations, and a disaster plan.

LOCKSS

  • LOCKSS (Lots of Copies Keep Stuff Safe) is a helpful acronym to remember.
  • The more copies of your data, the better...as long as they’re not all in the same place.
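Lots of copies only help if they are still intact. A common way to check this is to compare cryptographic checksums. The sketch below uses SHA-256 from Python's standard `hashlib` module; the function names and the idea of listing copy paths by hand are illustrative assumptions.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large datasets need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def mismatched_copies(original: str, copies: list[str]) -> list[str]:
    """Return the copies that are missing or whose contents differ from the original."""
    reference = sha256_of(Path(original))
    bad = []
    for c in copies:
        p = Path(c)
        if not p.is_file() or sha256_of(p) != reference:
            bad.append(c)
    return bad
```

An empty result means every listed copy matches the original byte for byte.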

Test your system frequently to make sure it’s working.

  • Use the 3-2-1 backup rule as a rule of thumb: 3 copies, on 2 different types of storage media, 1 off-site.
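The 3-2-1 rule is simple enough to check against an inventory of where your copies live. The sketch below is a hypothetical helper, not a standard tool: it assumes you describe each copy (including your working copy) by a location label, a storage media type, and whether it is off-site.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Copy:
    location: str  # hypothetical label, e.g. "laptop", "lab server", "cloud"
    media: str     # storage type, e.g. "ssd", "hdd", "tape", "cloud"
    offsite: bool  # stored away from your primary work site?


def meets_3_2_1(copies: list[Copy]) -> bool:
    """True if there are at least 3 copies, on at least 2 media types, with at least 1 off-site."""
    return (
        len(copies) >= 3
        and len({c.media for c in copies}) >= 2
        and any(c.offsite for c in copies)
    )
```

For example, a laptop SSD, an external hard drive in the lab, and a cloud copy would pass; three copies on the same laptop would not.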

Document your steps

  • This can mean taking good notes, saving log files, or recording every step in an electronic lab notebook. Keep a copy of your documentation together with any data or code you produce so that you can follow your trail later on.
  • Scan paper notebooks. Record any preprocessing or data-cleaning steps to ensure reproducibility.
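If your processing is scripted, the script can write its own log file. The sketch below is one possible approach, assuming a hypothetical log format of one JSON object per line (often called JSON Lines) with a UTC timestamp, a step description, and any details you choose to record. The function name and field names are illustrative.

```python
import json
from datetime import datetime, timezone


def log_step(logfile: str, step: str, **details) -> None:
    """Append one processing step to a JSON Lines log with a UTC timestamp."""
    entry = {
        "time": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "step": step,
        "details": details,  # free-form keyword arguments, e.g. input/output file names
    }
    # Append mode preserves earlier entries, so the log becomes a running trail
    with open(logfile, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
```

A call such as `log_step("processing_log.jsonl", "remove duplicates", input="survey_raw.csv", output="survey_cleaned.csv")` adds one line to the log. Because each entry is plain text, the log can be stored alongside the data and read back later without special software.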

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 2.0 Generic License.