University of Texas Libraries

Data Management

This guide helps you prepare a data management plan.

Data Preservation

Selecting Data for Preservation & Sharing

Over the course of a research project, a researcher may generate an enormous amount of data. While it is important to preserve data, preserving all of it is rarely feasible. Instead, select the data that merit preservation.

Asking the following questions of the data can guide the selection.

  • Are the data unique or difficult to replicate?
  • Is there value in the potential reuse of the data? Consider cases within your field and in other academic disciplines.
  • Is there a publication requirement to preserve or make available specific data?
  • Is there a preservation norm in your field or for specific types of data?
  • Do the data have value for scientific reproducibility and confirmation of findings?
  • Are the data of potential historical value?
  • Do the data contain sensitive information that should not be published?

Your data may be appropriate for preservation but require redaction before they can be shared publicly. Depending on how sensitive the data are, some researchers create two copies: one redacted and shared publicly, the other retained for internal preservation. Examples of material that may need to be redacted or withheld include consent forms, DNA data, and personally identifying information.
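
The two-copy approach can be sketched in a few lines of Python. This is a minimal illustration, not a complete de-identification procedure: the field names (`name`, `email`, `date_of_birth`) and the sample record are hypothetical, and removing direct identifiers does not by itself guarantee that the remaining data are anonymous.

```python
# Hypothetical field names; replace with the sensitive fields in your own data.
SENSITIVE_FIELDS = {"name", "email", "date_of_birth"}


def make_public_copy(records, sensitive_fields=SENSITIVE_FIELDS):
    """Return new records with the sensitive fields dropped.

    The input list is not modified, so it can serve as the internal copy.
    """
    return [
        {key: value for key, value in record.items() if key not in sensitive_fields}
        for record in records
    ]


# Illustrative data only (assumption, not from a real study).
internal_copy = [
    {
        "participant_id": "P001",
        "name": "Example Person",
        "email": "person@example.org",
        "score": 42,
    },
]

# Redacted copy for public sharing; the internal copy keeps every field.
public_copy = make_public_copy(internal_copy)
```

Keeping the redaction step as a function applied to the internal copy means the public copy can be regenerated whenever the list of sensitive fields changes.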

The questions above outline some initial considerations. For a fuller set, see the following checklist.

Answer each question Yes, No, or Unsure.

Assessing External Requirements

  • Are there funder requirements?
  • Are there repository requirements (stability, reliability, usage by others, security, appropriate, terms, license)?
  • Are there publisher requirements?
  • Are there any disciplinary expectations/norms?
  • Are there institutional requirements?
  • Are there any legal requirements?
  • Do any of the above include a retention schedule?

Assessing Scientific Value and Reuse Potential

  • Does this data enable other researchers to verify or reproduce your published findings?
  • Is your data original?
  • Is the data available elsewhere?
  • Could the data have value for future research?
  • Could you or your colleagues use this data again for future outputs?

Ethical Considerations

  • Do you have permission to share the data from all stakeholders (collaborators, study participants, etc.)?
  • Can the data be anonymized?
  • Are there any copyright or license restrictions on any part of the data?

Practical Considerations

  • Are there storage costs?
  • Are appropriate metadata and documentation in order?
  • Is there any risk in preserving or sharing this data (e.g., could this data be used to target a particular population or community)?


UT Libraries, CC License

Importance of Description

The ten-year reproducibility challenge is a practice in which researchers try to run code that was written ten years ago. Often the difficulty lies not in changes to the available hardware but in a lack of description. Although the challenge concerns code, it applies just as well to data. Describing the data and what each variable means is often an afterthought, and years later the intended meaning can be hard to reconstruct. Now imagine attempting this with someone else's data from ten years ago. Whether you plan to publish your data openly so others can view, analyze, and reuse them, or to preserve them internally to meet preservation standards, the description should be thorough enough for a third party to understand the data.
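
One way to keep description from becoming an afterthought is to maintain a data dictionary alongside the data and check that every variable has an entry. The sketch below is only an illustration under assumptions: the variable names (`site_id`, `temp_c`, `obs_flag`), the dictionary layout, and the helper `undocumented_columns` are all hypothetical and not part of any standard.

```python
# Hypothetical data dictionary: one entry per variable in the dataset.
DATA_DICTIONARY = {
    "site_id": {"description": "Identifier of the field site", "units": None},
    "temp_c": {
        "description": "Air temperature at time of observation",
        "units": "degrees Celsius",
    },
}


def undocumented_columns(columns, data_dictionary):
    """Return the columns that lack a non-empty description, in input order."""
    return [
        column
        for column in columns
        if not (data_dictionary.get(column, {}).get("description") or "").strip()
    ]


# Column names as they might appear in a dataset header (assumed for illustration).
columns = ["site_id", "temp_c", "obs_flag"]

# Lists the variables a third party would not be able to interpret.
missing = undocumented_columns(columns, DATA_DICTIONARY)
print(missing)
```

Running a check like this before depositing or archiving a dataset gives a simple signal of which variables still need a description.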

What Data to Share

It is important to think through what data you will preserve and what data you will share. These won't always be the same. You'll want to consider:

  • What data are important to keep?
  • What potential value might your data have for yourself or others?
  • Is there sensitive or confidential information included in your data?
  • Are there any ethical issues to consider around sharing your data?
  • Is your work reproducible without the data?

 

What Data to Save

Be selective about what data you plan to retain, as every file carries some long-term overhead in storage and maintenance. It’s a good idea to:

  • keep anything irreproducible, such as observations specific to a particular time and place,
  • retain results that are tied to a specific publication or presentation,
  • discard intermediate tests or failed experiments at the end of a project.

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 2.0 Generic License.