LibGuides: Research Data Services: Picking a Repository

Considerations for repository selection

Many factors go into deciding which repository or repositories are most appropriate for your data. This page provides an overview of them, as well as a simplified decision tree diagram of the high-level considerations.

First-order considerations: Can you share data at all?

There are many instances in which data cannot or should not be shared publicly, although it may still be possible to deposit them in a controlled- or restricted-access repository whose metadata are publicly accessible (so that other researchers know the data exist but must take extra steps to access them). Some common scenarios:

Data may pose a risk to national security if made publicly available (see university export control guidelines);
Data may pose a risk of re-identification if made publicly available;
Data were obtained from a third party that has imposed restrictions on re-distribution (e.g., a national healthcare provider);
Data were obtained from a proprietary source (e.g., commercial businesses);
Data were obtained in collaboration with a community that asserts sovereignty over data related to the community (see the CARE Principles for Indigenous Data Governance).

In some of these instances, it may be possible to publish a minimum reproducible dataset: one that still allows the results of an associated study to be reproduced but removes or limits the attributes that make sharing the full dataset risky. Examples include redacting personally identifiable information (PII); generalizing demographic attributes (e.g., replacing an exact birth date with age in years); and generalizing geographic coordinates (e.g., reporting the nearest centroid). Johns Hopkins has an excellent guide for protecting human participants' identities, and the RDS team plans to develop its own LibGuide on publishing sensitive data. Feel free to reach out to your department liaison or to Bryan Gee if you need assistance with potentially sensitive data.
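As an illustration of the generalization examples above, here is a minimal Python sketch. It assumes a hypothetical record with an exact birth date and latitude/longitude; the function names, the reference date, and the 0.1-degree grid size are illustrative choices, not an RDS recommendation. Snapping coordinates to the center of a fixed grid cell is a simple stand-in for reporting a centroid (such as that of a county or census tract), and neither step on its own guarantees that a dataset is de-identified.

```python
import math
from datetime import date


def age_in_years(birth_date: date, reference_date: date) -> int:
    """Replace an exact birth date with completed years of age at reference_date."""
    years = reference_date.year - birth_date.year
    # Subtract one if the birthday has not yet occurred in the reference year.
    if (reference_date.month, reference_date.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def snap_to_grid_center(lat: float, lon: float, cell_deg: float = 0.1) -> tuple:
    """Replace exact coordinates with the center of the grid cell that contains them.

    cell_deg is a hypothetical cell size in decimal degrees; larger cells
    generalize locations more coarsely.
    """
    def center(value: float) -> float:
        return (math.floor(value / cell_deg) + 0.5) * cell_deg

    # Round to suppress floating-point noise in the reported values.
    return (round(center(lat), 6), round(center(lon), 6))


# Hypothetical participant record with identifying attributes.
record = {"birth_date": date(1990, 6, 15), "lat": 30.2849, "lon": -97.7341}
generalized = {
    "age_years": age_in_years(record["birth_date"], date(2024, 1, 1)),
    "lat_lon": snap_to_grid_center(record["lat"], record["lon"]),
}
print(generalized)  # {'age_years': 33, 'lat_lon': (30.25, -97.75)}
```

In practice, the appropriate level of generalization depends on the dataset and on how many other attributes could be combined to re-identify someone, which is why a guide such as the Johns Hopkins one or a conversation with the RDS team is still worthwhile.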

Second-order considerations: funder or publisher requirements

Some funding agencies will require that grant recipients use a specific repository; often, this is a repository that the funding agency maintains. NIH is the most common example of this, but some other agencies in the natural sciences (e.g., NASA, NOAA, USGS) may also have requirements. These requirements often exist to ensure that data adhere to common metadata standards, organization, and format. Requirements should be communicated to researchers in a Notice of Award (NoA). It is relatively rare for journals or publishers to require a specific repository, but this is more common for specialist journals or for certain types of data. The repository recommendations for Scientific Data are a good example.

Third-order considerations: does a specialist repository exist for some/all of the data?

Wherever possible, data should be shared in a specialist repository; this helps to centralize content of interest to specialists and tends to encourage more rigorous metadata and file format standards that ensure reusability. Some specialist repositories also have specialized features in the web interface that allow users to interact with data (e.g., 3D visualizers) without needing to download them. Because there are thousands of specialist repositories, we cannot provide suggestions for every discipline or data type. Several indexes and aggregators of information about repositories exist, such as re3data.org, and a library departmental liaison may also be able to point you to discipline-specific repositories. Another good way to find out whether a specialist repository exists is to check the data availability statements of recently published papers in your discipline (especially those in disciplinary journals) or to talk directly with colleagues. Keep in mind that if researchers seem to be using generalist repositories, institutional repositories, or other avenues for sharing data, this does not necessarily mean that no specialist repository exists.

Fourth-order considerations: will the Texas Data Repository meet your needs?

There are many benefits to using the Texas Data Repository (TDR), including that the Research Data Services team and other members of the libraries are able to provide timely, customized support. For more information on TDR, see the TDR tab of this guide.

If nothing else works, use a generalist repository

Generalist repositories should typically be used only when other available options are not appropriate or not viable. While generalists have many advantages, such as larger support teams and larger corpora of data, it can be hard to find data in a repository that contains everything from MRI images of children's brains to transcripts of people interviewed about credit unions. Generalists also often have more limited infrastructure for interacting with data in the web browser and may be slower to respond to service and help requests.

Decision tree

The following diagram presents a generalized decision tree for selecting a repository for your research data. If you have very distinct data types (e.g., a discipline-specific format and a generic format such as CSV), you may want to work through the flowchart separately for each data type. If you have any questions about how your data should be treated, please reach out to the Research Data Services team! We are glad to help with less common cases, such as sensitive data where some but not all of the data can be shared, or where it may be possible to reduce or eliminate a dataset's sensitivity. This diagram was inspired by one from Stanford University.
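The order of considerations on this page (first through fourth order, then the generalist fallback) can also be read as a short decision procedure. The Python sketch below is a simplified illustration under that reading, not a reproduction of the diagram: `DataProfile`, `pick_repository`, and the placeholder repository name are hypothetical, and each field stands in for a yes/no answer you would work out using the sections above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataProfile:
    """Hypothetical answers to the questions on this page for one data type."""
    can_share_publicly: bool
    required_repository: Optional[str] = None    # named by a funder (e.g., in the NoA) or publisher
    specialist_repository: Optional[str] = None  # a discipline-specific repository, if one exists
    tdr_meets_needs: bool = False                # would the Texas Data Repository work?


def pick_repository(profile: DataProfile) -> str:
    """Walk the first- through fourth-order considerations in order."""
    # First order: data that cannot be shared publicly need another route.
    if not profile.can_share_publicly:
        return "controlled/restricted-access repository or minimum reproducible dataset"
    # Second order: a funder or publisher requirement takes precedence.
    if profile.required_repository:
        return profile.required_repository
    # Third order: prefer a specialist repository when one exists.
    if profile.specialist_repository:
        return profile.specialist_repository
    # Fourth order: the Texas Data Repository.
    if profile.tdr_meets_needs:
        return "Texas Data Repository"
    # Otherwise, fall back to a generalist repository.
    return "generalist repository"


# Distinct data types go through the tree separately.
scans = DataProfile(can_share_publicly=True, specialist_repository="<specialist repository>")
survey_csv = DataProfile(can_share_publicly=True, tdr_meets_needs=True)
print(pick_repository(scans))       # <specialist repository>
print(pick_repository(survey_csv))  # Texas Data Repository
```

Real cases rarely reduce to clean yes/no answers (for example, when only part of a dataset can be shared), so the sketch is mainly a reminder of the order in which to ask the questions.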

Need help picking a repository?

Bryan Gee

he/him
