University of Texas Libraries

Research Data Services


Where to Share Data

What is a repository?

Repository (noun): a place, room, or container where something is deposited or stored (Merriam-Webster).

Repositories are not exclusive to research or research data, nor do they have to be digital entities. Some of the best-known repositories are physical entities like museums and libraries. In the context of research data, repositories are platforms with policies and infrastructure that are designed for the long-term preservation and accessibility of data. Data often are, but do not need to be, associated with a specific article, book, or similar output. Although repositories vary widely, some common core attributes include:

  • Use of persistent identifiers (PIDs) that point to a landing page that remains accessible indefinitely, even if the dataset itself is no longer available. Digital Object Identifiers (DOIs) are the best-known PIDs, but there are others, such as Archival Resource Keys (ARKs) and handles.
  • Commitment to long-term sustainability, including technical infrastructure that keeps data secure and backed up, a sustainable business model, and a contingency plan for sunsetting the repository.
  • Commitment to free access, consistent with legal and ethical standards.
  • Clear policies for users on depositing data, accessing data, and reusing data.
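To make the idea of a PID pointing to a landing page more concrete, here is a minimal Python sketch of how a DOI string is turned into a resolvable link through the public doi.org resolver. It is an illustration under simplifying assumptions, not a complete DOI validator: the function name `doi_to_url`, the simplified pattern, and the DOI `10.1234/example-dataset` are all hypothetical, and the code makes no network request.

```python
import re

# Simplified (assumed) pattern for modern DOIs: "10." + registrant digits + "/" + suffix.
# The full DOI syntax is more permissive; this is only a rough sanity check.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str) -> str:
    """Return the doi.org resolver URL for a DOI string (no network access)."""
    doi = doi.strip()
    # Accept DOIs written with the common "doi:" prefix.
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a recognizable DOI: {doi!r}")
    # The resolver redirects this URL to whatever landing page the DOI currently points to.
    return "https://doi.org/" + doi


# Hypothetical DOI, used only for illustration.
print(doi_to_url("doi:10.1234/example-dataset"))
```

Because the link goes through the resolver rather than directly to a server, the repository can move the landing page without breaking citations that use the DOI. Suffixes containing unusual characters may need URL-encoding, which this sketch does not handle.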

The White House Office of Science and Technology Policy (OSTP) produced a document titled "Desirable Characteristics of Data Repositories for Federally Funded Research" that describes these characteristics in more detail.

The term archive is sometimes also used with respect to data sharing (either as a noun or a verb). The terms "archive" and "repository" are generally synonymous when referring to platforms intended for long-term data storage and preservation, but they can have specific connotations (e.g., for the federal government, see the USGS explanation). In general, archives are meant for a single static deposit that is not intended to be updated (versioned), so data may be stored in ways that are harder to access or edit quickly (e.g., on tape drives).

Types of repositories

There are three major types of research data repositories:

  • Specialist / domain repositories: These are third-party repositories (many supported by universities) that focus on a specific discipline or type of data. Some are relatively broad, such as the Qualitative Data Repository (QDR, supported by Syracuse University) or the Inter-university Consortium for Political and Social Research (ICPSR, supported by the University of Michigan). Others are more specific, like Xenbase, a repository for the frog Xenopus (supported by the University of Calgary), or the Tromsø Repository of Language and Linguistics (TROLLing, supported by the University of Tromsø). Specialist repositories tend to have more stringent submission requirements (e.g., they may accept only certain file types, require discipline-specific metadata, or require staff approval before publication). There are thousands of specialist repositories worldwide.
  • Generalist repositories: These are third-party repositories that accept practically any content from any discipline. There are relatively few generalist repositories, but they tend to be well-known because they hold orders of magnitude more data than specialist repositories (hundreds of thousands of datasets). These include Dryad, Figshare, Harvard Dataverse, IEEE Dataport, Mendeley Data, Open Science Framework (OSF), Science Data Bank, and Zenodo. For more, see the Generalist Repositories tab to the left.
  • Institutional repositories (IRs): These are repositories managed by institutions and may be built on custom software or on existing closed- or open-source software. Most institutions have at least one IR that accepts content across disciplines. IRs vary widely in the types of content they accept, and they are usually limited to current researchers at that institution, which distinguishes them from generalist repositories. Some IRs are only for data; others are only for non-data outputs like theses and dissertations. UT Austin has two main IRs: Texas ScholarWorks (TSW), which is for scholarly outputs other than data and software, such as preprints, accepted manuscripts, theses, dissertations, and conference presentations; and the Texas Data Repository (TDR), which is primarily for research data but can also be used for research software. The Research Data Services team manages TDR; see the Texas Data Repository tab to the left for more details.

Sample data repositories

Below are some examples of well-known specialist repositories for social science and humanities disciplines.

Below are some examples of well-known specialist repositories for natural science, engineering, technology, and math disciplines.

  • MorphoSource: MorphoSource preserves 3D data of physical objects such as cultural objects and zoological specimens. It features an in-browser visualizer that converts raw 2D image stacks into 3D volume renders and is popular with educators looking for animal and paleontological specimens.
  • Movebank: Movebank stores data on animal movements and provides various interactive tools in its web interface for researchers and non-researchers alike.
  • NIH data repositories (assorted): Repositories supported by the National Institutes of Health that make data accessible for reuse. Most accept submissions of appropriate data from NIH-funded investigators (and others), but some restrict submission to researchers involved in a specific research network. Various databases maintained by the National Center for Biotechnology Information (NCBI) also qualify as specialist repositories.
  • RCSB Protein Data Bank: The PDB provides data and tools related to experimentally determined 3D protein structures and computationally derived structural models.

Need help with repositories?

Bryan Gee
he/him

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.