LibGuides: Research Data Services: Where to Share Data

What is a repository?

Repository (noun): a place, room, or container where something is deposited or stored (Merriam-Webster).

Repositories are not exclusive to research or research data, nor do they have to be digital entities. Some of the best-known repositories are physical entities like museums and libraries. In the context of research data, repositories are platforms with policies and infrastructure that are designed for the long-term preservation and accessibility of data. Data often are, but do not need to be, associated with a specific article, book, or similar output. Although repositories vary widely, some common core attributes include:

Use of persistent identifiers (PIDs) that point to a landing page that is perpetually accessible, even if a dataset is no longer available for some reason. DOIs are the most commonly known PID, but there are other examples like ARKs and handles.
Commitment to long-term sustainability, including the technical infrastructure to keep data secure and backed up, a sustainable business model, and a contingency plan for sunsetting the repository.
Commitment to free access, consistent with legal and ethical standards.
Clear policies for users on depositing data, accessing data, and reusing data.
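As a rough illustration of how PIDs point to a landing page, the sketch below turns an identifier into the web address of a public resolver service. The function name pid_to_url and the sample identifiers are made up for this example; the resolver addresses (doi.org for DOIs, hdl.handle.net for handles, n2t.net for ARKs) are the commonly used public ones, but individual repositories may run their own resolvers.

```python
# Minimal sketch (hypothetical helper, not part of any repository's API):
# map a persistent identifier to a resolver URL that should redirect to
# the dataset's landing page.

RESOLVERS = {
    "doi": "https://doi.org/",
    "hdl": "https://hdl.handle.net/",
    "ark": "https://n2t.net/",
}


def pid_to_url(pid: str) -> str:
    """Return a resolver URL for a DOI, handle, or ARK."""
    pid = pid.strip()
    lowered = pid.lower()
    if lowered.startswith("doi:"):
        # Drop the "doi:" label; the resolver expects the bare DOI.
        return RESOLVERS["doi"] + pid[4:]
    if lowered.startswith("10."):
        # Bare DOIs always begin with the "10." directory indicator.
        return RESOLVERS["doi"] + pid
    if lowered.startswith("hdl:"):
        return RESOLVERS["hdl"] + pid[4:]
    if lowered.startswith("ark:"):
        # ARKs keep their "ark:" label in the resolver URL.
        return RESOLVERS["ark"] + pid
    raise ValueError(f"Unrecognized identifier scheme: {pid!r}")


# Placeholder identifiers for illustration only; they are not real datasets.
print(pid_to_url("10.5555/example-dataset"))
print(pid_to_url("ark:/12345/x9example"))
```

Because the identifier, not the repository's current web address, is what gets cited, the landing page can move to a new URL without breaking existing citations.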

The White House Office of Science and Technology Policy (OSTP) produced a document titled "Desirable Characteristics of Data Repositories for Federally Funded Research" with additional details.

The term archive is sometimes also used with respect to data sharing (either as a noun or verb). Archive and repository are generally synonymous in referring to platforms intended for long-term data storage and preservation, but each can carry specific connotations (e.g., for the federal government, see USGS explanation). In general, archives are meant for a single static deposit that is not intended to be updated (versioned), and data may thus be stored in a way that is harder to quickly access or edit (e.g., on tape drives).

Types of repositories

There are three major types of research data repositories:

Specialist / domain repositories: These are third-party repositories (many are supported by universities) that focus on a specific discipline or type of data. Some are relatively broad, such as the Qualitative Data Repository (QDR, supported by Syracuse) or the Inter-university Consortium for Political and Social Research (ICPSR, supported by the University of Michigan). Others are more specific, like Xenbase, a repository specifically for the frog Xenopus (supported by the University of Calgary), or the Tromsø Repository of Language and Linguistics (TROLLing, supported by the University of Tromsø). Specialist repositories tend to have more stringent requirements for submission (e.g., they may accept only certain file types, require discipline-specific metadata, or require staff approval for publication). There are thousands of specialist repositories in the world.
Generalist repositories: These are third-party repositories that accept practically any content from any discipline. There are relatively few generalist repositories, but they tend to be well-known because they hold orders of magnitude more data than specialist repositories (hundreds of thousands of datasets). These include Dryad, Figshare, Harvard Dataverse, IEEE Dataport, Mendeley Data, Open Science Framework (OSF), Science Data Bank, and Zenodo. For more, see the Generalist Repositories tab to the left.
Institutional repositories (IRs): These are repositories managed by institutions; they may be built on custom software or on existing closed- or open-source software. Most institutions have at least one IR. IRs accept content across disciplines but vary widely in the scope of content they accept, and they are usually open only to current researchers at that institution, which differentiates them from generalist repositories. Some are only for data; others, conversely, are only for non-data outputs like theses and dissertations. At UT Austin, we have two main IRs: Texas ScholarWorks (TSW), which is for non-data/software scholarly outputs like preprints, accepted manuscripts, theses, dissertations, and conference presentations, and the Texas Data Repository (TDR), which is primarily for research data (but can be used for research software as well). The Research Data Services team manages TDR - see the Texas Data Repository tab to the left for more details.

Sample data repositories

Below are some examples of well-known specialist repositories for social science and humanities disciplines.

Archive of the Indigenous Languages of Latin America (AILLA): AILLA is a UT-managed repository whose primary mission is to preserve materials in and about the indigenous languages of Latin America.
Inter-university Consortium for Political and Social Research (ICPSR): ICPSR maintains a data archive of more than 250,000 files of research in the social and behavioral sciences. It hosts data collections in education, aging, criminal justice, substance abuse, terrorism, and other fields. Free with UT institutional membership.
Qualitative Data Repository (QDR): QDR curates, stores, preserves, publishes, and enables the download of digital data generated through qualitative and multi-method research in the social sciences.

Below are some examples of well-known specialist repositories for natural science, engineering, technology, and math disciplines.

MorphoSource: MorphoSource preserves 3D data of physical objects such as cultural objects and zoological specimens. It features an in-browser visualizer that converts raw 2D image stacks to 3D volume renders and is popular with educators looking for animal and paleontological specimens.
Movebank: Movebank stores data related to animal movements and provides various interactive tools in its web interface for researchers and non-researchers alike.
NIH data repositories (assorted): National Institutes of Health-supported data repositories that make data accessible for reuse. Most accept submissions of appropriate data from NIH-funded investigators (and others), but some restrict data submission to only those researchers involved in a specific research network. Various databases maintained by NCBI also qualify as specialist repositories.
RCSB Protein Data Bank: The PDB provides data and tools related to experimentally-determined 3D protein structures and computationally-derived structural models.

Need help with repositories?

Bryan Gee

he/him

Subjects: Data Management