Data and other supporting research outputs can be, and have been, shared through a wide range of outlets that are not data repositories. Most of these are not appropriate for long-term preservation and may be out of compliance with publisher or funder requirements. Two of the most common examples (supplemental information and GitHub) are detailed here to explain why they are sub-optimal avenues for data sharing.
Supplemental Information (SI), along with its many nominal permutations (Supplementary Materials, etc.), is probably the most common location for researchers to share supporting materials, ranging from raw data to supplemental figures to appendices to code. However, SI is no longer a good location for sharing most, if not all, of these materials.
Both SI and data sharing are products of the rapid evolution of digital technology and its application to research. A hundred years ago, everything had to go into a print copy or be left out. When digital publishing started, the only data repositories were physical entities like museums. SI arose as a means for authors to provide more information without increasing print costs, sometimes in formats that cannot be included in article PDFs (e.g., videos); SI is often critical for journals with page/word limits, as most of the substance may be found in the SI (e.g., Nature, Science) (Sacher, 2011). For most researchers, the second most popular option was maintaining supporting material on a personal website or a university website, both of which are now regarded as poor practices because these pages can disappear overnight if a researcher leaves an institution or forgets to pay the bill for their website domain ('link rot'; see Briney et al., 2024 [PLOS ONE], for an assessment of how much link rot occurs with research data). Data repositories are the solution to many of these potential problems, among others (e.g., ensuring content is available beyond the career or lifespan of an individual researcher or research group).
“The NIH DMS Policy expects that researchers will share scientific data through established data repositories. Sharing data through publications, local servers, or lab websites is not the same as using a repository and does not meet the Policy’s expectations.” – NIH Office of Data Science Strategy
For researchers supported by U.S. federal research grants, there is an expectation that data are shared through proper data repositories; use of SI is considered to fall under "through publications" and is not acceptable for meeting funder requirements.
GitHub is a common tool used for software development and, increasingly, in academic research. Researchers are often confused when they hear that GitHub does not qualify as a proper repository for publishing any of their research materials, but it is important to keep a few things in mind:
GitHub is a cloud-based platform where you can store, share, and work together with others to write code. -GitHub Docs
GitHub was developed for non-academic software developers, who remain its primary user base. It is not itself a repository in the scholarly sense: individual 'repositories' in GitHub are defined only as "a location to store code" (see GitHub's own description), whereas in the context of publishing research data and software, a repository is a platform specifically designed for the long-term preservation of academic research data.
You may have seen papers in which data and/or code are shared exclusively through GitHub, or perhaps you have done this yourself. Journals have increasingly been burned when GitHub repositories are made private or, worse, deleted, rendering the data/code inaccessible through the paper. An increasing number of journals are thus prohibiting data and code from being shared exclusively in this way.
For some of our journals, material may be provided via GitHub, Google drives, Dropbox, or similar services for the review stage, but they must be moved to a permanent, publicly accessible repository during revision. -The Royal Society of London
None of this means that you should avoid GitHub. The Research Data Services team routinely uses GitHub in our library and non-library work, and the UT Open Source Program Office recommends GitHub as a key tool in software development. The key point is to use GitHub as it was designed: as a great collaborative tool during the research process, not as a venue for the long-term preservation of academic research material. Many data repositories, including the Texas Data Repository, have GitHub integrations that facilitate the deposit of content hosted on GitHub into proper repositories. For researchers who are actively developing code but need to archive a static version for a publication, many of these integrations also enable linked versioning of the deposit in a repository (e.g., by making a new release). Code can thus be continually updated on a public platform while a static version deposited in a repository meets the requirements for scholarly publishing. If you have both a GitHub repository and a linked, DOI-backed deposit in a proper research repository, you can link/cite both in an associated paper so that readers know which specific version was used for the paper and that the product is still being developed (see Jones et al., 2023 [Methods Ecol Evol] for an example).
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.