University of Texas Libraries

Research Data Services


Citing Data & Software

Citing Data

When should data be cited?

All data that were used to carry out your research should be cited to ensure reproducibility of your work. This includes both data that you generated and any data that were previously published (by you or someone else) and re-used in a given study.

How should data be cited?

Data should be cited in the same fashion as a journal article: an abbreviated in-text citation should be included wherever appropriate in the main text, and a full reference should be included in the reference list. This applies both to data that you generated and to any data that were previously published (by you or someone else).

How should data citations and references be formatted?

Many of the major reference styles used by publishers (e.g., APA) have specific reference/citation formats for data/datasets, in addition to their better-known formats for journal articles, books, preprints, etc. A journal may also have its own specification for how to cite a dataset. In general, dataset references should contain:

  • Author(s)
  • Publication year
  • Title
  • Repository
  • DOI

Additional information like the dataset version (if relevant) and a bracketed description (e.g., "[Data set]") may also be recommended or required. The in-text citation should be formatted like those for articles or books (e.g., Smith et al., 2025).
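As an illustration, the elements listed above can be assembled into a full reference and an in-text citation with a few lines of Python. This is only a minimal sketch in a loosely APA-like layout; the exact punctuation and ordering depend on the style your publisher requires. The `DatasetRecord` class, both function names, and every value in the example (authors, title, repository, and the placeholder DOI `10.0000/example`) are hypothetical and are not taken from any real dataset.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical container for the elements of a dataset reference."""
    authors: List[str]             # formatted as "Surname, Initials"
    year: int
    title: str
    repository: str
    doi: str                       # bare DOI, without the https://doi.org/ prefix
    version: Optional[str] = None  # include only if the dataset is versioned


def format_dataset_reference(rec: DatasetRecord) -> str:
    """Build a full reference in a loosely APA-like layout (an assumption)."""
    if len(rec.authors) > 1:
        authors = ", ".join(rec.authors[:-1]) + ", & " + rec.authors[-1]
    else:
        authors = rec.authors[0]
    version = f" (Version {rec.version})" if rec.version else ""
    return (
        f"{authors} ({rec.year}). {rec.title}{version} [Data set]. "
        f"{rec.repository}. https://doi.org/{rec.doi}"
    )


def format_in_text(rec: DatasetRecord) -> str:
    """Build the abbreviated in-text citation, e.g. (Smith et al., 2025)."""
    surnames = [author.split(",")[0] for author in rec.authors]
    if len(surnames) == 1:
        lead = surnames[0]
    elif len(surnames) == 2:
        lead = f"{surnames[0]} & {surnames[1]}"
    else:
        lead = f"{surnames[0]} et al."
    return f"({lead}, {rec.year})"


# Entirely made-up example values, for illustration only.
example = DatasetRecord(
    authors=["Smith, J.", "Lee, K.", "Garcia, M."],
    year=2025,
    title="Example stream temperature records",
    repository="Example Repository",
    doi="10.0000/example",
    version="1.2",
)
print(format_dataset_reference(example))
print(format_in_text(example))
```

Always check the output against the journal's or style guide's own dataset examples before submitting, since details such as the placement of the version or the bracketed description vary between styles.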

Joint Declaration of Data Citation Principles
Dataverse Data Citation Best Practices

Citing Software

When should software be cited?

All software that was used to carry out your research should be cited to ensure reproducibility of your work. This includes software that was used to gather, generate, process, or analyze your research data and can range from relatively small scripts that you created and deposited in a repository like Zenodo to software packages/libraries (e.g., pandas; ggplot2) and fully-fledged programs (e.g., QGIS).

How should software be cited?

Software should be cited in the same fashion as a journal article: an abbreviated in-text citation should be included wherever appropriate in the main text, and a full reference should be included in the reference list. This applies both to software that you produced and to any software that was previously released (by you or someone else).

How should software citations be formatted?

Software citation formats may vary by publisher, so it is important to follow the publisher's recommended citation style if one exists. If there is no publisher-specific software citation guidance for you to follow, check whether the developers of the software you used have provided a recommended citation format. If you cannot find specific guidance from either the publisher or the software developers, a good option is to adhere to the Force11 software citation guidelines and include your software citations in your list of references alongside citations for articles, books, datasets, and other materials referenced by your work.
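The order of precedence described above (publisher style first, then the developers' recommended citation, then general guidelines) can be sketched as a small helper. This is an illustrative sketch only: the function name `choose_software_citation`, its parameters, and the example citation strings are all hypothetical, and the strings themselves would still need to be written by hand from the relevant guidance.

```python
from typing import Optional, Tuple


def choose_software_citation(
    publisher_formatted: Optional[str],
    developer_recommended: Optional[str],
    general_fallback: str,
) -> Tuple[str, str]:
    """Return (source of guidance, citation) using the precedence above.

    Pass None for guidance that does not exist.
    """
    if publisher_formatted:
        # The publisher's own style always takes priority.
        return "publisher style", publisher_formatted
    if developer_recommended:
        # Next, use the citation the software developers ask for.
        return "developer recommendation", developer_recommended
    # Otherwise fall back to a citation written from general guidelines.
    return "general guidelines", general_fallback


# Made-up example: no publisher guidance, but the developers suggest a citation.
source, citation = choose_software_citation(
    publisher_formatted=None,
    developer_recommended=(
        "Example Developers (2025). exampletool (Version 2.1) [Computer software]."
    ),
    general_fallback="Example Developers (2025). exampletool, version 2.1.",
)
print(source)
print(citation)
```

Whichever branch applies, the resulting citation belongs in the reference list with the rest of your references.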

GitHub Citation Files
GitHub Announcement of Enhanced Support for Software Citation
Katz et al. (2020): Recognizing the value of software: a software citation guide. DOI: 10.12688/f1000research.26932.2.

Why isn't citing the associated paper enough?

Oftentimes, when researchers are reusing data or software that were previously published, they cite the associated paper that first described the data or software. However, this is a poor scholarly practice for several reasons:

  • The paper does not contain the software or the data; it merely uses/analyzes them. As data and software become more common and increase in size and complexity, they are increasingly being separated from the main text of articles, whether in Supplemental Information or in third-party repositories. Therefore, if you need to cite a dataset or a piece of software, it is inappropriate to cite just the paper; it would be like only citing one paper of a certain research group when you actually used four related papers from that group for your argument.
  • Legal attribution requires attribution of that object, not a related object. Researchers often apply Creative Commons licenses like CC BY (Attribution) to data and software. This is typically inappropriate for several reasons (e.g., Creative Commons itself recommends against using CC licenses for software), but chief among them is that legal attribution is not the same as scholarly citation. Where content can actually be copyrighted (e.g., most software, some proprietary data), reusing it while citing only the associated paper (or citing nothing at all) may constitute a copyright violation, because the data or software itself has not been properly attributed. If you wanted to cite an idea put forward by one paper, it would not be acceptable to cite a different paper by the same author(s) that doesn't make that actual point; the same is true for data/software and an associated paper.
  • Citing the associated paper prevents tracking of reuse of the data/software specifically. Many researchers are interested in knowing how many people may be reusing their data and/or software. Basic web traffic attributes like views and downloads are tracked for publicly accessible deposits, but these don't necessarily indicate reuse - someone might just have a glance at your files and then decide not to use them. The clearest way to indicate reuse of data or software, as when we refer to content directly in articles, is a full in-text citation and reference.

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.