Sensitive data is research data that, if irresponsibly managed or disseminated, can cause harm to the population being studied, whether that population is human or not.
Different entities have different definitions.
Just as the term ‘data’ has many connotations beyond research data specifically, ‘sensitive data’ can mean different things to different people, even within a university setting. For example, UT Austin uses a three-tiered data classification scheme, but if you look at the examples for the different tiers, you’ll see items like employee salary information, Social Security numbers, and information on university infrastructure. Most of these aren’t part of university research, and ‘research data’ is treated as one lump category.
Any research data related to humans can be sensitive, not just data from biomedical research or research that requires IRB approval. Harms may include identity theft, legal or financial problems, and emotional or reputational damage. Sensitive data, in the context of research involving human participants, is any datum or set of data that could lead a participant to be re-identified. A definitive harm resulting from re-identification is not a criterion for data to be considered ‘sensitive.’ It does not matter whether the researchers collecting the data perceive a clearly defined risk associated with re-identification. To protect participant autonomy, participants' identities should be neither released nor determinable unless they have consented to this.
Direct identifiers are generally unique to an individual or a small group of individuals. Direct identifiers usually need to be removed from a data set before its release. These include personally identifiable information such as name, address, Social Security number, email address, phone number, gender, and race. They also include biometric data such as a facial image, fingerprints, or voice signature.
Indirect identifiers can be linked with information from other sources outside your research study, such as social media, administrative data, or other public datasets, and this linking may result in the identification of an individual. These include ZIP code, birthdate, education level, race and ethnicity, medical diagnosis, and occupation.
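To get a rough sense of this linking risk, you can count how many records share each combination of indirect identifiers. A combination held by only one record is easier to match against an outside source. The following is a minimal sketch in Python, not a complete risk assessment. The records, the field names, and the `group_sizes` and `smallest_group_size` helpers are hypothetical and made up for illustration.

```python
from collections import Counter

# Hypothetical records; field names and values are invented for illustration.
records = [
    {"zip_code": "78712", "birth_year": 1990, "occupation": "nurse"},
    {"zip_code": "78712", "birth_year": 1990, "occupation": "nurse"},
    {"zip_code": "78705", "birth_year": 1985, "occupation": "teacher"},
]

def group_sizes(rows, fields):
    """Count how many rows share each combination of the given fields."""
    return Counter(tuple(row[f] for f in fields) for row in rows)

def smallest_group_size(rows, fields):
    """Return the size of the rarest combination (1 means a record is unique)."""
    return min(group_sizes(rows, fields).values())

indirect = ["zip_code", "birth_year", "occupation"]
print(smallest_group_size(records, indirect))  # prints 1
```

Here the third record is the only one with its combination of values, so it would be the easiest to link to another dataset.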
You may need to remove direct AND indirect identifiers before sharing.
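One way to approach this is sketched below in Python: drop direct identifier fields entirely and coarsen indirect ones. The field names, the `deidentify` helper, and the choices to keep only the first three ZIP code digits and the birth year are assumptions for illustration, not a standard. What is appropriate depends on your data and on any regulations or agreements that apply to it.

```python
# Hypothetical field names; adjust to match your own dataset.
DIRECT_IDENTIFIERS = {"name", "email", "phone", "ssn"}

def deidentify(row):
    """Return a copy of row without direct identifiers and with coarser indirect ones."""
    cleaned = {k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS}
    if "zip_code" in cleaned:
        # Keep only the first three digits of the ZIP code.
        cleaned["zip_code"] = cleaned["zip_code"][:3]
    if "birthdate" in cleaned:
        # Replace a full ISO date (YYYY-MM-DD) with the year only.
        cleaned["birth_year"] = int(cleaned.pop("birthdate")[:4])
    return cleaned

# Invented example record, not real participant data.
participant = {
    "name": "Example Person",
    "email": "person@example.org",
    "zip_code": "78712",
    "birthdate": "1990-04-17",
    "occupation": "nurse",
}
print(deidentify(participant))
# {'zip_code': '787', 'occupation': 'nurse', 'birth_year': 1990}
```

Coarsening keeps some analytic value in a field while making it less useful for linking. Whether the result is safe to share still depends on how many participants share each remaining combination of values.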
Data about Indigenous people falls under Indigenous data sovereignty, which is the right of a nation to govern the collection, ownership, and application of its own data. It derives from tribes' inherent right to govern their peoples, lands, and resources.
This means that when working with data from Indigenous people, resources, or land, you must protect it and work with community members to ensure that the data is placed in the right repositories with the appropriate level of anonymization or restriction. The CARE Principles for Indigenous Data Governance (“CARE” stands for “Collective Benefit, Authority to Control, Responsibility, and Ethics”) and organizations such as the U.S. Indigenous Data Sovereignty Network can help you work with Indigenous data.
Other types of data or data on certain non-human topics can be considered sensitive as well. Common examples include:
In some instances, publishing sensitive non-human data is acceptable if the potential benefits can be shown to outweigh the potential harms or if the data can be sufficiently generalized. In other instances, there will be absolute prohibitions on data sharing (e.g., export control). Alternatively, select parts of the research may be shared, such as the methodology but not the data.
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.