Location-based data is another common source of sensitivity that may introduce risk for both human and non-human (e.g., archaeological sites, organisms) entities. Location-based data can come in various forms, ranging from precise GPS coordinates to postal information (e.g., zip codes) to proximity to a landmark (e.g., living within a 5-minute walk of Times Square in New York City). This page provides an overview of some strategies for reducing sensitivity of this type of data, divided into three general categories: coordinates; addresses/postal information; and physical entities/landmarks.
Coordinates are typically the most easily recognized location-based data with potential to be sensitive. Increasing precision of GPS instruments and routine incorporation of GPS into common consumer devices like cellphones make this data both ubiquitous and highly identifying in many instances. Many disciplines also emphasize reporting of high-precision coordinates as metadata for documentation of occurrences of important sites or organisms.
The majority of coordinate data generated by scholarly research pertains to non-humans, either to culturally relevant sites like archaeological localities or to non-human organisms like animals. Some common considerations include:
Whether coordinates can be published, and if so, to what precision, is highly context-dependent. Consider the following example:
In general, it is advisable to reduce unnecessary precision (decimal places or seconds) in public reporting of coordinates. Another approach is to generalize points to a centroid or to use an alternative, generalized framework (e.g., township-range in the United States).
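The two approaches described here, reducing precision and generalizing to a centroid, can be sketched in a few lines of Python. This is a minimal illustration rather than a recommended workflow: the function names `reduce_precision` and `simple_centroid` are hypothetical, the coordinates are assumed to be in decimal degrees, and the centroid is a plain arithmetic mean, which is only reasonable for points clustered in a small area that does not cross the antimeridian or sit near a pole.

```python
from statistics import fmean


def reduce_precision(lat, lon, decimals=2):
    """Round decimal-degree coordinates to a fixed number of decimal places.

    Hypothetical helper: the appropriate number of decimals depends on context.
    """
    return round(lat, decimals), round(lon, decimals)


def simple_centroid(points):
    """Return the arithmetic mean of a list of (lat, lon) pairs.

    Assumption: points are close together, so averaging degrees is acceptable.
    """
    lats, lons = zip(*points)
    return fmean(lats), fmean(lons)


# Example: publish a rounded coordinate instead of the precise reading.
generalized = reduce_precision(30.284918, -97.734057, decimals=2)

# Example: replace several nearby observations with a single shared point.
shared_point = simple_centroid([(30.0, -97.0), (31.0, -98.0)])
```

As a rough guide, two decimal places of latitude correspond to about 1.1 km on the ground, so the `decimals` value should be chosen with the sensitivity of the site or organism in mind.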
Generalizing location-based data is not guaranteed to anonymize the location. Many organisms are restricted to certain types of habitat, so even if coordinates are highly generalized, someone may still be able to locate the organism when that habitat is rare within a broad geographic area (e.g., bodies of water, caves, certain types of forest or vegetation). Sites with generalized coordinates may be similarly identifiable if they are shown (e.g., in photos) or described as being near prominent features (e.g., buttes, canyons).
Coordinate data for human participants is less common than other location-based data and can rarely be shared publicly, so it requires careful handling. For both humans and non-human entities, there may be various legal prohibitions imposed by funding, regulatory, or collaborative entities on sharing precise coordinate data (e.g., federal land management agencies, personal data protection frameworks).
In addition to coordinates, postal information can also be identifying and thus sensitive. It is relatively intuitive that full addresses, especially those that relate to a small group of people or a single person (e.g., a residential address), are highly sensitive. However, other attributes of postal information, such as zip codes and town/city names, can also be identifying, especially when combined with intrinsic information about those locations (e.g., racial demographics) or with other attributes in the dataset.
For zip codes or similar postal codes, it may be possible to truncate the value to preserve a more generalized form of the location. For U.S. zip codes, one option is to truncate to the first three digits (e.g., 787 for 78712), as these digits only represent the mail sorting and distribution center for the area.
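Truncating a five-digit U.S. zip code to its leading digits can be sketched as follows. The function name `truncate_zip` is hypothetical, and the sketch assumes zip codes are stored as strings (so leading zeros are preserved) in either five-digit or ZIP+4 form; other postal code systems would need their own rules.

```python
def truncate_zip(zip_code, keep=3):
    """Return the first `keep` digits of a U.S. zip code.

    Hypothetical helper. Accepts '78712' or ZIP+4 such as '78712-1234'.
    """
    # Drop any ZIP+4 suffix and surrounding whitespace.
    base = str(zip_code).strip().split("-")[0]
    if len(base) != 5 or not base.isdigit():
        raise ValueError(f"Not a five-digit zip code: {zip_code!r}")
    return base[:keep]


# Example usage with the zip code mentioned in the text.
prefix = truncate_zip("78712")
```

Raising an error on malformed values, rather than silently truncating them, makes it less likely that a partially cleaned or mistyped code slips into a published dataset.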
For qualitative location-based data like names, there are a few options.
In addition to postal information, names of landmarks and physical entities are another form of location-based data. These are more likely to occur in unstructured and qualitative datasets where data collection is not always narrowly constrained (e.g., open-ended, conversational questions). Predictably, people often mention notable landmarks to help situate others, whether a globally recognized landmark like the Eiffel Tower or a locally recognized one like a neighborhood park. A landmark need not be an officially designated or labeled entity like a business or monument. Landmarks might be related to where a person lives, works, regularly commutes, or experienced an event. Businesses and other types of branded, physical locations are among the most common landmarks:
It's important to keep in mind that physical entities that don't have an official name/designation (e.g., a particularly large tree) or that are only named generically (e.g., "the market") can still be precisely identifying based on local context (e.g., if there is only one market in a town). If researchers are not local to the area, this may be harder to know.
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.