University of Texas Libraries

Research Data Services


Describe and Document


Why describe your data?

If you want your data to be useful to your future self, your colleagues, or other researchers, you must describe it. Record what you planned, what you did, what happened, and what you think it means, then go further and describe the data itself in detail. The sooner you start, the more likely you are to remember everything!

Where should I start?

  • Scope and structure. Do your best to describe the content, formats, and any internal relationships in your dataset.
  • Glossaries and legends. Define any terms, codes, and variables that you use.
  • Context. Provide details relevant for validation/replication (e.g. funding sources, timetables, collaborators, location, and environmental conditions).
  • Methods. Describe techniques, software, and hardware used in data collection.
  • Analysis. Describe steps you took in processing and analyzing your data.
  • Attribution. Cite your data sources.
  • Access. Provide details about confidentiality and about the access and use conditions for your data.
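
One way to make sure none of the points above is skipped is to generate the description from a fixed outline. The following is only a minimal sketch: the function name build_readme, the placeholder text, and the example answers are illustrative assumptions, not part of any official template.

```python
# Section names mirror the list above; everything else here is a hypothetical example.
README_SECTIONS = [
    "Scope and structure",
    "Glossaries and legends",
    "Context",
    "Methods",
    "Analysis",
    "Attribution",
    "Access",
]

PLACEHOLDER = "[TODO: not yet described]"


def build_readme(title, answers):
    """Return plain-text readme content with one heading per section.

    Sections missing from `answers` are filled with a visible placeholder
    so that gaps in the description are easy to spot.
    """
    lines = [title, "=" * len(title), ""]
    for section in README_SECTIONS:
        lines.append(section.upper())
        lines.append(answers.get(section, PLACEHOLDER))
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up answers, for illustration only.
    example = {
        "Scope and structure": "Three CSV files, one per field season.",
        "Methods": "Readings taken hourly with a handheld logger.",
    }
    print(build_readme("Example dataset", example))
```

Saving the returned text as readme.txt next to the data files keeps the description and the data together.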

What about metadata?

  • Data description is metadata. The formal, structured description associated with your data is the metadata that enables someone (even your future self!) to find it, access it, determine its value, and potentially use it.
  • Meet minimum requirements. Depending on how you plan to share and/or archive your data, you may need to meet discipline standards or repository requirements for your metadata.
  • Choose a format. Even just including a general narrative in a text file (e.g. readme.txt) associated with your datasets is better than nothing! Or use standard element sets, input guidelines, and controlled vocabularies along with CSV, XML, or RDF encoding formats for your data description to make it more interoperable and machine-readable.
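
As an illustration of a machine-readable description in CSV, the sketch below writes a small data dictionary with Python's standard csv module. The column names (variable, description, units, allowed_values) and the sample variable are assumptions chosen for the example; a discipline standard or repository may require different fields.

```python
import csv
import io

# Hypothetical column set for a simple data dictionary.
FIELDS = ["variable", "description", "units", "allowed_values"]


def data_dictionary_csv(variables):
    """Return CSV text with a header row and one row per described variable.

    `variables` is a list of dicts; any field missing from an entry is
    written as an empty cell.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for entry in variables:
        writer.writerow({field: entry.get(field, "") for field in FIELDS})
    return buffer.getvalue()


if __name__ == "__main__":
    # Made-up variable, for illustration only.
    rows = [
        {
            "variable": "temp_c",
            "description": "Water temperature",
            "units": "degrees Celsius",
        }
    ]
    print(data_dictionary_csv(rows))
```

Keeping one row per variable means the same file can be read by a person in a spreadsheet and by a script.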

Associated documentation

Other materials associated with your primary data set, including those listed below, should be digitized and included with your data. They should also at least be referenced in your data description, if not fully described:

  • Code books or lab books
  • Data dictionaries
  • Field notes

Tools and Resources

Metadata are structured information that provides context for research data and, in doing so, enables discovery, use, exchange, and preservation of those data.

  • Dublin Core is a domain-agnostic, well-known and widely used standard for simple, generic descriptions.
  • Electronic Lab Notebooks are a useful tool for documenting and managing data throughout your project. There are many options (e.g., LabArchives, Evernote), each with unique features for various workflows.
  • This readme.txt template provides detailed recommendations for how to describe and cite your data.
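
To give a sense of what a simple Dublin Core description can look like in XML, here is a minimal sketch using Python's standard library. The namespace URI is the Dublin Core element set namespace; the wrapper element name "metadata", the function name, and the sample values are assumptions made for illustration, and a repository may expect a different wrapper or additional elements.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Return an XML string with one dc:* element per (name, value) pair.

    `fields` is a list of tuples so that repeated elements (for example
    several creators) keep their order. The root element name "metadata"
    is a hypothetical wrapper, not something Dublin Core prescribes.
    """
    root = ET.Element("metadata")
    for name, value in fields:
        element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        element.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Made-up values, for illustration only.
    record = dublin_core_record(
        [
            ("title", "Example dataset"),
            ("creator", "Doe, Jane"),
            ("description", "Hourly water temperature readings."),
            ("rights", "See readme.txt for access and use conditions."),
        ]
    )
    print(record)
```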

FAIR Data Principles are for making data Findable, Accessible, Interoperable, and Reusable.

  • Findable – Assign persistent IDs, provide rich metadata, register in a searchable resource.
  • Accessible – Retrievable by ID using a standard protocol, allows for authentication/authorization, metadata remain accessible even if data aren’t.
  • Interoperable – Use standard vocabularies, qualified references, shared and broadly applicable language for knowledge representation.
  • Reusable – Rich, accurate metadata, clear licenses, provenance, use of community standards.
  • FORCE11 FAIR Data Principles - Description of FAIR.
  • How FAIR are your data? - A checklist produced for use at the EUDAT summer school to discuss how FAIR the participants' research data were and what measures could be taken to improve FAIRness.
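
A rough self-check against the four principles can be automated once a description is stored in a structured form. The sketch below is a deliberately simplified assumption: the field names and the one-field-per-hint mapping are hypothetical, and passing it does not mean a dataset is FAIR. It only flags obvious gaps.

```python
# Hypothetical mapping from a metadata field to the FAIR point it supports.
FAIR_CHECKS = {
    "identifier": "Findable: assign a persistent ID",
    "description": "Findable: provide rich metadata",
    "access_protocol": "Accessible: retrievable by ID using a standard protocol",
    "vocabulary": "Interoperable: use standard vocabularies",
    "license": "Reusable: state a clear license",
    "provenance": "Reusable: record provenance",
}


def fair_gaps(record):
    """Return the hints for every checked field that is missing or empty."""
    return [hint for key, hint in FAIR_CHECKS.items() if not record.get(key)]


if __name__ == "__main__":
    # Made-up record, for illustration only.
    draft = {"identifier": "placeholder-id", "license": "CC BY-NC 4.0"}
    for hint in fair_gaps(draft):
        print("Missing ->", hint)
```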

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.