Why describe your data?
If you want your data to be useful for your future self, your colleagues, or other researchers, you must describe it. In addition to recording what you planned, what you did, what happened, and what you think it means, describe the data itself in detail. The sooner you start, the more likely you are to remember everything!
Where should I start?
- Scope and structure. Do your best to describe the content, formats, and any internal relationships in your dataset.
- Glossaries and legends. Define any terms, codes, and variables that you use.
- Context. Provide details relevant for validation/replication (e.g. funding sources, timetables, collaborators, location, and environmental conditions).
- Methods. Describe techniques, software, and hardware used in data collection.
- Analysis. Describe steps you took in processing and analyzing your data.
- Attribution. Cite your data sources.
- Access. Provide details about the confidentiality, access, and use conditions that apply to your data.
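
One way to make this checklist routine is to generate a plain-text readme skeleton that has a block for each item. The sketch below is only an illustration: the function name build_readme, the prompt wording, and the sample survey title are assumptions, not part of any standard.

```python
# Section headings follow the checklist above; the prompt text is illustrative.
README_SECTIONS = [
    ("Scope and structure", "Content, formats, and internal relationships."),
    ("Glossaries and legends", "Terms, codes, and variables used."),
    ("Context", "Funding, timetable, collaborators, location, conditions."),
    ("Methods", "Techniques, software, and hardware used in collection."),
    ("Analysis", "Processing and analysis steps."),
    ("Attribution", "Data sources cited."),
    ("Access", "Confidentiality, access, and use conditions."),
]


def build_readme(title, answers=None):
    """Return plain-text readme content with one block per checklist item."""
    answers = answers or {}
    lines = [title, "=" * len(title), ""]
    for heading, prompt in README_SECTIONS:
        lines.append(heading.upper())
        # Fall back to a TODO prompt so undocumented items stay visible.
        lines.append(answers.get(heading, "TODO: " + prompt))
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical dataset title and answer, for demonstration only.
    print(build_readme(
        "Stream temperature survey",
        {"Attribution": "Station readings collected by the project team."},
    ))
```

Saving the returned text as readme.txt next to the data files gives you a starting point that you can fill in as the project progresses.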
What about metadata?
- Data description is metadata. Metadata is the formal, structured description of your data that enables someone (even your future self!) to find it, access it, determine its value, and potentially use it.
- Meet minimum requirements. Depending on how you plan to share and/or archive your data, you may need to meet discipline standards or repository requirements for your metadata.
- Choose a format. Even a general narrative in a plain-text file (e.g. readme.txt) stored with your datasets is better than nothing! For a description that is more interoperable and machine-readable, use standard element sets, input guidelines, and controlled vocabularies, and encode the description in a format such as CSV, XML, or RDF.
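
To illustrate what a structured, machine-readable description can look like, the sketch below writes one small metadata record as CSV and as XML. It assumes Dublin Core element names as the element set, which is only one familiar option; your discipline or repository may require a different standard. All record values are placeholders.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Dublin Core namespace; chosen here only as a familiar example element set.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical record: every value below is a placeholder.
record = {
    "title": "Stream temperature survey",
    "creator": "Example Lab",
    "date": "2020-01-31",
    "format": "text/csv",
    "rights": "Contact the creator for access conditions",
}


def to_csv(rec):
    """Serialize one record as a two-line CSV (header row, then values)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rec), lineterminator="\n")
    writer.writeheader()
    writer.writerow(rec)
    return buffer.getvalue()


def to_xml(rec):
    """Serialize one record as XML with one namespaced element per field."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")
    for name, value in rec.items():
        child = ET.SubElement(root, "{%s}%s" % (DC_NS, name))
        child.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_csv(record))
    print(to_xml(record))
```

The same dictionary feeds both encodings, so the description is written once and the output format can follow whatever the target repository accepts.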
Associated documentation
Other materials associated with your primary dataset, including those listed below, should be digitized and included with your data. At a minimum, reference them in your data description, even if you do not describe them fully:
- Code books or lab books
- Data dictionaries
- Field notes