This page provides an overview of some best practices and tips for creating good data documentation.
In many instances, you may not need to create a data dictionary from scratch, or even at all! As part of designing your experiment and guiding data collection, you may already have created a file that describes important information, such as how different variables are coded (e.g., 0 = no; 1 = yes; 2 = not answered), units of measurement, or allowable values. For data collected through a survey or similar mechanism, a text file with the questions may also work as a data dictionary (provided that allowable answers are listed for non-open-ended questions). Some software programs (e.g., SPSS) can also automatically generate a data dictionary or codebook from a given dataset.
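As a rough illustration of what automatic generation might look like outside of a dedicated package, the Python sketch below drafts a skeleton data dictionary from a small CSV dataset. The function name `draft_data_dictionary`, the output column names, and the sample data are all hypothetical. It assumes a well-formed CSV in which missing values are recorded as 'N/A'. Descriptions and units still have to be written by hand, because they cannot be inferred from the values.

```python
import csv
import io


def draft_data_dictionary(csv_text, missing="N/A"):
    """Return one skeleton dictionary row per column of a CSV dataset.

    Hypothetical helper for illustration: it only summarizes the values
    that were observed; it cannot know descriptions or units.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = {name: [] for name in reader.fieldnames}
    for record in reader:
        for name, value in record.items():
            if value != missing:  # skip values marked as missing
                columns[name].append(value)

    rows = []
    for name, values in columns.items():
        if not values:
            observed = ""  # nothing but missing values in this column
        else:
            try:
                # Numeric column: report the observed range.
                numbers = [float(v) for v in values]
                observed = f"[{min(numbers):g}, {max(numbers):g}]"
            except ValueError:
                # Non-numeric column: list the distinct categories.
                observed = "; ".join(sorted(set(values)))
        rows.append({
            "Variable": name,
            "Description": "TODO",
            "Observed values": observed,
            "Units": "TODO",
        })
    return rows


# Hypothetical sample data, not taken from a real dataset.
sample = """SL,taxon
10.2,D. pilosus
N/A,D. hybridus
12.5,D. pilosus
"""

for row in draft_data_dictionary(sample):
    print(row)
```

Note that the sketch reports observed values, not allowable ones. The range of values that happens to appear in a dataset is only a starting point for deciding what the allowable values should be.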
Below is a short example of a hypothetical data dictionary. Refer to the right sidebar for some real-world examples from UT Austin researchers.
| Variable | Description | Allowable values | Units | Notes |
| --- | --- | --- | --- | --- |
| SL | Skull length | [0,15] | cm | only measured for complete specimens; 'N/A' for incomplete specimens. |
| m | Mass | [0,55] | kg | only measured for complete specimens; 'N/A' for incomplete specimens. |
| taxon | Species of Dasypus | ['D. hybridus'; 'D. kappleri'; 'D. pilosus'] | N/A | none |
| stage | Developmental stage | ['pup'; 'juvenile'; 'sub-adult'; 'adult'] | N/A | none |
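A data dictionary like the one above can also be used to check a dataset before it is shared. The Python sketch below shows one possible way to do that for three of the example variables. It assumes the dictionary has been transcribed by hand into a Python structure and that values are read in as text. The names `DICTIONARY` and `check_record` are hypothetical, and treating the interval endpoints as inclusive is an assumption.

```python
# Hand-transcribed from the example table; a hypothetical structure.
DICTIONARY = {
    "SL": {"range": (0, 15), "missing": "N/A"},   # skull length, cm
    "m": {"range": (0, 55), "missing": "N/A"},    # mass, kg
    "stage": {"values": {"pup", "juvenile", "sub-adult", "adult"}},
}


def check_record(record):
    """Return a list of problems found in one record (a dict of strings)."""
    problems = []
    for name, rule in DICTIONARY.items():
        value = record.get(name)
        if value is None:
            problems.append(f"{name}: missing from record")
        elif "values" in rule:
            # Categorical variable: value must be one of the listed options.
            if value not in rule["values"]:
                problems.append(f"{name}: {value!r} is not an allowable value")
        elif value != rule["missing"]:
            # Numeric variable: value must parse and fall inside the range.
            low, high = rule["range"]
            try:
                number = float(value)
            except ValueError:
                problems.append(f"{name}: {value!r} is not a number")
                continue
            if not low <= number <= high:
                problems.append(f"{name}: {number:g} is outside [{low}, {high}]")
    return problems


print(check_record({"SL": "9.8", "m": "N/A", "stage": "adult"}))
print(check_record({"SL": "21", "m": "4.2", "stage": "larva"}))
```

The first record passes and yields an empty list. The second is flagged twice, once for a skull length outside the allowable range and once for an unlisted developmental stage.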
Some additional guides to creating data dictionaries:
The information that should be included in a README will vary by discipline and data type, but some information should always be provided. Essential metadata to include are:
Additional metadata will vary by the type of data but could include information such as:
Q: Isn't some of this information redundant with what I fill out in the submission form?
A: Yes. Researchers often have to enter metadata when depositing data in a repository, much as they do when submitting a manuscript to a journal system. Most of this information (such as author names) should still go in the README. One of the major benefits of a README file is that it is downloaded with the data files, which makes it harder for that metadata to become separated or lost; most people are not going to download the XML of a webpage or save a screenshot alongside the data files, for example.
Cornell has an excellent guide to creating README-style metadata, including a template.
Below are some examples of datasets published by UT researchers with good data dictionaries:
Below are some examples of datasets published by UT researchers with good README files:

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

