There’s no one way to organize your data, but a consistent and descriptive file structure can save you time and money later. Use a system that makes sense to you so that keeping things in order becomes a habit instead of a chore.
Why bother?
- Save time and money. Well-organized, easy-to-find files make your research more efficient. This is especially important if you are sharing active data with collaborators.
- Increase your impact. In the long run, well-managed data are more discoverable, accessible, and reusable (including by your future self!). This will help increase the visibility and impact of your work.
- Demonstrate integrity. Discoverable, accessible, and reusable data are fundamental to ensuring reproducible or replicable research.
- It’s required. Most funding agencies now require a data management plan for grant proposals. Publishers increasingly insist on data sharing as a condition of publishing your work. Organized, reusable data help demonstrate compliance.
File Formats
Choosing file formats carefully helps avoid obsolescence. Use formats that are:
- Non-proprietary, open, documented standards (e.g., .tif, .txt, .csv, .pdf)
- Encoded with a standard character encoding (e.g., ASCII, UTF-8)
- Used commonly in your research community
- Unencrypted
- Uncompressed
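One of the criteria above, standard character encoding, is easy to check by script. The following is a minimal sketch in Python, not a complete validator: the function name is_utf8_text is made up for illustration, and it assumes the file is small enough to read into memory in one go.

```python
from pathlib import Path


def is_utf8_text(path):
    """Return True if the file's bytes decode cleanly as UTF-8.

    ASCII is a subset of UTF-8, so plain ASCII files pass as well.
    Hypothetical helper for illustration; it reads the whole file at once.
    """
    try:
        Path(path).read_bytes().decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

A file that fails this check may be binary, or it may be text saved in a legacy encoding; re-saving it as UTF-8 from the original software is usually the safer fix.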
Examples of open formats
- Containers: TAR, GZIP, ZIP
- Databases: XML, CSV
- Geospatial: SHP, DBF, GeoTIFF, NetCDF
- Moving images: MOV, MPEG, AVI, MXF, WMV
- Sounds: WAVE, AIFF, MP3, MXF
- Statistics: ASCII, DTA, POR, SAS, SAV
- Still images: TIFF, JPEG 2000, PDF, PNG, GIF, BMP
- Tabular data: CSV
- Text: XML, PDF/A, HTML, ASCII, UTF-8
- Web archive: WARC
For more information on recommended file formats, see the Recommended Formats Statement.
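If you want to see how closely an existing project folder follows these recommendations, a short script can tally the files whose extensions fall outside a preferred set. This is only a sketch under stated assumptions: the PREFERRED_EXTENSIONS list and the audit_formats function are hypothetical, the extensions are a small sample based on the examples above, and a file extension is only a rough proxy for the actual format.

```python
from collections import Counter
from pathlib import Path

# Hypothetical allow-list based on some of the examples above.
# Adjust it to the formats your research community actually uses.
PREFERRED_EXTENSIONS = {
    ".csv", ".txt", ".tif", ".tiff", ".pdf",
    ".png", ".xml", ".html", ".nc", ".wav",
}


def audit_formats(root):
    """Count files under root whose extension is not in the preferred set."""
    flagged = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            ext = path.suffix.lower()  # compare case-insensitively
            if ext not in PREFERRED_EXTENSIONS:
                flagged[ext or "(no extension)"] += 1
    return flagged
```

Calling audit_formats on a project folder returns a count per flagged extension, which gives you a starting list of files to convert or to document in your README.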
File Naming
Adopt a naming convention and use it throughout a project (or throughout your career). Consider including a README.txt file that explains your naming convention and any codes or abbreviations you use. File names should: