University of Texas Libraries

Research Data Services

Organize

There’s no one way to organize your data, but a consistent and descriptive file structure can save you time and money later. Use a system that makes sense to you so that keeping things in order becomes a habit instead of a chore.

Why bother?

  • Save time and money. Well-organized, easy-to-find files make your research more efficient. This is especially important if you are sharing active data with collaborators.
  • Increase your impact. In the long run, well-managed data are more discoverable, accessible, and reusable (including by your future self!). This will help increase the visibility and impact of your work.
  • Demonstrate integrity. Discoverable, accessible, and reusable data are fundamental to ensuring reproducible or replicable research.
  • It’s required. Most funding agencies now require a data management plan for grant proposals. Publishers increasingly insist on data sharing as a condition of publishing your work. Organized, reusable data help demonstrate compliance.

File Formats

Choosing file formats carefully helps avoid obsolescence. Use formats that are:

  • Non-proprietary, open, documented standards (e.g., .tif, .txt, .csv, .pdf)
  • Encoded with standard characters (e.g., ASCII, UTF-8)
  • Used commonly in your research community
  • Unencrypted
  • Uncompressed
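
Some of these criteria can be checked automatically. The following is a minimal sketch, not a complete validator: the allow-list PREFERRED_EXTENSIONS and the function names are hypothetical and only reuse the example extensions above, so adjust them to the formats your research community actually uses.

```python
from pathlib import Path

# Hypothetical allow-list built from the example extensions above.
PREFERRED_EXTENSIONS = {".tif", ".txt", ".csv", ".pdf"}


def is_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (ASCII is a subset)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def format_warnings(path: Path) -> list[str]:
    """List reasons a file may not meet the format guidance above."""
    warnings = []
    suffix = path.suffix.lower()
    if suffix not in PREFERRED_EXTENSIONS:
        warnings.append(f"extension {suffix!r} is not on the preferred list")
    # Only plain-text formats are read and checked for encoding.
    if suffix in {".txt", ".csv"} and not is_utf8(path.read_bytes()):
        warnings.append("text is not valid UTF-8")
    return warnings
```

Calling format_warnings(Path("notes.docx")) returns a one-item list flagging the extension, while a .csv file saved as UTF-8 returns an empty list. The check does not detect encryption or compression, which still need a manual look.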

Examples of open formats

  • Containers: TAR, GZIP, ZIP
  • Databases: XML, CSV
  • Geospatial: SHP, DBF, GeoTIFF, NetCDF
  • Moving images: MOV, MPEG, AVI, MXF, WMV
  • Sounds: WAVE, AIFF, MP3, MXF
  • Statistics: ASCII, DTA, POR, SAS, SAV
  • Still images: TIFF, JPEG 2000, PDF, PNG, GIF, BMP
  • Tabular data: CSV
  • Text: XML, PDF/A, HTML, ASCII, UTF-8
  • Web archive: WARC

For more information on recommended file formats, see the Recommended Formats Statement.

File Naming

Adopt a naming convention and use it throughout a project (or throughout your career). Consider including a README.txt file that explains your naming convention and any codes or abbreviations you use. File names should:

  • Describe the contents of the file without being overly long. Avoid generic names (like draft.doc or final2.xls), which are hard to decipher and easy to overwrite.
  • Include dates. Don’t rely on system dates, which can be misleading. Recommended formats look like: YYYYMMDD or YYYY-MM-DD.
  • Reserve 3-letter file extensions for application-specific codes (e.g., .jpg, .mov, .tif).
  • Not contain special characters such as / \ : * ? " < > [ ] & $ because these have meaning in software and operating systems and can cause trouble.
  • Not contain spaces. These are problematic for some operating systems. Use underscores (file_name), dashes (file-name), or camel case (FileName) instead.
  • Be consistent.
  • Example: Project_instrument_location_date_time_version.ext
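
A naming convention is easier to keep consistent when a small script builds and checks the names. The sketch below follows the example pattern above under a few assumptions of its own: the date is written as YYYYMMDD, the time as HHMM, and the version as v01, v02, and so on. The function names build_name and name_problems are hypothetical.

```python
import re
from datetime import datetime

# Whitespace plus the special characters the guidance above says to avoid.
FORBIDDEN = re.compile(r'[\s/\\:*?"<>\[\]&$]')
# A YYYYMMDD or YYYY-MM-DD date that is not part of a longer digit run.
DATE = re.compile(r"(?<!\d)(\d{8}|\d{4}-\d{2}-\d{2})(?!\d)")


def build_name(project, instrument, location, when, version, ext):
    """Assemble Project_instrument_location_date_time_version.ext."""
    parts = [
        project,
        instrument,
        location,
        when.strftime("%Y%m%d"),  # date as YYYYMMDD
        when.strftime("%H%M"),    # time as HHMM (an assumption)
        f"v{version:02d}",        # zero-padded version (an assumption)
    ]
    return "_".join(parts) + "." + ext.lstrip(".")


def name_problems(name):
    """Return a list of ways a file name breaks the rules above."""
    problems = []
    if FORBIDDEN.search(name):
        problems.append("contains a space or special character")
    if not DATE.search(name):
        problems.append("no YYYYMMDD or YYYY-MM-DD date found")
    return problems


name = build_name("Soil", "pH", "Austin", datetime(2024, 3, 15, 14, 30), 1, "csv")
print(name)                          # Soil_pH_Austin_20240315_1430_v01.csv
print(name_problems(name))           # []
print(name_problems("final 2.xls"))  # both problems are reported
```

The project, instrument, and location values here are made up for illustration. The date check only looks at the digit pattern, so it will not catch an impossible date such as 20241340.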

Tools and Resources

Don’t rename your files one at a time! Free tools can help with batch renaming and data cleanup:

  • OpenRefine is a powerful, free and open source tool for cleaning messy data.
  • Video tutorial for using OpenRefine: http://openrefine.org/
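
If you are comfortable with a little scripting, batch renaming can also be done with Python's standard library. This is a minimal sketch with hypothetical function names, not a replacement for a dedicated tool. It replaces spaces and the special characters listed under File Naming with underscores, lowercases the extension, and defaults to a dry run so that nothing changes until you have reviewed the plan.

```python
import re
from pathlib import Path


def safe_name(name: str) -> str:
    """Replace runs of spaces and special characters with one underscore."""
    p = Path(name)
    stem = re.sub(r'[\s/\\:*?"<>\[\]&$]+', "_", p.stem).strip("_")
    return stem + p.suffix.lower()


def plan_renames(folder: Path) -> list[tuple[Path, Path]]:
    """Pair each file with its cleaned name, skipping names already clean."""
    plan = []
    for path in sorted(folder.iterdir()):
        if path.is_file():
            target = path.with_name(safe_name(path.name))
            if target != path:
                plan.append((path, target))
    return plan


def apply_renames(plan, dry_run=True):
    """Print the planned renames; only touch files when dry_run is False."""
    for old, new in plan:
        if new.exists():
            # Never overwrite an existing file.
            print(f"skip {old.name}: {new.name} already exists")
        elif dry_run:
            print(f"would rename {old.name} -> {new.name}")
        else:
            old.rename(new)
```

A typical session calls apply_renames(plan_renames(Path("data"))) first to preview the changes (the folder name "data" is only an example), then repeats the call with dry_run=False. Work on a copy of your files until you trust the result.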

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.