The way you manage your data during analysis depends entirely on the type of data you’re using and what you’re doing with it. There are, however, several strategies you can adopt to avoid disaster, save time, and improve your ability to make sense of your work later on.
Save your raw data. It is vitally important to maintain a copy of your data in its rawest, least processed form. This allows you to start over if something goes wrong, or to re-analyze the same dataset testing different variables or protocols.
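One way to put this into practice is to copy each incoming file into a dedicated folder, record a checksum, and mark the copy read-only. The following is a minimal sketch, not a prescribed workflow: the function names `sha256_of` and `preserve_raw`, the `raw/` folder, and the `.sha256` sidecar file are all assumptions made for illustration.

```python
import hashlib
import shutil
import stat
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def preserve_raw(source: Path, raw_dir: Path) -> Path:
    """Copy `source` into `raw_dir`, record its checksum, and make it read-only.

    Hypothetical helper: the folder layout and sidecar naming are assumptions.
    """
    raw_dir.mkdir(parents=True, exist_ok=True)
    target = raw_dir / source.name
    if target.exists():
        # Never silently overwrite data that is meant to stay untouched.
        raise FileExistsError(f"{target} already exists; refusing to overwrite")
    shutil.copy2(source, target)  # copy2 also preserves timestamps
    sidecar = target.with_name(target.name + ".sha256")
    sidecar.write_text(f"{sha256_of(target)}  {target.name}\n")
    # Remove write permission so accidental edits fail loudly.
    target.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return target
```

Running `sha256_of` on the stored copy later and comparing it with the sidecar file shows whether the raw data has changed since it was saved.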
Control your versions. Keeping track of file versions can be as simple as applying a naming convention consistently. For projects that involve code or software development, especially those with frequent edits or multiple contributors, consider using a dedicated version control system. Git is a popular choice, but your research community or lab may have a preferred environment.
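As an illustration of a naming convention, the sketch below builds file names of the form `stem_YYYY-MM-DD_vNN.ext` and works out the next unused version number in a folder. The pattern itself and the helper names `versioned_name` and `next_version` are assumptions chosen for this example; any scheme works as long as it is applied consistently.

```python
import re
from datetime import date
from pathlib import Path


def versioned_name(stem: str, version: int, suffix: str, day: date) -> str:
    """Build a name such as 'survey_2024-03-15_v02.csv' (assumed convention)."""
    return f"{stem}_{day.isoformat()}_v{version:02d}{suffix}"


def next_version(directory: Path, stem: str, suffix: str) -> int:
    """Return one more than the highest version already present for `stem`."""
    pattern = re.compile(
        rf"{re.escape(stem)}_\d{{4}}-\d{{2}}-\d{{2}}_v(\d+){re.escape(suffix)}"
    )
    versions = [
        int(match.group(1))
        for path in directory.iterdir()
        if (match := pattern.fullmatch(path.name))
    ]
    return max(versions, default=0) + 1
```

Putting the date in ISO order (year, month, day) and zero-padding the version number means that an alphabetical file listing is also a chronological one.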
Back things up. Proper storage and backup strategies are key to preventing catastrophic data loss due to things like hardware failure, natural disaster, computer viruses, or theft. Maintaining working copies of your data requires thoughtful consideration of hardware, redundant storage locations, and a disaster plan.
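A backup is only useful if the copy matches the original. The sketch below copies one file to several locations and checks each copy byte for byte. It is an illustration under simple assumptions: the function name `back_up` is hypothetical, and the destinations are taken to be folders that are already reachable, such as a mounted external drive or a synced network share.

```python
import filecmp
import shutil
from pathlib import Path
from typing import Iterable, List


def back_up(source: Path, destinations: Iterable[Path]) -> List[Path]:
    """Copy `source` into every destination folder and verify each copy."""
    copies = []
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        copy = folder / source.name
        shutil.copy2(source, copy)  # copy2 also preserves timestamps
        # shallow=False forces a comparison of the file contents,
        # not just size and modification time.
        if not filecmp.cmp(source, copy, shallow=False):
            raise OSError(f"backup at {copy} does not match {source}")
        copies.append(copy)
    return copies
```

A script like this covers only the copying step. Choosing storage locations that will not fail together, and deciding how to recover from a loss, still has to be planned separately.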
Document your process. Whether for your future self or for other researchers, it is crucial that you describe how you carried out your analysis. This can mean taking good notes, saving log files, or capturing every step in an electronic lab notebook. Be sure to keep a copy together with any data or code you produce so that you can follow your trail later on.
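If your analysis runs as a script, a log file can capture each step as it happens. The following sketch uses Python's standard `logging` module; the logger name `"analysis"`, the helper `make_analysis_logger`, and the message format are assumptions for illustration rather than a required setup.

```python
import logging
from pathlib import Path


def make_analysis_logger(log_path: Path) -> logging.Logger:
    """Return a logger that appends timestamped messages to `log_path`."""
    logger = logging.getLogger("analysis")
    logger.setLevel(logging.INFO)
    # Drop handlers from earlier calls so messages are not written twice.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()
    handler = logging.FileHandler(log_path, mode="a", encoding="utf-8")
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

A call such as `logger.info("Dropped %d rows with missing age", 12)` then leaves a dated record of the decision. Storing the log file in the same folder as the data and code keeps the trail in one place.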
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.