Properly managing data and other research outputs starts at the beginning of the project and continues throughout. Below are tips for managing the project during three phases.
Before the Project
Plan for the project
Generate a directory for the project.
Design the structure of the directory.
An example of a directory structure could be:
ProjectName/
    README.MD
    Dataset/
        Raw Data/
        Processed Data/
            YYYY-MM-DDVersion
            YYYY-MM-DDVersion
    Analysis (or Code)/
        Data cleaning/
        Data preprocessing/
    Output/
        Graphs
        Tables
    Publications/
        .tex files
        .bib file
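The example layout above can be created by a short script rather than by hand, which makes it easy to start every project the same way. The following is a minimal sketch, not a prescribed tool: the function name `create_project_skeleton` is hypothetical, the analysis folder is assumed to be called `Analysis`, and `Graphs` and `Tables` are assumed to be subdirectories.

```python
from pathlib import Path

# Subdirectories taken from the example layout; adjust the names to your project.
# Assumption: "Analysis" is used for the code folder, and Graphs/Tables are folders.
SUBDIRS = [
    "Dataset/Raw Data",
    "Dataset/Processed Data",
    "Analysis/Data cleaning",
    "Analysis/Data preprocessing",
    "Output/Graphs",
    "Output/Tables",
    "Publications",
]


def create_project_skeleton(root):
    """Create the example directory layout under `root` and return the root path."""
    root = Path(root)
    for sub in SUBDIRS:
        # parents=True builds intermediate folders; exist_ok=True makes reruns safe.
        (root / sub).mkdir(parents=True, exist_ok=True)
    # Create an empty README only if none exists, so a rerun never overwrites notes.
    readme = root / "README.MD"
    if not readme.exists():
        readme.write_text("", encoding="utf-8")
    return root
```

Calling `create_project_skeleton("ProjectName")` creates the folders in the current working directory. Because existing folders and an existing README are left untouched, the script can be rerun safely.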
In designing the structure of the project directory consider the following:
Put code and data in separate subdirectories.
Plan to separate raw data from processed data.
Always apply the 5 Cs: be Clear, Concise, Consistent, Correct, and Conformant.
Apply good practices for file naming.
Pick a scheme for organizing data.
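One way to keep file names consistent is to generate them with a small helper instead of typing them. The sketch below assumes one possible convention (an ISO date, a lowercase hyphenated description, and an optional zero-padded version number). The convention and the name `make_filename` are illustrative assumptions, not a standard prescribed by this guide.

```python
import re
from datetime import date


def make_filename(description, extension, when=None, version=None):
    """Build a file name of the form YYYY-MM-DD_description_vNN.ext.

    The convention is a hypothetical example; pick one and apply it consistently.
    """
    when = when or date.today()
    # Lowercase the text and collapse anything that is not a letter or digit
    # into a single hyphen, so names contain no spaces or special characters.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    parts = [when.isoformat(), slug]
    if version is not None:
        parts.append(f"v{version:02d}")
    return "_".join(parts) + "." + extension.lstrip(".")
```

For example, `make_filename("Survey Responses", "csv", date(2024, 3, 15), version=2)` returns `2024-03-15_survey-responses_v02.csv`. Leading with the date means an alphabetical listing is also a chronological one.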
Add a README file to the directory. The README file should include:
A description of the project and information about the funders and collaborators (if any).
Goal statement of the project.
The projected input.
The expected output.
The expected computational environment.
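The README contents listed above can be turned into a fill-in template, so that no section is forgotten. This is a rough sketch under stated assumptions: the section titles, the `TODO` placeholder, and the function name `write_readme` are illustrative, and the last item in the list is interpreted here as the computational environment the project expects.

```python
from pathlib import Path

# Section titles mirror the items listed above; the wording is an assumption.
README_TEMPLATE = """\
Project: {name}

Description
{description}

Funders and collaborators
{funders}

Goal
{goal}

Projected input
{inputs}

Expected output
{outputs}

Expected computational environment
{environment}
"""

FIELDS = ["name", "description", "funders", "goal", "inputs", "outputs", "environment"]


def write_readme(project_dir, **fields):
    """Write README.MD into `project_dir`, marking unfilled sections as TODO."""
    values = {key: fields.get(key, "TODO") for key in FIELDS}
    path = Path(project_dir) / "README.MD"
    path.write_text(README_TEMPLATE.format(**values), encoding="utf-8")
    return path
```

Sections that are not known yet are written as `TODO`, which makes gaps easy to find later with a text search.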
During the Project
On a daily or weekly basis, document the steps you have completed.
If you have performed an analysis or collected a dataset, document the workflow you followed and include the scripts you wrote to process the data.
For each analysis script, write a separate documentation file that states what the script does, its input, and its output.
Choose file formats that will ensure long-term access (e.g., .txt).
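The advice above on documenting each analysis script in a separate file, in a format that stays readable long term, could be followed with a helper like the one below. It is only a sketch: the function name `document_script`, the `docs` folder, and the layout of the text file are assumptions made for illustration.

```python
from pathlib import Path


def document_script(script_path, purpose, inputs, outputs, doc_dir="docs"):
    """Write a plain-text description of one analysis script and return its path."""
    script = Path(script_path)
    lines = [
        f"Script:  {script.name}",
        f"Purpose: {purpose}",
        "Inputs:",
        *[f"  - {item}" for item in inputs],
        "Outputs:",
        *[f"  - {item}" for item in outputs],
    ]
    doc_dir = Path(doc_dir)
    doc_dir.mkdir(parents=True, exist_ok=True)
    # One .txt file per script, named after the script, so the pair is easy to match.
    doc_file = doc_dir / f"{script.stem}.txt"
    doc_file.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return doc_file
```

A call such as `document_script("clean_survey.py", "Remove incomplete responses", ["Raw Data/survey.csv"], ["Processed Data/survey_clean.csv"])` (all names hypothetical) produces `docs/clean_survey.txt`, which can be opened with any text editor.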
Directly after submitting a scientific paper, document the workflow of the analysis in the paper, from fetching and preprocessing the data through to generating the graphs and tables (i.e., write down the steps that would allow anyone to repeat the analysis):
List all the steps of the analysis in the paper.
Write a complete description of each algorithm or technique, and include its source code.
State clearly the input and the output of each step.
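The workflow documentation described above can be kept as a small structured record and rendered to plain text. The sketch below is one possible approach, not a required format: `Step`, `render_workflow`, and `unexplained_inputs` are hypothetical names, and the check simply flags any input that is neither raw data nor the output of an earlier step.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One step of the analysis, with what it reads and what it produces."""
    name: str
    description: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)


def render_workflow(steps):
    """Return a numbered plain-text description of the workflow."""
    lines = []
    for number, step in enumerate(steps, start=1):
        lines.append(f"{number}. {step.name}")
        lines.append(f"   What it does: {step.description}")
        lines.append(f"   Input:  {', '.join(step.inputs) or 'none'}")
        lines.append(f"   Output: {', '.join(step.outputs) or 'none'}")
    return "\n".join(lines)


def unexplained_inputs(steps, raw_inputs=()):
    """List inputs that are neither raw data nor produced by an earlier step."""
    available = set(raw_inputs)
    missing = []
    for step in steps:
        missing.extend(item for item in step.inputs if item not in available)
        available.update(step.outputs)
    return missing
```

If `unexplained_inputs` returns an empty list, every file used in the analysis can be traced back to the raw data, which is a useful sanity check before asking someone else to repeat the steps.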
Deposit your data in a repository for long-term preservation.