
Reproducible Research Practices: Managing Your Project

This guide presents best practices for documenting the scientific research process so that the research is reproducible.

Managing Your Project

Properly managing data and other research outputs starts at the beginning of the project and continues throughout. Below are tips for managing the project in three phases: before, during, and after.


Before the Project

  • Plan for the project
  • Generate a directory for the project.
  • Design the structure of the directory.
    1. An example of a directory structure could be:
      1. ProjectName/
        • README.MD
        • Dataset/
          • Raw Data/
          • Processed Data/
            • YYYY-MM-DDVersion
            • YYYY-MM-DDVersion
        • Analysis (or Code)/
          • Data cleaning/
          • Data preprocessing/
          • Output/
            • Graphs
            • Tables
        • Publications/
          • .tex files
          • .bib file
  • When designing the structure of the project directory, consider the following:
    1. Put code and data in separate subdirectories.
    2. Plan to separate raw data from processed data.
  • Always apply the 5 Cs: be Clear, Concise, Consistent, Correct, and Conformant.
  • Apply good practices for file naming.
  • Pick a scheme for organizing data.
  • Add a README file to the directory. The README file should include:
    1. A description of the project and information about the funders and collaborators (if any).
    2. Goal statement of the project.
    3. The projected input.
    4. The expected output.
    5. The expected computational environment.
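
The directory layout and README items above can be set up with a short script. The following is a minimal Python sketch, not a prescribed tool: the function name create_project, the lowercase README.md file name, the README section titles, and the "TODO" placeholders are assumptions made for illustration, and the subdirectory names simply mirror the example structure.

```python
from pathlib import Path

# Subdirectories mirroring the example layout above (names are illustrative).
SUBDIRS = [
    "Dataset/Raw Data",
    "Dataset/Processed Data",
    "Analysis/Data cleaning",
    "Analysis/Data preprocessing",
    "Analysis/Output/Graphs",
    "Analysis/Output/Tables",
    "Publications",
]

# README headings based on the items listed above (wording is an assumption).
README_SECTIONS = [
    "Description, funders, and collaborators",
    "Goal",
    "Projected input",
    "Expected output",
    "Computational environment",
]


def create_project(root):
    """Create the directory skeleton and a README stub under `root`."""
    root = Path(root)
    for sub in SUBDIRS:
        # parents=True builds intermediate directories; exist_ok=True makes
        # the script safe to run more than once.
        (root / sub).mkdir(parents=True, exist_ok=True)

    readme = root / "README.md"
    if not readme.exists():  # never overwrite an existing README
        lines = [f"# {root.name}", ""]
        for section in README_SECTIONS:
            lines += [f"## {section}", "", "TODO", ""]
        readme.write_text("\n".join(lines), encoding="utf-8")
    return root
```

Calling create_project("ProjectName") from the parent directory creates the skeleton once. Raw and processed data stay in separate subdirectories, and code stays apart from data, as recommended above.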

During the Project

  • On a daily or weekly basis, document the steps you have completed.
  • If you have run an analysis or collected a dataset, document the workflow you followed and include the scripts you wrote to prepare the data.
  • For each analysis script, describe in a separate file what the script does, its input, and its output.
  • Choose file formats that will ensure long-term access (e.g., .txt).
  • Follow a coding convention (e.g., for module naming and comments).
  • Use relative paths in the code (e.g., ../rawdata/example_file.csv).
  • Use tools that help you to automate your analysis, such as R.
  • Use tools that help in documenting the workflow and managing the data:
    1. Tools that capture the experimental environment, such as Docker, ReproZip, and CDE.
    2. Tools that capture the sequence of the computations, such as Jupyter Notebook.
  • Document each major step of the analysis.
  • Back up your data files regularly.
  • Use GNU Make to automate the research process.
  • Write your manuscript with LaTeX and BibTeX.
  • Use a version control system such as Git.
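
Several of the tips above (documenting what a script does, its input and output, and using relative paths) can be combined in one small analysis script. The following is a minimal Python sketch under stated assumptions: the function name column_mean, the output location, and the idea of averaging a single column are hypothetical and chosen only for illustration. The input path reuses the example path from the list above.

```python
"""Compute the mean of one numeric column in a CSV file.

Input:  a CSV file with a header row (default: ../rawdata/example_file.csv)
Output: a two-line CSV holding the column name, the row count, and the mean
"""
import csv
from pathlib import Path

# Relative paths keep the script portable between machines. They are
# resolved against the directory the script is run from.
DEFAULT_INPUT = Path("..") / "rawdata" / "example_file.csv"
# Hypothetical output location, chosen for illustration only.
DEFAULT_OUTPUT = Path("..") / "output" / "tables" / "column_mean.csv"


def column_mean(in_path, out_path, column):
    """Read `column` from `in_path`, write a summary to `out_path`, return the mean."""
    with open(in_path, newline="", encoding="utf-8") as f:
        # Skip rows where the cell is missing or empty.
        values = [
            float(row[column])
            for row in csv.DictReader(f)
            if row.get(column) not in (None, "")
        ]
    if not values:
        raise ValueError(f"no values found in column {column!r}")

    mean = sum(values) / len(values)

    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["column", "n", "mean"])
        writer.writerow([column, len(values), mean])
    return mean
```

A call such as column_mean(DEFAULT_INPUT, DEFAULT_OUTPUT, "temperature") would run the step, where "temperature" stands in for whatever column the dataset actually contains. The docstring at the top plays the role of the per-script documentation: it states what the script does, what it reads, and what it writes.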

Example for README Template

After the Project

  • Directly after submitting a scientific paper, document the workflow of the analysis in the paper, from fetching and preprocessing the data to generating the graphs and tables in the paper (i.e., write the steps that allow anyone to repeat the analysis):
    1. List all the steps of the analysis in the paper.
    2. Write a complete description of each algorithm/technique and the source code of these algorithms.
    3. State clearly the input and the output of each step.
  • Deposit your data in a repository for long-term preservation.
    1. Institutional repositories: Dash or Merritt.
    2. Non-institutional repositories: Figshare.
  • Obtain DOIs for your data and your project.
  • Share your project on Open Science Framework.
  • If you have sensitive data, check your institution's policy before sharing it.
  • License your software.
  • Develop clear, accurate, and precise user documentation for the project.
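
The advice above to list every step of the analysis with its input and output can also be kept in machine-checkable form. The following is a minimal Python sketch, not an established tool: the Step record and the helper names missing_inputs and render are hypothetical. It records each step, reports any input that no earlier step produces, and prints the steps as a numbered list.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One step of the analysis: what runs, what it reads, what it writes."""
    name: str
    script: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)


def missing_inputs(steps, raw_inputs):
    """Return (step name, file) pairs for inputs that are neither raw data
    nor the output of an earlier step."""
    available = set(raw_inputs)
    missing = []
    for step in steps:
        for item in step.inputs:
            if item not in available:
                missing.append((step.name, item))
        # Outputs of this step become available to later steps.
        available.update(step.outputs)
    return missing


def render(steps):
    """Format the steps as a numbered plain-text list for the documentation."""
    lines = []
    for number, step in enumerate(steps, start=1):
        lines.append(f"{number}. {step.name} ({step.script})")
        lines.append(f"   input:  {', '.join(step.inputs)}")
        lines.append(f"   output: {', '.join(step.outputs)}")
    return "\n".join(lines)
```

An empty result from missing_inputs suggests that the documented chain runs from the raw data to the final graphs and tables without an undocumented file in between. It does not verify that the scripts themselves run.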