This guide provides information and resources on data support and management for students and researchers participating in UC Berkeley's Responsible Conduct of Research (RCR) trainings. The Library Data Services and Research Data Management Programs provide a three-part module that addresses proper data management and organization, with the goal of increasing rigor and transparency and adhering to funder standards. The following boxes and tabs supplement information taught during the data management components of the training. If you have questions or comments about the content presented in this guide, please email librarydataservices@berkeley.edu.
DMP Tool: Build your Data Management Plan
The DMPTool is an online system that can assist in developing a data management plan by providing manuals, practice templates, and information on how to meet specific funding agency requirements. Sign in to the tool using UC Berkeley credentials.
Learn More on UC Berkeley Grant Life Cycle
Best practices are procedures widely accepted as the most effective in a business or organizational system. Consistently organizing and categorizing data according to best practices makes data files easier to identify, manage, and share with others.
Ensure that your filenames have identifiers to aid in organizing and quickly accessing data. It can be helpful to include elements such as specific project names, dates, locations, and version numbers in creating a filename.
Tips:
Create a consistent file-name template for each type of data file, and record the template codes in a README file (see below)
Format dates as YYYYMMDD (four digit year, two digit month, two digit day)
Keep file names short to maintain organization
Do not use special characters such as ~ ! @ # $ % ^ & * ( ) ` ; < > ? , [ ] { } ' "
Avoid spaces by using underscores, dashes, or camelCase
Be careful with the placement of periods because they designate file extensions and are used in regular expressions as wildcards. On Unix-like systems, a period at the beginning of a file name indicates a configuration or hidden file.
Creating a master key with a spreadsheet template can help in naming files
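The naming tips above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed tool: the template order (project, location, date, version), the function name build_filename, and the example values are all assumptions made for the example. Adapt them to the template recorded in your own README file.

```python
import re
from datetime import date

# Assumed rule for this sketch: each element may contain only letters,
# digits, underscores, and dashes (no spaces, periods, or special characters).
ALLOWED = re.compile(r"[A-Za-z0-9_-]+")


def build_filename(project: str, location: str, collected: date,
                   version: int, ext: str) -> str:
    """Assemble a name of the hypothetical form project_location_YYYYMMDD_vNN.ext."""
    parts = [
        project,
        location,
        collected.strftime("%Y%m%d"),  # four-digit year, two-digit month and day
        f"v{version:02d}",             # zero-padded version number
    ]
    for part in parts:
        if not ALLOWED.fullmatch(part):
            raise ValueError(f"disallowed character in {part!r}")
    ext = ext.lstrip(".")
    if not ext.isalnum():
        raise ValueError(f"unexpected file extension {ext!r}")
    # The only period in the result is the one before the extension.
    return "_".join(parts) + "." + ext


print(build_filename("SoilSurvey", "Berkeley", date(2024, 3, 7), 2, "csv"))
# prints: SoilSurvey_Berkeley_20240307_v02.csv
```

Because the date is written as YYYYMMDD, files for the same project and location sort chronologically when listed alphabetically.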
Title of project and dataset
Name and contact information for PI and responsible researcher
File name template, elements, and codes
File formats
Variable names, units, etc.
Data processing: how final data were derived from raw data
Versioning: change log for documenting file versions
Store the README file in the top-level directory to which it applies
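As a rough illustration, the README elements listed above can be laid out as a plain-text skeleton and saved at the top of a project directory. The function names, the README.txt file name, the TODO placeholders, and the example values below are assumptions for this sketch, not a required format.

```python
from pathlib import Path


def readme_skeleton(title: str, contact: str, name_template: str) -> str:
    """Return README text with one section per element listed above."""
    sections = [
        ("Title of project and dataset", title),
        ("Name and contact information for PI and responsible researcher", contact),
        ("File name template, elements, and codes", name_template),
        ("File formats", "TODO"),
        ("Variable names, units, etc.", "TODO"),
        ("Data processing", "TODO: how final data were derived from raw data"),
        ("Versioning", "TODO: change log documenting file versions"),
    ]
    # Each section is a heading line followed by its body, separated by blank lines.
    return "\n\n".join(f"{heading}\n{body}" for heading, body in sections) + "\n"


def write_readme(directory: str, text: str) -> Path:
    """Save the README in the top-level directory it describes."""
    path = Path(directory) / "README.txt"
    path.write_text(text, encoding="utf-8")
    return path


# Hypothetical example values; replace with your own project details.
print(readme_skeleton(
    "Soil Survey 2024",
    "Dr. Example, pi@example.edu",
    "project_location_YYYYMMDD_vNN.ext",
))
```

Calling write_readme("path/to/project", text) would place the file next to the data it documents; the path shown here is a placeholder.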
Free Downloadable Template courtesy of Cornell University
Following good spreadsheet practices is important to ensure data can be readily understood, analyzed, and reused. Data may not be exported or read correctly if spreadsheets do not follow these guidelines:
Create a safety backup file before making changes
Avoid empty cells. If there is missing data or no data value in a specific cell, indicate this by entering a code such as -999 or -9999.
Avoid empty columns and rows
Do not use special characters
Steer clear of missing headers or headers in multiple places
Do not merge cells
Use entries in additional columns to convey information rather than colorful text or cell backgrounds
Avoid commas
Do not use embedded comments
Do not enter multiple data types in a single column
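Some of the guidelines above can be checked automatically once a spreadsheet is exported to CSV. The sketch below uses only Python's standard library and covers three of them: missing headers, empty cells, and mixed data types in one column. The function name check_table, the treatment of -999 and -9999 as missing-value codes, and the sample data are assumptions for illustration.

```python
import csv
import io

# Missing-value codes suggested above; adjust to your project's convention.
MISSING_CODES = {"-999", "-9999"}


def _kind(value: str) -> str:
    """Classify a cell as 'number' if it parses as a float, otherwise 'text'."""
    try:
        float(value)
        return "number"
    except ValueError:
        return "text"


def _label(header: list[str], col: int) -> str:
    """Use the header name when present, otherwise a 1-based column number."""
    if col < len(header) and header[col].strip():
        return header[col].strip()
    return f"column {col + 1}"


def check_table(csv_text: str) -> list[str]:
    """Return a list of human-readable problems found in CSV text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]
    header, data = rows[0], rows[1:]
    problems = []
    for col, name in enumerate(header):
        if not name.strip():
            problems.append(f"column {col + 1}: missing header")
    kinds: dict[int, set[str]] = {}
    for row_number, row in enumerate(data, start=2):  # row 1 is the header
        for col, value in enumerate(row):
            cell = value.strip()
            if not cell:
                problems.append(f"row {row_number}, {_label(header, col)}: empty cell")
            elif cell not in MISSING_CODES:
                # Missing-value codes are skipped so they do not count as numbers.
                kinds.setdefault(col, set()).add(_kind(cell))
    for col in sorted(kinds):
        if len(kinds[col]) > 1:
            problems.append(f"{_label(header, col)}: mixed data types")
    return problems


sample = "site,temp_c,notes\nA1,12.5,clear\nA2,,cloudy\nA3,warm,-999\n"
for problem in check_table(sample):
    print(problem)
# prints:
# row 3, temp_c: empty cell
# temp_c: mixed data types
```

Formatting-based issues such as merged cells, colored text, and embedded comments are lost on export to CSV, so they still need to be checked by eye in the spreadsheet itself.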
OpenRefine (previously known as Google Refine) is a resource that assists in cleaning and transforming data to make it more consistent and analyzable.
Learn more with the Stanford Libraries Data Best Practices Guide
General list of resources on disciplinary metadata standards, the FAIR principles, writing README files, and file naming conventions: