Data Management Basics

NSF Data Management Plan Requirements: Proposals submitted on or after January 18, 2011, must include a supplementary document of no more than two pages labeled “Data Management Plan”.

Collaboration, accessibility and transparency are necessary for data management in modern science. NSF, NIH and other federal agencies require data management plans with grant proposals.
Start with this list:
---- Make a plan to store your data
---- Find the right medium to store your data
---- Develop a system to organize your data
---- Make sure that your data is easy to access
---- Make sure that your data is safe and secure
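One way to act on the storage and safety items in the list above is to record a checksum for every file and compare checksums whenever the data is copied or backed up. The following is a minimal Python sketch of that idea, not a prescribed method. The function names (`sha256_of`, `build_manifest`, `changed_files`) and the directory names in the usage note are hypothetical, chosen only for illustration.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict[str, str]:
    """Map each file's path (relative to data_dir) to its checksum."""
    return {
        path.relative_to(data_dir).as_posix(): sha256_of(path)
        for path in sorted(data_dir.rglob("*"))
        if path.is_file()
    }


def changed_files(original: dict[str, str], copy: dict[str, str]) -> list[str]:
    """List files that are missing from the copy or whose contents differ."""
    return sorted(
        name for name, checksum in original.items()
        if copy.get(name) != checksum
    )
```

For example, calling `build_manifest(Path("project_data"))` and `build_manifest(Path("backup/project_data"))` (both hypothetical directories) and passing the two results to `changed_files` would list any files that did not survive the copy intact.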

Data Management Tools & Services

DMPTool: Data Management Planning Tool: Step-by-step instructions for creating data management plans that meet the requirements of specific funding agencies

DASH: Data Sharing Made Easy:  A self-service tool for researchers to use in publishing their datasets.

EZID: Create a persistent DOI or ARK (long-term, unique identifiers).
Berkeley researchers can request a free account. Contact:

ORCID (Open Researcher and Contributor ID): Free registration of a nonproprietary identifier code for contributor/author identification.
Register for an ORCID iD.
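An ORCID iD is written as 16 characters in four hyphenated groups, and the last character is a check character that ORCID describes as following ISO 7064 MOD 11-2. When iDs are typed into data files or metadata, a quick check can catch transcription errors. The sketch below is illustrative only: the function names are hypothetical, and it checks the form of an iD, not whether the iD is actually registered.

```python
def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the string has the form and check character of an ORCID iD."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16:
        return False
    base, check = compact[:15], compact[15]
    if not (base.isascii() and base.isdigit()):
        return False
    return orcid_check_character(base) == check
```

For instance, `is_valid_orcid("0000-0002-1825-0097")`, using a sample iD that appears in ORCID's own documentation, returns `True`, while changing its final digit makes the check fail.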

Data Management Guidelines:

Data management (UCB): information on data requirements, management and sharing [ScienceLibraries@UC Berkeley]

Data Management General Guidance: Guidelines for creating, organizing, managing, and sharing your data [CDL]

NSF data management plan requirements: An outline from the NSF Directorate for Biological Sciences

Preparing data management plans for NSF grant applications: UC Berkeley tutorials and guidelines for NSF data management plans [Science Libraries@UC Berkeley]

UC3, University of California Curation Center - UC/California Digital Library

UC Berkeley Data Services Management, IST.  See comparative table.

Data Repositories: Integrative Biology & the Environment

Atmospheric Radiation Measurement (ARM) Data Archive preserves data collected through the operations and scientific field experiments of the ARM Climate Research Facility.

Carbon Dioxide Information Analysis Center (CDIAC) is the primary climate-change data and information analysis center of the U.S. Department of Energy (DOE). CDIAC's data holdings include records of the concentrations of carbon dioxide and other radiatively active gases in the atmosphere; the role of the terrestrial biosphere and the oceans in the biogeochemical cycles of greenhouse gases; emissions of carbon dioxide to the atmosphere; long-term climate trends; the effects of elevated carbon dioxide on vegetation; and the vulnerability of coastal areas to rising sea level.

Chesapeake Bay Environmental Observatory (CBEO) is available for registering datasets of different types, and searching for CBEO-registered data or for data registered in all projects within the GEON family of federated portals.

Computational and Information Systems Laboratory (CISL) Research Data Archive contains meteorological and oceanographic observations, operational and reanalysis model outputs, and remote sensing datasets to support atmospheric and geosciences research, along with ancillary datasets, such as topography/bathymetry, vegetation, and land use.

Dryad is an international repository of data underlying peer-reviewed articles in the basic and applied biosciences. Dryad is governed by a consortium of journals that collaboratively promote data archiving and ensure the sustainability of the repository.

Geospatial One-Stop (GOS) is a geographic information system (GIS) portal that contains thousands of geospatial metadata records and links to live maps, features, catalog services, downloadable data sets, images, clearinghouses, map files, and more.

Global Biodiversity Information Facility (GBIF) allows researchers to publish and discover biodiversity data—taxon primary occurrence data, taxonomic checklists and resource metadata—as part of a distributed global network.

Ecological Society of America's Ecological Archives publishes materials supplemental to articles that appear in the ESA print journals (Ecology, Ecological Applications, and Ecological Monographs), and peer-reviewed Data Papers.

Knowledge Network for Biocomplexity (KNB) is a national network designed to facilitate the discovery and analysis of distributed ecological and environmental datasets.

National Ecological Observatory Network (NEON) is a continental-scale research platform for discovering and understanding the impacts of climate change, land-use change, and invasive species on ecology. It will consist of distributed sensor networks and experiments to record and archive ecological data for at least 30 years using standardized protocols and an open data policy.

Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC) seeks to assemble, distribute, and archive data for research, education, and policy formulation in terrestrial biogeochemistry and the ecosystem dynamics of global environmental change. The ORNL DAAC archives data generated by NASA's Terrestrial Ecology Program.

Ocean Biogeographic Information System (OBIS) was established by the Census of Marine Life (CoML). It is an evolving strategic alliance of people and organizations sharing a vision to make marine biogeographic data, from all over the world, freely available over the World Wide Web.

Paleobiology Database provides global, collection-based occurrence and taxonomic data for marine and terrestrial animals and plants of any geological age, as well as web-based software for statistical analysis of the data.

PANGAEA® (Publishing Network for Geoscientific and Environmental Data) is an Open Access library aimed at archiving, publishing and distributing data from earth system research. The system guarantees reference and long-term availability of its content through data set citations using international standard formats and persistent identifiers (DOI).

Smithsonian Tropical Research Institute's (STRI) Center for Tropical Forest Science (CTFS) comprises a global network of large-scale and long-term studies that together monitor more than three million individual tropical trees, representing more than 6,000 tree species — nearly 10% of the world’s entire tropical tree flora.

TreeBASE is a relational database of phylogenetic information hosted by the Yale Peabody Museum. TreeBASE stores phylogenetic trees and the data matrices used to generate them from published research papers. TreeBASE accepts all types of phylogenetic data (e.g., trees of species, trees of populations, trees of genes) representing all biotic taxa.

USA National Phenology Network (USA-NPN) is developing a list of registered phenology data sets to make available to the research community and the general public.

VegBank is a vegetation plot database of the Ecological Society of America's Panel on Vegetation Classification. Vegetation records, community types and plant taxa may be submitted to VegBank and may be subsequently searched, viewed, annotated, revised, interpreted, downloaded, and cited.

VertNet is a global museum database of vertebrate natural history collections. Over 84.3 million vertebrate records are shared online through four distributed database networks organized by biological discipline: MaNIS (mammalogy), HerpNET (herpetology), ORNIS (ornithology) and FishNet (ichthyology).

World Data Center for Human Interactions in the Environment archives and distributes global data sets related to population, sustainability, poverty, health, hazards, conservation, governance and climate. It is hosted by Columbia University Earth Institute's Center for International Earth Science Information Network (CIESIN).

Shared Content Management

Research Data Management, UC Berkeley support for data storage, and more

Technology@Berkeley. UC Berkeley list of services

Research IT at Berkeley: cloud computing, large projects, assistance with hardware.

CITRIS: Established to address the most pressing social and environmental issues facing California. Its research focuses on four core initiatives: Energy, Health Care, Intelligent Infrastructure, and Data and Democracy.

Data Services

Data repositories and more

Integrative Biology and the Environment - Data Repositories - a selection of websites.

To find additional data repositories:
Open Access Directory of Data Repositories [Simmons University]
DataBib [Institute of Museum and Library Services]
Repositories [DataCite]

Useful sites:
Data Conservancy Organization (NSF): collects, organizes, and preserves data.
Many Eyes: data visualization tools from IBM.
Data Management & Publishing Checklist, MIT
NSF Division of Institution & Award Support

US Naval Observatory (USNO) - Oceanography Portal: Includes a range of astronomical data and products, and serves as the official source of time for the U.S. Department of Defense and a standard of time for the entire United States.

Data Management Services

Use the following links for UC Berkeley and University of California resources, federal sites, copyright information, and more.
University of California Resources
1. UC3, University of California Curation Center - UC/California Digital Library

2. EZID: Create a persistent DOI or ARK: long-term, unique identifiers (UC/CDL).
    Berkeley researchers can request a free account by contacting

3. DMPTool: Manage Your Data - develop a Data Management Plan, see Funder Requirements (CDL)

4. Scientific Data Services -- ScienceLibraries@UC Berkeley - guide to UC Berkeley services.

5. UC Berkeley Data Services Management, IST.  See comparative table.

