Data Management – ISCN – International Soil Carbon Network

If you are embarking on a new project or proposal, the ISCN Database is well suited to be your solution for data management and long-term accessibility. This page has materials, free for your own adaptation, that can be used to describe how the ISCN Database fits into your data management plans.

General documentation

ISCN Database flyer: This short document has an overview of database structure and function. It may be used as a source of text, or for a general overview of the database before you begin writing your data management section.

Information for NSF Proposals

NSF data management requirements consist of 5 components, which are described generally in the Grant Proposal Guide. There are also 5 (largely overlapping) requirements associated with RFPs issued by the BIO Directorate, which is likely the origin of most NSF opportunities pursued by ISCN members. These requirements are summarized below, and each is followed by a description of how contributing your project data to the ISCN Database meets the intent of the NSF data management requirements.

1. Types of data and science products developed by the project.
  The ISCN Database is a point-based, georeferenced, relational database for all types of soil information. Its fundamental capabilities are the storage of soil chemical and physical data related to carbon inventory, with associated metadata such as ecologic, geographic, analytical, experimental, historical, and contributor information. The most common types of data stored in the ISCN Database are georeferenced research sites, along with the soil profiles sampled within these sites and the incremental soil layers that comprise the individual soil profiles. Through this hierarchical design, data contributors are able to store comprehensive soil characterization datasets of any size; documentation of methods and uploads of relevant papers and other files are also supported. The ISCN Database is dynamic and flexible, and supports storage of and access to other types of data, including soil fractions, spectral datasets, and images. There may also be limited support for storage of physical samples (archived soils) in several member archive facilities; contact ISCN Support for information about these opportunities.
2. Standards to be used for data and metadata format and content.
  The ISCN Database shares standards for data and metadata format with two widely used schemes. Ecologic, geographic, experimental, historical, and contributor data are stored using the same variable names and conventions employed by the FLUXNET Synthesis Network, a sister science effort with which the ISCN Database shares server space and computational resources. Soil characterization data in the ISCN Database are described using the variable names employed by the USDA-NRCS Soil Survey Lab in its NCSCD and NASIS databases. Basic analytical and methods nomenclature for soil datasets follows USDA-NRCS conventions, but there is considerable flexibility built in to describe methods that lack widely implemented standards (e.g., soil fractionation schemes). For these types of data, the design of the ISCN Database encourages development of broader standards by categorizing information at coarse levels (e.g., denoting fractionations as ‘density,’ ‘chemical,’ etc.) while accommodating detailed, user-specified information in fine-level methods description. Input and output of data to/from the ISCN Database is supported for comma-separated value, Microsoft Excel (at present), Access (upon request), and NetCDF (in development) formats.
3. Physical and/or cyber resources and facilities, dissemination methods used to store the data and make them accessible.
  The ISCN Database is housed on infrastructure developed and maintained by DOE-Lawrence Berkeley National Lab, the UC Berkeley Water Center, University of Virginia, and Microsoft Research. The data server infrastructure consists of SQL Server databases and a SharePoint web portal, designed to maximize data usability. Data received from contributors are normalized to a standardized template and processed to calculate variables of interest to users (e.g., carbon to 1 meter). A data curation process is in place to ensure that provenance of contributed datasets is clearly tracked over time. The databases, portal, and servers are backed up regularly, and access is provided via download or online viewing of database contents on the International Soil Carbon Network website (soilcarb.net). Users may access pre-packaged Excel reports containing various subsets of the database, or create their own Excel reports by specifying variable sets and geographic constraints for the data of interest.
4. Policies for data sharing and access, including provisions for privacy, security, intellectual property, and production of derivatives.
  Access to and use of data from the ISCN Database is controlled by a data policy that ensures reasonable data quality, fair use, and appropriate citation of data contributors. Database access is available on a secured portion of the ISCN website open only to users who have obtained a password-protected account; all visitors to and users of the ISCN website and its resources are logged. The data policy delineates three phases through which each data submission moves: initial contribution (Phase I), data validation (Phase II), and release for consumption by the ISCN membership (Phase III). These three phases maintain the privacy of newly submitted datasets and provide for a quality assessment period during which the data are unavailable to all but the contributor and authorized associates. Upon release to the database (Phase III), fair use provisions explicitly describe terms for citation of contributors’ published data, and in relevant cases, acknowledgment of unpublished data. These terms ensure that proper credit is given to the individuals, agencies, or institutions that own the data and associated intellectual property. The production of derivatives from the ISCN Database (e.g., datasets provided to other networks) is explicitly discussed in the Data Policy, with the intent of ensuring proper attribution of data sources. In brief, users wishing to redistribute contents of the Database are requested to include data contributor information as part of any transferred dataset, and to cite and acknowledge all data contributors and the ISCN in any products prepared from the transferred data.
5. Rights and obligations of all parties with regard to responsibilities for management and retention of research data.
  Per the ISCN Data Policy, data contributors have two primary obligations: 1) identify any restrictions on publishing data location information; 2) declare the phase of the data at the time of submission and upon any data status changes (e.g., publication and release to Phase III). Furthermore, it is highly recommended that the data contributor or authorized representative maintain contact with ISCN Database managers from the time of data submission until after the data are transitioned to Phase III and released to the Database, in order to perform QA/QC and answer questions from data users. The obligations of the ISCN Database team are also twofold: 1) protect location information for sensitive sites; 2) release data to the ISCN membership via the Database according to contributor-approved data phase transitions. Database managers shall also make reasonable efforts to manage contributed datasets and to facilitate their retention and use by performing normal database maintenance and curation activities, assisting database users in basic navigation and interpretation of data products, and providing alternative data access mechanisms where possible. Because the ISCN Database is a mechanism for maintaining long-term access to datasets that have been adapted to its format, researchers wishing to: a) preserve their data in the form in which they were originally collected or b) permanently archive redundant copies are urged to consider alternate arrangements such as NSF DataNets.
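
For proposal writers who want to picture the data model, the site, profile, and layer hierarchy described under requirement 1, and a derived variable such as "carbon to 1 meter" mentioned under requirement 3, can be sketched roughly as follows. This is only an illustration: the class names, field names, and the stock calculation are assumptions made for this sketch and are not the ISCN Database schema or its processing code. The calculation ignores coarse-fragment corrections and gaps between layers.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Layer:
    """One incremental soil layer; depths in cm below the surface (hypothetical fields)."""
    top_cm: float
    bottom_cm: float
    bulk_density_g_cm3: float
    organic_c_pct: float  # organic carbon, percent by mass


@dataclass
class Profile:
    """A soil profile made up of incremental layers."""
    profile_id: str
    layers: List[Layer] = field(default_factory=list)


@dataclass
class Site:
    """A georeferenced research site containing sampled profiles."""
    site_id: str
    latitude: float
    longitude: float
    profiles: List[Profile] = field(default_factory=list)


def carbon_stock_kg_m2(profile: Profile, depth_cm: float = 100.0) -> float:
    """Sum organic carbon (kg C per square meter) from the surface to depth_cm.

    A layer that crosses depth_cm contributes only the part above it.
    """
    total = 0.0
    for layer in profile.layers:
        top = max(layer.top_cm, 0.0)
        bottom = min(layer.bottom_cm, depth_cm)
        thickness = bottom - top
        if thickness <= 0:
            continue  # layer lies entirely below the requested depth
        # g/cm^3 * cm * mass fraction = g C per cm^2; multiply by 10 for kg C per m^2
        total += layer.bulk_density_g_cm3 * thickness * (layer.organic_c_pct / 100.0) * 10.0
    return total


# Illustrative values only, not real measurements.
example_profile = Profile(
    profile_id="P1",
    layers=[
        Layer(0, 20, 1.2, 2.0),
        Layer(20, 60, 1.4, 1.0),
        Layer(60, 120, 1.5, 0.5),  # only the 60-100 cm part counts toward 1 m
    ],
)
example_site = Site("S1", latitude=45.0, longitude=-90.0, profiles=[example_profile])
stock_to_1m = carbon_stock_kg_m2(example_site.profiles[0])
```

Nesting layers inside profiles and profiles inside sites mirrors the hierarchy in the text, so a per-profile quantity can be computed without touching site-level metadata.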
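
The three data phases described under requirements 4 and 5 behave like a small one-way state machine. The sketch below is a hypothetical illustration of that reading, not the ISCN Data Policy itself or any ISCN software: the names `Phase`, `advance`, and `can_view` are invented here, and the rule that every transition needs contributor approval is an assumption drawn from the phrase "contributor-approved data phase transitions".

```python
from enum import IntEnum


class Phase(IntEnum):
    CONTRIBUTION = 1  # Phase I: initial contribution
    VALIDATION = 2    # Phase II: data validation
    RELEASED = 3      # Phase III: released to the ISCN membership


def advance(current: Phase, contributor_approved: bool) -> Phase:
    """Move a submission to the next phase (assumes approval is needed for each step)."""
    if not contributor_approved:
        raise PermissionError("phase transitions must be approved by the contributor")
    if current is Phase.RELEASED:
        raise ValueError("Phase III is the final phase")
    return Phase(current + 1)


def can_view(phase: Phase, is_contributor_or_associate: bool, is_member: bool) -> bool:
    """Before release, only the contributor and authorized associates see the data."""
    if is_contributor_or_associate:
        return True
    return is_member and phase is Phase.RELEASED


# A submission under validation is hidden from ordinary members until release.
phase = advance(Phase.CONTRIBUTION, contributor_approved=True)
visible_before_release = can_view(phase, is_contributor_or_associate=False, is_member=True)
phase = advance(phase, contributor_approved=True)
visible_after_release = can_view(phase, is_contributor_or_associate=False, is_member=True)
```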