6 dataset-edna
6.1 Introduction
Rationale:
DNA-derived data are increasingly used to document taxon occurrences. To make these data useful to the broadest possible community, GBIF published a guide entitled “Publishing DNA-derived data through biodiversity data platforms.” The guide is supported by the DNA Derived Data extension for Darwin Core, which incorporates MIxS (Minimum Information about any (x) Sequence) terms into the Darwin Core standard.
This use case draws on both the guide and the extension to illustrate how to incorporate a DNA Derived Data extension file into a Darwin Core archive.
For further information on this use case and on the DNA Derived Data extension in general, see the recording of the OBIS Webinar on Genetic Data.
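In a Darwin Core archive, an extension file is tied to the core file through a shared identifier, and a meta.xml descriptor tells consumers which column holds that key and which term each column maps to. The sketch below builds such a descriptor with Python's standard library. It assumes that occurrenceID is the first column of both occurrence.csv and dna_extension.csv. The extension rowType URI and the example extension term URI are assumptions to check against the GBIF extension definition. This is an illustration, not the output of this repository's conversion code.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "http://rs.tdwg.org/dwc/text/"   # Darwin Core text (meta.xml) namespace
DWC = "http://rs.tdwg.org/dwc/terms/"      # Darwin Core terms namespace
# ASSUMPTION: rowType of the DNA Derived Data extension; verify in the GBIF registry.
DNA_ROWTYPE = "http://rs.gbif.org/terms/1.0/DNADerivedData"


def _table(parent, tag, row_type, location, key_tag, terms):
    """Add a <core> or <extension> element describing one CSV file.

    Column 0 is assumed to hold occurrenceID (the join key); `terms` lists
    the term URIs of columns 1..n in order.
    """
    el = ET.SubElement(parent, tag, {
        "encoding": "UTF-8",
        "fieldsTerminatedBy": ",",
        "linesTerminatedBy": "\\n",   # literal backslash-n, as meta.xml expects
        "fieldsEnclosedBy": '"',
        "ignoreHeaderLines": "1",
        "rowType": row_type,
    })
    files = ET.SubElement(el, "files")
    ET.SubElement(files, "location").text = location
    # <id> in the core and <coreid> in the extension point at the key column.
    ET.SubElement(el, key_tag, {"index": "0"})
    for i, term in enumerate(terms, start=1):
        ET.SubElement(el, "field", {"index": str(i), "term": term})
    return el


def build_meta_xml(core_terms, dna_terms):
    """Return a meta.xml string linking occurrence.csv to dna_extension.csv."""
    archive = ET.Element("archive", {"xmlns": TEXT_NS})
    _table(archive, "core", DWC + "Occurrence", "occurrence.csv", "id", core_terms)
    _table(archive, "extension", DNA_ROWTYPE, "dna_extension.csv", "coreid", dna_terms)
    return ET.tostring(archive, encoding="unicode")
```

In practice, tools such as the GBIF IPT usually generate meta.xml. The sketch only shows how the `<id>` and `<coreid>` elements pair the two files on the same key column.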
Project abstract:
The example data in this use case come from filtered seawater samples collected at a nearshore station in Monterey Bay, California, USA. Water was collected with a CTD rosette and filtered using a peristaltic pump system. The samples then underwent metabarcoding of the V9 region of the 18S rRNA gene. The resulting amplicon sequence variants (ASVs), their assigned taxonomy, and the metadata associated with sample collection are the input data for the conversion scripts presented here.
A subset of samples from this collection was included in the publication “Environmental DNA reveals seasonal shifts and potential interactions in a marine community”, published open access in Nature Communications in 2020.
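One step in such a conversion is reshaping the ASV read-count table into occurrence records. The table typically has one row per ASV and one column per sample, and the goal is one record per ASV and sample. The sketch below assumes a hypothetical layout for asv_table.csv: an ASV column, a sequence column, then one read-count column per sample. It also assumes a hypothetical occurrenceID scheme of sample ID plus ASV ID. The actual columns and identifiers used by conversion_code.py may differ. Pairs with zero reads are skipped because they do not represent a detection. Read counts go into organismQuantity with "DNA sequence reads" as organismQuantityType, following the convention in the GBIF guide.

```python
import csv
import io


def asv_table_to_records(csv_text):
    """Return one record per (ASV, sample) pair that has at least one read.

    HYPOTHETICAL layout: columns 'ASV' and 'sequence', followed by one
    read-count column per sample ID.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    sample_cols = [c for c in reader.fieldnames if c not in ("ASV", "sequence")]
    records = []
    for row in reader:
        for sample in sample_cols:
            reads = int(row[sample])
            if reads == 0:
                continue  # no reads: not an occurrence
            records.append({
                "occurrenceID": f"{sample}_{row['ASV']}",  # hypothetical ID scheme
                "eventID": sample,
                "organismQuantity": reads,
                "organismQuantityType": "DNA sequence reads",
                "DNA_sequence": row["sequence"],
            })
    return records
```

For a toy table with rows `ASV1,ACGT,5,0` and `ASV2,TTGA,0,3` under samples `S1` and `S2`, this yields two records, `S1_ASV1` with 5 reads and `S2_ASV2` with 3 reads.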
Contacts:
- Francisco Chavez, Principal Investigator (chfr@mbari.org)
- Kathleen Pitz, Research Associate (kpitz@mbari.org)
- Diana LaScala-Gruenewald, Point of Contact (dianalg@mbari.org)
6.2 Published data
6.3 Repo structure
.
+-- README.md :Description of this repository
+-- LICENSE :Repository license
+-- .gitignore :Files and directories to be ignored by git
+-- environment.yml :Conda environment configuration file for Binder
|
+-- raw
| +-- asv_table.csv :Source data containing ASV sequences and number of reads
| +-- taxa_table.csv :Source data containing taxon matches for each ASV
| +-- metadata_table.csv :Source data containing metadata about samples (e.g. collection information)
|
+-- src
| +-- conversion_code.py :Darwin Core mapping script
| +-- conversion_code.ipynb :Darwin Core mapping Jupyter Notebook
| +-- WoRMS.py :Functions for querying the World Register of Marine Species
|
+-- processed
| +-- occurrence.csv :Occurrence file, generated by conversion_code
| +-- dna_extension.csv :DNA Derived Data Extension file, generated by conversion_code
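The two files under processed share occurrenceID, so each row of dna_extension.csv points back to exactly one occurrence. The sketch below shows one way to split combined per-occurrence records into the core and extension tables. It assumes a hypothetical assignment of columns to each file; the real conversion_code.py maps more terms. The extension column names DNA_sequence, target_gene and target_subfragment are terms from the DNA Derived Data extension.

```python
import csv
import io

# HYPOTHETICAL column split; occurrenceID comes first in both files.
CORE_FIELDS = ["occurrenceID", "eventID", "scientificName",
               "organismQuantity", "organismQuantityType"]
DNA_FIELDS = ["occurrenceID", "DNA_sequence", "target_gene", "target_subfragment"]


def write_csv(rows, fields):
    """Serialize rows to CSV text, keeping only `fields` (missing values -> '')."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def split_records(records):
    """Return (occurrence_csv, dna_extension_csv) text, one row per record in each."""
    ids = [r["occurrenceID"] for r in records]
    if len(ids) != len(set(ids)):
        raise ValueError("occurrenceID must be unique in the core file")
    return write_csv(records, CORE_FIELDS), write_csv(records, DNA_FIELDS)
```

The uniqueness check matters because consumers join extension rows to core rows on occurrenceID. A duplicated identifier would make that join ambiguous.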