{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"remove-cell"
]
},
"outputs": [],
"source": [
"import subprocess\n",
"import sys\n",
"\n",
"COLAB = \"google.colab\" in sys.modules\n",
"\n",
"\n",
"def _install(package):\n",
"    if COLAB:\n",
"        ans = input(f\"Install { package }? [y/n]:\")\n",
"        if ans.lower() in [\"y\", \"yes\"]:\n",
"            subprocess.check_call(\n",
"                [sys.executable, \"-m\", \"pip\", \"install\", \"--quiet\", package]\n",
"            )\n",
"            print(f\"{ package } installed!\")\n",
"\n",
"\n",
"def _colab_install_missing_deps(deps):\n",
"    import importlib.util\n",
"\n",
"    for dep in deps:\n",
"        if importlib.util.find_spec(dep) is None:\n",
"            if dep == \"iris\":\n",
"                dep = \"scitools-iris\"\n",
"            _install(dep)\n",
"\n",
"\n",
"deps = [\"pyworms\"]\n",
"_colab_install_missing_deps(deps)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "heDzRpj3W4Qd"
},
"source": [
"# Aligning Data to Darwin Core\n",
"\n",
"## Creating event core with an occurrence and extended measurement or fact extension using Python\n",
"\n",
"Created: 2020-12-08\n",
"\n",
"*Caution:* This notebook was created for the [IOOS DMAC Code Sprint](https://glos.org/2019-code-sprint/) Biological Data Session.\n",
"The data in this notebook were created specifically as an example and meant solely to be\n",
"illustrative of the process for aligning data to the biological data standard - [Darwin Core](https://dwc.tdwg.org/).\n",
"These data should not be considered actual occurrences of species and any measurements\n",
"are also contrived. This notebook is meant to provide a step by step process for taking\n",
"original data and aligning it to Darwin Core. It has been adapted from the R markdown notebook created by Abby Benson [IOOS_DMAC_DataToDWC_Notebook_event.md](https://github.com/ioos/bio_data_guide/blob/main/datasets/example_script_with_fake_data/IOOS_DMAC_DataToDwC_Notebook_event.md).\n",
"\n",
"First let's bring in the appropriate libraries to work with the tabular data files and generate the appropriate content for the Darwin Core requirements."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "C9vVfp4iW4Qg",
"outputId": "136654f6-23cc-48dc-825f-299355567989"
},
"outputs": [],
"source": [
"import csv\n",
"import pprint\n",
"import uuid\n",
"\n",
"import numpy as np\n",
"import pandas as pd\n",
"import pyworms"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "U3K5nG5OW4Qh"
},
"source": [
"Now we need to read in the raw data file using [pandas.read_csv()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html). Here we display the first five rows of data to give the user an idea of what observations are contained in the [raw file](https://github.com/ioos/notebooks_demos/blob/master/notebooks/data/dwc/raw/MadeUpDataForBiologicalDataTraining.csv)."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"id": "TOJtkSgJW4Qi"
},
"outputs": [
{
"data": {
"text/plain": [
" date lat lon region station transect \\\n",
"0 7/16/2004 18.29788 -64.79451 St. John 250 1 \n",
"1 7/16/2004 18.29788 -64.79451 St. John 250 1 \n",
"2 7/16/2004 18.29788 -64.79451 St. John 250 1 \n",
"3 7/16/2004 18.29788 -64.79451 St. John 250 1 \n",
"4 7/16/2004 18.29788 -64.79451 St. John 250 2 \n",
"\n",
" scientific name percent cover depth bottom type rugosity \\\n",
"0 Acropora cervicornis 0 25 shallow reef flat 0.295833 \n",
"1 Madracis auretenra 5 25 shallow reef flat 0.295833 \n",
"2 Mussa angulosa 15 25 shallow reef flat 0.295833 \n",
"3 Siderastrea radians 0 25 shallow reef flat 0.295833 \n",
"4 Acropora cervicornis 0 35 complex back reef 0.364583 \n",
"\n",
" temperature \n",
"0 25.2 \n",
"1 25.2 \n",
"2 25.2 \n",
"3 25.2 \n",
"4 24.8 "
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"url = (\n",
" \"https://raw.githubusercontent.com/ioos/ioos_code_lab/main/\"\n",
" \"jupyterbook/content/code_gallery/data/\"\n",
")\n",
"file = \"MadeUpDataForBiologicalDataTraining.csv\"\n",
"df = pd.read_csv(url + file, header=[0])\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "TDrmDQIUW4Qi"
},
"source": [
"First we need to decide if we will build an **occurrence only** version of the data or an **event core** with an **occurrence** and **extended measurement or facts extension (eMoF)** version of the data.\n",
"\n",
"- [**Occurrence only**](https://dwc.tdwg.org/terms/#occurrence):\n",
"\n",
" - Easier to create.\n",
" - It's only one file to produce.\n",
" - However, several pieces of information will be left out if we choose that option.\n",
"\n",
"- **[sampling event](https://dwc.tdwg.org/terms/#event) with [occurrence](https://dwc.tdwg.org/terms/#occurrence) and [extended measurement or fact (eMoF)](https://rs.gbif.org/extensions.html#http://rs.iobis.org/obis/terms/ExtendedMeasurementOrFact)**:\n",
"\n",
" - More difficult to create.\n",
" - Composed of several files.\n",
" - Can capture all of the data in the file, creating a lossless version.\n",
"\n",
"Here we decide to use the second option, an **event core** with **occurrence** and **extended measurement or fact (eMoF)** extensions, to include as much information as we can.\n",
"\n",
"First let's create the `eventID` and `occurrenceID` in the original file so that information can be reused for all necessary files down the line."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"id": "GikiB-SdW4Qj"
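},
"outputs": [],
"source": [
"df[\"eventID\"] = df[[\"region\", \"station\", \"transect\"]].apply(\n",
"    lambda x: \"_\".join(x.astype(str)), axis=1\n",
")\n",
"# Generate one UUID per row; a bare uuid.uuid4() would assign the same value to every row.\n",
"df[\"occurrenceID\"] = [str(uuid.uuid4()) for _ in range(len(df))]"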
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "jeyo2tT3W4Qj"
},
"source": [
"We will need to create *three* separate files to comply with the **sampling event** format.\n",
"We'll start with the **event file** but we only need to include the columns that are relevant\n",
"to the event file."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "HRAnbhZgW4Qk"
},
"source": [
"## Event file"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "12oScm9UW4Qk"
},
"source": [
"More information on the **event** category in Darwin Core can be found at [https://dwc.tdwg.org/terms/#event](https://dwc.tdwg.org/terms/#event).\n",
"\n",
"Let's first make a copy of the DataFrame we pulled in, keeping only the data fields of interest for the **event file**."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"id": "LBPN4sBVW4Qk"
},
"outputs": [],
"source": [
"event = df[\n",
" [\n",
" \"date\",\n",
" \"lat\",\n",
" \"lon\",\n",
" \"region\",\n",
" \"station\",\n",
" \"transect\",\n",
" \"depth\",\n",
" \"bottom type\",\n",
" \"eventID\",\n",
" ]\n",
"].copy()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ZAuKnZjTW4Ql"
},
"source": [
"Next we need to rename any columns of data to match directly to Darwin Core."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"id": "Zc4p-JU9W4Qm"
},
"outputs": [],
"source": [
"event[\"decimalLatitude\"] = event[\"lat\"]\n",
"event[\"decimalLongitude\"] = event[\"lon\"]\n",
"event[\"minimumDepthInMeters\"] = event[\"depth\"]\n",
"event[\"maximumDepthInMeters\"] = event[\"depth\"]\n",
"event[\"habitat\"] = event[\"bottom type\"]\n",
"event[\"island\"] = event[\"region\"]"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "EAW0foQBW4Qm"
},
"source": [
"We need to parse the date field appropriately so we can export it in [ISO format](https://en.wikipedia.org/wiki/ISO_8601). We also add any missing required fields."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"id": "MmXkxHU9W4Qm"
},
"outputs": [],
"source": [
"event[\"eventDate\"] = pd.to_datetime(event[\"date\"], format=\"%m/%d/%Y\")\n",
"event[\"basisOfRecord\"] = \"HumanObservation\"\n",
"event[\"geodeticDatum\"] = \"EPSG:4326 WGS84\""
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "QIaTwZihW4Qm"
},
"source": [
"Then we'll remove any fields that we no longer need to clean things up a bit."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"id": "hAl-DpwbW4Qn"
},
"outputs": [],
"source": [
"event.drop(\n",
" columns=[\n",
" \"date\",\n",
" \"lat\",\n",
" \"lon\",\n",
" \"region\",\n",
" \"station\",\n",
" \"transect\",\n",
" \"depth\",\n",
" \"bottom type\",\n",
" ],\n",
" inplace=True,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "-Xo3NB95W4Qn"
},
"source": [
"We have too many repeating rows of information. We can pare this down using eventID which\n",
"is a unique identifier for each sampling event in the data."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"id": "QQShHV_jW4Qn"
},
"outputs": [],
"source": [
"event.drop_duplicates(subset=\"eventID\", inplace=True)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "D5nIIqFjW4Qn"
},
"source": [
"Finally, we write out the [event file](https://github.com/ioos/notebooks_demos/blob/master/notebooks/data/dwc/processed/MadeUpData_event.csv), specifying the ISO date format. We've printed five random rows of the DataFrame to give an example of what the resultant file will look like."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 206
},
"id": "NRMqDSKaW4Qn",
"outputId": "4c389f7f-f76e-4196-aae1-359c13681514"
},
"outputs": [
{
"data": {
"text/plain": [
" eventID decimalLatitude decimalLongitude minimumDepthInMeters \\\n",
"4 St. John_250_2 18.29788 -64.79451 35 \n",
"8 St. John_250_3 18.29788 -64.79451 85 \n",
"12 St. John_356_1 18.27609 -64.75740 28 \n",
"16 St. John_356_2 18.27609 -64.75740 16 \n",
"20 St. John_356_3 18.27609 -64.75740 90 \n",
"\n",
" maximumDepthInMeters habitat island eventDate \\\n",
"4 35 complex back reef St. John 2004-07-16 \n",
"8 85 deep reef St. John 2004-07-16 \n",
"12 28 complex back reef St. John 2004-07-17 \n",
"16 16 shallow reef flat St. John 2004-07-17 \n",
"20 90 deep reef St. John 2004-07-17 \n",
"\n",
" basisOfRecord geodeticDatum \n",
"4 HumanObservation EPSG:4326 WGS84 \n",
"8 HumanObservation EPSG:4326 WGS84 \n",
"12 HumanObservation EPSG:4326 WGS84 \n",
"16 HumanObservation EPSG:4326 WGS84 \n",
"20 HumanObservation EPSG:4326 WGS84 "
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Write to a local file: the GitHub URL is read-only, so it cannot be a write target.\n",
"file = \"MadeUpData_event.csv\"\n",
"\n",
"event.to_csv(file, header=True, index=False, date_format=\"%Y-%m-%d\")\n",
"\n",
"event.sample(n=5).sort_index()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "-aHsrOUhW4Qo"
},
"source": [
"## Occurrence file\n",
"\n",
"More information on the **occurrence** category in Darwin Core can be found at [https://dwc.tdwg.org/terms/#occurrence](https://dwc.tdwg.org/terms/#occurrence).\n",
"\n",
"For creating the **occurrence** file, we start by creating the DataFrame and renaming the fields that align directly with Darwin Core. Then, we'll add the required information that is missing."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"id": "wKML01VNW4Qo"
},
"outputs": [],
"source": [
"occurrence = df[[\"scientific name\", \"eventID\", \"occurrenceID\", \"percent cover\"]].copy()\n",
"occurrence[\"scientificName\"] = occurrence[\"scientific name\"]\n",
"occurrence[\"occurrenceStatus\"] = np.where(\n",
" occurrence[\"percent cover\"] == 0, \"absent\", \"present\"\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "CzjlzVcYW4Qo"
},
"source": [
"### Taxonomic Name Matching\n",
"\n",
"A requirement for [OBIS](https://obis.org/) is that all scientific names match to the [World Register of\n",
"Marine Species (WoRMS)](https://www.marinespecies.org/) and a `scientificNameID` is included. A `scientificNameID` looks\n",
"like this `urn:lsid:marinespecies.org:taxname:275730` with the last digits after\n",
"the colon being the **WoRMS AphiaID**. We'll need to go out to WoRMS to grab this\n",
"information. So, we create a lookup table of the unique scientific names found in the **occurrence** data we created above."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"id": "Fgw5O0YeW4Qo"
},
"outputs": [],
"source": [
"lut_worms = pd.DataFrame(\n",
" columns=[\"scientificName\"], data=occurrence[\"scientificName\"].unique()\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "trYhL8HTW4Qp"
},
"source": [
"Next, we add columns for the information we can retrieve from [WoRMS](https://www.marinespecies.org/), including the required `scientificNameID`, and initialize them with empty strings so the lookup table can be populated later."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"id": "e7n1HW7tW4Qp"
},
"outputs": [],
"source": [
"headers = [\n",
" \"acceptedname\",\n",
" \"acceptedID\",\n",
" \"scientificNameID\",\n",
" \"kingdom\",\n",
" \"phylum\",\n",
" \"class\",\n",
" \"order\",\n",
" \"family\",\n",
" \"genus\",\n",
" \"scientificNameAuthorship\",\n",
" \"taxonRank\",\n",
"]\n",
"\n",
"for head in headers:\n",
" lut_worms[head] = \"\""
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "dzK3YKHdW4Qp"
},
"source": [
"Next, we perform a taxonomic lookup using the library [pyworms](https://pyworms.readthedocs.io/en/latest/), calling the function `pyworms.aphiaRecordsByMatchNames()` to collect the information and populate the lookup table.\n",
"\n",
"Here we print the scientific name of the species we are looking up and the matching response from WoRMS with the detailed species information."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "6T2aVKRmW4Qq",
"outputId": "45838a8e-d2c3-4558-b613-9f09dc7563d9"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"**Searching for scientific name = Acropora cervicornis**\n",
"{'AphiaID': 206989,\n",
" 'authority': '(Lamarck, 1816)',\n",
" 'citation': 'Hoeksema, B. W.; Cairns, S. (2021). World List of Scleractinia. '\n",
" 'Acropora cervicornis (Lamarck, 1816). Accessed through: World '\n",
" 'Register of Marine Species at: '\n",
" 'http://www.marinespecies.org/aphia.php?p=taxdetails&id=206989 on '\n",
" '2021-08-30',\n",
" 'class': 'Anthozoa',\n",
" 'family': 'Acroporidae',\n",
" 'genus': 'Acropora',\n",
" 'isBrackish': 0,\n",
" 'isExtinct': None,\n",
" 'isFreshwater': 0,\n",
" 'isMarine': 1,\n",
" 'isTerrestrial': 0,\n",
" 'kingdom': 'Animalia',\n",
" 'lsid': 'urn:lsid:marinespecies.org:taxname:206989',\n",
" 'match_type': 'exact',\n",
" 'modified': '2018-08-27T16:36:11.490Z',\n",
" 'order': 'Scleractinia',\n",
" 'parentNameUsageID': 205469,\n",
" 'phylum': 'Cnidaria',\n",
" 'rank': 'Species',\n",
" 'scientificname': 'Acropora cervicornis',\n",
" 'status': 'accepted',\n",
" 'taxonRankID': 220,\n",
" 'unacceptreason': None,\n",
" 'url': 'http://www.marinespecies.org/aphia.php?p=taxdetails&id=206989',\n",
" 'valid_AphiaID': 206989,\n",
" 'valid_authority': '(Lamarck, 1816)',\n",
" 'valid_name': 'Acropora cervicornis'}\n",
"\n",
"**Searching for scientific name = Madracis auretenra**\n",
"{'AphiaID': 430664,\n",
" 'authority': 'Locke, Weil & Coates, 2007',\n",
" 'citation': 'Hoeksema, B. W.; Cairns, S. (2021). World List of Scleractinia. '\n",
" 'Madracis auretenra Locke, Weil & Coates, 2007. Accessed through: '\n",
" 'World Register of Marine Species at: '\n",
" 'http://www.marinespecies.org/aphia.php?p=taxdetails&id=430664 on '\n",
" '2021-08-30',\n",
" 'class': 'Anthozoa',\n",
" 'family': 'Pocilloporidae',\n",
" 'genus': 'Madracis',\n",
" 'isBrackish': 0,\n",
" 'isExtinct': None,\n",
" 'isFreshwater': 0,\n",
" 'isMarine': 1,\n",
" 'isTerrestrial': 0,\n",
" 'kingdom': 'Animalia',\n",
" 'lsid': 'urn:lsid:marinespecies.org:taxname:430664',\n",
" 'match_type': 'exact',\n",
" 'modified': '2020-04-10T07:30:40.497Z',\n",
" 'order': 'Scleractinia',\n",
" 'parentNameUsageID': 135125,\n",
" 'phylum': 'Cnidaria',\n",
" 'rank': 'Species',\n",
" 'scientificname': 'Madracis auretenra',\n",
" 'status': 'accepted',\n",
" 'taxonRankID': 220,\n",
" 'unacceptreason': None,\n",
" 'url': 'http://www.marinespecies.org/aphia.php?p=taxdetails&id=430664',\n",
" 'valid_AphiaID': 430664,\n",
" 'valid_authority': 'Locke, Weil & Coates, 2007',\n",
" 'valid_name': 'Madracis auretenra'}\n",
"\n",
"**Searching for scientific name = Mussa angulosa**\n",
"{'AphiaID': 216135,\n",
" 'authority': '(Pallas, 1766)',\n",
" 'citation': 'Hoeksema, B. W.; Cairns, S. (2021). World List of Scleractinia. '\n",
" 'Mussa angulosa (Pallas, 1766). Accessed through: World Register '\n",
" 'of Marine Species at: '\n",
" 'http://www.marinespecies.org/aphia.php?p=taxdetails&id=216135 on '\n",
" '2021-08-30',\n",
" 'class': 'Anthozoa',\n",
" 'family': 'Faviidae',\n",
" 'genus': 'Mussa',\n",
" 'isBrackish': 0,\n",
" 'isExtinct': 0,\n",
" 'isFreshwater': 0,\n",
" 'isMarine': 1,\n",
" 'isTerrestrial': 0,\n",
" 'kingdom': 'Animalia',\n",
" 'lsid': 'urn:lsid:marinespecies.org:taxname:216135',\n",
" 'match_type': 'exact',\n",
" 'modified': '2020-06-28T17:27:59.150Z',\n",
" 'order': 'Scleractinia',\n",
" 'parentNameUsageID': 206306,\n",
" 'phylum': 'Cnidaria',\n",
" 'rank': 'Species',\n",
" 'scientificname': 'Mussa angulosa',\n",
" 'status': 'accepted',\n",
" 'taxonRankID': 220,\n",
" 'unacceptreason': None,\n",
" 'url': 'http://www.marinespecies.org/aphia.php?p=taxdetails&id=216135',\n",
" 'valid_AphiaID': 216135,\n",
" 'valid_authority': '(Pallas, 1766)',\n",
" 'valid_name': 'Mussa angulosa'}\n",
"\n",
"**Searching for scientific name = Siderastrea radians**\n",
"{'AphiaID': 207517,\n",
" 'authority': '(Pallas, 1766)',\n",
" 'citation': 'Hoeksema, B. W.; Cairns, S. (2021). World List of Scleractinia. '\n",
" 'Siderastrea radians (Pallas, 1766). Accessed through: World '\n",
" 'Register of Marine Species at: '\n",
" 'http://www.marinespecies.org/aphia.php?p=taxdetails&id=207517 on '\n",
" '2021-08-30',\n",
" 'class': 'Anthozoa',\n",
" 'family': 'Siderastreidae',\n",
" 'genus': 'Siderastrea',\n",
" 'isBrackish': 0,\n",
" 'isExtinct': None,\n",
" 'isFreshwater': 0,\n",
" 'isMarine': 1,\n",
" 'isTerrestrial': 0,\n",
" 'kingdom': 'Animalia',\n",
" 'lsid': 'urn:lsid:marinespecies.org:taxname:207517',\n",
" 'match_type': 'exact',\n",
" 'modified': '2014-06-02T10:15:47.813Z',\n",
" 'order': 'Scleractinia',\n",
" 'parentNameUsageID': 204291,\n",
" 'phylum': 'Cnidaria',\n",
" 'rank': 'Species',\n",
" 'scientificname': 'Siderastrea radians',\n",
" 'status': 'accepted',\n",
" 'taxonRankID': 220,\n",
" 'unacceptreason': None,\n",
" 'url': 'http://www.marinespecies.org/aphia.php?p=taxdetails&id=207517',\n",
" 'valid_AphiaID': 207517,\n",
" 'valid_authority': '(Pallas, 1766)',\n",
" 'valid_name': 'Siderastrea radians'}\n"
]
}
],
"source": [
"for index, row in lut_worms.iterrows():\n",
"    print(f\"\\n**Searching for scientific name = {row['scientificName']}**\")\n",
"    resp = pyworms.aphiaRecordsByMatchNames(row[\"scientificName\"])[0][0]\n",
"    pprint.pprint(resp)\n",
"    lut_worms.loc[index, \"acceptedname\"] = resp[\"valid_name\"]\n",
"    lut_worms.loc[index, \"acceptedID\"] = resp[\"valid_AphiaID\"]\n",
"    lut_worms.loc[index, \"scientificNameID\"] = resp[\"lsid\"]\n",
"    lut_worms.loc[index, \"kingdom\"] = resp[\"kingdom\"]\n",
"    lut_worms.loc[index, \"phylum\"] = resp[\"phylum\"]\n",
"    lut_worms.loc[index, \"class\"] = resp[\"class\"]\n",
"    lut_worms.loc[index, \"order\"] = resp[\"order\"]\n",
"    lut_worms.loc[index, \"family\"] = resp[\"family\"]\n",
"    lut_worms.loc[index, \"genus\"] = resp[\"genus\"]\n",
"    lut_worms.loc[index, \"scientificNameAuthorship\"] = resp[\"authority\"]\n",
"    lut_worms.loc[index, \"taxonRank\"] = resp[\"rank\"]"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "lP1_R_2LW4Qq"
},
"source": [
"We then merge the lookup table of unique scientific names back into the **occurrence** data, matching on the field `scientificName`. Then, we remove any unnecessary columns to clean up the DataFrame for writing."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"id": "QPPii8qCW4Qq"
},
"outputs": [],
"source": [
"occurrence = pd.merge(occurrence, lut_worms, how=\"left\", on=\"scientificName\")\n",
"\n",
"occurrence.drop(columns=[\"scientific name\", \"percent cover\"], inplace=True)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "IsryY8nOW4Qq"
},
"source": [
"Finally, we write out the [occurrence file](https://github.com/ioos/notebooks_demos/blob/master/notebooks/data/dwc/processed/MadeUpData_Occurrence.csv). We've printed ten random rows of the DataFrame to give an example of what the resultant file will look like."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 293
},
"id": "CjeMwUYZW4Qq",
"outputId": "5728403a-479d-4bd9-fc76-a1e96abde34f"
},
"outputs": [
{
"data": {
"text/plain": [
" eventID occurrenceID \\\n",
"4 St. John_250_2 f470068c-998a-4e9b-b026-02bf02118de7 \n",
"5 St. John_250_2 f470068c-998a-4e9b-b026-02bf02118de7 \n",
"7 St. John_250_2 f470068c-998a-4e9b-b026-02bf02118de7 \n",
"10 St. John_250_3 f470068c-998a-4e9b-b026-02bf02118de7 \n",
"12 St. John_356_1 f470068c-998a-4e9b-b026-02bf02118de7 \n",
"13 St. John_356_1 f470068c-998a-4e9b-b026-02bf02118de7 \n",
"19 St. John_356_2 f470068c-998a-4e9b-b026-02bf02118de7 \n",
"21 St. John_356_3 f470068c-998a-4e9b-b026-02bf02118de7 \n",
"22 St. John_356_3 f470068c-998a-4e9b-b026-02bf02118de7 \n",
"23 St. John_356_3 f470068c-998a-4e9b-b026-02bf02118de7 \n",
"\n",
" scientificName occurrenceStatus acceptedname acceptedID \\\n",
"4 Acropora cervicornis absent Acropora cervicornis 206989 \n",
"5 Madracis auretenra present Madracis auretenra 430664 \n",
"7 Siderastrea radians absent Siderastrea radians 207517 \n",
"10 Mussa angulosa present Mussa angulosa 216135 \n",
"12 Acropora cervicornis present Acropora cervicornis 206989 \n",
"13 Madracis auretenra present Madracis auretenra 430664 \n",
"19 Siderastrea radians present Siderastrea radians 207517 \n",
"21 Madracis auretenra absent Madracis auretenra 430664 \n",
"22 Mussa angulosa absent Mussa angulosa 216135 \n",
"23 Siderastrea radians present Siderastrea radians 207517 \n",
"\n",
" scientificNameID kingdom phylum class \\\n",
"4 urn:lsid:marinespecies.org:taxname:206989 Animalia Cnidaria Anthozoa \n",
"5 urn:lsid:marinespecies.org:taxname:430664 Animalia Cnidaria Anthozoa \n",
"7 urn:lsid:marinespecies.org:taxname:207517 Animalia Cnidaria Anthozoa \n",
"10 urn:lsid:marinespecies.org:taxname:216135 Animalia Cnidaria Anthozoa \n",
"12 urn:lsid:marinespecies.org:taxname:206989 Animalia Cnidaria Anthozoa \n",
"13 urn:lsid:marinespecies.org:taxname:430664 Animalia Cnidaria Anthozoa \n",
"19 urn:lsid:marinespecies.org:taxname:207517 Animalia Cnidaria Anthozoa \n",
"21 urn:lsid:marinespecies.org:taxname:430664 Animalia Cnidaria Anthozoa \n",
"22 urn:lsid:marinespecies.org:taxname:216135 Animalia Cnidaria Anthozoa \n",
"23 urn:lsid:marinespecies.org:taxname:207517 Animalia Cnidaria Anthozoa \n",
"\n",
" order family genus scientificNameAuthorship \\\n",
"4 Scleractinia Acroporidae Acropora (Lamarck, 1816) \n",
"5 Scleractinia Pocilloporidae Madracis Locke, Weil & Coates, 2007 \n",
"7 Scleractinia Siderastreidae Siderastrea (Pallas, 1766) \n",
"10 Scleractinia Faviidae Mussa (Pallas, 1766) \n",
"12 Scleractinia Acroporidae Acropora (Lamarck, 1816) \n",
"13 Scleractinia Pocilloporidae Madracis Locke, Weil & Coates, 2007 \n",
"19 Scleractinia Siderastreidae Siderastrea (Pallas, 1766) \n",
"21 Scleractinia Pocilloporidae Madracis Locke, Weil & Coates, 2007 \n",
"22 Scleractinia Faviidae Mussa (Pallas, 1766) \n",
"23 Scleractinia Siderastreidae Siderastrea (Pallas, 1766) \n",
"\n",
" taxonRank \n",
"4 Species \n",
"5 Species \n",
"7 Species \n",
"10 Species \n",
"12 Species \n",
"13 Species \n",
"19 Species \n",
"21 Species \n",
"22 Species \n",
"23 Species "
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# sort the rows by scientificName\n",
"occurrence.sort_values(\"scientificName\", inplace=True)\n",
"\n",
"# reorganize column order to be consistent with R example:\n",
"columns = [\n",
"    \"scientificName\",\n",
"    \"eventID\",\n",
"    \"occurrenceID\",\n",
"    \"occurrenceStatus\",\n",
"    \"acceptedname\",\n",
"    \"acceptedID\",\n",
"    \"scientificNameID\",\n",
"    \"kingdom\",\n",
"    \"phylum\",\n",
"    \"class\",\n",
"    \"order\",\n",
"    \"family\",\n",
"    \"genus\",\n",
"    \"scientificNameAuthorship\",\n",
"    \"taxonRank\",\n",
"]\n",
"\n",
"\n",
"# Write to a local file: the GitHub URL is read-only, so it cannot be a write target.\n",
"file = \"MadeUpData_Occurrence.csv\"\n",
"\n",
"occurrence.to_csv(\n",
"    file, header=True, index=False, quoting=csv.QUOTE_ALL, columns=columns\n",
")\n",
"\n",
"occurrence.sample(n=10).sort_index()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "dUufvx_uW4Qr"
},
"source": [
"## Extended Measurement Or Fact (eMoF)\n",
"\n",
"The last file we need to create is the **extended measurement or fact (eMoF)** file. It holds measurements or facts about the event (e.g. temperature, salinity) as well as about the occurrence (e.g. percent cover, abundance, weight, length). Each record is linked to its event using `eventID` and, when it describes an organism, to its occurrence using `occurrenceID`. [Extended Measurements Or Facts](https://rs.gbif.org/extensions.html#http://rs.iobis.org/obis/terms/ExtendedMeasurementOrFact) are any other generic observations associated with resources that are described using Darwin Core (e.g. water temperature observations). See the [DwC implementation guide](https://dwc.tdwg.org/rdf/#2-implementation-guide) for more information.\n",
"\n",
"For the various `TypeID` fields (e.g. `measurementTypeID`), include URIs from the [BODC NERC vocabulary](https://vocab.nerc.ac.uk/search_nvs/) or another *nearly permanent* source where possible. For example, the URI for [water temperature](http://vocab.nerc.ac.uk/collection/P25/current/WTEMP/) in the BODC NERC vocabulary is `http://vocab.nerc.ac.uk/collection/P25/current/WTEMP/`.\n",
"\n",
"We then populate the appropriate fields with the information we have available. The `measurementValue` field is populated with the observed values of the measurement named in the `measurementType` field, expressed in the units given in the `measurementUnit` field.\n",
"\n",
"For measurements or facts of the **occurrence** (e.g. percent cover, length, density, biomass), be sure to include the `occurrenceID` from the **occurrence** record, since those observations are measurements of or from the organism. Other observations (e.g. water temperature, rugosity) are tied to the **event** via the `eventID`.\n",
"\n",
"Below we walk through creating three independent DataFrames for *temperature*, *rugosity*, and *percent cover*, populating each with all of the information we have available and removing the now-duplicative source columns. Finally, we concatenate all the **extended measurements or facts** together into one DataFrame."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"id": "2NDVEm92W4Qr"
},
"outputs": [],
"source": [
"# temperature is measured for the event, so occurrenceID is left blank\n",
"temperature = df[[\"eventID\", \"temperature\", \"date\"]].copy()\n",
"temperature[\"occurrenceID\"] = \"\"\n",
"temperature[\"measurementType\"] = \"temperature\"\n",
"temperature[\"measurementTypeID\"] = (\n",
" \"http://vocab.nerc.ac.uk/collection/P25/current/WTEMP/\"\n",
")\n",
"temperature[\"measurementValue\"] = temperature[\"temperature\"]\n",
"temperature[\"measurementUnit\"] = \"Celsius\"\n",
"temperature[\"measurementUnitID\"] = (\n",
" \"http://vocab.nerc.ac.uk/collection/P06/current/UPAA/\"\n",
")\n",
"temperature[\"measurementAccuracy\"] = 3\n",
"temperature[\"measurementDeterminedDate\"] = pd.to_datetime(\n",
" temperature[\"date\"], format=\"%m/%d/%Y\"\n",
")\n",
"temperature[\"measurementMethod\"] = \"\"\n",
"temperature.drop(columns=[\"temperature\", \"date\"], inplace=True)\n",
"\n",
"# rugosity is also measured for the event, so occurrenceID is left blank\n",
"rugosity = df[[\"eventID\", \"rugosity\", \"date\"]].copy()\n",
"rugosity[\"occurrenceID\"] = \"\"\n",
"rugosity[\"measurementType\"] = \"rugosity\"\n",
"rugosity[\"measurementTypeID\"] = \"\"\n",
"rugosity[\"measurementValue\"] = rugosity[\"rugosity\"].map(\"{:,.6f}\".format)\n",
"rugosity[\"measurementUnit\"] = \"\"\n",
"rugosity[\"measurementUnitID\"] = \"\"\n",
"rugosity[\"measurementAccuracy\"] = \"\"\n",
"rugosity[\"measurementDeterminedDate\"] = pd.to_datetime(\n",
" rugosity[\"date\"], format=\"%m/%d/%Y\"\n",
")\n",
"rugosity[\"measurementMethod\"] = \"\"\n",
"rugosity.drop(columns=[\"rugosity\", \"date\"], inplace=True)\n",
"\n",
"# percent cover is measured for the organism, so keep its occurrenceID\n",
"percent_cover = df[[\"eventID\", \"occurrenceID\", \"percent cover\", \"date\"]].copy()\n",
"percent_cover[\"measurementType\"] = \"Percent Cover\"\n",
"percent_cover[\"measurementTypeID\"] = (\n",
" \"http://vocab.nerc.ac.uk/collection/P01/current/SDBIOL10/\"\n",
")\n",
"percent_cover[\"measurementValue\"] = percent_cover[\"percent cover\"]\n",
"percent_cover[\"measurementUnit\"] = \"Percent/100m^2\"\n",
"percent_cover[\"measurementUnitID\"] = \"\"\n",
"percent_cover[\"measurementAccuracy\"] = 5\n",
"percent_cover[\"measurementDeterminedDate\"] = pd.to_datetime(\n",
" percent_cover[\"date\"], format=\"%m/%d/%Y\"\n",
")\n",
"percent_cover[\"measurementMethod\"] = \"\"\n",
"percent_cover.drop(columns=[\"percent cover\", \"date\"], inplace=True)\n",
"\n",
"# stack the three DataFrames into a single eMoF table\n",
"measurementorfact = pd.concat([temperature, rugosity, percent_cover])"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "9JlJk0uEW4Qt"
},
"source": [
"Finally, we write the [measurement or fact file](https://github.com/ioos/notebooks_demos/blob/master/notebooks/data/dwc/processed/MadeUpData_mof.csv), again specifying the ISO date format. We've printed ten random rows of the DataFrame to give an example of what the resultant file will look like."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 293
},
"id": "usxtVcc5W4Qu",
"outputId": "c958e01f-c877-4203-eb85-bf3c7e175159"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n",
"      <th></th>\n",
"      <th>eventID</th>\n",
"      <th>occurrenceID</th>\n",
"      <th>measurementType</th>\n",
"      <th>measurementTypeID</th>\n",
"      <th>measurementValue</th>\n",
"      <th>measurementUnit</th>\n",
"      <th>measurementUnitID</th>\n",
"      <th>measurementAccuracy</th>\n",
"      <th>measurementDeterminedDate</th>\n",
"      <th>measurementMethod</th>\n",
"    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr>\n",
"      <th>6</th>\n",
"      <td>St. John_250_2</td>\n",
"      <td></td>\n",
"      <td>temperature</td>\n",
"      <td>http://vocab.nerc.ac.uk/collection/P25/current...</td>\n",
"      <td>24.8</td>\n",
"      <td>Celsius</td>\n",
"      <td>http://vocab.nerc.ac.uk/collection/P06/current...</td>\n",
"      <td>3</td>\n",
"      <td>2004-07-16</td>\n",
"      <td></td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>18</th>\n",
"      <td>St. John_356_2</td>\n",
"      <td></td>\n",
"      <td>rugosity</td>\n",
"      <td></td>\n",
"      <td>0.158489</td>\n",
"      <td></td>\n",
"      <td></td>\n",
"      <td></td>\n",
"      <td>2004-07-17</td>\n",
"      <td></td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>4</th>\n",
"      <td>St. John_250_2</td>\n",
"      <td></td>\n",
"      <td>temperature</td>\n",
"      <td>http://vocab.nerc.ac.uk/collection/P25/current...</td>\n",
"      <td>24.8</td>\n",
"      <td>Celsius</td>\n",
"      <td>http://vocab.nerc.ac.uk/collection/P06/current...</td>\n",
"      <td>3</td>\n",
"      <td>2004-07-16</td>\n",
"      <td></td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>11</th>\n",
"      <td>St. John_250_3</td>\n",
"      <td></td>\n",
"      <td>temperature</td>\n",
"      <td>http://vocab.nerc.ac.uk/collection/P25/current...</td>\n",
"      <td>23.1</td>\n",
"      <td>Celsius</td>\n",
"      <td>http://vocab.nerc.ac.uk/collection/P06/current...</td>\n",
"      <td>3</td>\n",
"      <td>2004-07-16</td>\n",
"      <td></td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>6</th>\n",
"      <td>St. John_250_2</td>\n",
"      <td>f470068c-998a-4e9b-b026-02bf02118de7</td>\n",
"      <td>Percent Cover</td>\n",
"      <td>http://vocab.nerc.ac.uk/collection/P01/current...</td>\n",
"      <td>0</td>\n",
"      <td>Percent/100m^2</td>\n",
"      <td></td>\n",
"      <td>5</td>\n",
"      <td>2004-07-16</td>\n",
"      <td></td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>4</th>\n",
"      <td>St. John_250_2</td>\n",
"      <td></td>\n",
"      <td>rugosity</td>\n",
"      <td></td>\n",
"      <td>0.364583</td>\n",
"      <td></td>\n",
"      <td></td>\n",
"      <td></td>\n",
"      <td>2004-07-16</td>\n",
"      <td></td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>4</th>\n",
"      <td>St. John_250_2</td>\n",
"      <td>f470068c-998a-4e9b-b026-02bf02118de7</td>\n",
"      <td>Percent Cover</td>\n",
"      <td>http://vocab.nerc.ac.uk/collection/P01/current...</td>\n",
"      <td>0</td>\n",
"      <td>Percent/100m^2</td>\n",
"      <td></td>\n",
"      <td>5</td>\n",
"      <td>2004-07-16</td>\n",
"      <td></td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>20</th>\n",
"      <td>St. John_356_3</td>\n",
"      <td></td>\n",
"      <td>rugosity</td>\n",
"      <td></td>\n",
"      <td>0.489574</td>\n",
"      <td></td>\n",
"      <td></td>\n",
"      <td></td>\n",
"      <td>2004-07-17</td>\n",
"      <td></td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>2</th>\n",
"      <td>St. John_250_1</td>\n",
"      <td>f470068c-998a-4e9b-b026-02bf02118de7</td>\n",
"      <td>Percent Cover</td>\n",
"      <td>http://vocab.nerc.ac.uk/collection/P01/current...</td>\n",
"      <td>15</td>\n",
"      <td>Percent/100m^2</td>\n",
"      <td></td>\n",
"      <td>5</td>\n",
"      <td>2004-07-16</td>\n",
"      <td></td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>2</th>\n",
"      <td>St. John_250_1</td>\n",
"      <td></td>\n",
"      <td>temperature</td>\n",
"      <td>http://vocab.nerc.ac.uk/collection/P25/current...</td>\n",
"      <td>25.2</td>\n",
"      <td>Celsius</td>\n",
"      <td>http://vocab.nerc.ac.uk/collection/P06/current...</td>\n",
"      <td>3</td>\n",
"      <td>2004-07-16</td>\n",
"      <td></td>\n",
"    </tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" eventID occurrenceID measurementType \\\n",
"6 St. John_250_2 temperature \n",
"18 St. John_356_2 rugosity \n",
"4 St. John_250_2 temperature \n",
"11 St. John_250_3 temperature \n",
"6 St. John_250_2 f470068c-998a-4e9b-b026-02bf02118de7 Percent Cover \n",
"4 St. John_250_2 rugosity \n",
"4 St. John_250_2 f470068c-998a-4e9b-b026-02bf02118de7 Percent Cover \n",
"20 St. John_356_3 rugosity \n",
"2 St. John_250_1 f470068c-998a-4e9b-b026-02bf02118de7 Percent Cover \n",
"2 St. John_250_1 temperature \n",
"\n",
" measurementTypeID measurementValue \\\n",
"6 http://vocab.nerc.ac.uk/collection/P25/current... 24.8 \n",
"18 0.158489 \n",
"4 http://vocab.nerc.ac.uk/collection/P25/current... 24.8 \n",
"11 http://vocab.nerc.ac.uk/collection/P25/current... 23.1 \n",
"6 http://vocab.nerc.ac.uk/collection/P01/current... 0 \n",
"4 0.364583 \n",
"4 http://vocab.nerc.ac.uk/collection/P01/current... 0 \n",
"20 0.489574 \n",
"2 http://vocab.nerc.ac.uk/collection/P01/current... 15 \n",
"2 http://vocab.nerc.ac.uk/collection/P25/current... 25.2 \n",
"\n",
" measurementUnit measurementUnitID \\\n",
"6 Celsius http://vocab.nerc.ac.uk/collection/P06/current... \n",
"18 \n",
"4 Celsius http://vocab.nerc.ac.uk/collection/P06/current... \n",
"11 Celsius http://vocab.nerc.ac.uk/collection/P06/current... \n",
"6 Percent/100m^2 \n",
"4 \n",
"4 Percent/100m^2 \n",
"20 \n",
"2 Percent/100m^2 \n",
"2 Celsius http://vocab.nerc.ac.uk/collection/P06/current... \n",
"\n",
" measurementAccuracy measurementDeterminedDate measurementMethod \n",
"6 3 2004-07-16 \n",
"18 2004-07-17 \n",
"4 3 2004-07-16 \n",
"11 3 2004-07-16 \n",
"6 5 2004-07-16 \n",
"4 2004-07-16 \n",
"4 5 2004-07-16 \n",
"20 2004-07-17 \n",
"2 5 2004-07-16 \n",
"2 3 2004-07-16 "
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# to_csv does not upload to a remote URL, so write to a local file;\n",
"# the published copy lives under:\n",
"# https://github.com/ioos/notebooks_demos/raw/master/notebooks/data/dwc/processed/\n",
"file = \"MadeUpData_mof.csv\"\n",
"\n",
"measurementorfact.to_csv(file, index=False, header=True, date_format=\"%Y-%m-%d\")\n",
"measurementorfact.sample(n=10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Author:** Mathew Biddle"
]
}
],
"metadata": {
"colab": {
"include_colab_link": true,
"name": "2020-12-08-DataToDwC.ipynb",
"provenance": []
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.4"
}
},
"nbformat": 4,
"nbformat_minor": 1
}