Using pyobis to query for known datasets#

Created: 2022-11-23

Updated: 2023-03-24

Author: Mathew Biddle

This notebook uses the pyobis Python package to query the OBIS API for datasets associated with projects funded under the United States Marine Biodiversity Observation Network. The notebook walks through the process for querying the OBIS api for a specific institution, then using the resultant datasets to gather the locations of all the occurrences using the pyobis package.

image.png

The US Marine Biodiversity Observation Network (US MBON) is an interagency initiative that seeks to coordinate across sectors and government to characterize biodiversity and understand drivers of change. US MBON represents a broad, collaborative effort to address the need for systematic collection and sharing of marine life information, ensure that information is available for decision-making and management from local to national levels, and document marine biodiversity status and trends in the face of human- and climate-induced change using a range of technologies and approaches. Through the National Oceanographic Partnership Program, NOAA, NASA, Office of Naval Research, and BOEM have invested in US MBON since 2014, most recently announcing new five year projects in 2022.

import pyobis

pyobis.__version__
'1.3.1'

First off, let’s look through the available institutions at OBIS and find the US MBON one.

Hint: We know the name of the institution is United States Marine Biodiversity Observation Network.

import pandas as pd

pd.set_option("max_colwidth", 400)

url = "https://api.obis.org/v3/institute"

institutes = pd.read_json(url)

df_inst = pd.DataFrame.from_records(institutes["results"])

df_inst.dropna(inplace=True)

df_inst["id"] = df_inst["id"].astype(int)

institution = df_inst.loc[
    df_inst["name"].str.contains(
        "United States Marine Biodiversity Observation", na=False
    )
]

institution
id name country records
35 23070 United States Marine Biodiversity Observation Network United States of America 2000024

Well that looks like the institution we’re after!

Using the id we can check it out on the OBIS website: https://obis.org/institute/23070

Yes, that does look like what we want. Now let’s use that id to query OBIS for all associated datasets.

institution_id = institution["id"].tolist()[0]

institution_id
23070

Collect metadata about all of the datasets#

Here we use the institution_id collected from OBIS to query the API for metadata on the datasets associated with that id.

First, we build the API call, then we bring it into a Pandas DataFrame and create additional fields for use later.

import pyobis

query = pyobis.dataset.search(instituteid=institution_id)

df = pd.DataFrame(query.execute())

df_meta = pd.DataFrame.from_records(df["results"])

# create a column for the human readable short name of the datasets
df_meta["short_name"] = df_meta["url"].str.split("=", expand=True)[1]

# create another column to group datasets by, which removes the year from the short names
df_meta["short_name_group"] = df_meta["short_name"].replace("\d", "", regex=True)

df_meta.head(5)
id url archive published created updated core extensions statistics extent ... intellectualrights feed institutes contacts nodes keywords downloads records short_name short_name_group
0 cfceb150-bbe2-4efb-8682-14cfc7167e7c https://www1.usgs.gov/obis-usa/ipt/resource?r=rcca_transects https://www1.usgs.gov/obis-usa/ipt/archive.do?r=rcca_transects 2023-01-18T20:38:20.000Z 2020-12-08T01:07:21.790Z 2023-01-20T11:13:43.819Z event [measurementorfact, occurrence] {'Event': 19375, 'absence': 487162, 'dropped': 0, 'Occurrence': 724871, 'DNADerivedData': 0, 'MeasurementOrFact': 255300} POLYGON((-124.294724 32.694233000000004,-124.294724 42.045155,-117.264999 42.045155,-117.264999 32.694233000000004,-124.294724 32.694233000000004)) ... To the extent possible under law, the publisher has waived all rights to these data and has dedicated them to the Public Domain (CC0 1.0) {'id': '8112d75b-7e78-4641-9e09-9a2cc7db7375', 'url': 'https://www1.usgs.gov/obis-usa/ipt/rss.do'} [{'name': 'U.S. Geological Survey HQ', 'oceanexpert_id': 12976, 'oceanexpert_parent_id': None}, {'name': 'United States Marine Biodiversity Observation Network', 'oceanexpert_id': 23070, 'oceanexpert_parent_id': None}, {'name': 'Central & Northern California Ocean Observing System', 'oceanexpert_id': 23204, 'oceanexpert_parent_id': None}, {'name': 'Reef Check Foundation', 'oceanexpert_id': 232... [{'role': None, 'type': 'creator', 'givenname': 'Jan', 'surname': 'Friewald', 'organization': 'Reef Check Foundation', 'position': 'Executive Director', 'email': 'jfreiwald@reefcheck.org', 'url': None, 'organization_oceanexpert_id': 23205, 'type_display': 'Creator'}, {'role': None, 'type': 'contact', 'givenname': 'Jan', 'surname': 'Friewald', 'organization': 'Reef Check Foundation', 'position'... [{'id': 'b7c47783-a020-4173-b390-7b57c4fa1426', 'name': 'OBIS USA'}] [{'keyword': 'Samplingevent', 'thesaurus': 'GBIF Dataset Type Vocabulary: http://rs.gbif.org/vocabulary/gbif/dataset_type_2015-07-10.xml'}] [{'year': 2023, 'downloads': 965, 'records': 156467824}, {'year': 2022, 'downloads': 4191, 'records': 426899064}, {'year': 2021, 'downloads': 13776, 'records': 497036713}, {'year': 2020, 'downloads': 747, 'records': 26199396}] 237709 rcca_transects rcca_transects
1 d50e0443-4d84-4bd9-a5ad-4d417a7607e2 https://www1.usgs.gov/obis-usa/ipt/resource?r=2009floridakeysrvc https://www1.usgs.gov/obis-usa/ipt/archive.do?r=2009floridakeysrvc 2023-01-18T17:14:48.000Z None 2023-03-10T23:18:25.452Z event [measurementorfact, occurrence] {'Event': 2542, 'absence': 353964, 'dropped': 0, 'Occurrence': 451581, 'DNADerivedData': 0, 'MeasurementOrFact': 905704} POLYGON((-81.9975333 24.4313,-81.9975333 25.7526,-80.08755 25.7526,-80.08755 24.4313,-81.9975333 24.4313)) ... To the extent possible under law, the publisher has waived all rights to these data and has dedicated them to the Public Domain (CC0 1.0) {'id': '8112d75b-7e78-4641-9e09-9a2cc7db7375', 'url': 'https://www1.usgs.gov/obis-usa/ipt/rss.do'} [{'name': 'NOAA, Southeast Fisheries Science Center, National Marine Fisheries Service Southeast Miami Laboratory', 'oceanexpert_id': 7534, 'oceanexpert_parent_id': 11379}, {'name': 'Rosenstiel School of Marine & Atmospheric Science, University of Miami', 'oceanexpert_id': 8124, 'oceanexpert_parent_id': None}, {'name': 'U.S. Geological Survey HQ', 'oceanexpert_id': 12976, 'oceanexpert_parent_i... [{'role': None, 'type': 'creator', 'givenname': 'Jerald', 'surname': 'Ault', 'organization': 'University of Miami, Rosenstiel School of Marine & Atmospheric Science', 'position': 'Program Lead', 'email': 'jault@rsmas.miami.edu', 'url': None, 'organization_oceanexpert_id': 8124, 'type_display': 'Creator'}, {'role': None, 'type': 'creator', 'givenname': 'Jim', 'surname': 'Bohnsack', 'organiz... [{'id': 'b7c47783-a020-4173-b390-7b57c4fa1426', 'name': 'OBIS USA'}] [{'keyword': 'Samplingevent', 'thesaurus': 'GBIF Dataset Type Vocabulary: http://rs.gbif.org/vocabulary/gbif/dataset_type.xml'}, {'keyword': 'Samplingevent', 'thesaurus': 'GBIF Dataset Type Vocabulary: http://rs.gbif.org/vocabulary/gbif/dataset_type_2015-07-10.xml'}] [{'year': 2023, 'downloads': 2245, 'records': 64178093}, {'year': 2022, 'downloads': 7478, 'records': 175113790}, {'year': 2021, 'downloads': 17549, 'records': 204507663}, {'year': 2020, 'downloads': 3443, 'records': 68153160}, {'year': 2019, 'downloads': 293, 'records': 8867930}, {'year': 2018, 'downloads': 1, 'records': 97617}] 97617 2009floridakeysrvc floridakeysrvc
2 b4ac3d44-b4bf-49ef-a6e9-e2c02adff2fa https://www1.usgs.gov/obis-usa/ipt/resource?r=drytortugasreefvisualcensus2012 https://www1.usgs.gov/obis-usa/ipt/archive.do?r=drytortugasreefvisualcensus2012 2023-01-18T18:00:05.000Z None 2023-03-11T20:13:45.168Z event [measurementorfact, occurrence] {'Event': 2280, 'absence': 297153, 'dropped': 0, 'Occurrence': 384195, 'DNADerivedData': 0, 'MeasurementOrFact': 770670} POLYGON((-83.1035167 24.5519333,-83.1035167 24.7284333,-82.7729667 24.7284333,-82.7729667 24.5519333,-83.1035167 24.5519333)) ... To the extent possible under law, the publisher has waived all rights to these data and has dedicated them to the Public Domain (CC0 1.0) {'id': '8112d75b-7e78-4641-9e09-9a2cc7db7375', 'url': 'https://www1.usgs.gov/obis-usa/ipt/rss.do'} [{'name': 'NOAA, Southeast Fisheries Science Center, National Marine Fisheries Service Southeast Miami Laboratory', 'oceanexpert_id': 7534, 'oceanexpert_parent_id': 11379}, {'name': 'Rosenstiel School of Marine & Atmospheric Science, University of Miami', 'oceanexpert_id': 8124, 'oceanexpert_parent_id': None}, {'name': 'Gulf of Mexico Coastal Ocean Observing System', 'oceanexpert_id': 18936, '... [{'role': None, 'type': 'creator', 'givenname': 'Jerald', 'surname': 'Ault', 'organization': 'University of Miami, Rosenstiel School of Marine & Atmospheric Science', 'position': 'Project Lead', 'email': 'jault@rsmas.miami.edu', 'url': None, 'organization_oceanexpert_id': 8124, 'type_display': 'Creator'}, {'role': None, 'type': 'creator', 'givenname': 'Jim', 'surname': 'Bohnsack', 'organiz... [{'id': 'b7c47783-a020-4173-b390-7b57c4fa1426', 'name': 'OBIS USA'}] [{'keyword': 'Occurrence', 'thesaurus': 'GBIF Dataset Type Vocabulary: http://rs.gbif.org/vocabulary/gbif/dataset_type.xml'}, {'keyword': 'Observation', 'thesaurus': 'GBIF Dataset Subtype Vocabulary: http://rs.gbif.org/vocabulary/gbif/dataset_subtype.xml'}, {'keyword': 'Samplingevent', 'thesaurus': 'GBIF Dataset Type Vocabulary: http://rs.gbif.org/vocabulary/gbif/dataset_type_2015-07-10.xml'}] [{'year': 2023, 'downloads': 2246, 'records': 57273171}, {'year': 2022, 'downloads': 7273, 'records': 158822384}, {'year': 2021, 'downloads': 17801, 'records': 178906115}, {'year': 2020, 'downloads': 3353, 'records': 59225761}, {'year': 2019, 'downloads': 299, 'records': 7648000}, {'year': 2018, 'downloads': 1, 'records': 87042}] 87042 drytortugasreefvisualcensus2012 drytortugasreefvisualcensus
3 03b75eb2-4672-42de-8211-f7c1daa39648 https://www1.usgs.gov/obis-usa/ipt/resource?r=floridakeysreefvisualcensus2018 https://www1.usgs.gov/obis-usa/ipt/archive.do?r=floridakeysreefvisualcensus2018 2023-01-25T19:09:26.000Z 2023-01-23T22:41:07.550Z 2023-03-11T17:44:46.161Z event [measurementorfact, occurrence] {'Event': 2254, 'absence': 328345, 'dropped': 0, 'Occurrence': 413933, 'DNADerivedData': 0, 'MeasurementOrFact': 416187} POLYGON((-82.00503 24.43293,-82.00503 25.74674,-80.0887 25.74674,-80.0887 24.43293,-82.00503 24.43293)) ... To the extent possible under law, the publisher has waived all rights to these data and has dedicated them to the Public Domain (CC0 1.0) {'id': '8112d75b-7e78-4641-9e09-9a2cc7db7375', 'url': 'https://www1.usgs.gov/obis-usa/ipt/rss.do'} [{'name': 'NOAA, Southeast Fisheries Science Center, National Marine Fisheries Service Southeast Miami Laboratory', 'oceanexpert_id': 7534, 'oceanexpert_parent_id': 11379}, {'name': 'Rosenstiel School of Marine & Atmospheric Science, University of Miami', 'oceanexpert_id': 8124, 'oceanexpert_parent_id': None}, {'name': 'National Oceanic and Atmospheric Administration, Washington', 'oceanexpert... [{'role': None, 'type': 'creator', 'givenname': 'Jerald', 'surname': 'Ault', 'organization': 'University of Miami, Rosenstiel School of Marine & Atmospheric Science', 'position': 'Project Lead', 'email': 'jault@rsmas.miami.edu', 'url': 'https://people.miami.edu/profile/1e434959ea929dc1c9b4145a8e0afae4', 'organization_oceanexpert_id': 8124, 'type_display': 'Creator'}, {'role': None, 'type':... [{'id': 'b7c47783-a020-4173-b390-7b57c4fa1426', 'name': 'OBIS USA'}] [{'keyword': 'Samplingevent', 'thesaurus': 'GBIF Dataset Type Vocabulary: http://rs.gbif.org/vocabulary/gbif/dataset_type.xml'}, {'keyword': 'Samplingevent', 'thesaurus': 'GBIF Dataset Type Vocabulary: http://rs.gbif.org/vocabulary/gbif/dataset_type_2015-07-10.xml'}, {'keyword': 'CORAL REEF', 'thesaurus': 'Global Change Master Directory (GCMD) Keywords'}, {'keyword': 'FISH', 'thesaurus': 'Glob... [{'year': 2023, 'downloads': 2031, 'records': 44010493}] 85588 floridakeysreefvisualcensus2018 floridakeysreefvisualcensus
4 bcd1a4f5-c99b-404d-b01a-47e8a5d80e2b https://www1.usgs.gov/obis-usa/ipt/resource?r=floridakeysreefvisualcensus2014 https://www1.usgs.gov/obis-usa/ipt/archive.do?r=floridakeysreefvisualcensus2014 2023-01-18T20:35:14.000Z None 2023-03-11T15:44:36.023Z event [measurementorfact, occurrence] {'Event': 2187, 'absence': 204067, 'dropped': 0, 'Occurrence': 287870, 'DNADerivedData': 0, 'MeasurementOrFact': 577927} POLYGON((-81.98934 24.43184,-81.98934 25.7456,-80.08762 25.7456,-80.08762 24.43184,-81.98934 24.43184)) ... To the extent possible under law, the publisher has waived all rights to these data and has dedicated them to the Public Domain (CC0 1.0) {'id': '8112d75b-7e78-4641-9e09-9a2cc7db7375', 'url': 'https://www1.usgs.gov/obis-usa/ipt/rss.do'} [{'name': 'NOAA, Southeast Fisheries Science Center, National Marine Fisheries Service Southeast Miami Laboratory', 'oceanexpert_id': 7534, 'oceanexpert_parent_id': 11379}, {'name': 'Rosenstiel School of Marine & Atmospheric Science, University of Miami', 'oceanexpert_id': 8124, 'oceanexpert_parent_id': None}, {'name': 'Gulf of Mexico Coastal Ocean Observing System', 'oceanexpert_id': 18936, '... [{'role': None, 'type': 'creator', 'givenname': 'Jerald', 'surname': 'Ault', 'organization': 'University of Miami, Rosenstiel School of Marine & Atmospheric Science', 'position': 'Project Lead', 'email': 'jault@rsmas.miami.edu', 'url': None, 'organization_oceanexpert_id': 8124, 'type_display': 'Creator'}, {'role': None, 'type': 'creator', 'givenname': 'Jim', 'surname': 'Bohnsack', 'organiz... [{'id': 'b7c47783-a020-4173-b390-7b57c4fa1426', 'name': 'OBIS USA'}] [{'keyword': 'Occurrence', 'thesaurus': 'GBIF Dataset Type Vocabulary: http://rs.gbif.org/vocabulary/gbif/dataset_type.xml'}, {'keyword': 'Observation', 'thesaurus': 'GBIF Dataset Subtype Vocabulary: http://rs.gbif.org/vocabulary/gbif/dataset_subtype.xml'}, {'keyword': 'Samplingevent', 'thesaurus': 'GBIF Dataset Type Vocabulary: http://rs.gbif.org/vocabulary/gbif/dataset_type_2015-07-10.xml'}] [{'year': 2023, 'downloads': 2182, 'records': 55190511}, {'year': 2022, 'downloads': 7100, 'records': 151781210}, {'year': 2021, 'downloads': 17310, 'records': 176120712}, {'year': 2020, 'downloads': 3464, 'records': 58588841}, {'year': 2019, 'downloads': 343, 'records': 7715800}, {'year': 2018, 'downloads': 1, 'records': 83800}] 83803 floridakeysreefvisualcensus2014 floridakeysreefvisualcensus

5 rows × 24 columns

df_meta[["short_name", "short_name_group"]].describe()
short_name short_name_group
count 45 45
unique 45 20
top rcca_transects floridakeysreefvisualcensus
freq 1 18

Read to geopandas and plot the bounding boxes#

Next we want to make a simple map of the coverage each dataset has. We can use the extent field directly in geopandas to make a simple map.

from datetime import datetime

import geopandas
import matplotlib.pyplot as plt

gdf = geopandas.GeoDataFrame(df_meta)

gdf["geometry"] = geopandas.GeoSeries.from_wkt(gdf["extent"])

gdf.set_crs("epsg:4326", inplace=True)

fig, ax = plt.subplots(
    figsize=(20, 15),
)

gdf.plot(
    ax=ax,
    zorder=10,
    alpha=0.5,
    column="short_name_group",
    legend=True,
    cmap="turbo",
)

leg = ax.get_legend()

leg.set_bbox_to_anchor((0.0, 0.0, 1.2, 0.9))

world = geopandas.read_file(geopandas.datasets.get_path("naturalearth_lowres"))

world = world.to_crs(gdf.crs)

# bound map to US.
world.loc[world["iso_a3"] == "USA"].plot(
    ax=ax, color="lightgrey", edgecolor="black", zorder=1
)

today = datetime.now().strftime("%Y-%m-%d")

ax.set_title(
    f"Sampling coverage map\nTotal dataset records contributed to OBIS from the US MBON network as of {today}: {gdf.records.sum() }"
)

ax.set_axis_off();
../../../_images/6050a43b836ce6f1962418f974d2dd06ace4f4a57c2c5e6f677a9732714374c5.png
gdf.explore(column="short_name")
Make this Notebook Trusted to load map: File -> Trust Notebook

Save as a geojson#

Since we have the data, why don’t we write this out to a geoJson file to interact with on GitHub.

fname = f"US_MBON_bounding_boxes_{today.replace('-','')}.geojson"

# can"t export fields with lists or dictionaries - breaks the format.
gdf[["id", "title", "url", "geometry"]].to_file(fname, driver="GeoJSON")

Query the OBIS aggregated points#

Now that we know the general area for where our observations are made, let’s dig a little deeper into each dataset.

We can use pyobis.occurrences.getpoints() to return all of the points for each dataset, aggregated to Geohash precision 8.

If we’d like a little more metadata about each of the aggregated points, we can query by dataset to collect the aggregated points and some useful metadata.

combined = pd.DataFrame()

query = pyobis.dataset.search(instituteid=institution_id)

df = pd.DataFrame(query.execute())

df_meta = pd.DataFrame.from_records(df["results"])

for datasetid in df_meta["id"]:
    dset = pyobis.occurrences.getpoints(datasetid=datasetid).execute()

    meta = pyobis.dataset.get(id=datasetid).execute()["results"][0]
    short_name = meta["url"].split("=")[-1]

    df = pd.DataFrame(dset)

    df["dataset_id"] = datasetid
    df["short_name"] = meta["url"].split("=")[-1]
    df["short_name_group"] = df["short_name"].replace("\d", "", regex=True)
    df["url"] = meta["url"]
    df["metadata"] = str(meta)

    df[["decimalLongitude", "decimalLatitude"]] = pd.DataFrame(
        df["coordinates"].tolist()
    )

    combined = pd.concat([combined, df], ignore_index=True)


gdf_ghsh = geopandas.GeoDataFrame(
    combined[
        ["dataset_id", "url", "short_name_group", "decimalLatitude", "decimalLongitude"]
    ],
    geometry=geopandas.points_from_xy(
        combined.decimalLongitude, combined.decimalLatitude
    ),
    crs="epsg:4326",
)


fig, ax = plt.subplots(figsize=(20, 15))


gdf_ghsh.plot(
    ax=ax,
    zorder=10,
    alpha=0.5,
    column="short_name_group",
    legend=True,
)

world = geopandas.read_file(geopandas.datasets.get_path("naturalearth_lowres"))

world = world.to_crs(gdf.crs)

world.loc[world["iso_a3"] == "USA"].plot(
    ax=ax, color="lightgrey", edgecolor="white", zorder=1
)

now = datetime.now()

today = now.strftime("%Y-%m-%d")

ax.set_title(
    f"Total aggregated occurrence records contributed to OBIS from the US MBON network as of {today}: {gdf_ghsh.shape[0]}"
)

ax.set_axis_off();
../../../_images/9893d67f862476be972180e1c3bb6773e0dff4aaf25c8ad53956f92c454926cb.png

Let’s explore those points a little more with geopandas.GeoDataFrame.explore().#

This allows you to create an interactive map based on folium/leaflet.jsInteractive map based on GeoPandas and folium/leaflet.js

gdf_ghsh.explore(column="short_name_group")
Make this Notebook Trusted to load map: File -> Trust Notebook