Programmatically query the IOOS Data Catalog for a specific observation type

Created: 2024-09-17

Updated: 2025-03-06

This notebook shows how to search the IOOS Data Catalog for a specific subset of observations using CKAN’s web-accessible Application Programming Interface (API).

For this example, we look for observations of oxygen in the water column across the IOOS Data Catalog. Under the IOOS Metadata Profile, which the US IOOS community uses to publish datasets, each Regional Association and Data Assembly Center (DAC) follows the Climate and Forecast (CF) Conventions and uses CF standard_names to describe its datasets. With that assumption, we can search the IOOS Data Catalog for datasets whose CF standard names contain both oxygen and sea_water. Then, we can build a simple map to show the geographical distribution of those datasets.

Build the CKAN API query base

This uses the ckanapi Python package (ckan/ckanapi).

from ckanapi import RemoteCKAN

ioos_catalog = RemoteCKAN(
    address="https://data.ioos.us",
    user_agent="ckanapiioos/1.0 (+https://ioos.us/)",
)


ioos_catalog

<ckanapi.remoteckan.RemoteCKAN at 0x7aa69aba7e00>

What organizations are in the catalog?

List the organizations that publish to the catalog.

orgs = ioos_catalog.action.organization_list()
print(orgs)

['aoos', 'caricoos', 'cdip', 'cencoos', 'comt', 'gcoos', 'glider-dac', 'glos', 'hf-radar-dac', 'ioos', 'maracoos', 'nanoos', 'neracoos', 'noaa-co-ops', 'noaa-ndbc', 'oceansites', 'pacioos', 'sccoos', 'secoora', 'unidata', 'usgs', 'us-navy']

How many datasets are we searching across?

Run an unfiltered package_search and read the total number of datasets from its count field.

datasets = ioos_catalog.action.package_search()
datasets["count"]
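
When only the total is needed, CKAN’s package_search accepts rows=0, which returns the count without any dataset records. The sketch below assumes a catalog object that behaves like the RemoteCKAN instance above. The helper name count_datasets is hypothetical, and the stub stands in for the server (with made-up numbers) so the example runs offline.

```python
from types import SimpleNamespace


def count_datasets(catalog, fq=None):
    """Return the number of datasets matching an optional filter query."""
    params = {"rows": 0}  # rows=0: ask for the count only, no records
    if fq is not None:
        params["fq"] = fq
    return catalog.action.package_search(**params)["count"]


# Offline stand-in for RemoteCKAN (hypothetical counts, not real catalog numbers).
def _fake_package_search(rows=10, fq=None, start=0):
    total = 3 if fq else 42
    return {"count": total, "results": []}


fake_catalog = SimpleNamespace(
    action=SimpleNamespace(package_search=_fake_package_search)
)

print(count_datasets(fake_catalog))
print(count_datasets(fake_catalog, fq="+cf_standard_names:sea_water_temperature"))
```

With the real catalog, the same helper would be called as count_datasets(ioos_catalog).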

Grab the most recent applicable CF standard names

Collect CF standard names that contain oxygen and sea_water from the CF standard name list.

import pandas as pd

url = "https://cfconventions.org/Data/cf-standard-names/current/src/cf-standard-name-table.xml"

tbl_version = pd.read_xml(url, xpath="./*")["version_number"][0].astype(int)
df = pd.read_xml(url, xpath="entry")

std_names = df.loc[
    (df["id"].str.contains("oxygen") & df["id"].str.contains("sea_water"))
]

print(f"CF Standard Name Table: {tbl_version}")

std_names[["id", "description"]]

CF Standard Name Table: 90

	id	description
471	depth_at_shallowest_local_minimum_in_vertical_...	Depth is the vertical distance below the surfa...
627	fractional_saturation_of_oxygen_in_sea_water	Fractional saturation is the ratio of some mea...
1368	mass_concentration_of_oxygen_in_sea_water	Mass concentration means mass per unit volume ...
1737	mole_concentration_of_dissolved_molecular_oxyg...	Mole concentration means number of moles per u...
1738	mole_concentration_of_dissolved_molecular_oxyg...	"Mole concentration at saturation" means the m...
1739	mole_concentration_of_dissolved_molecular_oxyg...	Mole concentration means number of moles per u...
1838	mole_concentration_of_preformed_dissolved_mole...	"Mole concentration" means the number of moles...
2009	moles_of_oxygen_per_unit_mass_in_sea_water	moles_of_X_per_unit_mass_inY is also called "m...
3225	surface_molecular_oxygen_partial_pressure_diff...	The surface called "surface" means the lower b...
3725	temperature_of_sensor_for_oxygen_in_sea_water	Temperature_of_sensor_for_oxygen_in_sea_water ...
4825	volume_fraction_of_oxygen_in_sea_water	"Volume fraction" is used in the construction ...
4830	volume_mixing_ratio_of_oxygen_at_stp_in_sea_water	"ratio_of_X_to_Y" means X/Y. "stp" means stand...
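
For readers who want to see what the filter does without pandas, here is a minimal sketch using only the standard library. It assumes the table is laid out as entry elements with an id attribute, as the xpath="entry" call above suggests. SAMPLE_XML is a hand-written excerpt, not the real table, and the entry oxygen_only_example is made up to show a non-match.

```python
import xml.etree.ElementTree as ET

# Hand-written excerpt mimicking the CF standard name table layout (assumption).
SAMPLE_XML = """\
<standard_name_table>
  <version_number>90</version_number>
  <entry id="mass_concentration_of_oxygen_in_sea_water">
    <description>Mass concentration means mass per unit volume ...</description>
  </entry>
  <entry id="sea_water_temperature">
    <description>Placeholder description.</description>
  </entry>
  <entry id="oxygen_only_example">
    <description>Made-up entry that lacks "sea_water".</description>
  </entry>
</standard_name_table>
"""


def matching_names(xml_text, terms):
    """Return (table version, ids of entries containing every term)."""
    root = ET.fromstring(xml_text)
    version = int(root.findtext("version_number"))
    names = [
        entry.get("id")
        for entry in root.iter("entry")
        if all(term in entry.get("id") for term in terms)
    ]
    return version, names


version, names = matching_names(SAMPLE_XML, ["oxygen", "sea_water"])
print(version, names)
```

Only the entry whose id contains both terms survives, which mirrors the two str.contains conditions combined with & above.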

Search across IOOS Data Catalog using CKAN API

Search the IOOS Data Catalog for CF standard names that match those above.

import time

from ckanapi import RemoteCKAN
from ckanapi.errors import CKANAPIError
from requests.exceptions import ChunkedEncodingError
from urllib3.exceptions import IncompleteRead

ua = "ckanapiioos/1.0 (+https://ioos.us/)"

ioos_catalog = RemoteCKAN("https://data.ioos.us", user_agent=ua)
ioos_catalog

df_out = pd.DataFrame()

for std_name in std_names["id"]:

    print(std_name)

    fq = f"+cf_standard_names:{std_name}"

    result_count = 0

    df_std_name = pd.DataFrame()

    while True:
        try:
            datasets = ioos_catalog.action.package_search(
                fq=fq, rows=500, start=result_count
            )
        except (CKANAPIError, IncompleteRead, ChunkedEncodingError):
            # Transient server/network error: pause briefly, then retry the same page.
            time.sleep(1)
            continue

        num_results = datasets["count"]

        print(f"num_results: {num_results}, result_count: {result_count}")

        for dataset in datasets["results"]:
            df = pd.DataFrame(
                {
                    "title": [dataset["title"]],
                    "url": [dataset["resources"][0]["url"]],
                    "org": [dataset["organization"]["title"]],
                    "std_name": std_name,
                }
            )

            df_std_name = pd.concat([df_std_name, df], ignore_index=True)
            result_count = df_std_name.shape[0]

        time.sleep(1)

        if result_count >= num_results:
            print(f"num_results: {num_results}, result_count: {result_count}")
            break

    df_out = pd.concat([df_out, df_std_name], ignore_index=True)

    print(
        f"num_results: {num_results}, result_count: {result_count}, total_result_count: {df_out.shape[0]}"
    )

df_out.shape

depth_at_shallowest_local_minimum_in_vertical_profile_of_mole_concentration_of_dissolved_molecular_oxygen_in_sea_water
num_results: 0, result_count: 0
num_results: 0, result_count: 0
num_results: 0, result_count: 0, total_result_count: 0
fractional_saturation_of_oxygen_in_sea_water
num_results: 5110, result_count: 0
num_results: 5110, result_count: 500
num_results: 5110, result_count: 1000
num_results: 5110, result_count: 1500
num_results: 5110, result_count: 2000
num_results: 5110, result_count: 2500
num_results: 5110, result_count: 3000
num_results: 5110, result_count: 3500
num_results: 5110, result_count: 4000
num_results: 5110, result_count: 4500
num_results: 5110, result_count: 5000
num_results: 5110, result_count: 5110
num_results: 5110, result_count: 5110, total_result_count: 5110
mass_concentration_of_oxygen_in_sea_water
num_results: 4107, result_count: 0
num_results: 4107, result_count: 500
num_results: 4107, result_count: 1000
num_results: 4107, result_count: 1500
num_results: 4107, result_count: 2000
num_results: 4107, result_count: 2500
num_results: 4107, result_count: 3000
num_results: 4107, result_count: 3500
num_results: 4107, result_count: 4000
num_results: 4107, result_count: 4107
num_results: 4107, result_count: 4107, total_result_count: 9217
mole_concentration_of_dissolved_molecular_oxygen_in_sea_water
num_results: 352, result_count: 0
num_results: 352, result_count: 352
num_results: 352, result_count: 352, total_result_count: 9569
mole_concentration_of_dissolved_molecular_oxygen_in_sea_water_at_saturation
num_results: 0, result_count: 0
num_results: 0, result_count: 0
num_results: 0, result_count: 0, total_result_count: 9569
mole_concentration_of_dissolved_molecular_oxygen_in_sea_water_at_shallowest_local_minimum_in_vertical_profile
num_results: 0, result_count: 0
num_results: 0, result_count: 0
num_results: 0, result_count: 0, total_result_count: 9569
mole_concentration_of_preformed_dissolved_molecular_oxygen_in_sea_water
num_results: 0, result_count: 0
num_results: 0, result_count: 0
num_results: 0, result_count: 0, total_result_count: 9569
moles_of_oxygen_per_unit_mass_in_sea_water
num_results: 796, result_count: 0
num_results: 796, result_count: 500
num_results: 796, result_count: 796
num_results: 796, result_count: 796, total_result_count: 10365
surface_molecular_oxygen_partial_pressure_difference_between_sea_water_and_air
num_results: 0, result_count: 0
num_results: 0, result_count: 0
num_results: 0, result_count: 0, total_result_count: 10365
temperature_of_sensor_for_oxygen_in_sea_water
num_results: 213, result_count: 0
num_results: 213, result_count: 213
num_results: 213, result_count: 213, total_result_count: 10578
volume_fraction_of_oxygen_in_sea_water
num_results: 2183, result_count: 0
num_results: 2183, result_count: 500
num_results: 2183, result_count: 1000
num_results: 2183, result_count: 1500
num_results: 2183, result_count: 2000
num_results: 2183, result_count: 2183
num_results: 2183, result_count: 2183, total_result_count: 12761
volume_mixing_ratio_of_oxygen_at_stp_in_sea_water
num_results: 0, result_count: 0
num_results: 0, result_count: 0
num_results: 0, result_count: 0, total_result_count: 12761

(12761, 4)
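
The paging pattern used above (request rows records starting at start, then advance until count is reached) can also be written as a generator with a bounded number of retries, so a persistent server error stops the loop instead of repeating forever. This is a sketch under stated assumptions: iter_packages and FlakySearch are hypothetical names, and the stub simulates the catalog, including one dropped connection, so the example runs offline.

```python
import time
from types import SimpleNamespace


def iter_packages(catalog, fq, rows=500, max_retries=3, pause=1.0):
    """Yield every dataset matching fq, fetching `rows` records per request."""
    start = 0
    while True:
        for attempt in range(1, max_retries + 1):
            try:
                page = catalog.action.package_search(fq=fq, rows=rows, start=start)
                break
            except Exception:  # narrow this to the errors you expect in real use
                if attempt == max_retries:
                    raise  # give up instead of looping forever
                time.sleep(pause)
        results = page["results"]
        yield from results
        start += len(results)
        if not results or start >= page["count"]:
            return
        time.sleep(pause)  # short pause between pages


class FlakySearch:
    """Offline stand-in for catalog.action that fails on its second call."""

    def __init__(self, total):
        self.records = [{"title": f"dataset-{i}"} for i in range(total)]
        self.calls = 0

    def package_search(self, fq, rows, start):
        self.calls += 1
        if self.calls == 2:
            raise ConnectionError("simulated dropped connection")
        return {"count": len(self.records), "results": self.records[start:start + rows]}


search = FlakySearch(total=1200)
fake_catalog = SimpleNamespace(action=search)
fq = "+cf_standard_names:mass_concentration_of_oxygen_in_sea_water"
titles = [d["title"] for d in iter_packages(fake_catalog, fq, pause=0)]
print(len(titles), search.calls)
```

Collecting the yielded records in a list and building a single DataFrame at the end also avoids calling pd.concat once per dataset.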

Do some summarizing of the responses

The DataFrame of matching datasets is quite large. How are those datasets distributed across organizations? Let’s use the DataFrame’s groupby() method to count how many datasets matching our criteria each organization provides.

df_out.groupby(by="org").count()

	title	url	std_name
org
AOOS	28	28	28
CeNCOOS	416	416	416
GCOOS	9617	9617	9617
Glider DAC	2233	2233	2233
MARACOOS	221	221	221
NANOOS	4	4	4
NERACOOS	48	48	48
PacIOOS	16	16	16
SCCOOS	1	1	1
SECOORA	177	177	177

Drop the Glider DAC data

Glider DAC data are already being sent to NCEI, so we can drop those entries.

df_out_no_glider = df_out.loc[~df_out["org"].str.contains("Glider DAC")]
df_out_no_glider.groupby(by="org").count()

	title	url	std_name
org
AOOS	28	28	28
CeNCOOS	416	416	416
GCOOS	9617	9617	9617
MARACOOS	221	221	221
NANOOS	4	4	4
NERACOOS	48	48	48
PacIOOS	16	16	16
SCCOOS	1	1	1
SECOORA	177	177	177

Digging into some of the nuances

There are still quite a lot of datasets from each organization. Because the search above looked for each CF standard_name separately, a dataset with several matching CF standard names appears once per match. For example, one dataset might have both mass_concentration_of_oxygen_in_sea_water and fractional_saturation_of_oxygen_in_sea_water, but we only need to count it as one dataset.

As we only need to know about the unique datasets, let’s count how many unique dataset urls we have.

df_out_no_glider.groupby(by="url").count()

	title	org	std_name
url
http://www.humboldt.edu	4	4	4
http://www.neracoos.org/erddap/tabledap/A01_aanderaa_o2_all	1	1	1
http://www.neracoos.org/erddap/tabledap/A01_optode_all	2	2	2
http://www.neracoos.org/erddap/tabledap/A01_sbe16_disox_all	2	2	2
http://www.neracoos.org/erddap/tabledap/C02_sbe16_disox_all	2	2	2
...	...	...	...
https://pae-paha.pacioos.hawaii.edu/erddap/tabledap/nss_012	2	2	2
https://pae-paha.pacioos.hawaii.edu/erddap/tabledap/nss_013	2	2	2
https://pae-paha.pacioos.hawaii.edu/erddap/tabledap/wqb_04	2	2	2
https://pae-paha.pacioos.hawaii.edu/erddap/tabledap/wqb_05	2	2	2
https://www.nodc.noaa.gov/ocads/oceans/time_series_moorings.html	4	4	4

7654 rows × 3 columns

Drop duplicate records

As shown above, many dataset URLs appear more than once. The URL should be unique for each dataset, so we use it to identify and drop the duplicates. With keep="last", each remaining row carries only the last matching standard name found for that URL.

df_out_nodups_no_glider = df_out_no_glider.drop_duplicates(subset=["url"], keep="last")

df_out_nodups_no_glider

	title	url	org	std_name
855	Great Bay,NH. Oyster River WQ station	http://www.neracoos.org/erddap/tabledap/GRBORW...	NERACOOS	fractional_saturation_of_oxygen_in_sea_water
857	Great Bay,NH. Squamscott River WQ station	http://www.neracoos.org/erddap/tabledap/GRBSQW...	NERACOOS	fractional_saturation_of_oxygen_in_sea_water
868	Great Bay,NH. Great Bay WQ station	http://www.neracoos.org/erddap/tabledap/GRBGBW...	NERACOOS	fractional_saturation_of_oxygen_in_sea_water
870	Great Bay,NH. Lamprey River WQ station	http://www.neracoos.org/erddap/tabledap/GRBLRW...	NERACOOS	fractional_saturation_of_oxygen_in_sea_water
891	Monterey Bay Aquarium Seawater Intake	https://erddap.cencoos.org/erddap/tabledap/mon...	CeNCOOS	fractional_saturation_of_oxygen_in_sea_water
...	...	...	...	...
12756	Walton-Smith CTD, WS1102, WS1102_028, 2011-02-...	https://gcoos5.geos.tamu.edu/erddap/tabledap/W...	GCOOS	volume_fraction_of_oxygen_in_sea_water
12757	Walton-Smith CTD, WS0718, WS0718_WS0718_056, 2...	https://gcoos5.geos.tamu.edu/erddap/tabledap/W...	GCOOS	volume_fraction_of_oxygen_in_sea_water
12758	Walton-Smith CTD, WS0802, WS0802_WS0802_Wet_40...	https://gcoos5.geos.tamu.edu/erddap/tabledap/W...	GCOOS	volume_fraction_of_oxygen_in_sea_water
12759	Walton-Smith CTD, WS20342, WS20342_WS20342_stn...	https://gcoos5.geos.tamu.edu/erddap/tabledap/W...	GCOOS	volume_fraction_of_oxygen_in_sea_water
12760	Walton-Smith CTD, WS22141, WS22141_WS22141_Stn...	https://gcoos5.geos.tamu.edu/erddap/tabledap/W...	GCOOS	volume_fraction_of_oxygen_in_sea_water

7654 rows × 4 columns
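
Dropping duplicates keeps one row per URL but discards the other standard names that matched the same dataset. If that information matters, the rows can be merged instead. The sketch below shows the idea in plain Python on hand-made records; the URLs, titles, and the helper name collapse_by_url are hypothetical.

```python
def collapse_by_url(rows):
    """Merge rows that share a URL, keeping every matching standard name."""
    merged = {}
    for row in rows:
        entry = merged.setdefault(
            row["url"],
            {"title": row["title"], "org": row["org"], "std_names": []},
        )
        if row["std_name"] not in entry["std_names"]:
            entry["std_names"].append(row["std_name"])
    return merged


# Hand-made rows shaped like df_out (placeholder URLs and titles).
station_a = "https://example.org/erddap/tabledap/station_a"
station_b = "https://example.org/erddap/tabledap/station_b"
rows = [
    {"title": "Station A", "url": station_a, "org": "Example RA",
     "std_name": "fractional_saturation_of_oxygen_in_sea_water"},
    {"title": "Station B", "url": station_b, "org": "Example RA",
     "std_name": "fractional_saturation_of_oxygen_in_sea_water"},
    {"title": "Station A", "url": station_a, "org": "Example RA",
     "std_name": "mass_concentration_of_oxygen_in_sea_water"},
]

unique = collapse_by_url(rows)
print(len(unique))
print(unique[station_a]["std_names"])
```

In pandas, a grouped aggregation such as groupby("url")["std_name"].agg(list) should give a comparable result.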

How many endpoints are not ERDDAP?

Now we have a unique list of datasets which match our CF standard name criteria. Since we have some background in using ERDDAP to query for data, let’s take a look at what other endpoints each of the datasets are using.

Hint: We know ERDDAP systems typically have erddap in their urls.

df_out_nodups_no_glider.loc[~df_out_nodups_no_glider["url"].str.contains("erddap")]

	title	url	org	std_name
9096	CeNCOOS in situ water monitoring data at Trini...	http://www.humboldt.edu	CeNCOOS	mass_concentration_of_oxygen_in_sea_water
9176	MAPCO2 Buoy: Maihi Bay, Hawaii Island, Hawaii	https://www.nodc.noaa.gov/ocads/oceans/time_se...	PacIOOS	mass_concentration_of_oxygen_in_sea_water
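
A substring test on "erddap" is a quick heuristic. A slightly stricter check, sketched below, looks at the URL path for an erddap segment followed by a tabledap or griddap segment. The function name erddap_protocol is hypothetical, the griddap URL is a made-up example, and an ERDDAP server deployed under a different path would not be recognized.

```python
from urllib.parse import urlparse


def erddap_protocol(url):
    """Return 'tabledap' or 'griddap' for ERDDAP-looking URLs, else None."""
    parts = [part for part in urlparse(url).path.split("/") if part]
    if "erddap" not in parts:
        return None
    for protocol in ("tabledap", "griddap"):
        if protocol in parts[parts.index("erddap") + 1:]:
            return protocol
    return None


urls = [
    "http://www.neracoos.org/erddap/tabledap/A01_optode_all",
    "http://www.humboldt.edu",
    "https://www.nodc.noaa.gov/ocads/oceans/time_series_moorings.html",
    "https://example.org/erddap/griddap/some_grid",  # hypothetical URL
]
kinds = [erddap_protocol(url) for url in urls]
print(kinds)
```

A check like this could also be used to skip non-ERDDAP endpoints before requesting .csvp responses from them.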

What’s the remaining distribution?

This is the distribution of unique datasets found in the IOOS Data Catalog that have a CF standard name containing the words oxygen and sea_water. We’ve dropped the Glider DAC datasets because, in theory, those are already in NCEI.

df_out_nodups_no_glider.groupby(by="org").count()

	title	url	std_name
org
AOOS	13	13	13
CeNCOOS	217	217	217
GCOOS	7152	7152	7152
MARACOOS	137	137	137
NANOOS	2	2	2
NERACOOS	32	32	32
PacIOOS	7	7	7
SCCOOS	1	1	1
SECOORA	93	93	93

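Ingest data

Let’s rip through all of the datasets, request each one’s distinct positions as a table (the .csvp response includes the units in the column names), and collect the results in one big dictionary keyed by dataset title. This takes a while.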

import multiprocessing
from urllib.error import HTTPError

import joblib
import stamina
from tqdm import tqdm


@stamina.retry(on=HTTPError, attempts=3)
def request_df(url):
    """Thin layer to handle retries."""
    return pd.read_csv(url, low_memory=False)


def error_handling_layer(row):
    """Even with stamina we may hit servers that will fail."""
    title = row["title"]
    # Requesting only the position.
    url = f"{row['url']}.csvp?latitude,longitude&distinct()"
    try:
        df = request_df(url)
    except Exception as err:
        msg = f"Failed to fetch {url}. {err}."
        print(msg)
        df = None
    return title, df


n_iter = len(df_out_nodups_no_glider)
num_cores = multiprocessing.cpu_count()

downloads = [
    r
    for r in tqdm(
        joblib.Parallel(return_as="generator", n_jobs=num_cores, max_nbytes=5000)(
            joblib.delayed(error_handling_layer)(row)
            for _, row in df_out_nodups_no_glider.iterrows()
        ),
        total=n_iter,
    )
]


dict_out_final = dict(downloads)

  0%|                                                                                                                                      | 1/7654 [00:01<2:15:29,  1.06s/it]stamina.retry_scheduled
stamina.retry_scheduled
[... intermediate progress updates and repeated stamina.retry_scheduled messages omitted ...]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7654/7654 [12:19<00:00, 10.35it/s]

Let’s take a quick look at one of the DataFrames.

Transpose it when we print, so we can see all the columns.

dict_out_final['"Deepwater CTD - pe972218.ctd.nc - 29.25N, -87.89W - 1997-03-21"'].head(
    5
).T

	0
latitude (degrees_north)	29.247200
longitude (degrees_east)	-87.888901
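
The request made for each dataset is the tabledap URL plus a file type, a comma-separated variable list, and the distinct() filter. The .csvp file type returns each column name as "name (units)", as seen in the output above. The sketch below builds such a URL and splits the headers of a small sample response. The helper names position_query and split_header are hypothetical, and SAMPLE_CSVP is typed in by hand from the values shown above rather than fetched.

```python
import csv
import io
import re

HEADER = re.compile(r"^(?P<name>.+?)(?: \((?P<units>.*)\))?$")


def position_query(dataset_url, variables=("latitude", "longitude"), file_type=".csvp"):
    """Build a tabledap request for the distinct values of `variables`."""
    return f"{dataset_url.rstrip('/')}{file_type}?{','.join(variables)}&distinct()"


def split_header(column):
    """Split 'latitude (degrees_north)' into ('latitude', 'degrees_north')."""
    match = HEADER.match(column)
    return match["name"], match["units"]


url = position_query("https://pae-paha.pacioos.hawaii.edu/erddap/tabledap/wqb_04")
print(url)

# Hand-typed sample of a .csvp response (not fetched from a server).
SAMPLE_CSVP = "latitude (degrees_north),longitude (degrees_east)\n29.2472,-87.888901\n"

reader = csv.DictReader(io.StringIO(SAMPLE_CSVP))
units = dict(split_header(column) for column in reader.fieldnames)
rows = [
    {split_header(column)[0]: float(value) for column, value in row.items()}
    for row in reader
]
print(units)
print(rows)
```

Separating the name from the units makes it easier to combine tables from servers that report the same variable with different unit strings.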

Let’s make a nice map of the distribution of observations

Below we create a mapping function to plot the unique dataset points on a map. Then, we use that function with our full response. We have to do a little reorganizing of the data to build one DataFrame for all the coordinates.

import cartopy.io.shapereader as shpreader
import geopandas as gpd
import matplotlib.pyplot as plt


def make_map(df):
    fig, ax = plt.subplots(figsize=(8, 6))
    shpfilename = shpreader.natural_earth(
        resolution="110m",
        category="cultural",
        name="admin_0_countries",
    )
    countries = gpd.read_file(shpfilename)

    countries[countries["NAME"] == "United States of America"].plot(
        color="lightgrey", ax=ax
    )

    df.plot(
        x="longitude (degrees_east)",
        y="latitude (degrees_north)",
        kind="scatter",
        ax=ax,
    )

    ax.grid(visible=True, alpha=0.5)

    return ax

df_coords_clean = pd.concat(dict_out_final).dropna().drop_duplicates(ignore_index=True)

# Keep only rows whose coordinates fall inside the valid lat/lon ranges.
lat = df_coords_clean["latitude (degrees_north)"]
lon = df_coords_clean["longitude (degrees_east)"]
df_coords_clean = df_coords_clean.loc[lat.between(-90, 90) & lon.between(-180, 180)]
df_coords_clean
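
The range check applied above can be stated as a small predicate, which is handy for testing the rule on its own. This is a minimal sketch in plain Python; is_valid_position is a hypothetical helper and the sample points are made up.

```python
import math


def is_valid_position(lat, lon):
    """True when lat is within [-90, 90] and lon is within [-180, 180]."""
    if lat is None or lon is None:
        return False
    if math.isnan(lat) or math.isnan(lon):
        return False
    return -90 <= lat <= 90 and -180 <= lon <= 180


# Made-up points: one valid, one with a bad latitude, one with a fill value.
points = [(29.2472, -87.888901), (95.0, 10.0), (0.0, -999.0), (float("nan"), 0.0)]
valid = [point for point in points if is_valid_position(*point)]
print(valid)
```

Both bounds are inclusive, so points exactly at the poles or on the antimeridian are kept.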

make_map(df_coords_clean)

<Axes: xlabel='longitude (degrees_east)', ylabel='latitude (degrees_north)'>

[Figure: scatter plot of the unique dataset positions (longitude vs. latitude) drawn over an outline of the United States]

Let’s explore those points on an interactive map

Just for fun, we can use folium’s MarkerCluster to plot many points on an interactive map to browse around. We have more than 7,000 points. MarkerCluster is quite robust, but that many points can crash the browser, so let’s limit the map to the first 1,000.

locations = list(
    zip(
        df_coords_clean["latitude (degrees_north)"][0:1000],
        df_coords_clean["longitude (degrees_east)"][0:1000],
    )
)

import folium
from folium.plugins import MarkerCluster

# Base map; fit_bounds() below sets the final view.
m = folium.Map(zoom_start=5)

icon_create_function = """\
function(cluster) {
    return L.divIcon({
    html: '<b>' + cluster.getChildCount() + '</b>',
    className: 'marker-cluster marker-cluster-large',
    iconSize: new L.Point(20, 20)
    });
}"""


marker_cluster = MarkerCluster(
    locations=locations,
    overlay=True,
    control=True,
    icon_create_function=icon_create_function,
)

marker_cluster.add_to(m)

folium.LayerControl().add_to(m)


m.fit_bounds(m.get_bounds())
m
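
Taking the first 1,000 rows shows whichever positions happen to come first in the table, which may all sit in one region. One alternative, sketched below, is a seeded random sample so the subset is spread across the whole table and stays reproducible. The helper name sample_locations is hypothetical and the coordinates are synthetic.

```python
import random


def sample_locations(locations, k=1000, seed=0):
    """Return at most k locations, chosen reproducibly at random."""
    locations = list(locations)
    if len(locations) <= k:
        return locations
    return random.Random(seed).sample(locations, k)


# Synthetic (lat, lon) pairs standing in for the cleaned coordinates.
all_points = [(20 + i * 0.01, -100 + i * 0.01) for i in range(5000)]
subset = sample_locations(all_points, k=1000)
print(len(subset))
```

The returned list can be passed to MarkerCluster as its locations argument in place of the sliced columns.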


Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Alabama_CPUE_Trawl_study_DATA_1988.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Mississippi_CPUE_Trawl_study_DATA_1989.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Mississippi_CPUE_Trawl_study_DATA_1994.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Mississippi_CPUE_Trawl_study_DATA_1977.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Alabama_CPUE_Trawl_study_DATA_2005.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Mississippi_CPUE_Trawl_study_DATA_1983.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Alabama_CPUE_Trawl_study_DATA_2006.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Mississippi_CPUE_Trawl_study_DATA_1997.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Mississippi_CPUE_Trawl_study_DATA_1988.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Mississippi_CPUE_Trawl_study_DATA_2002.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Mississippi_CPUE_Trawl_study_DATA_1986.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Mississippi_CPUE_Trawl_study_DATA_2005.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Alabama_CPUE_Trawl_study_DATA_1992.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://erddap.aoos.org/erddap/tabledap/homer-dolphin-surface-water-q.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Florida_CPUE_Trawl_study_DATA_1998.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Florida_CPUE_Trawl_study_DATA_2000.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Alabama_CPUE_Trawl_study_DATA_1996.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Florida_CPUE_Trawl_study_DATA_2002.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Alabama_CPUE_Trawl_study_DATA_1985.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Mississippi_CPUE_Trawl_study_DATA_1996.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Alabama_CPUE_Trawl_study_DATA_1993.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://erddap.cencoos.org/erddap/tabledap/monterey-bay-aquarium-seawate.csvp?latitude,longitude&distinct(). HTTP Error 500: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Alabama_CPUE_Trawl_study_DATA_2000.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Alabama_CPUE_Trawl_study_DATA_1984.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Florida_CPUE_Trawl_study_DATA_2005.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Alabama_CPUE_Trawl_study_DATA_2002.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Mississippi_CPUE_Trawl_study_DATA_1975.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Alabama_CPUE_Trawl_study_DATA_1983.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://erddap.cencoos.org/erddap/tabledap/nerrs_elkapwq.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Alabama_CPUE_Trawl_study_DATA_1989.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Mississippi_CPUE_Trawl_study_DATA_1990.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Alabama_CPUE_Trawl_study_DATA_1987.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Mississippi_CPUE_Trawl_study_DATA_2003.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Florida_CPUE_Trawl_study_DATA_2001.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Alabama_CPUE_Trawl_study_DATA_1986.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://erddap.cencoos.org/erddap/tabledap/nerrs_elkvmwq.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Mississippi_CPUE_Trawl_study_DATA_1980.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Mississippi_CPUE_Trawl_study_DATA_1987.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Mississippi_CPUE_Trawl_study_DATA_1974.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Mississippi_CPUE_Trawl_study_DATA_1995.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Florida_CPUE_Trawl_study_DATA_2003.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Florida_CPUE_Trawl_study_DATA_1992.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://erddap.cencoos.org/erddap/tabledap/monterey-wharf-real-time-samplin.csvp?latitude,longitude&distinct(). HTTP Error 500: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Alabama_CPUE_Trawl_study_DATA_1999.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Florida_CPUE_Trawl_study_DATA_1999.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Mississippi_CPUE_Trawl_study_DATA_1982.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Alabama_CPUE_Trawl_study_DATA_1994.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Alabama_CPUE_Trawl_study_DATA_2004.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Florida_CPUE_Trawl_study_DATA_1993.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://www.nodc.noaa.gov/ocads/oceans/time_series_moorings.html.csvp?latitude,longitude&distinct(). HTTP Error 404: Not Found.
Failed to fetch https://erddap.cencoos.org/erddap/tabledap/nerrs_sfbccwq.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://erddap.cencoos.org/erddap/tabledap/mlml_monterey.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Alabama_CPUE_Trawl_study_DATA_1997.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Mississippi_CPUE_Trawl_study_DATA_1998.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Mississippi_CPUE_Trawl_study_DATA_1981.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Florida_CPUE_Trawl_study_DATA_2004.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Alabama_CPUE_Trawl_study_DATA_2003.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Mississippi_CPUE_Trawl_study_DATA_2001.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Mississippi_CPUE_Trawl_study_DATA_1976.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://erddap.cencoos.org/erddap/tabledap/nerrs_elksmwq.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://erddap.cencoos.org/erddap/tabledap/morro-bay-bs1.csvp?latitude,longitude&distinct(). HTTP Error 500: .
Failed to fetch https://erddap.secoora.org/erddap/tabledap/pivers-island-coastal-observa.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Florida_CPUE_Trawl_study_DATA_1990.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Mississippi_CPUE_Trawl_study_DATA_2000.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Alabama_CPUE_Trawl_study_DATA_1991.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Alabama_CPUE_Trawl_study_DATA_1981.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Mississippi_CPUE_Trawl_study_DATA_1978.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Mississippi_CPUE_Trawl_study_DATA_2004.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://erddap.cencoos.org/erddap/tabledap/nerrs_elknmwq.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Alabama_CPUE_Trawl_study_DATA_1990.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Florida_CPUE_Trawl_study_DATA_1997.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Florida_CPUE_Trawl_study_DATA_1991.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Mississippi_CPUE_Trawl_study_DATA_1984.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Alabama_CPUE_Trawl_study_DATA_1982.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Alabama_CPUE_Trawl_study_DATA_1995.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://erddap.cencoos.org/erddap/tabledap/nerrs_tjroswq.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch http://www.humboldt.edu.csvp?latitude,longitude&distinct(). <urlopen error [Errno -2] Name or service not known>.
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Alabama_CPUE_Trawl_study_DATA_1998.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Mississippi_CPUE_Trawl_study_DATA_1973.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Mississippi_CPUE_Trawl_study_DATA_1991.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Mississippi_CPUE_Trawl_study_DATA_1979.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Florida_CPUE_Trawl_study_DATA_1994.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Mississippi_CPUE_Trawl_study_DATA_1992.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Florida_CPUE_Trawl_study_DATA_1996.csvp?latitude,longitude&distinct(). HTTP Error 404: .

Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Mississippi_CPUE_Trawl_study_DATA_1999.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Alabama_CPUE_Trawl_study_DATA_2007.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Mississippi_CPUE_Trawl_study_DATA_1985.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Alabama_CPUE_Trawl_study_DATA_2001.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Florida_CPUE_Trawl_study_DATA_1995.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Mississippi_CPUE_Trawl_study_DATA_1993.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://erddap.cencoos.org/erddap/tabledap/nerrs_tjrsbwq.csvp?latitude,longitude&distinct(). HTTP Error 404: .
Failed to fetch https://gcoos5.geos.tamu.edu/erddap/tabledap/WS1202_WS1202_deck-tst.csvp?latitude,longitude&distinct(). <urlopen error [Errno 110] Connection timed out>.
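
Two of the failures logged above come from URLs that are not ERDDAP tabledap endpoints (an HTML page on www.nodc.noaa.gov and the bare host www.humboldt.edu), so appending `.csvp?latitude,longitude&distinct()` to them cannot succeed. One possible way to skip such URLs before fetching is sketched below. It assumes the candidate URLs are available as plain strings before the query suffix is appended; `is_tabledap_url` is a hypothetical helper, not part of the notebook.

```python
from urllib.parse import urlparse


def is_tabledap_url(url: str) -> bool:
    """Return True if the URL looks like an ERDDAP tabledap dataset endpoint.

    Hypothetical helper: it only inspects the URL text and makes no request.
    """
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and "/erddap/tabledap/" in parts.path


# Base URLs taken from the log output above (query suffix removed).
candidates = [
    "https://gcoos4.geos.tamu.edu/erddap/tabledap/CAGES_Florida_CPUE_Trawl_study_DATA_2000",
    "https://www.nodc.noaa.gov/ocads/oceans/time_series_moorings.html",
    "http://www.humboldt.edu",
]

tabledap_urls = [u for u in candidates if is_tabledap_url(u)]
skipped = [u for u in candidates if not is_tabledap_url(u)]

print(f"{len(tabledap_urls)} tabledap URL(s), {len(skipped)} skipped")
```

This filter only removes URLs that could never be tabledap requests. It does not prevent the 404, 500, and timeout responses from real ERDDAP servers, which still need to be caught at fetch time.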

We hope this example demonstrates the flexibility of direct requests to the IOOS Data Catalog CKAN server and all the possibilities it provides. In this notebook we:

Searched the IOOS Data Catalog CKAN API with keywords.
Found datasets matching our specified criteria.
Collected all the data from each of the datasets matching our criteria.
Created a simple map of the distribution of datasets which match our criteria.

To take this one step further: because we collected the data from each of the datasets (in the dictionary dict_out_final), a user could integrate all of the oxygen observations and start to build a comprehensive dataset.
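
A minimal sketch of that integration step follows. It assumes that dict_out_final maps a dataset identifier to a pandas DataFrame. The stand-in dictionary, its column names (including `oxygen`), and the `combine_datasets` helper are all hypothetical and used only for illustration.

```python
import pandas as pd

# Stand-in for dict_out_final. ASSUMPTION: dataset id -> DataFrame.
# The ids, columns, and values here are made up for illustration.
example = {
    "dataset_a": pd.DataFrame(
        {"latitude": [36.6], "longitude": [-121.9], "oxygen": [250.0]}
    ),
    "dataset_b": pd.DataFrame(
        {"latitude": [29.2, 29.3], "longitude": [-90.0, -90.1], "oxygen": [180.0, 175.0]}
    ),
}


def combine_datasets(frames):
    """Stack per-dataset DataFrames into one, keeping the dataset id as a column."""
    # Passing a mapping to pd.concat uses its keys as the outer index level.
    combined = pd.concat(frames, names=["dataset_id"])
    # Move the dataset id into a regular column and renumber the rows.
    return combined.reset_index(level="dataset_id").reset_index(drop=True)


combined = combine_datasets(example)
print(combined)
```

In practice the datasets are likely to differ in variable names and units, so those would need to be harmonized before the combined values can be compared.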

Additionally, a user could modify the CKAN query to search for terms outside of the CF standard names to potentially gather more datasets.
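
As a rough sketch of such a broader search, the snippet below builds keyword arguments for CKAN's `package_search` action from free-text terms. The `build_search_params` helper and the example terms are hypothetical. It also assumes the catalog accepts Solr-style `OR` syntax in the `q` parameter, which is CKAN's documented default but is not verified here against data.ioos.us.

```python
def build_search_params(terms, rows=10):
    """Build package_search keyword arguments that OR together free-text terms.

    Hypothetical helper: multi-word terms are quoted so they match as phrases.
    """
    quoted = [f'"{term}"' if " " in term else term for term in terms]
    return {"q": " OR ".join(quoted), "rows": rows}


params = build_search_params(["oxygen", "dissolved oxygen", "hypoxia"])
print(params)

# With the RemoteCKAN client created earlier in the notebook (needs network):
# results = ioos_catalog.action.package_search(**params)
# results["count"]
```

A free-text search like this may return datasets that mention the terms only in a title or description, so the results would still need to be checked for actual oxygen variables.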