{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Searching datasets\n", "\n", "erddapy wraps ERDDAP's form-like search capabilities through the\n", "`search_for` keyword.\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2026-03-06T15:44:45.474161Z", "iopub.status.busy": "2026-03-06T15:44:45.473988Z", "iopub.status.idle": "2026-03-06T15:44:45.848336Z", "shell.execute_reply": "2026-03-06T15:44:45.847559Z" } }, "outputs": [], "source": [ "from erddapy import ERDDAP\n", "\n", "e = ERDDAP(\n", "    server=\"https://pae-paha.pacioos.hawaii.edu/erddap\",\n", "    protocol=\"griddap\",\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Single word search.\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2026-03-06T15:44:45.850251Z", "iopub.status.busy": "2026-03-06T15:44:45.850072Z", "iopub.status.idle": "2026-03-06T15:44:46.097321Z", "shell.execute_reply": "2026-03-06T15:44:46.096571Z" } }, "outputs": [ { "data": { "text/plain": [ "0 etopo1_bedrock\n", "1 etopo1_bedrock_lon360\n", "2 etopo1_ice\n", "3 etopo1_ice_lon360\n", "4 etopo5\n", "5 etopo5_lon180\n", "Name: Dataset ID, dtype: str" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "\n", "search_for = \"etopo\"\n", "\n", "url = e.get_search_url(search_for=search_for, response=\"csv\")\n", "\n", "pd.read_csv(url)[\"Dataset ID\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Narrowing the search with a more specific term.\n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2026-03-06T15:44:46.127571Z", "iopub.status.busy": "2026-03-06T15:44:46.127337Z", "iopub.status.idle": "2026-03-06T15:44:46.443669Z", "shell.execute_reply": "2026-03-06T15:44:46.442789Z" } }, "outputs": [ { "data": { "text/plain": [ "0 etopo5\n", "1 etopo5_lon180\n", 
"Name: Dataset ID, dtype: str" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "search_for = \"etopo5\"\n", "\n", "url = e.get_search_url(search_for=search_for, response=\"csv\")\n", "\n", "pd.read_csv(url)[\"Dataset ID\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Filtering the search with words that should **not** be found.\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2026-03-06T15:44:46.445293Z", "iopub.status.busy": "2026-03-06T15:44:46.445120Z", "iopub.status.idle": "2026-03-06T15:44:46.968929Z", "shell.execute_reply": "2026-03-06T15:44:46.968105Z" } }, "outputs": [ { "data": { "text/plain": [ "0 etopo5\n", "1 etopo5_lon180\n", "Name: Dataset ID, dtype: str" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "search_for = \"etopo5 -lon360\"\n", "\n", "url = e.get_search_url(search_for=search_for, response=\"csv\")\n", "\n", "pd.read_csv(url)[\"Dataset ID\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Quoted search, or \"phrase search\". First, let us try the unquoted search.\n" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "execution": { "iopub.execute_input": "2026-03-06T15:44:46.970525Z", "iopub.status.busy": "2026-03-06T15:44:46.970341Z", "iopub.status.idle": "2026-03-06T15:44:47.482100Z", "shell.execute_reply": "2026-03-06T15:44:47.481272Z" }, "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "68" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "search_for = \"ocean bathymetry\"\n", "\n", "url = e.get_search_url(search_for=search_for, response=\"csv\")\n", "\n", "len(pd.read_csv(url)[\"Dataset ID\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Too many datasets, because the unquoted words ocean and bathymetry are matched\n", "separately rather than as a phrase. 
Now let's use\n", "the quoted search to reduce the number of results to only those matching the\n", "exact phrase ocean bathymetry.\n" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "execution": { "iopub.execute_input": "2026-03-06T15:44:47.483754Z", "iopub.status.busy": "2026-03-06T15:44:47.483589Z", "iopub.status.idle": "2026-03-06T15:44:47.717179Z", "shell.execute_reply": "2026-03-06T15:44:47.716440Z" } }, "outputs": [ { "data": { "text/plain": [ "6" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "search_for = '\"ocean bathymetry\"'\n", "\n", "url = e.get_search_url(search_for=search_for, response=\"csv\")\n", "\n", "len(pd.read_csv(url)[\"Dataset ID\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another common search operation would be to search multiple servers instead of\n", "only one. In erddapy we can achieve that with `search_servers`:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "execution": { "iopub.execute_input": "2026-03-06T15:44:47.718824Z", "iopub.status.busy": "2026-03-06T15:44:47.718661Z", "iopub.status.idle": "2026-03-06T15:45:17.822479Z", "shell.execute_reply": "2026-03-06T15:45:17.821555Z" } }, "outputs": [], "source": [ "from erddapy.multiple_server_search import search_servers\n", "\n", "df = search_servers(\n", "    query=\"glider\",\n", "    servers_list=None,\n", "    parallel=True,\n", "    protocol=\"tabledap\",\n", ")" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "execution": { "iopub.execute_input": "2026-03-06T15:45:17.824049Z", "iopub.status.busy": "2026-03-06T15:45:17.823888Z", "iopub.status.idle": "2026-03-06T15:45:17.826936Z", "shell.execute_reply": "2026-03-06T15:45:17.826191Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "There are 6493 entries in this search!\n" ] } ], "source": [ "print(f\"There are {len(df)} entries in this search!\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These are the servers that have glider 
data according to our query.\n" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "execution": { "iopub.execute_input": "2026-03-06T15:45:17.828494Z", "iopub.status.busy": "2026-03-06T15:45:17.828306Z", "iopub.status.idle": "2026-03-06T15:45:17.833525Z", "shell.execute_reply": "2026-03-06T15:45:17.832726Z" }, "scrolled": false }, "outputs": [ { "data": { "text/plain": [ "{'http://erddap.cencoos.org/erddap/',\n", " 'http://erddap.secoora.org/erddap/',\n", " 'http://tds.marine.rutgers.edu/erddap/',\n", " 'https://basin.ceoe.udel.edu/erddap/',\n", " 'https://coastwatch.pfeg.noaa.gov/erddap/',\n", " 'https://cwcgom.aoml.noaa.gov/erddap/',\n", " 'https://data.cioospacific.ca/erddap/',\n", " 'https://data.pmel.noaa.gov/pmel/erddap/',\n", " 'https://erddap.bco-dmo.org/erddap/',\n", " 'https://erddap.emodnet-physics.eu/erddap/',\n", " 'https://erddap.griidc.org/erddap/',\n", " 'https://erddap.observations.voiceoftheocean.org/erddap/',\n", " 'https://erddap.ondeckdata.com/erddap/',\n", " 'https://erddap.sensors.ioos.us/erddap/',\n", " 'https://gliders.ioos.us/erddap/',\n", " 'https://linkedsystems.uk/erddap/',\n", " 'https://pae-paha.pacioos.hawaii.edu/erddap/',\n", " 'https://polarwatch.noaa.gov/erddap/',\n", " 'https://spraydata.ucsd.edu/erddap/',\n", " 'https://upwell.pfeg.noaa.gov/erddap/',\n", " 'https://www.ifremer.fr/erddap/',\n", " 'https://www.smartatlantic.ca/erddap/'}" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "set(df[\"Server url\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One way to reduce the number of results is to search a subset of the servers\n", "with the `servers_list` argument. 
We can also use it to search servers that are not part of the Awesome\n", "ERDDAP list (https://github.com/IrishMarineInstitute/awesome-erddap).\n", "\n", "One can also perform an advanced search with ERDDAP constraints using\n", "`advanced_search_servers`.\n" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "execution": { "iopub.execute_input": "2026-03-06T15:45:17.835137Z", "iopub.status.busy": "2026-03-06T15:45:17.834963Z", "iopub.status.idle": "2026-03-06T15:45:18.288238Z", "shell.execute_reply": "2026-03-06T15:45:18.287483Z" } }, "outputs": [ { "data": { "text/html": [ "
|   | Title | Institution | Dataset ID | Server url |
|---|---|---|---|---|
| 0 | Coastal Endurance: Oregon Inshore Surface Moor... | Ocean Observatories Initiative (OOI) | ooi-ce01issm-rid16-02-flortd000 | https://erddap.dataexplorer.oceanobservatories... |
| 1 | Coastal Endurance: Oregon Inshore Surface Moor... | Ocean Observatories Initiative (OOI) | ooi-ce01issm-rid16-03-ctdbpc000 | https://erddap.dataexplorer.oceanobservatories... |
| 2 | Coastal Endurance: Oregon Inshore Surface Moor... | Ocean Observatories Initiative (OOI) | ooi-ce01issm-rid16-03-dostad000 | https://erddap.dataexplorer.oceanobservatories... |
| 3 | Coastal Endurance: Oregon Inshore Surface Moor... | Ocean Observatories Initiative (OOI) | ooi-ce01issm-rid16-07-nutnrb000 | https://erddap.dataexplorer.oceanobservatories... |
| 4 | Coastal Endurance: Oregon Inshore Surface Moor... | Ocean Observatories Initiative (OOI) | ooi-ce01issm-rid16-06-phsend000 | https://erddap.dataexplorer.oceanobservatories... |