Searching datasets

erddapy wraps ERDDAP's form-like search capabilities through the search_for keyword of get_search_url.

[1]:
from erddapy import ERDDAP


e = ERDDAP(server="https://pae-paha.pacioos.hawaii.edu/erddap", protocol="griddap")

Single word search.

[2]:
import pandas as pd

search_for = "etopo"

url = e.get_search_url(search_for=search_for, response="csv")

pd.read_csv(url)["Dataset ID"]
[2]:
0           etopo1_bedrock
1    etopo1_bedrock_lon360
2               etopo1_ice
3        etopo1_ice_lon360
4                   etopo5
5            etopo5_lon180
Name: Dataset ID, dtype: object
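To make it clearer what such a search URL looks like, here is a minimal sketch that assembles one by hand with the standard library. It assumes ERDDAP's plain full-text search endpoint (search/index.<response> with the page, itemsPerPage, and searchFor parameters). The helper name simple_search_url is hypothetical, and the URL that get_search_url actually produces may carry additional parameters.

```python
from urllib.parse import urlencode


def simple_search_url(server, search_for, response="csv", items_per_page=1000):
    """Assemble an ERDDAP full-text search URL by hand (illustrative helper)."""
    # urlencode escapes spaces and quotes in the search string for us.
    query = urlencode(
        {"page": 1, "itemsPerPage": items_per_page, "searchFor": search_for}
    )
    return f"{server.rstrip('/')}/search/index.{response}?{query}"


url = simple_search_url("https://pae-paha.pacioos.hawaii.edu/erddap", "etopo")
print(url)
```

In practice get_search_url remains the better choice, since it also takes the protocol set on the ERDDAP object into account.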

Narrowing the search with a more specific word.

[3]:
search_for = "etopo5"

url = e.get_search_url(search_for=search_for, response="csv")

pd.read_csv(url)["Dataset ID"]
[3]:
0           etopo5
1    etopo5_lon180
Name: Dataset ID, dtype: object

Excluding words from the search by prefixing them with a minus sign. Here none of the etopo5 matches contains lon360, so the result is the same as before.

[4]:
search_for = "etopo5 -lon360"

url = e.get_search_url(search_for=search_for, response="csv")

pd.read_csv(url)["Dataset ID"]
[4]:
0           etopo5
1    etopo5_lon180
Name: Dataset ID, dtype: object

Quoted search, or “phrase search”. First, let us try the unquoted search.

[5]:
search_for = "ocean bathymetry"

url = e.get_search_url(search_for=search_for, response="csv")

len(pd.read_csv(url)["Dataset ID"])
[5]:
70

Too many datasets, because the words ocean and bathymetry are matched separately rather than as a phrase. Now let’s use the quoted search to restrict the results to the exact phrase ocean bathymetry.

[6]:
search_for = '"ocean bathymetry"'

url = e.get_search_url(search_for=search_for, response="csv")

len(pd.read_csv(url)["Dataset ID"])
[6]:
6
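The three search styles shown so far (plain words, excluded words, quoted phrases) can be combined in a single search_for string. The sketch below is one way to compose such a string; build_search_for is a hypothetical helper and not part of erddapy.

```python
def build_search_for(words=(), phrases=(), exclude=()):
    """Compose an ERDDAP search string (hypothetical helper, not erddapy API)."""
    parts = list(words)
    # Double quotes ask for an exact phrase match.
    parts += [f'"{phrase}"' for phrase in phrases]
    # A leading minus sign excludes datasets that contain the word.
    parts += [f"-{word}" for word in exclude]
    return " ".join(parts)


print(build_search_for(words=["etopo5"], exclude=["lon360"]))  # etopo5 -lon360
print(build_search_for(phrases=["ocean bathymetry"]))  # "ocean bathymetry"
```

The returned string can be passed directly as the search_for argument of e.get_search_url.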

Another common search operation would be to search multiple servers instead of only one. In erddapy we can achieve that with search_servers:

[7]:
from erddapy.multiple_server_search import search_servers

df = search_servers(
    query="glider",
    servers_list=None,
    parallel=True,
    protocol="tabledap",
)
[8]:
print(f"There are {len(df)} entries in this search!")
There are 4869 entries in this search!

These are the servers that have glider data according to our query.

[9]:
set(df["Server url"])
[9]:
{'http://erddap.cencoos.org/erddap/',
 'http://erddap.secoora.org/erddap/',
 'http://erddap.sochic-h2020.eu/erddap/',
 'http://tds.marine.rutgers.edu/erddap/',
 'https://cwcgom.aoml.noaa.gov/erddap/',
 'https://erddap-goldcopy.dataexplorer.oceanobservatories.org/erddap/',
 'https://erddap.axiomdatascience.com/erddap/',
 'https://erddap.bco-dmo.org/erddap/',
 'https://erddap.emodnet-physics.eu/erddap/',
 'https://erddap.griidc.org/erddap/',
 'https://erddap.sensors.ioos.us/erddap/',
 'https://gliders.ioos.us/erddap/',
 'https://pae-paha.pacioos.hawaii.edu/erddap/',
 'https://polarwatch.noaa.gov/erddap/',
 'https://spraydata.ucsd.edu/erddap/',
 'https://upwell.pfeg.noaa.gov/erddap/',
 'https://www.ifremer.fr/erddap/',
 'https://www.smartatlantic.ca/erddap/'}

One way to reduce the number of results is to search only a subset of the servers with the servers_list argument. We can also use it to search servers that are not part of the awesome ERDDAP list (https://github.com/IrishMarineInstitute/awesome-erddap).
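As a sketch of the servers_list argument, the snippet below restricts the glider query to two servers taken from the set printed earlier. The wrapper search_subset is a hypothetical convenience function; it only forwards to search_servers with the same keywords used in the cell above. The actual call needs network access, so it is left commented out.

```python
def search_subset(query, servers, protocol="tabledap"):
    """Search only the given servers (hypothetical wrapper around search_servers)."""
    # Imported here so the helper can be defined even where erddapy is missing.
    from erddapy.multiple_server_search import search_servers

    return search_servers(
        query=query,
        servers_list=list(servers),
        parallel=True,
        protocol=protocol,
    )


# Two of the servers returned by the full search above.
subset = [
    "https://gliders.ioos.us/erddap/",
    "https://erddap.sensors.ioos.us/erddap/",
]

# df_subset = search_subset("glider", subset)  # requires network access
```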

One can also perform an advanced search with ERDDAP constraints using advanced_search_servers.

[10]:
from erddapy.multiple_server_search import advanced_search_servers


min_time = "2017-07-01T00:00:00Z"
max_time = "2017-09-01T00:00:00Z"
min_lon, max_lon = -127, -123.75
min_lat, max_lat = 43, 48
standard_name = "sea_water_practical_salinity"


kw = {
    "standard_name": standard_name,
    "min_lon": min_lon,
    "max_lon": max_lon,
    "min_lat": min_lat,
    "max_lat": max_lat,
    "min_time": min_time,
    "max_time": max_time,
    "cdm_data_type": "timeseries",  # let's exclude AUV's tracks
}


servers = {
    "ooi": "https://erddap.dataexplorer.oceanobservatories.org/erddap/",
    "ioos": "https://erddap.sensors.ioos.us/erddap/",
}


df = advanced_search_servers(servers_list=servers.values(), **kw)

df.head()
[10]:
   Title                                              Institution                           Dataset ID                       Server url
0  Coastal Endurance: Oregon Inshore Surface Moor...  Ocean Observatories Initiative (OOI)  ooi-ce01issm-rid16-02-flortd000  https://erddap.dataexplorer.oceanobservatories...
1  Coastal Endurance: Oregon Inshore Surface Moor...  Ocean Observatories Initiative (OOI)  ooi-ce01issm-rid16-03-ctdbpc000  https://erddap.dataexplorer.oceanobservatories...
2  Coastal Endurance: Oregon Inshore Surface Moor...  Ocean Observatories Initiative (OOI)  ooi-ce01issm-rid16-03-dostad000  https://erddap.dataexplorer.oceanobservatories...
3  Coastal Endurance: Oregon Inshore Surface Moor...  Ocean Observatories Initiative (OOI)  ooi-ce01issm-rid16-07-nutnrb000  https://erddap.dataexplorer.oceanobservatories...
4  Coastal Endurance: Oregon Inshore Surface Moor...  Ocean Observatories Initiative (OOI)  ooi-ce01issm-rid16-06-phsend000  https://erddap.dataexplorer.oceanobservatories...
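
Swapped bounds in the constraint dictionary are easy to miss, so it can help to check them before sending the query. The following is a minimal sketch of such a check; make_constraints is a hypothetical helper, and it only builds the same keys used in the kw dictionary above.

```python
from datetime import datetime


def make_constraints(standard_name, lon, lat, time, cdm_data_type=None):
    """Build and sanity-check advanced-search keywords (hypothetical helper)."""
    (min_lon, max_lon), (min_lat, max_lat), (min_time, max_time) = lon, lat, time
    if min_lon >= max_lon or min_lat >= max_lat:
        raise ValueError("bounding box minimum must be smaller than its maximum")
    # Assumes ISO 8601 UTC timestamps such as "2017-07-01T00:00:00Z".
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    if datetime.strptime(min_time, fmt) >= datetime.strptime(max_time, fmt):
        raise ValueError("min_time must be earlier than max_time")
    kw = {
        "standard_name": standard_name,
        "min_lon": min_lon,
        "max_lon": max_lon,
        "min_lat": min_lat,
        "max_lat": max_lat,
        "min_time": min_time,
        "max_time": max_time,
    }
    if cdm_data_type is not None:
        kw["cdm_data_type"] = cdm_data_type
    return kw


kw = make_constraints(
    "sea_water_practical_salinity",
    lon=(-127, -123.75),
    lat=(43, 48),
    time=("2017-07-01T00:00:00Z", "2017-09-01T00:00:00Z"),
    cdm_data_type="timeseries",
)
```

The resulting dictionary can be unpacked into advanced_search_servers exactly as in the cell above, with **kw.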