system-test

IOOS DMAC System Integration Test project

Catalog based search for the IOOS Regional Associations using UUID

In the previous example we investigated whether we could query the NGDC CSW Catalog to extract records matching an IOOS RA acronym. However, we could not trust the results: some RAs returned just a few records or none at all, like AOOS and PacIOOS respectively.

We can make the search more robust by using the UUID rather than the acronym. The advantage is that every record is associated with a UUID, so the match does not depend on how the keywords were filled in. The disadvantage is that we need to keep track of a long and unintelligible identifier.

As usual, let's start by instantiating the CSW catalog object.

In [3]:
from owslib.csw import CatalogueServiceWeb

endpoint = 'http://www.ngdc.noaa.gov/geoportal/csw'
csw = CatalogueServiceWeb(endpoint, timeout=30)

We will use the same list of all the Regional Associations as before, but now we will match them with the corresponding UUID from the IOOS registry.

In [4]:
import pandas as pd

ioos_ras = ['AOOS',      # Alaska
            'CaRA',      # Caribbean
            'CeNCOOS',   # Central and Northern California
            'GCOOS',     # Gulf of Mexico
            'GLOS',      # Great Lakes
            'MARACOOS',  # Mid-Atlantic
            'NANOOS',    # Pacific Northwest 
            'NERACOOS',  # Northeast Atlantic 
            'PacIOOS',   # Pacific Islands 
            'SCCOOS',    # Southern California
            'SECOORA']   # Southeast Atlantic

url = 'https://raw.githubusercontent.com/ioos/registry/master/uuid.csv'

df = pd.read_csv(url, index_col=0, header=0, names=['UUID'])
df['UUID'] = df['UUID'].str.strip()
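
To make the lookup step concrete, here is a minimal offline sketch of the same idea using only the standard library. It assumes the registry is a two-column table (acronym, UUID) with a header row. The UUIDs below are made-up placeholders, and `SAMPLE_CSV` and `load_registry` are hypothetical names that are not part of the registry or of pandas.

```python
import csv
import io

# Hypothetical registry excerpt: the acronyms are real, but the UUIDs are
# made-up placeholders and the two-column layout is an assumption.
SAMPLE_CSV = (
    "RA,UUID\n"
    "AOOS,  00000000-0000-0000-0000-000000000001  \n"
    "PacIOOS,00000000-0000-0000-0000-000000000002\n"
)


def load_registry(text):
    """Return a dict mapping each RA acronym to its whitespace-stripped UUID."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    return {row[0].strip(): row[1].strip() for row in reader if row}


registry = load_registry(SAMPLE_CSV)

for ra in ['AOOS', 'CaRA', 'PacIOOS']:
    uuid = registry.get(ra)  # None when the acronym has no entry
    if uuid is None:
        print('{0}: not in the sample registry'.format(ra))
    else:
        print('{0}: {1}'.format(ra, uuid))
```

Stripping the whitespace matters because the UUID is later compared for exact equality, so a stray space would make the query match nothing.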

The function below is similar to the one we used before. Note that it uses the same PropertyIsEqualTo filter, but with a different property name (sys.siteuuid rather than apiso:Keywords).

That is the key to a more robust search: keywords are not always defined and might return bogus matches, whereas a UUID always identifies exactly one RA.
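
To see what such an equality constraint amounts to, the sketch below builds an OGC filter element by hand with the standard library. It only approximates what owslib serializes for a PropertyIsEqualTo. The helper name `equal_to_filter` is hypothetical, and the UUID is a made-up placeholder.

```python
import xml.etree.ElementTree as ET

OGC_NS = 'http://www.opengis.net/ogc'  # OGC Filter Encoding namespace
ET.register_namespace('ogc', OGC_NS)


def equal_to_filter(propertyname, literal):
    """Build an ogc:PropertyIsEqualTo element (hypothetical helper)."""
    root = ET.Element('{%s}PropertyIsEqualTo' % OGC_NS)
    ET.SubElement(root, '{%s}PropertyName' % OGC_NS).text = propertyname
    ET.SubElement(root, '{%s}Literal' % OGC_NS).text = literal
    return root


# Made-up placeholder, not a real registry UUID.
placeholder_uuid = '00000000-0000-0000-0000-000000000000'

node = equal_to_filter('sys.siteuuid', placeholder_uuid)
print(ET.tostring(node, encoding='unicode'))
```

The only parts that differ from the keyword search are the property name and the literal, which is why the query function barely changes.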

In [5]:
from owslib.fes import PropertyIsEqualTo

def query_ra(csw, uuid):
    q = PropertyIsEqualTo(propertyname='sys.siteuuid', literal='%s' % uuid)
    csw.getrecords2(constraints=[q], maxrecords=2000, esn='full')
    return csw

Here is what we got:

In [6]:
for ra in ioos_ras:
    try:
        # Label-based lookup; raises KeyError if the RA is not in the registry.
        uuid = df.loc[ra, 'UUID']
        csw = query_ra(csw, uuid)
        ret = csw.results['returned']
        word = 'record' if ret == 1 else 'records'
        print("{0:>8} has {1:>4} {2}".format(ra, ret, word))
        csw.records.clear()
    except KeyError:
        # Skip RAs that have no UUID in the registry.
        pass
    AOOS has   74 records
   GCOOS has    8 records
    GLOS has   20 records
MARACOOS has  468 records
  NANOOS has    8 records
NERACOOS has 1109 records
 PacIOOS has  192 records
  SCCOOS has   23 records
 SECOORA has  100 records

Compare the results above with cell [6] from the previous example. Note that we now get 192 records for PacIOOS and 74 for AOOS!
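
As a side note on the report format, the aligned columns come from the right-justified fields in the format string. The sketch below reproduces two of the lines with the counts reported above, so it runs without a catalog connection. The function name `report_line` is hypothetical.

```python
def report_line(ra, count):
    """Format one line of the report with right-aligned name and count."""
    word = 'record' if count == 1 else 'records'
    # {0:>8} pads the acronym to 8 characters, {1:>4} pads the count to 4.
    return '{0:>8} has {1:>4} {2}'.format(ra, count, word)


# Counts taken from the output above.
for ra, count in [('AOOS', 74), ('PacIOOS', 192)]:
    print(report_line(ra, count))
```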

You can see the original notebook here.

This post was written as an IPython notebook. It is available for download. You can also try an interactive version on binder.