IOOS QARTOD software (ioos_qc)
Created: 2020-02-14
Updated: 2022-05-23
This post will demonstrate how to run ioos_qc on a time-series dataset. ioos_qc implements the Quality Assurance / Quality Control of Real-Time Oceanographic Data (QARTOD) tests.
We will be using the water level data from a fixed station in Kotzebue, AK.
Below we create a simple Quality Assurance/Quality Control (QA/QC) configuration that will be used as input for ioos_qc. All the interval values are in the same units as the data.
For more information on the tests and the recommended QA/QC values, check the documentation of each test and its inputs: https://ioos.github.io/ioos_qc/api/ioos_qc.html#module-ioos_qc.qartod
qc_config = {
    "qartod": {
        "gross_range_test": {"fail_span": [-10, 10], "suspect_span": [-2, 3]},
        "flat_line_test": {
            "tolerance": 0.001,
            "suspect_threshold": 10800,
            "fail_threshold": 21600,
        },
        "spike_test": {
            "suspect_threshold": 0.8,
            "fail_threshold": 3,
        },
    }
}
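To make the two spans of the gross range test concrete, here is a minimal sketch of the idea using plain NumPy. It is not the ioos_qc implementation: the function name `gross_range_sketch` is hypothetical, and treating the span limits as inclusive is an assumption made for this illustration.

```python
import numpy as np

# Flag values as listed in this post: 1 = pass, 3 = suspect, 4 = fail.
GOOD, SUSPECT, FAIL = 1, 3, 4


def gross_range_sketch(values, fail_span, suspect_span):
    """Illustrative only (hypothetical helper): flag values against two nested ranges."""
    values = np.asarray(values, dtype=float)
    flags = np.full(values.shape, GOOD, dtype=np.uint8)
    # Anything outside the inner (suspect) range is at least suspect ...
    flags[(values < suspect_span[0]) | (values > suspect_span[1])] = SUSPECT
    # ... and anything outside the outer (fail) range is failed.
    flags[(values < fail_span[0]) | (values > fail_span[1])] = FAIL
    return flags


sample = [0.5, -2.5, 3.2, 11.0, -10.5]
flags = gross_range_sketch(sample, fail_span=(-10, 10), suspect_span=(-2, 3))
print(flags.tolist())  # [1, 3, 3, 4, 4]
```

The fail check is applied last so that it overrides the suspect flag for values that fall outside both ranges.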
Now we are ready to load the data, run tests and plot results!
We will get the data from the AOOS ERDDAP server.
import cf_xarray
print(cf_xarray.__version__)
from erddapy import ERDDAP
e = ERDDAP(server="https://erddap.aoos.org/erddap/", protocol="tabledap")
e.dataset_id = "kotzebue-alaska-water-level"
e.constraints = {
    "time>=": "2018-09-05T21:00:00Z",
    "time<=": "2019-07-10T19:00:00Z",
}
data = e.to_xarray()
data.cf
0.8.4
Discrete Sampling Geometry:
CF Roles: timeseries_id: ['station']
Coordinates:
CF Axes: X: ['longitude']
Y: ['latitude']
T: ['time']
Z: n/a
CF Coordinates: longitude: ['longitude']
latitude: ['latitude']
time: ['time']
vertical: n/a
Cell Measures: area, volume: n/a
Standard Names: latitude: ['latitude']
longitude: ['longitude']
time: ['time']
Bounds: n/a
Grid Mappings: n/a
Data Variables:
Cell Measures: area, volume: n/a
Standard Names: aggregate_quality_flag: ['sea_surface_height_above_sea_level_geoid_mhhw_qc_agg']
altitude: ['z']
sea_surface_height_above_sea_level: ['sea_surface_height_above_sea_level_geoid_mhhw']
sea_surface_height_above_sea_level quality_flag: ['sea_surface_height_above_sea_level_geoid_mhhw_qc_tests']
Bounds: n/a
Grid Mappings: n/a
from ioos_qc.config import QcConfig
qc = QcConfig(qc_config)
# The result is always a list but we only want the first, one and only in this case, variable.
variable_name = data.cf.standard_names["sea_surface_height_above_sea_level"][0]
qc_results = qc.run(
    inp=data[variable_name],
    tinp=data.cf["T"].to_numpy(),
)
qc_results
defaultdict(collections.OrderedDict,
{'qartod': OrderedDict([('gross_range_test',
array([1, 1, 1, ..., 1, 1, 1], dtype=uint8)),
('flat_line_test',
array([1, 1, 1, ..., 1, 1, 1], dtype=uint8)),
('spike_test',
array([2, 1, 1, ..., 1, 1, 2], dtype=uint8))])})
The results are returned in a dictionary format, similar to the input configuration, with an array of flags for each test. Although each array is a masked array, it should not be applied directly as a mask; it holds flag values. The flags range from 1 to 4, meaning:
1. data passed the QA/QC
2. did not run on this data point
3. flag as suspect
4. flag as failed
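A quick way to get a feel for the flags listed above is to count how many points received each one. The sketch below assumes only the nested dictionary layout shown in the output; the helper `summarize_flags` and the tiny `example_results` dictionary are hypothetical stand-ins, not part of ioos_qc or of the Kotzebue results.

```python
import numpy as np

# Flag meanings as listed in this post.
FLAG_NAMES = {1: "pass", 2: "not run", 3: "suspect", 4: "fail"}


def summarize_flags(results):
    """Count the occurrences of each flag for every test under results["qartod"]."""
    summary = {}
    for test_name, flags in results["qartod"].items():
        values, counts = np.unique(np.asarray(flags), return_counts=True)
        summary[test_name] = {
            FLAG_NAMES.get(int(value), str(int(value))): int(count)
            for value, count in zip(values, counts)
        }
    return summary


# Hypothetical results with the same layout as qc_results.
example_results = {
    "qartod": {
        "gross_range_test": np.array([1, 1, 3, 4], dtype="uint8"),
        "spike_test": np.array([2, 1, 1, 2], dtype="uint8"),
    }
}
summary = summarize_flags(example_results)
print(summary)
```

Passing the real `qc_results` instead of `example_results` should give the same kind of per-test summary.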
Now we can write a plotting function that will read these results and flag the data.
import matplotlib.pyplot as plt
import numpy as np
def plot_results(data, variable_name, results, title, test_name):
    time = data.cf["time"]
    obs = data[variable_name]
    qc_test = results["qartod"][test_name]

    # Keep only the observations that carry each flag
    # (1: pass, 2: not run, 3: suspect, 4: fail) and mask the rest.
    qc_pass = np.ma.masked_where(qc_test != 1, obs)
    qc_suspect = np.ma.masked_where(qc_test != 3, obs)
    qc_fail = np.ma.masked_where(qc_test != 4, obs)
    qc_notrun = np.ma.masked_where(qc_test != 2, obs)

    fig, ax = plt.subplots(figsize=(15, 3.75))
    # The title belongs to the axes; assigning to fig.set_title has no effect.
    ax.set_title(f"{test_name}: {title}")
    ax.set_xlabel("Time")
    ax.set_ylabel("Observation Value")

    kw = {"marker": "o", "linestyle": "none"}
    ax.plot(time, obs, label="obs", color="#A6CEE3")
    ax.plot(
        time, qc_notrun, markersize=2, label="qc not run", color="gray", alpha=0.2, **kw
    )
    ax.plot(
        time, qc_pass, markersize=4, label="qc pass", color="green", alpha=0.5, **kw
    )
    ax.plot(
        time,
        qc_suspect,
        markersize=4,
        label="qc suspect",
        color="orange",
        alpha=0.7,
        **kw,
    )
    ax.plot(time, qc_fail, markersize=6, label="qc fail", color="red", alpha=1.0, **kw)
    ax.grid(True)
    # Show the labels passed to each ax.plot call.
    ax.legend()
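The masking used inside plot_results can be seen on a toy example. The arrays `obs` and `flags` below are made up for illustration; `np.ma.masked_where` hides every observation whose flag differs from the one being plotted, so each series contains only its own points.

```python
import numpy as np

# Made-up observations and QC flags (1 = pass, 3 = suspect, 4 = fail).
obs = np.array([0.2, 0.4, 5.0, 0.3])
flags = np.array([1, 1, 4, 3], dtype=np.uint8)

# Mask everything that does NOT carry the flag of interest.
qc_pass = np.ma.masked_where(flags != 1, obs)
qc_fail = np.ma.masked_where(flags != 4, obs)

# compressed() returns only the unmasked values.
print(qc_pass.compressed().tolist())  # [0.2, 0.4]
print(qc_fail.compressed().tolist())  # [5.0]
```

Because masked points are skipped by matplotlib, every series can share the same time axis without any index bookkeeping.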
title = "Water Level [MHHW] [m] : Kotzebue, AK"
plot_results(data, variable_name, qc_results, title, "gross_range_test")
The gross range test should fail data outside the \(\pm 10\) range and flag as suspect any data below -2 or above 3. As one can easily see, all the major spikes are flagged as expected.
plot_results(data, variable_name, qc_results, title, "spike_test")
An actual spike test, based on a data increase threshold, flags spikes similar to those caught by the gross range test but also identifies other suspect, unusual increases in the series.
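The idea behind a spike check can be sketched in a few lines. The version below compares each point with the average of its two neighbors, which is our reading of the default approach described in the ioos_qc documentation; it is a simplified illustration with a hypothetical function name, not the library code. The end points have only one neighbor, so they are left as "not run", which is consistent with the 2 flags at both ends of the spike_test output shown earlier.

```python
import numpy as np

# Flag values as listed in this post.
GOOD, UNKNOWN, SUSPECT, FAIL = 1, 2, 3, 4


def spike_sketch(values, suspect_threshold, fail_threshold):
    """Illustrative only: compare each point with the mean of its two neighbors."""
    values = np.asarray(values, dtype=float)
    flags = np.full(values.shape, UNKNOWN, dtype=np.uint8)
    if values.size < 3:
        return flags  # not enough points to evaluate anything
    reference = (values[:-2] + values[2:]) / 2.0
    diff = np.abs(values[1:-1] - reference)
    inner = np.full(diff.shape, GOOD, dtype=np.uint8)
    inner[diff > suspect_threshold] = SUSPECT
    inner[diff > fail_threshold] = FAIL
    flags[1:-1] = inner
    return flags


sample = [0.0, 0.1, 1.2, 0.2, 0.1, 4.0, 0.0]
flags = spike_sketch(sample, suspect_threshold=0.8, fail_threshold=3)
print(flags.tolist())  # [2, 1, 3, 1, 3, 4, 2]
```

Note that the point just before the large spike is flagged as suspect too, because the spike pulls the neighbor average away from it.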
plot_results(data, variable_name, qc_results, title, "flat_line_test")
The flat line test identifies issues with the data where values are “stuck.” ioos_qc successfully identified a large portion of the data where that happens and flagged a smaller one as suspect. (Zoom in on the red point to the left to see this one.)
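A stuck-value check can be sketched as follows, assuming (as the ioos_qc documentation describes) that the flat line thresholds are durations in seconds, so 10800 and 21600 correspond to 3 and 6 hours. The function `flat_line_sketch` is hypothetical and uses a simple look-back window; the library's windowing may differ in detail.

```python
import numpy as np

# Flag values as listed in this post.
GOOD, SUSPECT, FAIL = 1, 3, 4


def flat_line_sketch(times_s, values, tolerance, suspect_threshold, fail_threshold):
    """Illustrative only: flag a point when the values in the preceding
    window of `threshold` seconds vary by less than `tolerance`."""
    times_s = np.asarray(times_s, dtype=float)
    values = np.asarray(values, dtype=float)
    flags = np.full(values.shape, GOOD, dtype=np.uint8)
    # Apply the suspect window first so the longer fail window can override it.
    for flag, threshold in ((SUSPECT, suspect_threshold), (FAIL, fail_threshold)):
        for i in range(values.size):
            in_window = (times_s >= times_s[i] - threshold) & (times_s <= times_s[i])
            window = values[in_window]
            # Only judge points whose look-back window spans the full duration.
            covers = times_s[i] - times_s[in_window][0] >= threshold
            if covers and (window.max() - window.min()) < tolerance:
                flags[i] = flag
    return flags


# Hourly samples; the value is stuck at 0.3 from hour 2 to hour 8.
times_s = np.arange(10) * 3600
values = [0.1, 0.2, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.4]
flags = flat_line_sketch(times_s, values, 0.001, 10800, 21600)
print(flags.tolist())  # [1, 1, 1, 1, 1, 3, 3, 3, 4, 1]
```

In this toy series the first three hours of stuck values pass, the next ones become suspect, and only the point that ends a full six-hour flat stretch fails.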
This notebook was adapted from Jessica Austin and Kyle Wilcox’s original ioos_qc examples. Please see the ioos_qc documentation for more examples.