QARTOD - Single Test

This notebook shows the simplest use case for the IOOS QARTOD package - a single test performed on a timeseries loaded into a Pandas DataFrame. It shows how to define the test configuration and how the output is structured. At the end, there is an example of how to use the flags in data visualization.

Setup

[1]:
from bokeh.plotting import output_notebook

output_notebook()

Load data

Load data from a .csv file hosted in the ioos_qc GitHub repository and put it into a Pandas DataFrame.

The data are water level observations from a fixed station in Kotzebue, AK.

[2]:
import pandas as pd


url = "https://github.com/ioos/ioos_qc/raw/master/docs/source/examples"
fname = f"{url}/water_level_example.csv"

variable_name = "sea_surface_height_above_sea_level"

data = pd.read_csv(fname, parse_dates=["time"])
data.head()
[2]:
time timestamp longitude latitude z sea_surface_height_above_sea_level
0 2018-09-05 21:00:00+00:00 1536181200 NaN NaN 0 0.4785
1 2018-09-05 22:00:00+00:00 1536184800 NaN NaN 0 0.4420
2 2018-09-05 23:00:00+00:00 1536188400 NaN NaN 0 0.4968
3 2018-09-06 01:00:00+00:00 1536195600 NaN NaN 0 0.5456
4 2018-09-06 02:00:00+00:00 1536199200 NaN NaN 0 0.5761

Call test method directly

You can call individual QARTOD tests directly, manually passing in data and parameters.

[3]:
from ioos_qc import qartod

qc_results = qartod.spike_test(
    inp=data[variable_name], suspect_threshold=0.8, fail_threshold=3
)

print(qc_results)
[2 1 1 ... 1 1 2]
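
To make the output above easier to read, here is a minimal pure-Python sketch of the idea behind a spike test: each point is compared with the average of its two neighbours, and the size of the difference decides the flag. The function name `spike_flags` and the sample values are hypothetical, and the sketch ignores missing values and the other options of `qartod.spike_test`, so treat it as an illustration rather than the library's implementation.

```python
def spike_flags(values, suspect_threshold, fail_threshold):
    """Flag each point by comparing it with the mean of its two neighbours."""
    GOOD, UNKNOWN, SUSPECT, FAIL = 1, 2, 3, 4
    # The first and last points have only one neighbour, so they stay UNKNOWN.
    flags = [UNKNOWN] * len(values)
    for i in range(1, len(values) - 1):
        reference = (values[i - 1] + values[i + 1]) / 2
        diff = abs(values[i] - reference)
        if diff > fail_threshold:
            flags[i] = FAIL
        elif diff > suspect_threshold:
            flags[i] = SUSPECT
        else:
            flags[i] = GOOD
    return flags


# Hypothetical water levels with one moderate spike at index 3.
heights = [0.48, 0.44, 0.50, 1.60, 0.55, 0.58, 0.60]
print(spike_flags(heights, suspect_threshold=0.8, fail_threshold=3))
# [2, 1, 1, 3, 1, 1, 2]
```

This also suggests why the real result starts and ends with 2: the endpoints lack a neighbour on one side, so the test cannot be evaluated there.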

QC configuration and Running

While you can call qartod methods directly, we recommend using a QcConfig object instead. This object encapsulates the test method and parameters into a single dict or JSON object. This makes your configuration more understandable and portable.

The QcConfig object is a special configuration object that determines which tests are run and defines the configuration for each test. The object’s run() function runs the appropriate tests and returns a resulting dictionary of flag values.

Descriptions of each test and its inputs can be found in the ioos_qc.qartod module documentation.

QartodFlags defines the flag meanings.
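
As a quick reference, the sketch below spells out the standard QARTOD flag values and counts how often each one occurs in a flag array. The `FLAG_MEANINGS` dictionary and the `summarize_flags` helper are hypothetical names used only for this illustration; in real code you would refer to the constants on `QartodFlags` instead.

```python
from collections import Counter

# Standard QARTOD flag values; this dictionary is illustrative only.
FLAG_MEANINGS = {1: "GOOD", 2: "UNKNOWN", 3: "SUSPECT", 4: "FAIL", 9: "MISSING"}


def summarize_flags(flags):
    """Count how many points received each flag, keyed by flag meaning."""
    counts = Counter(int(f) for f in flags)
    return {FLAG_MEANINGS.get(v, f"OTHER({v})"): n for v, n in sorted(counts.items())}


example_flags = [2, 1, 1, 3, 1, 4, 1, 2]  # made-up flags, not real results
print(summarize_flags(example_flags))
# {'GOOD': 4, 'UNKNOWN': 2, 'SUSPECT': 1, 'FAIL': 1}
```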

The configuration object can be initialized using a dictionary or a YAML file. Here is one example:

[4]:
from ioos_qc.config import QcConfig


qc_config = {
    "qartod": {
        "spike_test": {
            "suspect_threshold": 0.8,
            "fail_threshold": 3
        }
    }
}
qc = QcConfig(qc_config)

Now we can run the test.

[5]:
qc_results = qc.run(
    inp=data[variable_name], tinp=data["time"]
)

qc_results
[5]:
defaultdict(collections.OrderedDict,
            {'qartod': OrderedDict([('spike_test',
                           array([2, 1, 1, ..., 1, 1, 2], dtype=uint8))])})

These results can be visualized using Bokeh.

[6]:
from bokeh.layouts import gridplot
from bokeh.plotting import figure, show


title = "Water Level [MHHW] [m] : Kotzebue, AK"
time = data["time"]
qc_test = qc_results["qartod"]["spike_test"]

p1 = figure(x_axis_type="datetime", title=f"Spike Test : {title}")
p1.grid.grid_line_alpha = 0.3
p1.xaxis.axis_label = "Time"
p1.yaxis.axis_label = "Spike Test Result"
p1.line(time, qc_test, color="blue")

show(gridplot([[p1]], width=800, height=400))

Alternative Configuration Method

Here is the same example, this time loading the configuration from a YAML file.
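
The contents of spike_test.yaml are not shown in this notebook. Assuming the file simply mirrors the dictionary used earlier, it could be created as sketched below. The YAML layout here is an assumption derived from that dictionary, not a copy of the original file.

```python
from pathlib import Path

# Assumed YAML equivalent of the qc_config dictionary defined earlier.
yaml_text = """\
qartod:
  spike_test:
    suspect_threshold: 0.8
    fail_threshold: 3
"""

# Write the file next to the notebook so "./spike_test.yaml" resolves.
config_path = Path("spike_test.yaml")
config_path.write_text(yaml_text)
print(config_path.read_text())
```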

[7]:
qc = QcConfig("./spike_test.yaml")

qc_results = qc.run(
    inp=data[variable_name],
    tinp=data["timestamp"],
)

qc_results
[7]:
defaultdict(collections.OrderedDict,
            {'qartod': OrderedDict([('spike_test',
                           array([2, 1, 1, ..., 1, 1, 2], dtype=uint8))])})

Using the Flags

The array of flags can then be used to filter data or to color plots.
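
Filtering is the simpler of the two uses. The sketch below keeps only the observations whose flag is in a chosen set, using plain Python lists so it runs on its own. The helper name `keep_flagged` and the sample values are hypothetical.

```python
def keep_flagged(values, flags, wanted=(1,)):
    """Return the values whose flag is in `wanted` (pass only by default)."""
    return [v for v, f in zip(values, flags) if f in wanted]


# Made-up observations and flags, not taken from the Kotzebue dataset.
observations = [0.48, 0.44, 0.50, 1.60, 0.55, 0.58, 0.60]
flags = [2, 1, 1, 3, 1, 1, 2]

print(keep_flagged(observations, flags))
# [0.44, 0.5, 0.55, 0.58]
print(keep_flagged(observations, flags, wanted=(3, 4)))
# [1.6]
```

With the NumPy flag array and the DataFrame from this notebook, a boolean mask such as `data[qc_test == 1]` should select the same rows in a single step.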

[8]:
import numpy as np


def plot_results(data, var_name, results, title, test_name):
    """Plot timeseries of original data colored by quality flag

    Args:
        data: pd.DataFrame of original data including a time variable
        var_name: string name of the variable to plot
        results: Ordered Dictionary of qartod test results
        title: string to add to plot title
        test_name: name of the test to determine which flags to use
    """
    # Set-up
    time = data["time"]
    obs = data[var_name]
    qc_test = results["qartod"][test_name]

    # Create a separate timeseries of each flag value
    qc_pass = np.ma.masked_where(qc_test != 1, obs)
    qc_suspect = np.ma.masked_where(qc_test != 3, obs)
    qc_fail = np.ma.masked_where(qc_test != 4, obs)
    qc_notrun = np.ma.masked_where(qc_test != 2, obs)

    # start the figure
    p1 = figure(x_axis_type="datetime", title=test_name + " : " + title)
    p1.grid.grid_line_alpha = 0.3
    p1.xaxis.axis_label = "Time"
    p1.yaxis.axis_label = "Observation Value"

    # plot the data, and the data colored by flag
    p1.line(time, obs, legend_label="obs", color="#A6CEE3")
    p1.circle(
        time, qc_notrun, size=2, legend_label="qc not run", color="gray", alpha=0.2
    )
    p1.circle(time, qc_pass, size=4, legend_label="qc pass", color="green", alpha=0.5)
    p1.circle(
        time, qc_suspect, size=4, legend_label="qc suspect", color="orange", alpha=0.7
    )
    p1.circle(time, qc_fail, size=6, legend_label="qc fail", color="red", alpha=1.0)

    # show the plot
    show(gridplot([[p1]], width=800, height=400))

[9]:
plot_results(data, variable_name, qc_results, title, "spike_test")

Plot flag values again for comparison.

[10]:
show(gridplot([[p1]], width=800, height=400))