QARTOD - Single Test

This notebook shows the simplest use case for the IOOS QARTOD package - a single test performed on a timeseries loaded into a Pandas DataFrame. It shows how to define the test configuration and how the output is structured. At the end, there is an example of how to use the flags in data visualization.

Setup

[1]:

from bokeh.plotting import output_notebook

output_notebook()


Load data

Load data from a remote .csv file, hosted in the ioos_qc GitHub repository, into a Pandas DataFrame.

The data are water level observations from a fixed station in Kotzebue, AK.

[2]:

import pandas as pd

url = "https://github.com/ioos/ioos_qc/raw/master/docs/source/examples"
fname = f"{url}/water_level_example.csv"

variable_name = "sea_surface_height_above_sea_level"

data = pd.read_csv(fname, parse_dates=["time"])
data.head()

[2]:

	time	timestamp	longitude	latitude	sea_surface_height_above_sea_level
0	2018-09-05 21:00:00+00:00	1536181200	NaN	NaN	0.4785
1	2018-09-05 22:00:00+00:00	1536184800	NaN	NaN	0.4420
2	2018-09-05 23:00:00+00:00	1536188400	NaN	NaN	0.4968
3	2018-09-06 01:00:00+00:00	1536195600	NaN	NaN	0.5456
4	2018-09-06 02:00:00+00:00	1536199200	NaN	NaN	0.5761

Call test method directly

You can call individual QARTOD tests directly, manually passing in data and parameters.

[3]:

from ioos_qc import qartod

qc_results = qartod.spike_test(
    inp=data[variable_name],
    suspect_threshold=0.8,
    fail_threshold=3,
)

print(qc_results)

[2 1 1 ... 1 1 2]
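
The first and last values in this output are 2, which is consistent with a spike check needing a neighbor on each side of the point being tested. The following is a minimal pure-Python sketch of that idea, assuming each point is compared against the mean of its two neighbors and that both thresholds are strict. `simple_spike_flags` is a hypothetical helper written for illustration; it is not the `ioos_qc` implementation, which also handles masked values and other options.

```python
def simple_spike_flags(values, suspect_threshold, fail_threshold):
    """Flag each point by its distance from the mean of its two neighbors.

    Hypothetical helper for illustration only, not the ioos_qc code.
    Flags: 1 pass, 2 not evaluated, 3 suspect, 4 fail.
    """
    # Endpoints keep flag 2 because they lack a neighbor on one side.
    flags = [2] * len(values)
    for i in range(1, len(values) - 1):
        reference = (values[i - 1] + values[i + 1]) / 2
        diff = abs(values[i] - reference)
        if diff > fail_threshold:
            flags[i] = 4
        elif diff > suspect_threshold:
            flags[i] = 3
        else:
            flags[i] = 1
    return flags


# Made-up values with a small spike (1.5) and a large spike (4.5).
levels = [0.5, 0.5, 1.5, 0.5, 0.5, 4.5, 0.5, 0.5]
flags = simple_spike_flags(levels, suspect_threshold=0.8, fail_threshold=3)
print(flags)  # [2, 1, 3, 1, 3, 4, 3, 2]
```

Note that the points on either side of the large spike are flagged suspect as well, because the spike pulls their neighbor mean away from them.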

QC configuration and Running

While you can call qartod methods directly, we recommend using a QcConfig object instead. This object encapsulates the test method and parameters into a single dict or JSON object. This makes your configuration more understandable and portable.

A QcConfig object determines which tests are run and holds the parameters for each test. Its run() method runs the configured tests and returns a dictionary of flag arrays, keyed first by module name and then by test name.

Descriptions of each test and its inputs can be found in the ioos_qc.qartod module documentation.

QartodFlags defines the flag meanings.
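
As a quick orientation, the sketch below tallies a flag array by meaning. The `FLAG_MEANINGS` mapping is a local copy written for illustration, assuming the standard QARTOD values (1 good, 2 unknown, 3 suspect, 4 fail, 9 missing); in real code, refer to `QartodFlags` rather than hard-coding the numbers. `summarize_flags` and the example array are likewise hypothetical.

```python
from collections import Counter

# Local copy of the assumed QARTOD flag values, for illustration only.
FLAG_MEANINGS = {1: "GOOD", 2: "UNKNOWN", 3: "SUSPECT", 4: "FAIL", 9: "MISSING"}


def summarize_flags(flags):
    """Return a {meaning: count} dict for an iterable of integer flags."""
    counts = Counter(int(f) for f in flags)
    return {
        FLAG_MEANINGS.get(value, f"UNRECOGNIZED({value})"): n
        for value, n in sorted(counts.items())
    }


# Made-up flag array, shaped like the output of a single test.
example_flags = [2, 1, 1, 3, 1, 4, 1, 2]
summary = summarize_flags(example_flags)
print(summary)  # {'GOOD': 4, 'UNKNOWN': 2, 'SUSPECT': 1, 'FAIL': 1}
```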

The configuration object can be initialized using a dictionary or a YAML file. Here is an example using a dictionary:

[4]:

from ioos_qc.config import QcConfig

qc_config = {
    "qartod": {
        "spike_test": {
            "suspect_threshold": 0.8,
            "fail_threshold": 3,
        },
    },
}
qc = QcConfig(qc_config)
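
Because the configuration is plain data, it can be saved and shared between systems. The sketch below round-trips the same dictionary through JSON using only the standard library; the file name `spike_test.json` and the temporary directory are hypothetical choices for illustration.

```python
import json
import tempfile
from pathlib import Path

# Same structure as the configuration dictionary used in this notebook.
qc_config = {
    "qartod": {
        "spike_test": {
            "suspect_threshold": 0.8,
            "fail_threshold": 3,
        },
    },
}

with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "spike_test.json"  # hypothetical file name
    # Serialize the configuration to disk, then read it back.
    config_path.write_text(json.dumps(qc_config, indent=2))
    restored = json.loads(config_path.read_text())

# The restored dictionary is identical to the original.
print(restored == qc_config)  # True
```

The restored dictionary can then be used wherever the original dictionary would be.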

Now we can run the test.

[5]:

qc_results = qc.run(
    inp=data[variable_name],
    tinp=data["time"],
)

qc_results

/home/filipe/ioos_qc/ioos_qc/utils.py:195: UserWarning: no explicit representation of timezones available for np.datetime64
  return np.array(dates, dtype="datetime64[ns]")

[5]:

defaultdict(collections.OrderedDict,
            {'qartod': OrderedDict([('spike_test',
                           array([2, 1, 1, ..., 1, 1, 2], shape=(7241,), dtype=uint8))])})

These results can be visualized using Bokeh.

[6]:

from bokeh.layouts import gridplot
from bokeh.plotting import figure, show

title = "Water Level [MHHW] [m] : Kotzebue, AK"
time = data["time"]
qc_test = qc_results["qartod"]["spike_test"]

p1 = figure(x_axis_type="datetime", title=f"Spike Test : {title}")
p1.grid.grid_line_alpha = 0.3
p1.xaxis.axis_label = "Time"
p1.yaxis.axis_label = "Spike Test Result"
p1.line(time, qc_test, color="blue")

show(gridplot([[p1]], width=800, height=400))

/home/filipe/micromamba/envs/IOOSQC/lib/python3.13/site-packages/bokeh/util/serialization.py:242: UserWarning: no explicit representation of timezones available for np.datetime64
  return convert(array.astype("datetime64[us]"))

Alternative Configuration Method

Here is the same example, but loading the configuration from a YAML file instead.
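
The contents of `spike_test.yaml` are not shown in this notebook. As a sketch, assuming the file mirrors the configuration dictionary key for key, it could be created as follows. `write_spike_config` is a hypothetical helper, and the YAML text is an assumption, not a copy of the file in the repository.

```python
import tempfile
from pathlib import Path

# Assumed contents: the same nesting as the configuration dictionary.
SPIKE_TEST_YAML = """\
qartod:
  spike_test:
    suspect_threshold: 0.8
    fail_threshold: 3
"""


def write_spike_config(path="spike_test.yaml"):
    """Write the assumed YAML configuration to `path` and return the path."""
    path = Path(path)
    path.write_text(SPIKE_TEST_YAML)
    return path


# Demonstrate in a temporary directory so no file is left behind.
with tempfile.TemporaryDirectory() as tmp:
    written = write_spike_config(Path(tmp) / "spike_test.yaml")
    first_line = written.read_text().splitlines()[0]

print(first_line)  # qartod:
```

Calling `write_spike_config()` with no argument would place the file in the working directory.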

[7]:

qc = QcConfig("./spike_test.yaml")

qc_results = qc.run(
    inp=data[variable_name],
    tinp=data["timestamp"],
)

qc_results

[7]:

defaultdict(collections.OrderedDict,
            {'qartod': OrderedDict([('spike_test',
                           array([2, 1, 1, ..., 1, 1, 2], shape=(7241,), dtype=uint8))])})

Using the Flags

The array of flags can then be used to filter data or to color plots.
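
For the filtering case, a minimal sketch with plain Python lists is shown here. Observations whose flag is not 1 (pass) are replaced with NaN, so the series keeps its length and stays aligned with the time axis. `keep_passing` and the sample values are hypothetical; with NumPy arrays the same effect is usually achieved with a boolean mask.

```python
import math


def keep_passing(observations, flags, pass_flag=1):
    """Replace every observation whose flag is not `pass_flag` with NaN."""
    if len(observations) != len(flags):
        raise ValueError("observations and flags must have the same length")
    return [
        obs if flag == pass_flag else math.nan
        for obs, flag in zip(observations, flags)
    ]


# Made-up observations and flags, for illustration only.
observations = [0.48, 0.44, 2.10, 0.55, 5.00]
flags = [2, 1, 3, 1, 4]

filtered = keep_passing(observations, flags)
valid = [value for value in filtered if not math.isnan(value)]
print(valid)  # [0.44, 0.55]
```

Whether to also keep values flagged 2 (test not evaluated) depends on the application; this sketch drops them.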

[8]:

import numpy as np


def plot_results(data, var_name, results, title, test_name):
    """Plot timeseries of original data colored by quality flag

    Args:
    ----
        data: pd.DataFrame of original data including a time variable
        var_name: string name of the variable to plot
        results: Ordered Dictionary of qartod test results
        title: string to add to plot title
        test_name: name of the test to determine which flags to use

    """
    # Set-up
    time = data["time"]
    obs = data[var_name]
    qc_test = results["qartod"][test_name]

    # Create a separate timeseries of each flag value
    qc_pass = np.ma.masked_where(qc_test != 1, obs)
    qc_suspect = np.ma.masked_where(qc_test != 3, obs)
    qc_fail = np.ma.masked_where(qc_test != 4, obs)
    qc_notrun = np.ma.masked_where(qc_test != 2, obs)

    # start the figure
    p1 = figure(x_axis_type="datetime", title=test_name + " : " + title)
    p1.grid.grid_line_alpha = 0.3
    p1.xaxis.axis_label = "Time"
    p1.yaxis.axis_label = "Observation Value"

    # plot the data, and the data colored by flag
    # note: Bokeh 3.4.0 deprecates circle() with a size value in favor of scatter(size=...)
    p1.line(time, obs, legend_label="obs", color="#A6CEE3")
    p1.circle(
        time,
        qc_notrun,
        size=2,
        legend_label="qc not run",
        color="gray",
        alpha=0.2,
    )
    p1.circle(time, qc_pass, size=4, legend_label="qc pass", color="green", alpha=0.5)
    p1.circle(
        time,
        qc_suspect,
        size=4,
        legend_label="qc suspect",
        color="orange",
        alpha=0.7,
    )
    p1.circle(time, qc_fail, size=6, legend_label="qc fail", color="red", alpha=1.0)

    # show the plot
    show(gridplot([[p1]], width=800, height=400))

[9]:

plot_results(data, variable_name, qc_results, title, "spike_test")

BokehDeprecationWarning: 'circle() method with size value' was deprecated in Bokeh 3.4.0 and will be removed, use 'scatter(size=...) instead' instead.
/home/filipe/micromamba/envs/IOOSQC/lib/python3.13/site-packages/bokeh/util/serialization.py:242: UserWarning: no explicit representation of timezones available for np.datetime64
  return convert(array.astype("datetime64[us]"))

Plot flag values again for comparison.

[10]:

show(gridplot([[p1]], width=800, height=400))