QARTOD - Single Test
This notebook shows the simplest use case for the IOOS QARTOD package - a single test performed on a timeseries loaded into a Pandas DataFrame. It shows how to define the test configuration and how the output is structured. At the end, there is an example of how to use the flags in data visualization.
Setup
[1]:
from bokeh.plotting import output_notebook
output_notebook()
Load data
Load data from a .csv file hosted in the ioos_qc GitHub repository and put it into a Pandas DataFrame.
The data are water level observations from a fixed station in Kotzebue, AK.
[2]:
import pandas as pd
url = "https://github.com/ioos/ioos_qc/raw/master/docs/source/examples"
fname = f"{url}/water_level_example.csv"
variable_name = "sea_surface_height_above_sea_level"
data = pd.read_csv(fname, parse_dates=["time"])
data.head()
[2]:
| | time | timestamp | longitude | latitude | z | sea_surface_height_above_sea_level |
|---|---|---|---|---|---|---|
| 0 | 2018-09-05 21:00:00+00:00 | 1536181200 | NaN | NaN | 0 | 0.4785 |
| 1 | 2018-09-05 22:00:00+00:00 | 1536184800 | NaN | NaN | 0 | 0.4420 |
| 2 | 2018-09-05 23:00:00+00:00 | 1536188400 | NaN | NaN | 0 | 0.4968 |
| 3 | 2018-09-06 01:00:00+00:00 | 1536195600 | NaN | NaN | 0 | 0.5456 |
| 4 | 2018-09-06 02:00:00+00:00 | 1536199200 | NaN | NaN | 0 | 0.5761 |
Call test method directly
You can call individual QARTOD tests directly, manually passing in data and parameters.
[3]:
from ioos_qc import qartod
qc_results = qartod.spike_test(
inp=data[variable_name], suspect_threshold=0.8, fail_threshold=3
)
print(qc_results)
[2 1 1 ... 1 1 2]
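To make the two thresholds more concrete, here is a simplified, standalone sketch of the neighbour-average idea behind a spike check. It is not the ioos_qc implementation: the function name `spike_flags` and the flag constants are hypothetical, and the library's handling of endpoints, missing values, and alternative methods may differ. Each interior point is compared with the mean of its two neighbours and flagged suspect or failed when that difference exceeds `suspect_threshold` or `fail_threshold`. The first and last points have only one neighbour, so they stay "not evaluated" (flag 2), which is consistent with the `2` at both ends of the output above.

```python
import numpy as np

# Hypothetical flag constants following the QARTOD convention
GOOD, UNKNOWN, SUSPECT, FAIL = 1, 2, 3, 4


def spike_flags(x, suspect_threshold, fail_threshold):
    """Hypothetical, simplified spike check (not the ioos_qc code)."""
    x = np.asarray(x, dtype=float)
    # Endpoints have only one neighbour, so they stay "not evaluated"
    flags = np.full(x.shape, UNKNOWN, dtype=np.uint8)
    if x.size < 3:
        return flags
    # Mean of the previous and next value for every interior point
    ref = (x[:-2] + x[2:]) / 2.0
    diff = np.abs(x[1:-1] - ref)
    inner = np.full(diff.shape, GOOD, dtype=np.uint8)
    inner[diff > suspect_threshold] = SUSPECT
    inner[diff > fail_threshold] = FAIL
    # Differences involving NaN cannot be evaluated
    inner[np.isnan(diff)] = UNKNOWN
    flags[1:-1] = inner
    return flags


print(spike_flags([0.0, 0.1, 1.5, 0.1, 0.0], suspect_threshold=0.8, fail_threshold=3))
# [2 1 3 1 2]
```

The jump to 1.5 differs from its neighbours' mean (0.1) by 1.4, which is above the suspect threshold of 0.8 but below the fail threshold of 3.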
QC Configuration and Running
While you can call qartod methods directly, we recommend using a QcConfig object instead. This object encapsulates the test methods and their parameters in a single dict or JSON object, which makes your configuration easier to understand and more portable.
The QcConfig object is a special configuration object that determines which tests are run and defines the configuration for each test. The object's run() function runs the appropriate tests and returns a dictionary of flag values.
Descriptions of each test and its inputs can be found in the ioos_qc.qartod module documentation.
QartodFlags defines the flag meanings.
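The flag values follow the QARTOD convention: 1 = GOOD (pass), 2 = UNKNOWN (not evaluated), 3 = SUSPECT, 4 = FAIL, 9 = MISSING. As a quick sketch, the hypothetical helper `summarize_flags` below counts how many observations received each flag. The name mapping is written out by hand so the example runs without ioos_qc; it is assumed to mirror QartodFlags.

```python
import numpy as np

# Hand-written copy of the QARTOD flag meanings (assumed to mirror QartodFlags)
FLAG_NAMES = {1: "GOOD", 2: "UNKNOWN", 3: "SUSPECT", 4: "FAIL", 9: "MISSING"}


def summarize_flags(flags):
    """Hypothetical helper: count how many points received each flag value."""
    values, counts = np.unique(np.asarray(flags), return_counts=True)
    return {
        FLAG_NAMES.get(int(v), f"UNDEFINED({v})"): int(c)
        for v, c in zip(values, counts)
    }


print(summarize_flags(np.array([2, 1, 1, 3, 4, 1, 2], dtype=np.uint8)))
# {'GOOD': 3, 'UNKNOWN': 2, 'SUSPECT': 1, 'FAIL': 1}
```

Applied to the `qc_results` array returned by `qartod.spike_test` above, this gives a one-line overview of the test outcome.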
The configuration object can be initialized using a dictionary or a YAML file. Here is one example:
[4]:
from ioos_qc.config import QcConfig
qc_config = {
    "qartod": {
        "spike_test": {
            "suspect_threshold": 0.8,
            "fail_threshold": 3
        }
    }
}
qc = QcConfig(qc_config)
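Because the configuration is an ordinary nested dictionary, it can be saved as JSON and shared or version-controlled alongside the data. Here is a minimal sketch using only the standard library. The file name `spike_test.json` is an assumption, and since this notebook does not show QcConfig reading JSON files directly, the sketch loads the file back into a plain dict, which can then be passed to QcConfig as above.

```python
import json
from pathlib import Path

# Same structure as the qc_config dictionary above: module -> test -> parameters
qc_config = {
    "qartod": {
        "spike_test": {
            "suspect_threshold": 0.8,
            "fail_threshold": 3,
        }
    }
}

# Hypothetical file name: write the configuration out as JSON
config_path = Path("spike_test.json")
config_path.write_text(json.dumps(qc_config, indent=2))

# Read it back into a dict with the same contents
loaded = json.loads(config_path.read_text())
print(loaded == qc_config)  # True
```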
Now we can run the test.
[5]:
qc_results = qc.run(
inp=data[variable_name], tinp=data["time"]
)
qc_results
[5]:
defaultdict(collections.OrderedDict,
{'qartod': OrderedDict([('spike_test',
array([2, 1, 1, ..., 1, 1, 2], dtype=uint8))])})
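The results are nested as `results[module][test]`, with one flag array per test that lines up element by element with the input series. One way to put the flags next to the observations is to flatten that structure into DataFrame columns. The helper `results_to_frame` below is hypothetical (not part of ioos_qc), and the example uses a small hand-made dictionary that mimics the structure printed above rather than real output.

```python
from collections import OrderedDict, defaultdict

import numpy as np
import pandas as pd


def results_to_frame(results, index=None):
    """Hypothetical helper: flatten {module: {test: flags}} into DataFrame columns."""
    columns = {
        f"{module}.{test}": np.asarray(flags)
        for module, tests in results.items()
        for test, flags in tests.items()
    }
    return pd.DataFrame(columns, index=index)


# Small stand-in with the same nesting as the qc.run() output shown above
fake_results = defaultdict(OrderedDict)
fake_results["qartod"]["spike_test"] = np.array([2, 1, 1, 3, 2], dtype=np.uint8)

times = pd.date_range("2018-09-05 21:00", periods=5, freq="60min", tz="UTC")
frame = results_to_frame(fake_results, index=times)
print(frame)
```

With the notebook's variables, `results_to_frame(qc_results, index=data["time"])` would give one column per test indexed by the observation times.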
These results can be visualized using Bokeh.
[6]:
from bokeh.layouts import gridplot
from bokeh.plotting import figure, show
title = "Water Level [MHHW] [m] : Kotzebue, AK"
time = data["time"]
qc_test = qc_results["qartod"]["spike_test"]
p1 = figure(x_axis_type="datetime", title=f"Spike Test : {title}")
p1.grid.grid_line_alpha = 0.3
p1.xaxis.axis_label = "Time"
p1.yaxis.axis_label = "Spike Test Result"
p1.line(time, qc_test, color="blue")
show(gridplot([[p1]], width=800, height=400))
Alternative Configuration Method
Here is the same example, but with the configuration loaded from a YAML file (spike_test.yaml) instead of a dictionary.
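The contents of `spike_test.yaml` are not shown in this notebook. Assuming the file mirrors the dictionary used earlier, the sketch below creates an equivalent file next to the notebook (only if one does not already exist) so that the next cell can load it. The YAML is written as a plain string, so the sketch needs only the standard library.

```python
from pathlib import Path

# Assumed contents: the same module -> test -> parameters layout as the dict above
SPIKE_TEST_YAML = """\
qartod:
  spike_test:
    suspect_threshold: 0.8
    fail_threshold: 3
"""

yaml_path = Path("spike_test.yaml")
# Do not overwrite a spike_test.yaml that is already present
if not yaml_path.exists():
    yaml_path.write_text(SPIKE_TEST_YAML)
print(yaml_path.read_text())
```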
[7]:
qc = QcConfig("./spike_test.yaml")
qc_results = qc.run(
inp=data[variable_name],
tinp=data["timestamp"],
)
qc_results
[7]:
defaultdict(collections.OrderedDict,
{'qartod': OrderedDict([('spike_test',
array([2, 1, 1, ..., 1, 1, 2], dtype=uint8))])})
Using the Flags
The array of flags can then be used to filter data or to color plots.
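For example, to keep only the observations that passed the test, select the points whose flag is 1 (GOOD). The sketch below uses a small made-up series and flag array; with the notebook's variables the equivalent would be `data[variable_name].where(qc_results["qartod"]["spike_test"] == 1)`.

```python
import numpy as np
import pandas as pd

# Made-up observations and flags (1 = GOOD, 2 = not evaluated, 3 = SUSPECT, 4 = FAIL)
obs = pd.Series([0.48, 0.44, 2.10, 0.55, 0.58])
flags = np.array([2, 1, 4, 1, 2], dtype=np.uint8)

# Option 1: drop every point that is not GOOD
good_only = obs[flags == 1]

# Option 2: keep the original length but replace non-GOOD points with NaN
masked = obs.where(flags == 1)

print(good_only.tolist())  # [0.44, 0.55]
print(masked.isna().sum())  # 3
```

Option 2 keeps the series aligned with the time axis, which is usually what you want for plotting.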
[8]:
import numpy as np
from bokeh.layouts import gridplot
from bokeh.plotting import figure, show


def plot_results(data, var_name, results, title, test_name):
    """Plot timeseries of original data colored by quality flag

    Args:
        data: pd.DataFrame of original data including a time variable
        var_name: string name of the variable to plot
        results: Ordered Dictionary of qartod test results
        title: string to add to plot title
        test_name: name of the test to determine which flags to use
    """
    # Set-up
    time = data["time"]
    obs = data[var_name]
    qc_test = results["qartod"][test_name]

    # Create a separate timeseries for each flag value; points with any other
    # flag are masked out (1 = pass, 2 = not run, 3 = suspect, 4 = fail)
    qc_pass = np.ma.masked_where(qc_test != 1, obs)
    qc_suspect = np.ma.masked_where(qc_test != 3, obs)
    qc_fail = np.ma.masked_where(qc_test != 4, obs)
    qc_notrun = np.ma.masked_where(qc_test != 2, obs)

    # Start the figure
    p1 = figure(x_axis_type="datetime", title=test_name + " : " + title)
    p1.grid.grid_line_alpha = 0.3
    p1.xaxis.axis_label = "Time"
    p1.yaxis.axis_label = "Observation Value"

    # Plot the data, and the data colored by flag (scatter draws circles by default)
    p1.line(time, obs, legend_label="obs", color="#A6CEE3")
    p1.scatter(
        time, qc_notrun, size=2, legend_label="qc not run", color="gray", alpha=0.2
    )
    p1.scatter(time, qc_pass, size=4, legend_label="qc pass", color="green", alpha=0.5)
    p1.scatter(
        time, qc_suspect, size=4, legend_label="qc suspect", color="orange", alpha=0.7
    )
    p1.scatter(time, qc_fail, size=6, legend_label="qc fail", color="red", alpha=1.0)

    # Show the plot
    show(gridplot([[p1]], width=800, height=400))
[9]:
plot_results(data, variable_name, qc_results, title, "spike_test")
Show the earlier plot of the raw spike test flag values again for comparison.
[10]:
show(gridplot([[p1]], width=800, height=400))