QARTOD - NetCDF Examples¶
This notebook provides examples of running QARTOD on a netCDF file. For background, see XarrayStream and CFNetCDFStore in the docs.
There are multiple ways that you can integrate ioos_qc into your netcdf-based workflow.
Option A: Store test configurations externally, pass your configuration and netcdf file to ioos_qc, and manually update netcdf variables with results of the test
In this case, you extract variables from the netcdf file, use ioos_qc methods to run tests, and then manually update the netcdf file with results
This provides the most control, but doesn’t take advantage of shared code in the ioos_qc library
It’s up to you to ensure your resulting netcdf is self-describing and CF-compliant
Option B: Store test configurations externally, then pass your configuration and netcdf file to ioos_qc, and let it run tests and update the file with results
This takes advantage of ioos_qc code to store results and configuration in the netCDF file, and ensure a self-describing, CF-compliant file
Managing your test configurations outside the file is better when dealing with a large number of datasets/configurations
Option C: Store test configurations in your netcdf file, then pass that file to ioos_qc and let it run tests and update the file with results
You only need to add test configurations to the file one time, and after that you could run tests over and over again on the same file
This option is the most portable, since the data, configuration, and results are all in one place
The downside is, test configuration management is difficult since it’s stored in the file instead of some common external location
from bokeh.plotting import output_notebook
def plot_ncresults(ncdata, var_name, results, title, test_name):
    """Helper method to plot QC results using Bokeh."""
    qc_test = next(r for r in results if r.stream_id == var_name and r.test == test_name)
    time = np.array(qc_test.tinp)
    obs = np.array(qc_test.data)
    results = qc_test.results

    # Mask every observation that does not carry the flag of interest
    qc_pass = np.ma.masked_where(results != 1, obs)
    num_pass = (results == 1).sum()
    qc_suspect = np.ma.masked_where(results != 3, obs)
    num_suspect = (results == 3).sum()
    qc_fail = np.ma.masked_where(results != 4, obs)
    num_fail = (results == 4).sum()
    qc_notrun = np.ma.masked_where(results != 2, obs)

    p1 = figure(
        x_axis_type="datetime",
        title=f"{test_name} : {title} : p/s/f= {num_pass}/{num_suspect}/{num_fail}",
    )
    p1.grid.grid_line_alpha = 0.3
    p1.xaxis.axis_label = "Time"
    p1.yaxis.axis_label = "Observation Value"
    p1.line(time, obs, legend_label="obs", color="#A6CEE3")
    p1.circle(time, qc_notrun, size=2, legend_label="qc not run", color="gray", alpha=0.2)
    p1.circle(time, qc_pass, size=4, legend_label="qc pass", color="green", alpha=0.5)
    p1.circle(time, qc_suspect, size=4, legend_label="qc suspect", color="orange", alpha=0.7)
    p1.circle(time, qc_fail, size=6, legend_label="qc fail", color="red", alpha=1.0)
    show(gridplot([[p1]], width=800, height=400))
import os
import shutil
import tempfile
Load the netCDF dataset¶
The example netCDF dataset is a pCO2 sensor from the Ocean Observatories Initiative (OOI) Coastal Endurance Inshore Surface Mooring instrument frame at 7 meters depth located on the Oregon Shelf break.
import xarray as xr
from erddapy.core.url import urlopen
from netCDF4 import Dataset
def open_from_https(url):
    # Download the file and open it as an in-memory netCDF dataset
    data = urlopen(url)
    nc = Dataset("pco2_netcdf_example", memory=data.read())
    return xr.open_dataset(xr.backends.NetCDF4DataStore(nc))
url = ""
fname = f"{url}/"
pco2 = open_from_https(fname)
<xarray.Dataset> Size: 3MB
Dimensions:  (time: 7339, spectrum: 14)
Coordinates:
    obs      (time) int64 59kB ...
  * time     (time) datetime64[ns] 59kB 2015...
    lat      (time) float64 59kB ...
    lon      (time) float64 59kB ...
Dimensions without coordinates: spectrum
Data variables: (12/30)
    deployment                                (time) int32 29kB ...
    id                                        (time) |S64 470kB ...
    dcl_controller_timestamp                  (time) object 59kB ...
    driver_timestamp                          (time) datetime64[ns] 59kB ...
    ingestion_timestamp                       (time) datetime64[ns] 59kB ...
    internal_timestamp                        (time) datetime64[ns] 59kB ...
    ...                                       ...
    absorbance_ratio_620_qc_executed          (time) float32 29kB ...
    absorbance_ratio_620_qc_results           (time) float32 29kB ...
    pco2w_thermistor_temperature_qc_executed  (time) float32 29kB ...
    pco2w_thermistor_temperature_qc_results   (time) float32 29kB ...
    pco2_seawater_qc_executed                 (time) float32 29kB ...
    pco2_seawater_qc_results                  (time) float32 29kB ...
Attributes: (12/71)
    node:                            RID16
    comment:
    publisher_email:
    sourceUrl:
    collection_method:               recovered_host
    stream:                          pco2w_abc_dcl_instrument_recovered
    ...                              ...
    geospatial_vertical_units:       meters
    geospatial_vertical_resolution:  0.1
    geospatial_vertical_positive:    down
    DODS.strlen:                     36
    DODS.dimName:                    string36
    DODS_EXTRA.Unlimited_Dimension:  obs
Plot the raw data.
import numpy as np
from bokeh.layouts import gridplot
from bokeh.plotting import figure, show
data = pco2["pco2_seawater"]
t = np.array(pco2["time"])
x = np.array(data)
p1 = figure(x_axis_type="datetime", title="pco2_seawater")
p1.grid.grid_line_alpha = 0.3
p1.xaxis.axis_label = "Time"
p1.yaxis.axis_label = data.units
p1.line(t, x)
show(gridplot([[p1]], width=800, height=400))
QC Configuration¶
Here we define the generic config object for multiple QARTOD tests, plus the aggregate/rollup flag.
The key “pco2_seawater” indicates which variable in the netcdf file this config should run against.
from ioos_qc.config import Config
# The configuration as a Python dict
config = {
    "pco2_seawater": {
        "qartod": {
            "gross_range_test": {"suspect_span": [200, 2400], "fail_span": [0, 3000]},
            "spike_test": {"suspect_threshold": 500, "fail_threshold": 1000},
            "location_test": {"bbox": [-124.5, 44, -123.5, 45]},
            "flat_line_test": {
                "tolerance": 1,
                "suspect_threshold": 3600,
                "fail_threshold": 86400,
            },
            "aggregate": {},
        },
    },
}

# The same tests as a YAML string
config = """
contexts:
    - streams:
        pco2_seawater:
            qartod:
                gross_range_test:
                    suspect_span: [200, 2400]
                    fail_span: [0, 3000]
                spike_test:
                    suspect_threshold: 500
                    fail_threshold: 1000
                location_test:
                    bbox: [-124.5, 44, -123.5, 45]
                flat_line_test:
                    tolerance: 1
                    suspect_threshold: 3600
                    fail_threshold: 86400
"""
c = Config(config)
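To make the meaning of suspect_span and fail_span concrete, here is a minimal pure-Python sketch of a gross range check. It is not the ioos_qc implementation; the function name gross_range_flags and the sample values are hypothetical, and the flag numbers (1 good, 3 suspect, 4 fail, 9 missing) follow the flag_values that appear in the output files later in this notebook.

```python
# Illustrative sketch only; not the ioos_qc implementation.
GOOD, SUSPECT, FAIL, MISSING = 1, 3, 4, 9


def gross_range_flags(values, suspect_span, fail_span):
    """Flag each value against an inner suspect span and an outer fail span."""
    s_lo, s_hi = suspect_span
    f_lo, f_hi = fail_span
    flags = []
    for v in values:
        if v is None:
            flags.append(MISSING)  # no observation at this time step
        elif v < f_lo or v > f_hi:
            flags.append(FAIL)  # outside the outer (fail) span
        elif v < s_lo or v > s_hi:
            flags.append(SUSPECT)  # inside fail span, outside suspect span
        else:
            flags.append(GOOD)
    return flags


# Hypothetical pCO2 values checked with the spans from the config above
sample = [350.0, 2500.0, 3200.0, -5.0, None]
flags = gross_range_flags(sample, suspect_span=[200, 2400], fail_span=[0, 3000])
print(flags)
```

The fail span is checked first so that a value outside both spans is reported as a failure rather than as merely suspect.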
Option A: Manually run tests and store results¶
Store test configurations externally, pass your configuration and netcdf file to ioos_qc, and manually update netcdf variables with results of the test.
Note: For tests that need tinp, zinp, etc., use the XarrayStream keyword arguments to name the variables that hold the t, x, y, z dimensions. In this case, we need latitude and longitude for the location test, so we pass lon and lat.
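As a rough illustration of why the location test needs longitude and latitude, the sketch below checks each position against a bounding box. It is a simplified stand-in, not the ioos_qc location_test; it assumes the bbox order is (min lon, min lat, max lon, max lat), and the function name location_flags and the sample positions are hypothetical.

```python
# Illustrative sketch only; assumes bbox = [min_lon, min_lat, max_lon, max_lat].
GOOD, FAIL, MISSING = 1, 4, 9


def location_flags(lons, lats, bbox):
    """Flag positions that fall outside the bounding box."""
    min_lon, min_lat, max_lon, max_lat = bbox
    flags = []
    for lon, lat in zip(lons, lats):
        if lon is None or lat is None:
            flags.append(MISSING)  # position not reported
        elif min_lon <= lon <= max_lon and min_lat <= lat <= max_lat:
            flags.append(GOOD)
        else:
            flags.append(FAIL)
    return flags


# Hypothetical positions; the first matches the mooring location in this dataset
lons = [-124.1, -125.0, None]
lats = [44.66, 44.66, 44.66]
loc_flags = location_flags(lons, lats, bbox=[-124.5, 44, -123.5, 45])
print(loc_flags)
```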
from ioos_qc.qartod import aggregate
from ioos_qc.results import CollectedResult, collect_results
from ioos_qc.streams import XarrayStream
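qc = XarrayStream(pco2, lon="lon", lat="lat")
# Store as a list to run QC now
runner = list(qc.run(c))
results = collect_results(runner, how="list")
agg = CollectedResult(
    stream_id="", package="qartod", test="qc_rollup",
    function=aggregate, results=aggregate(results),
    tinp=qc.time(), data=data,
)
results.append(agg)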
[<CollectedResult stream_id=pco2_seawater package=qartod test=gross_range_test>,
<CollectedResult stream_id=pco2_seawater package=qartod test=spike_test>,
<CollectedResult stream_id=pco2_seawater package=qartod test=location_test>,
<CollectedResult stream_id=pco2_seawater package=qartod test=flat_line_test>,
<CollectedResult stream_id= package=qartod test=qc_rollup>]
plot_ncresults(pco2, "pco2_seawater", results, "pCO2 seawater", "gross_range_test")
BokehDeprecationWarning: 'circle() method with size value' was deprecated in Bokeh 3.4.0 and will be removed, use 'scatter(size=...) instead' instead.
plot_ncresults(pco2, "pco2_seawater", results, "pCO2 seawater", "spike_test")
plot_ncresults(pco2, "pco2_seawater", results, "pCO2 seawater", "flat_line_test")
plot_ncresults(pco2, "pco2_seawater", results, "pCO2 seawater", "location_test")
# To see overall results, use the aggregate test
plot_ncresults(pco2, "", results, "pCO2 seawater", "qc_rollup")
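The rollup flag combines the per-test flags into one flag per observation. The sketch below shows one plausible way to do that: for each time step, keep the highest-priority flag among all tests. The priority order used here (missing, unknown, good, suspect, fail, from lowest to highest) is an assumption, and the function name rollup and the sample flag lists are hypothetical; consult the ioos_qc aggregate documentation for the exact rules.

```python
# Illustrative sketch only; the priority order is an assumption.
# Flags listed from lowest to highest priority: 9 missing, 2 unknown,
# 1 good, 3 suspect, 4 fail.
PRIORITY = [9, 2, 1, 3, 4]
RANK = {flag: position for position, flag in enumerate(PRIORITY)}


def rollup(*flag_series):
    """Return, per time step, the highest-priority flag across all tests."""
    return [max(step_flags, key=RANK.__getitem__) for step_flags in zip(*flag_series)]


# Hypothetical per-test flags for four observations
gross_range = [1, 3, 1, 1]
spike = [1, 1, 4, 2]
location = [1, 1, 1, 1]
combined = rollup(gross_range, spike, location)
print(combined)
```

With this ordering a single failing test marks the observation as failed, while a test that did not run (flag 2) does not override a passing result from another test.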
# Store results manually
# This is just a simple example and stores the aggregate test flag as a variable.
# You can expand upon this, or use the ioos_qc library to store the results for you (see subsequent examples)
# and use xarray's to_xxx methods to serialize results to whichever format you prefer
agg_da = xr.DataArray(agg.results, {}, ("time",))
output_xds = pco2.assign(qartod_aggregate=agg_da)
output_xds.qartod_aggregate
<xarray.DataArray 'qartod_aggregate' (time: 7339)> Size: 7kB
array([4, 4, 4, ..., 3, 1, 1], shape=(7339,), dtype=uint8)
Coordinates:
    obs      (time) int64 59kB ...
  * time     (time) datetime64[ns] 59kB 2015-10-08T19:35:30.569000448 ... 201...
    lat      (time) float64 59kB 44.66 44.66 44.66 44.66 ... 44.66 44.66 44.66
    lon      (time) float64 59kB -124.1 -124.1 -124.1 ... -124.1 -124.1 -124.1
Option B¶
Store test configurations externally, then pass your configuration and netcdf file to ioos_qc, and let it run tests and update the file with results.
# Using the CFNetCDFStore Store we can serialize our results back to a CF compliant netCDF file easily
from pocean.dsg import OrthogonalMultidimensionalTimeseries
from ioos_qc.stores import CFNetCDFStore
# We reuse `runner` from Option A so we don't repeat ourselves.
store = CFNetCDFStore(runner)
outfile_b = os.path.join(tempfile.gettempdir(), "")
if os.path.exists(outfile_b):
    os.remove(outfile_b)
qc_all = store.save(
    # The netCDF file to export OR append to
    outfile_b,
    # The DSG class to save the results as
    OrthogonalMultidimensionalTimeseries,
    # The QC config that was run
    c,
    # Should we write the data or just metadata? Defaults to false
    write_data=True,
    # Compute a total aggregate?
    compute_aggregate=True,
    # Any kwargs to pass to the DSG class
    dsg_kwargs=dict(
        reduce_dims=True,  # Remove dimensions of size 1
        unlimited=False,  # Don't make the record dimension unlimited
        unique_dims=True,  # Support loading into xarray
    ),
)
# Explore results: qc test variables are named [variable_name]_qartod_[test_name]
out_b = xr.open_dataset(outfile_b)
<xarray.Dataset> Size: 294kB
Dimensions:                                (time_dim: 7339)
Coordinates:
    time                                   (time_dim) datetime64[ns] 59kB ...
    lat                                    float64 8B ...
    lon                                    float64 8B ...
    z                                      float64 8B ...
Dimensions without coordinates: time_dim
Data variables:
    crs                                    int32 4B ...
    station                                int32 4B ...
    pco2_seawater                          (time_dim) float64 59kB ...
    pco2_seawater_qartod_gross_range_test  (time_dim) float32 29kB ...
    pco2_seawater_qartod_spike_test        (time_dim) float32 29kB ...
    pco2_seawater_qartod_location_test     (time_dim) float32 29kB ...
    pco2_seawater_qartod_flat_line_test    (time_dim) float64 59kB ...
    qartod_qc_rollup                       (time_dim) float32 29kB ...
Attributes:
    Conventions:    CF-1.6
    date_created:   2025-02-04T19:19:00Z
    featureType:    timeseries
    cdm_data_type:  Timeseries
# Gross range test
# Note how the config used is stored in the ioos_qc_* variables
out_b["pco2_seawater_qartod_gross_range_test"]
<xarray.DataArray 'pco2_seawater_qartod_gross_range_test' (time_dim: 7339)> Size: 29kB
[7339 values with dtype=float32]
Coordinates:
    time     (time_dim) datetime64[ns] 59kB ...
    lat      float64 8B ...
    lon      float64 8B ...
    z        float64 8B ...
Dimensions without coordinates: time_dim
Attributes:
    standard_name:   gross_range_test_quality_flag
    long_name:       Gross Range Test Quality Flag
    flag_values:     [1 2 3 4 9]
    flag_meanings:   GOOD UNKNOWN SUSPECT FAIL MISSING
    valid_min:       1
    valid_max:       9
    ioos_qc_module:  qartod
    ioos_qc_test:    gross_range_test
    ioos_qc_target:  pco2_seawater
    ioos_qc_config:  {"suspect_span": [200, 2400], "fail_span": [0, 3000]}
# Aggregate/rollup flag
out_b["qartod_qc_rollup"]
<xarray.DataArray 'qartod_qc_rollup' (time_dim: 7339)> Size: 29kB
[7339 values with dtype=float32]
Coordinates:
    time     (time_dim) datetime64[ns] 59kB ...
    lat      float64 8B ...
    lon      float64 8B ...
    z        float64 8B ...
Dimensions without coordinates: time_dim
Attributes:
    standard_name:   aggregate_quality_flag
    long_name:       Aggregate Quality Flag
    flag_values:     [1 2 3 4 9]
    flag_meanings:   GOOD UNKNOWN SUSPECT FAIL MISSING
    valid_min:       1
    valid_max:       9
    ioos_qc_module:  qartod
    ioos_qc_test:    qc_rollup
    ioos_qc_target:
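The flag_values and flag_meanings attributes make each QC variable self-describing: the n-th value corresponds to the n-th space-separated meaning. As a small sketch, the snippet below decodes flags using a plain dict that mirrors those attributes; the sample flags list is hypothetical, and with a real file you would read the attributes from the variable's attrs instead.

```python
from collections import Counter

# Attributes copied from the QC variables shown above
attrs = {
    "flag_values": [1, 2, 3, 4, 9],
    "flag_meanings": "GOOD UNKNOWN SUSPECT FAIL MISSING",
}

# Pair each numeric flag with its meaning
meaning = dict(zip(attrs["flag_values"], attrs["flag_meanings"].split()))

# Hypothetical rollup flags for five observations
sample_flags = [4, 4, 3, 1, 1]
decoded = [meaning[flag] for flag in sample_flags]
summary = Counter(decoded)

print(decoded)
print(dict(summary))
```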
Option C¶
Store test configurations in your netcdf file, then pass that file to ioos_qc and let it run tests and update the file with results.
In the example above, we used the library to store results and config in the netcdf file itself. At this point, we can load that same file and run tests again, without having to re-define config. This is very powerful!
# Create a copy of the output from B
infile_c = os.path.join(tempfile.gettempdir(), "")
shutil.copy(outfile_b, infile_c)
# Load this file into the Config object
input_c = xr.open_dataset(infile_c)
qc_config_c = Config(input_c)
# The QC functions that will be run are extracted from the netCDF attributes
[<Call stream_id=pco2_seawater function=qartod.gross_range_test(suspect_span=[200, 2400], fail_span=[0, 3000])>,
<Call stream_id=pco2_seawater function=qartod.spike_test(suspect_threshold=500, fail_threshold=1000)>,
<Call stream_id=pco2_seawater function=qartod.location_test(bbox=[-124.5, 44, -123.5, 45])>,
<Call stream_id=pco2_seawater function=qartod.flat_line_test(tolerance=1, suspect_threshold=3600, fail_threshold=86400)>]
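To illustrate how a configuration can be recovered from a file, the sketch below rebuilds a nested config dict from the ioos_qc_module, ioos_qc_test, ioos_qc_target and ioos_qc_config attributes seen on the QC variables in Option B. It works on plain dicts standing in for netCDF variable attributes and is not how the Config class is implemented; the function name config_from_attrs and the variables mapping are hypothetical.

```python
import json

# Stand-in for netCDF variables and their attributes (values taken from Option B)
variables = {
    "pco2_seawater": {},
    "pco2_seawater_qartod_gross_range_test": {
        "ioos_qc_module": "qartod",
        "ioos_qc_test": "gross_range_test",
        "ioos_qc_target": "pco2_seawater",
        "ioos_qc_config": '{"suspect_span": [200, 2400], "fail_span": [0, 3000]}',
    },
    "pco2_seawater_qartod_spike_test": {
        "ioos_qc_module": "qartod",
        "ioos_qc_test": "spike_test",
        "ioos_qc_target": "pco2_seawater",
        "ioos_qc_config": '{"suspect_threshold": 500, "fail_threshold": 1000}',
    },
}


def config_from_attrs(variables):
    """Rebuild {target: {module: {test: kwargs}}} from QC variable attributes."""
    config = {}
    for attrs in variables.values():
        if "ioos_qc_config" not in attrs:
            continue  # a data variable, or a QC variable without stored config
        kwargs = json.loads(attrs["ioos_qc_config"])
        target = config.setdefault(attrs["ioos_qc_target"], {})
        target.setdefault(attrs["ioos_qc_module"], {})[attrs["ioos_qc_test"]] = kwargs
    return config


rebuilt = config_from_attrs(variables)
print(json.dumps(rebuilt, indent=2))
```

The result has the same shape as the dict configuration defined earlier, which is why the file can serve as its own configuration source.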
# We can use that Config just like any other config object.
# Here we will re-run the netCDF file with the config extracted from the same netCDF file
# Setup input stream from file
xrs_c = XarrayStream(input_c, lon="lon", lat="lat")
# Setup config run
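runner_c = list(xrs_c.run(qc_config_c))
# Collect QC run results
results = collect_results(runner_c, how="list")
# Compute Aggregate
agg = CollectedResult(
    stream_id="", package="qartod", test="qc_rollup",
    function=aggregate, results=aggregate(results),
    tinp=xrs_c.time(), data=input_c["pco2_seawater"],
)
results.append(agg)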
store = CFNetCDFStore(runner_c)
outfile_c = os.path.join(tempfile.gettempdir(), "")
if os.path.exists(outfile_c):
    os.remove(outfile_c)
qc_all = store.save(
    # The netCDF file to export OR append to
    outfile_c,
    # The DSG class to save the results as
    OrthogonalMultidimensionalTimeseries,
    # The QC config that was run
    qc_config_c,
    # Should we write the data or just metadata? Defaults to false
    write_data=True,
    # Compute a total aggregate?
    compute_aggregate=True,
    # Any kwargs to pass to the DSG class
    dsg_kwargs=dict(
        reduce_dims=True,  # Remove dimensions of size 1
        unlimited=False,  # Don't make the record dimension unlimited
        unique_dims=True,  # Support loading into xarray
    ),
)
# Explore results: qc test variables are named [variable_name]_qartod_[test_name]
out_c = xr.open_dataset(outfile_c)
<xarray.Dataset> Size: 294kB
Dimensions:                                (time_dim: 7339)
Coordinates:
    time                                   (time_dim) datetime64[ns] 59kB ...
    lat                                    float64 8B ...
    lon                                    float64 8B ...
    z                                      float64 8B ...
Dimensions without coordinates: time_dim
Data variables:
    crs                                    int32 4B ...
    station                                int32 4B ...
    pco2_seawater                          (time_dim) float64 59kB ...
    pco2_seawater_qartod_gross_range_test  (time_dim) float32 29kB ...
    pco2_seawater_qartod_spike_test        (time_dim) float32 29kB ...
    pco2_seawater_qartod_location_test     (time_dim) float32 29kB ...
    pco2_seawater_qartod_flat_line_test    (time_dim) float64 59kB ...
    qartod_qc_rollup                       (time_dim) float32 29kB ...
Attributes:
    Conventions:    CF-1.6
    date_created:   2025-02-04T19:19:00Z
    featureType:    timeseries
    cdm_data_type:  Timeseries
# Aggregate/rollup flag
out_c["qartod_qc_rollup"]
<xarray.DataArray 'qartod_qc_rollup' (time_dim: 7339)> Size: 29kB
[7339 values with dtype=float32]
Coordinates:
    time     (time_dim) datetime64[ns] 59kB ...
    lat      float64 8B ...
    lon      float64 8B ...
    z        float64 8B ...
Dimensions without coordinates: time_dim
Attributes:
    standard_name:   aggregate_quality_flag
    long_name:       Aggregate Quality Flag
    flag_values:     [1 2 3 4 9]
    flag_meanings:   GOOD UNKNOWN SUSPECT FAIL MISSING
    valid_min:       1
    valid_max:       9
    ioos_qc_module:  qartod
    ioos_qc_test:    qc_rollup
    ioos_qc_target: