ioos_qc package
Subpackages
- ioos_qc.config_creator package
CreatorConfig
QcConfigCreator
QcConfigCreator.allowed_stats
QcConfigCreator.allowed_operators
QcConfigCreator._create_test_section()
QcConfigCreator._determine_dataset_years()
QcConfigCreator._get_stats()
QcConfigCreator._get_subset()
QcConfigCreator._load_datasets()
QcConfigCreator._var2var_in_file()
QcConfigCreator.create_config()
QcConfigCreator.var2dataset()
QcVariableConfig
- Submodules
- ioos_qc.config_creator.config_creator module
CreatorConfig
QcConfigCreator
QcConfigCreator.allowed_stats
QcConfigCreator.allowed_operators
QcConfigCreator._create_test_section()
QcConfigCreator._determine_dataset_years()
QcConfigCreator._get_stats()
QcConfigCreator._get_subset()
QcConfigCreator._load_datasets()
QcConfigCreator._var2var_in_file()
QcConfigCreator.create_config()
QcConfigCreator.var2dataset()
QcVariableConfig
to_json()
- ioos_qc.config_creator.fx_parser module
- ioos_qc.config_creator.get_assets module
- ioos_qc.config_creator.make_config module
Submodules
ioos_qc.argo module
Tests based on the ARGO QC manual.
- ioos_qc.argo.pressure_increasing_test(inp)[source]¶
Returns an array of flag values in which each input is flagged as SUSPECT if it does not increase monotonically.
Ref: ARGO QC Manual: 8. Pressure increasing test
- Parameters:
inp – Pressure values as a numeric numpy array or a list of numbers.
- Returns:
A masked array of flag values equal in size to that of the input.
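The check described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: comparing each value against the largest pressure seen so far (rather than only the previous neighbor) is an assumption, and `pressure_increasing_sketch` is a hypothetical name.

```python
GOOD, SUSPECT = 1, 3  # flag values as listed under ioos_qc.qartod.QartodFlags


def pressure_increasing_sketch(pressures):
    """Flag every value that does not exceed the largest pressure seen so far."""
    flags = []
    running_max = None
    for p in pressures:
        if running_max is None or p > running_max:
            flags.append(GOOD)
            running_max = p
        else:
            # Equal or decreasing pressure breaks strict monotonic increase.
            flags.append(SUSPECT)
    return flags


print(pressure_increasing_sketch([1, 2, 2, 5, 4, 6]))  # [1, 1, 3, 1, 3, 1]
```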
- ioos_qc.argo.speed_test(lon, lat, tinp, suspect_threshold, fail_threshold)[source]¶
Checks that the calculated speed between two points is within reasonable bounds.
- This test calculates a speed between subsequent points by
using latitude and longitude to calculate the distance between points
calculating the time difference between those points
checking if distance/time_diff exceeds the given threshold(s)
Missing and masked data is flagged as UNKNOWN.
- If this test fails, it typically means that either a position or time is bad data,
or that a platform is mislabeled.
Ref: ARGO QC Manual: 5. Impossible speed test
- Parameters:
lon (Sequence[Real]) – Longitudes as a numeric numpy array or a list of numbers.
lat (Sequence[Real]) – Latitudes as a numeric numpy array or a list of numbers.
tinp (Sequence[Real]) – Time data as a sequence of datetime objects compatible with pandas DatetimeIndex. This includes numpy datetime64, python datetime objects and pandas Timestamp objects, e.g. pd.DatetimeIndex([datetime.utcnow(), np.datetime64(), pd.Timestamp.now()]). If anything else is passed in, the format is assumed to be seconds since the unix epoch.
suspect_threshold (float) – A float value representing a speed, in meters per second. Speeds exceeding this will be flagged as SUSPECT.
fail_threshold (float) – A float value representing a speed, in meters per second. Speeds exceeding this will be flagged as FAIL.
- Return type:
MaskedArray
- Returns:
A masked array of flag values equal in size to that of the input.
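The three steps listed above (distance, time difference, threshold comparison) can be sketched as follows. This is an illustration under stated assumptions, not the library's code: it uses the haversine formula on a spherical Earth, takes times as seconds since the epoch, assumes strictly increasing times, and flags the first point GOOD because it has no predecessor. The names `haversine_m` and `speed_flags_sketch` are hypothetical.

```python
import math

GOOD, SUSPECT, FAIL = 1, 3, 4  # flag values as listed under QartodFlags
EARTH_RADIUS_M = 6_371_000.0   # mean Earth radius (spherical approximation)


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def speed_flags_sketch(lon, lat, t_seconds, suspect_threshold, fail_threshold):
    """Flag each point by the speed (m/s) needed to reach it from the previous point."""
    flags = [GOOD] * len(lon)  # assumption: the first point is left as GOOD
    for i in range(1, len(lon)):
        distance = haversine_m(lon[i - 1], lat[i - 1], lon[i], lat[i])
        speed = distance / (t_seconds[i] - t_seconds[i - 1])
        if speed > fail_threshold:
            flags[i] = FAIL
        elif speed > suspect_threshold:
            flags[i] = SUSPECT
    return flags


# Roughly 1.1 m/s, 3.3 m/s, then an implausible jump of about one degree in 100 s.
print(speed_flags_sketch([0.0, 0.001, 0.004, 1.0], [0, 0, 0, 0], [0, 100, 200, 300], 2.0, 5.0))
# [1, 1, 3, 4]
```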
ioos_qc.axds module
Tests based on the IOOS QARTOD manuals.
- ioos_qc.axds.span¶
alias of
Span
- ioos_qc.axds.valid_range_test(inp, valid_span, dtype=None, start_inclusive=True, end_inclusive=False)[source]¶
Checks that values are within a min/max range. This is not unlike a qartod.gross_range_test with fail and suspect bounds being equal, except that here we specify the inclusive range that should pass instead of the exclusive bounds which should fail. This also supports datetime-like objects, whereas the qartod.gross_range_test method only supports numerics.
Given a 2-tuple of minimum/maximum values, flag data outside of the given range as FAIL data. Missing and masked data is flagged as UNKNOWN. The first span value is treated as inclusive and the second span value is treated as exclusive. To change this behavior you can use the parameters start_inclusive and end_inclusive.
- Parameters:
inp (Sequence[any]) – Data as a sequence of objects compatible with the valid_span objects
valid_span (Tuple[any, any]) – 2-tuple range; data outside it is flagged as FAIL. Objects should be of equal format to that of the inp parameter as they will be compared without type conversion.
dtype (np.dtype) – Optional. If your data is not already numpy-typed you can specify its dtype here.
start_inclusive (bool) – Optional. If the starting span value should be inclusive (True) or exclusive (False).
end_inclusive (bool) – Optional. If the ending span value should be inclusive (True) or exclusive (False).
- Returns:
A masked array of flag values equal in size to that of the input.
- Return type:
np.ma.core.MaskedArray
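The effect of start_inclusive and end_inclusive can be shown with a small plain-Python sketch. It is not the library's implementation: `valid_range_sketch` is a hypothetical name, and using `None` to stand in for missing data is an assumption made for the example.

```python
from datetime import datetime

GOOD, UNKNOWN, FAIL = 1, 2, 4  # flag values as listed under QartodFlags


def valid_range_sketch(values, valid_span, start_inclusive=True, end_inclusive=False):
    """Flag values inside valid_span as GOOD, outside as FAIL, missing as UNKNOWN."""
    lo, hi = valid_span
    flags = []
    for v in values:
        if v is None:  # assumption: None marks a missing value in this sketch
            flags.append(UNKNOWN)
            continue
        above_lo = v >= lo if start_inclusive else v > lo
        below_hi = v <= hi if end_inclusive else v < hi
        flags.append(GOOD if above_lo and below_hi else FAIL)
    return flags


span = (datetime(2020, 1, 1), datetime(2020, 2, 1))
times = [datetime(2020, 1, 1), datetime(2020, 1, 15), datetime(2020, 2, 1), None]
print(valid_range_sketch(times, span))                      # [1, 1, 4, 2]
print(valid_range_sketch(times, span, end_inclusive=True))  # [1, 1, 1, 2]
```

With the defaults, a value equal to the start of the span passes and a value equal to the end fails.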
ioos_qc.config module
QC Config objects
Module to store the different QC modules in ioos_qc
- ioos_qc.config.tw
The TimeWindow namedtuple definition
- Type:
namedtuple
- class ioos_qc.config.Call(stream_id, call, context=<factory>, attrs=<factory>)[source]
Bases:
object
- property args: tuple
- attrs: dict
- call: partial
- config()[source]
- Return type:
dict
- context: Context
- property func: str
- property is_aggregate: bool
- property kwargs: dict
- property method: str
- property method_path: str
- property module: str
- property region
- run(**passedkwargs)[source]
- stream_id: str
- property window
- class ioos_qc.config.Config(source, version=None, default_stream_key='_stream')[source]
Bases:
object
A class to load any ioos_qc configuration setup into a list of callable objects that will run quality checks. The resulting list of quality checks parsed from a config file can be appended and edited until they are ready to be run. On run the checks are consolidated into an efficient structure for indexing the dataset (stream) it is run against so things like subsetting by time and space only happen once for each test in the same Context.
How the individual checks are collected is up to each individual Stream implementation, this class only pairs various formats and versions of a config into a list of Call objects.
- add(source)[source]
Adds a source of calls to this Config. See extract_calls for information on the types of objects accepted as the source parameter. This changes the internal .calls attribute and returns None.
- Parameters:
source ([any]) – The source of Call objects, this can be a: * Call object * list of Call objects * list of objects with the ‘calls’ attribute * Config object * Object with the ‘calls’ attribute
- Return type:
None
- property aggregate_calls
- property calls
- calls_by_stream_id(stream_id)[source]
- Return type:
List[Call]
- property contexts
Group the calls into context groups and return them
- has(stream_id, method)[source]
- property stream_ids
Return a list of unique stream_ids for the Config
- class ioos_qc.config.Context(window=<factory>, region=None, attrs=<factory>)[source]
Bases:
object
- attrs: dict
- region: GeometryCollection = None
- window: TimeWindow
- class ioos_qc.config.ContextConfig(source)[source]
Bases:
object
A collection of a Region, a TimeWindow and a list of StreamConfig objects
Defines a set of quality checks to run against multiple input streams. This can include a region and a time window to subset any DataStreams by before running checks.
    region: None
    window:
        starting: 2020-01-01T00:00:00Z
        ending: 2020-04-01T00:00:00Z
    streams:
        variable1:    # stream_id
            qartod:   # StreamConfig
                location_test:
                    bbox: [-80, 40, -70, 60]
        variable2:    # stream_id
            qartod:   # StreamConfig
                gross_range_test:
                    suspect_span: [1, 11]
                    fail_span: [0, 12]
- Helper methods exist to run these checks against different inputs:
pandas.DataFrame, dask.DataFrame, netCDF4.Dataset, xarray.Dataset, ERDDAP URL
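To make the nesting of the example configuration concrete, the sketch below writes the same structure as a Python dict and flattens it into one entry per test. It only illustrates the layout (stream_id, then package, then test, then keyword arguments). `list_checks` is a hypothetical helper, not part of ioos_qc, and whether the library accepts this exact dict as a source is not confirmed here.

```python
# The example ContextConfig, written as a nested Python dict.
context_config = {
    "region": None,
    "window": {
        "starting": "2020-01-01T00:00:00Z",
        "ending": "2020-04-01T00:00:00Z",
    },
    "streams": {
        "variable1": {  # stream_id
            "qartod": {"location_test": {"bbox": [-80, 40, -70, 60]}},
        },
        "variable2": {  # stream_id
            "qartod": {
                "gross_range_test": {"suspect_span": [1, 11], "fail_span": [0, 12]},
            },
        },
    },
}


def list_checks(config):
    """Hypothetical helper: flatten streams into (stream_id, package, test, kwargs)."""
    checks = []
    for stream_id, packages in config["streams"].items():
        for package, tests in packages.items():
            for test, kwargs in tests.items():
                checks.append((stream_id, package, test, kwargs))
    return checks


for check in list_checks(context_config):
    print(check)
```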
- config
dict representation of the parsed ContextConfig source
- Type:
odict
- region
A shapely object representing the valid geographic region
- Type:
GeometryCollection
- window
A TimeWindow object representing the valid time period
- Type:
namedtuple
- streams
dict representation of the parsed StreamConfig objects
- Type:
odict
- add(source)[source]
Adds a source of calls to this ContextConfig. See extract_calls for information on the types of objects accepted as the source parameter. This changes the internal .calls attribute and returns None.
- Parameters:
source ([any]) – The source of Call objects, this can be a: * Call object * list of Call objects * list of objects with the ‘calls’ attribute * Config object * Object with the ‘calls’ attribute
- Return type:
None
- property calls
- context
Calls This parses through available checks and selects the actual test functions to run, but doesn’t actually run anything. It just sets up the object to be run later by iterating over the configs.
- class ioos_qc.config.NcQcConfig(*args, **kwargs)[source]
Bases:
Config
- class ioos_qc.config.QcConfig(source, default_stream_key='_stream')[source]
Bases:
Config
- run(**passedkwargs)[source]
- ioos_qc.config.extract_calls(source)[source]
Extracts call objects from a source object
- Parameters:
source ([any]) – The source of Call objects, this can be a: * Call object * list of Call objects * list of objects with the ‘calls’ attribute * NewConfig object * Object with the ‘calls’ attribute
- Returns:
List of extracted Call objects
- Return type:
List[Call]
- ioos_qc.config.tw
alias of
TimeWindow
ioos_qc.gliders module
Deprecated module. Consider using ARGO instead.
ioos_qc.plotting module
ioos_qc.qartod module
Tests based on the IOOS QARTOD manuals.
- class ioos_qc.qartod.ClimatologyConfig(members=None)[source]¶
Bases:
object
Objects to hold the config for a Climatology test
- Parameters:
tspan – 2-tuple range. If period is defined, then this is a numeric range. If period is not defined, then it is a date range.
fspan – (optional) 2-tuple range of valid values. This is passed in as the fail_span to the gross_range_test.
vspan – 2-tuple range of valid values. This is passed in as the suspect_span to the gross_range_test.
zspan – (optional) Vertical (depth) range, in meters positive down
period –
(optional) The unit the tspan argument is in. Defaults to datetime object but can also be any attribute supported by a pandas Timestamp object.
See: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Timestamp.html
Options: * year * week / weekofyear * dayofyear * dayofweek * quarter
- mem¶
alias of
window
- property members¶
- ioos_qc.qartod.FLAGS¶
alias of
QartodFlags
- class ioos_qc.qartod.QartodFlags[source]¶
Bases:
object
Primary flags for QARTOD.
- FAIL = 4¶
- GOOD = 1¶
- MISSING = 9¶
- SUSPECT = 3¶
- UNKNOWN = 2¶
- ioos_qc.qartod.aggregate(results)[source]¶
Runs qartod_compare against all other qartod tests in results.
- Return type:
MaskedArray
- ioos_qc.qartod.attenuated_signal_test(inp, tinp, suspect_threshold, fail_threshold, test_period=None, min_obs=None, min_period=None, check_type='std', *args, **kwargs)[source]¶
Check for near-flat-line conditions using a range or standard deviation.
Missing and masked data is flagged as UNKNOWN.
- Parameters:
inp (Sequence[Real]) – Input data as a numeric numpy array or a list of numbers.
tinp (Sequence[Real]) – Time input data as a numpy array of dtype datetime64.
suspect_threshold (Real) – Any calculated value below this amount will be flagged as SUSPECT. In observations units.
fail_threshold (Real) – Any calculated value below this amount will be flagged as FAIL. In observations units.
test_period (Optional[Real]) – Length of time to test over in seconds [optional]. Otherwise, will test against entire inp.
min_obs (Optional[Real]) – Minimum number of observations in window required to calculate a result [optional]. Otherwise, test will start at beginning of time series. Note: you can specify either min_obs or min_period, but not both.
min_period (Optional[int]) – Minimum number of seconds in test_period required to calculate a result [optional]. Otherwise, test will start at beginning of time series. Note: you can specify either min_obs or min_period, but not both.
check_type (str) – Either ‘std’ (default) or ‘range’, depending on the type of check you wish to perform.
- Return type:
MaskedArray
- Returns:
A masked array of flag values equal in size to that of the input. This array will always contain only a single unique value since all input data is flagged together.
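A minimal sketch of the whole-series case (no test_period) follows, showing how the two check_type options differ and why every point receives the same flag. It is illustrative only: `attenuated_signal_sketch` is a hypothetical name, and using the population standard deviation for 'std' is an assumption.

```python
import statistics

GOOD, SUSPECT, FAIL = 1, 3, 4  # flag values as listed under QartodFlags


def attenuated_signal_sketch(values, suspect_threshold, fail_threshold, check_type="std"):
    """Flag the whole series by its spread; low spread means a near-flat signal."""
    if check_type == "std":
        spread = statistics.pstdev(values)  # assumption: population std deviation
    elif check_type == "range":
        spread = max(values) - min(values)
    else:
        raise ValueError("check_type must be 'std' or 'range'")

    if spread < fail_threshold:
        flag = FAIL
    elif spread < suspect_threshold:
        flag = SUSPECT
    else:
        flag = GOOD
    # All input data is flagged together, so the result holds one unique value.
    return [flag] * len(values)


data = [10.0, 10.1, 9.9, 10.0]
print(attenuated_signal_sketch(data, 0.5, 0.1, check_type="range"))  # [3, 3, 3, 3]
print(attenuated_signal_sketch(data, 0.5, 0.1, check_type="std"))    # [4, 4, 4, 4]
```

The same data can receive different flags under the two check types because the range (about 0.2) is larger than the standard deviation (about 0.07).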
- ioos_qc.qartod.climatology_test(config, inp, tinp, zinp)[source]¶
Checks that values are within reasonable range bounds and flags as SUSPECT.
Data for which no ClimatologyConfig member exists is marked as UNKNOWN.
- Parameters:
config (Union[ClimatologyConfig, Sequence[Dict[str, Tuple]]]) – A ClimatologyConfig object or a list of dicts containing tuples that can be used to create a ClimatologyConfig object. See ClimatologyConfig docs for more info.
inp – Input data as a numeric numpy array or a list of numbers.
tinp (Sequence[Real]) – Time data as a sequence of datetime objects compatible with pandas DatetimeIndex. This includes numpy datetime64, python datetime objects and pandas Timestamp objects, e.g. pd.DatetimeIndex([datetime.utcnow(), np.datetime64(), pd.Timestamp.now()]). If anything else is passed in, the format is assumed to be seconds since the unix epoch.
zinp (Sequence[Real]) – Z (depth) data, in meters positive down, as a numeric numpy array or a list of numbers.
- Return type:
MaskedArray
- Returns:
A masked array of flag values equal in size to that of the input.
- ioos_qc.qartod.density_inversion_test(inp, zinp, suspect_threshold=None, fail_threshold=None)[source]¶
With few exceptions, potential water density will increase with increasing pressure. When vertical profile data is obtained, this test is used to flag as failed T, C, and SP observations, which yield densities that do not sufficiently increase with pressure. A small operator-selected density threshold (DT) allows for micro-turbulent exceptions. This test can be run on downcasts, upcasts, or down/up cast results produced in real time.
Both Temperature and Salinity should be flagged based on the result of this test.
Ref: Manual for Real-Time Quality Control of in-situ Temperature and Salinity Data, Version 2.0, January 2016
- Parameters:
inp (Sequence[Real]) – Potential density values as a numeric numpy array or a list of numbers.
zinp (Sequence[Real]) – Corresponding depth/pressure values for each density.
suspect_threshold (Optional[float]) – A float value representing a maximum potential density (or sigma0) variation to be tolerated; downward density variation exceeding this will be flagged as SUSPECT.
fail_threshold (Optional[float]) – A float value representing a maximum potential density (or sigma0) variation to be tolerated; downward density variation exceeding this will be flagged as FAIL.
- Return type:
MaskedArray
- Returns:
A masked array of flag values equal in size to that of the input.
- ioos_qc.qartod.flat_line_test(inp, tinp, suspect_threshold, fail_threshold, tolerance=0)[source]¶
Check for consecutively repeated values within a tolerance. Missing and masked data is flagged as UNKNOWN. More information: https://github.com/ioos/ioos_qc/pull/11
- Parameters:
inp (Sequence[Real]) – Input data as a numeric numpy array or a list of numbers.
tinp (Sequence[Real]) – Time data as a sequence of datetime objects compatible with pandas DatetimeIndex. This includes numpy datetime64, python datetime objects and pandas Timestamp objects, e.g. pd.DatetimeIndex([datetime.utcnow(), np.datetime64(), pd.Timestamp.now()]). If anything else is passed in, the format is assumed to be seconds since the unix epoch.
suspect_threshold (int) – The number of seconds within tolerance to allow before being flagged as SUSPECT.
fail_threshold (int) – The number of seconds within tolerance to allow before being flagged as FAIL.
tolerance (Real) – The tolerance that should be exceeded between consecutive values. To determine if the current point n should be flagged, we use a rolling window, with endpoint at point n, and calculate the range of values in the window. If that range is less than tolerance, then the point is flagged.
- Return type:
MaskedArray
- Returns:
A masked array of flag values equal in size to that of the input.
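The rolling-window idea in the tolerance description can be sketched as below. This is an illustration, not the library's algorithm. It makes these assumptions: times are seconds since the epoch, a window is only evaluated once the series covers its full length, and the comparison is `<=` so that the default tolerance of 0 still catches exactly repeated values. `flat_line_sketch` and `_is_flat` are hypothetical names.

```python
GOOD, SUSPECT, FAIL = 1, 3, 4  # flag values as listed under QartodFlags


def _is_flat(values, times, n, window_seconds, tolerance):
    """True if the values in the window ending at point n vary by at most tolerance."""
    start = times[n] - window_seconds
    if times[0] > start:
        return False  # assumption: not enough history to fill the window
    window = [v for v, t in zip(values[: n + 1], times[: n + 1]) if t >= start]
    return max(window) - min(window) <= tolerance


def flat_line_sketch(values, times, suspect_threshold, fail_threshold, tolerance=0):
    """Flag points that end a run of near-constant values lasting long enough."""
    flags = []
    for n in range(len(values)):
        if _is_flat(values, times, n, fail_threshold, tolerance):
            flags.append(FAIL)
        elif _is_flat(values, times, n, suspect_threshold, tolerance):
            flags.append(SUSPECT)
        else:
            flags.append(GOOD)
    return flags


values = [1, 2, 2, 2, 2, 2, 3]
times = [0, 10, 20, 30, 40, 50, 60]  # seconds
print(flat_line_sketch(values, times, suspect_threshold=20, fail_threshold=40))
# [1, 1, 1, 3, 3, 4, 1]
```

The value 2 is first flagged SUSPECT once it has repeated for 20 seconds and FAIL once it has repeated for 40 seconds.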
- ioos_qc.qartod.gross_range_test(inp, fail_span, suspect_span=None)[source]¶
Checks that values are within reasonable range bounds.
Given a 2-tuple of minimum/maximum values, flag data outside of the given range as FAIL data. Optionally also flag data which falls outside of a user defined range as SUSPECT. Missing and masked data is flagged as UNKNOWN.
- Parameters:
inp (Sequence[Real]) – Input data as a numeric numpy array or a list of numbers.
fail_span (Tuple[Real, Real]) – 2-tuple range which to flag outside data as FAIL.
suspect_span (Optional[Tuple[Real, Real]]) – 2-tuple range which to flag outside data as SUSPECT. [optional]
- Return type:
MaskedArray
- Returns:
A masked array of flag values equal in size to that of the input.
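The two-level range logic can be sketched in a few lines of plain Python. This is a simplified illustration, not the library's implementation: `gross_range_sketch` is a hypothetical name, `None` stands in for missing data, and treating values exactly on a span boundary as inside the span is an assumption.

```python
GOOD, UNKNOWN, SUSPECT, FAIL = 1, 2, 3, 4  # flag values as listed under QartodFlags


def gross_range_sketch(values, fail_span, suspect_span=None):
    """FAIL outside fail_span, SUSPECT outside suspect_span, otherwise GOOD."""
    flags = []
    for v in values:
        if v is None:  # assumption: None marks a missing value in this sketch
            flags.append(UNKNOWN)
        elif v < fail_span[0] or v > fail_span[1]:
            flags.append(FAIL)
        elif suspect_span is not None and (v < suspect_span[0] or v > suspect_span[1]):
            flags.append(SUSPECT)
        else:
            flags.append(GOOD)
    return flags


print(gross_range_sketch([-1, 0.5, 5, 11.5, 13, None], fail_span=(0, 12), suspect_span=(1, 11)))
# [4, 3, 1, 3, 4, 2]
```

The FAIL check runs first, so a value outside both spans is reported as FAIL rather than SUSPECT.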
- ioos_qc.qartod.location_test(lon, lat, bbox=(-180, -90, 180, 90), range_max=None)[source]¶
Checks that a location is within reasonable bounds.
Checks that longitude and latitude are within reasonable bounds defaulting to lon = [-180, 180] and lat = [-90, 90]. Optionally, check for a maximum range parameter in great circle distance defaulting to meters which can also use a unit from the quantities library. Missing and masked data is flagged as UNKNOWN.
- Parameters:
lon (Sequence[Real]) – Longitudes as a numeric numpy array or a list of numbers.
lat (Sequence[Real]) – Latitudes as a numeric numpy array or a list of numbers.
bbox (Tuple[Real, Real, Real, Real]) – A length 4 tuple expressed in (minx, miny, maxx, maxy) [optional].
range_max (Optional[Real]) – Maximum allowed range expressed in geodesic curve distance (meters).
- Return type:
MaskedArray
- Returns:
A masked array of flag values equal in size to that of the input.
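The bounding-box part of this check can be sketched as follows. The sketch leaves out range_max entirely. Flagging out-of-box points as FAIL is an assumption, since the text above does not name the flag, and `location_sketch` is a hypothetical name.

```python
GOOD, UNKNOWN, FAIL = 1, 2, 4  # flag values as listed under QartodFlags


def location_sketch(lon, lat, bbox=(-180, -90, 180, 90)):
    """Flag each lon/lat pair by whether it falls inside (minx, miny, maxx, maxy)."""
    minx, miny, maxx, maxy = bbox
    flags = []
    for x, y in zip(lon, lat):
        if x is None or y is None:  # assumption: None marks a missing coordinate
            flags.append(UNKNOWN)
        elif minx <= x <= maxx and miny <= y <= maxy:
            flags.append(GOOD)
        else:
            flags.append(FAIL)  # assumption: out-of-box points are FAIL
    return flags


print(location_sketch([-75, -85, None], [45, 45, 50], bbox=(-80, 40, -70, 60)))
# [1, 4, 2]
```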
- ioos_qc.qartod.qartod_compare(vectors)[source]¶
Aggregates an array of flags by precedence into a single array.
- Parameters:
vectors (Sequence[Sequence[Real]]) – An array of uniform length arrays representing individual flags.
- Return type:
MaskedArray
- Returns:
A masked array of aggregated flag data.
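Aggregating by precedence means that, at each position, the flag with the highest rank among all input vectors wins. The sketch below makes the ranking an explicit parameter. The default order (MISSING lowest, FAIL highest) is an assumption for illustration and should be checked against the library source. `compare_sketch` is a hypothetical name.

```python
GOOD, UNKNOWN, SUSPECT, FAIL, MISSING = 1, 2, 3, 4, 9  # values from QartodFlags

# Assumed precedence, lowest to highest. Not confirmed by the text above.
ASSUMED_PRECEDENCE = (MISSING, UNKNOWN, GOOD, SUSPECT, FAIL)


def compare_sketch(vectors, precedence=ASSUMED_PRECEDENCE):
    """Collapse several equal-length flag vectors into one, position by position."""
    if len({len(v) for v in vectors}) != 1:
        raise ValueError("all flag vectors must have the same length")
    rank = {flag: i for i, flag in enumerate(precedence)}
    # For each position, keep the flag whose rank is highest.
    return [max(column, key=rank.__getitem__) for column in zip(*vectors)]


flags_a = [GOOD, GOOD, UNKNOWN, MISSING]
flags_b = [GOOD, SUSPECT, GOOD, MISSING]
flags_c = [FAIL, GOOD, GOOD, UNKNOWN]
print(compare_sketch([flags_a, flags_b, flags_c]))  # [4, 3, 1, 2]
```

Ranking by an explicit order rather than by numeric value matters because MISSING (9) is numerically the largest flag but, under this assumed order, the least important.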
- ioos_qc.qartod.rate_of_change_test(inp, tinp, threshold)[source]¶
Checks the first order difference of a series of values to see if there are any values exceeding a threshold defined by the inputs. These are then marked as SUSPECT. It is up to the test operator to determine an appropriate threshold value for the absolute difference not to exceed. Threshold is expressed as a rate in observations units per second. Missing and masked data is flagged as UNKNOWN.
- Parameters:
inp (Sequence[Real]) – Input data as a numeric numpy array or a list of numbers.
tinp (Sequence[Real]) – Time data as a sequence of datetime objects compatible with pandas DatetimeIndex. This includes numpy datetime64, python datetime objects and pandas Timestamp objects, e.g. pd.DatetimeIndex([datetime.utcnow(), np.datetime64(), pd.Timestamp.now()]). If anything else is passed in, the format is assumed to be seconds since the unix epoch.
threshold (float) – A float value representing a rate of change over time, in observation units per second.
- Return type:
MaskedArray
- Returns:
A masked array of flag values equal in size to that of the input.
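The first-order difference check can be sketched as below. It is a simplified illustration: times are taken as seconds since the epoch and assumed strictly increasing, the first point is left as GOOD because it has no predecessor (an assumption), and `rate_of_change_sketch` is a hypothetical name.

```python
GOOD, SUSPECT = 1, 3  # flag values as listed under QartodFlags


def rate_of_change_sketch(values, t_seconds, threshold):
    """Flag a point SUSPECT when the absolute change per second from the previous point exceeds threshold."""
    flags = [GOOD] * len(values)  # assumption: the first point stays GOOD
    for i in range(1, len(values)):
        rate = abs(values[i] - values[i - 1]) / (t_seconds[i] - t_seconds[i - 1])
        if rate > threshold:
            flags[i] = SUSPECT
    return flags


# Rates are 0.1, 0.9 and 0.1 units per second; only the middle step exceeds 0.5.
print(rate_of_change_sketch([10, 11, 20, 21], [0, 10, 20, 30], threshold=0.5))
# [1, 1, 3, 1]
```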
- ioos_qc.qartod.span¶
alias of
Span
- ioos_qc.qartod.spike_test(inp, suspect_threshold=None, fail_threshold=None, method='average')[source]¶
Check for spikes by checking neighboring data against thresholds
Determine if there is a spike at data point n-1 by subtracting the midpoint of n and n-2 and taking the absolute value of this quantity, and checking if it exceeds a low or high threshold (default). Values which do not exceed either threshold are flagged GOOD, values which exceed the low threshold are flagged SUSPECT, and values which exceed the high threshold are flagged FAIL. Missing and masked data is flagged as UNKNOWN.
- Parameters:
inp (Sequence[Real]) – Input data as a numeric numpy array or a list of numbers.
suspect_threshold (Optional[Real]) – The SUSPECT threshold value, in observations units.
fail_threshold (Optional[Real]) – The FAIL threshold value, in observations units.
method (str) – [‘average’ (default), ‘differential’] Optional input to assign the method used to detect spikes.
* “average”: Determine if there is a spike at data point n-1 by subtracting the midpoint of n and n-2 and taking the absolute value of this quantity, and checking if it exceeds a low or high threshold.
* “differential”: Determine if there is a spike at data point n by calculating the difference between n and n-1 and the difference between n+1 and n. To be considered a spike, (n - n-1)*(n+1 - n) should be smaller than zero (the two differences are in opposite directions).
- Return type:
MaskedArray
- Returns:
A masked array of flag values equal in size to that of the input.
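The "average" method can be sketched in plain Python as follows. It is not the library's implementation: flagging the two endpoints as UNKNOWN (they lack a neighbor on one side) is an assumption, and `spike_average_sketch` is a hypothetical name.

```python
GOOD, UNKNOWN, SUSPECT, FAIL = 1, 2, 3, 4  # flag values as listed under QartodFlags


def spike_average_sketch(values, suspect_threshold, fail_threshold):
    """Compare each interior point with the midpoint of its two neighbors."""
    flags = [UNKNOWN] * len(values)  # assumption: endpoints cannot be evaluated
    for i in range(1, len(values) - 1):
        midpoint = (values[i - 1] + values[i + 1]) / 2
        deviation = abs(values[i] - midpoint)
        if deviation > fail_threshold:
            flags[i] = FAIL
        elif deviation > suspect_threshold:
            flags[i] = SUSPECT
        else:
            flags[i] = GOOD
    return flags


# The middle point deviates by 5 from its neighbors' midpoint; the points next to it by 2.5.
print(spike_average_sketch([10, 10, 15, 10, 10], suspect_threshold=3, fail_threshold=8))
# [2, 1, 3, 1, 2]
```

Note that the neighbors of a spike also deviate from their own midpoints (by 2.5 here), so thresholds set too low will flag them as well.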
ioos_qc.results module
- class ioos_qc.results.CallResult(package, test, function, results)[source]¶
Bases:
NamedTuple
- function: callable
Alias for field number 2
- package: str
Alias for field number 0
- results: ndarray
Alias for field number 3
- test: str
Alias for field number 1
- class ioos_qc.results.CollectedResult(stream_id, package, test, function, results=None, data=None, tinp=None, zinp=None, lat=None, lon=None)[source]¶
Bases:
object
- data: ndarray = None
- function: callable
- property hash_key: str
- lat: ndarray = None
- lon: ndarray = None
- package: str
- results: MaskedArray = None
- stream_id: str
- test: str
- tinp: ndarray = None
- zinp: ndarray = None
- class ioos_qc.results.ContextResult(stream_id, results, subset_indexes, data, tinp, zinp, lat, lon)[source]¶
Bases:
NamedTuple
- data: ndarray
Alias for field number 3
- lat: ndarray
Alias for field number 6
- lon: ndarray
Alias for field number 7
- results: List[CallResult]
Alias for field number 1
- stream_id: str
Alias for field number 0
- subset_indexes: ndarray
Alias for field number 2
- tinp: ndarray
Alias for field number 4
- zinp: ndarray
Alias for field number 5
- ioos_qc.results.collect_results_dict(results)[source]¶
Turns a list of ContextResult objects into a dictionary of test results by combining the subset_index information in each ContextResult together into a single array of results. This is mostly here for historical purposes. Users should migrate to using the Result objects directly.
ioos_qc.stores module
- class ioos_qc.stores.BaseStore[source]¶
Bases:
object
- save(*args, **kwargs)[source]¶
Serialize results to a store. This could save a file or publish messages.
- property stream_ids: List[str]¶
A list of stream_ids to save to the store
- class ioos_qc.stores.CFNetCDFStore(results, axes=None, **kwargs)[source]¶
Bases:
BaseStore
- save(path_or_ncd, dsg, config, dsg_kwargs={}, write_data=False, include=None, exclude=None, compute_aggregate=False)[source]¶
Serialize results to a store. This could save a file or publish messages.
- property stream_ids: List[str]¶
A list of stream_ids to save to the store
- class ioos_qc.stores.NetcdfStore[source]¶
Bases:
object
- save(path_or_ncd, config, results)[source]¶
Updates the given netcdf with test configuration and results. If there is already a variable for a given test, it will update that variable with the latest results. Otherwise, it will create a new variable.
- Parameters:
path_or_ncd – path or netcdf4 Dataset in which to store results
results – output of run()
- class ioos_qc.stores.PandasStore(results, axes=None)[source]¶
Bases:
BaseStore
Store results in a dataframe
- compute_aggregate(name='rollup')[source]¶
Internally compute the total aggregate and add it to the results
- save(write_data=False, write_axes=True, include=None, exclude=None)[source]¶
Serialize results to a store. This could save a file or publish messages.
- Return type:
DataFrame
- property stream_ids: List[str]¶
A list of stream_ids to save to the store
ioos_qc.streams module
- class ioos_qc.streams.BaseStream(*args, **kwargs)[source]¶
Bases:
object
Each stream should define how to return a list of datastreams along with their time and depth association. Each of these streams will be passed through quality control configurations and returned back to it. Each stream also needs to define what to do with the results (how to store them).
- data(stream_id)[source]¶
Return the data array from the source dataset based on stream_id. This is useful when plotting QC results.
- class ioos_qc.streams.NetcdfStream(path_or_ncd, time=None, z=None, lat=None, lon=None, geom=None)[source]¶
Bases:
object
- class ioos_qc.streams.NumpyStream(inp=None, time=None, z=None, lat=None, lon=None, geom=None)[source]¶
Bases:
object
ioos_qc.utils module
- class ioos_qc.utils.GeoNumpyDateEncoder(skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, encoding='utf-8', default=None, use_decimal=True, namedtuple_as_object=True, tuple_as_array=True, bigint_as_string=False, item_sort_key=None, for_json=False, ignore_nan=False, int_as_string_bitcount=None, iterable_as_array=False)[source]¶
Bases:
GeoJSONEncoder
- ioos_qc.utils.check_timestamps(times, max_time_interval=None)[source]¶
Sanity checks for timestamp arrays
Checks that the times supplied are in monotonically increasing chronological order, and optionally that time intervals between measurements do not exceed a value max_time_interval. Note that this is not a QARTOD test, but rather a utility test to make sure times are in the proper order and optionally do not have large gaps prior to processing the data.
- Parameters:
times (ndarray) – Input array of timestamps
max_time_interval (Optional[Real]) – The interval between values should not exceed this value. [optional]
- Return type:
bool
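The two conditions described for check_timestamps can be sketched as follows. This is an illustration, not the library's code: times are taken as plain numbers (for example, seconds since the epoch), equal consecutive timestamps are treated as being in order (an assumption), and `check_timestamps_sketch` is a hypothetical name.

```python
def check_timestamps_sketch(times, max_time_interval=None):
    """Return True if times never go backwards and no gap exceeds max_time_interval."""
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    if any(gap < 0 for gap in gaps):
        return False  # timestamps are out of chronological order
    if max_time_interval is not None and any(gap > max_time_interval for gap in gaps):
        return False  # at least one gap between measurements is too large
    return True


print(check_timestamps_sketch([0, 10, 20]))                          # True
print(check_timestamps_sketch([0, 20, 10]))                          # False
print(check_timestamps_sketch([0, 10, 100], max_time_interval=60))   # False
```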