IOOS GTS Statistics

Created: 2020-10-10

Updated: 2023-06-26

The Global Telecommunication System (GTS) is a coordinated effort for rapid distribution of observations. The GTS monthly reports show the number of messages released to GTS for each station. The reports contain the following fields:

  • location ID: Identifier under which the station's messages are released to the GTS;

  • region: Designated IOOS Regional Association (only for IOOS regional report);

  • sponsor: Organization that owns and maintains the station;

  • Met: Total number of meteorological (met) messages released to the GTS;

  • Wave: Total number of wave messages released to the GTS.

In this notebook we will explore the statistics of the messages IOOS is releasing to GTS.

Using this notebook

  1. Pick the appropriate date range of interest.

  2. Edit the variables start_date and end_date in the cell below to reflect your time period of interest (use YYYY-MM-DD format).

  3. Run all the cells in the notebook.

The first step is to pick the appropriate date range of interest.

start_date = "2021-07-01"
end_date = "2021-10-30"

Now we download the data. We will use the NDBC ioosstats server that hosts the CSV files with the ingest data.

import datetime as dt

import pandas as pd

# example https://www.ndbc.noaa.gov/ioosstats/rpts/2021_03_ioos_regional.csv

start = dt.datetime.strptime(start_date, "%Y-%m-%d")
end = dt.datetime.strptime(end_date, "%Y-%m-%d")

# build a generator of days from start to end (inclusive),
# so the month of end_date is never dropped
date_array = (start + dt.timedelta(days=x) for x in range((end - start).days + 1))

# get a unique list of year-months for url build
months = []
for date_object in date_array:
    months.append(date_object.strftime("%Y-%m"))
months = sorted(set(months))

# download one report per month and tag each row with its month
frames = []
for month in months:
    url = (
        "https://www.ndbc.noaa.gov/ioosstats/rpts/%s_ioos_regional.csv"
        % month.replace("-", "_")
    )
    df1 = pd.read_csv(url, dtype={"met": float, "wave": float})
    df1["time (UTC)"] = pd.to_datetime(month)
    frames.append(df1)

# concatenate once; the per-file row index is kept as is
df = pd.concat(frames)

df.describe()
                met          wave                     time (UTC)
count    695.000000    695.000000                            695
mean    5492.500719   1207.418705  2021-08-16 07:35:49.640287744
min        0.000000      0.000000            2021-07-01 00:00:00
25%        0.000000      0.000000            2021-08-01 00:00:00
50%     2576.000000      0.000000            2021-09-01 00:00:00
75%     8789.000000   1409.000000            2021-09-16 00:00:00
max    17814.000000  17814.000000            2021-10-01 00:00:00
std     5841.635417   2841.529514                            NaN

df["locationID"] = df["locationID"].str.lower()

df["time (UTC)"].unique()
<DatetimeArray>
['2021-07-01 00:00:00', '2021-08-01 00:00:00', '2021-09-01 00:00:00',
 '2021-10-01 00:00:00']
Length: 4, dtype: datetime64[ns]
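
The list of months that drives the download can also be derived without looping over every day. The following is a minimal sketch, assuming the same URL pattern as the example above; `monthly_report_urls` is a hypothetical helper name and is not part of the original notebook.

```python
import pandas as pd


def monthly_report_urls(start_date, end_date):
    """Return one regional-report URL per calendar month in the range.

    Hypothetical helper; the URL pattern is copied from the example above.
    """
    base = "https://www.ndbc.noaa.gov/ioosstats/rpts/{}_ioos_regional.csv"
    # period_range includes both end points and yields one Period per month.
    months = pd.period_range(start=start_date, end=end_date, freq="M")
    return [base.format(month.strftime("%Y_%m")) for month in months]


urls = monthly_report_urls("2021-07-01", "2021-10-30")
for url in urls:
    print(url)
```

Because the range is built from monthly periods, the month containing `end_date` is always included, even when `end_date` falls on the first day of a month.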

The table has all the ingest data. We can now explore it grouping the data by IOOS Regional Association (RA).

groups = df[["met", "wave", "region"]].groupby("region")

ax = groups.sum().plot(kind="bar", figsize=(11, 3.75))
ax.yaxis.get_major_formatter().set_scientific(False)
ax.set_ylabel("# observations")
Text(0, 0.5, '# observations')
[Figure: bar chart of total met and wave messages per IOOS Regional Association; y-axis: # observations]
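
To make the grouping step concrete, here is a minimal sketch on a three-row table. The numbers and the variable names `toy` and `by_region` are made up for illustration; only the column names follow the report fields described earlier.

```python
import pandas as pd

# Made-up values for illustration only; not taken from a real report.
toy = pd.DataFrame(
    {
        "region": ["AOOS", "AOOS", "SECOORA"],
        "met": [8760.0, 8600.0, 0.0],
        "wave": [0.0, 0.0, 2940.0],
    }
)

# Sum the message counts of all stations that belong to the same region.
by_region = toy.groupby("region")[["met", "wave"]].sum()
by_region["total"] = by_region["met"] + by_region["wave"]
print(by_region)
```

Each row of the result corresponds to one bar group in a plot like the one above.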

Let us check the monthly sum of data released both for individual met and wave and the totals.

import pandas as pd

df["time (UTC)"] = pd.to_datetime(df["time (UTC)"])

# Remove time-zone info for easier plotting, it is all UTC.
df["time (UTC)"] = df["time (UTC)"].dt.tz_localize(None)

groups = df.groupby(pd.Grouper(key="time (UTC)", freq="M"))

We can create a table of observations per month,

s = groups[["met", "wave"]].sum()  # reducing the columns so the summary is digestible
totals = s.assign(total=s["met"] + s["wave"])
totals.index = totals.index.to_period("M")

print(f"Monthly totals:\n{totals}\n")

print(
    f"Sum for time period {totals.index.min()} to {totals.index.max()}: {totals['total'].sum()}"
)
Monthly totals:
                  met      wave      total
time (UTC)                                
2021-07     1000690.0  227112.0  1227802.0
2021-08      967746.0  226282.0  1194028.0
2021-09      923172.0  205020.0  1128192.0
2021-10      925680.0  180742.0  1106422.0

Sum for time period 2021-07 to 2021-10: 4656444.0

and visualize it in a bar plot.

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(11, 3.75))

s.plot(ax=ax, kind="bar")
ax.set_xticklabels(
    labels=s.index.to_series().dt.strftime("%Y-%b"),
    rotation=70,
    rotation_mode="anchor",
    ha="right",
)
ax.yaxis.get_major_formatter().set_scientific(False)
ax.set_ylabel("# observations")
Text(0, 0.5, '# observations')
[Figure: bar chart of monthly met and wave message totals, 2021-Jul to 2021-Oct; y-axis: # observations]
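
The monthly aggregation can also be written by grouping on monthly periods. This is a sketch on made-up numbers (the names `toy` and `monthly` are illustrative), offered as an alternative that does not depend on the offset alias passed to `pd.Grouper`; newer pandas versions prefer `"ME"` over `"M"` for month-end offsets.

```python
import pandas as pd

# Made-up values for illustration only.
toy = pd.DataFrame(
    {
        "time (UTC)": pd.to_datetime(["2021-07-01", "2021-07-01", "2021-08-01"]),
        "met": [100.0, 50.0, 70.0],
        "wave": [10.0, 0.0, 5.0],
    }
)

# Convert each timestamp to its calendar month and sum within each month.
month_key = toy["time (UTC)"].dt.to_period("M")
monthly = toy.groupby(month_key)[["met", "wave"]].sum()
monthly["total"] = monthly["met"] + monthly["wave"]
print(monthly)
```

The result is indexed by monthly periods directly, so no separate `to_period` conversion is needed before printing.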

These plots help us understand the RAs' role in the GTS ingest and how much data is released over time. It would be nice to see the totals per buoy on a map.

For that we need the positions of the NDBC buoys. Let's get a table of all the buoys and match it with what we have in the GTS data.

import xml.etree.ElementTree as et

import pandas as pd
import requests


def make_ndbc_table():
    url = "https://www.ndbc.noaa.gov/activestations.xml"
    with requests.get(url) as r:
        elems = et.fromstring(r.content)
    df = pd.DataFrame([elem.attrib for elem in list(elems)])
    df["id"] = df["id"].str.lower()
    return df.set_index("id")


buoys = make_ndbc_table()
buoys["lon"] = buoys["lon"].astype(float)
buoys["lat"] = buoys["lat"].astype(float)

buoys.head()
       lat     lon      elev  name                         owner                                              pgm                     type   met  currents  waterquality  dart  seq
id
0y2w3  44.794  -87.313  179   Sturgeon Bay CG Station, WI  U.S.C.G. Marine Reporting Stations                 IOOS Partners           fixed  n    n         n             n     NaN
13001  12.000  -23.000  0     NE Extension                 Prediction and Research Moored Array in the At...  International Partners  buoy   y    n         n             n     NaN
13002  21.000  -23.000  0     NE Extension                 Prediction and Research Moored Array in the At...  International Partners  buoy   y    n         n             n     NaN
13008  15.000  -38.000  0     Reggae                       Prediction and Research Moored Array in the At...  International Partners  buoy   y    n         n             n     NaN
13009  8.000   -38.000  0     Lambada                      Prediction and Research Moored Array in the At...  International Partners  buoy   n    n         n             n     NaN
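
The parsing step can be tried offline on a small XML snippet. The sketch below is hand-written to mimic a few of the attributes shown in the table above; the element names `stations` and `station` are an assumption about the feed layout, and the real file carries many more stations and attributes.

```python
import xml.etree.ElementTree as et

import pandas as pd

# Hand-written sample; element names are assumed, attribute values are copied
# from the first two rows of the table above.
sample = """<stations>
  <station id="0Y2W3" lat="44.794" lon="-87.313" name="Sturgeon Bay CG Station, WI" type="fixed"/>
  <station id="13001" lat="12.000" lon="-23.000" name="NE Extension" type="buoy"/>
</stations>"""

root = et.fromstring(sample)

# Every child element becomes one row; its XML attributes become the columns.
table = pd.DataFrame([station.attrib for station in root])
table["id"] = table["id"].str.lower()

# XML attributes are always strings, so the coordinates need an explicit cast.
table = table.set_index("id").astype({"lat": float, "lon": float})
print(table)
```

Lower-casing the identifiers matters because the GTS report IDs are lower-cased as well before the two tables are matched.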

For simplicity we will plot the total number of observations per buoy.

df
     locationID  region   sponsor                                             met    wave  time (UTC)
0    46108       AOOS     ALASKA OCEAN OBSERVING SYSTEM                       0.0     0.0  2021-07-01
1    haxa2       AOOS     MARINE EXCHANGE OF ALASKA                        8760.0     0.0  2021-07-01
2    jmla2       AOOS     MARINE EXCHANGE OF ALASKA                        8600.0     0.0  2021-07-01
3    nkla2       AOOS     MARINE EXCHANGE OF ALASKA                        8774.0     0.0  2021-07-01
4    gixa2       AOOS     MARINE EXCHANGE OF ALASKA                        8534.0     0.0  2021-07-01
...  ...         ...      ...                                                 ...     ...  ...
169  ssbn7       SECOORA  COASTAL OCEAN RESEARCH AND MONITORING PROGRAM       0.0  2940.0  2021-10-01
170  41159       SECOORA  COASTAL OCEAN RESEARCH AND MONITORING PROGRAM       0.0  2716.0  2021-10-01
171  sipf1       SECOORA  FLORIDA INSTITUTE OF TECHNOLOGY                     0.0     0.0  2021-10-01
172  42098       SECOORA  GREATER TAMPA BAY MARINE ADVISORY COUNCIL PORTS     0.0  2872.0  2021-10-01
173  44095       SECOORA  UNIVERSITY OF NORTH CAROLINA COASTAL STUDIES        0.0  2916.0  2021-10-01

695 rows × 6 columns

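# total messages per station over the whole period
location_sum = df.groupby("locationID")[["met", "wave"]].sum()

# attach station metadata by matching locationID against the NDBC station id;
# joining (instead of transposing buoys in place) keeps this cell safe to re-run
extra_cols = buoys[["lat", "lon", "type", "pgm", "name"]]
map_df = location_sum.join(extra_cols, how="left")

# keep stations that released at least one message and have a known position
map_df = map_df.loc[map_df["met"] + map_df["wave"] > 0]
map_df = map_df.dropna(subset=["lat", "lon"])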
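
Some location IDs in the GTS reports may have no entry in the active-stations table, in which case they have no position to plot. The sketch below, on made-up data, shows how a left join makes such gaps visible; the station `zzzz9`, the coordinates, and the variable names are all hypothetical.

```python
import pandas as pd

# Per-station totals; "zzzz9" is a made-up ID with no metadata entry.
location_totals = pd.DataFrame(
    {"met": [8760.0, 0.0], "wave": [0.0, 2940.0]},
    index=pd.Index(["haxa2", "zzzz9"], name="locationID"),
)

# Station metadata with made-up coordinates.
stations = pd.DataFrame(
    {"lat": [58.2], "lon": [-134.4]},
    index=pd.Index(["haxa2"], name="id"),
)

# A left join keeps every reported station; missing metadata shows up as NaN.
merged = location_totals.join(stations, how="left")
unmatched = merged[merged["lat"].isna()].index.tolist()
mappable = merged.dropna(subset=["lat", "lon"])

print("No position found for:", unmatched)
print(mappable)
```

Listing the unmatched IDs before dropping them makes it easier to tell whether a station is missing from the map because it released nothing or because its position is unknown.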

Now we can place the stations on a map and attach to each marker a popup with an HTML table of the buoy information and ingest totals.

from ipyleaflet import AwesomeIcon, FullScreenControl, LegendControl, Map, Marker
from ipywidgets import HTML

m = Map(center=(35, -95), zoom=4)
m.add_control(FullScreenControl())

legend = LegendControl(
    {"wave": "#FF0000", "met": "#FFA500", "both": "#008000"},
    name="GTS",
    position="bottomright",
)
m.add_control(legend)


def make_popup(row):
    classes = "table table-striped table-hover table-condensed table-responsive"
    return pd.DataFrame(row[["met", "wave", "type", "name", "pgm"]]).to_html(
        classes=classes
    )


for k, row in map_df.iterrows():
    if (row["met"] + row["wave"]) > 0:
        location = row["lat"], row["lon"]
        if row["met"] == 0:
            color = "red"
        elif row["wave"] == 0:
            color = "orange"
        else:
            color = "green"
        marker = Marker(
            draggable=False,
            icon=AwesomeIcon(name="life-ring", marker_color=color),
            location=location,
        )
        msg = HTML()
        msg.value = make_popup(row)
        marker.popup = msg
        m.add_layer(marker)
m
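
The color rule used in the loop above can be isolated into a small function, which makes it easy to check against the legend without rendering a map. This is a sketch; `marker_color` is a hypothetical helper name and is not part of the original notebook.

```python
def marker_color(met, wave):
    """Return the marker color for a station, or None if it released nothing.

    Mirrors the legend above: red = wave only, orange = met only, green = both.
    """
    if met + wave <= 0:
        return None  # nothing released, so the station is not plotted
    if met == 0:
        return "red"  # wave messages only
    if wave == 0:
        return "orange"  # met messages only
    return "green"  # both met and wave messages


for met, wave in [(0, 2940), (8760, 0), (100, 100), (0, 0)]:
    print(met, wave, marker_color(met, wave))
```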