Controlled Vocabulary

Last updated on 2025-11-02

Overview

Questions

  • What is controlled vocabulary?
  • Why should we use it?
  • How do we choose and use controlled vocabulary in our data?

Objectives

  • Understand the purpose of controlled vocabulary
  • Understand the basics of how to find and select a vocabulary term for the eMoF table
  • Learn how to populate measurementTypeID, measurementValueID, and measurementUnitID

Introduction


Controlled vocabulary is simply the use of a standardized and defined set of terms to populate or describe data. This could include specific data column names, a set of specific terms within a field, and the use of identifiers that point to a defined term. For the purposes of this workshop, we focus on using controlled vocabulary as identifiers in the extendedMeasurementOrFact (eMoF) extension table.

As we have learned in Episode 4: Darwin Core and Extension Schemas, the eMoF terms measurementType, measurementValue, and measurementUnit are completely unconstrained and can be populated with free text. While free text offers the advantage of capturing complex and unclassified information, there is inevitable semantic heterogeneity (e.g., of spelling, wording, or language) that becomes a challenge for effective data interoperability and analysis.

For example, if you were interested in finding all records related to weight measurements, you would have to try to account for all the different ways “weight” was recorded by data providers (weight, wgt, Weight, wet weight, dry weight, etc.). This is where using controlled vocabulary in the measurementTypeID, measurementValueID, and measurementUnitID fields is helpful! These three identifier fields are used to standardize measurement types, values, and units. They are populated with Uniform Resource Identifiers (URIs) that provide clear and unambiguous definitions of the data.

Using URIs means that if you call the measurementType “abundance” but I call it “Abundance per square meter”, and we both use the measurementTypeID “http://vocab.nerc.ac.uk/collection/P01/current/SDBIOL02/”, then we know this is the same measurement type even though we did not use the same words to describe it. Choosing the right URI can be difficult, but this page provides some basic guidance. More details on controlled vocabulary and finding codes can be found in the OBIS Manual.
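As a minimal sketch of this idea, the Python snippet below matches two rows with different free-text labels by their shared measurementTypeID. The rows, the provider labels, and the helper rows_with_type_id are made up for illustration; only the URI comes from the paragraph above.

```python
# Hypothetical eMoF rows from three providers; only the URI is from the lesson text.
ABUNDANCE_URI = "http://vocab.nerc.ac.uk/collection/P01/current/SDBIOL02/"

emof_rows = [
    {"provider": "A", "measurementType": "abundance",
     "measurementTypeID": ABUNDANCE_URI, "measurementValue": "12"},
    {"provider": "B", "measurementType": "Abundance per square meter",
     "measurementTypeID": ABUNDANCE_URI, "measurementValue": "7"},
    {"provider": "C", "measurementType": "wet weight",
     "measurementTypeID": "", "measurementValue": "3.4"},
]


def rows_with_type_id(rows, type_id):
    """Return the rows whose measurementTypeID equals type_id exactly."""
    return [row for row in rows if row["measurementTypeID"] == type_id]


matches = rows_with_type_id(emof_rows, ABUNDANCE_URI)
labels = sorted({row["measurementType"] for row in matches})
print(len(matches), labels)
```

Both abundance rows are returned even though their measurementType text differs, while the row without an identifier is left out.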

Why use controlled vocabulary?


Using controlled vocabulary will:

  • ensure datasets are consistently documented
  • simplify data aggregation and analysis of measurements
  • decrease the potential that data will be misunderstood or misused

Controlled vocabularies are also important to ensure data are interoperable - readable by both humans and machines - and that the information is presented in an unambiguous manner. Vocabulary collections like the NERC Vocabulary Server (NVS) compile vocabularies from different institutions and authorities (e.g., ISO, ICES, EUNIS), allowing you to map your data to them. OBIS recommends selecting terms from NVS whenever possible, but other vocabulary collections may be used (e.g. ICES, EcoPortal, ENVO, etc.) as long as the URIs provided are machine readable and interoperable.

When we all correctly use controlled vocabulary with our measurement data, we can search for a single measurementTypeID and obtain all related records, regardless of differences in wording or language used in the data.

Callout

📌 Tip

You can search for measurementTypes that other OBIS data providers have used by using the OBIS Mof Viewer. Be careful when using this tool: only reuse a URI for your data if its definition exactly matches your measurement type.

How to select controlled vocabulary?


Where do we start when selecting controlled vocabularies? As mentioned above, OBIS strongly recommends selecting terms from the NERC Vocabulary Server. If your institution already recommends a machine-readable vocabulary collection, it is okay to continue using that, as long as the URIs are publicly accessible and machine-readable.

When selecting a vocabulary term keep in mind these general principles:

  • machine operable (URI, IRI)
  • human readable
  • clear, unambiguous definitions
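The first principle can be partly checked by code. The sketch below is only a rough syntax check written for this lesson (looks_like_http_uri is a hypothetical helper, not an OBIS tool): it tests whether a value has the shape of an http(s) URI. It does not confirm that the URI resolves, nor that its definition is clear.

```python
from urllib.parse import urlparse


def looks_like_http_uri(value):
    """Rough syntax check: the value has an http(s) scheme and a host."""
    if not isinstance(value, str):
        return False
    parts = urlparse(value.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


candidates = [
    "http://vocab.nerc.ac.uk/collection/P01/current/SDBIOL02/",
    "SDBIOL02",   # a bare code is not a URI
    "abundance",  # free text is not a URI
]
for candidate in candidates:
    print(candidate, looks_like_http_uri(candidate))
```

A check like this can catch bare codes or free text pasted into an ID column before the data are published.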

OBIS has created a decision tree and has released video tutorials to help with choosing URIs. While we won’t go over the decision tree in detail here, you can watch the video series for additional assistance.

For now, let’s break it down for each of the measurement ID columns in the eMoF, starting with the “simplest” column.

measurementUnitID

The measurementUnitID field is the easiest measurement ID field to populate. It provides a URI for the unit associated with the value in measurementValue (e.g. cm, kg, kg/m^2). OBIS recommends populating this field with terms from the NVS P06 collection, BODC-approved data storage units. See the screenshot below for an example vocabulary term from this collection, and where to find the URI on the page.

Screenshot of a unit within the NVS P06 collection. Box highlights the URI to be used in measurementUnitID field.

To search this collection, see https://www.bodc.ac.uk/resources/vocabularies/vocabulary_search/P06/.

Other examples for measurementUnit and associated measurementUnitID include:

measurementValueID

The measurementValueID field is used to provide an identifying code when measurementValue is populated with a non-numerical value (e.g. sampling-related information, or a sex or life stage designation).

Callout

measurementValueID is NOT used for standardizing numeric measurements!
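The rule above can be turned into a simple check. The sketch below is an illustration only: the rows and the example.org identifiers are placeholders, and is_numeric and misused_value_ids are hypothetical helpers. It flags rows where a numeric measurementValue has been given a measurementValueID.

```python
def is_numeric(text):
    """True if text parses as a float (e.g. "3.4", "-2", "1e3")."""
    try:
        float(text)
    except (TypeError, ValueError):
        return False
    return True


def misused_value_ids(rows):
    """Return rows that pair a numeric measurementValue with a measurementValueID."""
    return [
        row for row in rows
        if is_numeric(row.get("measurementValue")) and row.get("measurementValueID")
    ]


# Placeholder rows; the example.org URIs are not real vocabulary terms.
rows = [
    {"measurementType": "sex", "measurementValue": "female",
     "measurementValueID": "https://example.org/vocab/female"},
    {"measurementType": "length", "measurementValue": "12.5",
     "measurementValueID": "https://example.org/vocab/12.5"},
    {"measurementType": "length", "measurementValue": "9.1",
     "measurementValueID": ""},
]

flagged = misused_value_ids(rows)
print([row["measurementValue"] for row in flagged])
```

Only the second row is flagged: its value is numeric, so its measurementValueID should be left empty.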

Unlike measurementUnitID, measurementValueID terms can come from several NVS collections. Which collection to search depends on the type of measurementValue you have. See the table below for some common, non-exhaustive examples. Note that when documenting behaviour values in the eMoF, OBIS recommends using codes from the ICES Vocabulary Server.

Type of measurementValue                                        | Vocabulary Collection
Sex (gender)                                                    | S10
Lifestage                                                       | S11
Sampling instruments and sensors (SeaVoX Device Catalogue)      | L22
Sampling instrument categories (SeaDataNet device categories)   | L05
Vessels (ICES Platform Codes)                                   | C17
European Nature Information System Level 3 Habitats             | C35
Behaviour                                                       | ICES Behaviour collection
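The table can also be kept as a small lookup. The sketch below makes several assumptions: the dictionary keys are shortened labels invented here, nvs_collection_uri is a hypothetical helper, and the collection URI pattern is inferred from the P01 example URI earlier on this page, so confirm each link on NVS before relying on it.

```python
# Collection codes copied from the table above. Behaviour is omitted because
# the lesson points to the ICES Vocabulary Server for it, not to NVS.
NVS_COLLECTIONS = {
    "sex": "S10",
    "lifestage": "S11",
    "sampling instrument": "L22",
    "sampling instrument category": "L05",
    "vessel": "C17",
    "EUNIS level 3 habitat": "C35",
}


def nvs_collection_uri(value_type):
    """Build a collection URI, assuming the /collection/<code>/current/ pattern."""
    code = NVS_COLLECTIONS[value_type]  # raises KeyError for unknown types
    return f"http://vocab.nerc.ac.uk/collection/{code}/current/"


print(nvs_collection_uri("sex"))
```

A lookup like this only tells you where to search; the individual term still has to be chosen by reading its definition.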

You may also populate measurementValueID with references to papers or manuals that document, for example, the sampling protocol used to obtain the measurement. To do this you should use either:

measurementTypeID

This is likely the most important, yet most difficult, column to populate. It is the field that provides the URI defining your measurement or fact, and it is the field most likely to be used to filter and aggregate measurement data in OBIS. It is therefore important to select the vocabulary term that best defines all important elements of your measurement. Important elements may include:

  • the property or attribute being measured
  • the object or entity of interest (e.g. a chemical, a biological organism)
  • the environmental context of the measurement (e.g. per unit volume, per unit area)
  • specific methods required to interpret, understand, or contextualize the measurement (e.g. optical microscopy, filtration, computation, correction algorithms)

Think about what information a user would need to know in order to reuse your measurement data, then try to find a vocabulary term that includes all of those elements. Not every element listed above needs to be part of your vocabulary term.

OBIS recommends selecting a vocabulary term from the NVS P01 collection. This is because the terms in this collection are built on a semantic structure that allows the elements listed above to be directly included in the vocabulary term. While you can search for terms in this collection at https://vocab.nerc.ac.uk/search_nvs/P01/, we strongly suggest using the SeaDataNet Facet Search, which is more user-friendly.
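Once a term is chosen, it can help to confirm that the URI in measurementTypeID really points at P01. The sketch below is one way to do that, assuming the path layout /collection/<collection>/current/<term>/ seen in the SDBIOL02 example earlier on this page. The helpers parse_nvs_uri and is_p01_term are hypothetical names, not part of any OBIS or NVS library.

```python
from urllib.parse import urlparse


def parse_nvs_uri(uri):
    """Split an NVS term URI into (collection, term_code), or return None.

    Assumes the layout /collection/<collection>/current/<term>/; any other
    layout, or any other host, is reported as None.
    """
    parts = urlparse(uri)
    if parts.netloc != "vocab.nerc.ac.uk":
        return None
    segments = [segment for segment in parts.path.split("/") if segment]
    if (len(segments) == 4 and segments[0] == "collection"
            and segments[2] == "current"):
        return segments[1], segments[3]
    return None


def is_p01_term(uri):
    """True if the URI parses as an NVS term in the P01 collection."""
    parsed = parse_nvs_uri(uri)
    return parsed is not None and parsed[0] == "P01"


uri = "http://vocab.nerc.ac.uk/collection/P01/current/SDBIOL02/"
print(parse_nvs_uri(uri), is_p01_term(uri))
```

The same parser could be reused to check that measurementUnitID values come from P06.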

Here are a few examples of vocabulary terms mapped to the P01 collection:

Requesting new vocabulary terms


If you are unable to find a suitable vocabulary term for any of your measurementType, measurementValue, or measurementUnit, then you can request a suitable term be created for you. To do this, you can submit a request through the OBIS Vocabulary GitHub repository (https://github.com/nvs-vocabs/OBISVocabs/issues).

If you are unsure about whether a code fits your specific case, please feel free to ask questions in the Vocab channel on the OBIS Slack, or in the SMBD Slack.

Key Points
  • Controlled vocabulary ensures datasets are consistently documented, simplifies data aggregation, and can decrease data misuse
  • OBIS recommends populating measurementTypeID, measurementValueID, and measurementUnitID with terms from the NERC Vocabulary Server
  • Other vocabulary terms may be used as long as the URIs are machine readable
  • You can request new vocabulary terms to be created through the OBIS Vocab GitHub repository