Data Catalog Architecture

The data catalog consists of two distinct projects: CKAN and PyCSW. CKAN is an open-source data portal project, and PyCSW is a Python web interface that implements the OGC Catalog Service for the Web (CSW).

CKAN has several components within the project:

  • PostGIS Database for physically storing datasets, metadata, users, etc.
  • Apache Solr, which acts as the search engine for the datasets and other metadata
  • CKAN Front End, a Python Web Server Gateway Interface (WSGI) application built on Pylons
  • CKAN Plugins. CKAN is highly modular and supports a wide range of plugins. We use the following:

    • ckanext-spatial: CKAN plugin that adds support for geospatial data and parses ISO-19139 XML documents implementing ISO-19115-2
    • ckanext-harvest: CKAN plugin that provides the framework for downloading and ingesting geospatial metadata from the IOOS Registry
    • ckan-pycsw: CKAN plugin that synchronizes CKAN data with PyCSW
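
To illustrate how a client might reach the Solr-backed search through the CKAN front end, here is a minimal Python sketch that builds a request URL for CKAN's `package_search` action (Action API v3) and reads dataset titles from a decoded JSON response. The base URL is a placeholder, and the helper names `package_search_url` and `dataset_titles` are hypothetical; they are not part of CKAN.

```python
from urllib.parse import urlencode

# Hypothetical base URL; substitute the address of the actual CKAN instance.
CKAN_BASE_URL = "https://catalog.example.org"


def package_search_url(query, rows=10, start=0, base_url=CKAN_BASE_URL):
    """Build a URL for CKAN's package_search action (Action API v3)."""
    params = urlencode({"q": query, "rows": rows, "start": start})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_titles(response):
    """Return dataset titles from a decoded package_search JSON response."""
    # CKAN wraps every action result as {"success": ..., "result": ...}.
    if not response.get("success"):
        raise ValueError("CKAN reported a failed request")
    return [dataset.get("title", "") for dataset in response["result"]["results"]]
```

The URL can be fetched with any HTTP client, and the decoded JSON body passed to `dataset_titles`. Keeping URL construction separate from the network call makes the sketch easy to check without a running CKAN instance.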

PyCSW is a Python WSGI application that implements the OGC Catalog Service for the Web (CSW). PyCSW has its own PostGIS database, which is synchronized with CKAN at the bottom of every hour, and its own search index, which is built on SQL queries to PostGIS.
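
As a rough sketch of what a CSW request to PyCSW could look like, the Python snippet below builds a CSW 2.0.2 `GetRecords` URL using key-value-pair (GET) encoding and a CQL full-text constraint. The endpoint URL is a placeholder, the helper name `csw_get_records_url` is hypothetical, and the use of the `csw:AnyText` queryable assumes the deployed service exposes it.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; substitute the URL where PyCSW is actually deployed.
CSW_ENDPOINT = "https://catalog.example.org/csw"


def csw_get_records_url(text, max_records=10, endpoint=CSW_ENDPOINT):
    """Build a CSW 2.0.2 GetRecords URL that searches all text for `text`."""
    if "'" in text:
        # Quotes would break the CQL string literal below; escaping rules
        # are out of scope for this sketch, so reject such input.
        raise ValueError("search text must not contain single quotes")
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "elementSetName": "summary",
        "resultType": "results",
        "maxRecords": max_records,
        "constraintLanguage": "CQL_TEXT",
        "constraint_language_version": "1.1.0",
        # Assumption: csw:AnyText is available as a full-text queryable.
        "constraint": f"csw:AnyText like '%{text}%'",
    }
    return f"{endpoint}?{urlencode(params)}"
```

Fetching the resulting URL should return an XML `GetRecordsResponse` document. Because PyCSW is synchronized with CKAN only once an hour, records added to CKAN may not appear in CSW results until the next synchronization.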

For code examples that interface with PyCSW, see Exploring CSW.