Scalable and Reusable Open Geospatial Data

Dr. Angelos Tzotsos, IMIS Athena

Scientific & Technical Manager

OSGeo Charter Member

GeoDataCamp 2015, Innovathens, 10 Sep 2015

Genesis of the project

  • Consortium established on active research, commercial, and personal interactions
  • Consortium members are active contributors to Free and Open Source Geospatial Software
  • Idea based on practical experience (2010-) and our interaction with users, publishers and SMEs

Why PublicaMundi?

  • Imago Mundi
    • Image of the World, Babylonian world map
    • Maps, created by cartographers and geographers
  • PublicaMundi
    • Image of the World, based on Open Data
    • Maps, created by open knowledge and FOSS

Experiences and Problems

Recurring practical problems in open geospatial data reuse:

  • Data formats/CRS
  • Web maps
  • Interlink
  • Translations
  • Publishing
  • OGC/INSPIRE documents not for all


Athena IMIS


  • Athena Research and Innovation Center in Information, Communication and Knowledge Technologies
  • Non-profit, research organization, governed by public law
  • Institute for the Management of Information Systems (IMIS)


  • rasdaman GmbH
  • R&D centric SME, established in 2003 (MBO in 2010)
  • Areas of business: commercial support for rasdaman Array DBMS; consultancy on SDIs & standards
  • Geospatial World Innovation Award (2013)


  • GeoLabs SARL
  • R&D centric SME focused on FLOSS GIS
  • Senegalese Land register (2007)
  • 3D module development for Terra Explorer (2008) in IGN 3D Geoportal
  • Development of the MapMint SDI using WPS, other OGC Web Services and OASIS


  • Geospatial Enabling Technologies LtD
  • SME focused on GeoInformatics
  • Successful design and implementation of projects for public and private sector related to geospatial data production, management, curation, geospatial applications (desktop, mobile, web)
  • One of the first Greek private companies to invest in Open Source GIS technology


Research and develop methodologies, as well as scalable, reusable tools to facilitate:

  • the publication
  • discovery
  • and reuse

of open geospatial data



  • Open data catalogues fully supporting publishing, curation and management lifecycle of geospatial data
  • Interlinking of geospatial data and multilinguality support in a cross-boundary context
  • Scalable technologies and services to create and reuse on-demand maps from open geospatial data
  • Analytics to accurately monitor the usage of open geospatial data
  • Scalable technologies and reusable data APIs supporting querying, processing, and analysis of open geospatial data

Free and Open Source Software (FOSS)

  • PublicaMundi development is based exclusively on the OSGeo stack
  • Based on CKAN open data catalogue
  • PublicaMundi spatially extends CKAN using OGC standards
  • Source code, Issue Tracker on GitHub



An abbreviation for “Comprehensive Knowledge Archive Network”

Open Source web platform for publishing and sharing data with impressive deployment history:

Open Source Geospatial Foundation (OSGeo)

Since 2006, a non-profit umbrella for:

  • GeoSpatial Free and Open Source Software
  • Education
  • Open Data



  • GNU/Linux distribution
  • 60+ Open Source Geospatial Applications
  • Sample Datasets
  • Consistent Overviews & Quickstarts
  • Translations

High level architecture

System Architecture

OGC standards and INSPIRE

  • Discovery Services
  • View Services
  • Download Services
  • Processing Services

Earth Observation Big Data

  • Integration with rasdaman
  • Integration with ZOO WPS
  • Raster processing services based on GRASS GIS, OrfeoToolbox, Saga GIS
  • WCPS and WPS support


  • CKAN is a powerful data management system that makes data accessible – by providing tools to streamline publishing, sharing, finding and using data.
  • CKAN is aimed at data publishers (national and regional governments, companies and organizations) wanting to make their data open and available.

CKAN Features

  • Publish and find datasets
  • Store and manage data
  • Federated nodes
  • Harvesting
  • Metadata Editing/Management
  • APIs and Extensions
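
As a rough illustration of the "APIs" feature, the sketch below builds a dataset search request against the CKAN Action API (`package_search`) and reads titles out of the JSON envelope that CKAN returns. The portal address and the sample response are made up for the example; no request is actually sent.

```python
import json
from urllib.parse import urlencode


def package_search_url(base_url, query, rows=10):
    """Build a CKAN Action API URL for a free-text dataset search."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_titles(response_text):
    """Extract dataset titles from a package_search JSON response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise ValueError("CKAN reported a failed request")
    return [d.get("title", d["name"]) for d in payload["result"]["results"]]


# Hypothetical portal address and a hand-written sample response.
url = package_search_url("https://ckan.example.org", "land cover", rows=5)
sample = json.dumps({
    "success": True,
    "result": {
        "count": 1,
        "results": [{"name": "corine-2012", "title": "CORINE Land Cover 2012"}],
    },
})
print(url)
print(dataset_titles(sample))
```

In a real client the URL would be fetched over HTTP and the response body passed to `dataset_titles`.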

Publish Data

Search and Discovery




CKAN Spatial

  • A spatial field on the default CKAN dataset schema, which uses PostGIS as the backend, supports spatial queries and displays the dataset extent on the frontend
  • Harvesters to import geospatial metadata into CKAN from other sources in ISO 19139 format and others
  • Commands to support the CSW standard using pycsw
  • Plugins to preview spatial formats such as GeoJSON
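
A minimal sketch of how the spatial field can be used from a client, assuming a CKAN instance with ckanext-spatial enabled: the dataset extent is stored as a GeoJSON geometry in an extra named `spatial`, and searches can be restricted with the `ext_bbox` parameter of `package_search`. The portal address, dataset name and bounding box are illustrative assumptions.

```python
import json
from urllib.parse import urlencode


def bbox_to_geojson(minx, miny, maxx, maxy):
    """Return a GeoJSON Polygon covering the bounding box (lon/lat order)."""
    ring = [[minx, miny], [maxx, miny], [maxx, maxy], [minx, maxy], [minx, miny]]
    return {"type": "Polygon", "coordinates": [ring]}


def dataset_with_extent(name, title, bbox):
    """Build a package_create payload whose 'spatial' extra holds the extent."""
    geometry = json.dumps(bbox_to_geojson(*bbox))
    return {
        "name": name,
        "title": title,
        "extras": [{"key": "spatial", "value": geometry}],
    }


def spatial_search_url(base_url, bbox, query=""):
    """Build a package_search URL limited to datasets intersecting bbox."""
    params = {"ext_bbox": ",".join(str(v) for v in bbox)}
    if query:
        params["q"] = query
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{urlencode(params)}"


# Hypothetical values: a rough bounding box around Greece.
greece = (19.0, 34.5, 29.7, 41.8)
payload = dataset_with_extent("sample-roads", "Sample road network", greece)
search = spatial_search_url("https://ckan.example.org", greece, query="roads")
print(json.dumps(payload, indent=2))
print(search)
```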

ckanext-spatial Features

  • Spatial Search
  • Spatial Harvesters
  • CSW interface
  • WMS Preview

Spatial Search


Spatial Datasets

Spatial Datasets Preview


ISO 19115 Metadata

CSW Interface


  • pycsw is an OGC CSW server implementation written in Python
  • pycsw is an Open Source project released under the MIT license

What is Metadata?

Metadata is often described as “data about data”, or the who, what, where, and when.

In the geospatial world, for each dataset we maintain, we should record information about the data such as:

  • general description
  • location
  • usage restrictions
  • projection
  • technical contact
  • time period
  • date created
  • date modified
  • version

Metadata Standards

  • Dublin Core: established a core/common group of 15 metadata elements
  • FGDC CSDGM: approved by the U.S. Federal Geographic Data Committee originally in 1994 and composed of Sections, Compound Elements, Data Elements
  • ISO 19115: created in 2003 by the TC211 committee of the International Organization for Standardization (ISO) and composed of more than 400 “Core”, “Mandatory”, and “Optional” elements
  • ISO 19139: The XML implementation schema for ISO 19115 specifying the metadata record format
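
To make the Dublin Core model more concrete, here is a small sketch that parses a Dublin Core record in the form CSW 2.0.2 uses to carry it (`csw:Record`). The record content is invented for the example; only the namespaces are taken from the standards.

```python
import xml.etree.ElementTree as ET

# XML namespaces used by CSW 2.0.2 Dublin Core records.
NS = {
    "csw": "http://www.opengis.net/cat/csw/2.0.2",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dct": "http://purl.org/dc/terms/",
    "ows": "http://www.opengis.net/ows",
}

# Hand-written sample record (all values are hypothetical).
SAMPLE = """<csw:Record
    xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dct="http://purl.org/dc/terms/"
    xmlns:ows="http://www.opengis.net/ows">
  <dc:identifier>urn:example:dataset:1</dc:identifier>
  <dc:title>Sample road network</dc:title>
  <dc:type>dataset</dc:type>
  <dct:modified>2015-09-10</dct:modified>
  <ows:BoundingBox crs="urn:x-ogc:def:crs:EPSG:6.11:4326">
    <ows:LowerCorner>34.5 19.0</ows:LowerCorner>
    <ows:UpperCorner>41.8 29.7</ows:UpperCorner>
  </ows:BoundingBox>
</csw:Record>"""


def parse_record(xml_text):
    """Return the main Dublin Core fields of a csw:Record as a dict."""
    record = ET.fromstring(xml_text)

    def corner(tag):
        # Corners are space-separated coordinates; axis order depends on the CRS.
        text = record.findtext(f"ows:BoundingBox/ows:{tag}", namespaces=NS)
        return [float(v) for v in text.split()] if text else None

    return {
        "identifier": record.findtext("dc:identifier", namespaces=NS),
        "title": record.findtext("dc:title", namespaces=NS),
        "type": record.findtext("dc:type", namespaces=NS),
        "modified": record.findtext("dct:modified", namespaces=NS),
        "lower_corner": corner("LowerCorner"),
        "upper_corner": corner("UpperCorner"),
    }


parsed = parse_record(SAMPLE)
print(parsed)
```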

OGC CSW Specification

The Open Geospatial Consortium (OGC) OpenGIS Catalogue Service Implementation Specification, currently at version 2.0.2, is a standard for discovering and retrieving spatial data and metadata.

Catalogue Services for the Web (CSW) is the HTTP protocol binding of the Catalogue Service Implementation Specification that allows for publishing and searching of metadata.

CSW Operations

  • GetCapabilities (mandatory) - allows clients to retrieve information describing the service instance
  • DescribeRecord (mandatory) - allows a client to discover elements of the information model supported by the target catalogue service
  • GetRecords (mandatory) - get metadata records
  • GetRecordById (optional) - get metadata records by ID
  • GetDomain (optional) - obtain runtime information about the range of values of a metadata record element or request parameter
  • Harvest (optional) - references the data to be inserted or updated in the catalog
  • Transaction (optional) - defines an interface for creating, modifying and deleting catalogue records

Example Requests
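
The sketch below shows how key-value-pair (KVP) GET requests for three CSW 2.0.2 operations could be assembled. The endpoint address, record identifier and search term are assumptions made for illustration; the parameter names follow the CSW 2.0.2 KVP encoding.

```python
from urllib.parse import urlencode


def csw_url(endpoint, request, **params):
    """Build a CSW 2.0.2 KVP GET request URL for the given operation."""
    query = {"service": "CSW", "version": "2.0.2", "request": request}
    query.update(params)
    return f"{endpoint}?{urlencode(query)}"


ENDPOINT = "https://catalogue.example.org/csw"  # hypothetical endpoint

# Describe the service instance.
get_capabilities = csw_url(ENDPOINT, "GetCapabilities")

# Search for records whose title contains "roads", using a CQL text filter.
get_records = csw_url(
    ENDPOINT,
    "GetRecords",
    typeNames="csw:Record",
    elementSetName="brief",
    resultType="results",
    constraintLanguage="CQL_TEXT",
    constraint_language_version="1.1.0",
    constraint="dc:title like '%roads%'",
    maxRecords=10,
)

# Fetch one full record by its identifier (hypothetical ID).
get_record_by_id = csw_url(
    ENDPOINT,
    "GetRecordById",
    id="urn:example:dataset:1",
    elementSetName="full",
)

for url in (get_capabilities, get_records, get_record_by_id):
    print(url)
```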


  • pycsw fully implements the OpenGIS Catalogue Service Implementation Specification [Catalogue Service for the Web]
  • pycsw allows for the publishing and discovery of geospatial metadata


The project is certified OGC Compliant, and is an OGC Reference Implementation

This product conforms to the OpenGIS Catalogue Service Implementation Specification [Catalogue Service for the Web], Revision 2.0.2. OGC, OGC®, and CERTIFIED OGC COMPLIANT are trademarks or registered trademarks of the Open Geospatial Consortium, Inc. in the United States and other countries.


pycsw is an official OSGeo Project

OSGeo Project in Incubation


  • Harvesting support for WMS, WFS, WCS, WPS, WAF, CSW, SOS
  • Implements ISO Metadata Application Profile 1.0.0
  • Implements FGDC CSDGM Application Profile for CSW 2.0
  • Implements INSPIRE Discovery Services 3.0
  • Supports ISO, Dublin Core, DIF, FGDC and Atom metadata models
  • Standalone or embedded deployment (CGI or WSGI)
  • Transactional capabilities (CSW-T)
  • Flexible repository configuration (SQLite, PostgreSQL, PostGIS, MySQL)
  • Federated catalogue distributed searching

More features...

  • Simple configuration
  • Extensible plugin architecture (profiles, repositories/backends)
  • Seamless integration with Python environments (e.g. GeoNode, Open Data Catalog)
  • Includes a command-line utility to administer the metadata repository
  • Implements the Search/Retrieval via URL (SRU) search protocol
  • Implements OpenSearch
  • Realtime XML Schema validation

Standards Support

  • OGC CSW 2.0.2
  • OGC CSW 3.0.0
  • OGC Filter 1.1.0
  • OGC OWS Common 1.0.0
  • OGC OpenSearch Geo/Time
  • OGC GML 3.1.1
  • OGC SFSQL 1.2.1
  • Dublin Core 1.1
  • SOAP 1.2
  • ISO 19115 2003
  • ISO 19139 2007
  • ISO 19119 2005
  • NASA DIF 9.7
  • FGDC CSDGM 1998
  • SRU 1.1
  • A9 OpenSearch 1.1

ZOO Project WPS

  • ZOO is a WPS (Web Processing Service) open source project released under a MIT/X-11 style license
  • It provides an OGC WPS compliant developer-friendly framework to create and chain WPS Web services

ZOO Overview

ZOO is made of three parts:

  • ZOO Kernel: A powerful server-side C Kernel which makes it possible to manage and chain Web services coded in different programming languages
  • ZOO Services: A growing suite of example Web services based on various Open Source libraries
  • ZOO API: A server-side JavaScript API able to call and chain the ZOO Services, which makes the development and chaining processes easier
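
As a hedged illustration of how a client would talk to a WPS 1.0.0 server such as ZOO Kernel, the sketch below assembles KVP GET requests for the three WPS operations. The endpoint, the process identifier `Buffer` and its input names are hypothetical; a real deployment lists its own processes in the GetCapabilities response.

```python
from urllib.parse import urlencode


def encode_data_inputs(inputs):
    """Encode inputs as 'name=value;name=value' for the DataInputs parameter.

    Values containing ';' or '=' would need extra escaping, which this
    sketch does not handle.
    """
    return ";".join(f"{name}={value}" for name, value in inputs.items())


def wps_url(endpoint, request, identifier=None, inputs=None):
    """Build a WPS 1.0.0 KVP GET request URL."""
    query = {"service": "WPS", "version": "1.0.0", "request": request}
    if identifier is not None:
        query["Identifier"] = identifier
    if inputs:
        query["DataInputs"] = encode_data_inputs(inputs)
    return f"{endpoint}?{urlencode(query)}"


ENDPOINT = "https://wps.example.org/cgi-bin/zoo_loader.cgi"  # hypothetical

capabilities = wps_url(ENDPOINT, "GetCapabilities")
describe = wps_url(ENDPOINT, "DescribeProcess", identifier="Buffer")
execute = wps_url(
    ENDPOINT,
    "Execute",
    identifier="Buffer",
    inputs={"BufferDistance": 10, "InputPoint": "POINT(23.7 37.9)"},
)

for url in (capabilities, describe, execute):
    print(url)
```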


ZOO Kernel

ZOO Demos


  • Rasdaman ("raster data manager") is a domain-neutral Array Database System: it extends standard relational database systems with the ability to store and retrieve multi-dimensional raster data (arrays) of unlimited size through an SQL-style query language.
  • It provides the reference implementation of the OGC WCS and WCPS interfaces
  • Rasdaman embeds itself smoothly into PostgreSQL
  • The Petascope component of rasdaman provides service interfaces based on the OGC WCS, WCPS, WCS-T, and WPS standards

Rasdaman features

  • Rasdaman makes it easy to search in large, multi-dimensional raster data
  • RASQL language
  • Tiling policies
  • Parallel server processing
  • OGC interfaces


The rasdaman query language, rasql, offers raster processing formulated through expressions over raster operations in the style of SQL.

Consider the following query: "The difference of red and green channel from all images from collection LandsatImages where somewhere in the red channel intensity exceeds 127"

select ls.red - ls.green
from LandsatImages as ls
where max_cells( ls.red ) > 127
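
For readers unfamiliar with array queries, the following pure-Python sketch mimics what the query described above computes on a tiny in-memory collection. It only illustrates the semantics (filter images by the maximum red value, then subtract the bands cell by cell); it is not how rasdaman evaluates queries, and the sample data is invented.

```python
def max_cells(band):
    """Largest cell value in a 2-D band given as a list of rows."""
    return max(max(row) for row in band)


def red_minus_green(image):
    """Cell-wise difference of the red and green bands of one image."""
    return [
        [r - g for r, g in zip(red_row, green_row)]
        for red_row, green_row in zip(image["red"], image["green"])
    ]


def query(collection, threshold=127):
    """Keep images whose red band exceeds the threshold somewhere,
    and return the red-green difference for each of them."""
    return [
        red_minus_green(image)
        for image in collection
        if max_cells(image["red"]) > threshold
    ]


# Hypothetical 2x2 "images" standing in for the LandsatImages collection.
landsat_images = [
    {"red": [[10, 200], [30, 40]], "green": [[5, 100], [10, 20]]},  # selected
    {"red": [[10, 20], [30, 40]], "green": [[1, 2], [3, 4]]},       # filtered out
]

result = query(landsat_images)
print(result)  # [[[5, 100], [20, 20]]]
```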

Rasdaman Demo

Other Geospatial Technologies Involved

PostGIS - Spatial Database

OpenLayers – Browser Mapping Library

Leaflet – Mobile Friendly Interactive Maps

GeoServer – Web Services

MapServer – Web Services

MapProxy – Proxy WMS & tile services

GDAL/OGR – Geospatial Data Translation Tools

MetaCRS - Coordinate Reference System Transformations


OGC OpenSearch Geo/Time


  • First implementation of the new specification through ZOO Project


  • New specification driven around the developments of PublicaMundi
  • Soon to be adopted by OGC

OGC CSW 3.0.0

  • First implementation of the new specification through pycsw


  • PublicaMundi funded contributions to the new specification

Integration Environment


  • Beta deployment of software to
  • The servers of the project were installed in the data center of the Greek Ministry of Education

Cloud Environment

  • The integration environment of PublicaMundi is deployed on top of the Synnefo cloud stack, within a number of virtual machines
  • Synnefo is a complete open source cloud stack written in Python that provides Compute, Network, Image, Volume and Storage services, similar to the ones offered by AWS
  • Synnefo is the Open Source project behind Okeanos

Synnefo Services

Synnefo Architecture

Synnefo UI

VM clusters

The software components of PublicaMundi are initially deployed into 8 virtual clusters, with provision for spinning up more virtual machines in each cluster if necessary.

  • Database cluster
  • CKAN cluster
  • GeoServer cluster
  • Rasdaman cluster
  • ZOO cluster
  • Proxy/Analytics cluster
  • Tiles/Caching cluster
  • Storage cluster

GeoServer cluster



PublicaMundi uses Ansible playbooks to deploy software to the integration environment, starting from empty Debian 7 virtual machines on which Synnefo has preconfigured only networking and root SSH access.
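
One way to describe such a set of clusters to Ansible is a dynamic inventory script, which prints the host groups as JSON. The sketch below is an assumption about how this could look, not the project's actual inventory: the group names mirror the eight clusters listed earlier, and every host name is hypothetical.

```python
import json
import sys

# Hypothetical host names, one group per PublicaMundi cluster.
CLUSTERS = {
    "database": ["db-1.example.org"],
    "ckan": ["ckan-1.example.org"],
    "geoserver": ["geoserver-1.example.org", "geoserver-2.example.org"],
    "rasdaman": ["rasdaman-1.example.org"],
    "zoo": ["zoo-1.example.org"],
    "proxy": ["proxy-1.example.org"],
    "tiles": ["tiles-1.example.org"],
    "storage": ["storage-1.example.org"],
}


def build_inventory(clusters):
    """Return an inventory dict in Ansible's dynamic inventory JSON layout."""
    inventory = {
        group: {"hosts": list(hosts), "vars": {"ansible_user": "root"}}
        for group, hosts in clusters.items()
    }
    # An empty hostvars map tells Ansible not to call the script per host.
    inventory["_meta"] = {"hostvars": {}}
    return inventory


def main(argv):
    """Answer the --list and --host calls that Ansible makes."""
    if "--host" in argv:
        print(json.dumps({}))
    else:
        print(json.dumps(build_inventory(CLUSTERS), indent=2))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Scaling a cluster then amounts to adding a host name to the corresponding group and re-running the playbooks.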

CKAN Contributions

Publishing Workflow

  • Added support for INSPIRE metadata
  • Added support for Geospatial datasets (raster, vector)

Metadata Editor

Administrators Dashboard

Vector support


Raster support

OGC Web Services

Full CSW support

Mapping API


Data API






Thank you for your attention!