
Exploring the Irish Open Data Portal



Retrieving and exploring Irish open data using the APIs

Exploring some of the open data available from the Irish Open Data Portal at https://data.gov.ie and retrieving datasets programmatically through Python using the CKAN APIs.

For my Data Representation Project, the brief was to write a Flask server program with a REST API that performs CRUD operations on a MySQL database, plus a web interface that uses AJAX calls to carry out those CRUD operations. My application connected to a third-party API, retrieved the data, stored it in the database and then displayed it on a web page. The user could then perform CRUD operations on the data as well as trigger requests for more data from the third-party API.

I chose the Irish open data portal at data.gov.ie as the third-party API to work with. There are currently over 10,000 datasets available on the portal under themes such as environment, health, society, transport, economy and education. The datasets can be accessed directly through the portal, but there is also an API. Ireland’s open data portal aims to promote innovation and transparency by publishing Irish public sector data in open, free and reusable formats. Open data is information that is collected, produced or paid for by government bodies and made freely available for reuse. Almost all data that is not privacy-sensitive can be published as open data under an open licence.

The Irish Open Data portal


The Irish open data portal is built on CKAN and exposes the CKAN API. CKAN is a tool for building open data websites and is used by governments and other institutions that collect a lot of data. Data is published in units called “datasets” (also called “packages”). A dataset contains metadata and a number of resources, which hold the data itself in formats such as CSV, Excel, PDF and JSON. CKAN can store a resource internally, or store only a link when the resource itself lives somewhere else on the web. Using the CKAN API you can get JSON-formatted lists of a site’s datasets, groups or other CKAN objects (such as the package list, tag list or group list), get a full JSON representation of a dataset, resource or other object, and search for packages or resources matching a query. Authorised users such as publishers can also create, update and delete datasets, resources and other objects. No authorization is required to read the data.

To call the CKAN API, you post a JSON dictionary in an HTTP POST request to one of CKAN’s API URLs. The parameters for the API function go in that JSON dictionary, and CKAN returns its response as a JSON dictionary too.
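As a minimal sketch of that request/response cycle, the function below posts a JSON dictionary to an action URL and unwraps CKAN's response envelope. It assumes data.gov.ie uses the standard CKAN action endpoint layout `<site>/api/3/action/<action_name>`, and it uses the standard library's `urllib.request` instead of the requests library so it has no dependencies. The names `call_action`, `action_url` and `unwrap` are illustrative, not part of CKAN.

```python
import json
import urllib.request

# Assumption: data.gov.ie exposes the standard CKAN v3 action endpoint layout.
BASE_URL = "https://data.gov.ie"


def action_url(action, base_url=BASE_URL):
    """Build the URL for a CKAN action such as package_list or package_show."""
    return f"{base_url.rstrip('/')}/api/3/action/{action}"


def unwrap(response):
    """CKAN replies with a dict holding a "success" flag plus "result" or "error"."""
    if not response.get("success"):
        raise RuntimeError(f"CKAN error: {response.get('error')}")
    return response["result"]


def call_action(action, params=None, base_url=BASE_URL, timeout=30):
    """POST the parameters as a JSON dictionary and return the unwrapped result."""
    body = json.dumps(params or {}).encode("utf-8")
    request = urllib.request.Request(
        action_url(action, base_url),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return unwrap(json.load(resp))
```

With this sketch, `call_action("package_list")` should return the list of dataset names, and `call_action("package_show", {"id": name})` the full record for a single dataset.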

The instructions for running the web application are outlined in the repo’s README.

In brief: the DAO (data access object) Python files contain code for interacting with the MySQL database using the mysql-connector package. The DAO files contain three classes:

  1. A class containing functions that call data.gov.ie using three list API actions (package_list, tag_list and organization_list) to retrieve the lists of dataset/package names, tags and organizations (dataset publishers).
  2. A class containing functions that allow the user to perform CRUD operations.
  3. A class containing functions that allow the user to retrieve additional data about specific datasets using query parameters.
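To illustrate the CRUD class, here is a minimal sketch of the DAO pattern. It swaps in Python's built-in `sqlite3` module (in memory) for MySQL so that it runs anywhere; mysql-connector follows the same DB-API style of connections, cursors and parameterised queries, although it uses `%s` placeholders where `sqlite3` uses `?`. The `DatasetDAO` class and its `dataset` table with `id`, `name` and `publisher` columns are hypothetical.

```python
import sqlite3

COLUMNS = ("id", "name", "publisher")


class DatasetDAO:
    """Hypothetical CRUD data access object for a table of dataset names."""

    def __init__(self, path=":memory:"):
        # sqlite3 stands in for mysql-connector here
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS dataset ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT UNIQUE, publisher TEXT)"
        )

    def create(self, name, publisher):
        cur = self.conn.execute(
            "INSERT INTO dataset (name, publisher) VALUES (?, ?)", (name, publisher)
        )
        self.conn.commit()
        return cur.lastrowid

    def get_all(self):
        rows = self.conn.execute("SELECT id, name, publisher FROM dataset ORDER BY id")
        return [dict(zip(COLUMNS, row)) for row in rows]

    def find_by_id(self, dataset_id):
        row = self.conn.execute(
            "SELECT id, name, publisher FROM dataset WHERE id = ?", (dataset_id,)
        ).fetchone()
        return dict(zip(COLUMNS, row)) if row else None

    def update(self, dataset_id, name, publisher):
        self.conn.execute(
            "UPDATE dataset SET name = ?, publisher = ? WHERE id = ?",
            (name, publisher, dataset_id),
        )
        self.conn.commit()

    def delete(self, dataset_id):
        self.conn.execute("DELETE FROM dataset WHERE id = ?", (dataset_id,))
        self.conn.commit()
```

Each Flask route would then call one of these methods and return the result as JSON to the AJAX front end.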

The Python script calls the API URL using the requests library, which returns JSON data. The JSON data is parsed and written to the database. The Flask application contains various routes that let the user trigger the functions that call the open data API and retrieve the data. The user can then get more information on a particular dataset, including the links to the dataset’s resources, by passing the dataset/package name or package_id, a tag name or the name of the dataset’s publisher as a query parameter to another API action call. This returns JSON data containing metadata as well as the list of dataset resources and their URLs, which either download the resource directly or point to somewhere else on the web. When the user clicks the link to a dataset, in some cases the dataset downloads in whatever format it was published in, and in other cases the link leads to the publisher’s own API for that data. For datasets that do not have APIs, the URL of the dataset is generally “https://data.gov.ie/dataset/” followed by the dataset name (as retrieved by the package_list API call).

For example:

https://data.gov.ie/dataset/no-of-approved-general-foster-carers-with-an-allocated-link-worker-2020

Some datasets are served through other APIs, such as the ArcGIS REST API, the All-Island Research Observatory (AIRO) and the Central Statistics Office’s Statbank.
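The link handling described above can be sketched as two small helpers: one builds the dataset page URL from a package_list name, the other pulls each resource's name, format and URL out of a package_show result. This assumes the standard CKAN resource fields `name`, `format` and `url`; the helper names are hypothetical.

```python
DATASET_PAGE = "https://data.gov.ie/dataset/"


def dataset_page_url(package_name):
    """Portal page for a dataset, built from its package_list name."""
    return DATASET_PAGE + package_name


def resource_links(package):
    """Return (name, format, url) for each resource in a package_show result."""
    return [
        # Missing or null name/format fields are normalised to empty strings
        (res.get("name") or "", (res.get("format") or "").upper(), res.get("url"))
        for res in package.get("resources", [])
    ]
```

For instance, `dataset_page_url("no-of-approved-general-foster-carers-with-an-allocated-link-worker-2020")` reproduces the example URL above, while `resource_links` shows whether a resource is a downloadable file (CSV, XLSX and so on) or a link to another API.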


At the moment I am working on another project in which I programmatically retrieve datasets from the Irish open data portal using the CKAN APIs.

The aim is to search for and retrieve datasets from within a Jupyter notebook for further analysis, without visiting the https://data.gov.ie website or clicking on links in the browser.

The datasets (or their URLs) can be accessed directly through the open data portal, but I want to retrieve them from within the notebook rather than following links and clicking to download the data.

The portal’s developer resources explain that the data.gov.ie API is built using CKAN v2.8, which provides a powerful API that allows developers to retrieve datasets, groups or other CKAN objects and to search for datasets. Full documentation for the CKAN API is available online.

I wrote a Python class that incorporates a selection of these CKAN APIs, including:

  • package_list to retrieve a list of the datasets / packages
  • tag_list to retrieve a list of tags
  • organization_list to retrieve a list of organisations / publishers
  • package_show to get a full JSON representation of a dataset, including its resources
  • package_search to search for packages matching a query
  • resource_search to search for resources matching a query
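A simplified sketch of such a class follows; it is an illustration rather than the class itself. The class and method names are hypothetical; `q`, `rows`, `id` and `query` are standard CKAN parameters for these actions. The default transport sends the read-only actions as GET requests with URL query parameters, which CKAN accepts alongside JSON POSTs, and it is injectable so the class can be exercised without network access.

```python
import json
import urllib.parse
import urllib.request


def http_get(url, params):
    """Default transport: GET the action URL with URL-encoded query parameters."""
    query = urllib.parse.urlencode(params)
    full_url = f"{url}?{query}" if query else url
    with urllib.request.urlopen(full_url, timeout=30) as resp:
        return json.load(resp)


class OpenDataClient:
    """Hypothetical wrapper around a handful of CKAN read-only actions."""

    def __init__(self, base_url="https://data.gov.ie", transport=http_get):
        self.api = f"{base_url.rstrip('/')}/api/3/action/"
        self.transport = transport

    def _call(self, action, **params):
        reply = self.transport(self.api + action, params)
        if not reply.get("success"):
            raise RuntimeError(f"{action} failed: {reply.get('error')}")
        return reply["result"]

    def package_list(self):
        return self._call("package_list")

    def tag_list(self):
        return self._call("tag_list")

    def organization_list(self):
        return self._call("organization_list")

    def package_show(self, name_or_id):
        # Accepts either the package name or its id
        return self._call("package_show", id=name_or_id)

    def package_search(self, query, rows=10):
        return self._call("package_search", q=query, rows=rows)

    def resource_search(self, query):
        # resource_search expects "field:term" strings, e.g. "name:rainfall"
        return self._call("resource_search", query=query)
```

In a notebook, `OpenDataClient().package_search("rainfall", rows=5)` would then return the matching packages without leaving Python.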

Note: In terms of the CKAN API, a ‘package’ is a legacy name for a dataset.

  • The CKAN package_list API returns a list of the full names of the datasets but not the URL to the dataset resource. To get the actual URLs you need to use additional APIs such as the package_show, package_search or resource_search APIs.

  • tag_list and organization_list work similarly for retrieving lists of tags and organizations.

  • The CKAN package_search and resource_search APIs allow you to search for packages or resources matching a query and return data about the dataset, including the package_id and the URL to the dataset. The query can be a partial package name.

  • The CKAN package_show API returns a full JSON representation of the dataset including the URL to the actual dataset. It takes a query parameter, either the full name of the package or the package_id:

    • The package name as returned from the package_list API.
    • The package_id, as returned from the package_show, package_search and resource_search APIs, among others.
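Putting those points together, a search can feed package ids straight into package_show. The sketch below assumes CKAN's usual package_search result shape, a dict with a `count` and a list of `results` in which each package carries its id in the `id` field (resources carry it as `package_id`). The function `find_and_show` and its `call` parameter (any function that runs a CKAN action and returns its `result`) are hypothetical.

```python
def find_and_show(call, fragment, rows=5):
    """Search for datasets matching a partial name, then fetch each one in full."""
    found = call("package_search", {"q": fragment, "rows": rows})
    packages = []
    for hit in found["results"]:
        # package_show accepts either the package name or its id
        packages.append(call("package_show", {"id": hit["id"]}))
    return found["count"], packages
```

`count` is the total number of matches, which can be larger than `rows`, so it is returned alongside the fetched packages.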

Monthly Weather data

Exploring Met Éireann datasets available through the open data portal

I previously retrieved the individual daily, hourly and monthly datasets for many of the weather stations dotted around Ireland. The datasets record measurements such as rainfall, sunshine hours, wet bulb temperature and mean wind speed. These datasets, and many others covering climate data, are provided by Met Éireann at http://www.met.ie.

The datasets were cleaned and then merged together to create a large file containing all the observations for each weather station over a number of years. Weather stations have opened and closed throughout the country over the years and therefore the starting and end date of data for each weather station differs. Not all measurements are available over the entire time period.

I also merged in the station details dataset, which contains location data about each weather station, including its opening date (and closing date, if applicable), latitude and longitude, station height and county.
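The stacking and merging steps can be sketched with pandas. This assumes each station's cleaned observations are already in a DataFrame keyed by station name, and that the station details table has a matching `station` column; the function name and column names are illustrative rather than Met Éireann's actual headers.

```python
import pandas as pd


def combine_stations(observations, station_details):
    """Stack per-station observations and attach each station's details.

    observations: dict mapping station name -> DataFrame of that station's readings.
    Measurements a station never recorded come out as NaN after the concat.
    """
    stacked = pd.concat(
        [df.assign(station=name) for name, df in observations.items()],
        ignore_index=True,
    )
    # A left join keeps every observation even if a station has no details row
    return stacked.merge(station_details, on="station", how="left")
```

Because `pd.concat` takes the union of columns, stations that opened later or never recorded a given measurement simply have missing values in those columns, which matches the uneven coverage described above.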

I have since come across monthly datasets that focus on a particular measurement such as rainfall or sunshine. These datasets are produced by Met Éireann but published through the CSO’s RESTful API. They contain monthly data on rainfall, temperature, sunshine and maximum wind gale gust recorded by Met Éireann from 1958 onwards.

The CSO recently moved from its Statbank database to PxStat for its new open data portal.
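PxStat can return tables in the JSON-stat 2.0 format, in which every observation sits in one flat `value` array in row-major order (the last dimension varies fastest). The sketch below flattens such a response into one dict per observation. The URL pattern in `pxstat_url` is an assumption about the CSO's RESTful endpoint and the table code is a placeholder, so both should be checked against the CSO's PxStat API documentation.

```python
import itertools


def pxstat_url(table_code, lang="en"):
    # Assumed endpoint pattern; table_code is a placeholder for a real CSO table id
    return (
        "https://ws.cso.ie/public/api.restful/"
        f"PxStat.Data.Cube_API.ReadDataset/{table_code}/JSON-stat/2.0/{lang}"
    )


def jsonstat_rows(dataset):
    """Flatten a JSON-stat 2.0 dataset into one dict per observation."""
    dim_ids = dataset["id"]
    codes = []
    for dim in dim_ids:
        category = dataset["dimension"][dim]["category"]
        index = category.get("index", list(category.get("label", {})))
        # index is either a list of codes or a {code: position} mapping
        if isinstance(index, dict):
            index = sorted(index, key=index.get)
        codes.append(index)
    values = dataset["value"]
    rows = []
    # itertools.product varies the last dimension fastest, matching JSON-stat order
    for pos, combo in enumerate(itertools.product(*codes)):
        # value may be a dense list or a sparse {"position": value} mapping
        value = values[pos] if isinstance(values, list) else values.get(str(pos))
        rows.append({**dict(zip(dim_ids, combo)), "value": value})
    return rows
```

The resulting list of dicts can be passed straight to `pandas.DataFrame` for analysis alongside the station-level data.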


Met Eireann


Tech used:
  • Python
  • Flask
  • MySQL
  • HTML
  • JQuery / AJAX