Met Éireann is Ireland’s National Meteorological Service and the leading provider of weather information and related services in Ireland. Its many datasets are available through the Irish open data portal at https://data.gov.ie.
As far as I can see, there is no dataset listed that combines the historical observations for all the weather stations in Ireland. Many weather stations are listed, but some of them are closed.
There is a dataset that provides details of the weather stations in Ireland. It appears in the list of packages retrieved using the `package_list` API. Because the package name does not contain any of the usual words that indicate a weather dataset, such as ‘weather’, ‘rainfall’ or ‘climate’, I missed it the first time I filtered the package list using string methods. Filtering the package list on the term ‘station’ also returns datasets for other types of stations, such as fire stations, polling stations and various survey stations.
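The keyword filtering described above can be sketched roughly as follows. This is only an illustration: the API base path is an assumption (the standard CKAN action path), the function names are my own, and the sample package names are made up.

```python
import requests

# Assumption: data.gov.ie exposes the standard CKAN action API under this path.
CKAN_API = "https://data.gov.ie/api/3/action"


def fetch_package_list(api_base=CKAN_API, timeout=30):
    """Return the list of package names from the CKAN package_list action."""
    response = requests.get(f"{api_base}/package_list", timeout=timeout)
    response.raise_for_status()
    return response.json()["result"]


def filter_packages(names, include, exclude=()):
    """Keep names containing any `include` term and none of the `exclude` terms."""
    matches = []
    for name in names:
        lowered = name.lower()
        wanted = any(term in lowered for term in include)
        unwanted = any(term in lowered for term in exclude)
        if wanted and not unwanted:
            matches.append(name)
    return matches


# Hypothetical package names, for illustration only.
sample = [
    "todays-weather-dublin-airport",
    "fire-stations-example",
    "polling-stations-example",
    "example-station-details",
]
print(filter_packages(sample, include=["weather", "station"], exclude=["fire", "polling"]))
```

Against the live portal, `fetch_package_list()` would supply the names instead of `sample`. An exclude list is one way to cut down the noise from a broad term like ‘station’, though it still needs a manual check of what is left.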
(Before I discovered this station-details dataset, I was able to filter the list of weather stations down to the active ones based on which stations had current datasets for today’s and yesterday’s weather.)
This ‘station-details’ dataset provides details of all Met Éireann’s rainfall, climate and synoptic weather stations, both open and closed. Details include: County, Station Number, Name, Height (m), Easting, Northing, Latitude, Longitude, Open Year, Close Year.
This can be downloaded by visiting the URL, or by using curl, Python’s `requests` library or other such libraries. My aim is to read in the datasets without visiting the open data portal, so I use the `requests` library from a Jupyter notebook.
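A minimal sketch of downloading a file with `requests` might look like this. The `download` helper and its `get` parameter are my own names for illustration, and the URL is left as a placeholder to be taken from the portal.

```python
from pathlib import Path

import requests


def download(url, dest, get=requests.get, timeout=30):
    """Fetch `url` and write the raw response bytes to `dest`."""
    response = get(url, timeout=timeout)
    # Fail loudly on 4xx/5xx rather than saving an error page to disk.
    response.raise_for_status()
    path = Path(dest)
    path.write_bytes(response.content)
    return path


# Usage, where dataset_url is a placeholder for a resource URL from the portal:
# download(dataset_url, "station_details.csv")
```

Passing the `get` function as a parameter is just a convenience so the helper can be tried out without a network connection.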
- The CKAN `package_list` API lists all the packages on <data.gov.ie>.
- Use some pattern matching to find package names.
- Use the CKAN `package_show` API to retrieve a JSON representation of the dataset. This will include the URL for the actual dataset.
- Some datasets are available in a variety of formats such as CSV, JSON, JSON-STAT, GEO-JSON, HTML etc.
- Use the `requests` package to retrieve the data from the URL. This works well for CSV and JSON datasets.
  - Use the response’s `content` attribute to get the data retrieved from the URL.
  - Write the content to a file.
- The datasets can also be retrieved from the command line.
- The `pandas` package can be used to read CSV datasets from a URL directly into a pandas DataFrame. However, these datasets typically have some metadata at the top of the file, which may cause problems when reading in the data because the number of fields can differ from line to line. Various options can be set when reading the dataset into pandas.
Once you have figured out the format of the CSV files, you should be able to read them directly into pandas by setting the required options to skip rows, set header rows, etc. Most of the weather datasets follow the same overall structure: the station name and location at the top, followed by a data dictionary that gives a brief description of what each column contains, a header row with abbreviated column names, and then the observations themselves.
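One way to handle the metadata at the top is to locate the header row first and then pass its position to `pandas.read_csv` as `skiprows`. The sketch below assumes the header row starts with a column called `date`. That column name, the helper names and the sample text are all made up to mirror the structure described above, so they will need adjusting for a real file.

```python
import io

import pandas as pd

# Made-up sample following the layout described above: station details,
# a data dictionary, a header row, then the observations.
SAMPLE = """Station Name: EXAMPLE
Station Height: 10 M
date:  -  Date
rain:  -  Precipitation Amount (mm)
maxtp: -  Maximum Air Temperature (C)
date,rain,maxtp
01-jan-2020,0.2,9.1
02-jan-2020,1.4,8.3
"""


def find_header_row(text, first_column="date"):
    """Return the 0-based line index of the row starting with `first_column,`."""
    prefix = first_column.lower() + ","
    for index, line in enumerate(text.splitlines()):
        if line.strip().lower().startswith(prefix):
            return index
    raise ValueError(f"no header row starting with {first_column!r} found")


def read_weather_csv(text, first_column="date"):
    """Skip the metadata lines and parse the rest of the file as CSV."""
    skip = find_header_row(text, first_column)
    return pd.read_csv(io.StringIO(text), skiprows=skip)


df = read_weather_csv(SAMPLE)
print(df)
```

When the number of metadata lines is already known, the same `skiprows` value can be passed to `pd.read_csv` together with the dataset URL, without downloading the text first.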
I have several notebooks on the go for this project.
Notebook 1: Get the list of active weather stations in Ireland from the open data portal.
- Retrieve the list of packages using the CKAN `package_list` API.
- Filter the list of packages for weather-related datasets.
- Use the `package_show` API action to get a JSON representation of the datasets. I wrote some functions to do this.
- Apply the function to the filtered list of packages, creating a pandas DataFrame containing all the dataset details, including the name, description, available formats and, most importantly, the URLs to the actual datasets.
- I wrote another function to retrieve the actual datasets from the list of datasets.
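The notebook's own functions are not reproduced here, but the `package_show` steps above could be sketched along these lines. The function names are hypothetical, the API base path is assumed to be the standard CKAN one, and the fields read from the response (`name`, `title`, `notes`, `resources`, `format`, `url`) are the usual CKAN package fields.

```python
import pandas as pd
import requests

# Assumption: data.gov.ie exposes the standard CKAN action API under this path.
CKAN_API = "https://data.gov.ie/api/3/action"

COLUMNS = ["package", "title", "description", "format", "url"]


def package_show(name, api_base=CKAN_API, timeout=30):
    """Return the JSON description of one package from the package_show action."""
    response = requests.get(
        f"{api_base}/package_show", params={"id": name}, timeout=timeout
    )
    response.raise_for_status()
    return response.json()["result"]


def resource_rows(package):
    """Flatten one package description into one row per downloadable resource."""
    return [
        {
            "package": package.get("name"),
            "title": package.get("title"),
            "description": package.get("notes"),
            "format": resource.get("format"),
            "url": resource.get("url"),
        }
        for resource in package.get("resources", [])
    ]


def build_catalogue(names, show=package_show):
    """Call `show` for each package name and collect all resources in a DataFrame."""
    rows = []
    for name in names:
        rows.extend(resource_rows(show(name)))
    return pd.DataFrame(rows, columns=COLUMNS)
```

The resulting DataFrame can then be filtered by format, for example `catalogue[catalogue["format"] == "CSV"]`, before the URLs are handed to whatever function downloads the files.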
The station-details dataset contains the following columns:
‘County’, ‘Station Number’, ‘name’, ‘Height (m)’, ‘Easting’, ‘Northing’, ‘Latitude’, ‘Longitude’, ‘Open Year’, ‘Close Year’.
Here is a map of the currently open weather stations in Ireland, made using the `folium` package from the Jupyter notebook.