⚠️ This is the first operational version of the handbook, but it is still a work in progress and will be heavily updated during 2024! ⚠️

Extreme precipitation: Changes under climate scenarios worfklow [Hazard assessment]#

Click Binder to launch this notebook on MyBinder.
Click Heavy Rainfall to go to this workflow’s GitHub repository.

The data used for the changes in extreme precipitation Risk Assessment workflow has been prepared by CRAHI-UPC. The steps outlined in this page were followed to extract the necessary annual maxima precipitation, fit the different distributions, and compute the return periods for specific durations and scenarios across the European region.

To illustrate the application of these steps, we provide a specific example for the Catalonia region in Spain. Feel free to explore the code and modify it according to your specific research needs.

Prepare your workspace#

Load libraries#

# Libraries to download data and manage files
import os
import cdsapi
import zipfile
import datetime
import glob

# Libraries for numerical computations, array manipulation and statistics.
import numpy as np
import xarray as xr
import scipy.stats as sst

# Libraries to handle geospatial data
import rioxarray as rio
import pyproj

# Libraries to plot maps, charts and tables
import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib.colors import from_levels_and_colors
import cartopy.crs as ccrs
import cartopy.feature as cfeature
from cartopy.util import add_cyclic_point
import plotly.express as px
import plotly.graph_objects as go
import contextily as ctx

# Choosing the matplotlib backend
%matplotlib inline

Select your area of interest#

Let us define the coordinates of an area of interest. Based on these coordinates we will be able to clip the datasets for further processing.

To easily define an area in terms of geographical coordinates, you can go to the Bounding Box Tool to select a region and get the coordinates. Make sure to select CSV in the lower left corner and copy the values in the brackets below.

# Define the BOUNDING BOX for your desired region (CATALONIA in this example).
bbox = [0, 40.2, 3.3, 43.5]; areaname = 'Catalonia'

Set your directory structure#

The next cell will create the extreme_precipitation_hazard folder in the directory where the notebook is saved. You can change the path defined in the variable workflow_dir to create it elsewhere.

# Define the path for the extreme_precipitation_hazard folder.
workflow_dir = 'extreme_precipitation_hazard'
# Create the directory checking if it already exists.
os.makedirs(workflow_dir, exist_ok = True)

Now, we will create a subfolder for general data as well as specific subfolders to save data for our area of interest, plots generated by the code and new datasets computed within the workflow.

# Define directories for general data.
general_data_dir = os.path.join(workflow_dir, 'general_data')
# Define specific directories for the selected area
data_dir = os.path.join(workflow_dir, f'data_{areaname}')
results_dir = os.path.join(workflow_dir, f'results_{areaname}')
plots_dir = os.path.join(workflow_dir, f'plots_{areaname}')

# Create the directories checking if they already exist.
os.makedirs(general_data_dir, exist_ok = True)
os.makedirs(data_dir, exist_ok = True)
os.makedirs(results_dir, exist_ok = True)
os.makedirs(plots_dir, exist_ok = True)

Step 1: Download the EURO-CORDEX from the Climate Data Store.#

For this analysis, EURO-CORDEX climate projections for precipitation flux at a 12km spatial resolution have been employed. These projections are readily accessible to the public through the Climate Data Store (CDS) portal. The EURO-CORDEX data can be downloaded through the CDS API. Learn how to use it here.

For this example, we will guide you through downloading two different 30-year frames from the EURO-CORDEX precipitation data: one corresponding to daily precipitation; the other corresponding to 3 hours time resolution, needed for sub-daily analysis. The selected timeframes are 1976-2005 (baseline or historic simulations) and 2041-2070 (climate projections). We will select a specific combination of General Circulation Model (GCM), Regional Climate model (RCM), and a Representative Concentration Pathway (RCP).


Feel free to explore the code and modify it according to your specific research needs (e.g., pair of models or scenarios). Explore the available data for CORDEX by referring to the resources found here. Notice that the start and end years differ from model to model.


EURO-CORDEX datasets are downloaded for all Europe (they cannot be pre-selected for our area), so it may take some hours to download, specially if we are interested in the 3-hour time-step.

Define in the next cell the specifications of the dataset you want to download.

# Global options for EURO-CORDEX data
GCM = 'ichec_ec_earth'
RCM = 'knmi_racmo22e'
RCP = 'rcp_8_5'
# Define global variables of the time-frame years for further use on the workflow.
GCM_str = GCM.replace('_','-')
RCP_str = RCP.replace('_','')

    'historical' :['1976', '2005'],
    RCP_str : ['2041', '2070'] 

To request data with CDS API, a specific list of the years has to be provided. Define in the next cell the dictionary that contains the necessary years for each experiment (historical and rcp). Remember to check with the CDS Data Portal that years are well-defined.

# Define dictionary with listed years so CDS API understands the request.
    '3_hours': {  # The data is saved in 1-year files. 
        'historical': {
            'start_year': [str(i) for i in range(1976, 2006)],
            'end_year' : [str(i+1) for i in range(1976, 2006)]
        RCP: {
            'start_year': [str(i) for i in range(2041, 2071)],
            'end_year' : [str(i+1) for i in range(2041, 2071)]
    'daily_mean': {  # The data is stored in 5-year files.
        'historical': {
            'start_year': [str(i) for i in range(1976, 2006, 5)],
            'end_year': [str(i+4) for i in range(1976, 2006, 5)]
        RCP: {
            'start_year': [str(i) for i in range(2041, 2071, 5)],
            'end_year': [str(i+4) for i in range(2041, 2071, 5)]

We will loop over the temporal resolutions and time frames and save the downloaded data into the general data directory. A subdirectory for the specific GCM and time-step will be also created to differentiate data if other datasets are later downloaded.

# Change the KEY to your own
URL = "https://cds.climate.copernicus.eu/api/v2"
c = cdsapi.Client(url=URL, key=KEY)

for temp_res, runs in YEARS_SPECIFICATION.items():
    for experiment, years in runs.items():
        for y_start, y_end in zip(years['start_year'], years['end_year']):
            # Define name and path for zip and subfolder for the files
            zip_name_i = f"pr_{temp_res.replace('_','')}_{experiment.replace('_','')}.zip"
            zip_path_i = os.path.join(general_data_dir, GCM_str, zip_name_i)
            dir_path_i = os.path.join(general_data_dir, GCM_str, temp_res.replace('_',''))
            os.makedirs(dir_path_i, exist_ok=True)
            # Download from CDS
                    'format': 'zip',
                    'domain': 'europe',
                    'experiment': experiment,
                    'horizontal_resolution': '0_11_degree_x_0_11_degree',
                    'temporal_resolution': temp_res,
                    'variable': 'mean_precipitation_flux',
                    'gcm_model': GCM,
                    'rcm_model': RCM,
                    'ensemble_member': 'r1i1p1',
                    'start_year': y_start,
                    'end_year': y_end
            # Unzip the files in a specific directory within general data dir folder.
            with zipfile.ZipFile(zip_path_i, 'r') as zObject:
            # Remove zip file

If you want to download just one specific subset of data from CDS, you can use the next code snippet.

# Change the KEY to your own
URL = "https://cds.climate.copernicus.eu/api/v2"
c = cdsapi.Client(url=URL, key=KEY)

# Define zip file's absolute path
zip_path = os.path.join(general_data_dir, 'download.zip')
dir_path = os.path.join(general_data_dir, 'name_folder')
os.makedirs(dir_path, exist_ok=True)

            'format': '',
            'domain': '',
            'experiment': '',
            'horizontal_resolution': '',
            'temporal_resolution': '',
            'variable': '',
            'gcm_model': '',
            'rcm_model': '',
            'ensemble_member': '',
            'start_year': '',
            'end_year': '',

# Unzip the files in a specific directory within general data dir folder.
with zipfile.ZipFile(zip_path, 'r') as zObject:

# Remove zip file

Step 2: Explore the data.#

The files from the CDS have a filename structure that describes the exact dataset contained. The general structure is the following:

pr_[domain]_[gcm]_[rcp]_[ensemble]_[rcm]_[version]_[temporal resolution]_[start day]_[end day]

Load one of the files and explore the content and structure of the dataset. Notice the dimensions, coordinates, data variables, indexes and attributes as well as the description of the spatial reference system in rotated_pole()_.

# Define the absolute path for a specific file
filename_precipitation_example = 'pr_EUR-11_ICHEC-EC-EARTH_rcp85_r1i1p1_KNMI-RACMO22E_v1_3hr_2041010100-2042010100.nc'
path_precipitation_example = os.path.join(general_data_dir, GCM_str, '3hours', filename_precipitation_example)

# Open the netCDF file as a dataset using xarray
dataset_precipitation_example = xr.open_dataset(path_precipitation_example, decode_cf = True)

# Display said dataset
Dimensions:       (rlon: 424, rlat: 412, time: 2920, bnds: 2)
  * rlon          (rlon) float64 -28.38 -28.27 -28.16 ... 17.93 18.05 18.16
    lon           (rlat, rlon) float64 ...
  * rlat          (rlat) float64 -23.38 -23.27 -23.16 ... 21.62 21.73 21.84
    lat           (rlat, rlon) float64 ...
  * time          (time) datetime64[ns] 2041-01-01T01:30:00 ... 2041-12-31T22...
Dimensions without coordinates: bnds
Data variables:
    rotated_pole  |S1 ...
    time_bnds     (time, bnds) datetime64[ns] ...
    pr            (time, rlat, rlon) float32 ...
Attributes: (12/25)
    Conventions:                    CF-1.4
    contact:                        Erik van Meijgaard, KNMI, Regional Climat...
    experiment:                     RCP8.5 run
    experiment_id:                  rcp85
    realization:                    1
    driving_experiment:             ICHEC-EC-EARTH,rcp85,r1i1p1
    ...                             ...
    knmi_model_comment:             RACMO22E: baseline physics from ECMWF CY3...
    knmi_version_comment:           v1: reference version for Europe and othe...
    knmi_grib_path:                 mos.knmi.nl:/climreg/CXEUR12/eCS6-v441-fE...
    creation_date:                  2015-12-10T17:13:36Z
    c3s_disclaimer:                 ""
    tracking_id:                    hdl:21.14103/157ee1ba-f9ba-4f87-82f2-206f...

Step 3: Extract the temporal series of annual maximum precipitation for sub-daily and daily resolution.#

For this example, the annual maximum precipitation for 3h, 6h, 12h durations (derived from the 3-hour temporal step) and the daily annual maximum precipitation (derived from the daily dataset) have been extracted and saved in a new netCDF (network Common Data Form) for the Catalonia region.

We will follow the next steps:

  1. Reproject the coordinates of our area of interest to match the dataset’s reference system.

  2. Define a function to cut/slice the original datasets (European scale).

  3. Define a function called do_annual_maximum that:

    1. Computes the annual maximum precipitation for each time duration that we are interested.

    2. Converts units of precipitation from kg m-2 s-1 to mm over each duration.

    3. Saves the temporal series for each duration as a netCDF in the data directory for our region.

  4. Apply do_annual_maximum to historical and future projections of EURO-CORDEX precipitation datasets.

  5. Visualise the temporal series for a specific point.

Some important functions from the xarray library will be used:

  • sel to select certain indexes of the dataset/dataarray by value.

  • open_dataset to open netCDF files.

  • rolling to roll over a dataarray within a specific dimension to apply an operation.

  • resample to group by year for performing an operation.

  • to_netcdf to save dataset/dataaray as netcdf.

In addition, the rioxarray library is used to properly write the coordinate reference system (CRS) so that the new files can be well visualized with a GIS application.

3.1 Reproject bounding box coordinates.#

# Reproject usual coordinates (in WGS84) to EURO-CORDEX's CRS.
CRS = ccrs.RotatedPole(pole_latitude=39.25, pole_longitude=-162)
transformer = pyproj.Transformer.from_crs('epsg:4326',CRS)

# Original bounding box in usual lat/lon.
West, South, East, North = bbox

# New bounding box coordinates matching EURO-CORDEX CRS
RLON_MIN, RLAT_MIN, RLON_MAX, RLAT_MAX = transformer.transform_bounds(South, West, North, East)
print(f'The bounding box new coordinates in EURO-CORDEX reference system are \n {RLON_MIN}, {RLAT_MIN}, {RLON_MAX}, {RLAT_MAX}.') 
The bounding box new coordinates in EURO-CORDEX reference system are 
 -13.82050453871809, -9.423645743863924, -10.66980176143207, -5.66467325395815.

3.2 Define a function to select the values of our area.#

The next function will slice each dataset before loading it to reduce execution time and resources.

# Auxiliary function to slice each dataset to a particular region (CATALONIA) with rotated coordinates.
def cut_to_region(ds):
    ds = ds.sel(rlat = slice(RLAT_MIN, RLAT_MAX), rlon = slice(RLON_MIN, RLON_MAX))
    return ds

3.3 Define the do_annual_maximum function.#

First, we define the funcion that accumulates precipitation into the desired time duration, and extracts annual maxima. This function will be applied to all datasets (historical and future).

def do_annual_maximum(paths, durations, t_step, run):    
    Generates and saves annual maximum precipitation as a netCDF file.

        paths: a list of dataset's absolute paths to compute annual maximums.
        durations: list of durations (multiples of 3 or 24)
        t_step: time-step (3 or 24) of the raw data
        run: 'historical' or RCP_STR
    if len(paths) == 0:
        print('Path list is empty. Please make sure paths are well-defined.')
    for dur in durations:
        print(f'...for {dur}h duration...')
        ds_prec_max = None
        for path in paths:
            # Load a file and call the cut_to_region function to slice it.
            ds = xr.open_dataset(path, decode_coords='all')
            ds_i = cut_to_region(ds); del ds
            # Longitude of window to roll dataset
            window = int(dur/t_step)
            # Compute annual maximum
            da_i = ds_i.pr.rolling(time=window).sum().resample(time='YS').max(keep_attrs=True)
            # Convert units
            da_i = da_i * 3600 * t_step
            da_i.attrs['units'] = 'mm'
            # Concatenate dataarrays over years
            ds_prec_max = xr.concat([ds_prec_max, da_i], dim='time') if ds_prec_max is not None else da_i
        # Assign new 'duration' dimension to save this information and rename variable
        ds_prec_max = ds_prec_max.expand_dims(dim = {'duration': [dur]}, axis = 0).rename('pr_max')
        # Drop 'rotated_pole' coord and write CRS into the dataset.
        ds_prec_max = ds_prec_max.drop({'rotated_pole'}).rio.write_crs(CRS, inplace=True)
        # Name of new netCDF file
        filename = f'pr_annualMax_{dur}h_{GCM_str}_{run}_{ds_prec_max.time.dt.year.values[0]}-{ds_prec_max.time.dt.year.values[-1]}.nc'
        # Save file in data directory
        print(f'...saving file in {data_dir} directory as {filename}...')
        ds_prec_max.to_netcdf(os.path.join(data_dir, filename))                

3.4 Generate the annual maximum precipitation datasets with the do_annual_maximum function.#

Now, we will apply the function above to both the historical and the future 3-hour precipitation files. If we also want to compute the annual maximum precipitation for 6-hour and 12-hour durations, we need to specify it in the variable durations.


Although 24h precipitation can be obtained from both the 3h and the daily dataset, we propose here to use the second alternative.


This could take some time due to the large dataset the script is handling.

timestep = '3hours'
durations = [3, 6, 12] # Durations to compute from the 3h raw data.
for run in TIME_FRAMES.keys(): # historical and future
    print(f'Computing annual maximum precipitation for {run} experiment...')
    paths = sorted(glob.glob(os.path.join(general_data_dir, GCM_str, timestep, f'pr_*{run}*.nc')))
    do_annual_maximum(paths, durations, 3, run)
Computing annual maximum precipitation for historical experiment...
...for 3h duration...
...saving file in extreme_precipitation_hazard/data_Catalonia directory as pr_annualMax_3h_ichec-ec-earth_historical_1976-2005.nc...
...for 6h duration...
...saving file in extreme_precipitation_hazard/data_Catalonia directory as pr_annualMax_6h_ichec-ec-earth_historical_1976-2005.nc...
...for 12h duration...
...saving file in extreme_precipitation_hazard/data_Catalonia directory as pr_annualMax_12h_ichec-ec-earth_historical_1976-2005.nc...
Computing annual maximum precipitation for rcp85 experiment...
...for 3h duration...
...saving file in extreme_precipitation_hazard/data_Catalonia directory as pr_annualMax_3h_ichec-ec-earth_rcp85_2041-2070.nc...
...for 6h duration...
...saving file in extreme_precipitation_hazard/data_Catalonia directory as pr_annualMax_6h_ichec-ec-earth_rcp85_2041-2070.nc...
...for 12h duration...
...saving file in extreme_precipitation_hazard/data_Catalonia directory as pr_annualMax_12h_ichec-ec-earth_rcp85_2041-2070.nc...

And we will do the same but using as input the historical and future projections for 24-hour precipitation.

timestep = 'dailymean'
durations = [24] # Durations to compute from the daily raw data.
for run in TIME_FRAMES.keys(): # historical and future
    print(f'Computing annual maximum precipitation for {run} experiment...')
    paths = sorted(glob.glob(os.path.join(general_data_dir, GCM_str, timestep, f'pr_*{run}*.nc')))
    do_annual_maximum(paths, durations, 24, run)
Computing annual maximum precipitation for historical experiment...
...for 24h duration...
...saving file in extreme_precipitation_hazard/data_Catalonia directory as pr_annualMax_24h_ichec-ec-earth_historical_1976-2005.nc...
Computing annual maximum precipitation for rcp85 experiment...
...for 24h duration...
...saving file in extreme_precipitation_hazard/data_Catalonia directory as pr_annualMax_24h_ichec-ec-earth_rcp85_2041-2070.nc...

3.5 Visualise the temporal series of annual maximum precipitation for a selected pixel.#

Let us have a look to one of the new netCDF files that we have just created.

We will define a point in the usual latitude and longitude coordinates (within our area of interest) and plot (with the plotly library) the evolution of the annual maximum precipitation series. We can also choose which duration and time-frame (within the ones downloaded) to plot.

Automatically, the code below will save the figure in the plots directory for our region.

# Select a location with latitude and longitude coordinates
POINT = [41.5, 2.]; location = 'Terrassa'

# Reproject to match EURO-CORDEX CRS
RPOINT = transformer.transform(POINT[0], POINT[1])
# Choose which duration and time frame to plot (from the ones computed previously).
duration_plot = 3
start_year_plot = 2041
end_year_plot = 2070
experiment_plot = 'rcp85'
# Load the maximum precipitation file.
filename_prec_max_plot = os.path.join(data_dir, f'pr_annualMax_{duration_plot}h_{GCM_str}_{experiment_plot}_{start_year_plot}-{end_year_plot}.nc')

ds_prec_max_plot = xr.open_dataset(filename_prec_max_plot)
da_prec_max_plot = ds_prec_max_plot.pr_max.sel(duration=duration_plot, rlat = RPOINT[1], rlon = RPOINT[0], method = 'nearest')

# Converting dataarray to dataframe for plotly library to understand.
df_prec_max_plot = da_prec_max_plot.to_dataframe().reset_index()
# Define bar chart and customize it.
figure = px.bar(df_prec_max_plot, x="time", y="pr_max", title=f'Annual maximum precipitation for {duration_plot}h duration in {location}')
figure.update_layout(yaxis=dict(title="Precipitation (mm)"),
                     xaxis = dict(title="Year")) #, range=[10, max(df.precipitation)+5]))


# Save plot to plots directory