Extreme precipitation: Changes under climate scenarios worfklow [Hazard assessment]#
A workflow from the CLIMAAX Handbook and HEAVY RAINFALL GitHub repository.
See our how to use risk workflows page for information on how to run this notebook.
This section outlines the guidelines for generating rainfall datasets representing current and future climate scenarios. The steps detailed here involve extracting annual maximum precipitation values, fitting various statistical distributions, and calculating return periods for specific durations and scenarios across Europe.
To demonstrate the practical application of hazard data assessment, we provide a specific example focused on the Catalonia region in Spain. In this example, EURO-CORDEX climate projections for precipitation flux at a 12km spatial resolution have been utilised for assessing future climate hazards. The timeframe from 1976-2005 represents historical simulations, while the period from 2041-2070 is used for climate projections.
Note
To simplify your climate risk assessment of extreme precipitation, CRAHI-UPC has already prepared datasets for the European region for different Global/Regional chains, accessible at link. These datasets are ready to be used directly in your risk assessment. You can find detailed instructions on how to use these datasets in the “Extreme precipitation: Changes under climate scenarios workflow [Risk assessment]”
If you are following Path A, you do not need to execute the steps outlined on this page to begin your risk assessment. However, depending on your specific research needs, we encourage you to adapt the code for your local area and explore how critical impact rainfall thresholds may be affected under different climate change scenarios (refer to the previous section on critical thresholds for more details).
Prepare your workspace#
In this notebook we will use the following Python libraries:
cdsapi - A library to request data from the datasets listed in the CDS catalogue.
numpy - A powerful library for numerical computations in Python, widely used for array operations and mathematical functions.
xarray - Introduces labels in the form of dimensions, coordinates and attributes on top of raw NumPy-like arrays for a more intuitive experience.
rioxarray - An extension of the xarray library that simplifies working with geospatial raster data.
scipy - Provides algorithms for optimization, statistics and many other classes of problems.
matplotlib - A versatile plotting library in Python, commonly used for creating static, animated, and interactive visualizations.
cartopy - A package designed for geospatial data processing in order to produce maps and other geospatial data analyses.
contextily - To retrieve matplotlib compatible tile maps from the internet.
pyproj - A Python interface to PROJ (cartographic projections and coordinate transformations library).
plotly - A Python Graphing Library
Load libraries#
# Libraries to download data and manage files
import os
import cdsapi
import zipfile
import datetime
import glob
# Libraries for numerical computations, array manipulation and statistics.
import numpy as np
import xarray as xr
import scipy.stats as sst
# Libraries to handle geospatial data
import rioxarray as rio
import pyproj
# Libraries to plot maps, charts and tables
import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib.colors import from_levels_and_colors
import cartopy.crs as ccrs
import cartopy.feature as cfeature
from cartopy.util import add_cyclic_point
import plotly.express as px
import plotly.graph_objects as go
import contextily as ctx
# Choosing the matplotlib backend
%matplotlib inline
Select your area of interest#
Let us define the coordinates of an area of interest. Based on these coordinates we will be able to clip the datasets for further processing.
To easily define an area in terms of geographical coordinates, you can go to the Bounding Box Tool to select a region and get the coordinates. Make sure to select CSV in the lower left corner and copy the values in the brackets below.
# Define the BOUNDING BOX for your desired region (CATALONIA in this example).
bbox = [0, 40.2, 3.3, 43.5]; areaname = 'Catalonia'
Set your directory structure#
The next cell will create the extreme_precipitation_hazard folder in the directory where the notebook is saved. You can change the path defined in the variable workflow_dir
to create it elsewhere.
# Define the path for the extreme_precipitation_hazard folder.
workflow_dir = 'extreme_precipitation_hazard'
# Create the directory checking if it already exists.
os.makedirs(workflow_dir, exist_ok = True)
Now, we will create a subfolder for general data as well as specific subfolders to save data for our area of interest, plots generated by the code and new datasets computed within the workflow.
# Define directories for general data.
general_data_dir = os.path.join(workflow_dir, 'general_data')
# Define specific directories for the selected area
data_dir = os.path.join(workflow_dir, f'data_{areaname}')
results_dir = os.path.join(workflow_dir, f'results_{areaname}')
plots_dir = os.path.join(workflow_dir, f'plots_{areaname}')
# Create the directories checking if they already exist.
os.makedirs(general_data_dir, exist_ok = True)
os.makedirs(data_dir, exist_ok = True)
os.makedirs(results_dir, exist_ok = True)
os.makedirs(plots_dir, exist_ok = True)
Step 1: Download the EURO-CORDEX from the Climate Data Store.#
For this analysis, EURO-CORDEX climate projections for precipitation flux at a 12km spatial resolution have been employed. These projections are readily accessible to the public through the Climate Data Store (CDS) portal. The EURO-CORDEX data can be downloaded through the CDS API. Learn how to use it here.
For this example, we will guide you through downloading two different 30-year frames from the EURO-CORDEX precipitation data: one corresponding to daily precipitation; the other corresponding to 3 hours time resolution, needed for sub-daily analysis. The selected timeframes are 1976-2005 (baseline or historic simulations) and 2041-2070 (climate projections). We will select a specific combination of General Circulation Model (GCM), Regional Climate model (RCM), and a Representative Concentration Pathway (RCP).
Note
Feel free to explore the code and modify it according to your specific research needs (e.g., pair of models or scenarios). Explore the available data for CORDEX by referring to the resources found here. Notice that the start and end years differ from model to model.
Warning
EURO-CORDEX datasets are downloaded for all Europe (they cannot be pre-selected for our area), so it may take some hours to download, specially if you are interested in the 3-hour time-step
Define in the next cell the specifications of the dataset you want to download.
# Global options for EURO-CORDEX data
GCM = 'ichec_ec_earth'
RCM = 'knmi_racmo22e'
RCP = 'rcp_8_5'
# Define global variables of the time-frame years for further use on the workflow.
GCM_str = GCM.replace('_','-')
RCM_str = RCM.replace('_','-')
RCP_str = RCP.replace('_','')
TIME_FRAMES = {
'historical' :['1976', '2005'],
RCP_str : ['2041', '2070']
}
To request data with CDS API, a specific list of the years has to be provided. Define in the next cell the dictionary that contains the necessary years for each experiment (historical and rcp). Remember to check with the CDS Data Portal that years are well-defined.
# Define dictionary with listed years so CDS API understands the request.
YEARS_SPECIFICATION = {
'3_hours': { # The data is saved in 1-year files.
'historical': {
'start_year': [str(i) for i in range(1976, 2006)],
'end_year' : [str(i+1) for i in range(1976, 2006)]
},
RCP: {
'start_year': [str(i) for i in range(2041, 2071)],
'end_year' : [str(i+1) for i in range(2041, 2071)]
}
},
'daily_mean': { # The data is stored in 5-year files.
'historical': {
'start_year': [str(i) for i in range(1976, 2006, 5)],
'end_year': [str(i+4) for i in range(1976, 2006, 5)]
},
RCP: {
'start_year': [str(i) for i in range(2041, 2071, 5)],
'end_year': [str(i+4) for i in range(2041, 2071, 5)]
}
}
}
We will loop over the temporal resolutions and time frames and save the downloaded data into the general data directory. A subdirectory for the specific GCM and time-step will be also created to differentiate data if other datasets are later downloaded.
# Change the KEY to your own
URL = "https://cds.climate.copernicus.eu/api"
KEY = None
c = cdsapi.Client(url=URL, key=KEY)
for temp_res, runs in YEARS_SPECIFICATION.items():
for experiment, years in runs.items():
for y_start, y_end in zip(years['start_year'], years['end_year']):
# Define name and path for zip and subfolder for the files
zip_name_i = f"pr_{temp_res.replace('_','')}_{experiment.replace('_','')}.zip"
zip_path_i = os.path.join(general_data_dir, GCM_str, zip_name_i)
dir_path_i = os.path.join(general_data_dir, GCM_str, temp_res.replace('_',''))
os.makedirs(dir_path_i, exist_ok=True)
# Download from CDS
c.retrieve(
'projections-cordex-domains-single-levels',
{
'format': 'zip',
'domain': 'europe',
'experiment': experiment,
'horizontal_resolution': '0_11_degree_x_0_11_degree',
'temporal_resolution': temp_res,
'variable': 'mean_precipitation_flux',
'gcm_model': GCM,
'rcm_model': RCM,
'ensemble_member': 'r1i1p1',
'start_year': y_start,
'end_year': y_end
},
zip_path_i)
# Unzip the files in a specific directory within general data dir folder.
with zipfile.ZipFile(zip_path_i, 'r') as zObject:
zObject.extractall(path=dir_path_i)
# Remove zip file
os.remove(zip_path_i)
If you want to download just one specific subset of data from CDS, you can use the next code snippet.
# Change the KEY to your own
URL = "https://cds.climate.copernicus.eu/api"
KEY = None
c = cdsapi.Client(url=URL, key=KEY)
# Define zip file's absolute path
zip_path = os.path.join(general_data_dir, 'download.zip')
dir_path = os.path.join(general_data_dir, 'name_folder')
os.makedirs(dir_path, exist_ok=True)
c.retrieve(
'projections-cordex-domains-single-levels',
{
'format': '',
'domain': '',
'experiment': '',
'horizontal_resolution': '',
'temporal_resolution': '',
'variable': '',
'gcm_model': '',
'rcm_model': '',
'ensemble_member': '',
'start_year': '',
'end_year': '',
},
zip_path)
# Unzip the files in a specific directory within general data dir folder.
with zipfile.ZipFile(zip_path, 'r') as zObject:
zObject.extractall(path=dir_path)
# Remove zip file
os.remove(zip_path)
Step 2: Explore the data.#
The files from the CDS have a filename structure that describes the exact dataset contained. The general structure is the following:
pr_[domain]_[gcm]_[rcp]_[ensemble]_[rcm]_[version]_[temporal resolution]_[start day]_[end day]
Load one of the files and explore the content and structure of the dataset. Notice the dimensions, coordinates, data variables, indexes and attributes as well as the description of the spatial reference system in rotated_pole().
# Define the absolute path for a specific file
filename_precipitation_example = 'pr_EUR-11_ICHEC-EC-EARTH_rcp85_r1i1p1_KNMI-RACMO22E_v1_3hr_2041010100-2042010100.nc'
path_precipitation_example = os.path.join(general_data_dir, GCM_str, '3hours', filename_precipitation_example)
# Open the netCDF file as a dataset using xarray
dataset_precipitation_example = xr.open_dataset(path_precipitation_example, decode_cf = True)
# Display said dataset
dataset_precipitation_example
<xarray.Dataset> Size: 2GB Dimensions: (rlon: 424, rlat: 412, time: 2920, bnds: 2) Coordinates: * rlon (rlon) float64 3kB -28.38 -28.27 -28.16 ... 17.93 18.05 18.16 lon (rlat, rlon) float64 1MB ... * rlat (rlat) float64 3kB -23.38 -23.27 -23.16 ... 21.62 21.73 21.84 lat (rlat, rlon) float64 1MB ... * time (time) datetime64[ns] 23kB 2041-01-01T01:30:00 ... 2041-12-... Dimensions without coordinates: bnds Data variables: rotated_pole |S1 1B ... time_bnds (time, bnds) datetime64[ns] 47kB ... pr (time, rlat, rlon) float32 2GB ... Attributes: (12/25) Conventions: CF-1.4 contact: Erik van Meijgaard, KNMI, Regional Climat... experiment: RCP8.5 run experiment_id: rcp85 realization: 1 driving_experiment: ICHEC-EC-EARTH,rcp85,r1i1p1 ... ... knmi_model_comment: RACMO22E: baseline physics from ECMWF CY3... knmi_version_comment: v1: reference version for Europe and othe... knmi_grib_path: mos.knmi.nl:/climreg/CXEUR12/eCS6-v441-fE... creation_date: 2015-12-10T17:13:36Z c3s_disclaimer: "" tracking_id: hdl:21.14103/157ee1ba-f9ba-4f87-82f2-206f...
Step 3: Extract the temporal series of annual maximum precipitation for sub-daily and daily resolution.#
For this example, the annual maximum precipitation associated with a duration of 3 and 24 hours (derived from the 3-hour temporal step and the daily dataset respectively).
We will follow the next steps:
Reproject the coordinates of our area of interest to match the dataset’s reference system.
Define a function to cut/slice the original datasets (European scale).
Define a function called
do_annual_maximum
that:Computes the annual maximum precipitation for each time duration that we are interested.
Converts units of precipitation from kg m-2 s-1 to mm over each duration.
Saves the temporal series for each duration as a netCDF in the data directory for our region.
Apply
do_annual_maximum
to historical and future projections of EURO-CORDEX precipitation datasets.Visualise the temporal series for a specific point.
Some important functions from the xarray
library will be used:
sel
to select certain indexes of the dataset/dataarray by value.open_dataset
to open netCDF files.rolling
to roll over a dataarray within a specific dimension to apply an operation.resample
to group by year for performing an operation.to_netcdf
to save dataset/dataaray as netcdf.
In addition, the rioxarray
library is used to properly write the coordinate reference system (CRS) so that the new files can be well visualized with a GIS application.
3.1 Reproject bounding box coordinates.#
# Reproject usual coordinates (in WGS84) to EURO-CORDEX's CRS.
CRS = ccrs.RotatedPole(pole_latitude=39.25, pole_longitude=-162)
transformer = pyproj.Transformer.from_crs('epsg:4326',CRS)
# Original bounding box in usual lat/lon.
West, South, East, North = bbox
# New bounding box coordinates matching EURO-CORDEX CRS
RLON_MIN, RLAT_MIN, RLON_MAX, RLAT_MAX = transformer.transform_bounds(South, West, North, East)
print(f'The bounding box new coordinates in EURO-CORDEX reference system are \n {RLON_MIN}, {RLAT_MIN}, {RLON_MAX}, {RLAT_MAX}.')
The bounding box new coordinates in EURO-CORDEX reference system are
-13.82050453871809, -9.423645743863924, -10.66980176143207, -5.66467325395815.
3.2 Define a function to select the values of our area.#
The next function will slice each dataset before loading it to reduce execution time and resources.
# Auxiliary function to slice each dataset to a particular region (CATALONIA) with rotated coordinates.
def cut_to_region(ds):
ds = ds.sel(rlat = slice(RLAT_MIN, RLAT_MAX), rlon = slice(RLON_MIN, RLON_MAX))
return ds
3.3 Define the do_annual_maximum
function.#
First, we define the funcion that accumulates precipitation into the desired time duration, and extracts annual maxima. This function will be applied to all datasets (historical and future).
def do_annual_maximum(paths, durations, t_step, run):
"""
Generates and saves annual maximum precipitation as a netCDF file.
Parameters:
paths: a list of dataset's absolute paths to compute annual maximums.
durations: list of durations (multiples of 3 or 24)
t_step: time-step (3 or 24) of the raw data
run: 'historical' or RCP_STR
"""
if len(paths) == 0:
print('Path list is empty. Please make sure paths are well-defined.')
return
for dur in durations:
print(f'...for {dur}h duration...')
ds_prec_max = None
for path in paths:
# Load a file and call the cut_to_region function to slice it.
ds = xr.open_dataset(path, decode_coords='all')
ds_i = cut_to_region(ds); del ds
# Longitude of window to roll dataset
window = int(dur/t_step)
# Compute annual maximum
da_i = ds_i.pr.rolling(time=window).sum().resample(time='YS').max(keep_attrs=True)
# Convert units
da_i = da_i * 3600 * t_step
da_i.attrs['units'] = 'mm'
# Concatenate dataarrays over years
ds_prec_max = xr.concat([ds_prec_max, da_i], dim='time') if ds_prec_max is not None else da_i
# Assign new 'duration' dimension to save this information and rename variable
ds_prec_max = ds_prec_max.expand_dims(dim = {'duration': [dur]}, axis = 0).rename('pr_max')
# Drop 'rotated_pole' coord and write CRS into the dataset.
ds_prec_max = ds_prec_max.drop_vars({'rotated_pole'}).rio.write_crs(CRS, inplace=True)
# Name of new netCDF file
filename = f'pr_annualMax_{dur}h_{GCM_str}_{RCM_str}_{run}_{ds_prec_max.time.dt.year.values[0]}-{ds_prec_max.time.dt.year.values[-1]}.nc'
# Save file in data directory
print(f'...saving file in {data_dir} directory as {filename}...')
ds_prec_max.to_netcdf(os.path.join(data_dir, filename))
3.4 Generate the annual maximum precipitation datasets with the do_annual_maximum
function.#
Now, we will apply the function above to both the historical and the future 3-hour precipitation files. If you would like to derive the annual maximum precipitation for other durations from the 3-hour temporal step (e.g., 6-hour and 12-hour durations), you would need to specify it in the variable durations
.
Note
Although 24h precipitation can be obtained from both the 3h and the daily dataset, we propose here to use the second alternative.
Warning
This could take some time due to the large dataset the script is handling.
timestep = '3hours'
durations = [3]
for run in TIME_FRAMES.keys(): # historical and future
print(f'Computing annual maximum precipitation for {run} experiment...')
paths = sorted(glob.glob(os.path.join(general_data_dir, GCM_str, timestep, f'pr_*{run}*.nc')))
do_annual_maximum(paths, durations, 3, run)
Computing annual maximum precipitation for historical experiment...
...for 3h duration...
...saving file in extreme_precipitation_hazard/data_Catalonia directory as pr_annualMax_3h_ichec-ec-earth_knmi-racmo22e_historical_1976-2005.nc...
Computing annual maximum precipitation for rcp85 experiment...
...for 3h duration...
...saving file in extreme_precipitation_hazard/data_Catalonia directory as pr_annualMax_3h_ichec-ec-earth_knmi-racmo22e_rcp85_2041-2070.nc...
And we will do the same but using as input the historical and future projections for 24-hour precipitation.
timestep = 'dailymean'
durations = [24] # Durations to compute from the daily raw data.
for run in TIME_FRAMES.keys(): # historical and future
print(f'Computing annual maximum precipitation for {run} experiment...')
paths = sorted(glob.glob(os.path.join(general_data_dir, GCM_str, timestep, f'pr_*{run}*.nc')))
do_annual_maximum(paths, durations, 24, run)
Computing annual maximum precipitation for historical experiment...
...for 24h duration...
...saving file in extreme_precipitation_hazard/data_Catalonia directory as pr_annualMax_24h_ichec-ec-earth_knmi-racmo22e_historical_1976-2005.nc...
Computing annual maximum precipitation for rcp85 experiment...
...for 24h duration...
...saving file in extreme_precipitation_hazard/data_Catalonia directory as pr_annualMax_24h_ichec-ec-earth_knmi-racmo22e_rcp85_2041-2070.nc...
3.5 Visualise the temporal series of annual maximum precipitation for a selected pixel.#
Let us have a look to one of the new netCDF files that we have just created.
We will define a point in the usual latitude and longitude coordinates (within our area of interest) and plot (with the plotly
library) the evolution of the annual maximum precipitation series. We can also choose which duration and time-frame (within the ones downloaded) to plot.
Automatically, the code below will save the figure in the plots directory for our region.
# Select a location with latitude and longitude coordinates
POINT = [41.5, 2.]; location = 'Terrassa'
# Reproject to match EURO-CORDEX CRS
RPOINT = transformer.transform(POINT[0], POINT[1])
# Choose which duration and time frame to plot (from the ones computed previously).
duration_plot = 3
start_year_plot = 2041
end_year_plot = 2070
experiment_plot = 'rcp85'
# Load the maximum precipitation file.
filename_prec_max_plot = os.path.join(data_dir, f'pr_annualMax_{duration_plot}h_{GCM_str}_{RCM_str}_{experiment_plot}_{start_year_plot}-{end_year_plot}.nc')
ds_prec_max_plot = xr.open_dataset(filename_prec_max_plot)
da_prec_max_plot = ds_prec_max_plot.pr_max.sel(duration=duration_plot, rlat = RPOINT[1], rlon = RPOINT[0], method = 'nearest')
ds_prec_max_plot.close
# Converting dataarray to dataframe for plotly library to understand.
df_prec_max_plot = da_prec_max_plot.to_dataframe().reset_index()
# Define bar chart and customize it.
figure = px.bar(df_prec_max_plot, x="time", y="pr_max", title=f'Annual maximum precipitation for {duration_plot}h duration in {location}')
figure.update_layout(yaxis=dict(title="Precipitation (mm)"),
xaxis = dict(title="Year")) #, range=[10, max(df.precipitation)+5]))
figure.show()
# Save plot to plots directory
figure.write_image(os.path.join(plots_dir,
f'pr_annualMax_{duration_plot}h_{start_year_plot}-{end_year_plot}_{location}.png'))