My workflow#

This template workflow helps make sure that all the notebooks have the same structure.
The green ‘Tip’ boxes contain example text. Make sure you delete them when you add your own text.

Useful links:

Risk assessment methodology#

Write here a description of the methodology.

Describe the workflow and the data that is used.

Describe where the data can be found: is there an API to download it, or can the files be downloaded from a data repository? Provide a link to the repository (as a DOI if possible).

Tip

This is an example of a text cell describing the data:

River flood extent and water depth: available from the JRC repository for different return periods. The flood extent maps have a resolution of 100 m.
Land-use information: the land cover map and all the spatial projections of population and land cover are available from the JRC Data Catalogue.
Flood damage: assessed by combining the flood extent and water depths with a damage curve (available here). For each pixel, the water depth is used as input to the damage curve, together with the land use and country, to assess the damage.
Flood-affected population: assessed by overlaying the European population density map at 100 m resolution with the flood inundation maps for a given return period. Another possible dataset is the Global Human Settlement Population dataset, which has the same 100 m resolution as the JRC flood data.

And a cell describing the work:

Probabilistic assessment of flood damage is calculated for different return periods (i.e. 2, 5, 10, 20, 50, 100, 250 and 500 years). In this way, damage-probability curves can be obtained at the grid cell by interpolating the damage estimates between the different recurrence intervals considered. The expected annual damages at a given grid cell due to river flooding are thus the integral of the damage-probability curve. Flood protection can be included in the expected annual damages estimation by truncating the damage-probability curves at the corresponding protection level (e.g. design flood with return period of 100 years). The integral of the remaining part after truncation quantifies the expected annual damages and expected annual population affected caused by river flooding considering flood protection up to the design flood.
Similar to flood damages, population exposure probability functions can be derived for each grid cell within the modelled domain.
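The integration described in the example text can be sketched for a single grid cell as follows. This is a minimal illustration, not the workflow's actual implementation: the function name `expected_annual_damage`, the damage values and the choice of the trapezoidal rule are assumptions, and the sketch assumes that the protection level coincides with one of the modelled return periods. Contributions from events rarer or more frequent than the modelled return periods are ignored.

```python
import numpy as np

def expected_annual_damage(return_periods, damages, protection_rp=None):
    """Integrate a damage-probability curve with the trapezoidal rule.

    return_periods: return periods in years (hypothetical input)
    damages: damage estimate for each return period, same length
    protection_rp: optional design return period of the flood protection
    """
    rp = np.asarray(return_periods, dtype=float)
    dmg = np.asarray(damages, dtype=float)
    prob = 1.0 / rp  # annual exceedance probability of each event

    if protection_rp is not None:
        # Truncate the curve: keep only events at least as rare as the design flood
        keep = rp >= protection_rp
        prob, dmg = prob[keep], dmg[keep]

    # Integrate over increasing probability
    order = np.argsort(prob)
    prob, dmg = prob[order], dmg[order]
    return float(np.sum(0.5 * (dmg[1:] + dmg[:-1]) * np.diff(prob)))

# Made-up damages (arbitrary units) for the return periods named in the text
rps = [2, 5, 10, 20, 50, 100, 250, 500]
damage_values = [0.0, 1.0, 2.0, 4.0, 7.0, 10.0, 14.0, 18.0]

ead_unprotected = expected_annual_damage(rps, damage_values)
ead_protected = expected_annual_damage(rps, damage_values, protection_rp=100)
print(ead_unprotected, ead_protected)
```

The same function could be applied to population-affected estimates instead of damages, since only the values on the curve change.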

Preparation work#

Replace the libraries in the next two cells with the libraries used in the workflow

Load libraries#

In this notebook we will use the following Python libraries:

os - To create directories and work with files
pooch - To download and unzip the data
rasterio - To access and explore geospatial raster data in GeoTIFF format
xarray - To process the data and prepare it for damage calculation
rioxarray - Rasterio xarray extension - to make it easier to use GeoTIFF data with xarray
cartopy - To plot the maps
matplotlib - For plotting as well

import os
import pooch

import rasterio
from pathlib import Path
import rioxarray as rxr
import xarray as xr

import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import cartopy

Create the directory structure#

In order for this workflow to work even if you download and use just this notebook, we need to set up the directory structure.
The next cell creates a directory called ‘my_workflow’ in the same directory where this notebook is saved.

Tip

Don’t forget! Replace my_workflow with the workflow name and delete this note

workflow_folder = 'my_workflow'
# Create the workflow directory if it does not exist yet
os.makedirs(workflow_folder, exist_ok=True)

Download data#

You can keep the text below if you want to use pooch for downloading. Otherwise add a text about the API and delete the pooch bit.

The data we are using is available as compressed ZIP files in the JRC data portal. Since there is no API to download this data, we can use the pooch library to download and unzip the data.

Pooch checks whether the zip file already exists in the download directory and downloads it only if it is not there. If a known hash is supplied, it also compares the hash of the existing file with that value.

data_dir = os.path.join(workflow_folder, 'data')
os.makedirs(data_dir, exist_ok=True)  # make sure the directory exists before listing it

Note that we now have a directory my_workflow/data, which holds all the zip files and unzipped flood files.
We can list all the files in the data_dir using the os library.

with os.scandir(data_dir) as entries:
    for entry in entries:
        print(entry.name)

First type of data#

First we need the information on land use. We will download it from the JRC data portal.

Tip

Here’s an example of the text and code:
Once the data is downloaded and unzipped, Pooch will list the content of the directory with the data.

url = 'https://cidportal.jrc.ec.europa.eu/ftp/jrc-opendata/LUISA/PrimaryOutput/Europe/REF-2014/JRC_LUISA_Output_land_use_ref_2014.zip'
pooch.retrieve(
    url=url,
    known_hash=None,
    path=data_dir,
    processor=pooch.Unzip(extract_dir='')
)
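The call above passes `known_hash=None`, so the content of the download is not verified. One possible way to pin the file is to compute its SHA-256 digest once and pass it as `known_hash` in later runs. The helper below is a sketch using only the standard library; the name `sha256_of_file` is an assumption made for illustration, and pooch also ships its own `pooch.file_hash` function that could be used instead.

```python
import hashlib

def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large zip files do not need to fit in memory
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Hypothetical usage once the zip file has been downloaded:
# known_hash = "sha256:" + sha256_of_file(path_to_downloaded_zip)
```

With a hash in place, a changed or corrupted file is detected instead of being silently reused.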

Explore the data#

Now that we have downloaded and unpacked the needed data, we can have a look at what is inside.

Some data I#

Add text about your data here.

Explain the folder structure and file names.

All the downloaded files are stored in our data_dir folder, with filenames starting with: …
First we can explore one of them.

Tip

Explore the file content
Feel free to explore the content and structure of the datasets.
Note the coordinates, dimensions and attributes!

Hint

Find the information about spatial references and statistics
👋 Click on spatial_ref 📄 show/hide attributes to see the spatial information
👋 Look at STATISTICS attributes to find minimum, maximum and other statistics

# code here to open and show the data
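As a rough sketch of what such a cell might contain, the snippet below builds a small synthetic DataArray shaped like the (band, y, x) array that `rxr.open_rasterio` returns for a GeoTIFF, and inspects it. The values, coordinates and attribute names are made up for illustration; in a real notebook the array would come from opening one of the downloaded files.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the result of rxr.open_rasterio(<path to a GeoTIFF>)
data = xr.DataArray(
    np.arange(12, dtype="float32").reshape(1, 3, 4),
    dims=("band", "y", "x"),
    coords={
        "band": [1],
        "y": [50.0, 49.0, 48.0],
        "x": [18.0, 18.5, 19.0, 19.5],
    },
    attrs={"STATISTICS_MINIMUM": 0.0, "STATISTICS_MAXIMUM": 11.0},
    name="example_raster",
)

print(data.dims)    # names of the dimensions
print(data.sizes)   # length of each dimension
print(data.attrs)   # metadata attached to the array

# Drop the single band dimension to work with a plain 2D (y, x) layer
layer = data.squeeze("band", drop=True)
print(float(layer.min()), float(layer.max()))
```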

Some data II#

Add text about the other dataset you are using here.

If there are more files in the directory, you can list the directory and explain what is what.

#with os.scandir(f'{data_dir}/other_data') as entries:
#    for entry in entries:
#        print(entry.name)

Let’s explore one of the dataset files, for example this one…
Write which library is used, especially if it is different from other datasets.

# code here to open and show the data

Process the data#

Explain why the processing is needed. For example:

If the data is global, we might need to crop all the datasets to the area of interest. Set the coordinates of the bounding box in a separate cell.
If the datasets have different resolutions and projections, we need to reproject one of them and interpolate it to the same resolution in order to be able to do computations.
Try to use the area of one of the pilots in the examples.

Explain which libraries are used in this step and why.

Tip

Here’s an example text:

If we have a closer look at the x and y dimensions of the datasets, we can see that the data has different resolutions. Flood extent maps are at 100 m resolution, while land use data is at 1 km. We can use xarray to get them to the same resolution.

But first we need to clip them to the same area, so we don’t interpolate the whole global field.

For this we use the rioxarray library again.
xmin and xmax are the longitudes, and ymin and ymax are the latitudes of the bounding box.

In this example we are clipping to a bounding box around the city of Zilina in Slovakia.

xmin=18
ymin=48
xmax=20
ymax=50

# code cell with processing
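One possible shape for this processing cell is sketched below with plain xarray label selection on a synthetic grid. The helper name `clip_to_bbox` and the synthetic data are assumptions made for illustration. With rioxarray, the `rio.clip_box(minx, miny, maxx, maxy)` accessor method offers a similar bounding-box clip on georeferenced data.

```python
import numpy as np
import xarray as xr

def clip_to_bbox(da, xmin, ymin, xmax, ymax):
    """Select the cells whose x/y coordinates fall inside the bounding box."""
    # Raster y coordinates often decrease from north to south, so the
    # slice direction has to follow the order of each coordinate.
    x_ascending = float(da.x[0]) <= float(da.x[-1])
    y_ascending = float(da.y[0]) <= float(da.y[-1])
    x_slice = slice(xmin, xmax) if x_ascending else slice(xmax, xmin)
    y_slice = slice(ymin, ymax) if y_ascending else slice(ymax, ymin)
    return da.sel(x=x_slice, y=y_slice)

# Synthetic 1-degree grid standing in for a larger dataset
x = np.arange(10.0, 30.0)        # longitudes 10 .. 29
y = np.arange(60.0, 40.0, -1.0)  # latitudes 60 .. 41, north to south
field = xr.DataArray(
    np.random.default_rng(0).random((y.size, x.size)),
    dims=("y", "x"),
    coords={"y": y, "x": x},
)

small_area = clip_to_bbox(field, xmin=18, ymin=48, xmax=20, ymax=50)
print(small_area.sizes)
```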

Explanation of the processing step I#

Explain the processing step, what library is used and why.
Try to include links to the documentation about the functions you are using.

Tip

Here’s an example text:

Interpolate the land use data array

Next we need to interpolate the land use data onto the flood map grid in order to be able to calculate the damage map.
We can use the xarray interp_like() function, which interpolates the land use data onto the flood grid.

Since we don’t really want to interpolate the values, we use the method ‘nearest’ to assign the values of the nearest grid points.

# code cell with processing, for example:
#land_use_small_area_100m = land_use_small_area.interp_like(flood_200_small_area, method='nearest')
#land_use_small_area_100m

You may include a simple plot to quickly visualise the result, but make sure to explain what is plotted.

#land_use_small_area_100m.plot()
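The nearest-neighbour regridding described above can be illustrated on a tiny synthetic example. Note that this sketch swaps in xarray's `reindex_like(method="nearest")` rather than `interp_like`: for target points inside the extent of the coarse grid it should pick the same nearest cell, and it keeps the integer class values unchanged. All array names and values are made up for illustration.

```python
import numpy as np
import xarray as xr

# Coarse categorical grid (stand-in for 1 km land use classes)
land_use = xr.DataArray(
    np.array([[1, 2], [3, 4]]),
    dims=("y", "x"),
    coords={"y": [0.0, 1.0], "x": [0.0, 1.0]},
)

# Finer target grid (stand-in for the 100 m flood map)
fine = np.linspace(0.0, 1.0, 4)
flood = xr.DataArray(
    np.zeros((4, 4)),
    dims=("y", "x"),
    coords={"y": fine, "x": fine},
)

# Each fine cell takes the class of the nearest coarse cell
land_use_on_flood_grid = land_use.reindex_like(flood, method="nearest")
print(land_use_on_flood_grid.values)
```

Because no values are averaged, the result only contains land use classes that exist in the input, which is the reason for choosing a nearest-neighbour method for categorical data.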

Calculate the indices#

Calculating some indices using the processed data is often part of the workflow.

Explain here what is calculated, and include links to the documentation about the method.
Explain what libraries are used and include links to their documentation.
Explain clearly what the input fields are.
Explain which functions are used and, where relevant, their parameters.
Explain what the output is.

# code for the calculation

# another code cell
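As an illustration of what a calculation cell could look like, the sketch below turns water depths into damage by interpolating along a depth-damage curve, in the spirit of the flood damage description earlier in this template. The curve values, the function name `damage_map` and the single `max_damage_per_cell` value are hypothetical; a real workflow would use published curves that depend on land use and country.

```python
import numpy as np

# Hypothetical depth-damage curve: damage fraction (0 to 1) versus water depth (m)
curve_depth_m = np.array([0.0, 0.5, 1.0, 2.0, 4.0, 6.0])
curve_damage_fraction = np.array([0.0, 0.2, 0.4, 0.6, 0.9, 1.0])

def damage_map(water_depth_m, max_damage_per_cell):
    """Estimate damage per cell from water depth using the curve above."""
    depth = np.asarray(water_depth_m, dtype=float)
    # Linear interpolation along the curve; depths beyond the last point
    # are clamped to the final damage fraction by np.interp
    fraction = np.interp(depth, curve_depth_m, curve_damage_fraction)
    # Cells without flood data (NaN) are treated as undamaged
    fraction = np.where(np.isnan(depth), 0.0, fraction)
    return fraction * max_damage_per_cell

depths = np.array([[0.0, 1.0], [3.0, np.nan]])
print(damage_map(depths, max_damage_per_cell=100.0))
```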

Plot the results#

Plot the results. Explain what library is used and provide the link if it is not already there.

If the plotting code is large, break it into more cells and explain each part.

damagemap = rxr.open_rasterio('scenario_damagemap.tif')
damagemap.plot()

Conclusions#

Some text about conclusions and lessons learned

Contributors#

authors, links to libraries documentation, references etc