Datasets

The workflows in this handbook mostly rely on public datasets, hosted for example by the CDS. However, some workflows have specific data requirements (in terms of variables or resolution) that public datasets cannot easily satisfy, or they simply require large amounts of input data. In these cases, CLIMAAX developers can provide preprocessed datasets and data samples for use in the workflows. These lower the barrier of entry for users and serve as blueprints for the input data required by the climate risk assessment workflows. Such data is hosted on the CLIMAAX cloud storage and described here.

Fig. 28 Illustration created by Scriberia with The Turing Way community. CC-BY 4.0. DOI: 10.5281/zenodo.3332807

Note

Datasets hosted on the CLIMAAX cloud storage are provided for the convenience of workflow users. This service should not be considered a long-term and reliable source of data. Datasets will be moved to a persistent and citable location in the future.

CLIMAAX datasets

Datasets specifically related to one of the CLIMAAX workflows:

Dataset mirrors

Open datasets from elsewhere re-hosted for more convenient access in the CLIMAAX workflows:

How to access

We provide file registries for use with the pooch Python package.

  1. Set up a download manager for a CLIMAAX dataset

    import pooch
    
    climaax_data = pooch.create(
        base_url="https://object-store.os-api.cci1.ecmwf.int/climaax/<DATASET-ID>/",
        path=".",  # set your download location
    )
    

    where <DATASET-ID> is a placeholder for the identifier of the accessed dataset.

  2. Download the registry.txt file from the corresponding dataset page. Load the pooch registry for the dataset (adapt the file path to your download location):

    climaax_data.load_registry("registry.txt")
    
  3. Individual files from the dataset can be downloaded with:

    climaax_data.fetch("path/to/file")
    

    and the registry can be used to easily fetch an entire dataset:

    for path in climaax_data.registry:
        climaax_data.fetch(path)
    

    Pooch will download the files into a folder structure as laid out by the dataset.
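
When only part of a dataset is needed, the registry loop above can be restricted to one folder. The following is a minimal sketch, not part of the CLIMAAX tooling: the helper names `select_paths` and `fetch_folder` are hypothetical, and the code assumes that `manager` is a pooch download manager set up as in steps 1 and 2 (such as `climaax_data`), with registry entries written as forward-slash paths.

```python
from pathlib import PurePosixPath


def select_paths(registry_paths, folder):
    """Return the registry entries located inside `folder`, sorted by name.

    Hypothetical helper: `registry_paths` is any iterable of file paths,
    for example the keys of `climaax_data.registry`.
    """
    folder = PurePosixPath(folder)
    return sorted(
        path for path in registry_paths
        # Compare whole path components so "data" does not match "database/".
        if folder in PurePosixPath(path).parents
    )


def fetch_folder(manager, folder):
    """Fetch every registry entry inside `folder` and return the local paths.

    Assumption: `manager` offers `.registry` and `.fetch(path)`, as the
    object returned by `pooch.create` does once a registry is loaded.
    """
    return [manager.fetch(path) for path in select_paths(manager.registry, folder)]
```

With the download manager from the steps above, `fetch_folder(climaax_data, "path/to")` would fetch only the files listed under `path/to/`. Pooch skips files that already exist locally with a matching checksum, so repeated calls should not download them again.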

Each dataset page contains a listing of all files included in the dataset. Individual files can be downloaded via the links in the listings from a web browser or with tools such as wget and curl.
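
A single file can also be downloaded from Python without pooch, in the same way as with wget or curl. The sketch below uses only the standard library. It assumes that the direct URL of a file is the pooch base URL shown earlier followed by the dataset identifier and the file path. The links in the dataset listings remain the authoritative source for URLs, and the function names `file_url` and `download_file` are hypothetical.

```python
from urllib.parse import quote
from urllib.request import urlretrieve

# Base URL taken from the pooch setup in this section.
BASE_URL = "https://object-store.os-api.cci1.ecmwf.int/climaax/"


def file_url(dataset_id, file_path):
    """Build the direct URL of one file.

    Assumption: files are served at <BASE_URL><dataset_id>/<file_path>,
    mirroring how pooch joins its base URL with registry paths.
    """
    # quote() percent-encodes special characters but keeps "/" separators.
    return f"{BASE_URL}{quote(dataset_id)}/{quote(file_path)}"


def download_file(dataset_id, file_path, target):
    """Download one file to `target` and return the local path.

    Unlike pooch, this performs no checksum verification.
    """
    local_path, _headers = urlretrieve(file_url(dataset_id, file_path), target)
    return local_path
```

Because this route skips the checksum verification that pooch performs against the registry, the pooch approach is likely the safer choice for anything beyond a quick look at a single file.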