Currently under initial development and not meant for wider use. The code is based on national-inventory-submissions.
The repository is structured by folders.
All data stored in comma-separated values (CSV) files in this repository follows the PRIMAP2 interchange format.
The data contained in each column is as follows:
Name of the data source. For country-specific datasets it is \<ISO3\>-GHG-inventory, where \<ISO3\> is the ISO 3166 three-letter country code. Specifications for composite datasets covering several countries will be added when the datasets are available.
The scenario specifies the submission (e.g. BUR1, NC5, or Inventory_2021 for a non-UNFCCC inventory).
Provenance of the data. Here: "derived" as it is a composite source.
ISO 3166 three-letter country codes.
Gas categories using global warming potentials (GWP) from either the Second Assessment Report (SAR) or the Fourth Assessment Report (AR4).
| Code                    | Description                                                      |
|:------------------------|:-----------------------------------------------------------------|
| CH4                     | Methane                                                          |
| CO2                     | Carbon Dioxide                                                   |
| N2O                     | Nitrous Oxide                                                    |
| HFCS (SARGWP100)        | Hydrofluorocarbons (SAR)                                         |
| HFCS (AR4GWP100)        | Hydrofluorocarbons (AR4)                                         |
| PFCS (SARGWP100)        | Perfluorocarbons (SAR)                                           |
| PFCS (AR4GWP100)        | Perfluorocarbons (AR4)                                           |
| SF6                     | Sulfur Hexafluoride                                              |
| NF3                     | Nitrogen Trifluoride                                             |
| FGASES (SARGWP100)      | Fluorinated Gases (SAR): HFCs, PFCs, SF$_6$, NF$_3$              |
| FGASES (AR4GWP100)      | Fluorinated Gases (AR4): HFCs, PFCs, SF$_6$, NF$_3$              |
| KYOTOGHG (SARGWP100)    | Kyoto greenhouse gases (SAR)                                     |
| KYOTOGHGAR4 (AR4GWP100) | Kyoto greenhouse gases (AR4)                                     |

Table: Gas categories and underlying global warming potentials
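The entity codes listed above combine a gas name with an optional GWP specification in parentheses. As a minimal sketch of how such a code could be split in Python, assuming the codes always follow the pattern shown in the table (the helper `parse_entity` and the `Entity` type are hypothetical and not part of this repository):

```python
import re
from typing import NamedTuple, Optional


class Entity(NamedTuple):
    gas: str            # e.g. "CH4" or "HFCS"
    gwp: Optional[str]  # e.g. "SARGWP100", or None if no GWP is given


# Gas name, optionally followed by a GWP specification in parentheses.
_ENTITY_RE = re.compile(r"^(?P<gas>[^\s()]+)(?:\s*\((?P<gwp>[^()]+)\))?$")


def parse_entity(code: str) -> Entity:
    """Split an entity code such as 'HFCS (SARGWP100)' into gas and GWP."""
    match = _ENTITY_RE.match(code.strip())
    if match is None:
        raise ValueError(f"unrecognised entity code: {code!r}")
    return Entity(match.group("gas"), match.group("gwp"))


print(parse_entity("CH4"))
print(parse_entity("HFCS (SARGWP100)"))
```

Entities without a GWP specification (single gases reported in mass units) come back with `gwp` set to `None`.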
Units are of the form Gg/Mt/... <substance> / yr, where <substance> is the entity, or Gg/Mt/... CO2 / yr for CO$_2$-equivalent units. The CO$_2$ equivalent is calculated using the global warming potential indicated by the entity (see above).
Emission categories as defined in terminology <term>. Terminology names are those used in the climate_categories package. If the terminology name contains _PRIMAP, it means that some (sub)categories have been added to the official IPCC category hierarchy. Added categories outside the hierarchy begin with the prefix M.
Original name of the category as presented in the submission.
Optional column. In some cases the original category names have been translated to English; the translations are then stored in this column.
Years (depending on dataset)
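The column layout described above (metadata columns followed by one column per year) can be turned into one record per year with the Python standard library. The following is only a sketch: the column names, the placeholder country code `XYZ`, and all numeric values in `SAMPLE` are made up for illustration and are not taken from a file in this repository.

```python
import csv
import io

# Hypothetical sample; column names and values are assumptions for illustration.
SAMPLE = """\
source,scenario,provenance,area,entity,unit,category,orig_cat_name,2000,2001
XYZ-GHG-inventory,BUR1,derived,XYZ,CO2,Gg CO2 / yr,1,Energy,10.5,11.0
XYZ-GHG-inventory,BUR1,derived,XYZ,CH4,Gg CH4 / yr,1,Energy,0.2,
"""


def wide_to_long(csv_text):
    """Convert wide CSV text (one column per year) into one record per year."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Treat every four-digit numeric header as a year column.
    year_cols = [c for c in reader.fieldnames if c.isdigit() and len(c) == 4]
    records = []
    for row in reader:
        meta = {k: v for k, v in row.items() if k not in year_cols}
        for year in year_cols:
            if row[year] == "":
                continue  # no value reported for this year
            records.append({**meta, "year": int(year), "value": float(row[year])})
    return records


records = wide_to_long(SAMPLE)
for rec in records:
    print(rec["entity"], rec["year"], rec["value"], rec["unit"])
```

Empty cells are skipped rather than converted to zero, so missing years stay distinguishable from reported zero emissions.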
This guide is for contributors. If you are solely interested in using the resulting data, we refer you to the releases of the data on Zenodo, which come with a DOI and are thus citable.
This repository is not a pure git repository. It is a datalad repository which uses git for code and other small text files and git-annex for data files and binary files (for this repository mainly pdf files). The files stored in git-annex are not part of this repository but are stored in a gin repository at gin.hemio.de.
To use the repository you need to have datalad installed. To clone the repository you can use either the GitHub URL or the gin URL.
datalad clone git@github.com:JGuetschow/UNFCCC_non-AnnexI_data.git <directory_name>
clones the repository into the folder <directory_name>. You can also clone via git clone. This avoids error messages regarding git-annex. Cloning works from any sibling.
The data itself (meaning all binary and csv files) is not downloaded automatically. Only symlinks are created on clone. Needed files can be obtained using
datalad get <filename>
where <filename> can also be a folder to get all files within that folder. Datalad will look for a sibling that is accessible to you and provides the necessary data. In general that could also be the computer of another contributor, if that computer is accessible to you (which will normally not be the case). NOTE: If you push to the GitHub repository using datalad, your local clone will automatically become a sibling, and if your machine is accessible from the outside it will also serve data to others.
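Because only symlinks are created on clone, it can be useful to list which files still lack their content before running a script. The sketch below assumes that annexed files appear as symlinks that dangle until their content has been fetched, which is the usual git-annex behaviour on Linux but may differ for unlocked files or adjusted branches. The helper `missing_annex_files` is hypothetical and not part of this repository.

```python
import os


def missing_annex_files(folder):
    """Return paths of symlinks under `folder` whose target is not present."""
    missing = []
    for root, dirs, files in os.walk(folder):
        # Do not descend into git's own bookkeeping directory.
        dirs[:] = [d for d in dirs if d != ".git"]
        for name in files:
            path = os.path.join(root, name)
            # A dangling symlink: the link itself exists, its target does not.
            if os.path.islink(path) and not os.path.exists(path):
                missing.append(path)
    return sorted(missing)


# Print the commands that would fetch the missing content.
for path in missing_annex_files("downloaded_data"):
    print("datalad get", path)
```

The loop only prints the commands, so nothing is downloaded until you run them yourself.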
For more detailed information on datalad we refer to the datalad handbook.
The code is best run in a virtual environment. All Python dependencies are installed automatically when building the virtual environment using make venv. If you don't want to use a virtual environment, you can find the dependencies in the file code/requirements.txt. As external dependencies you need firefox-geckodriver and git-annex > XXX (2021 works, some 2020 versions also).
The code has not been tested under Windows or macOS.
The maintainers of this repository will update the list of submissions and the downloaded pdf files frequently. However, in some cases you might want to have the data early and do the download yourself. To avoid merge conflicts, please do this on a clean branch in your fork and make sure your branch is in sync with main.
To update the list of BUR submissions, run make update-bur in the main project folder. This will create a new list of submissions. To actually download the files, run make download-bur.
To update the list of NC submissions, run make update-nc in the main project folder. This will create a new list of submissions. To actually download the files, run make download-nc.
To download NDCs, run make download-ndc.
All download scripts create files listing the new downloads in the folder downloaded_data/UNFCCC. The filenames use the format 00_new_downloads_<type>-YYYY-MM-DD.csv, where <type> is bur, nc, or ndc. Currently, only one file per type and day is stored, so if you run the download script more than once on a day you will overwrite your first file (likely with an empty file, as you have already downloaded everything) (see also issue #2).
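The filename format described above makes it easy to check, before rerunning a download script, whether today's list already exists and would be overwritten. A minimal sketch, assuming the folder and naming scheme stated above; the helpers `new_downloads_path` and `would_overwrite` are hypothetical and not part of this repository's scripts:

```python
from datetime import date
from pathlib import Path

VALID_TYPES = ("bur", "nc", "ndc")


def new_downloads_path(kind, day, folder="downloaded_data/UNFCCC"):
    """Build the path 00_new_downloads_<type>-YYYY-MM-DD.csv for a given day."""
    if kind not in VALID_TYPES:
        raise ValueError(f"type must be one of {VALID_TYPES}, got {kind!r}")
    return Path(folder) / f"00_new_downloads_{kind}-{day.isoformat()}.csv"


def would_overwrite(path):
    """True if a non-empty list already exists at `path`."""
    return path.exists() and path.stat().st_size > 0


print(new_downloads_path("bur", date(2021, 5, 3)).name)
# prints: 00_new_downloads_bur-2021-05-03.csv

today_list = new_downloads_path("bur", date.today())
if would_overwrite(today_list):
    print(f"{today_list} already exists; copy it before rerunning the download")
```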
See section [Contributing] below.
The idea behind this data package is that several people contribute to extracting the data from pdf files, so that each user has less work than when reading the data individually and, at the same time, data quality improves through institutionalized data checking. You can contribute in different ways.
The easiest way to contribute to the repository is via analysis of submissions for data coverage. Before selecting a submission for analysis, check that it is not yet listed as analyzed in the submission overview issues.
### Minimal requirements for use cases