Currently under initial development and not meant for wider use.
The repository is structured by folders.
This guide is for contributors. If you are solely interested in using the resulting data, we refer you to the releases of the data on Zenodo, which come with a DOI and are thus citable.
This repository is not a pure git repository. It is a datalad repository which uses git for code and other small text files and git-annex for data files and binary files (for this repository mainly pdf files). The files stored in git-annex are not part of this repository but are stored in a gin repository at gin.hemio.de.
To use the repository you need to have datalad installed. To clone the repository you can use the GitHub URL or the gin URL.
datalad clone git@github.com:JGuetschow/UNFCCC_non-AnnexI_data.git <directory_name>
clones the repository into the folder <directory_name>. You can also clone via git clone. This avoids error messages regarding git-annex. Cloning works from any sibling.
The data itself (meaning all binary files) is not downloaded automatically. Only symlinks are created on clone. Needed files can be obtained using
datalad get <filename>
where <filename> can also be a folder to get all files within that folder. Datalad will look for a sibling that is accessible to you and provides the necessary data. In general that could also be the computer of another contributor, if that computer is accessible to you (which will normally not be the case). **NOTE:** If you push to the GitHub repository using datalad, your local clone will automatically become a sibling, and if your machine is accessible from the outside it will also serve data.
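Because a fresh clone contains only symlinks for annexed files, it can be handy to check whether a file's content is already present before running a script on it. The following is a minimal sketch, assuming the clone uses symlinks as described above (file systems without symlink support behave differently). The helper names `is_missing_content` and `get_if_missing` are hypothetical and not part of this repository.

```shell
# Succeed if "$1" is a symlink whose target does not exist yet, i.e. an
# annexed file whose content has not been fetched into this clone.
is_missing_content() {
    [ -L "$1" ] && [ ! -e "$1" ]
}

# Hypothetical helper: fetch a file only when its content is missing.
get_if_missing() {
    if is_missing_content "$1"; then
        datalad get "$1"
    fi
}
```

Calling `get_if_missing` with the path of a PDF inside the clone fetches that file on first use and does nothing on later calls.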
For more detailed information on datalad we refer to the datalad handbook.
The code is best run in a virtual environment. All Python dependencies are installed automatically when the virtual environment is built using make venv. If you don't want to use a virtual environment, you can find the dependencies in the file code/requirements.txt. As external dependencies you need firefox-geckodriver and git-annex > XXX (2021 works, some 2020 versions also).
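Before building the environment, it may help to confirm that the external tools are on your PATH. This is a rough sketch: the function name `missing_tools` is hypothetical, and the executable name `geckodriver` is an assumption about how firefox-geckodriver is installed on your system. It only checks for presence and does not verify the git-annex version.

```shell
# Print the name of every listed tool that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tools this guide relies on; prints nothing when all of them are found.
missing_tools datalad git-annex geckodriver make
```

If the last line prints nothing, all four tools were found and you can continue with make venv.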
The code has not been tested under Windows or macOS.
TODO: develop a method to avoid conflicts here, e.g. only a few maintainers commit the raw data and all others use that, or, if they need updated raw data, use it only locally.
To update BUR and NC submissions, first make sure your branch is in sync with main to avoid conflicts when merging your branch later. To update the list of submissions, run make update-bur in the main project folder. This will create a new list of submissions. To actually download the files, run make download-bur.
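One way to guard against updating on a stale branch is to test whether main is already contained in the current branch before running the make target. The sketch below is an illustration, not the project's own tooling: `in_sync_with` and `update_bur_list` are hypothetical names, and the check compares against your local main, so fetch first and pass a remote-tracking branch (for example origin/main, assuming the remote is called origin) if you want to compare against the published state.

```shell
# Succeed only if the current branch already contains every commit of the
# given branch (default: main).
in_sync_with() {
    git merge-base --is-ancestor "${1:-main}" HEAD
}

# Hypothetical wrapper: refuse to update the submission list on a stale branch.
update_bur_list() {
    if ! in_sync_with main; then
        echo "Branch is behind main; merge main first." >&2
        return 1
    fi
    make update-bur
}
```

Run `update_bur_list` from the main project folder. It stops with a message if main has commits that your branch lacks.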
The idea behind this data package is that several people contribute to extracting the data from PDF files, so that each user has less work than when reading the data individually, while at the same time data quality improves through institutionalized data checking.
minimal requirements for use cases