
Merge pull request #90 from JGuetschow/add-docs-and-tests

Add docs and tests
Johannes Gütschow 10 months ago
parent
commit
728aa86d49
100 changed files with 1018 additions and 26075 deletions
  1. 19 0
      .copier-answers.yml
  2. 41 0
      .github/ISSUE_TEMPLATE/bug.md
  3. 2 2
      .github/ISSUE_TEMPLATE/country-data-template-non-annexi.md
  4. 21 0
      .github/ISSUE_TEMPLATE/default.md
  5. 32 0
      .github/ISSUE_TEMPLATE/feature_request.md
  6. 63 0
      .github/actions/setup/action.yml
  7. 9 0
      .github/pull_request_template.md
  8. 67 0
      .github/workflows/bump.yaml
  9. 123 0
      .github/workflows/ci.yaml
  10. 33 0
      .github/workflows/deploy.yaml
  11. 60 0
      .github/workflows/install.yaml
  12. 46 0
      .github/workflows/release.yaml
  13. 156 14
      .gitignore
  14. 44 0
      .pre-commit-config.yaml
  15. 29 0
      .readthedocs.yaml
  16. 84 10
      Makefile
  17. 148 54
      README.md
  18. 0 92
      UNFCCC_GHG_data/UNFCCC_CRF_reader/CRF_raw_for_year.py
  19. 0 554
      UNFCCC_GHG_data/UNFCCC_CRF_reader/UNFCCC_CRF_reader_prod.py
  20. 0 13
      UNFCCC_GHG_data/UNFCCC_CRF_reader/__init__.py
  21. 0 2508
      UNFCCC_GHG_data/UNFCCC_CRF_reader/crf_specifications/CRF2021_specification.py
  22. 0 2666
      UNFCCC_GHG_data/UNFCCC_CRF_reader/crf_specifications/CRF2022_specification.py
  23. 0 2688
      UNFCCC_GHG_data/UNFCCC_CRF_reader/crf_specifications/CRF2023_specification.py
  24. 0 10
      UNFCCC_GHG_data/UNFCCC_CRF_reader/crf_specifications/__init__.py
  25. 0 31
      UNFCCC_GHG_data/UNFCCC_CRF_reader/read_UNFCCC_CRF_submission.py
  26. 0 33
      UNFCCC_GHG_data/UNFCCC_CRF_reader/read_UNFCCC_CRF_submission_datalad.py
  27. 0 31
      UNFCCC_GHG_data/UNFCCC_CRF_reader/read_new_UNFCCC_CRF_for_year.py
  28. 0 32
      UNFCCC_GHG_data/UNFCCC_CRF_reader/read_new_UNFCCC_CRF_for_year_datalad.py
  29. 0 33
      UNFCCC_GHG_data/UNFCCC_CRF_reader/test_read_UNFCCC_CRF_for_year.py
  30. 0 16
      UNFCCC_GHG_data/UNFCCC_CRF_reader/util.py
  31. 0 2323
      UNFCCC_GHG_data/UNFCCC_DI_reader/UNFCCC_DI_reader_config.py
  32. 0 30
      UNFCCC_GHG_data/UNFCCC_DI_reader/__init__.py
  33. 0 26
      UNFCCC_GHG_data/UNFCCC_DI_reader/process_UNFCCC_DI_for_country.py
  34. 0 22
      UNFCCC_GHG_data/UNFCCC_DI_reader/process_UNFCCC_DI_for_country_datalad.py
  35. 0 25
      UNFCCC_GHG_data/UNFCCC_DI_reader/process_UNFCCC_DI_for_country_group.py
  36. 0 25
      UNFCCC_GHG_data/UNFCCC_DI_reader/process_UNFCCC_DI_for_country_group_datalad.py
  37. 0 27
      UNFCCC_GHG_data/UNFCCC_DI_reader/read_UNFCCC_DI_for_country.py
  38. 0 17
      UNFCCC_GHG_data/UNFCCC_DI_reader/read_UNFCCC_DI_for_country_datalad.py
  39. 0 19
      UNFCCC_GHG_data/UNFCCC_DI_reader/read_UNFCCC_DI_for_country_group.py
  40. 0 19
      UNFCCC_GHG_data/UNFCCC_DI_reader/read_UNFCCC_DI_for_country_group_datalad.py
  41. 0 19
      UNFCCC_GHG_data/UNFCCC_DI_reader/util.py
  42. 0 5
      UNFCCC_GHG_data/UNFCCC_downloader/__init__.py
  43. 0 195
      UNFCCC_GHG_data/UNFCCC_downloader/download_annexI.py
  44. 0 157
      UNFCCC_GHG_data/UNFCCC_downloader/download_btr.py
  45. 0 108
      UNFCCC_GHG_data/UNFCCC_downloader/download_ndc.py
  46. 0 141
      UNFCCC_GHG_data/UNFCCC_downloader/download_non-annexI.py
  47. 0 145
      UNFCCC_GHG_data/UNFCCC_downloader/fetch_submissions_annexI.py
  48. 0 97
      UNFCCC_GHG_data/UNFCCC_downloader/fetch_submissions_btr.py
  49. 0 86
      UNFCCC_GHG_data/UNFCCC_downloader/fetch_submissions_bur.py
  50. 0 88
      UNFCCC_GHG_data/UNFCCC_downloader/fetch_submissions_nc.py
  51. 0 291
      UNFCCC_GHG_data/UNFCCC_reader/Argentina/config_ARG_BUR5.py
  52. 0 404
      UNFCCC_GHG_data/UNFCCC_reader/Argentina/read_ARG_BUR4_from_pdf.py
  53. 0 118
      UNFCCC_GHG_data/UNFCCC_reader/Argentina/read_ARG_BUR5_from_csv.py
  54. 0 226
      UNFCCC_GHG_data/UNFCCC_reader/Burundi/read_BDI_BUR1_from_pdf.py
  55. 0 186
      UNFCCC_GHG_data/UNFCCC_reader/Chile/config_CHL_BUR4.py
  56. 0 281
      UNFCCC_GHG_data/UNFCCC_reader/Chile/read_CHL_BUR4_from_xlsx.py
  57. 0 285
      UNFCCC_GHG_data/UNFCCC_reader/Chile/read_CHL_BUR5_from_xlsx.py
  58. 0 249
      UNFCCC_GHG_data/UNFCCC_reader/Colombia/read_COL_BUR3_from_xlsx.py
  59. 0 679
      UNFCCC_GHG_data/UNFCCC_reader/Guinea/read_GIN_BUR1_from_pdf.py
  60. 0 377
      UNFCCC_GHG_data/UNFCCC_reader/Indonesia/read_IDN_BUR3_from_pdf.py
  61. 0 430
      UNFCCC_GHG_data/UNFCCC_reader/Israel/config_ISR_BUR2.py
  62. 0 301
      UNFCCC_GHG_data/UNFCCC_reader/Israel/read_ISR_BUR2_from_pdf.py
  63. 0 676
      UNFCCC_GHG_data/UNFCCC_reader/Malaysia/config_MYS_BUR3.py
  64. 0 402
      UNFCCC_GHG_data/UNFCCC_reader/Malaysia/config_MYS_BUR4.py
  65. 0 211
      UNFCCC_GHG_data/UNFCCC_reader/Malaysia/read_MYS_BUR3_from_pdf.py
  66. 0 214
      UNFCCC_GHG_data/UNFCCC_reader/Malaysia/read_MYS_BUR4_from_pdf.py
  67. 0 227
      UNFCCC_GHG_data/UNFCCC_reader/Mexico/read_MEX_BUR3_from_pdf.py
  68. 0 335
      UNFCCC_GHG_data/UNFCCC_reader/Mongolia/read_MNG_BUR2_from_pdf.py
  69. 0 67
      UNFCCC_GHG_data/UNFCCC_reader/Montenegro/config_MNE_BUR3.py
  70. 0 286
      UNFCCC_GHG_data/UNFCCC_reader/Montenegro/read_MNE_BUR3_from_pdf.py
  71. 0 141
      UNFCCC_GHG_data/UNFCCC_reader/Morocco/config_MAR_BUR3.py
  72. 0 324
      UNFCCC_GHG_data/UNFCCC_reader/Morocco/read_MAR_BUR3_from_pdf.py
  73. 0 458
      UNFCCC_GHG_data/UNFCCC_reader/Nigeria/config_NGA_BUR2.py
  74. 0 260
      UNFCCC_GHG_data/UNFCCC_reader/Nigeria/read_NGA_BUR2_from_pdf.py
  75. 0 290
      UNFCCC_GHG_data/UNFCCC_reader/Peru/read_PER_BUR3_from_pdf.py
  76. 0 448
      UNFCCC_GHG_data/UNFCCC_reader/Republic_of_Korea/config_KOR_BUR4.py
  77. 0 497
      UNFCCC_GHG_data/UNFCCC_reader/Republic_of_Korea/config_KOR_INV2023.py
  78. 0 313
      UNFCCC_GHG_data/UNFCCC_reader/Republic_of_Korea/read_KOR_2021-Inventory_from_xlsx.py
  79. 0 318
      UNFCCC_GHG_data/UNFCCC_reader/Republic_of_Korea/read_KOR_2022-Inventory_from_xlsx.py
  80. 0 333
      UNFCCC_GHG_data/UNFCCC_reader/Republic_of_Korea/read_KOR_2023-Inventory_from_xlsx.py
  81. 0 185
      UNFCCC_GHG_data/UNFCCC_reader/Republic_of_Korea/read_KOR_BUR4_from_xlsx.py
  82. 0 493
      UNFCCC_GHG_data/UNFCCC_reader/Singapore/config_SGP_BUR5.py
  83. 0 260
      UNFCCC_GHG_data/UNFCCC_reader/Singapore/read_SGP_BUR5_from_pdf.py
  84. 0 329
      UNFCCC_GHG_data/UNFCCC_reader/Taiwan/config_TWN_NIR2022.py
  85. 0 447
      UNFCCC_GHG_data/UNFCCC_reader/Taiwan/config_TWN_NIR2023.py
  86. 0 399
      UNFCCC_GHG_data/UNFCCC_reader/Taiwan/read_TWN_2022-Inventory_from_pdf.py
  87. 0 228
      UNFCCC_GHG_data/UNFCCC_reader/Taiwan/read_TWN_2023-Inventory_from_pdf.py
  88. 0 363
      UNFCCC_GHG_data/UNFCCC_reader/Thailand/config_THA_BUR3.py
  89. 0 381
      UNFCCC_GHG_data/UNFCCC_reader/Thailand/config_THA_BUR4.py
  90. 0 270
      UNFCCC_GHG_data/UNFCCC_reader/Thailand/read_THA_BUR3_from_pdf.py
  91. 0 225
      UNFCCC_GHG_data/UNFCCC_reader/Thailand/read_THA_BUR4_from_pdf.py
  92. 0 1
      UNFCCC_GHG_data/UNFCCC_reader/__init__.py
  93. 0 77
      UNFCCC_GHG_data/UNFCCC_reader/read_UNFCCC_submission.py
  94. 0 15
      UNFCCC_GHG_data/__init__.py
  95. 0 36
      UNFCCC_GHG_data/helper/__init__.py
  96. 0 22
      UNFCCC_GHG_data/helper/country_info.py
  97. 0 173
      UNFCCC_GHG_data/helper/definitions.py
  98. 0 23
      UNFCCC_GHG_data/helper/folder_mapping.py
  99. 0 160
      UNFCCC_GHG_data/helper/functions_temp.py
  100. 41 0
      changelog/README.md

+ 19 - 0
.copier-answers.yml

@@ -0,0 +1,19 @@
+# Changes here will be overwritten by Copier; NEVER EDIT MANUALLY
+_commit: v0.6.2
+_src_path: ../../../helper_tools/copier-core-python-repository/
+email: mail@johannes-guetschow.de
+initial_setup: false
+name: Johannes Gütschow
+notebook_dependencies: false
+pandas_doctests: false
+plot_dependencies: true
+project_description_short: Reading country greenhouse gas data submitted to the United
+    Nations Framework Convention on Climate Change (UNFCCC) in different submissions
+    and formats and providing it in a standardized nc and csv format compatible with
+    primap2. Data are read using different methods from APIs, xlsx and csv files as
+    well as pdf files.
+project_name_human: Country greenhouse gas data submitted to the UNFCCC
+project_name_pip: unfccc-ghg-data
+project_name_python: unfccc_ghg_data
+project_url: https://github.com/JGuetschow/UNFCCC_non-AnnexI_data
+track_lock_file: true

+ 41 - 0
.github/ISSUE_TEMPLATE/bug.md

@@ -0,0 +1,41 @@
+---
+name: Bug report
+about: Report a bug
+title: ''
+labels: bug
+assignees: ''
+
+---
+
+## Describe the bug
+<!--- A clear and concise description of what the bug is. -->
+
+## Failing Test
+<!---
+Please put the code (ideally in the form of a unit test) which fails below.
+
+e.g.
+
+```python
+def test_bug_12():
+    # Python code here which fails because of the bug
+    # This is best if other developers can simply copy and paste this test in
+    # order to run it
+```
+-->
+
+## Expected behavior
+<!--- A clear and concise description of what you expected to happen. -->
+
+## Screenshots
+<!--- If applicable, add screenshots to help explain your problem. -->
+
+## System
+<!--- Please complete the following information. -->
+
+ - OS: [e.g. Windows, Linux, macOS]
+ - Python version [e.g. Python 3.11]
+ - Please also upload your `poetry.lock` file (first run `poetry lock` to make sure the lock file is up-to-date)
+
+## Additional context
+<!--- Add any other context about the problem here. -->

+ 2 - 2
.github/ISSUE_TEMPLATE/country-data-template-non-annexi.md

@@ -33,11 +33,11 @@ Detailed data for 2015, less data for other years but main sectors present.
 The terminology is important as data in IPCC2006 categories has priority as it will currently not be made available through the UNFCCC interface.
 
 ### National communications (NC)
-* 
+*
 
 ### Biannial Update Reports (BUR)
 *
- 
+
 ### Nationally Determined Contributions (NDC)
 *
 

+ 21 - 0
.github/ISSUE_TEMPLATE/default.md

@@ -0,0 +1,21 @@
+---
+name: Default
+about: Report an issue or problem
+title: ''
+labels: triage
+assignees: ''
+
+---
+
+## The problem
+<!--- Useful to break down as "As a [persona], I [want to do], so that [reason]" -->
+
+## Definition of "done"
+<!---
+What are the things that must be true in order to close this issue
+
+We find that describing these as dot points works well.
+-->
+
+## Additional context
+<!--- Any additional context can go here -->

+ 32 - 0
.github/ISSUE_TEMPLATE/feature_request.md

@@ -0,0 +1,32 @@
+---
+name: Feature Request
+about: Request a feature or suggest an idea for this project
+title: ''
+labels: feature
+assignees: ''
+
+---
+
+## The motivation
+
+<!--- Useful to break down as "As a [persona], I [want to do], so that [reason]" -->
+
+## The proposed solution
+
+<!---
+If you'd like, please provide a description of the solution you would like to see
+
+If you don't have any ideas for the solution, simply leave this blank
+-->
+
+## Alternatives
+
+<!---
+If you've considered any alternatives, please describe them here
+
+If you don't have any alternatives, simply leave this blank
+-->
+
+## Additional context
+
+<!--- Any additional context can go here -->

+ 63 - 0
.github/actions/setup/action.yml

@@ -0,0 +1,63 @@
+name: "Setup Python and Poetry"
+description: "setup Python and Poetry with caches"
+
+inputs:
+  os:
+    description: "Operating system to use"
+    required: false
+    default: "ubuntu-latest"
+  python-version:
+    description: "Python version to use"
+    required: true
+  venv-id:
+    description: "ID to identify cached environment (should be unique from other steps)"
+    required: true
+  poetry-dependency-install-flags:
+    description: "Flags to pass to poetry when running `poetry install --no-interaction --no-root`"
+    required: true
+  run-poetry-install:
+    description: "Should we run the poetry install steps"
+    required: false
+    default: true
+
+
+runs:
+  using: "composite"
+  steps:
+    - name: Install poetry
+      shell: bash
+      run: |
+        pipx install poetry
+        which poetry
+        poetry --version  # Check poetry installation
+
+    - name: Set up Python ${{ inputs.python-version }}
+      id: setup-python
+      uses: actions/setup-python@v5
+      with:
+        python-version: ${{ inputs.python-version }}
+        cache: poetry
+    - name: Set Poetry environment
+      shell: bash
+      run: |
+        # This line used to be needed, but seems to have been
+        # sorted with newer poetry versions. We can still check whether
+        # the right version of python is used by looking at the output of
+        # `poetry run which python` below and whether the right version
+        # of python is used in the tests (or whatever step is being done)
+        # poetry env use "python${{ inputs.python-version }}"
+        poetry config virtualenvs.create true
+        poetry config virtualenvs.in-project true
+    - name: Install dependencies
+      if: ${{ (inputs.run-poetry-install == 'true')  && (steps.setup-python.outputs.cache-hit != 'true') }}
+      shell: bash
+      run: |
+        poetry install --no-interaction --no-root ${{ inputs.poetry-dependency-install-flags }}
+    # Now run same command but let the package install too
+    - name: Install package
+      # To ensure that the package is always installed, this step is run even if the cache was hit
+      if: ${{ inputs.run-poetry-install == 'true' }}
+      shell: bash
+      run: |
+        poetry install --no-interaction ${{ inputs.poetry-dependency-install-flags }}
+        poetry run python --version  # Check python version just in case
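
Review note: the two `if:` conditions in this action mean that the dependency-only install is skipped on a cache hit, while the package install runs whenever `run-poetry-install` is `'true'`. The Python sketch below mirrors that decision logic for illustration only. The function name and step labels are hypothetical and not part of the action, and it assumes that inputs and step outputs arrive as strings, which is how the conditions above compare them.

```python
def steps_to_run(run_poetry_install: str, cache_hit: str) -> list[str]:
    """Return the install steps the composite action would execute.

    Both arguments are strings because the workflow conditions compare
    against the literal 'true'. Names are illustrative assumptions.
    """
    steps = []
    # Dependencies are installed only when requested AND the cache missed.
    if run_poetry_install == "true" and cache_hit != "true":
        steps.append("install-dependencies")
    # The package itself is installed whenever installation is requested,
    # even on a cache hit.
    if run_poetry_install == "true":
        steps.append("install-package")
    return steps
```

For example, `steps_to_run("true", "true")` yields only the package install, which matches the comment in the action about always installing the package.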

+ 9 - 0
.github/pull_request_template.md

@@ -0,0 +1,9 @@
+## Description
+
+## Checklist
+
+Please confirm that this pull request has done the following:
+
+- [ ] Tests added
+- [ ] Documentation added (where applicable)
+- [ ] Changelog item added to `changelog/`

+ 67 - 0
.github/workflows/bump.yaml

@@ -0,0 +1,67 @@
+name: Bump version
+
+on:
+  workflow_dispatch:
+    inputs:
+      bump_rule:
+        type: choice
+        description: How to bump the project's version (see https://python-poetry.org/docs/cli/#version)
+        options:
+          - patch
+          - minor
+          - major
+          - prepatch
+          - preminor
+          - premajor
+          - prerelease
+        required: true
+
+jobs:
+  bump_version:
+    name: "Bump version and create changelog"
+    if: "!startsWith(github.event.head_commit.message, 'bump:')"
+    runs-on: ubuntu-latest
+    env:
+      CI_COMMIT_EMAIL: "ci-runner@unfccc-ghg-data.invalid"
+    steps:
+      - name: Check out repository
+        uses: actions/checkout@v3
+        with:
+          fetch-depth: 0
+          token: "${{ secrets.PERSONAL_ACCESS_TOKEN }}"
+
+      # towncrier imports the package as part of building so we have to
+      # install the package (to keep things slim, we only install the main
+      # dependencies, which also means that we get a test that we can import
+      # the package with only the compulsory dependencies installed for free)
+      - uses: ./.github/actions/setup
+        with:
+          python-version: "3.11"
+          venv-id: "bump"
+          poetry-dependency-install-flags: "--only main"
+
+      - name: Install towncrier
+        run: |
+          poetry run pip install towncrier
+
+      - name: Create bump and changelog
+
+        run: |
+          git config --global user.name "$GITHUB_ACTOR"
+          git config --global user.email "$CI_COMMIT_EMAIL"
+
+          # Bump
+          BASE_VERSION=`poetry version -s`
+          NEW_VERSION=`poetry version -s ${{ github.event.inputs.bump_rule }}`
+          echo "Bumping version $BASE_VERSION > $NEW_VERSION"
+          poetry run towncrier build --yes --version v$NEW_VERSION
+          git commit -a -m "bump: version $BASE_VERSION -> $NEW_VERSION"
+          git tag v$NEW_VERSION
+
+          # Bump to alpha (so that future commits do not have the same
+          # version as the tagged commit)
+          BASE_VERSION=`poetry version -s`
+          NEW_VERSION=`poetry version -s prerelease`
+          echo "Bumping version $BASE_VERSION > $NEW_VERSION"
+          git commit -a -m "bump(pre-release): version $BASE_VERSION > $NEW_VERSION"
+          git push && git push --tags

+ 123 - 0
.github/workflows/ci.yaml

@@ -0,0 +1,123 @@
+name: CI
+
+on:
+  pull_request:
+  push:
+    branches: [main]
+    tags: ['v*']
+
+jobs:
+#  mypy:
+#    if: ${{ !github.event.pull_request.draft }}
+#    runs-on: ubuntu-latest
+#    steps:
+#      - name: Check out repository
+#        uses: actions/checkout@v3
+#      - uses: ./.github/actions/setup
+#        with:
+#          os: "ubuntu-latest"
+#          python-version: "3.9"
+#          venv-id: "docs"
+#          poetry-dependency-install-flags: "--all-extras --only 'main,dev'"
+#      - name: mypy
+#        run: MYPYPATH=stubs poetry run mypy src
+
+  docs:
+    if: ${{ !github.event.pull_request.draft }}
+    runs-on: ubuntu-latest
+    steps:
+      - name: Check out repository
+        uses: actions/checkout@v3
+      - uses: ./.github/actions/setup
+        with:
+          os: "ubuntu-latest"
+          python-version: "3.9"
+          venv-id: "docs"
+          poetry-dependency-install-flags: "--all-extras --only 'main,docs'"
+      - name: docs
+        run: poetry run sphinx-build -W --keep-going -T -b html docs/source docs/build
+
+#  tests:
+#    strategy:
+#      fail-fast: false
+#      matrix:
+#        os: [ "ubuntu-latest" ]
+#        python-version: [ "3.9", "3.10", "3.11" ]
+#    runs-on: "${{ matrix.os }}"
+#    defaults:
+#      run:
+#        # This might be needed for Windows and doesn't seem to affect unix-based systems
+#        # so we include it. If you have better proof of whether this is needed or not,
+#        # feel free to update.
+#        shell: bash
+#    steps:
+#      - name: Check out repository
+#        uses: actions/checkout@v3
+#      - uses: ./.github/actions/setup
+#        with:
+#          os: "${{ matrix.os }}"
+#          python-version: "${{ matrix.python-version }}"
+#          venv-id: "tests-${{ runner.os }}"
+#          poetry-dependency-install-flags: "--all-extras"
+#      - name: Run tests
+#        run: |
+#          poetry run pytest -r a -v src tests --doctest-modules --cov=src --cov-report=term-missing --cov-report=xml
+#          poetry run coverage report
+#      - name: Upload coverage reports to Codecov
+#        uses: codecov/codecov-action@v3
+
+#  imports-without-extras:
+#    strategy:
+#      fail-fast: false
+#      matrix:
+#        os: [ "ubuntu-latest" ]
+#        python-version: [ "3.9", "3.10", "3.11" ]
+#    runs-on: "${{ matrix.os }}"
+#    steps:
+#      - name: Check out repository
+#        uses: actions/checkout@v3
+#      - uses: ./.github/actions/setup
+#        with:
+#          python-version: "${{ matrix.python-version }}"
+#          venv-id: "imports-without-extras-${{ runner.os }}"
+#          poetry-dependency-install-flags: "--only main"
+#      - name: Check importable without extras
+#        run: poetry run python scripts/test-install.py
+#
+#  check-build:
+#    runs-on: ubuntu-latest
+#    steps:
+#      - name: Check out repository
+#        uses: actions/checkout@v3
+#      - uses: ./.github/actions/setup
+#        with:
+#          python-version: "3.9"
+#          venv-id: "check-build-${{ runner.os }}"
+#          run-poetry-install: false
+#          poetry-dependency-install-flags: "not used"
+#      - name: Build package
+#        run: |
+#          poetry build --no-interaction
+#      - name: Check build
+#        run: |
+#          tar -tvf dist/unfccc_ghg_data-*.tar.gz --wildcards '*unfccc_ghg_data/py.typed'
+#          tar -tvf dist/unfccc_ghg_data-*.tar.gz --wildcards 'unfccc_ghg_data-*/LICENCE'
+
+
+#  check-dependency-licences:
+#    runs-on: ubuntu-latest
+#    steps:
+#      - name: Check out repository
+#        uses: actions/checkout@v3
+#      - uses: ./.github/actions/setup
+#        with:
+#          python-version: "3.9"
+#          venv-id: "licence-check"
+#          poetry-dependency-install-flags: "--all-extras"
+#      - name: Check licences of dependencies
+#        shell: bash
+#        run: |
+#          TEMP_FILE=$(mktemp)
+#          poetry export --without=tests --without=docs --without=dev > $TEMP_FILE
+#          poetry run liccheck -r $TEMP_FILE -R licence-check.txt
+#          cat licence-check.txt

+ 33 - 0
.github/workflows/deploy.yaml

@@ -0,0 +1,33 @@
+name: Deploy
+
+on:
+  release:
+    types: [published]
+
+defaults:
+  run:
+    shell: bash
+
+jobs:
+  deploy-pypi:
+    name: Deploy to PyPI
+    runs-on: ubuntu-latest
+    steps:
+      - name: Check out repository
+        uses: actions/checkout@v3
+        with:
+          fetch-depth: 0
+      - uses: ./.github/actions/setup
+        with:
+          python-version: "3.9"
+          venv-id: "deploy"
+          poetry-dependency-install-flags: "--all-extras"
+      - name: Run tests
+        run: |
+          poetry run pytest -r a src tests --doctest-modules
+      - name: Publish to PyPI
+        env:
+          PYPI_TOKEN: "${{ secrets.PYPI_TOKEN }}"
+        run: |
+          poetry config pypi-token.pypi $PYPI_TOKEN
+          poetry publish --build --no-interaction

+ 60 - 0
.github/workflows/install.yaml

@@ -0,0 +1,60 @@
+name: Test installation
+
+on:
+  workflow_dispatch:
+  schedule:
+    # * is a special character in YAML so you have to quote this string
+    - cron:  '0 0 * * 3'
+
+jobs:
+  test-pypi-install:
+    name: Test PyPI install (${{ matrix.python-version }}, ${{ matrix.os }})
+    runs-on: "${{ matrix.os }}"
+    strategy:
+      fail-fast: false
+      matrix:
+        os: ["ubuntu-latest", "macos-latest", "windows-latest"]
+        python-version: [ "3.9", "3.10", "3.11" ]
+    steps:
+    - name: Set up Python "${{ matrix.python-version }}"
+      id: setup-python
+      uses: actions/setup-python@v4
+      with:
+        python-version: "${{ matrix.python-version }}"
+    - name: Install
+      run: |
+        pip install --upgrade pip
+        pip install unfccc-ghg-data
+    - name: Checkout repository
+      uses: actions/checkout@v3
+    - name: Test installation
+      run: |
+        which python
+        python scripts/test-install.py
+
+  test-micromamba-installation:
+    name: Test (micro)mamba install (${{ matrix.python-version }}, ${{ matrix.os }})
+    runs-on: "${{ matrix.os }}"
+    strategy:
+      fail-fast: false
+      matrix:
+        os: ["ubuntu-latest", "macos-latest", "windows-latest"]
+        python-version: [ "3.9", "3.10", "3.11" ]
+
+    steps:
+    - name: Setup (micro)mamba and install package
+      uses: mamba-org/setup-micromamba@v1
+      with:
+        environment-name: test-mamba-install
+        create-args: >-
+          python=${{ matrix.python-version }}
+          -c conda-forge
+          unfccc-ghg-data
+        init-shell: bash
+    - name: Checkout repository
+      uses: actions/checkout@v3
+    - name: Test installation
+      shell: bash -leo pipefail {0}
+      run: |
+        which python
+        python scripts/test-install.py
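
Review note: the cron expression `'0 0 * * 3'` in this workflow fires at 00:00 on Wednesdays, and GitHub evaluates schedules in UTC. The sketch below computes the next nominal trigger time under those assumptions. The function name is hypothetical, and actual scheduled runs may start later than the nominal time.

```python
from datetime import datetime, timedelta, timezone


def next_weekly_run(now: datetime) -> datetime:
    """Next Wednesday 00:00 UTC strictly after `now` (cron '0 0 * * 3').

    `now` is expected to be a timezone-aware datetime.
    """
    now = now.astimezone(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    # Python counts Monday as 0, so Wednesday is 2 (cron uses 3).
    days_ahead = (2 - midnight.weekday()) % 7
    candidate = midnight + timedelta(days=days_ahead)
    # If this week's Wednesday midnight has already passed, take next week's.
    if candidate <= now:
        candidate += timedelta(days=7)
    return candidate
```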

+ 46 - 0
.github/workflows/release.yaml

@@ -0,0 +1,46 @@
+name: Release
+
+on:
+  push:
+    tags: ['v*']
+
+defaults:
+  run:
+    shell: bash
+
+jobs:
+  draft-release:
+    name: Create draft release
+    runs-on: ubuntu-latest
+    steps:
+      - name: Check out repository
+        uses: actions/checkout@v3
+        with:
+          fetch-depth: 0
+      - uses: ./.github/actions/setup
+        with:
+          python-version: "3.9"
+          venv-id: "release"
+          poetry-dependency-install-flags: "--all-extras"
+      - name: Add version to environment
+        run: |
+          PROJECT_VERSION=$(poetry version --short)
+          echo "PROJECT_VERSION=$PROJECT_VERSION" >> $GITHUB_ENV
+      - name: Run tests
+        run: |
+          poetry run pytest -r a -v src tests --doctest-modules
+      - name: Build package
+        run: |
+          poetry build --no-interaction
+      - name: Generate Release Notes
+        run: |
+          git log $(git describe --tags --abbrev=0 HEAD^)..HEAD --pretty='format:* %h %s' --no-merges >> ".github/release_template.md"
+      - name: Create Release Draft
+        uses: softprops/action-gh-release@v1
+        with:
+          body_path: ".github/release_template.md"
+          token: "${{ secrets.PERSONAL_ACCESS_TOKEN }}"
+          draft: true
+          files: |
+            dist/unfccc_ghg_data-${{ env.PROJECT_VERSION }}-py3-none-any.whl
+            dist/unfccc_ghg_data-${{ env.PROJECT_VERSION }}.tar.gz

+ 156 - 14
.gitignore

@@ -1,18 +1,160 @@
-.idea
-.DS_Store
-venv
+# temporary
+src/unfccc_ghg_data/datasets
+
+# logging
+log/*
 geckodriver.log
-__pycache__
+
+# private dev code
 /JG_test_code/
-.doit.db
-.doit.db.db
-log/*
-UNFCCC_GHG_data/datasets
-UNFCCC_GHG_data/UNFCCC_DI_reader/test_UNFCCC_DI_reader.ipynb
-UNFCCC_GHG_data/UNFCCC_DI_reader/.ipynb_checkpoints/
-*.autosave
-#UNFCCC_GHG_data/UNFCCC_DI_reader
-build
-UNFCCC_GHG_data.egg-info
 
+# Notebooks
+*.ipynb
+
+# Databases
+*.db
+
+# Jupyter cache
+.jupyter_cache
+
+# Ruff cache
+.ruff_cache
+
+# Licence check
+licence-check.txt
+
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+pip-wheel-metadata/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# PyInstaller
+#  Usually these files are written by a python script from a template
+#  before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+*.py,cover
+.hypothesis/
+.pytest_cache/
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+*.log
+local_settings.py
+db.sqlite3
+db.sqlite3-journal
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# IPython
+profile_default/
+ipython_config.py
+
+# pyenv
+.python-version
+
+# pipenv
+#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
+#   However, in case of collaboration, if having platform-specific dependencies or dependencies
+#   having no cross-platform support, pipenv may install dependencies that don't work, or not
+#   install all needed dependencies.
+#Pipfile.lock
+
+# PEP 582; used by e.g. github.com/David-OConnor/pyflow
+__pypackages__/
+
+# Celery stuff
+celerybeat-schedule
+celerybeat.pid
+
+# SageMath parsed files
+*.sage.py
+
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# pycharm settings
+.idea
+
+# Rope project settings
+.ropeproject
+
+# mkdocs documentation
+/site
+
+# mypy
+.mypy_cache/
+.dmypy.json
+dmypy.json
+
+# Pyre type checker
+.pyre/
 
+# Mac stuff
+*.DS_Store

+ 44 - 0
.pre-commit-config.yaml

@@ -0,0 +1,44 @@
+# See https://pre-commit.com for more information
+ci:
+  autofix_prs: false
+  autoupdate_schedule: quarterly
+  autoupdate_branch: pre-commit-autoupdate
+
+# See https://pre-commit.com/hooks.html for more hooks
+repos:
+  - repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: 'v4.5.0'
+    hooks:
+      - id: check-added-large-files
+      - id: check-ast
+      - id: check-case-conflict
+      - id: check-json
+      - id: check-merge-conflict
+      #- id: check-symlinks
+      - id: check-yaml
+      - id: debug-statements
+      - id: detect-private-key
+      - id: end-of-file-fixer
+        exclude: ".json|.yaml"
+      - id: fix-byte-order-marker
+      - id: mixed-line-ending
+      - id: trailing-whitespace
+        exclude: ".json|.yaml"
+  - repo: local
+    hooks:
+      # Prevent committing .rej files
+      - id: forbidden-files
+        name: forbidden files
+        entry: found Copier update rejection files; review them and remove them
+        language: fail
+        files: "\\.rej$"
+  - repo: https://github.com/charliermarsh/ruff-pre-commit
+    rev: 'v0.1.8'
+    hooks:
+      - id: ruff
+        args: [ --fix, --exit-non-zero-on-fix ]
+      - id: ruff-format
+  - repo: https://github.com/python-poetry/poetry
+    rev: '1.7.0'
+    hooks:
+      - id: poetry-check
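
Review note: pre-commit treats `files` and `exclude` as Python regular expressions and matches them with `re.search`, so the unescaped dots in `".json|.yaml"` match any character. The sketch below shows the effect on a few made-up paths. `STRICTER` is a hypothetical tighter pattern offered for comparison only, not something this commit contains.

```python
import re

# Pattern exactly as written in the config above.
EXCLUDE = re.compile(r".json|.yaml")
# Hypothetical stricter alternative: a literal dot and an end-of-path anchor.
STRICTER = re.compile(r"\.(json|yaml)$")


def is_excluded(pattern: re.Pattern, path: str) -> bool:
    """Mimic pre-commit's exclude check: a path is skipped on any search hit."""
    return pattern.search(path) is not None
```

With the pattern as written, a path such as `src/myjson_tools.py` would also be excluded from `end-of-file-fixer` and `trailing-whitespace`, because `.json` matches the substring `yjson`. Whether that matters depends on the file names in the repository.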

+ 29 - 0
.readthedocs.yaml

@@ -0,0 +1,29 @@
+# .readthedocs.yaml
+# Read the Docs configuration file
+# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details
+
+# Thank you also https://browniebroke.com/blog/specify-docs-dependency-groups-with-poetry-and-read-the-docs/
+
+# Required
+version: 2
+
+# Set the version of Python and other tools you might need
+build:
+  os: ubuntu-22.04
+  tools:
+    python: "3.11"
+  jobs:
+    post_create_environment:
+      - pip install poetry
+      - poetry config virtualenvs.create false
+    post_install:
+      # RtD seems to be not happy with poetry installs,
+      # hence use pip directly instead.
+      - poetry export -f requirements.txt --output requirements.txt --with docs
+      - python -m pip install -r requirements.txt
+      - python -m pip install .
+      - python -m pip list
+
+# Set sphinx configuration
+sphinx:
+   configuration: docs/source/conf.py

+ 84 - 10
Makefile

@@ -1,10 +1,84 @@
-.SILENT: help
-help:
-	echo Options:
-	echo make venv: create virtual environment
-
-venv: UNFCCC_GHG_data
-	[ -d ./venv ] || python3 -m venv venv
-	./venv/bin/pip install --upgrade pip
-	./venv/bin/pip install -Ur code/requirements.txt
-	touch venv
+# Makefile to help automate key steps
+
+.DEFAULT_GOAL := help
+# Will likely fail on Windows, but Makefiles are in general not Windows
+# compatible so we're not too worried
+TEMP_FILE := $(shell mktemp)
+
+# A helper script to get short descriptions of each target in the Makefile
+define PRINT_HELP_PYSCRIPT
+import re, sys
+
+for line in sys.stdin:
+	match = re.match(r'^([\$$\(\)a-zA-Z_-]+):.*?## (.*)$$', line)
+	if match:
+		target, help = match.groups()
+		print("%-30s %s" % (target, help))
+endef
+export PRINT_HELP_PYSCRIPT
+
+
+.PHONY: help
+help:  ## print short description of each target
+	@python3 -c "$$PRINT_HELP_PYSCRIPT" < $(MAKEFILE_LIST)
+
+.PHONY: set_env
+set_env:  ## set the environment variable
+    export UNFCCC_GHG_ROOT_PATH = .
+
+.PHONY: checks
+checks:  ## run all the linting checks of the codebase
+	@echo "=== pre-commit ==="; poetry run pre-commit run --all-files || echo "--- pre-commit failed ---" >&2; \
+		##echo "=== mypy ==="; MYPYPATH=stubs poetry run mypy src || echo "--- mypy failed ---" >&2; \
+		echo "======"
+
+.PHONY: ruff-fixes
+ruff-fixes:  ## fix the code using ruff
+    # format before and after checking so that the formatted stuff is checked and
+    # the fixed stuff is formatted
+	poetry run ruff format src tests scripts docs/source/conf.py
+	poetry run ruff src tests scripts docs/source/conf.py  --fix
+	poetry run ruff format src tests scripts docs/source/conf.py
+
+.PHONY: ruff-fixes-current
+ruff-fixes-current:  ## fix the code in src/unfccc_ghg_data/unfccc_reader using ruff
+	poetry run ruff src/unfccc_ghg_data/unfccc_reader --fix
+
+
+.PHONY: test
+test:  ## run the tests
+	poetry run pytest src tests -r a -v --doctest-modules --cov=src
+
+# Note on code coverage and testing:
+# You must specify cov=src as otherwise funny things happen when doctests are
+# involved.
+# If you want to debug what is going on with coverage, we have found
+# that adding COVERAGE_DEBUG=trace to the front of the below command
+# can be very helpful as it shows you if coverage is tracking the coverage
+# of all of the expected files or not.
+# We are sure that the coverage maintainers would appreciate a PR that improves
+# the coverage handling when there are doctests and a `src` layout like ours.
+
+.PHONY: docs
+docs:  ## build the docs
+	poetry run sphinx-build -T -b html docs/source docs/build/html
+
+.PHONY: changelog-draft
+changelog-draft:  ## compile a draft of the next changelog
+	poetry run towncrier build --draft
+
+.PHONY: licence-check
+licence-check:  ## Check that licences of the dependencies are suitable
+	# Will likely fail on Windows, but Makefiles are in general not Windows
+	# compatible so we're not too worried
+	poetry export --without=tests --without=docs --without=dev > $(TEMP_FILE)
+	poetry run liccheck -r $(TEMP_FILE) -R licence-check.txt
+	rm -f $(TEMP_FILE)
+
+.PHONY: virtual-environment
+virtual-environment:  ## update virtual environment, create a new one if it doesn't already exist
+	poetry lock
+	# Put virtual environments in the project
+	poetry config virtualenvs.in-project true
+	poetry install --all-extras
+	poetry run pre-commit install
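
The `help` target above relies on a small embedded Python script that scans the Makefile for `target:  ## description` lines. The following is a minimal standalone sketch of that idea, assuming the same regular expression once make has expanded `$$` to `$`. The names `HELP_PATTERN`, `extract_help` and `SAMPLE` are hypothetical and only serve the illustration.

```python
import re

# Same pattern as in PRINT_HELP_PYSCRIPT, after make turns "$$" into "$".
HELP_PATTERN = re.compile(r"^([\$\(\)a-zA-Z_-]+):.*?## (.*)$")


def extract_help(makefile_text):
    """Return (target, description) pairs for every documented target."""
    entries = []
    for line in makefile_text.splitlines():
        match = HELP_PATTERN.match(line)
        if match:
            entries.append(match.groups())
    return entries


# Hypothetical Makefile excerpt used only for this illustration.
SAMPLE = (
    ".PHONY: help\n"
    "help:  ## print short description of each target\n"
    "\t@python3 -c \"...\"\n"
    "\n"
    ".PHONY: test\n"
    "test:  ## run the tests\n"
    "\tpoetry run pytest\n"
)

for target, description in extract_help(SAMPLE):
    print("%-30s %s" % (target, description))
```

Lines such as `.PHONY: help` are skipped because `.` is not in the character class, so only targets that carry a `## ` comment show up in the help output.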

+ 148 - 54
README.md

@@ -1,77 +1,93 @@
-# Collaborative UNFCCC non-AnnexI dataset
+# Country greenhouse gas data submitted to the UNFCCC
 This repository aims to organize a collective effort to bring GHG emissions and related data submitted by developing countries (non-AnnexI) to the UNFCCC into a standardized machine readable format. We focus on data not available through the [UNFCCC DI interface](https://di.unfccc.int/) which is mostly data submitted in IPCC 2006 categories.
 
-The code is based on [national-inventory-submissions](https://github.com/openclimatedata/national-inventory-submisions)
+<!--- sec-begin-description -->
 
+Reading country greenhouse gas data submitted to the United Nations Framework Convention on Climate Change (UNFCCC) in different submissions and formats and providing it in a standardized nc and csv format compatible with primap2. Data are read using different methods from APIs, xlsx and csv files as well as pdf files.
 
-**The repository is currently under initial development so a lot of things are still subject to change.**
 
-## Description
-### Repository structure
-The repository is structured by folders. Here we list the folders in order of processing.
 
-* **downloaded_data** This folder contains data downloaded from the UNFCCC website and other sources. For Biennial Update Reports (BUR), National Communications (NC), and Nationally Determined Contributions (NDC) an automatic downloader exists (folder UNFCCC). Within the UNFCCC folder the data is organized in a *\<country\>/\<submission\>* structure. NDC submissions are often revised. To be able to keep track of the targets and emissions inventories we store each NDC revision in a time-stamped folder. The *non-UNFCCC* folder contains official country inventories not (yet) submitted to the UNFCCC. The internal structure is the same as for the UNFCCC folder.
-* **analyzed_submissions** Here we collect all files needed to extract data from submissions. Subfolders are countries (use the same names as in the *downloaded data* folder) and within the country folders each submission / report should have its own subfolder, e.g. *Argentina/BUR1*. National Inventory Reports (NIR) are submitted together with BURs or NCs and have no individual folder but are used as additional inputs to their BUR or NC. As the repository is in the process of being set up, there currently is no data available.
-* **extracted_data** This folder holds all extracted datasets in PRIMAP2 interchange format. The datasets are organized in country subfolders. The naming convention for the datasets is the following: *\<iso\>\_\<sub\>\_\<year\>\_\<term\>* where *\<iso\>* is the country's three-letter ISO code, *\<sub\>* is the submission, e.g. **BUR1**, **NC5**, or **inventory2020** (for a non-UNFCCC inventory), *\<year\>* is the year of publication, and *\<term\>* is the main sector terminology e.g. IPCC2006 or IPCC1996. As the repository is in the process of being set up, there currently is no data available.
-* **code** Code that is used for several countries / reports, but not (yet) part of the primap2 package. This folder also contains scripts that automate data reading for all analyzed submissions or subsets (e.g. all first BURs) and code to generate composite datasets. Currently the only subfolder is the *UNFCCC_downloader* where code to automatically download BUR and NC submission files from the [UNFCCC website](https://www.unfccc.int) resides.
-* **composite_datasets** This folder contains generated composite datasets in PRIMAP2 interchange format. Each dataset has its own subfolder which should contain a dataset name, a version, and publication date (e.g. year). As the repository is in the process of being set up, there currently is no data available.
-* **legacy_data** This folder holds all extracted datasets in PRIMAP2 interchange format. The datasets are organized in country subfolders. The naming convention for the datasets is the following: *\<iso\>\_\<sub\>\_\<year\>\_\<term\>\_\<extra\>* where *\<iso\>* is the country's three-letter ISO code, *\<sub\>* is the submission, e.g. **BUR1**, **NC5**, or **inventory2020** (for a non-UNFCCC inventory), *\<year\>* is the year of publication, *\<term\>* is the main sector terminology e.g. IPCC2006 or IPCC1996, and *\<extra\>* is a free identifier to distinguish several files for the same submission (in some cases data for e.g. fluorinated gases are in a separate file). This folder also holds data where the code or some input files are not publicly available. Our aim is to reduce data in this folder to zero and to create fully open source processes for all datasets such that they can be included in the main folder.
+[![CI](https://github.com/JGuetschow/UNFCCC_non-AnnexI_data/actions/workflows/ci.yaml/badge.svg?branch=main)](https://github.com/JGuetschow/UNFCCC_non-AnnexI_data/actions/workflows/ci.yaml)
+[![Coverage](https://codecov.io/gh/JGuetschow/UNFCCC_non-AnnexI_data/branch/main/graph/badge.svg)](https://codecov.io/gh/JGuetschow/UNFCCC_non-AnnexI_data)
+[![Docs](https://readthedocs.org/projects/unfccc-ghg-data/badge/?version=latest)](https://unfccc-ghg-data.readthedocs.io)
+
+**PyPI :**
+[![PyPI](https://img.shields.io/pypi/v/unfccc-ghg-data.svg)](https://pypi.org/project/unfccc-ghg-data/)
+[![PyPI: Supported Python versions](https://img.shields.io/pypi/pyversions/unfccc-ghg-data.svg)](https://pypi.org/project/unfccc-ghg-data/)
+[![PyPI install](https://github.com/JGuetschow/UNFCCC_non-AnnexI_data/actions/workflows/install.yaml/badge.svg?branch=main)](https://github.com/JGuetschow/UNFCCC_non-AnnexI_data/actions/workflows/install.yaml)
+
+**Other info :**
+[![Licence](https://img.shields.io/github/license/JGuetschow/UNFCCC_non-AnnexI_data.svg)](https://github.com/JGuetschow/UNFCCC_non-AnnexI_data/blob/main/LICENCE)
+[![Last Commit](https://img.shields.io/github/last-commit/JGuetschow/UNFCCC_non-AnnexI_data.svg)](https://github.com/JGuetschow/UNFCCC_non-AnnexI_data/commits/main)
+[![Contributors](https://img.shields.io/github/contributors/JGuetschow/UNFCCC_non-AnnexI_data.svg)](https://github.com/JGuetschow/UNFCCC_non-AnnexI_data/graphs/contributors)
+
 
-### Data format description (columns)
-All data in this repository in the comma-separated values (CSV) files is formatted consistently with the PRIMAP2 interchange format.
+<!--- sec-end-description -->
 
-The data contained in each column is as follows:
+Full documentation can be found at:
+[unfccc-ghg-data.readthedocs.io](https://unfccc-ghg-data.readthedocs.io/en/latest/).
+We recommend reading the docs there because the internal documentation links
+don't render correctly on GitHub's viewer.
 
-#### "source"
-Name of the data source. For country-specific datasets it is `\<ISO3\>-GHG-inventory`, where `\<ISO3\>` is the ISO 3166 three-letter country code. Specifications for composite datasets including several countries will be added when the datasets are available.
+## Installation
 
-#### "scenario (PRIMAP)"
-The scenario specifies the submissions (e.g. BUR1, NC5, or Inventory_2021 for a non-UNFCCC inventory)
+<!--- sec-begin-installation -->
 
-#### "provenance"
-Provenance of the data. Here: "derived" as it is a composite source.
+Country greenhouse gas data submitted to the UNFCCC can be installed with pip, mamba or conda:
 
-#### "country (ISO3)"
-ISO 3166 three-letter country codes.
+```bash
+pip install unfccc-ghg-data
+mamba install -c conda-forge unfccc-ghg-data
+conda install -c conda-forge unfccc-ghg-data
+```
 
-#### "entity"
-Gas categories using global warming potentials (GWP) from either Second Assessment Report (SAR) or Fourth Assessment Report (AR4).
+Additional dependencies can be installed using
 
-Code                     Description
-----                     -----------
-CH4                      Methane
-CO2                      Carbon Dioxide
-N2O                      Nitrous Oxide
-HFCS (SARGWP100)         Hydrofluorocarbons (SAR)
-HFCS (AR4GWP100)         Hydrofluorocarbons (AR4)
-PFCS (SARGWP100)         Perfluorocarbons (SAR)
-PFCS (AR4GWP100)         Perfluorocarbons (AR4)
-SF6                      Sulfur Hexafluoride
-NF3                      Nitrogen Trifluoride
-FGASES (SARGWP100)       Fluorinated Gases (SAR): HFCs, PFCs, SF$_6$, NF$_3$
-FGASES (AR4GWP100)       Fluorinated Gases (AR4): HFCs, PFCs, SF$_6$, NF$_3$
-KYOTOGHG (SARGWP100)     Kyoto greenhouse gases (SAR)
-KYOTOGHGAR4 (AR4GWP100)  Kyoto greenhouse gases (AR4)
+```bash
+# To add plotting dependencies
+pip install unfccc-ghg-data[plots]
 
-Table: Gas categories and underlying global warming potentials
+# If you are installing with conda, we recommend
+# installing the extras by hand because there is no stable
+# solution yet (issue here: https://github.com/conda/conda/issues/7502)
+```
 
+<!--- sec-end-installation -->
 
-#### "unit"
-Units are of the form *Gg/Mt/... \<substance\> / yr* where substance is the entity or for CO$_2$ equivalent units *Gg/Mt/... CO2 / yr*. The CO$_2$-equivalent is calculated according to the global warming potential indicated by the entity (see above).
+### For developers
 
+<!--- sec-begin-installation-dev -->
 
-#### "category (\<term\>)"
-Categories for emissions as defined in terminology \<term\>. Terminology names are those used in the [climate_categories](https://github.com/pik-primap/climate_categories) package. If the terminology name contains *\_PRIMAP* it means that some (sub)categories have been added to the official IPCC category hierarchy. Added categories outside the hierarchy begin with the prefix *M*.
+For development, we rely on [poetry](https://python-poetry.org) for all our
+dependency management. To get started, you will need to make sure that poetry
+is installed
+([instructions here](https://python-poetry.org/docs/#installing-with-the-official-installer),
+we found that pipx and pip worked better to install on a Mac).
 
-#### "CategoryName"
-Original name of the category as presented in the submission.
+For all of our work, we use our `Makefile`.
+You can read the instructions out and run the commands by hand if you wish,
+but we generally discourage this because it can be error prone.
+In order to create your environment, run `make virtual-environment`.
 
-#### "CategoryNameTranslation"
-Optional column. In some cases original category names have been translated to English. In this case these translations are stored in this column.
+If there are any issues, the messages from the `Makefile` should guide you
+through. If not, please raise an issue in the
+[issue tracker](https://github.com/JGuetschow/UNFCCC_non-AnnexI_data/issues).
+
+For the rest of our developer docs, please see [](development-reference).
+
+<!--- sec-end-installation-dev -->
+
+
+TODO: old README below. reorganize into proper docs.
+
+The code for downloading submissions is based on [national-inventory-submissions](https://github.com/openclimatedata/national-inventory-submisions)
+
+
+**The repository is currently under initial development so a lot of things are still subject to change.**
+
+## Description
 
-#### Remaining columns
 
-Years (depending on dataset)
 
 
 
@@ -108,9 +124,9 @@ The code has not been tested under Windows and Mac OS.
 ### Update BUR, NC, and NDC submissions
 The maintainers of this repository will update the list of submissions and the downloaded pdf files frequently. However, in some cases you might want to have the data early and do the download yourself. To avoid merge conflicts, please do this on a clean branch in your fork and make sure your branch is in sync with `main`.
 
-* **BUR**: To update the list of submissions run `make update-bur` in the main project folder. This will create a new list of submissions. To actually download the files run `make download-bur`.
-* **NC**: To update the list of submissions run `make update-nc` in the main project folder. This will create a new list of submissions. To actually download the files run `make download-nc`.
-* **NDC**: For the NDC submissions we use the list published in [openclimatedata/ndcs](https://github.com/openclimatedata/ndcs) which receives daily updates. To  download the files run `make download-ndc`.
+* **BUR**: To update the list of submissions run `poetry run doit update_bur` in the main project folder. This will create a new list of submissions. To actually download the files run `poetry run doit download_bur`.
+* **NC**: To update the list of submissions run `poetry run doit update_nc` in the main project folder. This will create a new list of submissions. To actually download the files run `poetry run doit download_nc`.
+* **NDC**: For the NDC submissions we use the list published in [openclimatedata/ndcs](https://github.com/openclimatedata/ndcs) which receives daily updates. To download the files run `poetry run doit download_ndc` (currently not working due to a data format change).
 
 All download scripts create files listing the new downloads in the folder *downloaded_data/UNFCCC*. The filenames use the format *00\_new\_downloads\_\<type\>-YYYY-MM-DD.csv* where *\<type\>* is *bur*, *nc*, or *ndc*. Currently, only one file per type and day is stored, so if you run the download script more than once on a day you will overwrite your first file (likely with an empty file as you have already downloaded everything) (see also [issue #2](https://github.com/JGuetschow/UNFCCC_non-AnnexI_data/issues/2)).
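
The overwrite behaviour described above follows directly from the naming scheme: the file name depends only on the submission type and the calendar day. Here is a minimal sketch, assuming the format `00_new_downloads_<type>-YYYY-MM-DD.csv` from the paragraph above. The helper `new_downloads_path` is hypothetical and not part of the repository's download scripts.

```python
from datetime import date
from pathlib import Path

VALID_TYPES = {"bur", "nc", "ndc"}


def new_downloads_path(folder, submission_type, day):
    """Build the path of the 'new downloads' listing for one type and day.

    Hypothetical helper for illustration only.
    """
    if submission_type not in VALID_TYPES:
        raise ValueError(f"unknown submission type: {submission_type!r}")
    # date.isoformat() yields YYYY-MM-DD
    name = f"00_new_downloads_{submission_type}-{day.isoformat()}.csv"
    return Path(folder) / name


# Two runs on the same day resolve to the same file, so the second run
# replaces the listing written by the first one.
first_run = new_downloads_path("downloaded_data/UNFCCC", "bur", date(2024, 1, 15))
second_run = new_downloads_path("downloaded_data/UNFCCC", "bur", date(2024, 1, 15))
print(first_run.name)
print(first_run == second_run)
```

Adding a time component or a run counter to the name would be one way to avoid the collision, but that is a design choice for the maintainers rather than something the current scripts do.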
 
@@ -152,3 +168,81 @@ Activity data needed depends on use case. We have listed some use cases and thei
 
 * **PRIMAP-hist**: currently only emissions data is needed. In the future activity data and socioeconomic data might be needed as well. For sectors and gases we refer to the data description available on [zenodo](https://zenodo.org/record/5494497).
 * **FAOSTAT**: FAOSTAT uses only data for the AFOLU sector (AFOLU = Agriculture, Forestry, and Other Land Use). However, activity data is needed in addition to emissions data. The used sectors and variables are listed in the [FAO to UNFCCC sector mapping document](https://fenixservices.fao.org/faostat/static/documents/GT/Mapping_to_UNFCCC_IPCC.pdf).
+
+
+<!--- sec-begin-datalad -->
+
+## DataLad datasets and how to use them
+
+This repository is a [DataLad](https://www.datalad.org/) dataset
+(id: 4d062170-604c-4efd-afbf-5ce7f97e0e63). It provides
+fine-grained data access down to the level of individual files, and allows for
+tracking future updates. In order to use this repository for data retrieval,
+[DataLad](https://www.datalad.org/) is required. It is a free and open source
+command line tool, available for all major operating systems, and builds on
+Git and [git-annex](https://git-annex.branchable.com/) to allow sharing,
+synchronizing, and version controlling collections of large files.
+
+More information on DataLad and
+[how to install](http://handbook.datalad.org/en/latest/intro/installation.html)
+it can be found in the
+[DataLad Handbook](https://handbook.datalad.org/en/latest/index.html).
+
+### Get the dataset
+
+A DataLad dataset can be `cloned` by running
+
+```
+datalad clone <url>
+```
+
+Once a dataset is cloned, it is a light-weight directory on your local machine.
+At this point, it contains only small metadata and information on the identity
+of the files in the dataset, but not actual *content* of the (sometimes large)
+data files.
+
+### Retrieve dataset content
+
+After cloning a dataset, you can retrieve file contents by running
+
+```
+datalad get <path/to/directory/or/file>
+```
+
+This command will trigger a download of the files, directories, or subdatasets
+you have specified.
+
+DataLad datasets can contain other datasets, so called *subdatasets*.  If you
+clone the top-level dataset, subdatasets do not yet contain metadata and
+information on the identity of files, but appear to be empty directories. In
+order to retrieve file availability metadata in subdatasets, run
+
+```
+datalad get -n <path/to/subdataset>
+```
+
+Afterwards, you can browse the retrieved metadata to find out about subdataset
+contents, and retrieve individual files with `datalad get`.  If you use
+`datalad get <path/to/subdataset>`, all contents of the subdataset will be
+downloaded at once.
+
+### Stay up-to-date
+
+DataLad datasets can be updated. The command `datalad update` will *fetch*
+updates and store them on a different branch (by default
+`remotes/origin/master`). Running
+
+```
+datalad update --merge
+```
+
+will *pull* available updates and integrate them in one go.
+
+### Find out what has been done
+
+DataLad datasets contain their history in the ``git log``.  By running ``git
+log`` (or a tool that displays Git history) in the dataset or on specific
+files, you can find out what has been done to the dataset or to individual
+files by whom, and when.
+
+<!--- sec-end-datalad -->
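
The DataLad steps described in the section above (clone, fetch subdataset metadata, fetch content, update) can be collected into one scripted sequence. The sketch below only builds the command lines and prints them; it does not call DataLad. The function name `datalad_workflow`, the example URL and the example paths are hypothetical placeholders.

```python
import shlex


def datalad_workflow(url, target_dir, wanted_path):
    """Return the DataLad commands for a typical retrieval session.

    Hypothetical helper: the first command clones the dataset; the
    remaining ones are meant to be run from inside the clone.
    """
    return [
        ["datalad", "clone", url, target_dir],   # lightweight clone, no file content yet
        ["datalad", "get", "-n", wanted_path],   # -n: availability metadata only
        ["datalad", "get", wanted_path],         # download the actual content
        ["datalad", "update", "--merge"],        # fetch and integrate updates
    ]


# Placeholder URL and paths, not the real location of this dataset.
commands = datalad_workflow(
    "https://example.org/some-dataset.git",
    "some-dataset",
    "extracted_data",
)
for command in commands:
    print(shlex.join(command))
```

To execute the sequence, each list could be passed to `subprocess.run(command, check=True)`, with `cwd` set to the clone directory for every command after the first.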

+ 0 - 92
UNFCCC_GHG_data/UNFCCC_CRF_reader/CRF_raw_for_year.py

@@ -1,92 +0,0 @@
-"""
-This script collects all latest CRF submissions for a given year
-
-Currently it only checks the extracted_data folder and not if new
-submission are available in the downloaded data folder.
-"""
-
-# TODO: sort importing and move to datasets folder
-# TODO: integrate into doit
-
-import argparse
-import primap2 as pm2
-from pathlib import Path
-from datetime import date
-from UNFCCC_GHG_data.helper import dataset_path_UNFCCC
-
-from UNFCCC_GHG_data.UNFCCC_CRF_reader.util import all_crf_countries
-from UNFCCC_GHG_data.UNFCCC_CRF_reader.UNFCCC_CRF_reader_prod import get_input_and_output_files_for_country
-from UNFCCC_GHG_data.UNFCCC_CRF_reader.UNFCCC_CRF_reader_prod import submission_has_been_read
-
-parser = argparse.ArgumentParser()
-parser.add_argument('--submission_year', help='Submission round to read', type=int)
-args = parser.parse_args()
-submission_year = args.submission_year
-
-ds_all_CRF = None
-outdated_countries = []
-included_countries = []
-
-for country in all_crf_countries:
-    # determine folder
-    try:
-        country_info = get_input_and_output_files_for_country(
-            country, submission_year=submission_year, verbose=False)
-
-        # check if the latest submission has been read already
-
-        data_read = submission_has_been_read(
-            country_info["code"], country_info["name"],
-            submission_year=submission_year,
-            submission_date=country_info["date"],
-            verbose=False,
-        )
-        if not data_read:
-            print(f"Latest submission for {country} has not been read yet.")
-            # TODO: make sure an older submission is read if present. currently none is included at all
-            outdated_countries.append(country)
-
-        # read the native format file
-        #print(country_info["output"])
-        input_files = [file for file in country_info["output"] if Path(file).suffix == ".nc"]
-
-        ds_country = pm2.open_dataset(input_files[0])
-
-        # combine per table DS
-        if ds_all_CRF is None:
-            ds_all_CRF = ds_country
-        else:
-            ds_all_CRF = ds_all_CRF.combine_first(ds_country)
-
-        included_countries.append(country)
-
-    except Exception as ex:
-        print(f"Exception {ex} occurred for {country}")
-
-
-# Update metadata
-# not necessary
-
-# write to disc
-today = date.today()
-
-compression = dict(zlib=True, complevel=9)
-output_folder = dataset_path_UNFCCC / f"CRF{submission_year}"
-output_filename = f"CRF{submission_year}_raw_{today.strftime('%Y-%m-%d')}"
-
-if not output_folder.exists():
-    output_folder.mkdir()
-
-# write data in interchange format
-pm2.pm2io.write_interchange_format(output_folder / output_filename,
-                                   ds_all_CRF.pr.to_interchange_format())
-
-# write data in native PRIMAP2 format
-encoding = {var: compression for var in ds_all_CRF.data_vars}
-ds_all_CRF.pr.to_netcdf(output_folder / (output_filename + ".nc"),
-                      encoding=encoding)
-
-# show info
-print(f"The following countries are included in the dataset: {included_countries}")
-print(f"The following countries have updated submission not yet read "
-      f"and not included in the dataset: {outdated_countries}")

+ 0 - 554
UNFCCC_GHG_data/UNFCCC_CRF_reader/UNFCCC_CRF_reader_prod.py

@@ -1,554 +0,0 @@
-import xarray as xr
-import primap2 as pm2
-import datalad.api
-from datetime import date
-from typing import Optional, List, Dict, Union
-
-from . import crf_specifications as crf
-
-from .UNFCCC_CRF_reader_core import read_crf_table
-from .UNFCCC_CRF_reader_core import convert_crf_table_to_pm2if
-from .UNFCCC_CRF_reader_core import get_latest_date_for_country
-from .UNFCCC_CRF_reader_core import get_crf_files
-from .UNFCCC_CRF_reader_devel import save_unknown_categories_info
-from .UNFCCC_CRF_reader_devel import save_last_row_info
-
-from UNFCCC_GHG_data.helper import code_path, log_path, root_path
-from UNFCCC_GHG_data.helper import custom_country_mapping, extracted_data_path_UNFCCC
-from UNFCCC_GHG_data.helper import get_country_code, get_country_name
-from .util import all_crf_countries, NoCRFFilesError
-
-#import sys
-#sys.path.append(code_path.name)
-
-# functions:
-# * testing functions
-# ** read one or more table(s) for all countries
-#    (and if desired only a single year) and write
-#    output files with missing sectors etc
-# **
-
-# TODO: add function to read several / all countries
-
-# general approach:
-# main code in a function that reads one table from one file.
-# return raw pandas DF for use in different functions
-# wrappers around this function to read for a whole country or for test reading where we also
-# write files with missing sectors etc.
-# merging functions use native pm2 format
-
-
-def read_crf_for_country(
-        country_code: str,
-        submission_year: int,
-        submission_date: Optional[str]=None,
-        re_read: Optional[bool]=True,
-) -> xr.Dataset:
-    """
-    Read CRF data for given submission year and country. All tables
-    available in the specification will be read for all years. Result
-    will be written to appropriate country folder.
-
-    Folders are determined from the submission_year and country_code variables.
-    The output is a primap2 dataset (xarray based).
-
-    If you want to read data for more countries or from a different folder
-    use the read_latest_crf_submissions_for_year or test_read_crf_data function.
-
-    IMPORTANT NOTE:
-    Currently there is no consistency check between data for the same category
-    read from different tables
-
-    We only save the data in the country folder if there were no messages like
-    unknown rows to make sure that data that goes into the repository is complete.
-    The result dataframe is returned in any case. In case log messages appeared
-    they are saved in the folder 'log' under the file name
-    'country_reading_<country_code>_<date>_X.csv'.
-
-
-    Parameters
-    __________
-
-    country_code: str
-        ISO 3-letter country code
-
-    submission_year: int
-        Year of the submission of the data
-
-    submission_date: Optional(str)
-        Read for a specific submission date (given as string as in the file names)
-        If not specified latest data will be read
-
-    re_read: Optional(bool) default: True
-        Read the data also if it's already present
-
-    Returns
-    _______
-        return value is an xarray Dataset with the read data in PRIMAP2 format
-    """
-
-    # get country name
-    country_name = get_country_name(country_code)
-
-
-    # get specification
-    # if we only have a single country check if we might have a country specific
-    # specification (currently only Australia, 2023)
-    try:
-        crf_spec = getattr(crf, f"CRF{submission_year}_{country_code}")
-        print(f"Using country specific specification: "
-              f"CRF{submission_year}_{country_code}")
-    except AttributeError:
-        # no country specific specification, check for general specification
-        try:
-            crf_spec = getattr(crf, f"CRF{submission_year}")
-        except AttributeError:
-            raise ValueError(
-                f"No terminology exists for submission year " f"{submission_year}"
-            )
-
-
-    tables = [table for table in crf_spec.keys()
-              if crf_spec[table]["status"] == "tested"]
-    print(f"The following tables are available in the " \
-          f"CRF{submission_year} specification: {tables}")
-
-    if submission_date is None:
-        submission_date = get_latest_date_for_country(country_code, submission_year)
-
-    # check if data has been read already
-    read_data = not submission_has_been_read(
-        country_code, country_name, submission_year=submission_year,
-        submission_date=submission_date, verbose=True,
-    )
-
-    ds_all = None
-    if read_data or re_read:
-        unknown_categories = []
-        last_row_info = []
-        for table in tables:
-            # read table for all years
-            ds_table, new_unknown_categories, new_last_row_info = read_crf_table(
-                country_code, table, submission_year, date=submission_date)#, data_year=[1990])
-
-            # collect messages on unknown rows etc
-            unknown_categories = unknown_categories + new_unknown_categories
-            last_row_info = last_row_info + new_last_row_info
-
-            # convert to PRIMAP2 IF
-            # first drop the orig_cat_name col as it can have multiple values for
-            # one category
-            ds_table = ds_table.drop(columns=["orig_cat_name"])
-
-            # if we need to map entities pass this info to the conversion function
-            if "entity_mapping" in crf_spec[table]:
-                entity_mapping = crf_spec[table]["entity_mapping"]
-            else:
-                entity_mapping = None
-            ds_table_if = convert_crf_table_to_pm2if(
-                ds_table,
-                submission_year,
-                meta_data_input={"title": f"Data submitted in {submission_year} to the UNFCCC "
-                                          f"in the common reporting format (CRF) by {country_name}. "
-                                          f"Submission date: {submission_date}"},
-                entity_mapping=entity_mapping,
-            )
-
-            # now convert to native PRIMAP2 format
-            ds_table_pm2 = pm2.pm2io.from_interchange_format(ds_table_if)
-
-            # if individual data for emissions and removals / recovery exist combine
-            # them
-            if (('CO2 removals' in ds_table_pm2.data_vars) and
-                    ('CO2 emissions' in ds_table_pm2.data_vars) and not
-                    ('CO2' in ds_table_pm2.data_vars)):
-                # we can just sum to CO2 as we made sure that it doesn't exist.
-                # If we have CO2 and removals but not emissions, CO2 already has
-                # removals subtracted and we do nothing here
-                ds_table_pm2["CO2"] = ds_table_pm2[["CO2 emissions",
-                                                "CO2 removals"]].pr.sum(
-                    dim="entity", skipna=True, min_count=1
-                )
-                ds_table_pm2["CO2"].attrs["entity"] = "CO2"
-
-            if (('CH4 removals' in ds_table_pm2.data_vars) and
-                    ('CH4 emissions' in ds_table_pm2.data_vars) and not
-                    ('CH4' in ds_table_pm2.data_vars)):
-                # we can just sum to CH4 as we made sure that it doesn't exist.
-                # If we have CH4 and removals but not emissions, CH4 already has
-                # removals subtracted and we do nothing here 
-                ds_table_pm2["CH4"] = ds_table_pm2[["CH4 emissions",
-                                                "CH4 removals"]].pr.sum(
-                    dim="entity", skipna=True, min_count=1
-                )
-                ds_table_pm2["CH4"].attrs["entity"] = "CH4"
-
-            # combine per table DS
-            if ds_all is None:
-                ds_all = ds_table_pm2
-            else:
-                ds_all = ds_all.combine_first(ds_table_pm2)
-
-        # check if there were log messages.
-        save_data = True
-        if len(unknown_categories) > 0:
-            save_data = False
-            today = date.today()
-            log_location = log_path / f"CRF{submission_year}" \
-                           / f"{country_code}_unknown_categories_{today.strftime('%Y-%m-%d')}.csv"
-            print(f"Unknown rows found for {country_code}. Not saving data. Saving log to "
-                  f"{log_location}" )
-            save_unknown_categories_info(unknown_categories, log_location)
-
-        if len(last_row_info) > 0:
-            save_data = False
-            today = date.today()
-            log_location = log_path / f"CRF{submission_year}" \
-                           / f"{country_code}_last_row_info_{today.strftime('%Y-%m-%d')}.csv"
-            print(f"Data found in the last row for {country_code}. Not saving data. Saving log to "
-                  f"{log_location}")
-            save_last_row_info(last_row_info, log_location)
-
-        if save_data:
-            compression = dict(zlib=True, complevel=9)
-            output_folder = extracted_data_path_UNFCCC / country_name.replace(" ", "_")
-            output_filename = f"{country_code}_CRF{submission_year}_{submission_date}"
-
-            if not output_folder.exists():
-                output_folder.mkdir()
-
-            # write data in interchange format
-            pm2.pm2io.write_interchange_format(output_folder / output_filename,
-                                               ds_all.pr.to_interchange_format())
-
-            # write data in native PRIMAP2 format
-            encoding = {var: compression for var in ds_all.data_vars}
-            ds_all.pr.to_netcdf(output_folder / (output_filename + ".nc"),
-                                  encoding=encoding)
-
-    return ds_all
-
-
-def read_crf_for_country_datalad(
-        country: str,
-        submission_year: int,
-        submission_date: Optional[str]=None,
-        re_read: Optional[bool]=True
-) -> None:
-    """
-    Wrapper around read_crf_for_country which takes care of selecting input
-    and output files and using datalad run to trigger the data reading
-
-    Parameters
-    __________
-
-    country: str
-        ISO 3-letter country code
-
-    submission_year: int
-        Year of the submission of the data
-
-    submission_date: Optional(str)
-        Read for a specific submission date (given as string as in the file names)
-        If not specified latest data will be read
-
-    """
-
-    # get all the info for the country
-    country_info = get_input_and_output_files_for_country(
-        country, submission_year=submission_year, verbose=True)
-
-    print(f"Attempting to read data for CRF{submission_year} from {country}.")
-    print("#"*80)
-    print("")
-    print("Using the UNFCCC_CRF_reader")
-    print("")
-    print("Run the script using datalad run via the python api")
-    script = code_path / "UNFCCC_CRF_reader" / "read_UNFCCC_CRF_submission.py"
-
-    cmd = f"./venv/bin/python3 {script.as_posix()} --country={country} "\
-          f"--submission_year={submission_year} --submission_date={submission_date}"
-    if re_read:
-        cmd = cmd + " --re_read"
-    datalad.api.run(
-        cmd=cmd,
-        dataset=root_path,
-        message=f"Read data for {country}, CRF{submission_year}, {submission_date}.",
-        inputs=country_info["input"],
-        outputs=country_info["output"],
-        dry_run=None,
-        explicit=True,
-    )
-
-
-def read_new_crf_for_year(
-        submission_year: int,
-        countries: Optional[List[str]]=None,
-        re_read: Optional[bool]=False,
-) -> dict:
-    """
-    Read CRF data for given submission year for all countries in
-    `countries` that have submitted data. If no `countries` list is
-    given, all countries are used.
-    When updated submissions exist, the latest will be read.
-    All tables available in the specification will be read for all years.
-    Results will be written to appropriate country folders.
-
-    If you want to read data from a different folder use the
-    test_read_crf_data function.
-
-    IMPORTANT NOTE:
-    Currently there is no consistency check between data for the same category
-    read from different tables
-
-    Parameters
-    __________
-
-    submission_year: int
-        Year of the submission of the data
-
-    countries: List[str] (optional)
-        List of countries to read. If not given reading is tried for all
-        CRF countries
-
-    re_read: bool (optional, default=False)
-        If true data will be read even if already read before.
-
-    TODO: write log with failed countries and what has been read
-
-    Returns
-    _______
-        dict[str, str]: maps each country to its read status
-        ("read", "skipped", "no data", or "failed")
-
-    """
-
-    if countries is None:
-        countries = all_crf_countries
-
-    read_countries = {}
-    for country in countries:
-        try:
-            country_df = read_crf_for_country(country, submission_year, re_read=re_read)
-            if country_df is None:
-                read_countries[country] = "skipped"
-            else:
-                read_countries[country] = "read"
-        except NoCRFFilesError:
-            print(f"No data for country {country}, {submission_year}")
-            read_countries[country] = "no data"
-        except Exception as ex:
-            print(f"Data for country {country}, {submission_year} could not be read")
-            print(f"The following error occurred: {ex}")
-            read_countries[country] = "failed"
-
-    # print overview
-    successful_countries = [country for country in read_countries if read_countries[country] == "read"]
-    skipped_countries = [country for country in read_countries if read_countries[country] == "skipped"]
-    failed_countries = [country for country in read_countries if read_countries[country] == "failed"]
-    no_data_countries = [country for country in read_countries if read_countries[country] == "no data"]
-
-    print(f"Read data for countries {successful_countries}")
-    print(f"Skipped countries {skipped_countries}")
-    print(f"No data for countries {no_data_countries}")
-    print(f"!!!!! Reading failed for {failed_countries}. Check why")
-    return read_countries
-
-
-def read_new_crf_for_year_datalad(
-        submission_year: int,
-        countries: Optional[List[str]] = None,
-        re_read: Optional[bool] = False,
-) -> None:
-    """
-    Wrapper around read_new_crf_for_year which takes care of selecting input
-    and output files and using datalad run to trigger the data reading
-
-    Parameters
-    __________
-
-    submission_year: int
-        Year of the submission of the data
-
-    countries: List[str] (optional)
-        List of countries to read. If not given reading is tried for all
-        CRF countries
-
-    re_read: bool (optional, default=False)
-        If true data will be read even if already read before.
-
-    """
-
-    if countries is not None:
-        print(f"Reading CRF{submission_year} for countries {countries} using UNFCCC_CRF_reader.")
-    else:
-        print(f"Reading CRF{submission_year} for all countries using UNFCCC_CRF_reader.")
-        countries = all_crf_countries
-    print("#" * 80)
-    print("")
-    if re_read:
-        print("Re-reading all latest submissions.")
-    else:
-        print("Only reading new submissions not read yet.")
-
-
-    input_files = []
-    output_files = []
-    # loop over countries to collect input and output files
-    print("Collect input and output files to pass to datalad")
-    for country in countries:
-        try:
-            country_info = get_input_and_output_files_for_country(
-                country, submission_year=submission_year, verbose=False)
-            # check if the submission has been read already
-            if re_read:
-                input_files = input_files + country_info["input"]
-                output_files = output_files + country_info["output"]
-            else:
-                data_read = submission_has_been_read(
-                    country_info["code"], country_info["name"],
-                    submission_year=submission_year,
-                    submission_date=country_info["date"],
-                    verbose=False,
-                )
-                if not data_read:
-                    input_files = input_files + country_info["input"]
-                    output_files = output_files + country_info["output"]
-        except Exception:
-            # no error handling here as that is done in the function that does the actual reading
-            pass
-
-    print("Run the script using datalad run via the python api")
-    script = code_path / "UNFCCC_CRF_reader" / "read_new_UNFCCC_CRF_for_year.py"
-
-    #cmd = f"./venv/bin/python3 {script.as_posix()} --countries={countries} "\
-    #      f"--submission_year={submission_year}"
-    cmd = f"./venv/bin/python3 {script.as_posix()} " \
-          f"--submission_year={submission_year}"
-
-    if re_read:
-        cmd = cmd + " --re_read"
-    datalad.api.run(
-        cmd=cmd,
-        dataset=root_path,
-        message=f"Read data for {countries}, CRF{submission_year}. Re-reading: {re_read}",
-        inputs=input_files,
-        outputs=output_files,
-        dry_run=None,
-        #explicit=True,
-    )
-
-
-def get_input_and_output_files_for_country(
-        country: str,
-        submission_year: int,
-        submission_date: Optional[str]=None,
-        verbose: Optional[bool]=True,
-) -> Dict[str, Union[List, str]]:
-    """
-    Get input and output files for a given country
-    """
-
-    country_info = {}
-
-    if country in custom_country_mapping:
-        country_code = country
-    else:
-        country_code = get_country_code(country)
-    # now get the country name
-    country_name = get_country_name(country_code)
-    country_info["code"] = country_code
-    country_info["name"] = country_name
-
-    # determine latest data
-    print(f"Determining input and output files for {country}")
-    if submission_date is None:
-        if verbose:
-            print("No submission date given, find latest date.")
-        submission_date = get_latest_date_for_country(country_code, submission_year)
-    else:
-        if verbose:
-            print(f"Using given submission date {submission_date}")
-
-    if submission_date is None:
-        # there is no data. Raise an exception
-        raise NoCRFFilesError(f"No submissions found for {country_code}, "
-                              f"submission_year={submission_year}, "
-                              f"date={submission_date}")
-    else:
-        if verbose:
-            print(f"Latest submission date for CRF{submission_year} is {submission_date}")
-    country_info["date"] = submission_date
-
-    # get possible input files
-    input_files = get_crf_files(country_codes=country_code,
-                                submission_year=submission_year,
-                                date=submission_date)
-    if not input_files:
-        raise NoCRFFilesError(f"No possible input files found for {country}, CRF{submission_year}, "
-                              f"v{submission_date}. Are they already submitted and included in the "
-                              f"repository?")
-    elif verbose:
-        print("Found the following input_files:")
-        for file in input_files:
-            print(file.name)
-        print("")
-
-
-    # convert file's path to str
-    input_files = [file.as_posix() for file in input_files]
-    country_info["input"] = input_files
-
-    # get output file
-    output_folder = extracted_data_path_UNFCCC / country_name.replace(" ", "_")
-    output_files = [output_folder / f"{country_code}_CRF{submission_year}"
-                                    f"_{submission_date}.{suffix}" for suffix
-                    in ['yaml', 'csv', 'nc']]
-    if verbose:
-        print("The following files are considered as output_files:")
-        for file in output_files:
-            print(file)
-        print("")
-
-    # check if output data present
-
-    # convert file paths to str
-    output_files = [file.as_posix() for file in output_files]
-    country_info["output"] = output_files
-
-    return country_info
-
-
-def submission_has_been_read(
-        country_code: str,
-        country_name: str,
-        submission_year: int,
-        submission_date: str,
-        verbose: Optional[bool]=True,
-) -> bool:
-    """
-    Check if a CRF submission has already been read
-    """
-    output_folder = extracted_data_path_UNFCCC / country_name.replace(" ", "_")
-    output_filename = f"{country_code}_CRF{submission_year}_{submission_date}"
-    if output_folder.exists():
-        existing_files = output_folder.glob(f"{output_filename}.*")
-        existing_suffixes = [file.suffix for file in existing_files]
-        if all(suffix in existing_suffixes for suffix in [".nc", ".yaml", ".csv"]):
-            has_been_read = True
-            if verbose:
-                print(f"Data already available for {country_code}, "
-                      f"CRF{submission_year}, version {submission_date}.")
-        else:
-            has_been_read = False
-            if verbose:
-                print(f"Partial data available for {country_code}, "
-                      f"CRF{submission_year}, version {submission_date}. "
-                      "Please check if all files have been written after "
-                      "reading.")
-    else:
-        has_been_read = False
-        if verbose:
-            print(f"No read data available for {country_code}, "
-                  f"CRF{submission_year}, version {submission_date}. ")
-
-    return has_been_read

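Review note: the completeness check done by `submission_has_been_read` above can be tried in isolation. The following is a minimal sketch, not the project's code: `outputs_complete`, `REQUIRED_SUFFIXES`, and the example stem `DEU_CRF2021_20210415` are hypothetical names chosen for illustration, and it uses only the standard library. It assumes, as the removed function does, that a submission counts as read only when the `.nc`, `.yaml`, and `.csv` files are all present.

```python
import tempfile
from pathlib import Path

# Suffixes that must all exist before a submission counts as read
# (taken from the removed function; the constant name is hypothetical).
REQUIRED_SUFFIXES = (".nc", ".yaml", ".csv")


def outputs_complete(output_folder: Path, output_stem: str) -> bool:
    """Return True only if every required output file exists for the stem."""
    if not output_folder.exists():
        return False
    existing = {path.suffix for path in output_folder.glob(f"{output_stem}.*")}
    return all(suffix in existing for suffix in REQUIRED_SUFFIXES)


# Small demonstration in a temporary folder (example names are made up).
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "Germany"
    folder.mkdir()
    stem = "DEU_CRF2021_20210415"

    # Only the netCDF file exists: the submission is only partially written.
    (folder / f"{stem}.nc").touch()
    partial = outputs_complete(folder, stem)

    # Add the two interchange-format files: now all outputs are present.
    for suffix in (".yaml", ".csv"):
        (folder / f"{stem}{suffix}").touch()
    complete = outputs_complete(folder, stem)

print(partial, complete)
```

Collecting the suffixes into a set before testing avoids walking the glob result once per required suffix, and a missing folder is treated the same as missing files.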
+ 0 - 13
UNFCCC_GHG_data/UNFCCC_CRF_reader/__init__.py

@@ -1,13 +0,0 @@
-"""
-CRF reader module
-"""
-
-#from pathlib import Path
-from . import crf_specifications
-from .UNFCCC_CRF_reader_prod import read_crf_for_country, read_crf_for_country_datalad
-
-__all__ = ["crf_specifications",
-           "read_crf_for_country",
-           "read_crf_for_country_datalad",
-           ]
-

+ 0 - 2508
UNFCCC_GHG_data/UNFCCC_CRF_reader/crf_specifications/CRF2021_specification.py

@@ -1,2508 +0,0 @@
-""" CRF2021 specification.
-Currently not all tables are included. Extend if you need all country
-specific items in categories 2, 3.H-G, 4
-
-tables included:
-* Energy
-    'Table1s1', 'Table1s2',
-    'Table1.A(a)s1', 'Table1.A(a)s2', 'Table1.A(a)s3', 'Table1.A(a)s4',
-    'Table1.B.1', 'Table1.B.2', 'Table1.C', 'Table1.D',
-* Industrial processes
-    'Table2(I)s1', 'Table2(I)s2',
-    'Table2(II)',
-* Agriculture
-    'Table3s1', 'Table3s2',
-    'Table3.C', 'Table3.D', 'Table3.E',
-* LULUCF
-    'Table4',
-* Waste
-    'Table5', 'Table5.A', 'Table5.B', 'Table5.C', 'Table5.D'
-
-missing tables are:
-* Energy
-    'Table1.D'
-* Industrial processes
-    'Table2(I).A-Hs1', 'Table2(I).A-Hs2',
-    'Table2(II)B-Hs1', 'Table2(II)B-Hs2',
-* Agriculture
-    'Table3.As1', 'Table3.As2' (no additional emissions data)
-    'Table3.F', 'Table3.G-I',
-* LULUCF
-    All tables except Table4
-* Waste
-    All tables read
-
-TODO:
-* Add missing tables
-* Add activity data
-
-"""
-
-import numpy as np
-from .util import unit_info
-
-CRF2021 = {
-    "Table1s1": {
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 26,
-            "header": ['entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['Total Energy', ['1']],
-            ['A. Fuel combustion activities (sectoral approach)', ['1.A']],
-            ['1. Energy industries', ['1.A.1']],
-            ['a. Public electricity and heat production', ['1.A.1.a']],
-            ['b. Petroleum refining', ['1.A.1.b']],
-            ['c. Manufacture of solid fuels and other energy industries', ['1.A.1.c']],
-            ['2. Manufacturing industries and construction', ['1.A.2']],
-            ['a. Iron and steel', ['1.A.2.a']],
-            ['b. Non-ferrous metals', ['1.A.2.b']],
-            ['c. Chemicals', ['1.A.2.c']],
-            ['d. Pulp, paper and print', ['1.A.2.d']],
-            ['e. Food processing, beverages and tobacco', ['1.A.2.e']],
-            ['f. Non-metallic minerals', ['1.A.2.f']],
-            ['g. Other (please specify)', ['1.A.2.g']],
-            ['3. Transport', ['1.A.3']],
-            ['a. Domestic aviation', ['1.A.3.a']],
-            ['b. Road transportation', ['1.A.3.b']],
-            ['c. Railways', ['1.A.3.c']],
-            ['d. Domestic navigation', ['1.A.3.d']],
-            ['e. Other transportation', ['1.A.3.e']],
-        ],
-        "entity_mapping": {
-            "NOX": "NOx",
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    "Table1s2": {
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 36,
-            "header": ['entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['4. Other sectors', ['1.A.4']],
-            ['a. Commercial/institutional', ['1.A.4.a']],
-            ['b. Residential', ['1.A.4.b']],
-            ['c. Agriculture/forestry/fishing', ['1.A.4.c']],
-            ['5. Other (as specified in table 1.A(a) sheet 4)', ['1.A.5']],
-            ['a. Stationary', ['1.A.5.a']],
-            ['b. Mobile', ['1.A.5.b']],
-            ['B. Fugitive emissions from fuels', ['1.B']],
-            ['1. Solid fuels', ['1.B.1']],
-            ['a. Coal mining and handling', ['1.B.1.a']],
-            ['b. Solid fuel transformation', ['1.B.1.b']],
-            ['c. Other (as specified in table 1.B.1)', ['1.B.1.c']],
-            ['2. Oil and natural gas and other emissions from energy production', ['1.B.2']],
-            ['a. Oil', ['1.B.2.a']],
-            ['b. Natural gas', ['1.B.2.b']],
-            ['c. Venting and flaring', ['1.B.2.c']],
-            ['d. Other (as specified in table 1.B.2)', ['1.B.2.d']],
-            ['C. CO2 Transport and storage', ['1.C']],
-            ['1. Transport of CO2', ['1.C.1']],
-            ['2. Injection and storage', ['1.C.2']],
-            ['3. Other', ['1.C.3']],
-            ['Memo items: (1)', ['\\IGNORE']],
-            ['International bunkers', ['M.Memo.Int']],
-            ['Aviation', ['M.Memo.Int.Avi']],
-            ['Navigation', ['M.Memo.Int.Mar']],
-            ['Multilateral operations', ['M.Memo.Mult']],
-            ['CO2 emissions from biomass', ['M.Memo.Bio']],
-            ['CO2 captured', ['M.Memo.CO2Cap']],
-            ['For domestic storage', ['M.Memo.CO2Cap.Dom']],
-            ['For storage in other countries', ['M.Memo.CO2Cap.Exp']],
-        ],
-        "entity_mapping": {
-            "NOX": "NOx",
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    "Table1.A(a)s1": {
-        "status": "tested",
-        "table": {
-            "firstrow": 6,
-            "lastrow": 104,  # template, countries report less
-            # check the resulting data as the templates have nan rows
-            # which would stop the reading process (actual reported
-            # data does not seem to have the nan rows)
-            "header": ['group', 'entity', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category", "class"],
-            "cols_to_ignore": [
-                'AGGREGATE ACTIVITY DATA Consumption',
-                'AGGREGATE ACTIVITY DATA Consumption',
-                'IMPLIED EMISSION FACTORS CO2(1)',
-                'IMPLIED EMISSION FACTORS CH4',
-                'IMPLIED EMISSION FACTORS N2O',
-                'EMISSIONS CO2 Amount captured'
-            ],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['1.A. Fuel combustion', ['1.A', 'Total'], 0],
-            ['Liquid fuels', ['1.A', 'Liquid'], 1],
-            ['Solid fuels', ['1.A', 'Solid'], 1],
-            ['Gaseous fuels', ['1.A', 'Gaseous'], 1],
-            ['Other fossil fuels(4)', ['1.A', 'OtherFF'], 1],
-            ['Peat(5)', ['1.A', 'Peat'], 1],
-            ['Biomass(6)', ['1.A', 'Biomass'], 1],
-            # 1.A.1. Energy industries
-            ['1.A.1. Energy industries', ['1.A.1', 'Total'], 1],
-            ['Liquid fuels', ['1.A.1', 'Liquid'], 2],
-            ['Solid fuels', ['1.A.1', 'Solid'], 2],
-            ['Gaseous fuels', ['1.A.1', 'Gaseous'], 2],
-            ['Other fossil fuels(4)', ['1.A.1', 'OtherFF'], 2],
-            ['Peat(5)', ['1.A.1', 'Peat'], 2],
-            ['Biomass(6)', ['1.A.1', 'Biomass'], 2],
-            # a. Public electricity and heat production
-            ['a. Public electricity and heat production(7)', ['1.A.1.a', 'Total'], 2],
-            ['Liquid fuels', ['1.A.1.a', 'Liquid'], 3],
-            ['Solid fuels', ['1.A.1.a', 'Solid'], 3],
-            ['Gaseous fuels', ['1.A.1.a', 'Gaseous'], 3],
-            ['Other fossil fuels(4)', ['1.A.1.a', 'OtherFF'], 3],
-            ['Peat(5)', ['1.A.1.a', 'Peat'], 3],
-            ['Biomass(6)', ['1.A.1.a', 'Biomass'], 3],
-            # 1.A.1.a.i Electricity Generation
-            ['1.A.1.a.i Electricity Generation', ['1.A.1.a.i', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.1.a.i', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.1.a.i', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.1.a.i', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.1.a.i', 'OtherFF'], 4],
-            ['Peat', ['1.A.1.a.i', 'Peat'], 4],
-            ['Biomass', ['1.A.1.a.i', 'Biomass'], 4],
-            # 1.A.1.a.ii Combined heat and power generation
-            ['1.A.1.a.ii Combined heat and power generation', ['1.A.1.a.ii', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.1.a.ii', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.1.a.ii', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.1.a.ii', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.1.a.ii', 'OtherFF'], 4],
-            ['Peat', ['1.A.1.a.ii', 'Peat'], 4],
-            ['Biomass', ['1.A.1.a.ii', 'Biomass'], 4],
-            # 1.A.1.a.iii heat plants
-            ['1.A.1.a.iii Heat plants', ['1.A.1.a.iii', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.1.a.iii', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.1.a.iii', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.1.a.iii', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.1.a.iii', 'OtherFF'], 4],
-            ['Peat', ['1.A.1.a.iii', 'Peat'], 4],
-            ['Biomass', ['1.A.1.a.iii', 'Biomass'], 4],
-            # 1.A.1.a.iv Other (please specify)
-            ['1.A.1.a.iv Other (please specify)', ['1.A.1.a.iv', 'Total'], 3],
-            # AUT
-            ['Total Public Electricity and Heat Production', ['1.A.1.a.iv.4', 'Total'], 4],
-            ['Liquid Fuels', ['1.A.1.a.iv.4', 'Liquid'], 5],
-            ['Solid Fuels', ['1.A.1.a.iv.4', 'Solid'], 5],
-            ['Gaseous Fuels', ['1.A.1.a.iv.4', 'Gaseous'], 5],
-            ['Other Fossil Fuels', ['1.A.1.a.iv.4', 'OtherFF'], 5],
-            ['Peat', ['1.A.1.a.iv.4', 'Peat'], 5],
-            ['Biomass', ['1.A.1.a.iv.4', 'Biomass'], 5],
-            # DEU
-            ['1.A.1.a Public Electricity and Heat Production', ['1.A.1.a.iv.4', 'Total'], 4],
-            ['Liquid Fuels', ['1.A.1.a.iv.4', 'Liquid'], 5],
-            ['Solid Fuels', ['1.A.1.a.iv.4', 'Solid'], 5],
-            ['Gaseous Fuels', ['1.A.1.a.iv.4', 'Gaseous'], 5],
-            ['Other Fossil Fuels', ['1.A.1.a.iv.4', 'OtherFF'], 5],
-            ['Peat', ['1.A.1.a.iv.4', 'Peat'], 5],
-            ['Biomass', ['1.A.1.a.iv.4', 'Biomass'], 5],
-            # ESP
-            ['Other', ['1.A.1.a.iv.3', 'Total'], 4],
-            ['Liquid Fuels', ['1.A.1.a.iv.3', 'Liquid'], 5],
-            ['Solid Fuels', ['1.A.1.a.iv.3', 'Solid'], 5],
-            ['Gaseous Fuels', ['1.A.1.a.iv.3', 'Gaseous'], 5],
-            ['Other Fossil Fuels', ['1.A.1.a.iv.3', 'OtherFF'], 5],
-            ['Peat', ['1.A.1.a.iv.3', 'Peat'], 5],
-            ['Biomass', ['1.A.1.a.iv.3', 'Biomass'], 5],
-            # SVK
-            ['Methane Cogeneration (Mining)', ['1.A.1.a.iv.1', 'Total'], 4],
-            ['Other Fossil Fuels', ['1.A.1.a.iv.1', 'OtherFF'], 5],
-            ['Municipal Solid Waste Incineration (Energy use)', ['1.A.1.a.iv.2', 'Total'], 4],
-            ['Other Fossil Fuels', ['1.A.1.a.iv.2', 'OtherFF'], 5],
-            ['Biomass', ['1.A.1.a.iv.2', 'Biomass'], 5],
-            # CHE
-            ['Municipal and special waste incineration plants', ['1.A.1.a.iv.2', 'Total'], 4],
-            ['Other Fossil Fuels', ['1.A.1.a.iv.2', 'OtherFF'], 5],
-            ['Biomass', ['1.A.1.a.iv.2', 'Biomass'], 5],
-            # b. Petroleum refining
-            ['b. Petroleum refining', ['1.A.1.b', 'Total'], 2],
-            ['Liquid fuels', ['1.A.1.b', 'Liquid'], 3],
-            ['Solid fuels', ['1.A.1.b', 'Solid'], 3],
-            ['Gaseous fuels', ['1.A.1.b', 'Gaseous'], 3],
-            ['Other fossil fuels(4)', ['1.A.1.b', 'OtherFF'], 3],
-            ['Peat(5)', ['1.A.1.b', 'Peat'], 3],
-            ['Biomass(6)', ['1.A.1.b', 'Biomass'], 3],
-            # c. Manufacture of solid fuels and other energy industries
-            ['c. Manufacture of solid fuels and other energy industries(8)', ['1.A.1.c', 'Total'], 2],
-            ['Liquid fuels', ['1.A.1.c', 'Liquid'], 3],
-            ['Solid fuels', ['1.A.1.c', 'Solid'], 3],
-            ['Gaseous fuels', ['1.A.1.c', 'Gaseous'], 3],
-            ['Other fossil fuels(4)', ['1.A.1.c', 'OtherFF'], 3],
-            ['Peat(5)', ['1.A.1.c', 'Peat'], 3],
-            ['Biomass(6)', ['1.A.1.c', 'Biomass'], 3],
-            # 1.A.1.c.i Manufacture of solid fuels
-            ['1.A.1.c.i Manufacture of solid fuels', ['1.A.1.c.i', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.1.c.i', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.1.c.i', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.1.c.i', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.1.c.i', 'OtherFF'], 4],
-            ['Peat', ['1.A.1.c.i', 'Peat'], 4],
-            ['Biomass', ['1.A.1.c.i', 'Biomass'], 4],
-            # 1.A.1.c.ii Oil and gas extraction
-            ['1.A.1.c.ii Oil and gas extraction', ['1.A.1.c.ii', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.1.c.ii', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.1.c.ii', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.1.c.ii', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.1.c.ii', 'OtherFF'], 4],
-            ['Peat', ['1.A.1.c.ii', 'Peat'], 4],
-            ['Biomass', ['1.A.1.c.ii', 'Biomass'], 4],
-            # 1.A.1.c.iii Other energy industries
-            ['1.A.1.c.iii Other energy industries', ['1.A.1.c.iii', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.1.c.iii', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.1.c.iii', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.1.c.iii', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.1.c.iii', 'OtherFF'], 4],
-            ['Peat', ['1.A.1.c.iii', 'Peat'], 4],
-            ['Biomass', ['1.A.1.c.iii', 'Biomass'], 4],
-            # 1.A.1.c.iv Other (please specify)
-            ['1.A.1.c.iv Other (please specify)', ['1.A.1.c.iv', 'Total'], 3],
-            # DEU
-            ['1.A.1.c Manufacture of Solid Fuels and Other Energy Industries', ['1.A.1.c.iv.2', 'Total'], 4],
-            ['Liquid Fuels', ['1.A.1.c.iv.2', 'Liquid'], 5],
-            ['Solid Fuels', ['1.A.1.c.iv.2', 'Solid'], 5],
-            ['Gaseous Fuels', ['1.A.1.c.iv.2', 'Gaseous'], 5],
-            ['Other Fossil Fuels', ['1.A.1.c.iv.2', 'OtherFF'], 5],
-            ['Peat', ['1.A.1.c.iv.2', 'Peat'], 5],
-            ['Biomass', ['1.A.1.c.iv.2', 'Biomass'], 5],
-            # ESP
-            ['Other', ['1.A.1.c.iv.3', 'Total'], 4],
-            ['Liquid Fuels', ['1.A.1.c.iv.3', 'Liquid'], 5],
-            ['Solid Fuels', ['1.A.1.c.iv.3', 'Solid'], 5],
-            ['Gaseous Fuels', ['1.A.1.c.iv.3', 'Gaseous'], 5],
-            ['Other Fossil Fuels', ['1.A.1.c.iv.3', 'OtherFF'], 5],
-            ['Peat', ['1.A.1.c.iv.3', 'Peat'], 5],
-            ['Biomass', ['1.A.1.c.iv.3', 'Biomass'], 5],
-            # CYP
-            ['Charcoal Production', ['1.A.1.c.iv.1', 'Total'], 4],
-            ['Liquid Fuels', ['1.A.1.c.iv.1', 'Liquid'], 5],
-            ['Solid Fuels', ['1.A.1.c.iv.1', 'Solid'], 5],
-            ['Gaseous Fuels', ['1.A.1.c.iv.1', 'Gaseous'], 5],
-            ['Other Fossil Fuels', ['1.A.1.c.iv.1', 'OtherFF'], 5],
-            ['Peat', ['1.A.1.c.iv.1', 'Peat'], 5],
-            ['Biomass', ['1.A.1.c.iv.1', 'Biomass'], 5],
-        ],
-        "entity_mapping": {
-            'EMISSIONS CH4': "CH4",
-            'EMISSIONS CO2(2)': "CO2",
-            'EMISSIONS N2O': "N2O",
-        },
-    },  # tested
-    "Table1.A(a)s2": {
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 114,  # template, countries report less
-            # check the resulting data as the templates have nan rows
-            # which would stop the reading process (actual reported
-            # data does not seem to have the nan rows)
-            "header": ['group', 'entity', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category", "class"],
-            "cols_to_ignore": [
-                'AGGREGATE ACTIVITY DATA Consumption',
-                'AGGREGATE ACTIVITY DATA Consumption',
-                'IMPLIED EMISSION FACTORS CO2(1)',
-                'IMPLIED EMISSION FACTORS CH4',
-                'IMPLIED EMISSION FACTORS N2O',
-                'EMISSIONS CO2 Amount captured',
-            ],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['1.A.2 Manufacturing industries and construction', ['1.A.2', 'Total'], 0],
-            ['Liquid fuels', ['1.A.2', 'Liquid'], 1],
-            ['Solid fuels', ['1.A.2', 'Solid'], 1],
-            ['Gaseous fuels', ['1.A.2', 'Gaseous'], 1],
-            ['Other fossil fuels(4)', ['1.A.2', 'OtherFF'], 1],
-            ['Peat(5)', ['1.A.2', 'Peat'], 1],
-            ['Biomass(6)', ['1.A.2', 'Biomass'], 1],
-            # a. Iron and Steel
-            ['a. Iron and steel', ['1.A.2.a', 'Total'], 1],
-            ['Liquid fuels', ['1.A.2.a', 'Liquid'], 2],
-            ['Solid fuels', ['1.A.2.a', 'Solid'], 2],
-            ['Gaseous fuels', ['1.A.2.a', 'Gaseous'], 2],
-            ['Other fossil fuels(4)', ['1.A.2.a', 'OtherFF'], 2],
-            ['Peat(5)', ['1.A.2.a', 'Peat'], 2],
-            ['Biomass(6)', ['1.A.2.a', 'Biomass'], 2],
-            # b. non-ferrous metals
-            ['b. Non-ferrous metals', ['1.A.2.b', 'Total'], 1],
-            ['Liquid fuels', ['1.A.2.b', 'Liquid'], 2],
-            ['Solid fuels', ['1.A.2.b', 'Solid'], 2],
-            ['Gaseous fuels', ['1.A.2.b', 'Gaseous'], 2],
-            ['Other fossil fuels(4)', ['1.A.2.b', 'OtherFF'], 2],
-            ['Peat(5)', ['1.A.2.b', 'Peat'], 2],
-            ['Biomass(6)', ['1.A.2.b', 'Biomass'], 2],
-            # c. Chemicals
-            ['c. Chemicals', ['1.A.2.c', 'Total'], 1],
-            ['Liquid fuels', ['1.A.2.c', 'Liquid'], 2],
-            ['Solid fuels', ['1.A.2.c', 'Solid'], 2],
-            ['Gaseous fuels', ['1.A.2.c', 'Gaseous'], 2],
-            ['Other fossil fuels(4)', ['1.A.2.c', 'OtherFF'], 2],
-            ['Peat(5)', ['1.A.2.c', 'Peat'], 2],
-            ['Biomass(6)', ['1.A.2.c', 'Biomass'], 2],
-            # d. Pulp paper print
-            ['d. Pulp, paper and print', ['1.A.2.d', 'Total'], 1],
-            ['Liquid fuels', ['1.A.2.d', 'Liquid'], 2],
-            ['Solid fuels', ['1.A.2.d', 'Solid'], 2],
-            ['Gaseous fuels', ['1.A.2.d', 'Gaseous'], 2],
-            ['Other fossil fuels(4)', ['1.A.2.d', 'OtherFF'], 2],
-            ['Peat(5)', ['1.A.2.d', 'Peat'], 2],
-            ['Biomass(6)', ['1.A.2.d', 'Biomass'], 2],
-            # e. Food processing, beverages and tobacco
-            ['e. Food processing, beverages and tobacco', ['1.A.2.e', 'Total'], 1],
-            ['Liquid fuels', ['1.A.2.e', 'Liquid'], 2],
-            ['Solid fuels', ['1.A.2.e', 'Solid'], 2],
-            ['Gaseous fuels', ['1.A.2.e', 'Gaseous'], 2],
-            ['Other fossil fuels(4)', ['1.A.2.e', 'OtherFF'], 2],
-            ['Peat(5)', ['1.A.2.e', 'Peat'], 2],
-            ['Biomass(6)', ['1.A.2.e', 'Biomass'], 2],
-            # f. non-metallic minerals
-            ['f. Non-metallic minerals', ['1.A.2.f', 'Total'], 1],
-            ['Liquid fuels', ['1.A.2.f', 'Liquid'], 2],
-            ['Solid fuels', ['1.A.2.f', 'Solid'], 2],
-            ['Gaseous fuels', ['1.A.2.f', 'Gaseous'], 2],
-            ['Other fossil fuels(4)', ['1.A.2.f', 'OtherFF'], 2],
-            ['Peat(5)', ['1.A.2.f', 'Peat'], 2],
-            ['Biomass(6)', ['1.A.2.f', 'Biomass'], 2],
-            # g. other
-            ['g. Other (please specify)(9)', ['1.A.2.g', 'Total'], 1],
-            #1.A.2.g.i Manufacturing of machinery
-            ['1.A.2.g.i Manufacturing of machinery', ['1.A.2.g.i', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.2.g.i', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.2.g.i', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.2.g.i', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.2.g.i', 'OtherFF'], 3],
-            ['Peat', ['1.A.2.g.i', 'Peat'], 3],
-            ['Biomass', ['1.A.2.g.i', 'Biomass'], 3],
-            # 1.A.2.g.ii Manufacturing of transport equipment
-            ['1.A.2.g.ii Manufacturing of transport equipment', ['1.A.2.g.ii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.2.g.ii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.2.g.ii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.2.g.ii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.2.g.ii', 'OtherFF'], 3],
-            ['Peat', ['1.A.2.g.ii', 'Peat'], 3],
-            ['Biomass', ['1.A.2.g.ii', 'Biomass'], 3],
-            # 1.A.2.g.iii Mining (excluding fuels) and quarrying
-            ['1.A.2.g.iii Mining (excluding fuels) and quarrying', ['1.A.2.g.iii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.2.g.iii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.2.g.iii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.2.g.iii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.2.g.iii', 'OtherFF'], 3],
-            ['Peat', ['1.A.2.g.iii', 'Peat'], 3],
-            ['Biomass', ['1.A.2.g.iii', 'Biomass'], 3],
-            # 1.A.2.g.iv Wood and wood products
-            ['1.A.2.g.iv Wood and wood products', ['1.A.2.g.iv', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.2.g.iv', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.2.g.iv', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.2.g.iv', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.2.g.iv', 'OtherFF'], 3],
-            ['Peat', ['1.A.2.g.iv', 'Peat'], 3],
-            ['Biomass', ['1.A.2.g.iv', 'Biomass'], 3],
-            # 1.A.2.g.v Construction
-            ['1.A.2.g.v Construction', ['1.A.2.g.v', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.2.g.v', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.2.g.v', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.2.g.v', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.2.g.v', 'OtherFF'], 3],
-            ['Peat', ['1.A.2.g.v', 'Peat'], 3],
-            ['Biomass', ['1.A.2.g.v', 'Biomass'], 3],
-            # 1.A.2.g.vi Textile and leather
-            ['1.A.2.g.vi Textile and leather', ['1.A.2.g.vi', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.2.g.vi', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.2.g.vi', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.2.g.vi', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.2.g.vi', 'OtherFF'], 3],
-            ['Peat', ['1.A.2.g.vi', 'Peat'], 3],
-            ['Biomass', ['1.A.2.g.vi', 'Biomass'], 3],
-            # 1.A.2.g.vii Off-road vehicles and other machinery
-            ['1.A.2.g.vii Off-road vehicles and other machinery', ['1.A.2.g.vii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.2.g.vii', 'Liquid'], 3],
-            ['Gaseous Fuels', ['1.A.2.g.vii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.2.g.vii', 'OtherFF'], 3],
-            ['Biomass', ['1.A.2.g.vii', 'Biomass'], 3],
-            # 1.A.2.g.viii Other (please specify)
-            ['1.A.2.g.viii Other (please specify)', ['1.A.2.g.viii', 'Total'], 2],
-            # DKE
-            ['Construction', ['\\IGNORE', '\\IGNORE'], 3],  # (empty)
-            ['Mining', ['\\IGNORE', '\\IGNORE'], 3],  # (empty)
-            # DNK, DKE, USA, CZE
-            ['Other non-specified', ['1.A.2.g.viii.1', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.1', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.1', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.1', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.1', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.1', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.1', 'Biomass'], 4],
-            # SVK, CYP
-            ['Non-specified Industry', ['1.A.2.g.viii.1', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.1', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.1', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.1', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.1', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.1', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.1', 'Biomass'], 4],
-            # BEL
-            ['Other non specified', ['1.A.2.g.viii.1', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.1', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.1', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.1', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.1', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.1', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.1', 'Biomass'], 4],
-            # PRT, LTU
-            ['Non-specified industry', ['1.A.2.g.viii.1', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.1', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.1', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.1', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.1', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.1', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.1', 'Biomass'], 4],
-            # MLT
-            ['Undefined Industry', ['1.A.2.g.viii.1', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.1', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.1', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.1', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.1', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.1', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.1', 'Biomass'], 4],
-            # TUR
-            ['Other unspecified', ['1.A.2.g.viii.1', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.1', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.1', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.1', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.1', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.1', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.1', 'Biomass'], 4],
-            # DKE
-            ['Textile', ['\\IGNORE', '\\IGNORE'], 3],  # (empty)
-            # DNK, DNM, FIN, DKE
-            ['Other manufacturing industries', ['1.A.2.g.viii.3', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.3', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.3', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.3', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.3', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.3', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.3', 'Biomass'], 4],
-            # CAN
-            ['Other Manufacturing', ['1.A.2.g.viii.3', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.3', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.3', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.3', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.3', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.3', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.3', 'Biomass'], 4],
-            # AUT, LUX
-            ['Other Manufacturing Industries', ['1.A.2.g.viii.3', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.3', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.3', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.3', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.3', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.3', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.3', 'Biomass'], 4],
-            # NOR
-            ['Other manufacturing', ['1.A.2.g.viii.3', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.3', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.3', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.3', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.3', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.3', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.3', 'Biomass'], 4],
-            # AUS
-            ['All Other Manufacturing', ['1.A.2.g.viii.3', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.3', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.3', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.3', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.3', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.3', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.3', 'Biomass'], 4],
-            # NLD
-            ['Other Industrial Sectors', ['1.A.2.g.viii.4', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.4', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.4', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.4', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.4', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.4', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.4', 'Biomass'], 4],
-            # GBR, GBK
-            ['Other industry (not specified above)', ['1.A.2.g.viii.4', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.4', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.4', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.4', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.4', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.4', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.4', 'Biomass'], 4],
-            # UKR
-            ['Oter Industries', ['1.A.2.g.viii.4', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.4', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.4', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.4', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.4', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.4', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.4', 'Biomass'], 4],
-            # RUS
-            ['Other industries', ['1.A.2.g.viii.4', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.4', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.4', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.4', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.4', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.4', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.4', 'Biomass'], 4],
-            # RUS
-            ['Non-CO2 emissions from BFG combustion', ['1.A.2.g.viii.5', 'Total'], 3],
-            ['Solid Fuels', ['1.A.2.g.viii.5', 'Solid'], 4],
-            # BLR, DNK, ESP, LVA, NZL, POL, ROU, SVN
-            ['Other', ['1.A.2.g.viii.10', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.10', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.10', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.10', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.10', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.10', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.10', 'Biomass'], 4],
-            # BLR
-            ['Manufacture and construction Aggregated', ['1.A.2.g.viii.2', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.2', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.2', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.2', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.2', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.2', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.2', 'Biomass'], 4],
-            # HRV
-            ['Other Industry', ['1.A.2.g.viii.4', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.4', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.4', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.4', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.4', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.4', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.4', 'Biomass'], 4],
-            # HRV
-            ['1A2 Total for 1990 to 2000', ['1.A.2.g.viii.2', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.2', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.2', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.2', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.2', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.2', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.2', 'Biomass'], 4],
-            # MLT
-            ['All Industry', ['1.A.2.g.viii.2', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.2', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.2', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.2', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.2', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.2', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.2', 'Biomass'], 4],
-            # PRT
-            ['Rubber', ['1.A.2.g.viii.6', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.6', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.6', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.6', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.6', 'OtherFF'], 4],
-            ['Biomass', ['1.A.2.g.viii.6', 'Biomass'], 4],
-            # SWE
-            ['All stationary combustin within CRF 1.A.2.g', ['1.A.2.g.viii.7', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.7', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.7', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.7', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.7', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.7', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.7', 'Biomass'], 4],
-            # IRL
-            ['Other stationary combustion', ['1.A.2.g.viii.8', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.8', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.8', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.8', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.8', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.8', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.8', 'Biomass'], 4],
-            # HUN
-            ['Other Stationary Combustion', ['1.A.2.g.viii.8', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.8', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.8', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.8', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.8', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.8', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.8', 'Biomass'], 4],
-            # CHE
-            ['Other Boilers and Engines Industry', ['1.A.2.g.viii.9', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.9', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.9', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.9', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.9', 'OtherFF'], 4],
-            ['Biomass', ['1.A.2.g.viii.9', 'Biomass'], 4],
-        ],
-        "entity_mapping": {
-            'EMISSIONS CH4': "CH4",
-            'EMISSIONS CO2(2)': "CO2",
-            'EMISSIONS N2O': "N2O",
-        },
-    },  # tested
-    "Table1.A(a)s3": {
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 115,
-            "header": ['group', 'entity', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category", "class"],
-            "cols_to_ignore": [
-                'AGGREGATE ACTIVITY DATA Consumption',
-                'AGGREGATE ACTIVITY DATA Consumption',
-                'IMPLIED EMISSION FACTORS CO2(1)',
-                'IMPLIED EMISSION FACTORS CH4',
-                'IMPLIED EMISSION FACTORS N2O',
-            ],
-            "stop_cats": ["Note: All footnotes for this table are given at the end of the table on sheet 4.", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['1.A.3 Transport', ['1.A.3', 'Total'], 0],
-            ['Liquid fuels', ['1.A.3', 'Liquid'], 1],
-            ['Solid fuels', ['1.A.3', 'Solid'], 1],
-            ['Gaseous fuels', ['1.A.3', 'Gaseous'], 1],
-            ['Other fossil fuels(4)', ['1.A.3', 'OtherFF'], 1],
-            ['Biomass(6)', ['1.A.3', 'Biomass'], 1],
-            # a. Domestic Aviation
-            ['a. Domestic aviation(10)', ['1.A.3.a', 'Total'], 1],
-            ['Aviation gasoline', ['1.A.3.a', 'AvGasoline'], 2],
-            ['Jet kerosene', ['1.A.3.a', 'JetKerosene'], 2],
-            ['Biomass', ['1.A.3.a', 'Biomass'], 2],
-            # b. road Transportation
-            ['b. Road transportation(11)', ['1.A.3.b', 'Total'], 1],
-            ['Gasoline', ['1.A.3.b', 'Gasoline'], 2],
-            ['Diesel oil', ['1.A.3.b', 'DieselOil'], 2],
-            ['Liquefied petroleum gases (LPG)', ['1.A.3.b', 'LPG'], 2],
-            ['Other liquid fuels (please specify)', ['1.A.3.b', 'OtherLiquid'], 2],
-            ['Gaseous fuels', ['1.A.3.b', 'Gaseous'], 2],
-            ['Biomass(6)', ['1.A.3.b', 'Biomass'], 2],
-            ['Other fossil fuels (please specify)(4)', ['1.A.3.b', 'OtherFF'], 2],
-            # i. Cars
-            ['i. Cars', ['1.A.3.b.i', 'Total'], 2],
-            ['Gasoline', ['1.A.3.b.i', 'Gasoline'], 3],
-            ['Diesel oil', ['1.A.3.b.i', 'DieselOil'], 3],
-            ['Liquefied petroleum gases (LPG)', ['1.A.3.b.i', 'LPG'], 3],
-            ['Other liquid fuels (please specify)', ['1.A.3.b.i', 'OtherLiquid'], 3],
-            ['Kerosene', ['1.A.3.b.i', 'Kerosene'], 4],  # UKR (and probably others)
-            ['Lubricants', ['1.A.3.b.i', 'Lubricants'], 4],  # UKR, JPN
-            ['Lubricant oil', ['1.A.3.b.i', 'Lubricants'], 4],  # PRT
-            ['Biodiesel (5 percent fossil portion)', ['1.A.3.b.i', 'OLBiodieselFC'], 4],  # CAN
-            ['Fossil part of biodiesel', ['1.A.3.b.i', 'OLBiodieselFC'], 4],  # LTU
-            ['Other', ['1.A.3.b.i', 'OLOther'], 4],  # UKR, MLT
-            ['Other Liquid Fuels', ['1.A.3.b.i', 'OLOther'], 4],  # CYP
-            ['Other motor fuels', ['1.A.3.b.i', 'OMotorFuels'], 4],  # RUS
-            ['Lubricants in 2-stroke engines', ['1.A.3.b.i', 'Lubricants'], 4],  # HUN
-            ['LNG', ['1.A.3.b.i', 'LNG'], 4],  # USA
-            ['Gaseous fuels', ['1.A.3.b.i', 'Gaseous'], 3],
-            ['Biomass(6)', ['1.A.3.b.i', 'Biomass'], 3],
-            ['Other fossil fuels (please specify)(4)', ['1.A.3.b.i', 'OtherFF'], 3],
-            ['Other Fossil Fuels', ['1.A.3.b.i', 'OFFOther'], 4],  # CYP, POL
-            ['Biodiesel (fossil component)', ['1.A.3.b.i', 'OFFBiodieselFC'], 4],  # LUX
-            ['Biodiesel fossil fraction', ['1.A.3.b.i', 'OFFBiodieselFC'], 4],  # NOR
-            ['Biodiesel (fossil fraction)', ['1.A.3.b.i', 'OFFBiodieselFC'], 4],  # NZL
-            ['Fossil part of biodiesel or biogasoline', ['1.A.3.b.i', 'OFFBiofuelFC'], 4],  # PRT
-            ['Fossil part of biodiesel and biogasoline', ['1.A.3.b.i', 'OFFBiofuelFC'], 4],  # SWE
-            ['fossil part of biofuels', ['1.A.3.b.i', 'OFFBiofuelFC'], 4],  # CZE
-            ['fossil part of biodiesel', ['1.A.3.b.i', 'OFFBiodieselFC'], 4],  # DKE, DNK, HRV
-            ['Fossil part of biodiesel', ['1.A.3.b.i', 'OFFBiodieselFC'], 4],  # DNM, BEL, HUN, LVA, ESP
-            ['Fossil part of biogasoline', ['1.A.3.b.i', 'OFFBiogasolineFC'], 4],  # BEL
-            ['Natural Gas', ['1.A.3.b.i', 'OFFNaturalGas'], 4],  # USA
-            ['Fossil part of biofuel', ['1.A.3.b.i', 'OFFBiofuelFC'], 4],  # IRL
-            ['Other', ['1.A.3.b.i', 'OFFOther'], 4],  # MLT
-            # ii. Light duty trucks
-            ['ii. Light duty trucks', ['1.A.3.b.ii', 'Total'], 2],
-            ['Gasoline', ['1.A.3.b.ii', 'Gasoline'], 3],
-            ['Diesel oil', ['1.A.3.b.ii', 'DieselOil'], 3],
-            ['Liquefied petroleum gases (LPG)', ['1.A.3.b.ii', 'LPG'], 3],
-            ['Other liquid fuels (please specify)', ['1.A.3.b.ii', 'OtherLiquid'], 3],
-            ['Kerosene', ['1.A.3.b.ii', 'Kerosene'], 4],  # UKR (and probably others)
-            ['Lubricants', ['1.A.3.b.ii', 'Lubricants'], 4],  # UKR, JPN
-            ['Lubricant Oil', ['1.A.3.b.ii', 'Lubricants'], 4],  # PRT
-            ['Biodiesel (5 percent fossil portion)', ['1.A.3.b.ii', 'OLBiodieselFC'], 4],  # CAN
-            ['Other', ['1.A.3.b.ii', 'OLOther'], 4],  # UKR (and probably others)
-            ['Other Liquid Fuels', ['1.A.3.b.ii', 'OLOther'], 4],  # CYP
-            ['Other motor fuels', ['1.A.3.b.ii', 'OMotorFuels'], 4],  # RUS
-            ['LNG', ['1.A.3.b.ii', 'LNG'], 4],  # USA
-            ['Gaseous fuels', ['1.A.3.b.ii', 'Gaseous'], 3],
-            ['Biomass(6)', ['1.A.3.b.ii', 'Biomass'], 3],
-            ['Other fossil fuels (please specify)(4)', ['1.A.3.b.ii', 'OtherFF'], 3],
-            ['Other Fossil Fuels', ['1.A.3.b.ii', 'OFFOther'], 4],  # CYP, POL
-            ['Biodiesel (fossil component)', ['1.A.3.b.ii', 'OFFBiodieselFC'], 4],  # LUX
-            ['Biodiesel fossil fraction', ['1.A.3.b.ii', 'OFFBiodieselFC'], 4],  # NOR
-            ['Biodiesel (fossil fraction)', ['1.A.3.b.ii', 'OFFBiodieselFC'], 4],  # NZL
-            ['Fossil part of biodiesel or biogasoline', ['1.A.3.b.ii', 'OFFBiofuelFC'], 4],  # PRT
-            ['Fossil part of biodiesel and biogasoline', ['1.A.3.b.ii', 'OFFBiofuelFC'], 4],  # SWE
-            ['fossil part of biofuels', ['1.A.3.b.ii', 'OFFBiofuelFC'], 4],  # CZE
-            ['fossil part of biodiesel', ['1.A.3.b.ii', 'OFFBiodieselFC'], 4],  # DKE, DNK, HRV
-            ['Fossil part of biodiesel', ['1.A.3.b.ii', 'OFFBiodieselFC'], 4],  # DNM, BEL, HUN, LVA, ESP
-            ['Fossil part of biogasoline', ['1.A.3.b.ii', 'OFFBiogasolineFC'], 4],  # BEL
-            ['Fossil part of biofuel', ['1.A.3.b.ii', 'OFFBiofuelFC'], 4],  # IRL
-            # iii. Heavy duty trucks and buses
-            ['iii. Heavy duty trucks and buses', ['1.A.3.b.iii', 'Total'], 2],
-            ['Gasoline', ['1.A.3.b.iii', 'Gasoline'], 3],
-            ['Diesel oil', ['1.A.3.b.iii', 'DieselOil'], 3],
-            ['Liquefied petroleum gases (LPG)', ['1.A.3.b.iii', 'LPG'], 3],
-            ['Other liquid fFuels (please specify)', ['1.A.3.b.iii', 'OtherLiquid'], 3],
-            ['Kerosene', ['1.A.3.b.iii', 'Kerosene'], 4],  # UKR (and probably others)
-            ['Lubricants', ['1.A.3.b.iii', 'Lubricants'], 4],  # UKR, JPN
-            ['Lubricant Oil', ['1.A.3.b.iii', 'Lubricants'], 4],  # PRT
-            ['Biodiesel (5 percent fossil portion)', ['1.A.3.b.iii', 'OLBiodieselFC'], 4],  # CAN
-            ['Other', ['1.A.3.b.iii', 'OLOther'], 4],  # UKR (and probably others)
-            ['Other Liquid Fuels', ['1.A.3.b.iii', 'OLOther'], 4],  # CYP
-            ['Other motor fuels', ['1.A.3.b.iii', 'OMotorFuels'], 4],  # RUS
-            ['LNG', ['1.A.3.b.iii', 'LNG'], 4],  # USA
-            ['Gaseous fuels', ['1.A.3.b.iii', 'Gaseous'], 3],
-            ['Biomass(6)', ['1.A.3.b.iii', 'Biomass'], 3],
-            ['Other fossil fuels (please specify)(4)', ['1.A.3.b.iii', 'OtherFF'], 3],
-            ['Other Fossil Fuels', ['1.A.3.b.iii', 'OFFOther'], 4],  # CYP, POL
-            ['Biodiesel (fossil component)', ['1.A.3.b.iii', 'OFFBiodieselFC'], 4],  # LUX
-            ['Biodiesel fossil fraction', ['1.A.3.b.iii', 'OFFBiodieselFC'], 4],  # NOR
-            ['Biodiesel (fossil fraction)', ['1.A.3.b.iii', 'OFFBiodieselFC'], 4],  # NZL
-            ['Fossil part of biodiesel or biogasoline', ['1.A.3.b.iii', 'OFFBiofuelFC'], 4],  # PRT
-            ['Fossil part of biodiesel and biogasoline', ['1.A.3.b.iii', 'OFFBiofuelFC'], 4],  # SWE
-            ['fossil part of biofuels', ['1.A.3.b.iii', 'OFFBiofuelFC'], 4],  # CZE
-            ['fossil part of biodiesel', ['1.A.3.b.iii', 'OFFBiodieselFC'], 4],  # DKE, DNK, HRV
-            ['Fossil part of biodiesel', ['1.A.3.b.iii', 'OFFBiodieselFC'], 4],  # DNM, BEL, HUN, LVA, ESP
-            ['Fossil part of biogasoline', ['1.A.3.b.iii', 'OFFBiogasolineFC'], 4],  # BEL
-            ['Fossil part of biofuel', ['1.A.3.b.iii', 'OFFBiofuelFC'], 4],  # IRL
-            # iv. Motorcycles
-            ['iv. Motorcycles', ['1.A.3.b.iv', 'Total'], 2],
-            ['Gasoline', ['1.A.3.b.iv', 'Gasoline'], 3],
-            ['Diesel oil', ['1.A.3.b.iv', 'DieselOil'], 3],
-            ['Liquefied petroleum gases (LPG)', ['1.A.3.b.iv', 'LPG'], 3],
-            ['Other liquid fuels (please specify)', ['1.A.3.b.iv', 'OtherLiquid'], 3],
-            ['Kerosene', ['1.A.3.b.iv', 'Kerosene'], 4],  # UKR (and probably others)
-            ['Lubricants', ['1.A.3.b.iv', 'Lubricants'], 4],  # UKR, JPN, HRV
-            ['Lubricant Oil', ['1.A.3.b.iv', 'Lubricants'], 4],  # PRT
-            ['Other', ['1.A.3.b.iv', 'OLOther'], 4],  # UKR (and probably others)
-            ['Other Liquid Fuels', ['1.A.3.b.iv', 'OLOther'], 4],  # CYP
-            ['Lube', ['1.A.3.b.iv', 'Lubricants'], 4],  # MCO
-            ['Lubricants in 2-stroke engines', ['1.A.3.b.iv', 'Lubricants'], 4],  # HUN
-            ['Lubricants (two-stroke engines)', ['1.A.3.b.iv', 'Lubricants'], 4],  # ESP
-            ['lubricants', ['1.A.3.b.iv', 'Lubricants'], 4],  # SVN
-            ['Gaseous fuels', ['1.A.3.b.iv', 'Gaseous'], 3],
-            ['Biomass(6)', ['1.A.3.b.iv', 'Biomass'], 3],
-            ['Other fossil fuels (please specify)(4)', ['1.A.3.b.iv', 'OtherFF'], 3],
-            ['Other Fossil Fuels', ['1.A.3.b.iv', 'OFFOther'], 4],  # CYP
-            ['Fossil part of biodiesel or biogasoline', ['1.A.3.b.iv', 'OFFBiofuelFC'], 4],  # PRT
-            ['Fossil part of biodiesel and biogasoline', ['1.A.3.b.iv', 'OFFBiofuelFC'], 4],  # SWE
-            ['fossil part of biofuels', ['1.A.3.b.iv', 'OFFBiofuelFC'], 4],  # CZE
-            ['Fossil part of biodiesel', ['1.A.3.b.iv', 'OFFBiodieselFC'], 4],  # BEL
-            ['Fossil part of biogasoline', ['1.A.3.b.iv', 'OFFBiogasolineFC'], 4],  # BEL
-            ['Fossil part of biodiese', ['1.A.3.b.iv', 'OFFBiodieselFC'], 4],  # LVA
-            ['Fossil part of biofuel', ['1.A.3.b.iv', 'OFFBiofuelFC'], 4],  # IRL
-            ['fossil part of biodiesel', ['1.A.3.b.iv', 'OFFBiofuelFC'], 4],  # HRV
-            # v. Other
-            ['v. Other (please specify)', ['1.A.3.b.v', 'Total'], 2],
-            # TUR
-            ['Road total', ['1.A.3.b.v.1', 'Total'], 3],
-            ['Gasoline', ['1.A.3.b.v.1', 'Gasoline'], 4],
-            ['Diesel Oil', ['1.A.3.b.v.1', 'DieselOil'], 4],
-            ['Liquefied Petroleum Gases (LPG)', ['1.A.3.b.v.1', 'LPG'], 4],
-            ['Gaseous Fuels', ['1.A.3.b.v.1', 'Gaseous'], 4],
-            ['Biomass', ['1.A.3.b.v.1', 'Biomass'], 4],
-            # CYP
-            ['Buses', ['1.A.3.b.v.2', 'Total'], 3],
-            ['Gasoline', ['1.A.3.b.v.2', 'Gasoline'], 4],
-            ['Diesel Oil', ['1.A.3.b.v.2', 'DieselOil'], 4],
-            ['Liquefied Petroleum Gases (LPG)', ['1.A.3.b.v.2', 'LPG'], 4],
-            ['Other Liquid Fuels (please specify)', ['1.A.3.b.v.2', 'OtherLiquid'], 4],
-            ['Gaseous Fuels', ['1.A.3.b.v.2', 'Gaseous'], 4],
-            ['Biomass', ['1.A.3.b.v.2', 'Biomass'], 4],
-            ['Other Fossil Fuels (please specify)', ['1.A.3.b.v.2', 'OtherFF'], 4],
-            # GBK, GBR
-            ['All vehicles - biofuel use', ['1.A.3.b.v.3', 'Total'], 3],
-            ['Biomass', ['1.A.3.b.v.3', 'Biomass'], 4],
-            ['All vehicles - LPG use', ['1.A.3.b.v.4', 'Total'], 4],
-            ['Liquefied Petroleum Gases (LPG)', ['1.A.3.b.v.4', 'LPG'], 4],
-            ['All vehicles - biofuel use (fossil component)', ['1.A.3.b.v.5', 'Total'], 4],
-            ['Other Fossil Fuels (please specify)', ['1.A.3.b.v.5', 'OtherFF'], 4],
-            # CAN
-            ['Propane and Natural Gas Vehicles', ['1.A.3.b.v.6', 'Total'], 3],
-            ['Gasoline', ['1.A.3.b.v.6', 'Gasoline'], 4],
-            ['Diesel Oil', ['1.A.3.b.v.6', 'DieselOil'], 4],
-            ['Liquefied Petroleum Gases (LPG)', ['1.A.3.b.v.6', 'LPG'], 4],
-            ['Other Liquid Fuels (please specify)', ['1.A.3.b.v.6', 'OtherLiquid'], 4],
-            ['Gaseous Fuels', ['1.A.3.b.v.6', 'Gaseous'], 4],
-            ['Biomass', ['1.A.3.b.v.6', 'Biomass'], 4],
-            ['Other Fossil Fuels (please specify)', ['1.A.3.b.v.6', 'OtherFF'], 4],
-            # BEL
-            ['Lubricant Two-Stroke Engines', ['1.A.3.b.v.7', 'Total'], 3],
-            ['Other Liquid Fuels (please specify)', ['1.A.3.b.v.7', 'OtherLiquid'], 4],
-            # ROU
-            ['Gaseous Fuels', ['1.A.3.b.v.8', 'Total'], 3],
-            ['Gaseous Fuels', ['1.A.3.b.v.8', 'Gaseous'], 4],
-            ['Other Liquid Fuels', ['1.A.3.b.v.9', 'Total'], 3],
-            ['Other Liquid Fuels (please specify)', ['1.A.3.b.v.9', 'OtherLiquid'], 4],
-            ['Other Kerosene', ['1.A.3.b.v.9', 'Kerosene'], 5],
-            ['Heating and Other Gasoil', ['1.A.3.b.v.9', 'HeatingGasoil'], 5],
-            ['Biomass', ['1.A.3.b.v.10', 'Total'], 3],
-            ['Biomass', ['1.A.3.b.v.10', 'Biomass'], 4],
-            # DEU
-            ['CO2 from lubricant co-incineration in 2-stroke road vehicles', ['1.A.3.b.v.7', 'Total'], 3],
-            ['Other Liquid Fuels (please specify)', ['1.A.3.b.v.7', 'OtherLiquid'], 4],
-            ['lubricant used in 2-stroke mix', ['1.A.3.b.v.7', 'Lubricants'], 5],
-            # USA
-            ['Evaporative Emissions', ['1.A.3.b.v.11', 'Total'], 3],
-            ['Gasoline', ['1.A.3.b.v.11', 'Gasoline'], 4],
-            ['Diesel Oil', ['1.A.3.b.v.11', 'DieselOil'], 4],
-            ['Liquefied Petroleum Gases (LPG)', ['1.A.3.b.v.11', 'LPG'], 4],
-            ['Other Liquid Fuels (please specify)', ['1.A.3.b.v.11', 'OtherLiquid'], 4],
-            ['Gaseous Fuels', ['1.A.3.b.v.11', 'Gaseous'], 4],
-            ['Biomass', ['1.A.3.b.v.11', 'Biomass'], 4],
-            ['Other Fossil Fuels (please specify)', ['1.A.3.b.v.11', 'OtherFF'], 4],
-            # SVK
-            ['Urea-based catalysts', ['1.A.3.b.v.12', 'Total'], 3],
-            ['Diesel Oil', ['1.A.3.b.v.12', 'DieselOil'], 4],
-            # ESP
-            ['Other non-specified', ['1.A.3.b.v.13', 'Total'], 3],
-            ['Gasoline', ['1.A.3.b.v.13', 'Gasoline'], 4],
-            ['Diesel Oil', ['1.A.3.b.v.13', 'DieselOil'], 4],
-            ['Liquefied Petroleum Gases (LPG)', ['1.A.3.b.v.13', 'LPG'], 4],
-            ['Other Liquid Fuels (please specify)', ['1.A.3.b.v.13', 'OtherLiquid'], 4],
-            ['Gaseous Fuels', ['1.A.3.b.v.13', 'Gaseous'], 4],
-            ['Biomass', ['1.A.3.b.v.13', 'Biomass'], 4],
-            ['Other Fossil Fuels (please specify)', ['1.A.3.b.v.13', 'OtherFF'], 4],
-            # BGR
-            ['Urea', ['1.A.3.b.v.12', 'Total'], 3],
-            ['Other Liquid Fuels (please specify)', ['1.A.3.b.v.12', 'OtherLiquid'], 4],
-            ['Lubricants', ['1.A.3.b.v.7', 'Total'], 3],
-            ['Other Liquid Fuels (please specify)', ['1.A.3.b.v.7', 'OtherLiquid'], 4],
-            # c. Railways
-            ['c. Railways', ['1.A.3.c', 'Total'], 1],
-            ['Liquid fuels', ['1.A.3.c', 'Liquid'], 2],
-            ['Solid fuels', ['1.A.3.c', 'Solid'], 2],
-            ['Gaseous fuels', ['1.A.3.c', 'Gaseous'], 2],
-            ['Biomass(6)', ['1.A.3.c', 'Biomass'], 2],
-            ['Other fossil fuels (please specify)', ['1.A.3.c', 'OtherFF'], 2],
-            ['Biodiesel (fossil component)', ['1.A.3.c', 'OFFBiodieselFC'], 3],  # LUX
-            ['Fossil part of biodiesel and biogasoline', ['1.A.3.c', 'OFFBiodieselFC'], 3],  # SWE
-            # d. Domestic navigation
-            ['d. Domestic Navigation(10)', ['1.A.3.d', 'Total'], 1],
-            ['Residual fuel oil', ['1.A.3.d', 'ResFuelOil'], 2],
-            ['Gas/diesel oil', ['1.A.3.d', 'GasDieselOil'], 2],
-            ['Gasoline', ['1.A.3.d', 'Gasoline'], 2],
-            ['Other liquid fuels (please specify)', ['1.A.3.d', 'OtherLiquid'], 2],
-            ['Lubricants', ['1.A.3.d', 'Lubricants'], 3],  # UKR, JPN
-            ['Biodiesel (5 percent fossil portion)', ['1.A.3.d', 'OLBiodieselFC'], 3],  # CAN
-            ['Light Fuel Oil', ['1.A.3.d', 'LightFuelOil'], 3],  # CAN
-            ['Kerosene and stove oil', ['1.A.3.d', 'KeroseStoveOil'], 3],  # CAN
-            ['Kerosene', ['1.A.3.d', 'Kerosene'], 3],  # DKE, DNK
-            ['Natural Gas Liquids', ['1.A.3.d', 'NGL'], 3],  # DKE, DNK
-            ['Fossil part of biodiesel', ['1.A.3.d', 'OLBiodieselFC'], 3],  # LTU
-            ['Other non-specified', ['1.A.3.d', 'OLOther'], 3],  # SWE
-            ['Other motor fuels', ['1.A.3.d', 'OMotorFuels'], 3],  # RUS
-            ['Fuel oil A', ['1.A.3.d', 'FuelOilA'], 3],  # JPN
-            ['Fuel oil B', ['1.A.3.d', 'FuelOilB'], 3],  # JPN
-            ['Fuel oil C', ['1.A.3.d', 'FuelOilC'], 3],  # JPN
-            ['Diesel Oil', ['1.A.3.d', 'OLDiesel'], 3],  # FIN
-            ['Gaseous fuels', ['1.A.3.d', 'Gaseous'], 2],
-            ['Biomass(6)', ['1.A.3.d', 'Biomass'], 2],
-            ['Other fossil fuels (please specify)(4)', ['1.A.3.d', 'OtherFF'], 2],
-            ['Liquified natural gas', ['1.A.3.d', 'LNG'], 3],  # DKE, DNK, DNM
-            ['Biodiesel (fossil component)', ['1.A.3.d', 'OFFBiodieselFC'], 3],  # LUX
-            ['Coal', ['1.A.3.d', 'OFFCoal'], 3],  # NZL, NLD
-            ['fossil part of biodiesel', ['1.A.3.d', 'OFFBiodieselFC'], 3],  # AUT
-            ['Fossil part of biodiesel and biogasoline', ['1.A.3.d', 'OFFBioGasDieselFC'], 3],  # SWE
-            ['Solid Fuels', ['1.A.3.d', 'OFFSolid'], 3],  # AUS
-            # e. other transportation
-            # keep details also for top category as it's present
-            ['e. Other transportation (please specify)', ['1.A.3.e', 'Total'], 1],
-            ['Liquid fuels', ['1.A.3.e', 'Liquid'], 2],
-            ['Solid fuels', ['1.A.3.e', 'Solid'], 2],
-            ['Gaseous fuels', ['1.A.3.e', 'Gaseous'], 2],
-            ['Other fossil fuels(4)', ['1.A.3.e', 'OtherFF'], 2],
-            ['Biomass(6)', ['1.A.3.e', 'Biomass'], 2],
-            # i. pipeline
-            ['i. Pipeline transport', ['1.A.3.e.i', 'Total'], 2],
-            ['Liquid fuels', ['1.A.3.e.i', 'Liquid'], 3],
-            ['Solid fuels', ['1.A.3.e.i', 'Solid'], 3],
-            ['Gaseous fuels', ['1.A.3.e.i', 'Gaseous'], 3],
-            ['Other fossil fuels(4)', ['1.A.3.e.i', 'OtherFF'], 3],
-            ['Biomass(6)', ['1.A.3.e.i', 'Biomass'], 3],
-            # ii other
-            ['ii. Other (please specify)', ['1.A.3.e.ii', 'Total'], 2],
-            ## temp
-            # ['Liquid fuels', ['1.A.3.e.ii', 'Liquid'], False],
-            # ['Solid fuels', ['1.A.3.e.ii', 'Solid'], False],
-            # ['Gaseous fuels', ['1.A.3.e.ii', 'Gaseous'], False],
-            # ['Other fossil fuels(4)', ['1.A.3.e.ii', 'OtherFF'], False],
-            # ['Biomass(6)', ['1.A.3.e.ii', 'Biomass'], False],
-            ## end temp
-            # UKR, SWE
-            ['Off-road vehicles and other machinery', ['1.A.3.e.ii.1', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.3.e.ii.1', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.3.e.ii.1', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.3.e.ii.1', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.3.e.ii.1', 'OtherFF'], 4],
-            ['Biomass', ['1.A.3.e.ii.1', 'Biomass'], 4],
-            # GBR, GBK
-            ['Aircraft support vehicles', ['1.A.3.e.ii.2', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.3.e.ii.2', 'Liquid'], 4],
-            # CAN
-            ['Off Road', ['1.A.3.e.ii.1', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.3.e.ii.1', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.3.e.ii.1', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.3.e.ii.1', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.3.e.ii.1', 'OtherFF'], 4],
-            ['Biomass', ['1.A.3.e.ii.1', 'Biomass'], 4],
-            # LTU
-            ['Off-road transport', ['1.A.3.e.ii.1', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.3.e.ii.1', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.3.e.ii.1', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.3.e.ii.1', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.3.e.ii.1', 'OtherFF'], 4],
-            ['Biomass', ['1.A.3.e.ii.1', 'Biomass'], 4],
-            # BEL
-            ['Other non-specified', ['1.A.3.e.ii.3', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.3.e.ii.3', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.3.e.ii.3', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.3.e.ii.3', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.3.e.ii.3', 'OtherFF'], 4],
-            ['Biomass', ['1.A.3.e.ii.3', 'Biomass'], 4],
-            # AUS
-            ['Off-Road Vehicles', ['1.A.3.e.ii.1', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.3.e.ii.1', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.3.e.ii.1', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.3.e.ii.1', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.3.e.ii.1', 'OtherFF'], 4],
-            ['Biomass', ['1.A.3.e.ii.1', 'Biomass'], 4],
-            # USA
-            ['Non-Transportation Mobile', ['1.A.3.e.ii.4', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.3.e.ii.4', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.3.e.ii.4', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.3.e.ii.4', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.3.e.ii.4', 'OtherFF'], 4],
-            ['Biomass', ['1.A.3.e.ii.4', 'Biomass'], 4],
-        ],
-        "entity_mapping": {
-            'EMISSIONS CH4': "CH4",
-            'EMISSIONS CO2(2)': "CO2",
-            'EMISSIONS N2O': "N2O",
-        },
-    },  # tested
-    "Table1.A(a)s4": {
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 127,
-            "header": ['group', 'entity', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category", "class"],
-            "cols_to_ignore": [
-                'AGGREGATE ACTIVITY DATA Consumption',
-                'AGGREGATE ACTIVITY DATA Consumption',
-                'IMPLIED EMISSION FACTORS CO2(1)',
-                'IMPLIED EMISSION FACTORS CH4',
-                'IMPLIED EMISSION FACTORS N2O',
-                'EMISSIONS CO2 Amount captured',
-            ],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['1.A.4 Other sectors', ['1.A.4', 'Total'], 0],
-            ['Liquid fuels', ['1.A.4', 'Liquid'], 1],
-            ['Solid fuels', ['1.A.4', 'Solid'], 1],
-            ['Gaseous fuels', ['1.A.4', 'Gaseous'], 1],
-            ['Other fossil fuels(4)', ['1.A.4', 'OtherFF'], 1],
-            ['Peat(5)', ['1.A.4', 'Peat'], 1],
-            ['Biomass(6)', ['1.A.4', 'Biomass'], 1],
-            # a. Commercial/institutional(12)
-            ['a. Commercial/institutional(12)', ['1.A.4.a', 'Total'], 1],
-            ['Liquid fuels', ['1.A.4.a', 'Liquid'], 2],
-            ['Solid fuels', ['1.A.4.a', 'Solid'], 2],
-            ['Gaseous fuels', ['1.A.4.a', 'Gaseous'], 2],
-            ['Other fossil fuels(4)', ['1.A.4.a', 'OtherFF'], 2],
-            ['Peat(5)', ['1.A.4.a', 'Peat'], 2],
-            ['Biomass(6)', ['1.A.4.a', 'Biomass'], 2],
-            # 1.A.4.a.i Stationary combustion
-            ['1.A.4.a.i Stationary combustion', ['1.A.4.a.i', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.4.a.i', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.4.a.i', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.4.a.i', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.4.a.i', 'OtherFF'], 3],
-            ['Peat', ['1.A.4.a.i', 'Peat'], 3],
-            ['Biomass', ['1.A.4.a.i', 'Biomass'], 3],
-            # 1.A.4.a.ii Off-road vehicles and other machinery
-            ['1.A.4.a.ii Off-road vehicles and other machinery', ['1.A.4.a.ii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.4.a.ii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.4.a.ii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.4.a.ii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.4.a.ii', 'OtherFF'], 3],
-            ['Biomass', ['1.A.4.a.ii', 'Biomass'], 3],
-            # 1.A.4.a.iii Other (please specify)
-            ['1.A.4.a.iii Other (please specify)', ['1.A.4.a.iii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.4.a.iii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.4.a.iii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.4.a.iii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.4.a.iii', 'OtherFF'], 3],
-            ['Peat', ['1.A.4.a.iii', 'Peat'], 3],
-            ['Biomass', ['1.A.4.a.iii', 'Biomass'], 3],
-            # b. Residential(13)
-            ['b. Residential(13)', ['1.A.4.b', 'Total'], 1],
-            ['Liquid fuels', ['1.A.4.b', 'Liquid'], 2],
-            ['Solid fuels', ['1.A.4.b', 'Solid'], 2],
-            ['Gaseous fuels', ['1.A.4.b', 'Gaseous'], 2],
-            ['Other fossil fuels(4)', ['1.A.4.b', 'OtherFF'], 2],
-            ['Peat(5)', ['1.A.4.b', 'Peat'], 2],
-            ['Biomass(6)', ['1.A.4.b', 'Biomass'], 2],
-            # 1.A.4.b.i Stationary combustion
-            ['1.A.4.b.i Stationary combustion', ['1.A.4.b.i', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.4.b.i', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.4.b.i', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.4.b.i', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.4.b.i', 'OtherFF'], 3],
-            ['Peat', ['1.A.4.b.i', 'Peat'], 3],
-            ['Biomass', ['1.A.4.b.i', 'Biomass'], 3],
-            # 1.A.4.b.ii Off-road vehicles and other machinery
-            ['1.A.4.b.ii Off-road vehicles and other machinery', ['1.A.4.b.ii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.4.b.ii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.4.b.ii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.4.b.ii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.4.b.ii', 'OtherFF'], 3],
-            ['Biomass', ['1.A.4.b.ii', 'Biomass'], 3],
-            # 1.A.4.b.iii Other (please specify)
-            ['1.A.4.b.iii Other (please specify)', ['1.A.4.b.iii', 'Total'], 2],
-            # CYP, USA
-            ['Residential', ['1.A.4.b.iii.1', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.4.b.iii.1', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.4.b.iii.1', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.4.b.iii.1', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.4.b.iii.1', 'OtherFF'], 3],
-            ['Peat', ['1.A.4.b.iii.1', 'Peat'], 3],
-            ['Biomass', ['1.A.4.b.iii.1', 'Biomass'], 3],
-            # c. Agriculture/forestry/fishing
-            ['c. Agriculture/forestry/fishing', ['1.A.4.c', 'Total'], 1],
-            ['Liquid fuels', ['1.A.4.c', 'Liquid'], 2],
-            ['Solid fuels', ['1.A.4.c', 'Solid'], 2],
-            ['Gaseous fuels', ['1.A.4.c', 'Gaseous'], 2],
-            ['Other fossil fuels(4)', ['1.A.4.c', 'OtherFF'], 2],
-            ['Peat(5)', ['1.A.4.c', 'Peat'], 2],
-            ['Biomass(6)', ['1.A.4.c', 'Biomass'], 2],
-            # i. Stationary
-            ['i. Stationary', ['1.A.4.c.i', 'Total'], 2],
-            ['Liquid fuels', ['1.A.4.c.i', 'Liquid'], 3],
-            ['Solid fuels', ['1.A.4.c.i', 'Solid'], 3],
-            ['Gaseous fuels', ['1.A.4.c.i', 'Gaseous'], 3],
-            ['Other fossil fuels(4)', ['1.A.4.c.i', 'OtherFF'], 3],
-            ['Peat(5)', ['1.A.4.c.i', 'Peat'], 3],
-            ['Biomass(6)', ['1.A.4.c.i', 'Biomass'], 3],
-            # ii. Off-road vehicles and other machinery
-            ['ii. Off-road vehicles and other machinery', ['1.A.4.c.ii', 'Total'], 2],
-            ['Gasoline', ['1.A.4.c.ii', 'Gasoline'], 3],
-            ['Diesel oil', ['1.A.4.c.ii', 'DieselOil'], 3],
-            ['Liquefied petroleum gases (LPG)', ['1.A.4.c.ii', 'LPG'], 3],
-            ['Other liquid fuels (please specify)', ['1.A.4.c.ii', 'OtherLiquid'], 3],
-            ['Other Kerosene', ['1.A.4.c.ii', 'Kerosene'], 4],  # HRV
-            ['Lubricants', ['1.A.4.c.ii', 'Lubricants'], 4],  # HRV
-            ['Gasoil', ['1.A.4.c.ii', 'Gasoil'], 4],  # FIN
-            ['Marine gasoil', ['1.A.4.c.ii', 'MarineGasoil'], 4],  # NOR
-            ['heavy fuel oil', ['1.A.4.c.ii', 'HeavyFuelOil'], 4],  # NOR
-            ['Other motor fuels', ['1.A.4.c.ii', 'OMotorFuels'], 4],  # RUS
-            ['Biodiesel (5 percent fossil portion)', ['1.A.4.c.ii', 'OLBiodieselFC'], 4],  # CAN
-            ['Gaseous fuels', ['1.A.4.c.ii', 'Gaseous'], 3],
-            ['Biomass(6)', ['1.A.4.c.ii', 'Biomass'], 3],
-            ['Other fossil fuels (please specify)(4)', ['1.A.4.c.ii', 'OtherFF'], 3],
-            ['fossil part of biodiesel', ['1.A.4.c.ii', 'OFFBiodieselFC'], 4],
-            ['Fossil part of biodiesel and biogasoline', ['1.A.4.c.ii', 'OFFBiofuelFC'], 4],
-            ['Biodiesel (fossil component)', ['1.A.4.c.ii', 'OFFBiodieselFC'], 4], # LUX
-            ['Alkylate Gasoline', ['1.A.4.c.ii', 'OFFAlkylateGasoline'], 4], # LIE
-            # iii. Fishing
-            ['iii. Fishing', ['1.A.4.c.iii', 'Total'], 2],
-            ['Residual fuel oil', ['1.A.4.c.iii', 'ResFuelOil'], 3],
-            ['Gas/diesel oil', ['1.A.4.c.iii', 'GasDieselOil'], 3],
-            ['Gasoline', ['1.A.4.c.iii', 'Gasoline'], 3],
-            ['Other liquid fuels (please specify)', ['1.A.4.c.iii', 'OtherLiquid'], 3],
-            ['Biodiesel (5 percent fossil portion)', ['1.A.4.c.iii', 'OLBiodieselFC'], 4],  # CAN
-            ['Gaseous fuels', ['1.A.4.c.iii', 'Gaseous'], 3],
-            ['Biomass(6)', ['1.A.4.c.iii', 'Biomass'], 3],
-            ['Other fossil fuels (please specify)(4)', ['1.A.4.c.iii', 'OtherFF'], 3],
-            ['Fossil part of biodiesel and biogasoline', ['1.A.4.c.iii', 'OFFBiofuelFC'], 3],
-            # 1.A.5 Other (Not specified elsewhere)(14)
-            ['1.A.5 Other (Not specified elsewhere)(14)', ['1.A.5', 'Total'], 0],
-            # a. Stationary (please specify)
-            ['a. Stationary (please specify)', ['1.A.5.a', 'Total'], 1],
-            # temp
-            ['Liquid Fuels', ['1.A.5.a', 'Liquid'], 2],
-            ['Solid Fuels', ['1.A.5.a', 'Solid'], 2],
-            ['Gaseous Fuels', ['1.A.5.a', 'Gaseous'], 2],
-            ['Other Fossil Fuels', ['1.A.5.a', 'OtherFF'], 2],
-            ['Peat', ['1.A.5.a', 'Peat'], 2],
-            ['Biomass', ['1.A.5.a', 'Biomass'], 2],
-            # temp
-            # GBK, GBR
-            ['Military fuel use', ['1.A.5.a.i', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.a.i', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.a.i', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.a.i', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.a.i', 'OtherFF'], 3],
-            ['Peat', ['1.A.5.a.i', 'Peat'], 3],
-            ['Biomass', ['1.A.5.a.i', 'Biomass'], 3],
-            # TUR
-            ['Liquid fuels', ['1.A.5.a', 'Liquid'], 2],
-            # ESP, FIN, SWE
-            ['Other non-specified', ['1.A.5.a.ii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.a.ii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.a.ii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.a.ii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.a.ii', 'OtherFF'], 3],
-            ['Peat', ['1.A.5.a.ii', 'Peat'], 3],
-            ['Biomass', ['1.A.5.a.ii', 'Biomass'], 3],
-            # ROU, SVK, RUS
-            ['Other', ['1.A.5.a.ii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.a.ii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.a.ii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.a.ii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.a.ii', 'OtherFF'], 3],
-            ['Peat', ['1.A.5.a.ii', 'Peat'], 3],
-            ['Biomass', ['1.A.5.a.ii', 'Biomass'], 3],
-            # FRA, FRK
-            ['Other not specified', ['1.A.5.a.ii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.a.ii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.a.ii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.a.ii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.a.ii', 'OtherFF'], 3],
-            ['Peat', ['1.A.5.a.ii', 'Peat'], 3],
-            ['Biomass', ['1.A.5.a.ii', 'Biomass'], 3],
-            # CYP
-            ['Other (not specified elsewhere)', ['1.A.5.a.ii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.a.ii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.a.ii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.a.ii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.a.ii', 'OtherFF'], 3],
-            ['Peat', ['1.A.5.a.ii', 'Peat'], 3],
-            ['Biomass', ['1.A.5.a.ii', 'Biomass'], 3],
-            # NOR, HUN
-            ['Military', ['1.A.5.a.i', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.a.i', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.a.i', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.a.i', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.a.i', 'OtherFF'], 3],
-            ['Peat', ['1.A.5.a.i', 'Peat'], 3],
-            ['Biomass', ['1.A.5.a.i', 'Biomass'], 3],
-            ['Non-fuel Use', ['1.A.5.a.iii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.a.iii', 'Liquid'], 3],
-            # DNM, DKE, DNK
-            ['Other stationary combustion', ['1.A.5.a.ii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.a.ii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.a.ii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.a.ii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.a.ii', 'OtherFF'], 3],
-            ['Biomass', ['1.A.5.a.ii', 'Biomass'], 3],
-            # LUX
-            ['Stationary', ['1.A.5.a.ii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.a.ii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.a.ii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.a.ii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.a.ii', 'OtherFF'], 3],
-            ['Biomass', ['1.A.5.a.ii', 'Biomass'], 3],
-            # USA
-            ['Incineration of Waste', ['1.A.5.a.iv', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.a.iv', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.a.iv', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.a.iv', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.a.iv', 'OtherFF'], 3],
-            ['Peat', ['1.A.5.a.iv', 'Peat'], 3],
-            ['Biomass', ['1.A.5.a.iv', 'Biomass'], 3],
-            ['U.S. Territories', ['1.A.5.a.v', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.a.v', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.a.v', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.a.v', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.a.v', 'OtherFF'], 3],
-            ['Peat', ['1.A.5.a.v', 'Peat'], 3],
-            ['Biomass', ['1.A.5.a.v', 'Biomass'], 3],
-            ['Non Energy Use', ['1.A.5.a.iii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.a.iii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.a.iii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.a.iii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.a.iii', 'OtherFF'], 3],
-            ['Peat', ['1.A.5.a.iii', 'Peat'], 3],
-            ['Biomass', ['1.A.5.a.iii', 'Biomass'], 3],
-            # b. Mobile (please specify)
-            ['b. Mobile (please specify)', ['1.A.5.b', 'Total'], 1],
-            # temp
-            ['Liquid Fuels', ['1.A.5.b', 'Liquid'], 2],
-            ['Solid Fuels', ['1.A.5.b', 'Solid'], 2],
-            ['Gaseous Fuels', ['1.A.5.b', 'Gaseous'], 2],
-            ['Other Fossil Fuels', ['1.A.5.b', 'OtherFF'], 2],
-            ['Biomass', ['1.A.5.b', 'Biomass'], 2],
-            # temp
-            # GBK, GBR
-            ['Military aviation and naval shipping', ['1.A.5.b.i', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.i', 'Liquid'], 3],
-            # HRV
-            ['Military aviation component', ['1.A.5.b.ii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.ii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.b.ii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.b.ii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.b.ii', 'OtherFF'], 3],
-            ['Biomass', ['1.A.5.b.ii', 'Biomass'], 3],
-            ['Military water-borne component', ['1.A.5.b.iii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.iii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.b.iii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.b.iii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.b.iii', 'OtherFF'], 3],
-            ['Biomass', ['1.A.5.b.iii', 'Biomass'], 3],
-            # ESP, FIN
-            ['Other non-specified', ['1.A.5.b.iv', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.iv', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.b.iv', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.b.iv', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.b.iv', 'OtherFF'], 3],
-            ['Peat', ['1.A.5.b.iv', 'Peat'], 3],
-            ['Biomass', ['1.A.5.b.iv', 'Biomass'], 3],
-            # NLD, DKE, DNM, DNK, SWE, UKR
-            ['Military use', ['1.A.5.b.v', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.v', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.b.v', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.b.v', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.b.v', 'OtherFF'], 3],
-            ['Peat', ['1.A.5.b.v', 'Peat'], 3],
-            ['Biomass', ['1.A.5.b.v', 'Biomass'], 3],
-            # AUT, NOR, USA, CHE, HUN, LTU
-            ['Military', ['1.A.5.b.v', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.v', 'Liquid'], 3],
-            ['Gaseous Fuels', ['1.A.5.b.v', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.b.v', 'OtherFF'], 3],
-            ['Biomass', ['1.A.5.b.v', 'Biomass'], 3],
-            # PRT
-            ['Military Aviation', ['1.A.5.b.ii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.ii', 'Liquid'], 3],
-            # ROU, MLT
-            ['Other', ['1.A.5.b.iv', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.iv', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.b.iv', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.b.iv', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.b.iv', 'OtherFF'], 3],
-            ['Peat', ['1.A.5.b.iv', 'Peat'], 3],
-            ['Biomass', ['1.A.5.b.iv', 'Biomass'], 3],
-            # FRA, FRK
-            ['Other not specified', ['1.A.5.b.iv', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.iv', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.b.iv', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.b.iv', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.b.iv', 'OtherFF'], 3],
-            ['Peat', ['1.A.5.b.iv', 'Peat'], 3],
-            ['Biomass', ['1.A.5.b.iv', 'Biomass'], 3],
-            # CYP
-            ['1A5b i Mobile (aviation component)', ['1.A.5.b.vi', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.vi', 'Liquid'], 3],
-            # GBK, GBR
-            ['Lubricants used in 2-stroke engines', ['1.A.5.b.vii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.vii', 'Liquid'], 3],
-            # DNM, DKE, DNK
-            ['Recreational crafts', ['1.A.5.b.viii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.viii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.b.viii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.b.viii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.b.viii', 'OtherFF'], 3],
-            ['Biomass', ['1.A.5.b.viii', 'Biomass'], 3],
-            # SVK
-            ['Military use Jet Kerosene', ['1.A.5.b.ix', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.ix', 'Liquid'], 3],
-            ['Military Gasoline', ['1.A.5.b.x', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.x', 'Liquid'], 3],
-            ['Biomass', ['1.A.5.b.x', 'Biomass'], 3],
-            ['Military Diesel Oil', ['1.A.5.b.xi', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.xi', 'Liquid'], 3],
-            ['Biomass', ['1.A.5.b.xi', 'Biomass'], 3],
-            # BEL
-            ['Military Use', ['1.A.5.b.v', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.v', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.b.v', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.b.v', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.b.v', 'OtherFF'], 3],
-            ['Biomass', ['1.A.5.b.v', 'Biomass'], 3],
-            # AUS
-            ['Military Transport', ['1.A.5.b.xii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.xii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.b.xii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.b.xii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.b.xii', 'OtherFF'], 3],
-            ['Biomass', ['1.A.5.b.xii', 'Biomass'], 3],
-            # CZE
-            ['Agriculture and Forestry and Fishing', ['1.A.5.b.xiii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.xiii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.b.xiii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.b.xiii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.b.xiii', 'OtherFF'], 3],
-            ['Biomass', ['1.A.5.b.xiii', 'Biomass'], 3],
-            ['Other mobile sources not included elsewhere', ['1.A.5.b.iv', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.iv', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.b.iv', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.b.iv', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.b.iv', 'OtherFF'], 3],
-            ['Biomass', ['1.A.5.b.iv', 'Biomass'], 3],
-            # SVN
-            ['Military use of fuels', ['1.A.5.b.v', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.v', 'Liquid'], 3],
-            # LUX
-            ['Unspecified Mobile', ['1.A.5.b.iv', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.iv', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.b.iv', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.b.iv', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.b.iv', 'OtherFF'], 3],
-            ['Biomass', ['1.A.5.b.iv', 'Biomass'], 3],
-            # LVA
-            ['Mobile', ['1.A.5.b.iv', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.iv', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.b.iv', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.b.iv', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.b.iv', 'OtherFF'], 3],
-            ['Biomass', ['1.A.5.b.iv', 'Biomass'], 3],
-            # CAN
-            ['Domestic Military (Aviation)', ['1.A.5.b.ii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.ii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.b.ii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.b.ii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.b.ii', 'OtherFF'], 3],
-            ['Biomass', ['1.A.5.b.ii', 'Biomass'], 3],
-            ['Military Water-borne Navigation', ['1.A.5.b.iii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.iii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.b.iii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.b.iii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.b.iii', 'OtherFF'], 3],
-            ['Biomass', ['1.A.5.b.iii', 'Biomass'], 3],
-            # ESP, FIN
-            # Information Item
-            ['Information item:(15)', ['\\IGNORE', '\\IGNORE'], 0],
-            ['Waste incineration with energy recovery included as:', ['\\IGNORE', '\\IGNORE'], 1],
-            ['Biomass(6)', ['\\IGNORE', '\\IGNORE'], 1],
-            ['Fossil fuels(4)', ['\\IGNORE', '\\IGNORE'], 1],
-        ],
-        "entity_mapping": {
-            'EMISSIONS CH4': "CH4",
-            'EMISSIONS CO2(2)': "CO2",
-            'EMISSIONS N2O': "N2O",
-        },
-    },  # tested
-    "Table1.B.1": {
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 19,
-            "header": ['group', 'entity', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [
-                'ACTIVITY DATA Amount of fuel produced',
-                'IMPLIED EMISSION FACTORS CH4(1)',
-                'IMPLIED EMISSION FACTORS CO2',
-                'EMISSIONS CH4 Recovery/Flaring(2)',
-            ],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['1. B. 1. a. Coal mining and handling', ['1.B.1.a'], 0],
-            ['i. Underground mines(4)', ['1.B.1.a.i'], 1],
-            ['Mining activities', ['1.B.1.a.i.1'], 2],
-            ['Post-mining activities', ['1.B.1.a.i.2'], 2],
-            ['Abandoned underground mines', ['1.B.1.a.i.3'], 2],
-            ['ii. Surface mines(4)', ['1.B.1.a.ii'], 1],
-            ['Mining activities', ['1.B.1.a.ii.1'], 2],
-            ['Post-mining activities', ['1.B.1.a.ii.2'], 2],
-            ['1. B. 1. b. Solid fuel transformation(5)', ['1.B.1.b'], 0],
-            ['1. B. 1. c. Other (please specify)(6)', ['1.B.1.c'], 0],
-            ['Flaring', ['1.B.1.c.i'], 1],  # UKR, AUS
-            ['Flaring of gas', ['1.B.1.c.i'], 1],  # SWE
-            ['Coal Dumps', ['1.B.1.c.ii'], 1],  # JPN
-            ['SO2 scrubbing', ['1.B.1.c.iii'], 1],  # SVN
-            ['Flaring of coke oven gas', ['1.B.1.c.iv'], 1],  # KAZ
-            ['Emisson from Coke Oven Gas Subsystem', ['1.B.1.c.iv'], 1],  # POL
-        ],
-        "entity_mapping": {
-            'EMISSIONS CH4 Emissions(3)': 'CH4',
-            'EMISSIONS CO2 Emissions': 'CO2',
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    "Table1.B.2": {
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 33,
-            "header": ['group', 'entity', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [
-                'ACTIVITY DATA(1) Description(1)',
-                'ACTIVITY DATA(1) Unit(1)',
-                'ACTIVITY DATA(1) Value',
-                'IMPLIED EMISSION FACTORS CO2(2)',
-                'IMPLIED EMISSION FACTORS CH4',
-                'IMPLIED EMISSION FACTORS N2O',
-                'EMISSIONS CO2 Amount captured',
-            ],
-            "stop_cats": [".", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['1. B. 2. a. Oil(6)', ['1.B.2.a'], 0],
-            ['1. Exploration', ['1.B.2.a.1'], 1],
-            ['2. Production(7)', ['1.B.2.a.2'], 1],
-            ['3. Transport', ['1.B.2.a.3'], 1],
-            ['4. Refining/storage', ['1.B.2.a.4'], 1],
-            ['5. Distribution of oil products', ['1.B.2.a.5'], 1],
-            ['6. Other', ['1.B.2.a.6'], 1],
-            ['1. B. 2. b. Natural gas', ['1.B.2.b'], 0],
-            ['1. Exploration', ['1.B.2.b.1'], 1],
-            ['2. Production(7)', ['1.B.2.b.2'], 1],
-            ['3. Processing', ['1.B.2.b.3'], 1],
-            ['4. Transmission and storage', ['1.B.2.b.4'], 1],
-            ['5. Distribution', ['1.B.2.b.5'], 1],
-            ['6. Other', ['1.B.2.b.6'], 1],
-            ['1. B. 2. c. Venting and flaring', ['1.B.2.c'], 0],
-            ['Venting', ['1.B.2.c-ven'], 1],
-            ['i. Oil', ['1.B.2.c-ven.i'], 2],
-            ['ii. Gas', ['1.B.2.c-ven.ii'], 2],
-            ['iii. Combined', ['1.B.2.c-ven.iii'], 2],
-            ['Flaring(8)', ['1.B.2.c-fla'], 1],
-            ['i. Oil', ['1.B.2.c-fla.i'], 2],
-            ['ii. Gas', ['1.B.2.c-fla.ii'], 2],
-            ['iii. Combined', ['1.B.2.c-fla.iii'], 2],
-            ['1.B.2.d. Other (please specify)(9)', ['1.B.2.d'], 0],
-            ['Groundwater extraction and CO2 mining', ['1.B.2.d.i'], 1],  # HUN
-            ['Geothermal', ['1.B.2.d.ii'], 1],  # NOR, DEU, PRT, NZL
-            ['Geothermal Energy', ['1.B.2.d.ii'], 1],  # ISL
-            ['Geothermal Generation', ['1.B.2.d.ii'], 1],  # JPN
-            ['Geotherm', ['1.B.2.d.ii'], 1],  # ITA
-            ['City Gas Production', ['1.B.2.d.iii'], 1],  # PRT
-            ['Other', ['1.B.2.d.iv'], 1],  # UKR, ROU
-            ['Other non-specified', ['1.B.2.d.iv'], 1],  # SWE
-            ['Flaring in refineries', ['1.B.2.d.v'], 1],  # ITA
-            ['LPG transport', ['1.B.2.d.vi'], 1],  # GRC
-            ['Distribution of town gas', ['1.B.2.d.vii'], 1],  # FIN
-            ['Petrol distribution', ['1.B.2.d.viii'], 1],  # IRL
-            ['Natural Gas Transport', ['1.B.2.d.ix'], 1],  # BLR
-            ['Natural gas exploration - N2O emissions', ['1.B.2.d.x'], 1],  # GBR, GBK
-        ],
-        "entity_mapping": {
-            'EMISSIONS CH4 (4) Amount captured': 'CH4',
-            'EMISSIONS CO2 Emissions(3)': 'CO2',
-            'EMISSIONS N2O Amount captured': 'N2O',
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    "Table1.C": {
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 24,
-            "header": ['group', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [
-                'ACTIVITY DATA CO2 transported or injected(1)',
-                'IMPLIED EMISSION FACTORS CO2',
-            ],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['1. Transport of CO2', ['1.C.1']],
-            ['a. Pipelines', ['1.C.1.a']],
-            ['b. Ships', ['1.C.1.b']],
-            ['c. Other', ['1.C.1.c']],
-            ['2. Injection and storage(3)', ['1.C.2']],
-            ['a. Injection', ['1.C.2.a']],
-            ['b. Storage', ['1.C.2.b']],
-            ['3. Other', ['1.C.3']],
-            ['Information item(4, 5)', ['\IGNORE']],
-            ['Total amount captured for storage', ['M.Info.A.TACS']],
-            ['Total amount of imports for storage', ['M.Info.A.TAIS']],
-            ['Total A', ['M.Info.A']],
-            ['Total amount of exports for storage', ['M.Info.B.TAES']],
-            ['Total amount of CO2 injected at storage sites', ['M.Info.B.TAI']],
-            ['Total leakage from transport, injection and storage', ['M.Info.B.TLTIS']],
-            ['Total B', ['M.Info.B']],
-            ['Difference (A-B)(6)', ['\IGNORE']],
-        ],
-        "entity_mapping": {
-            'EMISSIONS CO2(2)': 'CO2',
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    "Table1.D": {
-        "status": "TODO",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 20,
-            "header": ['group', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category", "class"],
-            "cols_to_ignore": [],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-
-        ],
-        "entity_mapping": [],
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # TODO
-    "Table2(I)s1": {
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 31,
-            "header": ['entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["industry"],
-        },
-        "sector_mapping": [
-            ['Total industrial processes', ['2']],
-            ['A. Mineral industry', ['2.A']],
-            ['1. Cement production', ['2.A.1']],
-            ['2. Lime production', ['2.A.2']],
-            ['3. Glass production', ['2.A.3']],
-            ['4. Other process uses of carbonates', ['2.A.4']],
-            ['B. Chemical industry', ['2.B']],
-            ['1. Ammonia production', ['2.B.1']],
-            ['2. Nitric acid production', ['2.B.2']],
-            ['3. Adipic acid production', ['2.B.3']],
-            ['4. Caprolactam, glyoxal and glyoxylic acid production', ['2.B.4']],
-            ['5. Carbide production', ['2.B.5']],
-            ['6. Titanium dioxide production', ['2.B.6']],
-            ['7. Soda ash production', ['2.B.7']],
-            ['8. Petrochemical and carbon black production', ['2.B.8']],
-            ['9. Fluorochemical production', ['2.B.9']],
-            ['10. Other (as specified in table 2(I).A-H)', ['2.B.10']],
-            ['C. Metal industry', ['2.C']],
-            ['1. Iron and steel production', ['2.C.1']],
-            ['2. Ferroalloys production', ['2.C.2']],
-            ['3. Aluminium production', ['2.C.3']],
-            ['4. Magnesium production', ['2.C.4']],
-            ['5. Lead production', ['2.C.5']],
-            ['6. Zinc production', ['2.C.6']],
-            ['7. Other (as specified in table 2(I).A-H)', ['2.C.7']],
-        ],
-        "entity_mapping": {
-            'HFCs(1)': 'HFCS (AR4GWP100)',
-            'PFCs(1)': 'PFCS (AR4GWP100)',
-            'Unspecified mix of HFCs and PFCs(1)': 'UnspMixOfHFCsPFCs (AR4GWP100)',
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    "Table2(I)s2": {
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 29,
-            "header": ['entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["industry"],
-        },
-        "sector_mapping": [
-            ['D. Non-energy products from fuels and solvent use', ['2.D']],
-            ['1. Lubricant use', ['2.D.1']],
-            ['2. Paraffin wax use', ['2.D.2']],
-            ['3. Other', ['2.D.3']],
-            ['E. Electronics industry', ['2.E']],
-            ['1. Integrated circuit or semiconductor', ['2.E.1']],
-            ['2. TFT flat panel display', ['2.E.2']],
-            ['3. Photovoltaics', ['2.E.3']],
-            ['4. Heat transfer fluid', ['2.E.4']],
-            ['5. Other (as specified in table 2(II))', ['2.E.5']],
-            ['F. Product uses as substitutes for ODS(2)', ['2.F']],
-            ['1. Refrigeration and air conditioning', ['2.F.1']],
-            ['2. Foam blowing agents', ['2.F.2']],
-            ['3. Fire protection', ['2.F.3']],
-            ['4. Aerosols', ['2.F.4']],
-            ['5. Solvents', ['2.F.5']],
-            ['6. Other applications', ['2.F.6']],
-            ['G. Other product manufacture and use', ['2.G']],
-            ['1. Electrical equipment', ['2.G.1']],
-            ['2. SF6 and PFCs from other product use', ['2.G.2']],
-            ['3. N2O from product uses', ['2.G.3']],
-            ['4. Other', ['2.G.4']],
-            ['H. Other (as specified in tables 2(I).A-H and 2(II))(3)', ['2.H']],
-        ],
-        "entity_mapping": {
-            'HFCs(1)': 'HFCS (AR4GWP100)',
-            'PFCs(1)': 'PFCS (AR4GWP100)',
-            'Unspecified mix of HFCs and PFCs(1)': 'UnspMixOfHFCsPFCs (AR4GWP100)',
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    "Table2(I).A-Hs1": {
-        "status": "TODO",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 40,
-            "header": ['group', 'entity', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-
-        ],
-        "entity_mapping": [],
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # TODO
-    "Table2(I).A-Hs2": {
-        "status": "TODO",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 36,
-            "header": ['group', 'entity', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-
-        ],
-        "entity_mapping": [],
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # TODO
-    "Table2(II)": {
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 38,
-            "header": ['entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [],
-            "stop_cats": [".", np.nan],
-            "unit_info": unit_info["fgases"],
-        },
-        "sector_mapping": [
-            ['Total actual emissions of halocarbons (by chemical) and SF6', ['2']],
-            ['B. Chemical industry', ['2.B']],
-            ['9. Flurochemical production', ['2.B.9']],
-            ['By-product emissions', ['2.B.9.a']],
-            ['Fugitive emissions', ['2.B.9.b']],
-            ['10. Other', ['2.B.10']],
-            ['C. Metal industry', ['2.C']],
-            ['3. Aluminium production', ['2.C.3']],
-            ['4. Magnesium production', ['2.C.4']],
-            ['7. Other', ['2.C.7']],
-            ['E. Electronics industry', ['2.E']],
-            ['1. Integrated circuit or semiconductor', ['2.E.1']],
-            ['2. TFT flat panel display', ['2.E.2']],
-            ['3. Photovoltaics', ['2.E.3']],
-            ['4. Heat transfer fluid', ['2.E.4']],
-            ['5. Other (as specified in table 2(II))', ['2.E.5']],
-            ['F. Product uses as substitutes for ODS(2)', ['2.F']],
-            ['1. Refrigeration and air conditioning', ['2.F.1']],
-            ['2. Foam blowing agents', ['2.F.2']],
-            ['3. Fire protection', ['2.F.3']],
-            ['4. Aerosols', ['2.F.4']],
-            ['5. Solvents', ['2.F.5']],
-            ['6. Other applications', ['2.F.6']],
-            ['G. Other product manufacture and use', ['2.G']],
-            ['1. Electrical equipment', ['2.G.1']],
-            ['2. SF6 and PFCs from other product use', ['2.G.2']],
-            ['4. Other', ['2.G.4']],
-            ['H. Other (please specify)', ['2.H']],
-            ['2.H.1 Pulp and paper', ['2.H.1']],
-            ['2.H.2 Food and beverages industry', ['2.H.2']],
-            ['2.H.3 Other (please specify)', ['2.H.3']],
-        ],
-        "entity_mapping": {
-            'C 3F8': 'C3F8',
-            #'C10F18' 'C2F6' 'C4F10' 'C5F12' 'C6F14' 'CF4'
-            'HFC-125': 'HFC125',
-            'HFC-134': 'HFC134',
-            'HFC-134a': 'HFC134a',
-            'HFC-143': 'HFC143',
-            'HFC-143a': 'HFC143a',
-            'HFC-152': 'HFC152',
-            'HFC-152a': 'HFC152a',
-            'HFC-161': 'HFC161',
-            'HFC-227ea': 'HFC227ea',
-            'HFC-23': 'HFC23',
-            'HFC-236cb': 'HFC236cb',
-            'HFC-236ea': 'HFC236ea',
-            'HFC-236fa': 'HFC236fa',
-            'HFC-245ca': 'HFC245ca',
-            'HFC-245fa': 'HFC245fa',
-            'HFC-32': 'HFC32',
-            'HFC-365mfc': 'HFC365mfc',
-            'HFC-41': 'HFC41',
-            'HFC-43-10mee': 'HFC4310mee',
-            'Unspecified mix of HFCs (1)': 'UnspMixOfHFCs (AR4GWP100)',
-            'Unspecified mix of HFCs and PFCs(1)': 'UnspMixOfHFCsPFCs (AR4GWP100)',
-            'Unspecified mix of PFCs (1)': 'UnspMixOfPFCs (AR4GWP100)',
-            'c-C3F6': 'cC3F6',
-            'c-C4F8': 'cC4F8',
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    "Table3s1": {  # Agriculture summary sheet 1
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 75,
-            "header": ['entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND",
-            "categories": ["category"],
-            "cols_to_ignore": [],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['3. Total agriculture', ['3'], 0],
-            # I. Livestock
-            ['I. Livestock', ['M.3.LV'], 1],
-            # A. Enteric fermentation
-            ['A. Enteric fermentation', ['3.A'], 2],
-            ['1. Cattle(1)', ['3.A.1'], 3],
-            ['Option A:', ['\IGNORE'], 4],
-            ['Dairy cattle', ['3.A.1.Aa'], 5],
-            ['Non-dairy cattle', ['3.A.1.Ab'], 5],
-            ['Option B:', ['\IGNORE'], 4],
-            ['Mature dairy cattle', ['3.A.1.Ba'], 5],
-            ['Other mature cattle', ['3.A.1.Bb'], 5],
-            ['Growing cattle', ['3.A.1.Bc'], 5],
-            ['Option C (country-specific):', ['\IGNORE'], 4],
-            # all countries not specified explicitly
-            ['\C!-AUS-MLT-LUX-POL-SVN-USA\ Other (as specified in table 3(I).A)', ['3.A.1.C'], 5],
-            # Australia
-            ['\C-AUS\ Other (as specified in table 3(I).A)', ['3.A.1.C'], 5],
-            ['\C-AUS\ Dairy Cattle', ['3.A.1.C-AUS-a'], 6],
-            ['\C-AUS\ Beef Cattle - Pasture', ['3.A.1.C-AUS-b'], 6],
-            ['\C-AUS\ Beef Cattle - Feedlot', ['3.A.1.C-AUS-c'], 6],
-            # Malta
-            ['\C-MLT\ Other (as specified in table 3(I).A)', ['3.A.1.C'], 5],
-            ['\C-MLT\ dairy cows', ['3.A.1.C-MLT-a'], 6],
-            ['\C-MLT\ non-lactating cows', ['3.A.1.C-MLT-b'], 6],
-            ['\C-MLT\ bulls', ['3.A.1.C-MLT-c'], 6],
-            ['\C-MLT\ calves', ['3.A.1.C-MLT-d'], 6],
-            ['\C-MLT\ growing cattle 1-2 years', ['3.A.1.C-MLT-e'], 6],
-            # Luxembourg
-            ['\C-LUX\ Other (as specified in table 3(I).A)', ['3.A.1.C'], 5],
-            ['\C-LUX\ Bulls', ['3.A.1.C-LUX-a'], 6],
-            ['\C-LUX\ Calves', ['3.A.1.C-LUX-b'], 6],
-            ['\C-LUX\ Young Cattle', ['3.A.1.C-LUX-c'], 6],
-            ['\C-LUX\ Suckler Cows', ['3.A.1.C-LUX-d'], 6],
-            ['\C-LUX\ Bulls under 2 years', ['3.A.1.C-LUX-e'], 6],
-            ['\C-LUX\ Dairy Cows', ['3.A.1.C-LUX-f'], 6],
-            # Poland
-            ['\C-POL\ Other (as specified in table 3(I).A)', ['3.A.1.C'], 5],
-            ['\C-POL\ Bulls (older than 2 years)', ['3.A.1.C-POL-a'], 6],
-            ['\C-POL\ Non-dairy Heifers (older than 2 years)', ['3.A.1.C-POL-b'], 6],
-            ['\C-POL\ Non-dairy Young Cattle (younger than 1 year)', ['3.A.1.C-POL-c'], 6],
-            ['\C-POL\ Dairy Cattle', ['3.A.1.C-POL-d'], 6],
-            ['\C-POL\ Non-dairy Young Cattle (1-2 years)', ['3.A.1.C-POL-e'], 6],
-            # Slovenia
-            ['\C-SVN\ Other (as specified in table 3(I).A)', ['3.A.1.C'], 5],
-            ['\C-SVN\ Dairy cows', ['3.A.1.C-SVN-a'], 6],
-            ['\C-SVN\ Non-dairy cattle', ['3.A.1.C-SVN-b'], 6],
-            ['\C-SVN\ Other cows', ['3.A.1.C-SVN-c'], 6],
-            # USA
-            ['\C-USA\ Other (as specified in table 3(I).A)', ['3.A.1.C'], 5],
-            ['\C-USA\ Steer Stocker', ['3.A.1.C-USA-a'], 6],
-            ['\C-USA\ Heifer Stocker', ['3.A.1.C-USA-b'], 6],
-            ['\C-USA\ Beef Cows', ['3.A.1.C-USA-c'], 6],
-            ['\C-USA\ Dairy Replacements', ['3.A.1.C-USA-d'], 6],
-            ['\C-USA\ Beef Replacements', ['3.A.1.C-USA-e'], 6],
-            ['\C-USA\ Steer Feedlot', ['3.A.1.C-USA-f'], 6],
-            ['\C-USA\ Heifer Feedlot', ['3.A.1.C-USA-g'], 6],
-            ['\C-USA\ Bulls', ['3.A.1.C-USA-h'], 6],
-            ['\C-USA\ Dairy Cows', ['3.A.1.C-USA-i'], 6],
-            ['\C-USA\ Beef Calves', ['3.A.1.C-USA-j'], 6],
-            ['\C-USA\ Dairy Calves', ['3.A.1.C-USA-k'], 6],
-            # Other livestock
-            ['2. Sheep', ['3.A.2'], 3],
-            ['3. Swine', ['3.A.3'], 3],
-            ['4. Other livestock', ['3.A.4'], 3],
-            ['Buffalo', ['3.A.4.a'], 4],
-            ['Camels', ['3.A.4.b'], 4],
-            ['Deer', ['3.A.4.c'], 4],
-            ['Goats', ['3.A.4.d'], 4],
-            ['Horses', ['3.A.4.e'], 4],
-            ['Mules and Asses', ['3.A.4.f'], 4],
-            ['Poultry', ['3.A.4.g'], 4],
-            ['Other (please specify)', ['3.A.4.h'], 4],
-            ['Rabbit', ['3.A.4.h.i'], 5],
-            ['Reindeer', ['3.A.4.h.ii'], 5],
-            ['Ostrich', ['3.A.4.h.iii'], 5],
-            ['Fur-bearing Animals', ['3.A.4.h.iv'], 5],
-            ['Other', ['3.A.4.h.v'], 5],
-            # Manure Management
-            ['B. Manure management', ['3.B'], 2],
-            ['1. Cattle(1)', ['3.B.1'], 3],
-            ['Option A:', ['\IGNORE'], 4],
-            ['Dairy cattle', ['3.B.1.Aa'], 5],
-            ['Non-dairy cattle', ['3.B.1.Ab'], 5],
-            ['Option B:', ['\IGNORE'], 4],
-            ['Mature dairy cattle', ['3.B.1.Ba'], 5],
-            ['Other mature cattle', ['3.B.1.Bb'], 5],
-            ['Growing cattle', ['3.B.1.Bc'], 5],
-            ['Option C (country-specific):', ['\IGNORE'], 4],
-            # all countries not specified explicitly
-            ['\C!-AUS-MLT-LUX-POL-SVN-USA\ Other (as specified in table 3(I).B)', ['3.B.1.C'], 5],
-            # Australia
-            ['\C-AUS\ Other (as specified in table 3(I).B)', ['3.B.1.C'], 5],
-            ['\C-AUS\ Dairy Cattle', ['3.B.1.C-AUS-a'], 6],
-            ['\C-AUS\ Beef Cattle - Pasture', ['3.B.1.C-AUS-b'], 6],
-            ['\C-AUS\ Beef Cattle - Feedlot', ['3.B.1.C-AUS-c'], 6],
-            # Malta
-            ['\C-MLT\ Other (as specified in table 3(I).B)', ['3.B.1.C'], 5],
-            ['\C-MLT\ dairy cows', ['3.B.1.C-MLT-a'], 6],
-            ['\C-MLT\ non-lactating cows', ['3.B.1.C-MLT-b'], 6],
-            ['\C-MLT\ bulls', ['3.B.1.C-MLT-c'], 6],
-            ['\C-MLT\ calves', ['3.B.1.C-MLT-d'], 6],
-            ['\C-MLT\ growing cattle 1-2 years', ['3.B.1.C-MLT-e'], 6],
-            # Luxembourg
-            ['\C-LUX\ Other (as specified in table 3(I).B)', ['3.B.1.C'], 5],
-            ['\C-LUX\ Bulls', ['3.B.1.C-LUX-a'], 6],
-            ['\C-LUX\ Calves', ['3.B.1.C-LUX-b'], 6],
-            ['\C-LUX\ Young Cattle', ['3.B.1.C-LUX-c'], 6],
-            ['\C-LUX\ Suckler Cows', ['3.B.1.C-LUX-d'], 6],
-            ['\C-LUX\ Bulls under 2 years', ['3.B.1.C-LUX-e'], 6],
-            ['\C-LUX\ Dairy Cows', ['3.B.1.C-LUX-f'], 6],
-            # Poland
-            ['\C-POL\ Other (as specified in table 3(I).B)', ['3.B.1.C'], 5],
-            ['\C-POL\ Non-dairy Cattle', ['3.B.1.C-POL-a'], 6],
-            ['\C-POL\ Dairy Cattle', ['3.B.1.C-POL-b'], 6],
-            # Slovenia
-            ['\C-SVN\ Other (as specified in table 3(I).B)', ['3.B.1.C'], 5],
-            ['\C-SVN\ Dairy cows', ['3.B.1.C-SVN-a'], 6],
-            ['\C-SVN\ Non-dairy cattle', ['3.B.1.C-SVN-b'], 6],
-            ['\C-SVN\ Other cows', ['3.B.1.C-SVN-c'], 6],
-            # USA
-            ['\C-USA\ Other (as specified in table 3(I).B)', ['3.B.1.C'], 5],
-            ['\C-USA\ Dairy Cattle', ['\IGNORE'], 6],
-            ['\C-USA\ Non-Dairy Cattle', ['\IGNORE'], 6],
-            ['\C-USA\ Steer Stocker', ['3.B.1.C-USA-a'], 6],
-            ['\C-USA\ Heifer Stocker', ['3.B.1.C-USA-b'], 6],
-            ['\C-USA\ Beef Cows', ['3.B.1.C-USA-c'], 6],
-            ['\C-USA\ Dairy Replacements', ['3.B.1.C-USA-d'], 6],
-            ['\C-USA\ Beef Replacements', ['3.B.1.C-USA-e'], 6],
-            ['\C-USA\ Steer Feedlot', ['3.B.1.C-USA-f'], 6],
-            ['\C-USA\ Heifer Feedlot', ['3.B.1.C-USA-g'], 6],
-            ['\C-USA\ Bulls', ['3.B.1.C-USA-h'], 6],
-            ['\C-USA\ Dairy Cows', ['3.B.1.C-USA-i'], 6],
-            ['\C-USA\ Beef Calves', ['3.B.1.C-USA-j'], 6],
-            ['\C-USA\ Dairy Calves', ['3.B.1.C-USA-k'], 6],
-            # other animals
-            ['2. Sheep', ['3.B.2'], 3],
-            ['3. Swine', ['3.B.3'], 3],
-            ['4. Other livestock', ['3.B.4'], 3],
-            ['Buffalo', ['3.B.4.a'], 4],
-            ['Camels', ['3.B.4.b'], 4],
-            ['Deer', ['3.B.4.c'], 4],
-            ['Goats', ['3.B.4.d'], 4],
-            ['Horses', ['3.B.4.e'], 4],
-            ['Mules and Asses', ['3.B.4.f'], 4],
-            ['Poultry', ['3.B.4.g'], 4],
-            ['Other (please specify)', ['3.B.4.h'], 4],
-            ['Rabbit', ['3.B.4.h.i'], 5],
-            ['Reindeer', ['3.B.4.h.ii'], 5],
-            ['Ostrich', ['3.B.4.h.iii'], 5],
-            ['Fur-bearing Animals', ['3.B.4.h.iv'], 5],
-            ['Other', ['3.B.4.h.v'], 5],
-            ['5. Indirect N2O emissions', ['3.B.5'], 3],
-        ],
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    "Table3s2": {  # Agriculture summary sheet 2
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 18,
-            "header": ['entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND",
-            "categories": ["category"],
-            "cols_to_ignore": [],
-            "stop_cats": [".", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['C. Rice cultivation', ['3.C']],
-            ['D. Agricultural soils(2) (3) (4)', ['3.D']],
-            ['E. Prescribed burning of savannahs', ['3.E']],
-            ['E. Prescribed burning of savannas', ['3.E']],
-            ['F. Field burning of agricultural residues', ['3.F']],
-            ['G. Liming', ['3.G']],
-            ['H. Urea application', ['3.H']],
-            ['I. Other carbon-containing fertilizers', ['3.I']],
-            ['J. Other (please specify)', ['3.J']],
-            ['NOx from Manure Management', ['3.J.1']],
-            ['3.B NOx Emissions', ['3.J.1']],
-            ['NOx from 3B', ['3.J.1']],
-            ['NOX emissions from manure management', ['3.J.1']],
-            ['NOx from manure management', ['3.J.1']],
-            ['Other', ['3.J.2']],
-            ['Other UK emissions', ['3.J.2']],
-            ['Other non-specified', ['3.J.2']],
-            ['OTs and CDs - Livestock', ['3.J.3']],
-            ['OTs and CDs - soils', ['3.J.4']],
-            ['OTs and CDs - other', ['3.J.5']],
-            ['Digestate renewable raw material (storage of N)', ['3.J.6']],
-            ['Digestate renewable raw material (atmospheric deposition)', ['3.J.7']],
-            ['Digestate renewable raw material (storage of dry matter)', ['3.J.8']],
-            ['NOx from Livestock', ['3.J.9']],
-        ],
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    "Table3.C": {  # rice cultivation details
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 21,
-            "header": ['group', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [
-                'ACTIVITY DATA AND OTHER RELATED INFORMATION Harvested area(2)',
-                'ACTIVITY DATA AND OTHER RELATED INFORMATION Organic amendments added(3)',
-                'IMPLIED EMISSION FACTOR (1) CH4',
-            ],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['1. Irrigated', ['3.C.1']],
-            ['Continuously flooded', ['3.C.1.a']],
-            ['Intermittently flooded Single aeration', ['3.C.1.a.i']],
-            ['Intermittently flooded Multiple aeration', ['3.C.1.b.ii']],
-            ['2. Rainfed', ['3.C.2']],
-            ['Flood prone', ['3.C.2.a']],
-            ['Drought prone', ['3.C.2.b']],
-            ['3. Deep water', ['3.C.3']],
-            ['Water depth 50–100 cm', ['3.C.3.a']],
-            ['Water depth > 100 cm', ['3.C.3.b']],
-            ['4. Other (please specify)', ['3.C.4']],
-            ['Non-specified', ['3.C.4.a']],  # EST
-            ['Other', ['3.C.4.a']],  # DEU
-            ['other', ['3.C.4.a']],  # LVA
-            ['Other cultivation', ['3.C.4.a']],  # CZE
-            ['Upland rice(4)', ['\IGNORE']],
-            ['Total(4)', ['\IGNORE']],
-        ],
-        "entity_mapping": {
-            'EMISSIONS CH4': 'CH4',
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    "Table3.D": {  # direct and indirect N2O from soils
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 21,
-            "header": ['group', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [
-                "ACTIVITY DATA AND OTHER RELATED INFORMATION Description",
-                "ACTIVITY DATA AND OTHER RELATED INFORMATION Value",
-                "IMPLIED EMISSION FACTORS Value",
-            ],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['a. Direct N2O emissions from managed soils', ['3.D.a']],
-            ['1. Inorganic N fertilizers(3)', ['3.D.a.1']],
-            ['2. Organic N fertilizers(3)', ['3.D.a.2']],
-            ['a. Animal manure applied to soils', ['3.D.a.2.a']],
-            ['b. Sewage sludge applied to soils', ['3.D.a.2.b']],
-            ['c. Other organic fertilizers applied to soils', ['3.D.a.2.c']],
-            ['3. Urine and dung deposited by grazing animals', ['3.D.a.3']],
-            ['4. Crop residues', ['3.D.a.4']],
-            ['5. Mineralization/immobilization associated with loss/gain of soil organic matter (4)(5)', ['3.D.a.5']],
-            ['6. Cultivation of organic soils (i.e. histosols)(2)', ['3.D.a.6']],
-            ['7. Other', ['3.D.a.7']],
-            ['b. Indirect N2O Emissions from managed soils', ['3.D.b']],
-            ['1. Atmospheric deposition(6)', ['3.D.b.1']],
-            ['2. Nitrogen leaching and run-off', ['3.D.b.2']],
-        ],
-        "entity_mapping": {
-            'EMISSIONS N2O': 'N2O',
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    "Table3.E": {  # savanna burning details
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 14,
-            "header": ['group', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [
-                'ACTIVITY DATA AND OTHER RELATED INFORMATION Area of savanna burned',
-                'ACTIVITY DATA AND OTHER RELATED INFORMATION Average above-ground biomass density',
-                'ACTIVITY DATA AND OTHER RELATED INFORMATION Biomass burned',
-                'ACTIVITY DATA AND OTHER RELATED INFORMATION Fraction of savanna burned',
-                'ACTIVITY DATA AND OTHER RELATED INFORMATION Nitrogen fraction in biomass',
-                'IMPLIED EMISSION FACTORS CH4',
-                'IMPLIED EMISSION FACTORS N2O',
-            ],
-            "stop_cats": ["", ".", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['Forest land (specify ecological zone)(1)', ['3.E.1'], 0],
-            ['Savanna Grassland', ['3.E.1.b'], 1],  # AUS
-            ['Savanna Woodland', ['3.E.1.a'], 1],  # AUS
-            ['Forest land', ['3.E.1.a'], 1],  # SWE, CHE, CZE, HRV
-            ['Luxembourg', ['3.E.1.c'], 1],  # LUX
-            ['Other non-specified', ['3.E.1.d'], 1],  # EST
-            ['All', ['3.E.1.d'], 1],  # DNK, DNM, DKE
-            ['Unspecified', ['3.E.1.d'], 1],  # DEU
-            ['forest land', ['3.E.1.a'], 1],  # MLT
-            ['Zone', ['3.E.1.d'], 1],  # LVA
-            ['Grassland (specify ecological zone)(1)', ['3.E.2'], 0],
-            ['Savanna Woodland', ['3.E.2.a'], 1],  # AUS
-            ['Savanna Grassland', ['3.E.2.b'], 1],  # AUS
-            ['Temperate Grassland', ['3.E.2.c'], 1],  # AUS
-            ['Grassland', ['3.E.2.d'], 1],  # SWE, CHE, CZE, HRV
-            ['Luxembourg', ['3.E.2.e'], 1],  # LUX
-            ['Other non-specified', ['3.E.2.f'], 1],  # EST
-            ['All', ['3.E.2.f'], 1],  # DNK, DNM, DKE
-            ['Unspecified', ['3.E.2.f'], 1],  # DEU
-            ['Tussock', ['3.E.2.g'], 1],  # NZL
-            ['grassland', ['3.E.2.d'], 1],  # MLT
-            ['Zone_', ['3.E.2.f'], 1],  # LVA
-        ],
-        "entity_mapping": {
-            'EMISSIONS (2) CH4': 'CH4',
-            'EMISSIONS (2) N2O': 'N2O',
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    "Table3.F": {  # field burning details
-        "status": "TODO",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 30,
-            "header": ['group', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-
-        ],
-        "entity_mapping": [],
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # TODO
-    "Table3.G-I": {  # liming, urea, carbon containing fertilizer
-        "status": "TODO",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 13,
-            "header": ['group', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-
-        ],
-        "entity_mapping": [],
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # TODO
-    "Table4": {  # LULUCF overview
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 29,
-            "header": ['entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [],
-            "stop_cats": ["", ".", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['4. Total LULUCF', ['4']],
-            ['A. Forest land', ['4.A']],
-            ['1. Forest land remaining forest land', ['4.A.1']],
-            ['2. Land converted to forest land', ['4.A.2']],
-            ['B. Cropland', ['4.B']],
-            ['1. Cropland remaining cropland', ['4.B.1']],
-            ['2. Land converted to cropland', ['4.B.2']],
-            ['C. Grassland', ['4.C']],
-            ['1. Grassland remaining grassland', ['4.C.1']],
-            ['2. Land converted to grassland', ['4.C.2']],
-            ['D. Wetlands(3)', ['4.D']],
-            ['1. Wetlands remaining wetlands', ['4.D.1']],
-            ['2. Land converted to wetlands', ['4.D.2']],
-            ['E. Settlements', ['4.E']],
-            ['1. Settlements remaining settlements', ['4.E.1']],
-            ['2. Land converted to settlements', ['4.E.2']],
-            ['F. Other land (4)', ['4.F']],
-            ['1. Other land remaining other land', ['4.F.1']],
-            ['2. Land converted to other land', ['4.F.2']],
-            ['G. Harvested wood products (5)', ['4.G']],
-            ['H. Other (please specify)', ['4.H']],
-            ['Land converted to Settlement', ['4.H.1']],
-            ['Reservoir of Petit-Saut in French Guiana', ['4.H.5']],
-            ['Biogenic NMVOCs from managed forest', ['4.H.4']],
-            ['All other', ['4.H.9']],
-            ['Luxembourg', ['4.H.8']],
-            ['Settlements Remaining Settlements', ['4.H.2']],
-            ['4.E Settlements', ['4.H.2']],
-            ['4.C Grassland', ['4.H.3']],
-            ['Settlements', ['4.H.2']],
-            ['Other', ['4.H.9']],
-            ['N2O Emissions from Aquaculture Use', ['4.H.6']],
-            ['CH4 from artificial water bodies', ['4.H.7']],
-        ],
-        "entity_mapping": {
-            'CH4(2)': 'CH4',
-            'N2O(2)': 'N2O',
-            'Net CO2 emissions/removals(1), (2)': 'CO2',
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    # TODO: all other LULUCF tables
-    "Table5": {  # Waste overview
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 27,
-            "header": ['entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['Total waste', ['5']],
-            ['A. Solid waste disposal', ['5.A']],
-            ['1. Managed waste disposal sites', ['5.A.1']],
-            ['2. Unmanaged waste disposal sites', ['5.A.2']],
-            ['3. Uncategorized waste disposal sites', ['5.A.3']],
-            ['B. Biological treatment of solid waste', ['5.B']],
-            ['1. Composting', ['5.B.1']],
-            ['2. Anaerobic digestion at biogas facilities', ['5.B.2']],
-            ['C. Incineration and open burning of waste', ['5.C']],
-            ['1. Waste incineration', ['5.C.1']],
-            ['2. Open burning of waste', ['5.C.2']],
-            ['D. Wastewater treatment and discharge', ['5.D']],
-            ['1. Domestic wastewater', ['5.D.1']],
-            ['2. Industrial wastewater', ['5.D.2']],
-            ['3. Other (as specified in table 5.D)', ['5.D.3']],
-            ['E. Other (please specify)', ['5.E']],
-            ['Other', ['5.E.5']],  # EST, NOR
-            ['Recycling activities', ['5.E.1']],  # NLD
-            ['Mechanical-Biological Treatment MBT', ['5.E.2']],  # DEU
-            ['Accidental fires', ['5.E.3']],  # DEU, DKE, DNK, DNM
-            ['Decomposition of Petroleum-Derived Surfactants', ['5.E.4']],  # JPN
-            ['Other non-specified', ['5.E.5']],  # USA
-            ['Biogas burning without energy recovery', ['5.E.6']],  # PRT
-            ['Sludge spreading', ['5.E.7']],  # ESP
-            ['Accidental combustion', ['5.E.3']],  # ESP
-            ['Other waste', ['5.E.5']],  # CZE
-            ['Memo item:(2)', ['\\IGNORE']],
-            ['Long-term storage of C in waste disposal sites', ['M.Memo.LTSW']],
-            ['Annual change in total long-term C storage', ['M.Memo.ACLT']],
-            ['Annual change in total long-term C storage in HWP waste(3)', ['M.Memo.ACLTHWP']],
-        ],
-        "entity_mapping": {
-            'CO2(1)': 'CO2',
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested; memo items not read because of empty lines
-    "Table5.A": {  # solid waste disposal
-        "status": "tested",
-        "table": {
-            "firstrow": 6,
-            "lastrow": 15,
-            "header": ['group', 'group', 'entity', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [
-                'ACTIVITY DATA AND OTHER RELATED INFORMATION SINK CATEGORIES Annual waste at the SWDS',
-                'ACTIVITY DATA AND OTHER RELATED INFORMATION SINK CATEGORIES MCF',
-                'ACTIVITY DATA AND OTHER RELATED INFORMATION SINK CATEGORIES DOCf',
-                'IMPLIED EMISSION FACTOR SINK CATEGORIES CH4(1)',
-                'IMPLIED EMISSION FACTOR SINK CATEGORIES CO2',
-                'EMISSIONS SINK CATEGORIES CH4 Amount of CH4 flared',
-                'EMISSIONS SINK CATEGORIES CH4 Amount of CH4 for energy recovery(3)',
-            ],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['1. Managed waste disposal sites', ['5.A.1']],
-            ['a. Anaerobic', ['5.A.1.a']],
-            ['b. Semi-aerobic', ['5.A.1.b']],
-            ['2. Unmanaged waste disposal sites', ['5.A.2']],
-            ['3. Uncategorized waste disposal sites', ['5.A.3']],
-        ],
-        "entity_mapping": {
-            'EMISSIONS SINK CATEGORIES CH4 Emissions(2)': 'CH4',
-            'EMISSIONS SINK CATEGORIES CO2(4) Amount of CH4 for energy recovery(3)': 'CO2',
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    "Table5.B": {  # Biological treatment of solid waste
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 16,
-            "header": ['group', 'entity', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [
-                'ACTIVITY DATA AND OTHER RELATED INFORMATION Annual waste amount treated',
-                'IMPLIED EMISSION FACTOR CH4(1)',
-                'IMPLIED EMISSION FACTOR N2O',
-                'EMISSIONS CH4 Amount of CH4 flared',
-                'EMISSIONS CH4 Amount of CH4 for energy recovery(3)',
-            ],
-            "stop_cats": [".", "", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['1. Composting', ['5.B.1'], 0],
-            ['Municipal solid waste', ['5.B.1.a'], 1],
-            ['Other (please specify)(4)', ['5.B.1.b'], 1],
-            ['Organic wastes households', ['5.B.1.b.i'], 2],  # NLD
-            ['Organic wastes from gardens and horticulture', ['5.B.1.b.ii'], 2],  # NLD
-            ['Food and garden waste', ['5.B.1.b.ii'], 2],  # DNM, DNK, DKE
-            ['Industrial Solid Waste', ['5.B.1.b.iii'], 2],  # POL
-            ['Home composting', ['5.B.1.b.iv'], 2],  # NOR
-            ['Mixed waste', ['5.B.1.b.v'], 2],  # LTU
-            ['Other waste', ['5.B.1.b.v'], 2],  # SWE
-            ['Sludge', ['5.B.1.b.vi'], 2],  # HUN, EST
-            ['Textile', ['5.B.1.b.vii'], 2],  # EST
-            ['Wood', ['5.B.1.b.viii'], 2],  # EST
-            ['Organic', ['5.B.1.b.ix'], 2],  # EST
-            ['Paper', ['5.B.1.b.x'], 2],  # EST
-            ['Other_SW', ['5.B.1.b.v'], 2],  # CZE
-            ['MBA treated MSW', ['5.B.1.b.xi'], 2],  # LUX
-            ['Specific Agricultural and Industrial Waste', ['5.B.1.b.xii'], 2],  # UKR
-            ['Industrial solid waste and constr. waste', ['5.B.1.b.xiii'], 2],  # FIN
-            ['Municipal sludge', ['5.B.1.b.xiv'], 2],  # FIN
-            ['Industrial sludge', ['5.B.1.b.xv'], 2],  # FIN
-            ['Open air composting', ['5.B.1.b.xvi'], 2],  # LIE
-            ['Industrial Waste', ['5.B.1.b.xvii'], 2],  # JPN
-            ['Human Waste and Johkasou sludge', ['5.B.1.b.xviii'], 2],  # JPN
-            ['2. Anaerobic digestion at biogas facilities(3)', ['5.B.2'], 0],
-            ['Municipal solid waste', ['5.B.2.a'], 1],
-            ['Other (please specify)(4)', ['5.B.2.b'], 1],
-            ['Organic wastes households', ['5.B.2.b.i'], 2],  # NLD
-            ['Organic wastes from gardens and horticulture', ['5.B.2.b.ii'], 2],  # NLD
-            ['Animal manure and other organic waste', ['5.B.2.b.iii'], 2],  # DNM, DNK, DKE
-            ['sewage sludge', ['5.B.2.b.iv'], 2],  # LTU
-            ['Other waste', ['5.B.2.b.v'], 2],  # SWE
-            ['Agricultural biogas facilities', ['5.B.2.b.vi'], 2],  # CHE
-            ['Other biogases from anaerobic fermentation', ['5.B.2.b.vii'], 2],  # HUN
-            ['Sludge', ['5.B.2.b.iv'], 2],  # EST
-            ['Anaerobic Digestion On-Farm and at Wastewater Treatment Facilities', ['5.B.2.b.viii'], 2],  # USA
-            ['Other_AD', ['5.B.2.b.v'], 2],  # CZE
-            ['Biogenic waste incl. wastes from Agriculture (manure)', ['5.B.2.b.ix'], 2],  # LUX
-            ['Industrial solid waste and constr. waste', ['5.B.2.b.x'], 2],  # FIN
-            ['Municipal sludge', ['5.B.2.b.xi'], 2],  # FIN
-            ['Industrial sludge', ['5.B.2.b.xii'], 2],  # FIN
-        ],
-        "entity_mapping": {
-            'EMISSIONS CH4 Emissions(2)': 'CH4',
-            'EMISSIONS N2O Amount of CH4 for energy recovery(3)': 'N2O',
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    "Table5.C": {  # Waste incineration and open burning
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 38,
-            "header": ['group', 'group', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [
-                'ACTIVITY DATA Amount of wastes (incinerated/open burned)',
-                'IMPLIED EMISSION FACTOR Amount of wastes (incinerated/open burned) CO2',
-                'IMPLIED EMISSION FACTOR Amount of wastes (incinerated/open burned) CH4',
-                'IMPLIED EMISSION FACTOR Amount of wastes (incinerated/open burned) N2O',
-            ],
-            "stop_cats": [".", "", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['1. Waste Incineration', ['5.C.1'], 0],
-            ['Biogenic (1)', ['5.C.1.a'], 1],
-            ['Municipal solid waste', ['5.C.1.a.i'], 2],
-            ['Other (please specify)(2)', ['5.C.1.a.ii'], 2],
-            ['Industrial Solid Wastes', ['5.C.1.a.ii.1'], 3],
-            ['Hazardous Waste', ['5.C.1.a.ii.2'], 3],
-            ['Clinical Waste', ['5.C.1.a.ii.3'], 3],
-            ['Sewage Sludge', ['5.C.1.a.ii.4'], 3],
-            ['Other (please specify)', ['5.C.1.a.ii.5'], 3],
-            ['Animal cremations', ['5.C.1.a.ii.5.a'], 4],  # DKE, DNK, DNM
-            ['Human cremations', ['5.C.1.a.ii.5.b'], 4],  # DKE, DNK, DNM
-            ['Cremation', ['5.C.1.a.ii.5.c'], 4],  # CHE, NOR, FRA, FRK
-            ['cremation', ['5.C.1.a.ii.5.c'], 4],  # DEU
-            ['Industrial waste', ['5.C.1.a.ii.5.d'], 4],  # NOR
-            ['Biogenic other waste', ['5.C.1.a.ii.5.e'], 4],  # EST
-            ['Biogenic waste other than Municipal Solid Waste', ['5.C.1.a.ii.5.e'], 4],  # ROU
-            ['Sludge', ['5.C.1.a.ii.5.f'], 4],  # JPN
-            ['Non-fossile liquid waste', ['5.C.1.a.ii.5.g'], 4],  # JPN
-            ['Non-biogenic', ['5.C.1.b'], 1],
-            ['Municipal solid waste', ['5.C.1.b.i'], 2],
-            ['Other (please specify)(3)', ['5.C.1.b.ii'], 2],
-            ['Industrial Solid Wastes', ['5.C.1.b.ii.1'], 3],
-            ['Hazardous Waste', ['5.C.1.b.ii.2'], 3],
-            ['Clinical Waste', ['5.C.1.b.ii.3'], 3],
-            ['Sewage Sludge', ['5.C.1.b.ii.4'], 3],
-            ['Fossil liquid waste', ['5.C.1.b.ii.5'], 3],
-            ['Other (please specify)', ['5.C.1.b.ii.6'], 3],
-            ['Quarantine and other waste', ['5.C.1.b.ii.6.a'], 4],  # NZL
-            ['Industrial waste', ['5.C.1.b.ii.6.b'], 4],  # CHE
-            ['Chemical waste', ['5.C.1.b.ii.6.c'], 4],  # GBR, GBK
-            ['Flaring in the chemical industry', ['5.C.1.a.ii.6.d'], 4],  # BEL
-            ['Sludge', ['5.C.1.a.ii.6.e'], 4],  # JPN
-            ['Solvents', ['5.C.1.a.ii.6.f'], 4],  # GRC, AUS
-            ['2. Open burning of waste', ['5.C.2'], 0],
-            ['Biogenic (1)', ['5.C.2.a'], 1],
-            ['Municipal solid waste', ['5.C.2.a.i'], 2],
-            ['Other (please specify)', ['5.C.2.a.ii'], 2],
-            ['agricultural waste', ['5.C.2.a.ii.1'], 3],  # ITA
-            ['Agricultural residues', ['5.C.2.a.ii.1'], 3],  # ESP
-            ['Natural residues', ['5.C.2.a.ii.2'], 3],  # CHE
-            ['Wood waste', ['5.C.2.a.ii.3'], 3],  # GBR, GBK
-            ['Bonfires etc.', ['5.C.2.a.ii.4'], 3],  # DEU
-            ['Bonfires', ['5.C.2.a.ii.4'], 3],  # NLD, ISL
-            ['Other', ['5.C.2.a.ii.5'], 3],  # EST
-            ['Other waste', ['5.C.2.a.ii.5'], 3],  # CZE
-            ['Industrial Solid Waste', ['5.C.2.a.ii.6'], 3],  # JPN
-            ['Non-biogenic', ['5.C.2.b'], 1],
-            ['Municipal solid waste', ['5.C.2.b.i'], 2],
-            ['Other (please specify)', ['5.C.2.b.ii'], 2],
-            ['Rural waste', ['5.C.2.b.ii.1'], 3],  # NZL
-            ['Accidental fires (vehicles)', ['5.C.2.b.ii.2'], 3],  # GBR, GBK
-            ['Accidental fires (buildings)', ['5.C.2.b.ii.3'], 3],  # GBR, GBK
-            ['Bonfires', ['5.C.2.b.ii.4'], 3],  # ISL
-            ['Other', ['5.C.2.b.ii.5'], 3],  # EST
-            ['Other waste', ['5.C.2.b.ii.5'], 3],  # CZE
-            ['Industrial Solid Waste', ['5.C.2.b.ii.6'], 3],  # JPN
-        ],
-        "entity_mapping": {
-            'EMISSIONS Amount of wastes (incinerated/open burned) CH4': 'CH4',
-            'EMISSIONS Amount of wastes (incinerated/open burned) CO2': 'CO2',
-            'EMISSIONS Amount of wastes (incinerated/open burned) N2O': 'N2O',
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    "Table5.D": {  # Wastewater treatment and discharge
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 13,
-            "header": ['group', 'entity', 'entity', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [
-                'ACTIVITY DATA AND RELATED INFORMATION Total organic product',
-                'ACTIVITY DATA AND RELATED INFORMATION Sludge removed(1)',
-                'ACTIVITY DATA AND RELATED INFORMATION Sludge removed(1) N in effluent',
-                'IMPLIED EMISSION FACTOR CH4(2) N in effluent',
-                'IMPLIED EMISSION FACTOR N2O(3) N in effluent',
-                'EMISSIONS CH4 Amount of CH4 flared',
-                'EMISSIONS CH4 Amount of CH4 for Energy Recovery(5)',
-            ],
-            "stop_cats": [".", "", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['1. Domestic wastewater', ['5.D.1']],
-            ['2. Industrial wastewater', ['5.D.2']],
-            ['3. Other (please specify)', ['5.D.3']],
-            ['Other', ['5.D.3.a']],  # EST
-            ['Septic tanks', ['5.D.3.b']],  # NLD
-            ['Wastewater Effluent', ['5.D.3.c']],  # NLD
-            ['Fish farming', ['5.D.3.d']],  # FIN
-            ['Uncategorized wastewater', ['5.D.3.a']],  # CZE
-        ],
-        "entity_mapping": {
-            'EMISSIONS CH4 Emissions(4)': 'CH4',
-            'EMISSIONS N2O(3) Amount of CH4 for Energy Recovery(5)': 'N2O',
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-}
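
The reader code that consumes these specification dictionaries is not part of this hunk, so the following is only a sketch under stated assumptions, not the project's implementation. It illustrates one plausible way a `sector_mapping` list with repeated row labels (for example 'Municipal solid waste' under both 5.B.1 and 5.B.2) could be resolved: by walking the mapping in table order and stopping at the first `stop_cats` entry. The names `is_stop` and `map_rows` and the trimmed `spec` are hypothetical.

```python
import math

# Hypothetical, trimmed-down entry modeled on "Table5.B" above.
spec = {
    "stop_cats": [".", "", float("nan")],
    "sector_mapping": [
        ["1. Composting", ["5.B.1"], 0],
        ["Municipal solid waste", ["5.B.1.a"], 1],
        ["2. Anaerobic digestion at biogas facilities(3)", ["5.B.2"], 0],
        ["Municipal solid waste", ["5.B.2.a"], 1],
    ],
}


def is_stop(label, stop_cats):
    """Return True if label is a stop marker; NaN never equals itself, so test it separately."""
    if isinstance(label, float) and math.isnan(label):
        return any(isinstance(s, float) and math.isnan(s) for s in stop_cats)
    return label in stop_cats


def map_rows(row_labels, spec):
    """Assign category codes to row labels by walking the mapping in order (assumed logic)."""
    mapping = spec["sector_mapping"]
    pos = 0
    result = []
    for label in row_labels:
        if is_stop(label, spec["stop_cats"]):
            break  # an empty or NaN row ends the table
        # Search forward only, so repeated labels resolve by their position.
        for idx in range(pos, len(mapping)):
            if mapping[idx][0] == label:
                result.append((label, mapping[idx][1][0]))
                pos = idx + 1
                break
        else:
            raise KeyError(f"no mapping for row label {label!r}")
    return result


rows = [
    "1. Composting",
    "Municipal solid waste",
    "2. Anaerobic digestion at biogas facilities(3)",
    "Municipal solid waste",
    "",
    "ignored after stop",
]
codes = map_rows(rows, spec)
```

In this sketch the second 'Municipal solid waste' row maps to 5.B.2.a only because the search resumes after the previous match. The third element of each mapping entry (the level) is carried along but not used here.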

+ 0 - 2666
UNFCCC_GHG_data/UNFCCC_CRF_reader/crf_specifications/CRF2022_specification.py

@@ -1,2666 +0,0 @@
-""" CRF2022 specification.
-Currently not all tables are included. Extend if you need all country
-specific items in categories 2, 3.H-G, 4
-
-tables included:
-* Energy
-    'Table1s1', Table1s2',
-    'Table1.A(a)s1', 'Table1.A(a)s2', 'Table1.A(a)s3', 'Table1.A(a)s4',
-    'Table1.B.1', 'Table1.B.2', 'Table1.C', 'Table1.D',
-* Industrial processes
-    'Table2(I)s1', 'Table2(I)s2',
-    'Table2(II)',
-* Agriculture
-    'Table3s1', 'Table3s2',
-    'Table3.C', 'Table3.D', 'Table3.E',
-* LULUCF
-    'Table4',
-* Waste
-    'Table5', 'Table5.A', 'Table5.B', 'Table5.C', 'Table5.D'
-
-missing tables are:
-* Energy
-    'Table1.D'
-* Industrial processes
-    'Table2(I).A-Hs1', 'Table2(I).A-Hs2',
-    'Table2(II)B-Hs1', 'Table2(II)B-Hs2',
-* Agriculture
-    'Table3.As1', 'Table3.As2' (no additional emissions data)
-    'Table3.F', 'Table3.G-I',
-* LULUCF
-    All tables except Table4
-* Waste
-    All tables read
-
-TODO:
-* Add missing tables
-* Add activity data
-
-"""
-
-import numpy as np
-from .util import unit_info
-
-CRF2022 = {
-    "Table1s1": {
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 26,
-            "header": ['entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['Total Energy', ['1']],
-            ['A. Fuel combustion activities (sectoral approach)', ['1.A']],
-            ['1. Energy industries', ['1.A.1']],
-            ['a. Public electricity and heat production', ['1.A.1.a']],
-            ['b. Petroleum refining', ['1.A.1.b']],
-            ['c. Manufacture of solid fuels and other energy industries', ['1.A.1.c']],
-            ['2. Manufacturing industries and construction', ['1.A.2']],
-            ['a. Iron and steel', ['1.A.2.a']],
-            ['b. Non-ferrous metals', ['1.A.2.b']],
-            ['c. Chemicals', ['1.A.2.c']],
-            ['d. Pulp, paper and print', ['1.A.2.d']],
-            ['e. Food processing, beverages and tobacco', ['1.A.2.e']],
-            ['f. Non-metallic minerals', ['1.A.2.f']],
-            ['g. Other (please specify)', ['1.A.2.g']],
-            ['3. Transport', ['1.A.3']],
-            ['a. Domestic aviation', ['1.A.3.a']],
-            ['b. Road transportation', ['1.A.3.b']],
-            ['c. Railways', ['1.A.3.c']],
-            ['d. Domestic navigation', ['1.A.3.d']],
-            ['e. Other transportation', ['1.A.3.e']],
-        ],
-        "entity_mapping": {
-            "NOX": "NOx",
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    "Table1s2": {
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 36,
-            "header": ['entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['4. Other sectors', ['1.A.4']],
-            ['a. Commercial/institutional', ['1.A.4.a']],
-            ['b. Residential', ['1.A.4.b']],
-            ['c. Agriculture/forestry/fishing', ['1.A.4.c']],
-            ['5. Other (as specified in table 1.A(a) sheet 4)', ['1.A.5']],
-            ['a. Stationary', ['1.A.5.a']],
-            ['b. Mobile', ['1.A.5.b']],
-            ['B. Fugitive emissions from fuels', ['1.B']],
-            ['1. Solid fuels', ['1.B.1']],
-            ['a. Coal mining and handling', ['1.B.1.a']],
-            ['b. Solid fuel transformation', ['1.B.1.b']],
-            ['c. Other (as specified in table 1.B.1)', ['1.B.1.c']],
-            ['2. Oil and natural gas and other emissions from energy production', ['1.B.2']],
-            ['a. Oil', ['1.B.2.a']],
-            ['b. Natural gas', ['1.B.2.b']],
-            ['c. Venting and flaring', ['1.B.2.c']],
-            ['d. Other (as specified in table 1.B.2)', ['1.B.2.d']],
-            ['C. CO2 Transport and storage', ['1.C']],
-            ['1. Transport of CO2', ['1.C.1']],
-            ['2. Injection and storage', ['1.C.2']],
-            ['3. Other', ['1.C.3']],
-            ['Memo items: (1)', ['\\IGNORE']],
-            ['International bunkers', ['M.Memo.Int']],
-            ['Aviation', ['M.Memo.Int.Avi']],
-            ['Navigation', ['M.Memo.Int.Mar']],
-            ['Multilateral operations', ['M.Memo.Mult']],
-            ['CO2 emissions from biomass', ['M.Memo.Bio']],
-            ['CO2 captured', ['M.Memo.CO2Cap']],
-            ['For domestic storage', ['M.Memo.CO2Cap.Dom']],
-            ['For storage in other countries', ['M.Memo.CO2Cap.Exp']],
-        ],
-        "entity_mapping": {
-            "NOX": "NOx",
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    "Table1.A(a)s1": {
-        "status": "tested",
-        "table": {
-            "firstrow": 6,
-            "lastrow": 104,  # template, countries report less
-            # check the resulting data as the templates have nan rows
-            # which would stop the reading process (actual reported
-            # data does not seem to have the nan rows)
-            "header": ['group', 'entity', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category", "class"],
-            "cols_to_ignore": [
-                'AGGREGATE ACTIVITY DATA Consumption',
-                'AGGREGATE ACTIVITY DATA Consumption',
-                'IMPLIED EMISSION FACTORS CO2(1)',
-                'IMPLIED EMISSION FACTORS CH4',
-                'IMPLIED EMISSION FACTORS N2O',
-                'EMISSIONS CO2 Amount captured'
-            ],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['1.A. Fuel combustion', ['1.A', 'Total'], 0],
-            ['Liquid fuels', ['1.A', 'Liquid'], 1],
-            ['Solid fuels', ['1.A', 'Solid'], 1],
-            ['Gaseous fuels', ['1.A', 'Gaseous'], 1],
-            ['Other fossil fuels(4)', ['1.A', 'OtherFF'], 1],
-            ['Peat(5)', ['1.A', 'Peat'], 1],
-            ['Biomass(6)', ['1.A', 'Biomass'], 1],
-            # 1.A.1. Energy industries
-            ['1.A.1. Energy industries', ['1.A.1', 'Total'], 1],
-            ['Liquid fuels', ['1.A.1', 'Liquid'], 2],
-            ['Solid fuels', ['1.A.1', 'Solid'], 2],
-            ['Gaseous fuels', ['1.A.1', 'Gaseous'], 2],
-            ['Other fossil fuels(4)', ['1.A.1', 'OtherFF'], 2],
-            ['Peat(5)', ['1.A.1', 'Peat'], 2],
-            ['Biomass(6)', ['1.A.1', 'Biomass'], 2],
-            # a. Public electricity and heat production
-            ['a. Public electricity and heat production(7)', ['1.A.1.a', 'Total'], 2],
-            ['Liquid fuels', ['1.A.1.a', 'Liquid'], 3],
-            ['Solid fuels', ['1.A.1.a', 'Solid'], 3],
-            ['Gaseous fuels', ['1.A.1.a', 'Gaseous'], 3],
-            ['Other fossil fuels(4)', ['1.A.1.a', 'OtherFF'], 3],
-            ['Peat(5)', ['1.A.1.a', 'Peat'], 3],
-            ['Biomass(6)', ['1.A.1.a', 'Biomass'], 3],
-            # 1.A.1.a.i Electricity Generation
-            ['1.A.1.a.i Electricity Generation', ['1.A.1.a.i', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.1.a.i', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.1.a.i', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.1.a.i', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.1.a.i', 'OtherFF'], 4],
-            ['Peat', ['1.A.1.a.i', 'Peat'], 4],
-            ['Biomass', ['1.A.1.a.i', 'Biomass'], 4],
-            # 1.A.1.a.ii Combined heat and power generation
-            ['1.A.1.a.ii Combined heat and power generation', ['1.A.1.a.ii', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.1.a.ii', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.1.a.ii', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.1.a.ii', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.1.a.ii', 'OtherFF'], 4],
-            ['Peat', ['1.A.1.a.ii', 'Peat'], 4],
-            ['Biomass', ['1.A.1.a.ii', 'Biomass'], 4],
-            # 1.A.1.a.iii heat plants
-            ['1.A.1.a.iii Heat plants', ['1.A.1.a.iii', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.1.a.iii', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.1.a.iii', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.1.a.iii', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.1.a.iii', 'OtherFF'], 4],
-            ['Peat', ['1.A.1.a.iii', 'Peat'], 4],
-            ['Biomass', ['1.A.1.a.iii', 'Biomass'], 4],
-            # 1.A.1.a.iv Other (please specify)
-            ['1.A.1.a.iv Other (please specify)', ['1.A.1.a.iv', 'Total'], 3],
-            # AUT
-            ['Total Public Electricity and Heat Production', ['1.A.1.a.iv.4', 'Total'], 4],
-            ['Liquid Fuels', ['1.A.1.a.iv.4', 'Liquid'], 5],
-            ['Solid Fuels', ['1.A.1.a.iv.4', 'Solid'], 5],
-            ['Gaseous Fuels', ['1.A.1.a.iv.4', 'Gaseous'], 5],
-            ['Other Fossil Fuels', ['1.A.1.a.iv.4', 'OtherFF'], 5],
-            ['Peat', ['1.A.1.a.iv.4', 'Peat'], 5],
-            ['Biomass', ['1.A.1.a.iv.4', 'Biomass'], 5],
-            # DEU
-            ['1.A.1.a Public Electricity and Heat Production', ['1.A.1.a.iv.4', 'Total'], 4],
-            ['Liquid Fuels', ['1.A.1.a.iv.4', 'Liquid'], 5],
-            ['Solid Fuels', ['1.A.1.a.iv.4', 'Solid'], 5],
-            ['Gaseous Fuels', ['1.A.1.a.iv.4', 'Gaseous'], 5],
-            ['Other Fossil Fuels', ['1.A.1.a.iv.4', 'OtherFF'], 5],
-            ['Peat', ['1.A.1.a.iv.4', 'Peat'], 5],
-            ['Biomass', ['1.A.1.a.iv.4', 'Biomass'], 5],
-            # ESP
-            ['Other', ['1.A.1.a.iv.3', 'Total'], 4],
-            ['Liquid Fuels', ['1.A.1.a.iv.3', 'Liquid'], 5],
-            ['Solid Fuels', ['1.A.1.a.iv.3', 'Solid'], 5],
-            ['Gaseous Fuels', ['1.A.1.a.iv.3', 'Gaseous'], 5],
-            ['Other Fossil Fuels', ['1.A.1.a.iv.3', 'OtherFF'], 5],
-            ['Peat', ['1.A.1.a.iv.3', 'Peat'], 5],
-            ['Biomass', ['1.A.1.a.iv.3', 'Biomass'], 5],
-            # SVK
-            ['Methane Cogeneration (Mining)', ['1.A.1.a.iv.1', 'Total'], 4],
-            ['Other Fossil Fuels', ['1.A.1.a.iv.1', 'OtherFF'], 5],
-            ['Municipal Solid Waste Incineration (Energy use)', ['1.A.1.a.iv.2', 'Total'], 4],
-            ['Other Fossil Fuels', ['1.A.1.a.iv.2', 'OtherFF'], 5],
-            ['Biomass', ['1.A.1.a.iv.2', 'Biomass'], 5],
-            # CHE
-            ['Municipal and special waste incineration plants', ['1.A.1.a.iv.2', 'Total'], 4],
-            ['Other Fossil Fuels', ['1.A.1.a.iv.2', 'OtherFF'], 5],
-            ['Biomass', ['1.A.1.a.iv.2', 'Biomass'], 5],
-            # b. Petroleum refining
-            ['b. Petroleum refining', ['1.A.1.b', 'Total'], 2],
-            ['Liquid fuels', ['1.A.1.b', 'Liquid'], 3],
-            ['Solid fuels', ['1.A.1.b', 'Solid'], 3],
-            ['Gaseous fuels', ['1.A.1.b', 'Gaseous'], 3],
-            ['Other fossil fuels(4)', ['1.A.1.b', 'OtherFF'], 3],
-            ['Peat(5)', ['1.A.1.b', 'Peat'], 3],
-            ['Biomass(6)', ['1.A.1.b', 'Biomass'], 3],
-            # c. Manufacture of solid fuels and other energy industries
-            ['c. Manufacture of solid fuels and other energy industries(8)', ['1.A.1.c', 'Total'], 2],
-            ['Liquid fuels', ['1.A.1.c', 'Liquid'], 3],
-            ['Solid fuels', ['1.A.1.c', 'Solid'], 3],
-            ['Gaseous fuels', ['1.A.1.c', 'Gaseous'], 3],
-            ['Other fossil fuels(4)', ['1.A.1.c', 'OtherFF'], 3],
-            ['Peat(5)', ['1.A.1.c', 'Peat'], 3],
-            ['Biomass(6)', ['1.A.1.c', 'Biomass'], 3],
-            # 1.A.1.c.i Manufacture of solid fuels
-            ['1.A.1.c.i Manufacture of solid fuels', ['1.A.1.c.i', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.1.c.i', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.1.c.i', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.1.c.i', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.1.c.i', 'OtherFF'], 4],
-            ['Peat', ['1.A.1.c.i', 'Peat'], 4],
-            ['Biomass', ['1.A.1.c.i', 'Biomass'], 4],
-            # 1.A.1.c.ii Oil and gas extraction
-            ['1.A.1.c.ii Oil and gas extraction', ['1.A.1.c.ii', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.1.c.ii', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.1.c.ii', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.1.c.ii', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.1.c.ii', 'OtherFF'], 4],
-            ['Peat', ['1.A.1.c.ii', 'Peat'], 4],
-            ['Biomass', ['1.A.1.c.ii', 'Biomass'], 4],
-            # 1.A.1.c.iii Other energy industries
-            ['1.A.1.c.iii Other energy industries', ['1.A.1.c.iii', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.1.c.iii', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.1.c.iii', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.1.c.iii', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.1.c.iii', 'OtherFF'], 4],
-            ['Peat', ['1.A.1.c.iii', 'Peat'], 4],
-            ['Biomass', ['1.A.1.c.iii', 'Biomass'], 4],
-            # 1.A.1.c.iv Other (please specify)
-            ['1.A.1.c.iv Other (please specify)', ['1.A.1.c.iv', 'Total'], 3],
-            # DEU
-            ['1.A.1.c Manufacture of Solid Fuels and Other Energy Industries', ['1.A.1.c.iv.2', 'Total'], 4],
-            ['Liquid Fuels', ['1.A.1.c.iv.2', 'Liquid'], 5],
-            ['Solid Fuels', ['1.A.1.c.iv.2', 'Solid'], 5],
-            ['Gaseous Fuels', ['1.A.1.c.iv.2', 'Gaseous'], 5],
-            ['Other Fossil Fuels', ['1.A.1.c.iv.2', 'OtherFF'], 5],
-            ['Peat', ['1.A.1.c.iv.2', 'Peat'], 5],
-            ['Biomass', ['1.A.1.c.iv.2', 'Biomass'], 5],
-            # ESP
-            ['Other', ['1.A.1.c.iv.3', 'Total'], 4],
-            ['Liquid Fuels', ['1.A.1.c.iv.3', 'Liquid'], 5],
-            ['Solid Fuels', ['1.A.1.c.iv.3', 'Solid'], 5],
-            ['Gaseous Fuels', ['1.A.1.c.iv.3', 'Gaseous'], 5],
-            ['Other Fossil Fuels', ['1.A.1.c.iv.3', 'OtherFF'], 5],
-            ['Peat', ['1.A.1.c.iv.3', 'Peat'], 5],
-            ['Biomass', ['1.A.1.c.iv.3', 'Biomass'], 5],
-            # CYP
-            ['Charcoal Production', ['1.A.1.c.iv.1', 'Total'], 4],
-            ['Liquid Fuels', ['1.A.1.c.iv.1', 'Liquid'], 5],
-            ['Solid Fuels', ['1.A.1.c.iv.1', 'Solid'], 5],
-            ['Gaseous Fuels', ['1.A.1.c.iv.1', 'Gaseous'], 5],
-            ['Other Fossil Fuels', ['1.A.1.c.iv.1', 'OtherFF'], 5],
-            ['Peat', ['1.A.1.c.iv.1', 'Peat'], 5],
-            ['Biomass', ['1.A.1.c.iv.1', 'Biomass'], 5],
-        ],
-        "entity_mapping": {
-            'EMISSIONS CH4': "CH4",
-            'EMISSIONS CO2(2)': "CO2",
-            'EMISSIONS N2O': "N2O",
-        },
-    },  # tested
-    "Table1.A(a)s2": {
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 114,  # template, countries report less
-            # check the resulting data as the templates have nan rows
-            # which would stop the reading process (actual reported
-            # data does not seem to have the nan rows)
-            "header": ['group', 'entity', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category", "class"],
-            "cols_to_ignore": [
-                'AGGREGATE ACTIVITY DATA Consumption',
-                'AGGREGATE ACTIVITY DATA Consumption',
-                'IMPLIED EMISSION FACTORS CO2(1)',
-                'IMPLIED EMISSION FACTORS CH4',
-                'IMPLIED EMISSION FACTORS N2O',
-                'EMISSIONS CO2 Amount captured',
-            ],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['1.A.2 Manufacturing industries and construction', ['1.A.2', 'Total'], 0],
-            ['Liquid fuels', ['1.A.2', 'Liquid'], 1],
-            ['Solid fuels', ['1.A.2', 'Solid'], 1],
-            ['Gaseous fuels', ['1.A.2', 'Gaseous'], 1],
-            ['Other fossil fuels(4)', ['1.A.2', 'OtherFF'], 1],
-            ['Peat(5)', ['1.A.2', 'Peat'], 1],
-            ['Biomass(6)', ['1.A.2', 'Biomass'], 1],
-            # a. Iron and Steel
-            ['a. Iron and steel', ['1.A.2.a', 'Total'], 1],
-            ['Liquid fuels', ['1.A.2.a', 'Liquid'], 2],
-            ['Solid fuels', ['1.A.2.a', 'Solid'], 2],
-            ['Gaseous fuels', ['1.A.2.a', 'Gaseous'], 2],
-            ['Other fossil fuels(4)', ['1.A.2.a', 'OtherFF'], 2],
-            ['Peat(5)', ['1.A.2.a', 'Peat'], 2],
-            ['Biomass(6)', ['1.A.2.a', 'Biomass'], 2],
-            # b. non-ferrous metals
-            ['b. Non-ferrous metals', ['1.A.2.b', 'Total'], 1],
-            ['Liquid fuels', ['1.A.2.b', 'Liquid'], 2],
-            ['Solid fuels', ['1.A.2.b', 'Solid'], 2],
-            ['Gaseous fuels', ['1.A.2.b', 'Gaseous'], 2],
-            ['Other fossil fuels(4)', ['1.A.2.b', 'OtherFF'], 2],
-            ['Peat(5)', ['1.A.2.b', 'Peat'], 2],
-            ['Biomass(6)', ['1.A.2.b', 'Biomass'], 2],
-            # c. Chemicals
-            ['c. Chemicals', ['1.A.2.c', 'Total'], 1],
-            ['Liquid fuels', ['1.A.2.c', 'Liquid'], 2],
-            ['Solid fuels', ['1.A.2.c', 'Solid'], 2],
-            ['Gaseous fuels', ['1.A.2.c', 'Gaseous'], 2],
-            ['Other fossil fuels(4)', ['1.A.2.c', 'OtherFF'], 2],
-            ['Peat(5)', ['1.A.2.c', 'Peat'], 2],
-            ['Biomass(6)', ['1.A.2.c', 'Biomass'], 2],
-            # d. Pulp paper print
-            ['d. Pulp, paper and print', ['1.A.2.d', 'Total'], 1],
-            ['Liquid fuels', ['1.A.2.d', 'Liquid'], 2],
-            ['Solid fuels', ['1.A.2.d', 'Solid'], 2],
-            ['Gaseous fuels', ['1.A.2.d', 'Gaseous'], 2],
-            ['Other fossil fuels(4)', ['1.A.2.d', 'OtherFF'], 2],
-            ['Peat(5)', ['1.A.2.d', 'Peat'], 2],
-            ['Biomass(6)', ['1.A.2.d', 'Biomass'], 2],
-            # e. Food processing, beverages and tobacco
-            ['e. Food processing, beverages and tobacco', ['1.A.2.e', 'Total'], 1],
-            ['Liquid fuels', ['1.A.2.e', 'Liquid'], 2],
-            ['Solid fuels', ['1.A.2.e', 'Solid'], 2],
-            ['Gaseous fuels', ['1.A.2.e', 'Gaseous'], 2],
-            ['Other fossil fuels(4)', ['1.A.2.e', 'OtherFF'], 2],
-            ['Peat(5)', ['1.A.2.e', 'Peat'], 2],
-            ['Biomass(6)', ['1.A.2.e', 'Biomass'], 2],
-            # f. non-metallic minerals
-            ['f. Non-metallic minerals', ['1.A.2.f', 'Total'], 1],
-            ['Liquid fuels', ['1.A.2.f', 'Liquid'], 2],
-            ['Solid fuels', ['1.A.2.f', 'Solid'], 2],
-            ['Gaseous fuels', ['1.A.2.f', 'Gaseous'], 2],
-            ['Other fossil fuels(4)', ['1.A.2.f', 'OtherFF'], 2],
-            ['Peat(5)', ['1.A.2.f', 'Peat'], 2],
-            ['Biomass(6)', ['1.A.2.f', 'Biomass'], 2],
-            # g. other
-            ['g. Other (please specify)(9)', ['1.A.2.g', 'Total'], 1],
-            # 1.A.2.g.i Manufacturing of machinery
-            ['1.A.2.g.i Manufacturing of machinery', ['1.A.2.g.i', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.2.g.i', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.2.g.i', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.2.g.i', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.2.g.i', 'OtherFF'], 3],
-            ['Peat', ['1.A.2.g.i', 'Peat'], 3],
-            ['Biomass', ['1.A.2.g.i', 'Biomass'], 3],
-            # 1.A.2.g.ii Manufacturing of transport equipment
-            ['1.A.2.g.ii Manufacturing of transport equipment', ['1.A.2.g.ii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.2.g.ii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.2.g.ii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.2.g.ii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.2.g.ii', 'OtherFF'], 3],
-            ['Peat', ['1.A.2.g.ii', 'Peat'], 3],
-            ['Biomass', ['1.A.2.g.ii', 'Biomass'], 3],
-            # 1.A.2.g.iii Mining (excluding fuels) and quarrying
-            ['1.A.2.g.iii Mining (excluding fuels) and quarrying', ['1.A.2.g.iii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.2.g.iii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.2.g.iii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.2.g.iii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.2.g.iii', 'OtherFF'], 3],
-            ['Peat', ['1.A.2.g.iii', 'Peat'], 3],
-            ['Biomass', ['1.A.2.g.iii', 'Biomass'], 3],
-            # 1.A.2.g.iv Wood and wood products
-            ['1.A.2.g.iv Wood and wood products', ['1.A.2.g.iv', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.2.g.iv', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.2.g.iv', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.2.g.iv', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.2.g.iv', 'OtherFF'], 3],
-            ['Peat', ['1.A.2.g.iv', 'Peat'], 3],
-            ['Biomass', ['1.A.2.g.iv', 'Biomass'], 3],
-            # 1.A.2.g.v Construction
-            ['1.A.2.g.v Construction', ['1.A.2.g.v', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.2.g.v', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.2.g.v', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.2.g.v', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.2.g.v', 'OtherFF'], 3],
-            ['Peat', ['1.A.2.g.v', 'Peat'], 3],
-            ['Biomass', ['1.A.2.g.v', 'Biomass'], 3],
-            # 1.A.2.g.vi Textile and leather
-            ['1.A.2.g.vi Textile and leather', ['1.A.2.g.vi', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.2.g.vi', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.2.g.vi', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.2.g.vi', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.2.g.vi', 'OtherFF'], 3],
-            ['Peat', ['1.A.2.g.vi', 'Peat'], 3],
-            ['Biomass', ['1.A.2.g.vi', 'Biomass'], 3],
-            # 1.A.2.g.vii Off-road vehicles and other machinery
-            ['1.A.2.g.vii Off-road vehicles and other machinery', ['1.A.2.g.vii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.2.g.vii', 'Liquid'], 3],
-            ['Gaseous Fuels', ['1.A.2.g.vii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.2.g.vii', 'OtherFF'], 3],
-            ['Biomass', ['1.A.2.g.vii', 'Biomass'], 3],
-            # 1.A.2.g.viii Other (please specify)
-            ['1.A.2.g.viii Other (please specify)', ['1.A.2.g.viii', 'Total'], 2],
-            # DKE
-            ['Construction', [r'\IGNORE', r'\IGNORE'], 3],  # (empty)
-            ['Mining', [r'\IGNORE', r'\IGNORE'], 3],  # (empty)
-            # DNK, DKE, USA, CZE
-            ['Other non-specified', ['1.A.2.g.viii.1', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.1', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.1', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.1', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.1', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.1', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.1', 'Biomass'], 4],
-            # SVK, CYP
-            ['Non-specified Industry', ['1.A.2.g.viii.1', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.1', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.1', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.1', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.1', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.1', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.1', 'Biomass'], 4],
-            # BEL
-            ['Other non specified', ['1.A.2.g.viii.1', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.1', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.1', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.1', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.1', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.1', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.1', 'Biomass'], 4],
-            # PRT, LTU
-            ['Non-specified industry', ['1.A.2.g.viii.1', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.1', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.1', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.1', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.1', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.1', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.1', 'Biomass'], 4],
-            # MLT
-            ['Undefined Industry', ['1.A.2.g.viii.1', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.1', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.1', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.1', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.1', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.1', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.1', 'Biomass'], 4],
-            # TUR
-            ['Other unspecified', ['1.A.2.g.viii.1', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.1', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.1', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.1', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.1', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.1', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.1', 'Biomass'], 4],
-            # DKE
-            ['Textile', [r'\IGNORE', r'\IGNORE'], 3],  # (empty)
-            # DNK, DNM, FIN, DKE
-            ['Other manufacturing industries', ['1.A.2.g.viii.3', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.3', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.3', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.3', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.3', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.3', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.3', 'Biomass'], 4],
-            # CAN
-            ['Other Manufacturing', ['1.A.2.g.viii.3', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.3', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.3', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.3', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.3', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.3', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.3', 'Biomass'], 4],
-            # AUT, LUX
-            ['Other Manufacturing Industries', ['1.A.2.g.viii.3', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.3', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.3', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.3', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.3', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.3', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.3', 'Biomass'], 4],
-            # NOR
-            ['Other manufacturing', ['1.A.2.g.viii.3', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.3', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.3', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.3', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.3', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.3', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.3', 'Biomass'], 4],
-            # AUS
-            ['All Other Manufacturing', ['1.A.2.g.viii.3', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.3', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.3', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.3', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.3', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.3', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.3', 'Biomass'], 4],
-            # NLD
-            ['Other Industrial Sectors', ['1.A.2.g.viii.4', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.4', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.4', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.4', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.4', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.4', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.4', 'Biomass'], 4],
-            # GBR, GBK
-            ['Other industry (not specified above)', ['1.A.2.g.viii.4', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.4', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.4', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.4', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.4', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.4', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.4', 'Biomass'], 4],
-            # UKR
-            ['Oter Industries', ['1.A.2.g.viii.4', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.4', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.4', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.4', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.4', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.4', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.4', 'Biomass'], 4],
-            # RUS
-            ['Other industries', ['1.A.2.g.viii.4', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.4', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.4', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.4', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.4', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.4', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.4', 'Biomass'], 4],
-            # RUS
-            ['Non-CO2 emissions from BFG combustion', ['1.A.2.g.viii.5', 'Total'], 3],
-            ['Solid Fuels', ['1.A.2.g.viii.5', 'Solid'], 4],
-            # BLR, DNK, ESP, LVA, NZL, POL, ROU, SVN
-            ['Other', ['1.A.2.g.viii.10', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.10', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.10', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.10', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.10', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.10', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.10', 'Biomass'], 4],
-            # BLR
-            ['Manufacture and construction Aggregated', ['1.A.2.g.viii.2', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.2', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.2', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.2', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.2', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.2', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.2', 'Biomass'], 4],
-            # HRV
-            ['Other Industry', ['1.A.2.g.viii.4', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.4', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.4', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.4', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.4', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.4', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.4', 'Biomass'], 4],
-            # HRV
-            ['1A2 Total for 1990 to 2000', ['1.A.2.g.viii.2', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.2', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.2', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.2', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.2', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.2', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.2', 'Biomass'], 4],
-            # MLT
-            ['All Industry', ['1.A.2.g.viii.2', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.2', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.2', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.2', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.2', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.2', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.2', 'Biomass'], 4],
-            # PRT
-            ['Rubber', ['1.A.2.g.viii.6', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.6', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.6', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.6', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.6', 'OtherFF'], 4],
-            ['Biomass', ['1.A.2.g.viii.6', 'Biomass'], 4],
-            # SWE
-            ['All stationary combustin within CRF 1.A.2.g', ['1.A.2.g.viii.7', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.7', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.7', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.7', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.7', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.7', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.7', 'Biomass'], 4],
-            # IRL
-            ['Other stationary combustion', ['1.A.2.g.viii.8', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.8', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.8', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.8', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.8', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.8', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.8', 'Biomass'], 4],
-            # HUN
-            ['Other Stationary Combustion', ['1.A.2.g.viii.8', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.8', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.8', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.8', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.8', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.8', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.8', 'Biomass'], 4],
-            # CHE
-            ['Other Boilers and Engines Industry', ['1.A.2.g.viii.9', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.9', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.9', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.9', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.9', 'OtherFF'], 4],
-            ['Biomass', ['1.A.2.g.viii.9', 'Biomass'], 4],
-        ],
-        "entity_mapping": {
-            'EMISSIONS CH4': "CH4",
-            'EMISSIONS CO2(2)': "CO2",
-            'EMISSIONS N2O': "N2O",
-        },
-    },  # tested
-    "Table1.A(a)s3": {
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 115,
-            "header": ['group', 'entity', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category", "class"],
-            "cols_to_ignore": [
-                'AGGREGATE ACTIVITY DATA Consumption',
-                'AGGREGATE ACTIVITY DATA Consumption',
-                'IMPLIED EMISSION FACTORS CO2(1)',
-                'IMPLIED EMISSION FACTORS CH4',
-                'IMPLIED EMISSION FACTORS N2O',
-            ],
-            "stop_cats": ["Note: All footnotes for this table are given at the end of the table on sheet 4.", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['1.A.3 Transport', ['1.A.3', 'Total'], 0],
-            ['Liquid fuels', ['1.A.3', 'Liquid'], 1],
-            ['Solid fuels', ['1.A.3', 'Solid'], 1],
-            ['Gaseous fuels', ['1.A.3', 'Gaseous'], 1],
-            ['Other fossil fuels(4)', ['1.A.3', 'OtherFF'], 1],
-            ['Biomass(6)', ['1.A.3', 'Biomass'], 1],
-            # a. Domestic Aviation
-            ['a. Domestic aviation(10)', ['1.A.3.a', 'Total'], 1],
-            ['Aviation gasoline', ['1.A.3.a', 'AvGasoline'], 2],
-            ['Jet kerosene', ['1.A.3.a', 'JetKerosene'], 2],
-            ['Biomass', ['1.A.3.a', 'Biomass'], 2],
-            # b. road Transportation
-            ['b. Road transportation(11)', ['1.A.3.b', 'Total'], 1],
-            ['Gasoline', ['1.A.3.b', 'Gasoline'], 2],
-            ['Diesel oil', ['1.A.3.b', 'DieselOil'], 2],
-            ['Liquefied petroleum gases (LPG)', ['1.A.3.b', 'LPG'], 2],
-            ['Other liquid fuels (please specify)', ['1.A.3.b', 'OtherLiquid'], 2],
-            ['Gaseous fuels', ['1.A.3.b', 'Gaseous'], 2],
-            ['Biomass(6)', ['1.A.3.b', 'Biomass'], 2],
-            ['Other fossil fuels (please specify)(4)', ['1.A.3.b', 'OtherFF'], 2],
-            # i. Cars
-            ['i. Cars', ['1.A.3.b.i', 'Total'], 2],
-            ['Gasoline', ['1.A.3.b.i', 'Gasoline'], 3],
-            ['Diesel oil', ['1.A.3.b.i', 'DieselOil'], 3],
-            ['Liquefied petroleum gases (LPG)', ['1.A.3.b.i', 'LPG'], 3],
-            ['Other liquid fuels (please specify)', ['1.A.3.b.i', 'OtherLiquid'], 3],
-            ['Kerosene', ['1.A.3.b.i', 'Kerosene'], 4],  # UKR (and probably others)
-            ['Lubricants', ['1.A.3.b.i', 'Lubricants'], 4],  # UKR, JPN
-            ['Lubricant oil', ['1.A.3.b.i', 'Lubricants'], 4],  # PRT
-            ['Biodiesel (5 percent fossil portion)', ['1.A.3.b.i', 'OLBiodieselFC'], 4],  # CAN
-            ['Fossil part of biodiesel', ['1.A.3.b.i', 'OLBiodieselFC'], 4],  # LTU
-            ['Other', ['1.A.3.b.i', 'OLOther'], 4],  # UKR, MLT
-            ['Other Liquid Fuels', ['1.A.3.b.i', 'OLOther'], 4],  # CYP
-            ['Other motor fuels', ['1.A.3.b.i', 'OMotorFuels'], 4],  # RUS
-            ['Lubricants in 2-stroke engines', ['1.A.3.b.i', 'Lubricants'], 4],  # HUN
-            ['LNG', ['1.A.3.b.i', 'LNG'], 4],  # USA
-            ['Gaseous fuels', ['1.A.3.b.i', 'Gaseous'], 3],
-            ['Biomass(6)', ['1.A.3.b.i', 'Biomass'], 3],
-            ['Other fossil fuels (please specify)(4)', ['1.A.3.b.i', 'OtherFF'], 3],
-            ['Other Fossil Fuels', ['1.A.3.b.i', 'OFFOther'], 4],  # CYP, POL
-            ['Biodiesel (fossil component)', ['1.A.3.b.i', 'OFFBiodieselFC'], 4],  # LUX
-            ['Biodiesel fossil fraction', ['1.A.3.b.i', 'OFFBiodieselFC'], 4],  # NOR
-            ['Biodiesel (fossil fraction)', ['1.A.3.b.i', 'OFFBiodieselFC'], 4],  # NZL
-            ['Fossil part of biodiesel or biogasoline', ['1.A.3.b.i', 'OFFBiofuelFC'], 4],  # PRT
-            ['Fossil part of biodiesel and biogasoline', ['1.A.3.b.i', 'OFFBiofuelFC'], 4],  # SWE
-            ['fossil part of biofuels', ['1.A.3.b.i', 'OFFBiofuelFC'], 4],  # CZE
-            ['fossil part of biodiesel', ['1.A.3.b.i', 'OFFBiodieselFC'], 4],  # DKE, DNK, HRV
-            ['Fossil part of biodiesel', ['1.A.3.b.i', 'OFFBiodieselFC'], 4],  # DNM, BEL, HUN, LVA, ESP
-            ['Fossil part of biogasoline', ['1.A.3.b.i', 'OFFBiogasolineFC'], 4],  # BEL
-            ['Natural Gas', ['1.A.3.b.i', 'OFFNaturalGas'], 4],  # USA
-            ['Fossil part of biofuel', ['1.A.3.b.i', 'OFFBiofuelFC'], 4],  # IRL
-            ['Other', ['1.A.3.b.i', 'OFFOther'], 4],  # MLT
-            # ii. Light duty trucks
-            ['ii. Light duty trucks', ['1.A.3.b.ii', 'Total'], 2],
-            ['Gasoline', ['1.A.3.b.ii', 'Gasoline'], 3],
-            ['Diesel oil', ['1.A.3.b.ii', 'DieselOil'], 3],
-            ['Liquefied petroleum gases (LPG)', ['1.A.3.b.ii', 'LPG'], 3],
-            ['Other liquid fuels (please specify)', ['1.A.3.b.ii', 'OtherLiquid'], 3],
-            ['Kerosene', ['1.A.3.b.ii', 'Kerosene'], 4],  # UKR (and probably others)
-            ['Lubricants', ['1.A.3.b.ii', 'Lubricants'], 4],  # UKR, JPN
-            ['Lubricant Oil', ['1.A.3.b.ii', 'Lubricants'], 4],  # PRT
-            ['Biodiesel (5 percent fossil portion)', ['1.A.3.b.ii', 'OLBiodieselFC'], 4],  # CAN
-            ['Other', ['1.A.3.b.ii', 'OLOther'], 4],  # UKR (and probably others)
-            ['Other Liquid Fuels', ['1.A.3.b.ii', 'OLOther'], 4],  # CYP
-            ['Other motor fuels', ['1.A.3.b.ii', 'OMotorFuels'], 4],  # RUS
-            ['LNG', ['1.A.3.b.ii', 'LNG'], 4],  # USA
-            ['Gaseous fuels', ['1.A.3.b.ii', 'Gaseous'], 3],
-            ['Biomass(6)', ['1.A.3.b.ii', 'Biomass'], 3],
-            ['Other fossil fuels (please specify)(4)', ['1.A.3.b.ii', 'OtherFF'], 3],
-            ['Other Fossil Fuels', ['1.A.3.b.ii', 'OFFOther'], 4],  # CYP, POL
-            ['Biodiesel (fossil component)', ['1.A.3.b.ii', 'OFFBiodieselFC'], 4],  # LUX
-            ['Biodiesel fossil fraction', ['1.A.3.b.ii', 'OFFBiodieselFC'], 4],  # NOR
-            ['Biodiesel (fossil fraction)', ['1.A.3.b.ii', 'OFFBiodieselFC'], 4],  # NZL
-            ['Fossil part of biodiesel or biogasoline', ['1.A.3.b.ii', 'OFFBiofuelFC'], 4],  # PRT
-            ['Fossil part of biodiesel and biogasoline', ['1.A.3.b.ii', 'OFFBiofuelFC'], 4],  # SWE
-            ['fossil part of biofuels', ['1.A.3.b.ii', 'OFFBiofuelFC'], 4],  # CZE
-            ['fossil part of biodiesel', ['1.A.3.b.ii', 'OFFBiodieselFC'], 4],  # DKE, DNK, HRV
-            ['Fossil part of biodiesel', ['1.A.3.b.ii', 'OFFBiodieselFC'], 4],  # DNM, BEL, HUN, LVA, ESP
-            ['Fossil part of biogasoline', ['1.A.3.b.ii', 'OFFBiogasolineFC'], 4],  # BEL
-            ['Fossil part of biofuel', ['1.A.3.b.ii', 'OFFBiofuelFC'], 4],  # IRL
-            # iii. Heavy duty trucks and buses
-            ['iii. Heavy duty trucks and buses', ['1.A.3.b.iii', 'Total'], 2],
-            ['Gasoline', ['1.A.3.b.iii', 'Gasoline'], 3],
-            ['Diesel oil', ['1.A.3.b.iii', 'DieselOil'], 3],
-            ['Liquefied petroleum gases (LPG)', ['1.A.3.b.iii', 'LPG'], 3],
-            ['Other liquid fFuels (please specify)', ['1.A.3.b.iii', 'OtherLiquid'], 3],
-            ['Kerosene', ['1.A.3.b.iii', 'Kerosene'], 4],  # UKR (and probably others)
-            ['Lubricants', ['1.A.3.b.iii', 'Lubricants'], 4],  # UKR, JPN
-            ['Lubricant Oil', ['1.A.3.b.iii', 'Lubricants'], 4],  # PRT
-            ['Biodiesel (5 percent fossil portion)', ['1.A.3.b.iii', 'OLBiodieselFC'], 4],  # CAN
-            ['Other', ['1.A.3.b.iii', 'OLOther'], 4],  # UKR (and probably others)
-            ['Other Liquid Fuels', ['1.A.3.b.iii', 'OLOther'], 4],  # CYP
-            ['Other motor fuels', ['1.A.3.b.iii', 'OMotorFuels'], 4],  # RUS
-            ['LNG', ['1.A.3.b.iii', 'LNG'], 4],  # USA
-            ['GTL', ['1.A.3.b.iii', 'GTL'], 4],  # MCO, new in 2022
-            ['Gaseous fuels', ['1.A.3.b.iii', 'Gaseous'], 3],
-            ['Biomass(6)', ['1.A.3.b.iii', 'Biomass'], 3],
-            ['Other fossil fuels (please specify)(4)', ['1.A.3.b.iii', 'OtherFF'], 3],
-            ['Other Fossil Fuels', ['1.A.3.b.iii', 'OFFOther'], 4],  # CYP, POL
-            ['Biodiesel (fossil component)', ['1.A.3.b.iii', 'OFFBiodieselFC'], 4],  # LUX
-            ['Biodiesel fossil fraction', ['1.A.3.b.iii', 'OFFBiodieselFC'], 4],  # NOR
-            ['Biodiesel (fossil fraction)', ['1.A.3.b.iii', 'OFFBiodieselFC'], 4],  # NZL
-            ['Fossil part of biodiesel or biogasoline', ['1.A.3.b.iii', 'OFFBiofuelFC'], 4],  # PRT
-            ['Fossil part of biodiesel and biogasoline', ['1.A.3.b.iii', 'OFFBiofuelFC'], 4],  # SWE
-            ['fossil part of biofuels', ['1.A.3.b.iii', 'OFFBiofuelFC'], 4],  # CZE
-            ['fossil part of biodiesel', ['1.A.3.b.iii', 'OFFBiodieselFC'], 4],  # DKE, DNK, HRV
-            ['Fossil part of biodiesel', ['1.A.3.b.iii', 'OFFBiodieselFC'], 4],  # DNM, BEL, HUN, LVA, ESP
-            ['Fossil part of biogasoline', ['1.A.3.b.iii', 'OFFBiogasolineFC'], 4],  # BEL
-            ['Fossil part of biofuel', ['1.A.3.b.iii', 'OFFBiofuelFC'], 4],  # IRL
-            # iv. Motorcycles
-            ['iv. Motorcycles', ['1.A.3.b.iv', 'Total'], 2],
-            ['Gasoline', ['1.A.3.b.iv', 'Gasoline'], 3],
-            ['Diesel oil', ['1.A.3.b.iv', 'DieselOil'], 3],
-            ['Liquefied petroleum gases (LPG)', ['1.A.3.b.iv', 'LPG'], 3],
-            ['Other liquid fuels (please specify)', ['1.A.3.b.iv', 'OtherLiquid'], 3],
-            ['Kerosene', ['1.A.3.b.iv', 'Kerosene'], 4],  # UKR (and probably others)
-            ['Lubricants', ['1.A.3.b.iv', 'Lubricants'], 4],  # UKR, JPN, HRV
-            ['Lubricant Oil', ['1.A.3.b.iv', 'Lubricants'], 4],  # PRT
-            ['Other', ['1.A.3.b.iv', 'OLOther'], 4],  # UKR (and probably others)
-            ['Other Liquid Fuels', ['1.A.3.b.iv', 'OLOther'], 4],  # CYP
-            ['Lube', ['1.A.3.b.iv', 'Lubricants'], 4],  # MCO
-            ['Lubricants in 2-stroke engines', ['1.A.3.b.iv', 'Lubricants'], 4],  # HUN
-            ['Lubricants (two-stroke engines)', ['1.A.3.b.iv', 'Lubricants'], 4],  # ESP
-            ['lubricants', ['1.A.3.b.iv', 'Lubricants'], 4],  # SVN
-            ['Gaseous fuels', ['1.A.3.b.iv', 'Gaseous'], 3],
-            ['Biomass(6)', ['1.A.3.b.iv', 'Biomass'], 3],
-            ['Other fossil fuels (please specify)(4)', ['1.A.3.b.iv', 'OtherFF'], 3],
-            ['Other Fossil Fuels', ['1.A.3.b.iv', 'OFFOther'], 4],  # CYP
-            ['Fossil part of biodiesel or biogasoline', ['1.A.3.b.iv', 'OFFBiofuelFC'], 4],  # PRT
-            ['Fossil part of biodiesel and biogasoline', ['1.A.3.b.iv', 'OFFBiofuelFC'], 4],  # SWE
-            ['fossil part of biofuels', ['1.A.3.b.iv', 'OFFBiofuelFC'], 4],  # CZE
-            ['Fossil part of biodiesel', ['1.A.3.b.iv', 'OFFBiodieselFC'], 4],  # BEL
-            ['Fossil part of biogasoline', ['1.A.3.b.iv', 'OFFBiogasolineFC'], 4],  # BEL
-            ['Fossil part of biodiese', ['1.A.3.b.iv', 'OFFBiodieselFC'], 4],  # LVA
-            ['Fossil part of biofuel', ['1.A.3.b.iv', 'OFFBiofuelFC'], 4],  # IRL
-            ['fossil part of biodiesel', ['1.A.3.b.iv', 'OFFBiofuelFC'], 4],  # HRV
-            # v. Other
-            ['v. Other (please specify)', ['1.A.3.b.v', 'Total'], 2],
-            # TUR
-            ['Road total', ['1.A.3.b.v.1', 'Total'], 3],
-            ['Gasoline', ['1.A.3.b.v.1', 'Gasoline'], 4],
-            ['Diesel Oil', ['1.A.3.b.v.1', 'DieselOil'], 4],
-            ['Liquefied Petroleum Gases (LPG)', ['1.A.3.b.v.1', 'LPG'], 4],
-            ['Gaseous Fuels', ['1.A.3.b.v.1', 'Gaseous'], 4],
-            ['Biomass', ['1.A.3.b.v.1', 'Biomass'], 4],
-            # CYP
-            ['Buses', ['1.A.3.b.v.2', 'Total'], 3],
-            ['Gasoline', ['1.A.3.b.v.2', 'Gasoline'], 4],
-            ['Diesel Oil', ['1.A.3.b.v.2', 'DieselOil'], 4],
-            ['Liquefied Petroleum Gases (LPG)', ['1.A.3.b.v.2', 'LPG'], 4],
-            ['Other Liquid Fuels (please specify)', ['1.A.3.b.v.2', 'OtherLiquid'], 4],
-            ['Gaseous Fuels', ['1.A.3.b.v.2', 'Gaseous'], 4],
-            ['Biomass', ['1.A.3.b.v.2', 'Biomass'], 4],
-            ['Other Fossil Fuels (please specify)', ['1.A.3.b.v.2', 'OtherFF'], 4],
-            # GBK, GBR
-            ['All vehicles - biofuel use', ['1.A.3.b.v.3', 'Total'], 3],
-            ['Biomass', ['1.A.3.b.v.3', 'Biomass'], 4],
-            ['All vehicles - LPG use', ['1.A.3.b.v.4', 'Total'], 4],
-            ['Liquefied Petroleum Gases (LPG)', ['1.A.3.b.v.4', 'LPG'], 4],
-            ['All vehicles - biofuel use (fossil component)', ['1.A.3.b.v.5', 'Total'], 4],
-            ['Other Fossil Fuels (please specify)', ['1.A.3.b.v.5', 'OtherFF'], 4],
-            # CAN
-            ['Propane and Natural Gas Vehicles', ['1.A.3.b.v.6', 'Total'], 3],
-            ['Gasoline', ['1.A.3.b.v.6', 'Gasoline'], 4],
-            ['Diesel Oil', ['1.A.3.b.v.6', 'DieselOil'], 4],
-            ['Liquefied Petroleum Gases (LPG)', ['1.A.3.b.v.6', 'LPG'], 4],
-            ['Other Liquid Fuels (please specify)', ['1.A.3.b.v.6', 'OtherLiquid'], 4],
-            ['Gaseous Fuels', ['1.A.3.b.v.6', 'Gaseous'], 4],
-            ['Biomass', ['1.A.3.b.v.6', 'Biomass'], 4],
-            ['Other Fossil Fuels (please specify)', ['1.A.3.b.v.6', 'OtherFF'], 4],
-            # BEL
-            ['Lubricant Two-Stroke Engines', ['1.A.3.b.v.7', 'Lubricants'], 3],
-            ['Other Liquid Fuels (please specify)', ['1.A.3.b.v.7', 'OtherLiquid'], 4],
-            # ROU
-            ['Gaseous Fuels', ['1.A.3.b.v.8', 'Total'], 3],
-            ['Gaseous Fuels', ['1.A.3.b.v.8', 'Gaseous'], 4],
-            ['Other Liquid Fuels', ['1.A.3.b.v.9', 'Total'], 3],
-            ['Other Liquid Fuels (please specify)', ['1.A.3.b.v.9', 'OtherLiquid'], 4],
-            ['Other Kerosene', ['1.A.3.b.v.9', 'Kerosene'], 5],
-            ['Heating and Other Gasoil', ['1.A.3.b.v.9', 'HeatingGasoil'], 5],
-            ['Biomass', ['1.A.3.b.v.10', 'Total'], 3],
-            ['Biomass', ['1.A.3.b.v.10', 'Biomass'], 4],
-            # DEU
-            ['CO2 from lubricant co-incineration in 2-stroke road vehicles', ['1.A.3.b.v.7', 'Total'], 3],
-            ['Other Liquid Fuels (please specify)', ['1.A.3.b.v.7', 'OtherLiquid'], 4],
-            ['lubricant used in 2-stroke mix', ['1.A.3.b.v.7', 'Lubricants'], 5],
-            # USA
-            ['Evaporative Emissions', ['1.A.3.b.v.11', 'Total'], 3],
-            ['Gasoline', ['1.A.3.b.v.11', 'Gasoline'], 4],
-            ['Diesel Oil', ['1.A.3.b.v.11', 'DieselOil'], 4],
-            ['Liquefied Petroleum Gases (LPG)', ['1.A.3.b.v.11', 'LPG'], 4],
-            ['Other Liquid Fuels (please specify)', ['1.A.3.b.v.11', 'OtherLiquid'], 4],
-            ['Gaseous Fuels', ['1.A.3.b.v.11', 'Gaseous'], 4],
-            ['Biomass', ['1.A.3.b.v.11', 'Biomass'], 4],
-            ['Other Fossil Fuels (please specify)', ['1.A.3.b.v.11', 'OtherFF'], 4],
-            # SVK
-            ['Urea-based catalysts', ['1.A.3.b.v.12', 'Total'], 3],
-            ['Diesel Oil', ['1.A.3.b.v.12', 'DieselOil'], 4],
-            # ESP
-            ['Other non-specified', ['1.A.3.b.v.13', 'Total'], 3],
-            ['Gasoline', ['1.A.3.b.v.13', 'Gasoline'], 4],
-            ['Diesel Oil', ['1.A.3.b.v.13', 'DieselOil'], 4],
-            ['Liquefied Petroleum Gases (LPG)', ['1.A.3.b.v.13', 'LPG'], 4],
-            ['Other Liquid Fuels (please specify)', ['1.A.3.b.v.13', 'OtherLiquid'], 4],
-            ['Gaseous Fuels', ['1.A.3.b.v.13', 'Gaseous'], 4],
-            ['Biomass', ['1.A.3.b.v.13', 'Biomass'], 4],
-            ['Other Fossil Fuels (please specify)', ['1.A.3.b.v.13', 'OtherFF'], 4],
-            # BGR
-            ['Urea', ['1.A.3.b.v.12', 'Total'], 3],
-            ['Other Liquid Fuels (please specify)', ['1.A.3.b.v.12', 'OtherLiquid'], 4],
-            ['Lubricants', ['1.A.3.b.v.7', 'Total'], 3],
-            ['Other Liquid Fuels (please specify)', ['1.A.3.b.v.7', 'OtherLiquid'], 4],
-            # c. Railways
-            ['c. Railways', ['1.A.3.c', 'Total'], 1],
-            ['Liquid fuels', ['1.A.3.c', 'Liquid'], 2],
-            ['Solid fuels', ['1.A.3.c', 'Solid'], 2],
-            ['Gaseous fuels', ['1.A.3.c', 'Gaseous'], 2],
-            ['Biomass(6)', ['1.A.3.c', 'Biomass'], 2],
-            ['Other fossil fuels (please specify)', ['1.A.3.c', 'OtherFF'], 2],
-            ['Biodiesel (fossil component)', ['1.A.3.c', 'OFFBiodieselFC'], 3],  # LUX
-            ['Fossil part of biodiesel and biogasoline', ['1.A.3.c', 'OFFBiodieselFC'], 3],  # SWE
-            ['Fossil part of biodiesel', ['1.A.3.c', 'OFFBiodieselFC'], 3],  # LVA, new in 2022
-            ['Other fossil fuels', ['1.A.3.c', 'OFFOther'], 3],  # ROU, new in 2022
-            # d. Domestic navigation
-            ['d. Domestic Navigation(10)', ['1.A.3.d', 'Total'], 1],
-            ['Residual fuel oil', ['1.A.3.d', 'ResFuelOil'], 2],
-            ['Gas/diesel oil', ['1.A.3.d', 'GasDieselOil'], 2],
-            ['Gasoline', ['1.A.3.d', 'Gasoline'], 2],
-            ['Other liquid fuels (please specify)', ['1.A.3.d', 'OtherLiquid'], 2],
-            ['Lubricants', ['1.A.3.d', 'Lubricants'], 3],  # UKR, JPN
-            ['Biodiesel (5 percent fossil portion)', ['1.A.3.d', 'OLBiodieselFC'], 3],  # CAN
-            ['Light Fuel Oil', ['1.A.3.d', 'LightFuelOil'], 3],  # CAN
-            ['Kerosene and stove oil', ['1.A.3.d', 'KeroseStoveOil'], 3],  # CAN
-            ['Kerosene', ['1.A.3.d', 'Kerosene'], 3],  # DKE, DNK
-            ['Natural Gas Liquids', ['1.A.3.d', 'NGL'], 3],  # DKE, DNK
-            ['Fossil part of biodiesel', ['1.A.3.d', 'OLBiodieselFC'], 3],  # LTU
-            ['Other non-specified', ['1.A.3.d', 'OLOther'], 3],  # SWE
-            ['Other motor fuels', ['1.A.3.d', 'OMotorFuels'], 3],  # RUS
-            ['Fuel oil A', ['1.A.3.d', 'FuelOilA'], 3],  # JPN
-            ['Fuel oil B', ['1.A.3.d', 'FuelOilB'], 3],  # JPN
-            ['Fuel oil C', ['1.A.3.d', 'FuelOilC'], 3],  # JPN
-            ['Diesel Oil', ['1.A.3.d', 'OLDiesel'], 3],  # FIN
-            ['Other Liquid Fuels', ['1.A.3.d', 'OLOther'], 3],  # ROU, new in 2022
-            ['Gaseous fuels', ['1.A.3.d', 'Gaseous'], 2],
-            ['Biomass(6)', ['1.A.3.d', 'Biomass'], 2],
-            ['Other fossil fuels (please specify)(4)', ['1.A.3.d', 'OtherFF'], 2],
-            ['Liquified natural gas', ['1.A.3.d', 'LNG'], 3],  # DKE, DNK, DNM
-            ['Biodiesel (fossil component)', ['1.A.3.d', 'OFFBiodieselFC'], 3],  # LUX
-            ['Coal', ['1.A.3.d', 'OFFCoal'], 3],  # NZL, NLD
-            ['fossil part of biodiesel', ['1.A.3.d', 'OFFBiodieselFC'], 3],  # AUT
-            ['Fossil part of biodiesel and biogasoline', ['1.A.3.d', 'OFFBioGasDieselFC'], 3],  # SWE
-            ['Solid Fuels', ['1.A.3.d', 'OFFSolid'], 3],  # AUS
-            ['Other Fossil Fuels', ['1.A.3.d', 'OFFOther'], 3],  # ROU, new in 2022
-            # e. other transportation
-            # keep details also for top category as it's present
-            ['e. Other transportation (please specify)', ['1.A.3.e', 'Total'], 1],
-            ['Liquid fuels', ['1.A.3.e', 'Liquid'], 2],
-            ['Solid fuels', ['1.A.3.e', 'Solid'], 2],
-            ['Gaseous fuels', ['1.A.3.e', 'Gaseous'], 2],
-            ['Other fossil fuels(4)', ['1.A.3.e', 'OtherFF'], 2],
-            ['Biomass(6)', ['1.A.3.e', 'Biomass'], 2],
-            # i. pipeline
-            ['i. Pipeline transport', ['1.A.3.e.i', 'Total'], 2],
-            ['Liquid fuels', ['1.A.3.e.i', 'Liquid'], 3],
-            ['Solid fuels', ['1.A.3.e.i', 'Solid'], 3],
-            ['Gaseous fuels', ['1.A.3.e.i', 'Gaseous'], 3],
-            ['Other fossil fuels(4)', ['1.A.3.e.i', 'OtherFF'], 3],
-            ['Biomass(6)', ['1.A.3.e.i', 'Biomass'], 3],
-            # ii other
-            ['ii. Other (please specify)', ['1.A.3.e.ii', 'Total'], 2],
-            # UKR, SWE
-            ['Off-road vehicles and other machinery', ['1.A.3.e.ii.1', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.3.e.ii.1', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.3.e.ii.1', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.3.e.ii.1', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.3.e.ii.1', 'OtherFF'], 4],
-            ['Biomass', ['1.A.3.e.ii.1', 'Biomass'], 4],
-            # GBR, GBK
-            ['Aircraft support vehicles', ['1.A.3.e.ii.2', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.3.e.ii.2', 'Liquid'], 4],
-            # CAN
-            ['Off Road', ['1.A.3.e.ii.1', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.3.e.ii.1', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.3.e.ii.1', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.3.e.ii.1', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.3.e.ii.1', 'OtherFF'], 4],
-            ['Biomass', ['1.A.3.e.ii.1', 'Biomass'], 4],
-            # LTU
-            ['Off-road transport', ['1.A.3.e.ii.1', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.3.e.ii.1', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.3.e.ii.1', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.3.e.ii.1', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.3.e.ii.1', 'OtherFF'], 4],
-            ['Biomass', ['1.A.3.e.ii.1', 'Biomass'], 4],
-            # BEL
-            ['Other non-specified', ['1.A.3.e.ii.3', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.3.e.ii.3', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.3.e.ii.3', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.3.e.ii.3', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.3.e.ii.3', 'OtherFF'], 4],
-            ['Biomass', ['1.A.3.e.ii.3', 'Biomass'], 4],
-            # AUS
-            ['Off-Road Vehicles', ['1.A.3.e.ii.1', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.3.e.ii.1', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.3.e.ii.1', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.3.e.ii.1', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.3.e.ii.1', 'OtherFF'], 4],
-            ['Biomass', ['1.A.3.e.ii.1', 'Biomass'], 4],
-            # USA
-            ['Non-Transportation Mobile', ['1.A.3.e.ii.4', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.3.e.ii.4', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.3.e.ii.4', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.3.e.ii.4', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.3.e.ii.4', 'OtherFF'], 4],
-            ['Biomass', ['1.A.3.e.ii.4', 'Biomass'], 4],
-            # AUT (new in 2022)
-            ['Airport ground activities', ['1.A.3.e.ii.2', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.3.e.ii.4', 'Liquid'], 4],
-            ['Gaseous Fuels', ['1.A.3.e.ii.4', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.3.e.ii.4', 'OtherFF'], 4],
-            ['Biomass', ['1.A.3.e.ii.4', 'Biomass'], 4],
-            # ROU, new in 2022
-            ['Other', ['1.A.3.e.ii.3', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.3.e.ii.3', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.3.e.ii.3', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.3.e.ii.3', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.3.e.ii.3', 'OtherFF'], 4],
-            ['Biomass', ['1.A.3.e.ii.3', 'Biomass'], 4],
-        ],
-        "entity_mapping": {
-            'EMISSIONS CH4': "CH4",
-            'EMISSIONS CO2(2)': "CO2",
-            'EMISSIONS N2O': "N2O",
-        },
-    },  # tested
-    "Table1.A(a)s4": {
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 127,
-            "header": ['group', 'entity', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category", "class"],
-            "cols_to_ignore": [
-                'AGGREGATE ACTIVITY DATA Consumption',
-                'AGGREGATE ACTIVITY DATA Consumption',
-                'IMPLIED EMISSION FACTORS CO2(1)',
-                'IMPLIED EMISSION FACTORS CH4',
-                'IMPLIED EMISSION FACTORS N2O',
-                'EMISSIONS CO2 Amount captured',
-            ],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['1.A.4 Other sectors', ['1.A.4', 'Total'], 0],
-            ['Liquid fuels', ['1.A.4', 'Liquid'], 1],
-            ['Solid fuels', ['1.A.4', 'Solid'], 1],
-            ['Gaseous fuels', ['1.A.4', 'Gaseous'], 1],
-            ['Other fossil fuels(4)', ['1.A.4', 'OtherFF'], 1],
-            ['Peat(5)', ['1.A.4', 'Peat'], 1],
-            ['Biomass(6)', ['1.A.4', 'Biomass'], 1],
-            # a. Commercial/institutional(12)
-            ['a. Commercial/institutional(12)', ['1.A.4.a', 'Total'], 1],
-            ['Liquid fuels', ['1.A.4.a', 'Liquid'], 2],
-            ['Solid fuels', ['1.A.4.a', 'Solid'], 2],
-            ['Gaseous fuels', ['1.A.4.a', 'Gaseous'], 2],
-            ['Other fossil fuels(4)', ['1.A.4.a', 'OtherFF'], 2],
-            ['Peat(5)', ['1.A.4.a', 'Peat'], 2],
-            ['Biomass(6)', ['1.A.4.a', 'Biomass'], 2],
-            # 1.A.4.a.i Stationary combustion
-            ['1.A.4.a.i Stationary combustion', ['1.A.4.a.i', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.4.a.i', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.4.a.i', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.4.a.i', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.4.a.i', 'OtherFF'], 3],
-            ['Peat', ['1.A.4.a.i', 'Peat'], 3],
-            ['Biomass', ['1.A.4.a.i', 'Biomass'], 3],
-            # 1.A.4.a.ii Off-road vehicles and other machinery
-            ['1.A.4.a.ii Off-road vehicles and other machinery', ['1.A.4.a.ii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.4.a.ii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.4.a.ii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.4.a.ii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.4.a.ii', 'OtherFF'], 3],
-            ['Biomass', ['1.A.4.a.ii', 'Biomass'], 3],
-            # 1.A.4.a.iii Other (please specify)
-            ['1.A.4.a.iii Other (please specify)', ['1.A.4.a.iii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.4.a.iii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.4.a.iii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.4.a.iii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.4.a.iii', 'OtherFF'], 3],
-            ['Peat', ['1.A.4.a.iii', 'Peat'], 3],
-            ['Biomass', ['1.A.4.a.iii', 'Biomass'], 3],
-            # b. Residential(13)
-            ['b. Residential(13)', ['1.A.4.b', 'Total'], 1],
-            ['Liquid fuels', ['1.A.4.b', 'Liquid'], 2],
-            ['Solid fuels', ['1.A.4.b', 'Solid'], 2],
-            ['Gaseous fuels', ['1.A.4.b', 'Gaseous'], 2],
-            ['Other fossil fuels(4)', ['1.A.4.b', 'OtherFF'], 2],
-            ['Peat(5)', ['1.A.4.b', 'Peat'], 2],
-            ['Biomass(6)', ['1.A.4.b', 'Biomass'], 2],
-            # 1.A.4.b.i Stationary combustion
-            ['1.A.4.b.i Stationary combustion', ['1.A.4.b.i', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.4.b.i', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.4.b.i', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.4.b.i', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.4.b.i', 'OtherFF'], 3],
-            ['Peat', ['1.A.4.b.i', 'Peat'], 3],
-            ['Biomass', ['1.A.4.b.i', 'Biomass'], 3],
-            # 1.A.4.b.ii Off-road vehicles and other machinery
-            ['1.A.4.b.ii Off-road vehicles and other machinery', ['1.A.4.b.ii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.4.b.ii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.4.b.ii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.4.b.ii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.4.b.ii', 'OtherFF'], 3],
-            ['Biomass', ['1.A.4.b.ii', 'Biomass'], 3],
-            # 1.A.4.b.iii Other (please specify)
-            ['1.A.4.b.iii Other (please specify)', ['1.A.4.b.iii', 'Total'], 2],
-            # CYP, USA
-            ['Residential', ['1.A.4.b.iii.1', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.4.b.iii.1', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.4.b.iii.1', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.4.b.iii.1', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.4.b.iii.1', 'OtherFF'], 3],
-            ['Peat', ['1.A.4.b.iii.1', 'Peat'], 3],
-            ['Biomass', ['1.A.4.b.iii.1', 'Biomass'], 3],
-            # c. Agriculture/forestry/fishing
-            ['c. Agriculture/forestry/fishing', ['1.A.4.c', 'Total'], 1],
-            ['Liquid fuels', ['1.A.4.c', 'Liquid'], 2],
-            ['Solid fuels', ['1.A.4.c', 'Solid'], 2],
-            ['Gaseous fuels', ['1.A.4.c', 'Gaseous'], 2],
-            ['Other fossil fuels(4)', ['1.A.4.c', 'OtherFF'], 2],
-            ['Peat(5)', ['1.A.4.c', 'Peat'], 2],
-            ['Biomass(6)', ['1.A.4.c', 'Biomass'], 2],
-            # i. Stationary
-            ['i. Stationary', ['1.A.4.c.i', 'Total'], 2],
-            ['Liquid fuels', ['1.A.4.c.i', 'Liquid'], 3],
-            ['Solid fuels', ['1.A.4.c.i', 'Solid'], 3],
-            ['Gaseous fuels', ['1.A.4.c.i', 'Gaseous'], 3],
-            ['Other fossil fuels(4)', ['1.A.4.c.i', 'OtherFF'], 3],
-            ['Peat(5)', ['1.A.4.c.i', 'Peat'], 3],
-            ['Biomass(6)', ['1.A.4.c.i', 'Biomass'], 3],
-            # ii. Off-road vehicles and other machinery
-            ['ii. Off-road vehicles and other machinery', ['1.A.4.c.ii', 'Total'], 2],
-            ['Gasoline', ['1.A.4.c.ii', 'Gasoline'], 3],
-            ['Diesel oil', ['1.A.4.c.ii', 'DieselOil'], 3],
-            ['Liquefied petroleum gases (LPG)', ['1.A.4.c.ii', 'LPG'], 3],
-            ['Other liquid fuels (please specify)', ['1.A.4.c.ii', 'OtherLiquid'], 3],
-            ['Other Kerosene', ['1.A.4.c.ii', 'Kerosene'], 4],  # HRV
-            ['Lubricants', ['1.A.4.c.ii', 'Lubricants'], 4],  # HRV
-            ['Gasoil', ['1.A.4.c.ii', 'Gasoil'], 4],  # FIN
-            ['Marine gasoil', ['1.A.4.c.ii', 'MarineGasoil'], 4],  # NOR
-            ['heavy fuel oil', ['1.A.4.c.ii', 'HeavyFuelOil'], 4],  # NOR
-            ['Other motor fuels', ['1.A.4.c.ii', 'OMotorFuels'], 4],  # RUS
-            ['Biodiesel (5 percent fossil portion)', ['1.A.4.c.ii', 'OLBiodieselFC'], 4],  # CAN
-            ['Gaseous fuels', ['1.A.4.c.ii', 'Gaseous'], 3],
-            ['Biomass(6)', ['1.A.4.c.ii', 'Biomass'], 3],
-            ['Other fossil fuels (please specify)(4)', ['1.A.4.c.ii', 'OtherFF'], 3],
-            ['fossil part of biodiesel', ['1.A.4.c.ii', 'OFFBiodieselFC'], 4],
-            ['Fossil part of biodiesel and biogasoline', ['1.A.4.c.ii', 'OFFBiofuelFC'], 4],
-            ['Biodiesel (fossil component)', ['1.A.4.c.ii', 'OFFBiodieselFC'], 4], # LUX
-            ['Alkylate Gasoline', ['1.A.4.c.ii', 'OFFAlkylateGasoline'], 4], # LIE
-            # iii. Fishing
-            ['iii. Fishing', ['1.A.4.c.iii', 'Total'], 2],
-            ['Residual fuel oil', ['1.A.4.c.iii', 'ResFuelOil'], 3],
-            ['Gas/diesel oil', ['1.A.4.c.iii', 'GasDieselOil'], 3],
-            ['Gasoline', ['1.A.4.c.iii', 'Gasoline'], 3],
-            ['Other liquid fuels (please specify)', ['1.A.4.c.iii', 'OtherLiquid'], 3],
-            ['Biodiesel (5 percent fossil portion)', ['1.A.4.c.iii', 'OLBiodieselFC'], 4],  # CAN
-            ['Gaseous fuels', ['1.A.4.c.iii', 'Gaseous'], 3],
-            ['Biomass(6)', ['1.A.4.c.iii', 'Biomass'], 3],
-            ['Other fossil fuels (please specify)(4)', ['1.A.4.c.iii', 'OtherFF'], 3],
-            ['Fossil part of biodiesel and biogasoline', ['1.A.4.c.iii', 'OFFBiofuelFC'], 3],
-            # 1.A.5 Other (Not specified elsewhere)(14)
-            ['1.A.5 Other (Not specified elsewhere)(14)', ['1.A.5', 'Total'], 0],
-            # a. Stationary (please specify)
-            ['a. Stationary (please specify)', ['1.A.5.a', 'Total'], 1],
-            # temp
-            ['Liquid Fuels', ['1.A.5.a', 'Liquid'], 2],
-            ['Solid Fuels', ['1.A.5.a', 'Solid'], 2],
-            ['Gaseous Fuels', ['1.A.5.a', 'Gaseous'], 2],
-            ['Other Fossil Fuels', ['1.A.5.a', 'OtherFF'], 2],
-            ['Peat', ['1.A.5.a', 'Peat'], 2],
-            ['Biomass', ['1.A.5.a', 'Biomass'], 2],
-            # temp
-            # GBK, GBR
-            ['Military fuel use', ['1.A.5.a.i', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.a.i', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.a.i', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.a.i', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.a.i', 'OtherFF'], 3],
-            ['Peat', ['1.A.5.a.i', 'Peat'], 3],
-            ['Biomass', ['1.A.5.a.i', 'Biomass'], 3],
-            # TUR
-            ['Liquid fuels', ['1.A.5.a', 'Liquid'], 2],
-            # ESP, FIN, SWE
-            ['Other non-specified', ['1.A.5.a.ii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.a.ii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.a.ii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.a.ii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.a.ii', 'OtherFF'], 3],
-            ['Peat', ['1.A.5.a.ii', 'Peat'], 3],
-            ['Biomass', ['1.A.5.a.ii', 'Biomass'], 3],
-            # ROU, SVK, RUS
-            ['Other', ['1.A.5.a.ii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.a.ii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.a.ii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.a.ii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.a.ii', 'OtherFF'], 3],
-            ['Peat', ['1.A.5.a.ii', 'Peat'], 3],
-            ['Biomass', ['1.A.5.a.ii', 'Biomass'], 3],
-            # FRA, FRK
-            ['Other not specified', ['1.A.5.a.ii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.a.ii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.a.ii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.a.ii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.a.ii', 'OtherFF'], 3],
-            ['Peat', ['1.A.5.a.ii', 'Peat'], 3],
-            ['Biomass', ['1.A.5.a.ii', 'Biomass'], 3],
-            # CYP
-            ['Other (not specified elsewhere)', ['1.A.5.a.ii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.a.ii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.a.ii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.a.ii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.a.ii', 'OtherFF'], 3],
-            ['Peat', ['1.A.5.a.ii', 'Peat'], 3],
-            ['Biomass', ['1.A.5.a.ii', 'Biomass'], 3],
-            # NOR, HUN
-            ['Military', ['1.A.5.a.i', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.a.i', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.a.i', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.a.i', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.a.i', 'OtherFF'], 3],
-            ['Peat', ['1.A.5.a.i', 'Peat'], 3],
-            ['Biomass', ['1.A.5.a.i', 'Biomass'], 3],
-            ['Non-fuel Use', ['1.A.5.a.iii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.a.iii', 'Liquid'], 3],
-            # DNM, DKE, DNK
-            ['Other stationary combustion', ['1.A.5.a.ii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.a.ii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.a.ii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.a.ii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.a.ii', 'OtherFF'], 3],
-            ['Biomass', ['1.A.5.a.ii', 'Biomass'], 3],
-            # LUX
-            ['Stationary', ['1.A.5.a.ii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.a.ii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.a.ii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.a.ii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.a.ii', 'OtherFF'], 3],
-            ['Biomass', ['1.A.5.a.ii', 'Biomass'], 3],
-            # USA
-            ['Incineration of Waste', ['1.A.5.a.iv', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.a.iv', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.a.iv', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.a.iv', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.a.iv', 'OtherFF'], 3],
-            ['Peat', ['1.A.5.a.iv', 'Peat'], 3],
-            ['Biomass', ['1.A.5.a.iv', 'Biomass'], 3],
-            ['U.S. Territories', ['1.A.5.a.v', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.a.v', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.a.v', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.a.v', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.a.v', 'OtherFF'], 3],
-            ['Peat', ['1.A.5.a.v', 'Peat'], 3],
-            ['Biomass', ['1.A.5.a.v', 'Biomass'], 3],
-            ['Non Energy Use', ['1.A.5.a.iii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.a.iii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.a.iii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.a.iii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.a.iii', 'OtherFF'], 3],
-            ['Peat', ['1.A.5.a.iii', 'Peat'], 3],
-            ['Biomass', ['1.A.5.a.iii', 'Biomass'], 3],
-            # b. Mobile (please specify)
-            ['b. Mobile (please specify)', ['1.A.5.b', 'Total'], 1],
-            # temp
-            ['Liquid Fuels', ['1.A.5.b', 'Liquid'], 2],
-            ['Solid Fuels', ['1.A.5.b', 'Solid'], 2],
-            ['Gaseous Fuels', ['1.A.5.b', 'Gaseous'], 2],
-            ['Other Fossil Fuels', ['1.A.5.b', 'OtherFF'], 2],
-            ['Biomass', ['1.A.5.b', 'Biomass'], 2],
-            # temp
-            # GBK, GBR
-            ['Military aviation and naval shipping', ['1.A.5.b.i', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.i', 'Liquid'], 3],
-            # HRV
-            ['Military aviation component', ['1.A.5.b.ii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.ii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.b.ii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.b.ii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.b.ii', 'OtherFF'], 3],
-            ['Biomass', ['1.A.5.b.ii', 'Biomass'], 3],
-            ['Military water-borne component', ['1.A.5.b.iii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.iii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.b.iii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.b.iii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.b.iii', 'OtherFF'], 3],
-            ['Biomass', ['1.A.5.b.iii', 'Biomass'], 3],
-            # ESP, FIN
-            ['Other non-specified', ['1.A.5.b.iv', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.iv', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.b.iv', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.b.iv', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.b.iv', 'OtherFF'], 3],
-            ['Peat', ['1.A.5.b.iv', 'Peat'], 3],
-            ['Biomass', ['1.A.5.b.iv', 'Biomass'], 3],
-            # NLD, DKE, DNM, DNK, SWE, UKR
-            ['Military use', ['1.A.5.b.v', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.v', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.b.v', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.b.v', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.b.v', 'OtherFF'], 3],
-            ['Peat', ['1.A.5.b.v', 'Peat'], 3],
-            ['Biomass', ['1.A.5.b.v', 'Biomass'], 3],
-            # AUT, NOR, USA, CHE, HUN, LTU
-            ['Military', ['1.A.5.b.v', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.v', 'Liquid'], 3],
-            ['Gaseous Fuels', ['1.A.5.b.v', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.b.v', 'OtherFF'], 3],
-            ['Biomass', ['1.A.5.b.v', 'Biomass'], 3],
-            # PRT
-            ['Military Aviation', ['1.A.5.b.ii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.ii', 'Liquid'], 3],
-            # ROU, MLT
-            ['Other', ['1.A.5.b.iv', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.iv', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.b.iv', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.b.iv', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.b.iv', 'OtherFF'], 3],
-            ['Peat', ['1.A.5.b.iv', 'Peat'], 3],
-            ['Biomass', ['1.A.5.b.iv', 'Biomass'], 3],
-            # FRA, FRK
-            ['Other not specified', ['1.A.5.b.iv', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.iv', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.b.iv', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.b.iv', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.b.iv', 'OtherFF'], 3],
-            ['Peat', ['1.A.5.b.iv', 'Peat'], 3],
-            ['Biomass', ['1.A.5.b.iv', 'Biomass'], 3],
-            # CYP
-            ['1A5b i Mobile (aviation component)', ['1.A.5.b.vi', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.vi', 'Liquid'], 3],
-            # GBK, GBR
-            ['Lubricants used in 2-stroke engines', ['1.A.5.b.vii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.vii', 'Liquid'], 3],
-            # DNM, DKE, DNK
-            ['Recreational crafts', ['1.A.5.b.viii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.viii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.b.viii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.b.viii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.b.viii', 'OtherFF'], 3],
-            ['Biomass', ['1.A.5.b.viii', 'Biomass'], 3],
-            # SVK
-            ['Military use Jet Kerosene', ['1.A.5.b.ix', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.ix', 'Liquid'], 3],
-            ['Military Gasoline', ['1.A.5.b.x', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.x', 'Liquid'], 3],
-            ['Biomass', ['1.A.5.b.ix', 'Biomass'], 3],
-            ['Military Diesel Oil', ['1.A.5.b.xi', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.xi', 'Liquid'], 3],
-            ['Biomass', ['1.A.5.b.xi', 'Biomass'], 3],
-            # BEL
-            ['Military Use', ['1.A.5.b.v', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.v', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.b.v', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.b.v', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.b.v', 'OtherFF'], 3],
-            ['Biomass', ['1.A.5.b.v', 'Biomass'], 3],
-            # AUS
-            ['Military Transport', ['1.A.5.b.xii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.xii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.b.xii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.b.xii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.b.xii', 'OtherFF'], 3],
-            ['Biomass', ['1.A.5.b.xii', 'Biomass'], 3],
-            # CZE
-            ['Agriculture and Forestry and Fishing', ['1.A.5.b.xiii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.xiii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.b.xiii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.b.xiii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.b.xiii', 'OtherFF'], 3],
-            ['Biomass', ['1.A.5.b.xiii', 'Biomass'], 3],
-            ['Other mobile sources not included elsewhere', ['1.A.5.b.iv', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.iv', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.b.iv', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.b.iv', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.b.iv', 'OtherFF'], 3],
-            ['Biomass', ['1.A.5.b.iv', 'Biomass'], 3],
-            # SVN
-            ['Military use of fuels', ['1.A.5.b.v', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.v', 'Liquid'], 3],
-            # LUX
-            ['Unspecified Mobile', ['1.A.5.b.iv', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.iv', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.b.iv', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.b.iv', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.b.iv', 'OtherFF'], 3],
-            ['Biomass', ['1.A.5.b.iv', 'Biomass'], 3],
-            # LVA
-            ['Mobile', ['1.A.5.b.iv', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.iv', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.b.iv', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.b.iv', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.b.iv', 'OtherFF'], 3],
-            ['Biomass', ['1.A.5.b.iv', 'Biomass'], 3],
-            # CAN
-            ['Domestic Military (Aviation)', ['1.A.5.b.ii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.ii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.b.ii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.b.ii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.b.ii', 'OtherFF'], 3],
-            ['Biomass', ['1.A.5.b.ii', 'Biomass'], 3],
-            ['Military Water-borne Navigation', ['1.A.5.b.iii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.iii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.b.iii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.b.iii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.b.iii', 'OtherFF'], 3],
-            ['Biomass', ['1.A.5.b.iii', 'Biomass'], 3],
-            # CZE, new in 2022
-            ['i. Mobile (aviation component)', ['1.A.5.b.vi', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.vi', 'Liquid'], 3],
-            ['iii. Mobile (other)', ['1.A.5.b.iv', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.iv', 'Liquid'], 3],
-            # Information Item
-            ['Information item:(15)', ['\IGNORE', '\IGNORE'], 0],
-            ['Waste incineration with energy recovery included as:', ['\IGNORE', '\IGNORE'], 1],
-            ['Biomass(6)', ['\IGNORE', '\IGNORE'], 1],
-            ['Fossil fuels(4)', ['\IGNORE', '\IGNORE'], 1],
-        ],
-        "entity_mapping": {
-            'EMISSIONS CH4': "CH4",
-            'EMISSIONS CO2(2)': "CO2",
-            'EMISSIONS N2O': "N2O",
-        },
-    },  # tested
-    "Table1.B.1": {
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 19,
-            "header": ['group', 'entity', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [
-                'ACTIVITY DATA Amount of fuel produced',
-                'IMPLIED EMISSION FACTORS CH4(1)',
-                'IMPLIED EMISSION FACTORS CO2',
-                'EMISSIONS CH4 Recovery/Flaring(2)',
-            ],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['1. B. 1. a. Coal mining and handling', ['1.B.1.a'], 0],
-            ['i. Underground mines(4)', ['1.B.1.a.i'], 1],
-            ['Mining activities', ['1.B.1.a.i.1'], 2],
-            ['Post-mining activities', ['1.B.1.a.i.2'], 2],
-            ['Abandoned underground mines', ['1.B.1.a.i.3'], 2],
-            ['ii. Surface mines(4)', ['1.B.1.a.ii'], 1],
-            ['Mining activities', ['1.B.1.a.ii.1'], 2],
-            ['Post-mining activities', ['1.B.1.a.ii.2'], 2],
-            ['1. B. 1. b. Solid fuel transformation(5)', ['1.B.1.b'], 0],
-            ['1. B. 1. c. Other (please specify)(6)', ['1.B.1.c'], 0],
-            ['Flaring', ['1.B.1.c.i'], 1],  # UKR, AUS
-            ['Flaring of gas', ['1.B.1.c.i'], 1],  # SWE
-            ['Coal Dumps', ['1.B.1.c.ii'], 1],  # JPN
-            ['SO2 scrubbing', ['1.B.1.c.iii'], 1],  # SVN
-            ['Flaring of coke oven gas', ['1.B.1.c.iv'], 1],  # KAZ
-            ['Emisson from Coke Oven Gas Subsystem', ['1.B.1.c.iv'], 1],  # POL
-            ['Other', ['1.B.1.c.v'], 1],  # ROU, new in 2022
-        ],
-        "entity_mapping": {
-            'EMISSIONS CH4 Emissions(3)': 'CH4',
-            'EMISSIONS CO2 Emissions': 'CO2',
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    "Table1.B.2": {
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 33,
-            "header": ['group', 'entity', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [
-                'ACTIVITY DATA(1) Description(1)',
-                'ACTIVITY DATA(1) Unit(1)',
-                'ACTIVITY DATA(1) Value',
-                'IMPLIED EMISSION FACTORS CO2(2)',
-                'IMPLIED EMISSION FACTORS CH4',
-                'IMPLIED EMISSION FACTORS N2O',
-                'EMISSIONS CO2 Amount captured',
-            ],
-            "stop_cats": [".", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['1. B. 2. a. Oil(6)', ['1.B.2.a'], 0],
-            ['1. Exploration', ['1.B.2.a.1'], 1],
-            ['2. Production(7)', ['1.B.2.a.2'], 1],
-            ['3. Transport', ['1.B.2.a.3'], 1],
-            ['4. Refining/storage', ['1.B.2.a.4'], 1],
-            ['5. Distribution of oil products', ['1.B.2.a.5'], 1],
-            ['6. Other', ['1.B.2.a.6'], 1],
-            ['1. B. 2. b. Natural gas', ['1.B.2.b'], 0],
-            ['1. Exploration', ['1.B.2.b.1'], 1],
-            ['2. Production(7)', ['1.B.2.b.2'], 1],
-            ['3. Processing', ['1.B.2.b.3'], 1],
-            ['4. Transmission and storage', ['1.B.2.b.4'], 1],
-            ['5. Distribution', ['1.B.2.b.5'], 1],
-            ['6. Other', ['1.B.2.b.6'], 1],
-            ['1. B. 2. c. Venting and flaring', ['1.B.2.c'], 0],
-            ['Venting', ['1.B.2.c-ven'], 1],
-            ['i. Oil', ['1.B.2.c-ven.i'], 2],
-            ['ii. Gas', ['1.B.2.c-ven.ii'], 2],
-            ['iii. Combined', ['1.B.2.c-ven.iii'], 2],
-            ['Flaring(8)', ['1.B.2.c-fla'], 1],
-            ['i. Oil', ['1.B.2.c-fla.i'], 2],
-            ['ii. Gas', ['1.B.2.c-fla.ii'], 2],
-            ['iii. Combined', ['1.B.2.c-fla.iii'], 2],
-            ['1.B.2.d. Other (please specify)(9)', ['1.B.2.d'], 0],
-            ['Groundwater extraction and CO2 mining', ['1.B.2.d.i'], 1],  # HUN
-            ['Geothermal', ['1.B.2.d.ii'], 1],  # NOR, DEU, PRT, NZL
-            ['Geothermal Energy', ['1.B.2.d.ii'], 1],  # ISL
-            ['Geothermal Generation', ['1.B.2.d.ii'], 1],  # JPN
-            ['Geotherm', ['1.B.2.d.ii'], 1],  # ITA
-            ['City Gas Production', ['1.B.2.d.iii'], 1],  # PRT
-            ['Other', ['1.B.2.d.iv'], 1],  # UKR, ROU
-            ['Other non-specified', ['1.B.2.d.iv'], 1],  # SWE
-            ['Flaring in refineries', ['1.B.2.d.v'], 1],  # ITA
-            ['LPG transport', ['1.B.2.d.vi'], 1],  # GRC
-            ['Distribution of town gas', ['1.B.2.d.vii'], 1],  # FIN
-            ['Petrol distribution', ['1.B.2.d.viii'], 1],  # IRL
-            ['Natural Gas Transport', ['1.B.2.d.ix'], 1],  # BLR
-            ['Natural gas exploration - N2O emissions', ['1.B.2.d.x'], 1],  # GBR, GBK
-            ['flue gas desulfurisation', ['1.B.2.d.xi'], 1],  # GBR, GBK, new in 2022
-        ],
-        "entity_mapping": {
-            'EMISSIONS CH4 (4) Amount captured': 'CH4',
-            'EMISSIONS CO2 Emissions(3)': 'CO2',
-            'EMISSIONS N2O Amount captured': 'N2O',
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    "Table1.C": {
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 24,
-            "header": ['group', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [
-                'ACTIVITY DATA CO2 transported or injected(1)',
-                'IMPLIED EMISSION FACTORS CO2',
-            ],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['1. Transport of CO2', ['1.C.1']],
-            ['a. Pipelines', ['1.C.1.a']],
-            ['b. Ships', ['1.C.1.b']],
-            ['c. Other', ['1.C.1.c']],
-            ['2. Injection and storage(3)', ['1.C.2']],
-            ['a. Injection', ['1.C.2.a']],
-            ['b. Storage', ['1.C.2.b']],
-            ['3. Other', ['1.C.3']],
-            ['Information item(4, 5)', ['\IGNORE']],
-            ['Total amount captured for storage', ['M.Info.A.TACS']],
-            ['Total amount of imports for storage', ['M.Info.A.TAIS']],
-            ['Total A', ['M.Info.A']],
-            ['Total amount of exports for storage', ['M.Info.B.TAES']],
-            ['Total amount of CO2 injected at storage sites', ['M.Info.B.TAI']],
-            ['Total leakage from transport, injection and storage', ['M.Info.B.TLTIS']],
-            ['Total B', ['M.Info.B']],
-            ['Difference (A-B)(6)', ['\IGNORE']],
-        ],
-        "entity_mapping": {
-            'EMISSIONS CO2(2)': 'CO2',
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    "Table1.D": {
-        "status": "TODO",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 20,
-            "header": ['group', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category", "class"],
-            "cols_to_ignore": [],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-
-        ],
-        "entity_mapping": [],
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # TODO
-    "Table2(I)s1": {
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 31,
-            "header": ['entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["industry"],
-        },
-        "sector_mapping": [
-            ['Total industrial processes', ['2']],
-            ['A. Mineral industry', ['2.A']],
-            ['1. Cement production', ['2.A.1']],
-            ['2. Lime production', ['2.A.2']],
-            ['3. Glass production', ['2.A.3']],
-            ['4. Other process uses of carbonates', ['2.A.4']],
-            ['B. Chemical industry', ['2.B']],
-            ['1. Ammonia production', ['2.B.1']],
-            ['2. Nitric acid production', ['2.B.2']],
-            ['3. Adipic acid production', ['2.B.3']],
-            ['4. Caprolactam, glyoxal and glyoxylic acid production', ['2.B.4']],
-            ['5. Carbide production', ['2.B.5']],
-            ['6. Titanium dioxide production', ['2.B.6']],
-            ['7. Soda ash production', ['2.B.7']],
-            ['8. Petrochemical and carbon black production', ['2.B.8']],
-            ['9. Fluorochemical production', ['2.B.9']],
-            ['10. Other (as specified in table 2(I).A-H)', ['2.B.10']],
-            ['C. Metal industry', ['2.C']],
-            ['1. Iron and steel production', ['2.C.1']],
-            ['2. Ferroalloys production', ['2.C.2']],
-            ['3. Aluminium production', ['2.C.3']],
-            ['4. Magnesium production', ['2.C.4']],
-            ['5. Lead production', ['2.C.5']],
-            ['6. Zinc production', ['2.C.6']],
-            ['7. Other (as specified in table 2(I).A-H)', ['2.C.7']],
-        ],
-        "entity_mapping": {
-            'HFCs(1)': 'HFCS (AR4GWP100)',
-            'PFCs(1)': 'PFCS (AR4GWP100)',
-            'Unspecified mix of HFCs and PFCs(1)': 'UnspMixOfHFCsPFCs (AR4GWP100)',
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    "Table2(I)s2": {
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 29,
-            "header": ['entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["industry"],
-        },
-        "sector_mapping": [
-            ['D. Non-energy products from fuels and solvent use', ['2.D']],
-            ['1. Lubricant use', ['2.D.1']],
-            ['2. Paraffin wax use', ['2.D.2']],
-            ['3. Other', ['2.D.3']],
-            ['E. Electronics industry', ['2.E']],
-            ['1. Integrated circuit or semiconductor', ['2.E.1']],
-            ['2. TFT flat panel display', ['2.E.2']],
-            ['3. Photovoltaics', ['2.E.3']],
-            ['4. Heat transfer fluid', ['2.E.4']],
-            ['5. Other (as specified in table 2(II))', ['2.E.5']],
-            ['F. Product uses as substitutes for ODS(2)', ['2.F']],
-            ['1. Refrigeration and air conditioning', ['2.F.1']],
-            ['2. Foam blowing agents', ['2.F.2']],
-            ['3. Fire protection', ['2.F.3']],
-            ['4. Aerosols', ['2.F.4']],
-            ['5. Solvents', ['2.F.5']],
-            ['6. Other applications', ['2.F.6']],
-            ['G. Other product manufacture and use', ['2.G']],
-            ['1. Electrical equipment', ['2.G.1']],
-            ['2. SF6 and PFCs from other product use', ['2.G.2']],
-            ['3. N2O from product uses', ['2.G.3']],
-            ['4. Other', ['2.G.4']],
-            ['H. Other (as specified in tables 2(I).A-H and 2(II))(3)', ['2.H']],
-        ],
-        "entity_mapping": {
-            'HFCs(1)': 'HFCS (AR4GWP100)',
-            'PFCs(1)': 'PFCS (AR4GWP100)',
-            'Unspecified mix of HFCs and PFCs(1)': 'UnspMixOfHFCsPFCs (AR4GWP100)',
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    "Table2(I).A-Hs1": {
-        "status": "TODO",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 40,
-            "header": ['group', 'entity', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-
-        ],
-        "entity_mapping": [],
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # TODO
-    "Table2(I).A-Hs2": {
-        "status": "TODO",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 36,
-            "header": ['group', 'entity', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-
-        ],
-        "entity_mapping": [],
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # TODO
-    "Table2(II)": {
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 38,
-            "header": ['entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [],
-            "stop_cats": [".", np.nan],
-            "unit_info": unit_info["fgases"],
-        },
-        "sector_mapping": [
-            ['Total actual emissions of halocarbons (by chemical) and SF6', ['2']],
-            ['B. Chemical industry', ['2.B']],
-            ['9. Flurochemical production', ['2.B.9']],
-            ['By-product emissions', ['2.B.9.a']],
-            ['Fugitive emissions', ['2.B.9.b']],
-            ['10. Other', ['2.B.10']],
-            ['C. Metal industry', ['2.C']],
-            ['3. Aluminium production', ['2.C.3']],
-            ['4. Magnesium production', ['2.C.4']],
-            ['7. Other', ['2.C.7']],
-            ['E. Electronics industry', ['2.E']],
-            ['1. Integrated circuit or semiconductor', ['2.E.1']],
-            ['2. TFT flat panel display', ['2.E.2']],
-            ['3. Photovoltaics', ['2.E.3']],
-            ['4. Heat transfer fluid', ['2.E.4']],
-            ['5. Other (as specified in table 2(II))', ['2.E.5']],
-            ['F. Product uses as substitutes for ODS(2)', ['2.F']],
-            ['1. Refrigeration and air conditioning', ['2.F.1']],
-            ['2. Foam blowing agents', ['2.F.2']],
-            ['3. Fire protection', ['2.F.3']],
-            ['4. Aerosols', ['2.F.4']],
-            ['5. Solvents', ['2.F.5']],
-            ['6. Other applications', ['2.F.6']],
-            ['G. Other product manufacture and use', ['2.G']],
-            ['1. Electrical equipment', ['2.G.1']],
-            ['2. SF6 and PFCs from other product use', ['2.G.2']],
-            ['4. Other', ['2.G.4']],
-            ['H. Other (please specify)', ['2.H']],
-            ['2.H.1 Pulp and paper', ['2.H.1']],
-            ['2.H.2 Food and beverages industry', ['2.H.2']],
-            ['2.H.3 Other (please specify)', ['2.H.3']],
-        ],
-        "entity_mapping": {
-            'C 3F8': 'C3F8',
-            #'C10F18' 'C2F6' 'C4F10' 'C5F12' 'C6F14' 'CF4'
-            'HFC-125': 'HFC125',
-            'HFC-134': 'HFC134',
-            'HFC-134a': 'HFC134a',
-            'HFC-143': 'HFC143',
-            'HFC-143a': 'HFC143a',
-            'HFC-152': 'HFC152',
-            'HFC-152a': 'HFC152a',
-            'HFC-161': 'HFC161',
-            'HFC-227ea': 'HFC227ea',
-            'HFC-23': 'HFC23',
-            'HFC-236cb': 'HFC236cb',
-            'HFC-236ea': 'HFC236ea',
-            'HFC-236fa': 'HFC236fa',
-            'HFC-245ca': 'HFC245ca',
-            'HFC-245fa': 'HFC245fa',
-            'HFC-32': 'HFC32',
-            'HFC-365mfc': 'HFC365mfc',
-            'HFC-41': 'HFC41',
-            'HFC-43-10mee': 'HFC4310mee',
-            'Unspecified mix of HFCs (1)': 'UnspMixOfHFCs (AR4GWP100)',
-            'Unspecified mix of HFCs and PFCs(1)': 'UnspMixOfHFCsPFCs (AR4GWP100)',
-            'Unspecified mix of PFCs (1)': 'UnspMixOfPFCs (AR4GWP100)',
-            'c-C3F6': 'cC3F6',
-            'c-C4F8': 'cC4F8',
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    "Table3s1": {  # Agriculture summary sheet 1
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 75,
-            "header": ['entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND",
-            "categories": ["category"],
-            "cols_to_ignore": [],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['3. Total agriculture', ['3'], 0],
-            # I. Livestock
-            ['I. Livestock', ['M.3.LV'], 1],
-            # A. Enteric fermentation
-            ['A. Enteric fermentation', ['3.A'], 2],
-            ['1. Cattle(1)', ['3.A.1'], 3],
-            ['Option A:', ['\IGNORE'], 4],
-            ['Dairy cattle', ['3.A.1.Aa'], 5],
-            ['Non-dairy cattle', ['3.A.1.Ab'], 5],
-            ['Option B:', ['\IGNORE'], 4],
-            ['Mature dairy cattle', ['3.A.1.Ba'], 5],
-            ['Other mature cattle', ['3.A.1.Bb'], 5],
-            ['Growing cattle', ['3.A.1.Bc'], 5],
-            ['Option C (country-specific):', ['\IGNORE'], 4],
-            # all countries not specified explicitly
-            ['\C!-AUS-MLT-LUX-POL-SVN-USA\ Other (as specified in table 3(I).A)', ['3.A.1.C'], 5],
-            # Australia
-            ['\C-AUS\ Other (as specified in table 3(I).A)', ['3.A.1.C'], 5],
-            ['\C-AUS\ Dairy Cattle', ['3.A.1.C-AUS-a'], 6],
-            ['\C-AUS\ Beef Cattle - Pasture', ['3.A.1.C-AUS-b'], 6],
-            ['\C-AUS\ Beef Cattle - Feedlot', ['3.A.1.C-AUS-c'], 6],
-            # Malta
-            ['\C-MLT\ Other (as specified in table 3(I).A)', ['3.A.1.C'], 5],
-            ['\C-MLT\ dairy cows', ['3.A.1.C-MLT-a'], 6],
-            ['\C-MLT\ non-lactating cows', ['3.A.1.C-MLT-b'], 6],
-            ['\C-MLT\ bulls', ['3.A.1.C-MLT-c'], 6],
-            ['\C-MLT\ calves', ['3.A.1.C-MLT-d'], 6],
-            ['\C-MLT\ growing cattle 1-2 years', ['3.A.1.C-MLT-e'], 6],
-            # Luxembourg
-            ['\C-LUX\ Other (as specified in table 3(I).A)', ['3.A.1.C'], 5],
-            ['\C-LUX\ Bulls', ['3.A.1.C-LUX-a'], 6],
-            ['\C-LUX\ Calves', ['3.A.1.C-LUX-b'], 6],
-            ['\C-LUX\ Young Cattle', ['3.A.1.C-LUX-c'], 6],
-            ['\C-LUX\ Suckler Cows', ['3.A.1.C-LUX-d'], 6],
-            ['\C-LUX\ Bulls under 2 years', ['3.A.1.C-LUX-e'], 6],
-            ['\C-LUX\ Dairy Cows', ['3.A.1.C-LUX-f'], 6],
-            # Poland
-            ['\C-POL\ Other (as specified in table 3(I).A)', ['3.A.1.C'], 5],
-            ['\C-POL\ Bulls (older than 2 years)', ['3.A.1.C-POL-a'], 6],
-            ['\C-POL\ Non-dairy Heifers (older than 2 years)', ['3.A.1.C-POL-b'], 6],
-            ['\C-POL\ Non-dairy Young Cattle (younger than 1 year)', ['3.A.1.C-POL-c'], 6],
-            ['\C-POL\ Dairy Cattle', ['3.A.1.C-POL-d'], 6],
-            ['\C-POL\ Non-dairy Young Cattle (1-2 years)', ['3.A.1.C-POL-e'], 6],
-            # Slovenia
-            ['\C-SVN\ Other (as specified in table 3(I).A)', ['3.A.1.C'], 5],
-            ['\C-SVN\ Dairy cows', ['3.A.1.C-SVN-a'], 6],
-            ['\C-SVN\ Non-dairy cattle', ['3.A.1.C-SVN-b'], 6],
-            ['\C-SVN\ Other cows', ['3.A.1.C-SVN-c'], 6],
-            # USA
-            ['\C-USA\ Other (as specified in table 3(I).A)', ['3.A.1.C'], 5],
-            ['\C-USA\ Steer Stocker', ['3.A.1.C-USA-a'], 6],
-            ['\C-USA\ Heifer Stocker', ['3.A.1.C-USA-b'], 6],
-            ['\C-USA\ Beef Cows', ['3.A.1.C-USA-c'], 6],
-            ['\C-USA\ Dairy Replacements', ['3.A.1.C-USA-d'], 6],
-            ['\C-USA\ Beef Replacements', ['3.A.1.C-USA-e'], 6],
-            ['\C-USA\ Steer Feedlot', ['3.A.1.C-USA-f'], 6],
-            ['\C-USA\ Heifer Feedlot', ['3.A.1.C-USA-g'], 6],
-            ['\C-USA\ Bulls', ['3.A.1.C-USA-h'], 6],
-            ['\C-USA\ Dairy Cows', ['3.A.1.C-USA-i'], 6],
-            ['\C-USA\ Beef Calves', ['3.A.1.C-USA-j'], 6],
-            ['\C-USA\ Dairy Calves', ['3.A.1.C-USA-k'], 6],
-            # Other livestock
-            ['2. Sheep', ['3.A.2'], 3],
-            ['3. Swine', ['3.A.3'], 3],
-            ['4. Other livestock', ['3.A.4'], 3],
-            ['Buffalo', ['3.A.4.a'], 4],
-            ['Camels', ['3.A.4.b'], 4],
-            ['Deer', ['3.A.4.c'], 4],
-            ['Goats', ['3.A.4.d'], 4],
-            ['Horses', ['3.A.4.e'], 4],
-            ['Mules and Asses', ['3.A.4.f'], 4],
-            ['Poultry', ['3.A.4.g'], 4],
-            ['Other (please specify)', ['3.A.4.h'], 4],
-            ['Rabbit', ['3.A.4.h.i'], 5],
-            ['Reindeer', ['3.A.4.h.ii'], 5],
-            ['Ostrich', ['3.A.4.h.iii'], 5],
-            ['Fur-bearing Animals', ['3.A.4.h.iv'], 5],
-            ['Other', ['3.A.4.h.v'], 5],
-            # Manure Management
-            ['B. Manure management', ['3.B'], 2],
-            ['1. Cattle(1)', ['3.B.1'], 3],
-            ['Option A:', ['\IGNORE'], 4],
-            ['Dairy cattle', ['3.B.1.Aa'], 5],
-            ['Non-dairy cattle', ['3.B.1.Ab'], 5],
-            ['Option B:', ['\IGNORE'], 4],
-            ['Mature dairy cattle', ['3.B.1.Ba'], 5],
-            ['Other mature cattle', ['3.B.1.Bb'], 5],
-            ['Growing cattle', ['3.B.1.Bc'], 5],
-            ['Option C (country-specific):', ['\IGNORE'], 4],
-            # all countries not specified explicitly
-            ['\C!-AUS-MLT-LUX-POL-SVN-USA\ Other (as specified in table 3(I).B)', ['3.B.1.C'], 5],
-            # Australia
-            ['\C-AUS\ Other (as specified in table 3(I).B)', ['3.B.1.C'], 5],
-            ['\C-AUS\ Dairy Cattle', ['3.B.1.C-AUS-a'], 6],
-            ['\C-AUS\ Beef Cattle - Pasture', ['3.B.1.C-AUS-b'], 6],
-            ['\C-AUS\ Beef Cattle - Feedlot', ['3.B.1.C-AUS-c'], 6],
-            # Malta
-            ['\C-MLT\ Other (as specified in table 3(I).B)', ['3.B.1.C'], 5],
-            ['\C-MLT\ dairy cows', ['3.B.1.C-MLT-a'], 6],
-            ['\C-MLT\ non-lactating cows', ['3.B.1.C-MLT-b'], 6],
-            ['\C-MLT\ bulls', ['3.B.1.C-MLT-c'], 6],
-            ['\C-MLT\ calves', ['3.B.1.C-MLT-d'], 6],
-            ['\C-MLT\ growing cattle 1-2 years', ['3.B.1.C-MLT-e'], 6],
-            # Luxembourg
-            ['\C-LUX\ Other (as specified in table 3(I).B)', ['3.B.1.C'], 5],
-            ['\C-LUX\ Bulls', ['3.B.1.C-LUX-a'], 6],
-            ['\C-LUX\ Calves', ['3.B.1.C-LUX-b'], 6],
-            ['\C-LUX\ Young Cattle', ['3.B.1.C-LUX-c'], 6],
-            ['\C-LUX\ Suckler Cows', ['3.B.1.C-LUX-d'], 6],
-            ['\C-LUX\ Bulls under 2 years', ['3.B.1.C-LUX-e'], 6],
-            ['\C-LUX\ Dairy Cows', ['3.B.1.C-LUX-f'], 6],
-            # Poland
-            ['\C-POL\ Other (as specified in table 3(I).B)', ['3.B.1.C'], 5],
-            ['\C-POL\ Non-dairy Cattle', ['3.B.1.C-POL-a'], 6],
-            ['\C-POL\ Dairy Cattle', ['3.B.1.C-POL-b'], 6],
-            # Slovenia
-            ['\C-SVN\ Other (as specified in table 3(I).B)', ['3.B.1.C'], 5],
-            ['\C-SVN\ Dairy cows', ['3.B.1.C-SVN-a'], 6],
-            ['\C-SVN\ Non-dairy cattle', ['3.B.1.C-SVN-b'], 6],
-            ['\C-SVN\ Other cows', ['3.B.1.C-SVN-c'], 6],
-            # USA
-            ['\C-USA\ Other (as specified in table 3(I).B)', ['3.B.1.C'], 5],
-            ['\C-USA\ Dairy Cattle', ['\IGNORE'], 6],
-            ['\C-USA\ Non-Dairy Cattle', ['\IGNORE'], 6],
-            ['\C-USA\ Steer Stocker', ['3.B.1.C-USA-a'], 6],
-            ['\C-USA\ Heifer Stocker', ['3.B.1.C-USA-b'], 6],
-            ['\C-USA\ Beef Cows', ['3.B.1.C-USA-c'], 6],
-            ['\C-USA\ Dairy Replacements', ['3.B.1.C-USA-d'], 6],
-            ['\C-USA\ Beef Replacements', ['3.B.1.C-USA-e'], 6],
-            ['\C-USA\ Steer Feedlot', ['3.B.1.C-USA-f'], 6],
-            ['\C-USA\ Heifer Feedlot', ['3.B.1.C-USA-g'], 6],
-            ['\C-USA\ Bulls', ['3.B.1.C-USA-h'], 6],
-            ['\C-USA\ Dairy Cows', ['3.B.1.C-USA-i'], 6],
-            ['\C-USA\ Beef Calves', ['3.B.1.C-USA-j'], 6],
-            ['\C-USA\ Dairy Calves', ['3.B.1.C-USA-k'], 6],
-            # other animals
-            ['2. Sheep', ['3.B.2'], 3],
-            ['3. Swine', ['3.B.3'], 3],
-            ['4. Other livestock', ['3.B.4'], 3],
-            ['Buffalo', ['3.B.4.a'], 4],
-            ['Camels', ['3.B.4.b'], 4],
-            ['Deer', ['3.B.4.c'], 4],
-            ['Goats', ['3.B.4.d'], 4],
-            ['Horses', ['3.B.4.e'], 4],
-            ['Mules and Asses', ['3.B.4.f'], 4],
-            ['Poultry', ['3.B.4.g'], 4],
-            ['Other (please specify)', ['3.B.4.h'], 4],
-            ['Rabbit', ['3.B.4.h.i'], 5],
-            ['Reindeer', ['3.B.4.h.ii'], 5],
-            ['Ostrich', ['3.B.4.h.iii'], 5],
-            ['Fur-bearing Animals', ['3.B.4.h.iv'], 5],
-            ['Other', ['3.B.4.h.v'], 5],
-            ['5. Indirect N2O emissions', ['3.B.5'], 3],
-        ],
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    "Table3s2": {  # Agriculture summary sheet 2
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 18,
-            "header": ['entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND",
-            "categories": ["category"],
-            "cols_to_ignore": [],
-            "stop_cats": [".", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['C. Rice cultivation', ['3.C']],
-            ['D. Agricultural soils(2) (3) (4)', ['3.D']],
-            ['E. Prescribed burning of savannahs', ['3.E']],
-            ['E. Prescribed burning of savannas', ['3.E']],
-            ['F. Field burning of agricultural residues', ['3.F']],
-            ['G. Liming', ['3.G']],
-            ['H. Urea application', ['3.H']],
-            ['I. Other carbon-containing fertilizers', ['3.I']],
-            ['J. Other (please specify)', ['3.J']],
-            ['NOx from Manure Management', ['3.J.1']],
-            ['3.B NOx Emissions', ['3.J.1']],
-            ['NOx from 3B', ['3.J.1']],
-            ['NOX emissions from manure management', ['3.J.1']],
-            ['NOx from manure management', ['3.J.1']],
-            ['Other', ['3.J.2']],
-            ['Other UK emissions', ['3.J.2']],
-            ['Other non-specified', ['3.J.2']],
-            ['OTs and CDs - Livestock', ['3.J.3']],
-            ['OTs and CDs - soils', ['3.J.4']],
-            ['OTs and CDs - other', ['3.J.5']],
-            ['Digestate renewable raw material (storage of N)', ['3.J.6']],
-            ['Digestate renewable raw material (atmospheric deposition)', ['3.J.7']],
-            ['Digestate renewable raw material (storage of dry matter)', ['3.J.8']],
-            ['NOx from Livestock', ['3.J.9']],
-        ],
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    "Table3.C": {  # rice cultivation details
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 21,
-            "header": ['group', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [
-                'ACTIVITY DATA AND OTHER RELATED INFORMATION Harvested area(2)',
-                'ACTIVITY DATA AND OTHER RELATED INFORMATION Organic amendments added(3)',
-                'IMPLIED EMISSION FACTOR (1) CH4',
-            ],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['1. Irrigated', ['3.C.1']],
-            ['Continuously flooded', ['3.C.1.a']],
-            ['Intermittently flooded Single aeration', ['3.C.1.a.i']],
-            ['Intermittently flooded Multiple aeration', ['3.C.1.b.ii']],
-            ['2. Rainfed', ['3.C.2']],
-            ['Flood prone', ['3.C.2.a']],
-            ['Drought prone', ['3.C.2.b']],
-            ['3. Deep water', ['3.C.3']],
-            ['Water depth 50–100 cm', ['3.C.3.a']],
-            ['Water depth > 100 cm', ['3.C.3.b']],
-            ['4. Other (please specify)', ['3.C.4']],
-            ['Non-specified', ['3.C.4.a']],  # EST
-            ['Other', ['3.C.4.a']],  # DEU
-            ['other', ['3.C.4.a']],  # LVA
-            ['Other cultivation', ['3.C.4.a']],  # CZE
-            ['Upland rice(4)', ['\IGNORE']],
-            ['Total(4)', ['\IGNORE']],
-        ],
-        "entity_mapping": {
-            'EMISSIONS CH4': 'CH4',
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    "Table3.D": {  # direct and indirect N2O from soils
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 21,
-            "header": ['group', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [
-                "ACTIVITY DATA AND OTHER RELATED INFORMATION Description",
-                "ACTIVITY DATA AND OTHER RELATED INFORMATION Value",
-                "IMPLIED EMISSION FACTORS Value",
-            ],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['a. Direct N2O emissions from managed soils', ['3.D.a']],
-            ['1. Inorganic N fertilizers(3)', ['3.D.a.1']],
-            ['2. Organic N fertilizers(3)', ['3.D.a.2']],
-            ['a. Animal manure applied to soils', ['3.D.a.2.a']],
-            ['b. Sewage sludge applied to soils', ['3.D.a.2.b']],
-            ['c. Other organic fertilizers applied to soils', ['3.D.a.2.c']],
-            ['3. Urine and dung deposited by grazing animals', ['3.D.a.3']],
-            ['4. Crop residues', ['3.D.a.4']],
-            ['5. Mineralization/immobilization associated with loss/gain of soil organic matter (4)(5)', ['3.D.a.5']],
-            ['6. Cultivation of organic soils (i.e. histosols)(2)', ['3.D.a.6']],
-            ['7. Other', ['3.D.a.7']],
-            ['b. Indirect N2O Emissions from managed soils', ['3.D.b']],
-            ['1. Atmospheric deposition(6)', ['3.D.b.1']],
-            ['2. Nitrogen leaching and run-off', ['3.D.b.2']],
-        ],
-        "entity_mapping": {
-            'EMISSIONS N2O': 'N2O',
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    "Table3.E": {  # savanna burning details
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 14,
-            "header": ['group', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [
-                'ACTIVITY DATA AND OTHER RELATED INFORMATION Area of savanna burned',
-                'ACTIVITY DATA AND OTHER RELATED INFORMATION Average above-ground biomass density',
-                'ACTIVITY DATA AND OTHER RELATED INFORMATION Biomass burned',
-                'ACTIVITY DATA AND OTHER RELATED INFORMATION Fraction of savanna burned',
-                'ACTIVITY DATA AND OTHER RELATED INFORMATION Nitrogen fraction in biomass',
-                'IMPLIED EMISSION FACTORS CH4',
-                'IMPLIED EMISSION FACTORS N2O',
-            ],
-            "stop_cats": ["", ".", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['Forest land (specify ecological zone)(1)', ['3.E.1'], 0],
-            ['Savanna Grassland', ['3.E.1.b'], 1],  # AUS
-            ['Savanna Woodland', ['3.E.1.a'], 1],  # AUS
-            ['Forest land', ['3.E.1.a'], 1],  # SWE, CHE, CZE, HRV
-            ['Luxembourg', ['3.E.1.c'], 1],  # LUX
-            ['Other non-specified', ['3.E.1.d'], 1],  # EST
-            ['All', ['3.E.1.d'], 1],  # DNK, DNM, DKE
-            ['Unspecified', ['3.E.1.d'], 1],  # DEU
-            ['forest land', ['3.E.1.a'], 1],  # MLT
-            ['Zone', ['3.E.1.d'], 1],  # LVA
-            ['Grassland (specify ecological zone)(1)', ['3.E.2'], 0],
-            ['Savanna Woodland', ['3.E.2.a'], 1],  # AUS
-            ['Savanna Grassland', ['3.E.2.b'], 1],  # AUS
-            ['Temperate Grassland', ['3.E.2.c'], 1],  # AUS
-            ['Grassland', ['3.E.2.d'], 1],  # SWE, CHE, CZE, HRV
-            ['Luxembourg', ['3.E.2.e'], 1],  # LUX
-            ['Other non-specified', ['3.E.2.f'], 1],  # EST
-            ['All', ['3.E.2.f'], 1],  # DNK, DNM, DKE
-            ['Unspecified', ['3.E.2.f'], 1],  # DEU
-            ['Tussock', ['3.E.2.g'], 1],  # NZL
-            ['grassland', ['3.E.2.d'], 1],  # MLT
-            ['Zone_', ['3.E.2.f'], 1],  # LVA
-        ],
-        "entity_mapping": {
-            'EMISSIONS (2) CH4': 'CH4',
-            'EMISSIONS (2) N2O': 'N2O',
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    "Table3.F": {  # field burning details
-        "status": "TODO",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 30,
-            "header": ['group', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-
-        ],
-        "entity_mapping": [],
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # TODO
-    "Table3.G-I": {  # liming, urea, carbon containing fertilizer
-        "status": "TODO",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 13,
-            "header": ['group', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-
-        ],
-        "entity_mapping": [],
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # TODO
-    "Table4": {  # LULUCF overview
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 29,
-            "header": ['entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [],
-            "stop_cats": ["", ".", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['4. Total LULUCF', ['4']],
-            ['A. Forest land', ['4.A']],
-            ['1. Forest land remaining forest land', ['4.A.1']],
-            ['2. Land converted to forest land', ['4.A.2']],
-            ['B. Cropland', ['4.B']],
-            ['1. Cropland remaining cropland', ['4.B.1']],
-            ['2. Land converted to cropland', ['4.B.2']],
-            ['C. Grassland', ['4.C']],
-            ['1. Grassland remaining grassland', ['4.C.1']],
-            ['2. Land converted to grassland', ['4.C.2']],
-            ['D. Wetlands(3)', ['4.D']],
-            ['1. Wetlands remaining wetlands', ['4.D.1']],
-            ['2. Land converted to wetlands', ['4.D.2']],
-            ['E. Settlements', ['4.E']],
-            ['1. Settlements remaining settlements', ['4.E.1']],
-            ['2. Land converted to settlements', ['4.E.2']],
-            ['F. Other land (4)', ['4.F']],
-            ['1. Other land remaining other land', ['4.F.1']],
-            ['2. Land converted to other land', ['4.F.2']],
-            ['G. Harvested wood products (5)', ['4.G']],
-            ['H. Other (please specify)', ['4.H']],
-            ['Land converted to Settlement', ['4.H.1']],
-            ['Reservoir of Petit-Saut in French Guiana', ['4.H.5']],
-            ['Biogenic NMVOCs from managed forest', ['4.H.4']],
-            ['All other', ['4.H.9']],
-            ['Luxembourg', ['4.H.8']],
-            ['Settlements Remaining Settlements', ['4.H.2']],
-            ['4.E Settlements', ['4.H.2']],
-            ['4.C Grassland', ['4.H.3']],
-            ['Settlements', ['4.H.2']],
-            ['Other', ['4.H.9']],
-            ['N2O Emissions from Aquaculture Use', ['4.H.6']],
-            ['CH4 from artificial water bodies', ['4.H.7']],
-        ],
-        "entity_mapping": {
-            'CH4(2)': 'CH4',
-            'N2O(2)': 'N2O',
-            'Net CO2 emissions/removals(1), (2)': 'CO2',
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    # TODO: all other LULUCF tables
-    "Table5": {  # Waste overview
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 27,
-            "header": ['entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['Total waste', ['5']],
-            ['A. Solid waste disposal', ['5.A']],
-            ['1. Managed waste disposal sites', ['5.A.1']],
-            ['2. Unmanaged waste disposal sites', ['5.A.2']],
-            ['3. Uncategorized waste disposal sites', ['5.A.3']],
-            ['B. Biological treatment of solid waste', ['5.B']],
-            ['1. Composting', ['5.B.1']],
-            ['2. Anaerobic digestion at biogas facilities', ['5.B.2']],
-            ['C. Incineration and open burning of waste', ['5.C']],
-            ['1. Waste incineration', ['5.C.1']],
-            ['2. Open burning of waste', ['5.C.2']],
-            ['D. Wastewater treatment and discharge', ['5.D']],
-            ['1. Domestic wastewater', ['5.D.1']],
-            ['2. Industrial wastewater', ['5.D.2']],
-            ['3. Other (as specified in table 5.D)', ['5.D.3']],
-            ['E. Other (please specify)', ['5.E']],
-            ['Other', ['5.E.5']],  # EST, NOR
-            ['Recycling activities', ['5.E.1']],  # NLD
-            ['Mechanical-Biological Treatment MBT', ['5.E.2']],  # DEU
-            ['Accidental fires', ['5.E.3']],  # DEU, DKE, DNK, DNM
-            ['Decomposition of Petroleum-Derived Surfactants', ['5.E.4']],  # JPN
-            ['Other non-specified', ['5.E.5']],  # USA
-            ['Biogas burning without energy recovery', ['5.E.6']],  # PRT
-            ['Sludge spreading', ['5.E.7']],  # ESP
-            ['Accidental combustion', ['5.E.3']],  # ESP
-            ['Other waste', ['5.E.5']],  # CZE
-            ['5.E.1 Industrial Wastewater', ['5.E.8']],  # CAN, new in 2022
-            ['Accidental Fires at SWDS', ['5.E.9']],  # AUS, new in 2022
-            ['Memo item:(2)', [r'\IGNORE']],
-            ['Long-term storage of C in waste disposal sites', ['M.Memo.LTSW']],
-            ['Annual change in total long-term C storage', ['M.Memo.ACLT']],
-            ['Annual change in total long-term C storage in HWP waste(3)', ['M.Memo.ACLTHWP']],
-        ],
-        "entity_mapping": {
-            'CO2(1)': 'CO2',
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested; memo items not read because of empty lines
-    "Table5.A": {  # solid waste disposal
-        "status": "tested",
-        "table": {
-            "firstrow": 6,
-            "lastrow": 15,
-            "header": ['group', 'group', 'entity', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [
-                'ACTIVITY DATA AND OTHER RELATED INFORMATION SINK CATEGORIES Annual waste at the SWDS',
-                'ACTIVITY DATA AND OTHER RELATED INFORMATION SINK CATEGORIES MCF',
-                'ACTIVITY DATA AND OTHER RELATED INFORMATION SINK CATEGORIES DOCf',
-                'IMPLIED EMISSION FACTOR SINK CATEGORIES CH4(1)',
-                'IMPLIED EMISSION FACTOR SINK CATEGORIES CO2',
-                'EMISSIONS SINK CATEGORIES CH4 Amount of CH4 flared',
-                'EMISSIONS SINK CATEGORIES CH4 Amount of CH4 for energy recovery(3)',
-            ],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['1. Managed waste disposal sites', ['5.A.1']],
-            ['a. Anaerobic', ['5.A.1.a']],
-            ['b. Semi-aerobic', ['5.A.1.b']],
-            ['2. Unmanaged waste disposal sites', ['5.A.2']],
-            ['3. Uncategorized waste disposal sites', ['5.A.3']],
-        ],
-        "entity_mapping": {
-            'EMISSIONS SINK CATEGORIES CH4 Emissions(2)': 'CH4',
-            'EMISSIONS SINK CATEGORIES CO2(4) Amount of CH4 for energy recovery(3)': 'CO2',
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    "Table5.B": {  # Biological treatment of solid waste
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 16,
-            "header": ['group', 'entity', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [
-                'ACTIVITY DATA AND OTHER RELATED INFORMATION Annual waste amount treated',
-                'IMPLIED EMISSION FACTOR CH4(1)',
-                'IMPLIED EMISSION FACTOR N2O',
-                'EMISSIONS CH4 Amount of CH4 flared',
-                'EMISSIONS CH4 Amount of CH4 for energy recovery(3)',
-            ],
-            "stop_cats": [".", "", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['1. Composting', ['5.B.1'], 0],
-            ['Municipal solid waste', ['5.B.1.a'], 1],
-            ['Other (please specify)(4)', ['5.B.1.b'], 1],
-            ['Organic wastes households', ['5.B.1.b.i'], 2],  # NLD
-            ['Organic wastes from gardens and horticulture', ['5.B.1.b.ii'], 2],  # NLD
-            ['Food and garden waste', ['5.B.1.b.ii'], 2],  # DNM, DNK, DKE
-            ['Industrial Solid Waste', ['5.B.1.b.iii'], 2],  # POL
-            ['Home composting', ['5.B.1.b.iv'], 2],  # NOR
-            ['Mixed waste', ['5.B.1.b.v'], 2],  # LTU
-            ['Other waste', ['5.B.1.b.v'], 2],  # SWE
-            ['Sludge', ['5.B.1.b.vi'], 2],  # HUN, EST
-            ['Textile', ['5.B.1.b.vii'], 2],  # EST
-            ['Wood', ['5.B.1.b.viii'], 2],  # EST
-            ['Organic', ['5.B.1.b.ix'], 2],  # EST
-            ['Paper', ['5.B.1.b.x'], 2],  # EST
-            ['Other_SW', ['5.B.1.b.v'], 2],  # CZE
-            ['MBA treated MSW', ['5.B.1.b.xi'], 2],  # LUX
-            ['Specific Agricultural and Industrial Waste', ['5.B.1.b.xii'], 2],  # UKR
-            ['Industrial solid waste and constr. waste', ['5.B.1.b.xiii'], 2],  # FIN
-            ['Municipal sludge', ['5.B.1.b.xiv'], 2],  # FIN
-            ['Industrial sludge', ['5.B.1.b.xv'], 2],  # FIN
-            ['Open air composting', ['5.B.1.b.xvi'], 2],  # LIE
-            ['Industrial Waste', ['5.B.1.b.xvii'], 2],  # JPN
-            ['Human Waste and Johkasou sludge', ['5.B.1.b.xviii'], 2],  # JPN
-            ['2. Anaerobic digestion at biogas facilities(3)', ['5.B.2'], 0],
-            ['Municipal solid waste', ['5.B.2.a'], 1],
-            ['Other (please specify)(4)', ['5.B.2.b'], 1],
-            ['Organic wastes households', ['5.B.2.b.i'], 2],  # NLD
-            ['Organic wastes from gardens and horticulture', ['5.B.2.b.ii'], 2],  # NLD
-            ['Animal manure and other organic waste', ['5.B.2.b.iii'], 2],  # DNM, DNK, DKE
-            ['sewage sludge', ['5.B.2.b.iv'], 2],  # LTU
-            ['Other waste', ['5.B.2.b.v'], 2],  # SWE
-            ['Agricultural biogas facilities', ['5.B.2.b.vi'], 2],  # CHE
-            ['Other biogases from anaerobic fermentation', ['5.B.2.b.vii'], 2],  # HUN
-            ['Sludge', ['5.B.2.b.iv'], 2],  # EST
-            ['Anaerobic Digestion On-Farm and at Wastewater Treatment Facilities', ['5.B.2.b.viii'], 2],  # USA
-            ['Other_AD', ['5.B.2.b.v'], 2],  # CZE
-            ['Biogenic waste incl. wastes from Agriculture (manure)', ['5.B.2.b.ix'], 2],  # LUX
-            ['Industrial solid waste and constr. waste', ['5.B.2.b.x'], 2],  # FIN
-            ['Municipal sludge', ['5.B.2.b.xi'], 2],  # FIN
-            ['Industrial sludge', ['5.B.2.b.xii'], 2],  # FIN
-            ['Livestock manure co-digested', ['5.B.2.b.xiii'], 2],  # DEU, new in 2022
-            ['Waste water', ['5.B.2.b.xiv'], 2],  # NOR, new in 2022
-        ],
-        "entity_mapping": {
-            'EMISSIONS CH4 Emissions(2)': 'CH4',
-            'EMISSIONS N2O Amount of CH4 for energy recovery(3)': 'N2O',
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    "Table5.C": {  # Waste incineration and open burning
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 38,
-            "header": ['group', 'group', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [
-                'ACTIVITY DATA Amount of wastes (incinerated/open burned)',
-                'IMPLIED EMISSION FACTOR Amount of wastes (incinerated/open burned) CO2',
-                'IMPLIED EMISSION FACTOR Amount of wastes (incinerated/open burned) CH4',
-                'IMPLIED EMISSION FACTOR Amount of wastes (incinerated/open burned) N2O',
-            ],
-            "stop_cats": [".", "", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['1. Waste Incineration', ['5.C.1'], 0],
-            ['Biogenic (1)', ['5.C.1.a'], 1],
-            ['Municipal solid waste', ['5.C.1.a.i'], 2],
-            ['Other (please specify)(2)', ['5.C.1.a.ii'], 2],
-            ['Industrial Solid Wastes', ['5.C.1.a.ii.1'], 3],
-            ['Hazardous Waste', ['5.C.1.a.ii.2'], 3],
-            ['Clinical Waste', ['5.C.1.a.ii.3'], 3],
-            ['Sewage Sludge', ['5.C.1.a.ii.4'], 3],
-            ['Other (please specify)', ['5.C.1.a.ii.5'], 3],
-            ['Animal cremations', ['5.C.1.a.ii.5.a'], 4],  # DKE, DNK, DNM
-            ['Human cremations', ['5.C.1.a.ii.5.b'], 4],  # DKE, DNK, DNM
-            ['Cremation', ['5.C.1.a.ii.5.c'], 4],  # CHE, NOR, FRA, FRK
-            ['cremation', ['5.C.1.a.ii.5.c'], 4],  # DEU
-            ['Industrial waste', ['5.C.1.a.ii.5.d'], 4],  # NOR
-            ['Biogenic other waste', ['5.C.1.a.ii.5.e'], 4],  # EST
-            ['Biogenic waste other than Municipal Solid Waste', ['5.C.1.a.ii.5.e'], 4],  # ROU
-            ['Sludge', ['5.C.1.a.ii.5.f'], 4],  # JPN
-            ['Non-fossile liquid waste', ['5.C.1.a.ii.5.g'], 4],  # JPN
-            ['Non-biogenic', ['5.C.1.b'], 1],
-            ['Municipal solid waste', ['5.C.1.b.i'], 2],
-            ['Other (please specify)(3)', ['5.C.1.b.ii'], 2],
-            ['Industrial Solid Wastes', ['5.C.1.b.ii.1'], 3],
-            ['Hazardous Waste', ['5.C.1.b.ii.2'], 3],
-            ['Clinical Waste', ['5.C.1.b.ii.3'], 3],
-            ['Sewage Sludge', ['5.C.1.b.ii.4'], 3],
-            ['Fossil liquid waste', ['5.C.1.b.ii.5'], 3],
-            ['Other (please specify)', ['5.C.1.b.ii.6'], 3],
-            ['Quarantine and other waste', ['5.C.1.b.ii.6.a'], 4],  # NZL
-            ['Industrial waste', ['5.C.1.b.ii.6.b'], 4],  # CHE
-            ['Chemical waste', ['5.C.1.b.ii.6.c'], 4],  # GBR, GBK
-            ['Flaring in the chemical industry', ['5.C.1.b.ii.6.d'], 4],  # BEL
-            ['Sludge', ['5.C.1.b.ii.6.e'], 4],  # JPN
-            ['Solvents', ['5.C.1.b.ii.6.f'], 4],  # GRC, AUS
-            ['2. Open burning of waste', ['5.C.2'], 0],
-            ['Biogenic (1)', ['5.C.2.a'], 1],
-            ['Municipal solid waste', ['5.C.2.a.i'], 2],
-            ['Other (please specify)', ['5.C.2.a.ii'], 2],
-            ['agricultural waste', ['5.C.2.a.ii.1'], 3],  # ITA
-            ['Agricultural residues', ['5.C.2.a.ii.1'], 3],  # ESP
-            ['Natural residues', ['5.C.2.a.ii.2'], 3],  # CHE
-            ['Wood waste', ['5.C.2.a.ii.3'], 3],  # GBR, GBK
-            ['Bonfires etc.', ['5.C.2.a.ii.4'], 3],  # DEU
-            ['Bonfires', ['5.C.2.a.ii.4'], 3],  # NLD, ISL
-            ['Other', ['5.C.2.a.ii.5'], 3],  # EST
-            ['Other waste', ['5.C.2.a.ii.5'], 3],  # CZE
-            ['Industrial Solid Waste', ['5.C.2.a.ii.6'], 3],  # JPN
-            ['Non-biogenic', ['5.C.2.b'], 1],
-            ['Municipal solid waste', ['5.C.2.b.i'], 2],
-            ['Other (please specify)', ['5.C.2.b.ii'], 2],
-            ['Rural waste', ['5.C.2.b.ii.1'], 3],  # NZL
-            ['Accidental fires (vehicles)', ['5.C.2.b.ii.2'], 3],  # GBR, GBK
-            ['Accidental fires (buildings)', ['5.C.2.b.ii.3'], 3],  # GBR, GBK
-            ['Bonfires', ['5.C.2.b.ii.4'], 3],  # ISL
-            ['Other', ['5.C.2.b.ii.5'], 3],  # EST
-            ['Other waste', ['5.C.2.b.ii.5'], 3],  # CZE
-            ['Industrial Solid Waste', ['5.C.2.b.ii.6'], 3],  # JPN
-        ],
-        "entity_mapping": {
-            'EMISSIONS Amount of wastes (incinerated/open burned) CH4': 'CH4',
-            'EMISSIONS Amount of wastes (incinerated/open burned) CO2': 'CO2',
-            'EMISSIONS Amount of wastes (incinerated/open burned) N2O': 'N2O',
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    "Table5.D": {  # Wastewater treatment and discharge
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 13,
-            "header": ['group', 'entity', 'entity', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [
-                'ACTIVITY DATA AND RELATED INFORMATION Total organic product',
-                'ACTIVITY DATA AND RELATED INFORMATION Sludge removed(1)',
-                'ACTIVITY DATA AND RELATED INFORMATION Sludge removed(1) N in effluent',
-                'IMPLIED EMISSION FACTOR CH4(2) N in effluent',
-                'IMPLIED EMISSION FACTOR N2O(3) N in effluent',
-                'EMISSIONS CH4 Amount of CH4 flared',
-                'EMISSIONS CH4 Amount of CH4 for Energy Recovery(5)',
-            ],
-            "stop_cats": [".", "", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['1. Domestic wastewater', ['5.D.1']],
-            ['2. Industrial wastewater', ['5.D.2']],
-            ['3. Other (please specify)', ['5.D.3']],
-            ['Other', ['5.D.3.a']],  # EST
-            ['Septic tanks', ['5.D.3.b']],  # NLD
-            ['Wastewater Effluent', ['5.D.3.c']],  # NLD
-            ['Fish farming', ['5.D.3.d']],  # FIN
-            ['Uncategorized wastewater', ['5.D.3.a']],  # CZE
-        ],
-        "entity_mapping": {
-            'EMISSIONS CH4 Emissions(4)': 'CH4',
-            'EMISSIONS N2O(3) Amount of CH4 for Energy Recovery(5)': 'N2O',
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    "Summary1.As1": {  # Summary 1, sheet 1
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 28,
-            "header": ['entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["summary"],
-        },
-        "sector_mapping": [
-            ['Total national emissions and removals', ['0']],
-            ['1. Energy', ['1']],
-            ['A. Fuel combustion Reference approach(2)', ['1.A-ref']],
-            ['Sectoral approach(2)', ['1.A']],
-            ['1. Energy industries', ['1.A.1']],
-            ['2. Manufacturing industries and construction', ['1.A.2']],
-            ['3. Transport', ['1.A.3']],
-            ['4. Other sectors', ['1.A.4']],
-            ['5. Other', ['1.A.5']],
-            ['B. Fugitive emissions from fuels', ['1.B']],
-            ['1. Solid fuels', ['1.B.1']],
-            ['2. Oil and natural gas and other emissions from energy production',
-             ['1.B.2']],
-            ['C. CO2 Transport and storage', ['1.C']],
-            ['2. Industrial processes and product use', ['2']],
-            ['A. Mineral industry', ['2.A']],
-            ['B. Chemical industry', ['2.B']],
-            ['C. Metal industry', ['2.C']],
-            ['D. Non-energy products from fuels and solvent use', ['2.D']],
-            ['E. Electronic industry', ['2.E']],
-            ['F. Product uses as substitutes for ODS', ['2.F']],
-            ['G. Other product manufacture and use', ['2.G']],
-            ['H. Other(3)', ['2.H']],
-        ],
-        "entity_mapping": {
-            'NOX': 'NOx',
-            'Net CO2 emissions/removals': 'CO2',
-            'HFCs(1)': 'HFCS (AR4GWP100)',
-            'PFCs(1)': 'PFCS (AR4GWP100)',
-            'Unspecified mix of HFCs and PFCs(1)': 'UnspMixOfHFCsPFCs (AR4GWP100)',
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    "Summary1.As2": {  # Summary 1, sheet 2
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 34,
-            "header": ['entity', 'entity', 'unit'],
-            "header_fill": [True, False, True],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["summary"],
-        },
-        "sector_mapping": [
-            ['3. Agriculture', ['3']],
-            ['A. Enteric fermentation', ['3.A']],
-            ['B. Manure management', ['3.B']],
-            ['C. Rice cultivation', ['3.C']],
-            ['D. Agricultural soils', ['3.D']],
-            ['E. Prescribed burning of savannas', ['3.E']],
-            ['F. Field burning of agricultural residues', ['3.F']],
-            ['G. Liming', ['3.G']],
-            ['H. Urea application', ['3.H']],
-            ['I. Other carbon-contining fertilizers', ['3.I']],
-            ['J. Other', ['3.J']],
-            ['4. Land use, land-use change and forestry (4)', ['4']],
-            ['A. Forest land (4)', ['4.A']],
-            ['B. Cropland (4)', ['4.B']],
-            ['C. Grassland (4)', ['4.C']],
-            ['D. Wetlands (4)', ['4.D']],
-            ['E. Settlements (4)', ['4.E']],
-            ['F. Other land (4)', ['4.F']],
-            ['G. Harvested wood products', ['4.G']],
-            ['H. Other (4)', ['4.H']],
-            ['5. Waste', ['5']],
-            ['A. Solid waste disposal (5)', ['5.A']],
-            ['B. Biological treatment of solid waste (5)', ['5.B']],
-            ['C. Incineration and open burning of waste (5)', ['5.C']],
-            ['D. Wastewater treatment and discharge', ['5.D']],
-            ['E. Other (5)', ['5.E']],
-            ['6. Other (please specify)(6)', ['6']],
-        ],
-        "entity_mapping": {
-            'NOX': 'NOx',
-            'Net CO2 emissions/removals': 'CO2',
-            'HFCs (1)': 'HFCS (AR4GWP100)',
-            'PFCs(1)': 'PFCS (AR4GWP100)',
-            'Unspecified mix of HFCs and PFCs(1)': 'UnspMixOfHFCsPFCs (AR4GWP100)',
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    "Summary1.As3": {  # Summary 1, sheet 3
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 17,
-            "header": ['entity', 'entity', 'unit'],
-            "header_fill": [True, False, True],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["summary"],
-        },
-        "sector_mapping": [
-            ['Memo items:(7)', [r'\IGNORE']],
-            ['International bunkers', ['M.Memo.Int']],
-            ['Aviation', ['M.Memo.Int.Avi']],
-            ['Navigation', ['M.Memo.Int.Mar']],
-            ['Multilateral operations', ['M.Memo.Mult']],
-            ['CO2 emissions from biomass', ['M.Memo.Bio']],
-            ['CO2 captured', ['M.Memo.CO2Cap']],
-            ['Long-term storage of C in waste disposal sites', ['M.Memo.LTSW']],
-            ['Indirect N2O', ['M.Memo.IndN2O']],
-            ['Indirect CO2', ['M.Memo.IndCO2']],
-        ],
-        "entity_mapping": {
-            'NOX': 'NOx',
-            'Net CO2 emissions/removals': 'CO2',
-            'HFCs(1)': 'HFCS (AR4GWP100)',
-            'PFCs(1)': 'PFCS (AR4GWP100)',
-            'Unspecified mix of HFCs and PFCs(1)': 'UnspMixOfHFCsPFCs (AR4GWP100)',
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-}
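
The specification above repeats row labels such as 'Municipal solid waste' under several parents and distinguishes them only by the third element of each `sector_mapping` entry, the hierarchy level. The following is a minimal sketch of how such a mapping could be resolved, assuming rows are read top to bottom and a label is looked up first among the children of the last matched entry, then among the children of each ancestor in turn. The names `build_parents` and `resolve_rows` are hypothetical and are not part of the removed reader code; the actual reader may work differently.

```python
def build_parents(mapping):
    """Return, for each mapping entry, the index of its parent (None at top level).

    Assumes entries have the form [label, codes, level] and that levels
    never skip a step when descending.
    """
    parents = []
    stack = []  # indices of the currently open ancestors, one per level
    for idx, (_label, _codes, level) in enumerate(mapping):
        del stack[level:]  # close entries at this level or deeper
        parents.append(stack[-1] if stack else None)
        stack.append(idx)
    return parents


def resolve_rows(rows, mapping):
    """Map row labels to category codes using the hierarchy levels."""
    parents = build_parents(mapping)
    current = None  # index of the last matched entry
    resolved = []
    for label in rows:
        node, match = current, None
        while True:
            # candidates are the direct children of `node`
            for idx, entry in enumerate(mapping):
                if parents[idx] == node and entry[0] == label:
                    match = idx
                    break
            if match is not None or node is None:
                break
            node = parents[node]  # climb one level and retry
        if match is None:
            resolved.append((label, None))  # unknown label, left unmapped
        else:
            resolved.append((label, mapping[match][1]))
            current = match
    return resolved


# Excerpt of the Table5.C mapping from the specification above
example_mapping = [
    ['1. Waste Incineration', ['5.C.1'], 0],
    ['Biogenic (1)', ['5.C.1.a'], 1],
    ['Municipal solid waste', ['5.C.1.a.i'], 2],
    ['Non-biogenic', ['5.C.1.b'], 1],
    ['Municipal solid waste', ['5.C.1.b.i'], 2],
]
example_rows = [entry[0] for entry in example_mapping]
resolved = resolve_rows(example_rows, example_mapping)
```

In this sketch the second 'Municipal solid waste' row resolves to '5.C.1.b.i' because 'Non-biogenic' was the last matched entry. A purely label-based lookup could not tell the two rows apart.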

+ 0 - 2688
UNFCCC_GHG_data/UNFCCC_CRF_reader/crf_specifications/CRF2023_specification.py

@@ -1,2688 +0,0 @@
-""" CRF2023 specification.
-Currently not all tables are included. Extend if you need all country
-specific items in categories 2, 3.H-G, 4
-
-tables included:
-* Energy
-    'Table1s1', 'Table1s2',
-    'Table1.A(a)s1', 'Table1.A(a)s2', 'Table1.A(a)s3', 'Table1.A(a)s4',
-    'Table1.B.1', 'Table1.B.2', 'Table1.C', 'Table1.D',
-* Industrial processes
-    'Table2(I)s1', 'Table2(I)s2',
-    'Table2(II)',
-* Agriculture
-    'Table3s1', 'Table3s2',
-    'Table3.C', 'Table3.D', 'Table3.E',
-* LULUCF
-    'Table4',
-* Waste
-    'Table5', 'Table5.A', 'Table5.B', 'Table5.C', 'Table5.D'
-* Summary Tables (for "Other" and to check for consistency)
-
-missing tables are:
-* Energy
-    'Table1.D'
-* Industrial processes
-    'Table2(I).A-Hs1', 'Table2(I).A-Hs2',
-    'Table2(II)B-Hs1', 'Table2(II)B-Hs2',
-* Agriculture
-    'Table3.As1', 'Table3.As2' (no additional emissions data)
-    'Table3.F', 'Table3.G-I',
-* LULUCF
-    All tables except Table4
-* Waste
-    All tables read
-
-TODO:
-* Add missing tables
-* Add activity data
-
-"""
-
-import numpy as np
-from .util import unit_info
-
-# TODO: GWPs now differ by country. This has to be implemented (maybe giving
-#  gwp_to_use as a parameter to the specification)
-gwp_to_use = "AR4GWP100"
-
-CRF2023 = {
-    "Table1s1": {
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 26,
-            "header": ['entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['Total Energy', ['1']],
-            ['A. Fuel combustion activities (sectoral approach)', ['1.A']],
-            ['1. Energy industries', ['1.A.1']],
-            ['a. Public electricity and heat production', ['1.A.1.a']],
-            ['b. Petroleum refining', ['1.A.1.b']],
-            ['c. Manufacture of solid fuels and other energy industries', ['1.A.1.c']],
-            ['2. Manufacturing industries and construction', ['1.A.2']],
-            ['a. Iron and steel', ['1.A.2.a']],
-            ['b. Non-ferrous metals', ['1.A.2.b']],
-            ['c. Chemicals', ['1.A.2.c']],
-            ['d. Pulp, paper and print', ['1.A.2.d']],
-            ['e. Food processing, beverages and tobacco', ['1.A.2.e']],
-            ['f. Non-metallic minerals', ['1.A.2.f']],
-            ['g. Other (please specify)', ['1.A.2.g']],
-            ['3. Transport', ['1.A.3']],
-            ['a. Domestic aviation', ['1.A.3.a']],
-            ['b. Road transportation', ['1.A.3.b']],
-            ['c. Railways', ['1.A.3.c']],
-            ['d. Domestic navigation', ['1.A.3.d']],
-            ['e. Other transportation', ['1.A.3.e']],
-        ],
-        "entity_mapping": {
-            "NOX": "NOx",
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    "Table1s2": {
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 36,
-            "header": ['entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['4. Other sectors', ['1.A.4']],
-            ['a. Commercial/institutional', ['1.A.4.a']],
-            ['b. Residential', ['1.A.4.b']],
-            ['c. Agriculture/forestry/fishing', ['1.A.4.c']],
-            ['5. Other (as specified in table 1.A(a) sheet 4)', ['1.A.5']],
-            ['a. Stationary', ['1.A.5.a']],
-            ['b. Mobile', ['1.A.5.b']],
-            ['B. Fugitive emissions from fuels', ['1.B']],
-            ['1. Solid fuels', ['1.B.1']],
-            ['a. Coal mining and handling', ['1.B.1.a']],
-            ['b. Solid fuel transformation', ['1.B.1.b']],
-            ['c. Other (as specified in table 1.B.1)', ['1.B.1.c']],
-            ['2. Oil and natural gas and other emissions from energy production', ['1.B.2']],
-            ['a. Oil', ['1.B.2.a']],
-            ['b. Natural gas', ['1.B.2.b']],
-            ['c. Venting and flaring', ['1.B.2.c']],
-            ['d. Other (as specified in table 1.B.2)', ['1.B.2.d']],
-            ['C. CO2 Transport and storage', ['1.C']],
-            ['1. Transport of CO2', ['1.C.1']],
-            ['2. Injection and storage', ['1.C.2']],
-            ['3. Other', ['1.C.3']],
-            ['Memo items: (1)', [r'\IGNORE']],
-            ['International bunkers', ['M.Memo.Int']],
-            ['Aviation', ['M.Memo.Int.Avi']],
-            ['Navigation', ['M.Memo.Int.Mar']],
-            ['Multilateral operations', ['M.Memo.Mult']],
-            ['CO2 emissions from biomass', ['M.Memo.Bio']],
-            ['CO2 captured', ['M.Memo.CO2Cap']],
-            ['For domestic storage', ['M.Memo.CO2Cap.Dom']],
-            ['For storage in other countries', ['M.Memo.CO2Cap.Exp']],
-        ],
-        "entity_mapping": {
-            "NOX": "NOx",
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    "Table1.A(a)s1": {
-        "status": "tested",
-        "table": {
-            "firstrow": 6,
-            "lastrow": 104,  # template, countries report less
-            # check the resulting data as the templates have nan rows
-            # which would stop the reading process (actual reported
-            # data does not seem to have the nan rows)
-            "header": ['group', 'entity', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category", "class"],
-            "cols_to_ignore": [
-                'AGGREGATE ACTIVITY DATA Consumption',
-                'AGGREGATE ACTIVITY DATA Consumption',
-                'IMPLIED EMISSION FACTORS CO2(1)',
-                'IMPLIED EMISSION FACTORS CH4',
-                'IMPLIED EMISSION FACTORS N2O',
-                'EMISSIONS CO2 Amount captured'
-            ],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['1.A. Fuel combustion', ['1.A', 'Total'], 0],
-            ['Liquid fuels', ['1.A', 'Liquid'], 1],
-            ['Solid fuels', ['1.A', 'Solid'], 1],
-            ['Gaseous fuels', ['1.A', 'Gaseous'], 1],
-            ['Other fossil fuels(4)', ['1.A', 'OtherFF'], 1],
-            ['Peat(5)', ['1.A', 'Peat'], 1],
-            ['Biomass(6)', ['1.A', 'Biomass'], 1],
-            # 1.A.1. Energy industries
-            ['1.A.1. Energy industries', ['1.A.1', 'Total'], 1],
-            ['Liquid fuels', ['1.A.1', 'Liquid'], 2],
-            ['Solid fuels', ['1.A.1', 'Solid'], 2],
-            ['Gaseous fuels', ['1.A.1', 'Gaseous'], 2],
-            ['Other fossil fuels(4)', ['1.A.1', 'OtherFF'], 2],
-            ['Peat(5)', ['1.A.1', 'Peat'], 2],
-            ['Biomass(6)', ['1.A.1', 'Biomass'], 2],
-            # a. Public electricity and heat production
-            ['a. Public electricity and heat production(7)', ['1.A.1.a', 'Total'], 2],
-            ['Liquid fuels', ['1.A.1.a', 'Liquid'], 3],
-            ['Solid fuels', ['1.A.1.a', 'Solid'], 3],
-            ['Gaseous fuels', ['1.A.1.a', 'Gaseous'], 3],
-            ['Other fossil fuels(4)', ['1.A.1.a', 'OtherFF'], 3],
-            ['Peat(5)', ['1.A.1.a', 'Peat'], 3],
-            ['Biomass(6)', ['1.A.1.a', 'Biomass'], 3],
-            # 1.A.1.a.i Electricity Generation
-            ['1.A.1.a.i Electricity Generation', ['1.A.1.a.i', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.1.a.i', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.1.a.i', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.1.a.i', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.1.a.i', 'OtherFF'], 4],
-            ['Peat', ['1.A.1.a.i', 'Peat'], 4],
-            ['Biomass', ['1.A.1.a.i', 'Biomass'], 4],
-            # 1.A.1.a.ii Combined heat and power generation
-            ['1.A.1.a.ii Combined heat and power generation', ['1.A.1.a.ii', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.1.a.ii', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.1.a.ii', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.1.a.ii', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.1.a.ii', 'OtherFF'], 4],
-            ['Peat', ['1.A.1.a.ii', 'Peat'], 4],
-            ['Biomass', ['1.A.1.a.ii', 'Biomass'], 4],
-            # 1.A.1.a.iii heat plants
-            ['1.A.1.a.iii Heat plants', ['1.A.1.a.iii', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.1.a.iii', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.1.a.iii', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.1.a.iii', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.1.a.iii', 'OtherFF'], 4],
-            ['Peat', ['1.A.1.a.iii', 'Peat'], 4],
-            ['Biomass', ['1.A.1.a.iii', 'Biomass'], 4],
-            # 1.A.1.a.iv Other (please specify)
-            ['1.A.1.a.iv Other (please specify)', ['1.A.1.a.iv', 'Total'], 3],
-            # AUT
-            ['Total Public Electricity and Heat Production', ['1.A.1.a.iv.4', 'Total'], 4],
-            ['Liquid Fuels', ['1.A.1.a.iv.4', 'Liquid'], 5],
-            ['Solid Fuels', ['1.A.1.a.iv.4', 'Solid'], 5],
-            ['Gaseous Fuels', ['1.A.1.a.iv.4', 'Gaseous'], 5],
-            ['Other Fossil Fuels', ['1.A.1.a.iv.4', 'OtherFF'], 5],
-            ['Peat', ['1.A.1.a.iv.4', 'Peat'], 5],
-            ['Biomass', ['1.A.1.a.iv.4', 'Biomass'], 5],
-            # DEU
-            ['1.A.1.a Public Electricity and Heat Production', ['1.A.1.a.iv.4', 'Total'], 4],
-            ['Liquid Fuels', ['1.A.1.a.iv.4', 'Liquid'], 5],
-            ['Solid Fuels', ['1.A.1.a.iv.4', 'Solid'], 5],
-            ['Gaseous Fuels', ['1.A.1.a.iv.4', 'Gaseous'], 5],
-            ['Other Fossil Fuels', ['1.A.1.a.iv.4', 'OtherFF'], 5],
-            ['Peat', ['1.A.1.a.iv.4', 'Peat'], 5],
-            ['Biomass', ['1.A.1.a.iv.4', 'Biomass'], 5],
-            # ESP
-            ['Other', ['1.A.1.a.iv.3', 'Total'], 4],
-            ['Liquid Fuels', ['1.A.1.a.iv.3', 'Liquid'], 5],
-            ['Solid Fuels', ['1.A.1.a.iv.3', 'Solid'], 5],
-            ['Gaseous Fuels', ['1.A.1.a.iv.3', 'Gaseous'], 5],
-            ['Other Fossil Fuels', ['1.A.1.a.iv.3', 'OtherFF'], 5],
-            ['Peat', ['1.A.1.a.iv.3', 'Peat'], 5],
-            ['Biomass', ['1.A.1.a.iv.3', 'Biomass'], 5],
-            # SVK
-            ['Methane Cogeneration (Mining)', ['1.A.1.a.iv.1', 'Total'], 4],
-            ['Other Fossil Fuels', ['1.A.1.a.iv.1', 'OtherFF'], 5],
-            ['Municipal Solid Waste Incineration (Energy use)', ['1.A.1.a.iv.2', 'Total'], 4],
-            ['Other Fossil Fuels', ['1.A.1.a.iv.2', 'OtherFF'], 5],
-            ['Biomass', ['1.A.1.a.iv.2', 'Biomass'], 5],
-            # CHE
-            ['Municipal and special waste incineration plants', ['1.A.1.a.iv.2', 'Total'], 4],
-            ['Other Fossil Fuels', ['1.A.1.a.iv.2', 'OtherFF'], 5],
-            ['Biomass', ['1.A.1.a.iv.2', 'Biomass'], 5],
-            # b. Petroleum refining
-            ['b. Petroleum refining', ['1.A.1.b', 'Total'], 2],
-            ['Liquid fuels', ['1.A.1.b', 'Liquid'], 3],
-            ['Solid fuels', ['1.A.1.b', 'Solid'], 3],
-            ['Gaseous fuels', ['1.A.1.b', 'Gaseous'], 3],
-            ['Other fossil fuels(4)', ['1.A.1.b', 'OtherFF'], 3],
-            ['Peat(5)', ['1.A.1.b', 'Peat'], 3],
-            ['Biomass(6)', ['1.A.1.b', 'Biomass'], 3],
-            # c. Manufacture of solid fuels and other energy industries
-            ['c. Manufacture of solid fuels and other energy industries(8)', ['1.A.1.c', 'Total'], 2],
-            ['Liquid fuels', ['1.A.1.c', 'Liquid'], 3],
-            ['Solid fuels', ['1.A.1.c', 'Solid'], 3],
-            ['Gaseous fuels', ['1.A.1.c', 'Gaseous'], 3],
-            ['Other fossil fuels(4)', ['1.A.1.c', 'OtherFF'], 3],
-            ['Peat(5)', ['1.A.1.c', 'Peat'], 3],
-            ['Biomass(6)', ['1.A.1.c', 'Biomass'], 3],
-            # 1.A.1.c.i Manufacture of solid fuels
-            ['1.A.1.c.i Manufacture of solid fuels', ['1.A.1.c.i', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.1.c.i', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.1.c.i', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.1.c.i', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.1.c.i', 'OtherFF'], 4],
-            ['Peat', ['1.A.1.c.i', 'Peat'], 4],
-            ['Biomass', ['1.A.1.c.i', 'Biomass'], 4],
-            # 1.A.1.c.ii Oil and gas extraction
-            ['1.A.1.c.ii Oil and gas extraction', ['1.A.1.c.ii', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.1.c.ii', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.1.c.ii', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.1.c.ii', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.1.c.ii', 'OtherFF'], 4],
-            ['Peat', ['1.A.1.c.ii', 'Peat'], 4],
-            ['Biomass', ['1.A.1.c.ii', 'Biomass'], 4],
-            # 1.A.1.c.iii Other energy industries
-            ['1.A.1.c.iii Other energy industries', ['1.A.1.c.iii', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.1.c.iii', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.1.c.iii', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.1.c.iii', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.1.c.iii', 'OtherFF'], 4],
-            ['Peat', ['1.A.1.c.iii', 'Peat'], 4],
-            ['Biomass', ['1.A.1.c.iii', 'Biomass'], 4],
-            # 1.A.1.c.iv Other (please specify)
-            ['1.A.1.c.iv Other (please specify)', ['1.A.1.c.iv', 'Total'], 3],
-            # DEU
-            ['1.A.1.c Manufacture of Solid Fuels and Other Energy Industries', ['1.A.1.c.iv.2', 'Total'], 4],
-            ['Liquid Fuels', ['1.A.1.c.iv.2', 'Liquid'], 5],
-            ['Solid Fuels', ['1.A.1.c.iv.2', 'Solid'], 5],
-            ['Gaseous Fuels', ['1.A.1.c.iv.2', 'Gaseous'], 5],
-            ['Other Fossil Fuels', ['1.A.1.c.iv.2', 'OtherFF'], 5],
-            ['Peat', ['1.A.1.c.iv.2', 'Peat'], 5],
-            ['Biomass', ['1.A.1.c.iv.2', 'Biomass'], 5],
-            # ESP
-            ['Other', ['1.A.1.c.iv.3', 'Total'], 4],
-            ['Liquid Fuels', ['1.A.1.c.iv.3', 'Liquid'], 5],
-            ['Solid Fuels', ['1.A.1.c.iv.3', 'Solid'], 5],
-            ['Gaseous Fuels', ['1.A.1.c.iv.3', 'Gaseous'], 5],
-            ['Other Fossil Fuels', ['1.A.1.c.iv.3', 'OtherFF'], 5],
-            ['Peat', ['1.A.1.c.iv.3', 'Peat'], 5],
-            ['Biomass', ['1.A.1.c.iv.3', 'Biomass'], 5],
-            # CYP
-            ['Charcoal Production', ['1.A.1.c.iv.1', 'Total'], 4],
-            ['Liquid Fuels', ['1.A.1.c.iv.1', 'Liquid'], 5],
-            ['Solid Fuels', ['1.A.1.c.iv.1', 'Solid'], 5],
-            ['Gaseous Fuels', ['1.A.1.c.iv.1', 'Gaseous'], 5],
-            ['Other Fossil Fuels', ['1.A.1.c.iv.1', 'OtherFF'], 5],
-            ['Peat', ['1.A.1.c.iv.1', 'Peat'], 5],
-            ['Biomass', ['1.A.1.c.iv.1', 'Biomass'], 5],
-        ],
-        "entity_mapping": {
-            'EMISSIONS CH4': "CH4",
-            'EMISSIONS CO2(2)': "CO2",
-            'EMISSIONS N2O': "N2O",
-        },
-    },  # tested
-    "Table1.A(a)s2": {
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 114,  # template, countries report less
-            # check the resulting data as the templates have nan rows
-            # which would stop the reading process (actual reported
-            # data does not seem to have the nan rows)
-            "header": ['group', 'entity', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category", "class"],
-            "cols_to_ignore": [
-                'AGGREGATE ACTIVITY DATA Consumption',
-                'AGGREGATE ACTIVITY DATA Consumption',
-                'IMPLIED EMISSION FACTORS CO2(1)',
-                'IMPLIED EMISSION FACTORS CH4',
-                'IMPLIED EMISSION FACTORS N2O',
-                'EMISSIONS CO2 Amount captured',
-            ],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['1.A.2 Manufacturing industries and construction', ['1.A.2', 'Total'], 0],
-            ['Liquid fuels', ['1.A.2', 'Liquid'], 1],
-            ['Solid fuels', ['1.A.2', 'Solid'], 1],
-            ['Gaseous fuels', ['1.A.2', 'Gaseous'], 1],
-            ['Other fossil fuels(4)', ['1.A.2', 'OtherFF'], 1],
-            ['Peat(5)', ['1.A.2', 'Peat'], 1],
-            ['Biomass(6)', ['1.A.2', 'Biomass'], 1],
-            # a. Iron and Steel
-            ['a. Iron and steel', ['1.A.2.a', 'Total'], 1],
-            ['Liquid fuels', ['1.A.2.a', 'Liquid'], 2],
-            ['Solid fuels', ['1.A.2.a', 'Solid'], 2],
-            ['Gaseous fuels', ['1.A.2.a', 'Gaseous'], 2],
-            ['Other fossil fuels(4)', ['1.A.2.a', 'OtherFF'], 2],
-            ['Peat(5)', ['1.A.2.a', 'Peat'], 2],
-            ['Biomass(6)', ['1.A.2.a', 'Biomass'], 2],
-            # b. non-ferrous metals
-            ['b. Non-ferrous metals', ['1.A.2.b', 'Total'], 1],
-            ['Liquid fuels', ['1.A.2.b', 'Liquid'], 2],
-            ['Solid fuels', ['1.A.2.b', 'Solid'], 2],
-            ['Gaseous fuels', ['1.A.2.b', 'Gaseous'], 2],
-            ['Other fossil fuels(4)', ['1.A.2.b', 'OtherFF'], 2],
-            ['Peat(5)', ['1.A.2.b', 'Peat'], 2],
-            ['Biomass(6)', ['1.A.2.b', 'Biomass'], 2],
-            # c. Chemicals
-            ['c. Chemicals', ['1.A.2.c', 'Total'], 1],
-            ['Liquid fuels', ['1.A.2.c', 'Liquid'], 2],
-            ['Solid fuels', ['1.A.2.c', 'Solid'], 2],
-            ['Gaseous fuels', ['1.A.2.c', 'Gaseous'], 2],
-            ['Other fossil fuels(4)', ['1.A.2.c', 'OtherFF'], 2],
-            ['Peat(5)', ['1.A.2.c', 'Peat'], 2],
-            ['Biomass(6)', ['1.A.2.c', 'Biomass'], 2],
-            # d. Pulp paper print
-            ['d. Pulp, paper and print', ['1.A.2.d', 'Total'], 1],
-            ['Liquid fuels', ['1.A.2.d', 'Liquid'], 2],
-            ['Solid fuels', ['1.A.2.d', 'Solid'], 2],
-            ['Gaseous fuels', ['1.A.2.d', 'Gaseous'], 2],
-            ['Other fossil fuels(4)', ['1.A.2.d', 'OtherFF'], 2],
-            ['Peat(5)', ['1.A.2.d', 'Peat'], 2],
-            ['Biomass(6)', ['1.A.2.d', 'Biomass'], 2],
-            # e. Food processing, beverages and tobacco
-            ['e. Food processing, beverages and tobacco', ['1.A.2.e', 'Total'], 1],
-            ['Liquid fuels', ['1.A.2.e', 'Liquid'], 2],
-            ['Solid fuels', ['1.A.2.e', 'Solid'], 2],
-            ['Gaseous fuels', ['1.A.2.e', 'Gaseous'], 2],
-            ['Other fossil fuels(4)', ['1.A.2.e', 'OtherFF'], 2],
-            ['Peat(5)', ['1.A.2.e', 'Peat'], 2],
-            ['Biomass(6)', ['1.A.2.e', 'Biomass'], 2],
-            # f. non-metallic minerals
-            ['f. Non-metallic minerals', ['1.A.2.f', 'Total'], 1],
-            ['Liquid fuels', ['1.A.2.f', 'Liquid'], 2],
-            ['Solid fuels', ['1.A.2.f', 'Solid'], 2],
-            ['Gaseous fuels', ['1.A.2.f', 'Gaseous'], 2],
-            ['Other fossil fuels(4)', ['1.A.2.f', 'OtherFF'], 2],
-            ['Peat(5)', ['1.A.2.f', 'Peat'], 2],
-            ['Biomass(6)', ['1.A.2.f', 'Biomass'], 2],
-            # g. other
-            ['g. Other (please specify)(9)', ['1.A.2.g', 'Total'], 1],
-            #1.A.2.g.i Manufacturing of machinery
-            ['1.A.2.g.i Manufacturing of machinery', ['1.A.2.g.i', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.2.g.i', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.2.g.i', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.2.g.i', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.2.g.i', 'OtherFF'], 3],
-            ['Peat', ['1.A.2.g.i', 'Peat'], 3],
-            ['Biomass', ['1.A.2.g.i', 'Biomass'], 3],
-            # 1.A.2.g.ii Manufacturing of transport equipment
-            ['1.A.2.g.ii Manufacturing of transport equipment', ['1.A.2.g.ii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.2.g.ii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.2.g.ii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.2.g.ii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.2.g.ii', 'OtherFF'], 3],
-            ['Peat', ['1.A.2.g.ii', 'Peat'], 3],
-            ['Biomass', ['1.A.2.g.ii', 'Biomass'], 3],
-            # 1.A.2.g.iii Mining (excluding fuels) and quarrying
-            ['1.A.2.g.iii Mining (excluding fuels) and quarrying', ['1.A.2.g.iii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.2.g.iii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.2.g.iii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.2.g.iii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.2.g.iii', 'OtherFF'], 3],
-            ['Peat', ['1.A.2.g.iii', 'Peat'], 3],
-            ['Biomass', ['1.A.2.g.iii', 'Biomass'], 3],
-            # 1.A.2.g.iv Wood and wood products
-            ['1.A.2.g.iv Wood and wood products', ['1.A.2.g.iv', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.2.g.iv', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.2.g.iv', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.2.g.iv', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.2.g.iv', 'OtherFF'], 3],
-            ['Peat', ['1.A.2.g.iv', 'Peat'], 3],
-            ['Biomass', ['1.A.2.g.iv', 'Biomass'], 3],
-            # 1.A.2.g.v Construction
-            ['1.A.2.g.v Construction', ['1.A.2.g.v', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.2.g.v', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.2.g.v', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.2.g.v', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.2.g.v', 'OtherFF'], 3],
-            ['Peat', ['1.A.2.g.v', 'Peat'], 3],
-            ['Biomass', ['1.A.2.g.v', 'Biomass'], 3],
-            # 1.A.2.g.vi Textile and leather
-            ['1.A.2.g.vi Textile and leather', ['1.A.2.g.vi', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.2.g.vi', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.2.g.vi', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.2.g.vi', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.2.g.vi', 'OtherFF'], 3],
-            ['Peat', ['1.A.2.g.vi', 'Peat'], 3],
-            ['Biomass', ['1.A.2.g.vi', 'Biomass'], 3],
-            # 1.A.2.g.vii Off-road vehicles and other machinery
-            ['1.A.2.g.vii Off-road vehicles and other machinery', ['1.A.2.g.vii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.2.g.vii', 'Liquid'], 3],
-            ['Gaseous Fuels', ['1.A.2.g.vii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.2.g.vii', 'OtherFF'], 3],
-            ['Biomass', ['1.A.2.g.vii', 'Biomass'], 3],
-            # 1.A.2.g.viii Other (please specify)
-            ['1.A.2.g.viii Other (please specify)', ['1.A.2.g.viii', 'Total'], 2],
-            # DKE
-            ['Construction', ['\IGNORE', '\IGNORE'], 3],  # (empty)
-            ['Mining', ['\IGNORE', '\IGNORE'], 3],  # (empty)
-            # DNK, DKE, USA, CZE
-            ['Other non-specified', ['1.A.2.g.viii.1', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.1', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.1', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.1', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.1', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.1', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.1', 'Biomass'], 4],
-            #SVK, CYP
-            ['Non-specified Industry', ['1.A.2.g.viii.1', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.1', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.1', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.1', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.1', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.1', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.1', 'Biomass'], 4],
-            #BEL
-            ['Other non specified', ['1.A.2.g.viii.1', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.1', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.1', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.1', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.1', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.1', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.1', 'Biomass'], 4],
-            #PRT, LTU
-            ['Non-specified industry', ['1.A.2.g.viii.1', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.1', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.1', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.1', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.1', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.1', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.1', 'Biomass'], 4],
-            # MLT
-            ['Undefined Industry', ['1.A.2.g.viii.1', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.1', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.1', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.1', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.1', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.1', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.1', 'Biomass'], 4],
-            # TUR
-            ['Other unspecified', ['1.A.2.g.viii.1', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.1', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.1', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.1', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.1', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.1', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.1', 'Biomass'], 4],
-            # DKE
-            ['Textile', ['\IGNORE', '\IGNORE'], 3],  # (empty)
-            # DNK, DNM, FIN, DKE
-            ['Other manufacturing industries', ['1.A.2.g.viii.3', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.3', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.3', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.3', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.3', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.3', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.3', 'Biomass'], 4],
-            # CAN
-            ['Other Manufacturing', ['1.A.2.g.viii.3', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.3', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.3', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.3', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.3', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.3', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.3', 'Biomass'], 4],
-            # AUT, LUX
-            ['Other Manufacturing Industries', ['1.A.2.g.viii.3', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.3', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.3', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.3', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.3', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.3', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.3', 'Biomass'], 4],
-            # NOR
-            ['Other manufacturing', ['1.A.2.g.viii.3', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.3', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.3', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.3', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.3', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.3', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.3', 'Biomass'], 4],
-            # AUS
-            ['All Other Manufacturing', ['1.A.2.g.viii.3', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.3', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.3', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.3', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.3', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.3', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.3', 'Biomass'], 4],
-            # NLD
-            ['Other Industrial Sectors', ['1.A.2.g.viii.4', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.4', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.4', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.4', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.4', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.4', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.4', 'Biomass'], 4],
-            # GBR, GBK
-            ['Other industry (not specified above)', ['1.A.2.g.viii.4', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.4', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.4', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.4', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.4', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.4', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.4', 'Biomass'], 4],
-            # UKR
-            ['Oter Industries', ['1.A.2.g.viii.4', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.4', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.4', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.4', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.4', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.4', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.4', 'Biomass'], 4],
-            # RUS
-            ['Other industries', ['1.A.2.g.viii.4', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.4', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.4', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.4', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.4', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.4', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.4', 'Biomass'], 4],
-            # RUS
-            ['Non-CO2 emissions from BFG combustion', ['1.A.2.g.viii.5', 'Total'], 3],
-            ['Solid Fuels', ['1.A.2.g.viii.5', 'Solid'], 4],
-            # BLR, DNK, ESP, LVA, NZL, POL, ROU, SVN,
-            ['Other', ['1.A.2.g.viii.10', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.10', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.10', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.10', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.10', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.10', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.10', 'Biomass'], 4],
-            # BLR
-            ['Manufacture and construction Aggregated', ['1.A.2.g.viii.2', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.2', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.2', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.2', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.2', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.2', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.2', 'Biomass'], 4],
-            # HRV
-            ['Other Industry', ['1.A.2.g.viii.4', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.4', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.4', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.4', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.4', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.4', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.4', 'Biomass'], 4],
-            # HRV
-            ['1A2 Total for 1990 to 2000', ['1.A.2.g.viii.2', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.2', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.2', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.2', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.2', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.2', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.2', 'Biomass'], 4],
-            # MLT
-            ['All Industry', ['1.A.2.g.viii.2', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.2', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.2', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.2', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.2', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.2', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.2', 'Biomass'], 4],
-            # PRT
-            ['Rubber', ['1.A.2.g.viii.6', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.6', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.6', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.6', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.6', 'OtherFF'], 4],
-            ['Biomass', ['1.A.2.g.viii.6', 'Biomass'], 4],
-            # SWE
-            ['All stationary combustin within CRF 1.A.2.g', ['1.A.2.g.viii.7', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.7', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.7', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.7', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.7', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.7', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.7', 'Biomass'], 4],
-            # IRL
-            ['Other stationary combustion', ['1.A.2.g.viii.8', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.8', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.8', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.8', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.8', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.8', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.8', 'Biomass'], 4],
-            # HUN
-            ['Other Stationary Combustion', ['1.A.2.g.viii.8', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.8', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.8', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.8', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.8', 'OtherFF'], 4],
-            ['Peat', ['1.A.2.g.viii.8', 'Peat'], 4],
-            ['Biomass', ['1.A.2.g.viii.8', 'Biomass'], 4],
-            # CHE
-            ['Other Boilers and Engines Industry', ['1.A.2.g.viii.9', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.2.g.viii.9', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.2.g.viii.9', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.2.g.viii.9', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.2.g.viii.9', 'OtherFF'], 4],
-            ['Biomass', ['1.A.2.g.viii.9', 'Biomass'], 4],
-        ],
-        "entity_mapping": {
-            'EMISSIONS CH4': "CH4",
-            'EMISSIONS CO2(2)': "CO2",
-            'EMISSIONS N2O': "N2O",
-        },
-    },  # tested
-    "Table1.A(a)s3": {
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 115,
-            "header": ['group', 'entity', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category", "class"],
-            "cols_to_ignore": [
-                'AGGREGATE ACTIVITY DATA Consumption',
-                'AGGREGATE ACTIVITY DATA Consumption',
-                'IMPLIED EMISSION FACTORS CO2(1)',
-                'IMPLIED EMISSION FACTORS CH4',
-                'IMPLIED EMISSION FACTORS N2O',
-            ],
-            "stop_cats": ["Note: All footnotes for this table are given at the end of the table on sheet 4.", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['1.A.3 Transport', ['1.A.3', 'Total'], 0],
-            ['Liquid fuels', ['1.A.3', 'Liquid'], 1],
-            ['Solid fuels', ['1.A.3', 'Solid'], 1],
-            ['Gaseous fuels', ['1.A.3', 'Gaseous'], 1],
-            ['Other fossil fuels(4)', ['1.A.3', 'OtherFF'], 1],
-            ['Biomass(6)', ['1.A.3', 'Biomass'], 1],
-            # a. Domestic Aviation
-            ['a. Domestic aviation(10)', ['1.A.3.a', 'Total'], 1],
-            ['Aviation gasoline', ['1.A.3.a', 'AvGasoline'], 2],
-            ['Jet kerosene', ['1.A.3.a', 'JetKerosene'], 2],
-            ['Biomass', ['1.A.3.a', 'Biomass'], 2],
-            # b. road Transportation
-            ['b. Road transportation(11)', ['1.A.3.b', 'Total'], 1],
-            ['Gasoline', ['1.A.3.b', 'Gasoline'], 2],
-            ['Diesel oil', ['1.A.3.b', 'DieselOil'], 2],
-            ['Liquefied petroleum gases (LPG)', ['1.A.3.b', 'LPG'], 2],
-            ['Other liquid fuels (please specify)', ['1.A.3.b', 'OtherLiquid'], 2],
-            ['Gaseous fuels', ['1.A.3.b', 'Gaseous'], 2],
-            ['Biomass(6)', ['1.A.3.b', 'Biomass'], 2],
-            ['Other fossil fuels (please specify)(4)', ['1.A.3.b', 'OtherFF'], 2],
-            # i. Cars
-            ['i. Cars', ['1.A.3.b.i', 'Total'], 2],
-            ['Gasoline', ['1.A.3.b.i', 'Gasoline'], 3],
-            ['Diesel oil', ['1.A.3.b.i', 'DieselOil'], 3],
-            ['Liquefied petroleum gases (LPG)', ['1.A.3.b.i', 'LPG'], 3],
-            ['Other liquid fuels (please specify)', ['1.A.3.b.i', 'OtherLiquid'], 3],
-            ['Kerosene', ['1.A.3.b.i', 'Kerosene'], 4],  # UKR (and probably others)
-            ['Lubricants', ['1.A.3.b.i', 'Lubricants'], 4],  # UKR, JPN
-            ['Lubricant oil', ['1.A.3.b.i', 'Lubricants'], 4],  # PRT
-            ['Biodiesel (5 percent fossil portion)', ['1.A.3.b.i', 'OLBiodieselFC'], 4],  # CAN
-            ['Fossil part of biodiesel', ['1.A.3.b.i', 'OLBiodieselFC'], 4],  # LTU
-            ['Other', ['1.A.3.b.i', 'OLOther'], 4],  # UKR, MLT
-            ['Other Liquid Fuels', ['1.A.3.b.i', 'OLOther'], 4],  # CYP
-            ['Other non-specified', ['1.A.3.b.i', 'OLOther'], 4],  # SWE new in 2023
-            ['Other motor fuels', ['1.A.3.b.i', 'OMotorFuels'], 4],  # RUS
-            ['Lubricants in 2-stroke engines', ['1.A.3.b.i', 'Lubricants'], 4],  # HUN
-            ['LNG', ['1.A.3.b.i', 'LNG'], 4],  # USA
-            ['Gaseous fuels', ['1.A.3.b.i', 'Gaseous'], 3],
-            ['Biomass(6)', ['1.A.3.b.i', 'Biomass'], 3],
-            ['Other fossil fuels (please specify)(4)', ['1.A.3.b.i', 'OtherFF'], 3],
-            ['Other Fossil Fuels', ['1.A.3.b.i', 'OFFOther'], 4],  # CYP, POL
-            ['Biodiesel (fossil component)', ['1.A.3.b.i', 'OFFBiodieselFC'], 4],  # LUX
-            ['Biodiesel fossil fraction', ['1.A.3.b.i', 'OFFBiodieselFC'], 4],  # NOR
-            ['Biodiesel (fossil fraction)', ['1.A.3.b.i', 'OFFBiodieselFC'], 4],  # NZL
-            ['Fossil part of biodiesel or biogasoline', ['1.A.3.b.i', 'OFFBiofuelFC'], 4],  # PRT
-            ['Fossil part of biodiesel and biogasoline', ['1.A.3.b.i', 'OFFBiofuelFC'], 4],  # SWE
-            ['fossil part of biofuels', ['1.A.3.b.i', 'OFFBiofuelFC'], 4],  # CZE
-            ['fossil part of biodiesel', ['1.A.3.b.i', 'OFFBiodieselFC'], 4],  # DKE, DNK, HRV
-            ['Fossil part of biodiesel', ['1.A.3.b.i', 'OFFBiodieselFC'], 4],  # DNM, BEL, HUN, LVA, ESP
-            ['Fossil part of biogasoline', ['1.A.3.b.i', 'OFFBiogasolineFC'], 4],  # BEL
-            ['Natural Gas', ['1.A.3.b.i', 'OFFNaturalGas'], 4],  # USA
-            ['Fossil part of biofuel', ['1.A.3.b.i', 'OFFBiofuelFC'], 4],  # IRL
-            ['Other', ['1.A.3.b.i', 'OFFOther'], 4],  # MLT
-            # ii. Light duty trucks
-            ['ii. Light duty trucks', ['1.A.3.b.ii', 'Total'], 2],
-            ['Gasoline', ['1.A.3.b.ii', 'Gasoline'], 3],
-            ['Diesel oil', ['1.A.3.b.ii', 'DieselOil'], 3],
-            ['Liquefied petroleum gases (LPG)', ['1.A.3.b.ii', 'LPG'], 3],
-            ['Other liquid fuels (please specify)', ['1.A.3.b.ii', 'OtherLiquid'], 3],
-            ['Kerosene', ['1.A.3.b.ii', 'Kerosene'], 4],  # UKR (and probably others)
-            ['Lubricants', ['1.A.3.b.ii', 'Lubricants'], 4],  # UKR, JPN
-            ['Lubricant Oil', ['1.A.3.b.ii', 'Lubricants'], 4],  # PRT
-            ['Biodiesel (5 percent fossil portion)', ['1.A.3.b.ii', 'OLBiodieselFC'], 4],  # CAN
-            ['Other', ['1.A.3.b.ii', 'OLOther'], 4],  # UKR (and probably others)
-            ['Other Liquid Fuels', ['1.A.3.b.ii', 'OLOther'], 4],  # CYP
-            ['Other non-specified', ['1.A.3.b.ii', 'OLOther'], 4],  # SWE new in 2023
-            ['Other motor fuels', ['1.A.3.b.ii', 'OMotorFuels'], 4],  # RUS
-            ['LNG', ['1.A.3.b.ii', 'LNG'], 4],  # USA
-            ['Gaseous fuels', ['1.A.3.b.ii', 'Gaseous'], 3],
-            ['Biomass(6)', ['1.A.3.b.ii', 'Biomass'], 3],
-            ['Other fossil fuels (please specify)(4)', ['1.A.3.b.ii', 'OtherFF'], 3],
-            ['Other Fossil Fuels', ['1.A.3.b.ii', 'OFFOther'], 4],  # CYP, POL
-            ['Biodiesel (fossil component)', ['1.A.3.b.ii', 'OFFBiodieselFC'], 4],  # LUX
-            ['Biodiesel fossil fraction', ['1.A.3.b.ii', 'OFFBiodieselFC'], 4],  # NOR
-            ['Biodiesel (fossil fraction)', ['1.A.3.b.ii', 'OFFBiodieselFC'], 4],  # NZL
-            ['Fossil part of biodiesel or biogasoline', ['1.A.3.b.ii', 'OFFBiofuelFC'], 4],  # PRT
-            ['Fossil part of biodiesel and biogasoline', ['1.A.3.b.ii', 'OFFBiofuelFC'], 4],  # SWE
-            ['fossil part of biofuels', ['1.A.3.b.ii', 'OFFBiofuelFC'], 4],  # CZE
-            ['fossil part of biodiesel', ['1.A.3.b.ii', 'OFFBiodieselFC'], 4],  # DKE, DNK, HRV
-            ['Fossil part of biodiesel', ['1.A.3.b.ii', 'OFFBiodieselFC'], 4],  # DNM, BEL, HUN, LVA, ESP
-            ['Fossil part of biogasoline', ['1.A.3.b.ii', 'OFFBiogasolineFC'], 4],  # BEL
-            ['Fossil part of biofuel', ['1.A.3.b.ii', 'OFFBiofuelFC'], 4],  # IRL
-            # iii. Heavy duty trucks and buses
-            ['iii. Heavy duty trucks and buses', ['1.A.3.b.iii', 'Total'], 2],
-            ['Gasoline', ['1.A.3.b.iii', 'Gasoline'], 3],
-            ['Diesel oil', ['1.A.3.b.iii', 'DieselOil'], 3],
-            ['Liquefied petroleum gases (LPG)', ['1.A.3.b.iii', 'LPG'], 3],
-            ['Other liquid fFuels (please specify)', ['1.A.3.b.iii', 'OtherLiquid'], 3],
-            ['Kerosene', ['1.A.3.b.iii', 'Kerosene'], 4],  # UKR (and probably others)
-            ['Lubricants', ['1.A.3.b.iii', 'Lubricants'], 4],  # UKR, JPN
-            ['Lubricant Oil', ['1.A.3.b.iii', 'Lubricants'], 4],  # PRT
-            ['Biodiesel (5 percent fossil portion)', ['1.A.3.b.iii', 'OLBiodieselFC'], 4],  # CAN
-            ['Other', ['1.A.3.b.iii', 'OLOther'], 4],  # UKR (and probably others)
-            ['Other Liquid Fuels', ['1.A.3.b.iii', 'OLOther'], 4],  # CYP
-            ['Other non-specified', ['1.A.3.b.iii', 'OLOther'], 4],  # SWE new in 2023
-            ['Other motor fuels', ['1.A.3.b.iii', 'OMotorFuels'], 4],  # RUS
-            ['LNG', ['1.A.3.b.iii', 'LNG'], 4],  # USA
-            ['GTL', ['1.A.3.b.iii', 'GTL'], 4],  # MCO, new in 2022
-            ['Gaseous fuels', ['1.A.3.b.iii', 'Gaseous'], 3],
-            ['Biomass(6)', ['1.A.3.b.iii', 'Biomass'], 3],
-            ['Other fossil fuels (please specify)(4)', ['1.A.3.b.iii', 'OtherFF'], 3],
-            ['Other Fossil Fuels', ['1.A.3.b.iii', 'OFFOther'], 4],  # CYP, POL
-            ['Biodiesel (fossil component)', ['1.A.3.b.iii', 'OFFBiodieselFC'], 4],  # LUX
-            ['Biodiesel fossil fraction', ['1.A.3.b.iii', 'OFFBiodieselFC'], 4],  # NOR
-            ['Biodiesel (fossil fraction)', ['1.A.3.b.iii', 'OFFBiodieselFC'], 4],  # NZL
-            ['Fossil part of biodiesel or biogasoline', ['1.A.3.b.iii', 'OFFBiofuelFC'], 4],  # PRT
-            ['Fossil part of biodiesel and biogasoline', ['1.A.3.b.iii', 'OFFBiofuelFC'], 4],  # SWE
-            ['fossil part of biofuels', ['1.A.3.b.iii', 'OFFBiofuelFC'], 4],  # CZE
-            ['fossil part of biodiesel', ['1.A.3.b.iii', 'OFFBiodieselFC'], 4],  # DKE, DNK, HRV
-            ['Fossil part of biodiesel', ['1.A.3.b.iii', 'OFFBiodieselFC'], 4],  # DNM, BEL, HUN, LVA, ESP
-            ['Fossil part of biogasoline', ['1.A.3.b.iii', 'OFFBiogasolineFC'], 4],  # BEL
-            ['Fossil part of biofuel', ['1.A.3.b.iii', 'OFFBiofuelFC'], 4],  # IRL
-            # iv. Motorcycles
-            ['iv. Motorcycles', ['1.A.3.b.iv', 'Total'], 2],
-            ['Gasoline', ['1.A.3.b.iv', 'Gasoline'], 3],
-            ['Diesel oil', ['1.A.3.b.iv', 'DieselOil'], 3],
-            ['Liquefied petroleum gases (LPG)', ['1.A.3.b.iv', 'LPG'], 3],
-            ['Other liquid fuels (please specify)', ['1.A.3.b.iv', 'OtherLiquid'], 3],
-            ['Kerosene', ['1.A.3.b.iv', 'Kerosene'], 4],  # UKR (and probably others)
-            ['Lubricants', ['1.A.3.b.iv', 'Lubricants'], 4],  # UKR, JPN, HRV
-            ['Lubricant Oil', ['1.A.3.b.iv', 'Lubricants'], 4],  # PRT
-            ['Other', ['1.A.3.b.iv', 'OLOther'], 4],  # UKR (and probably others)
-            ['Other Liquid Fuels', ['1.A.3.b.iv', 'OLOther'], 4],  # CYP
-            ['Other non-specified', ['1.A.3.b.iv', 'OLOther'], 4],  # SWE new in 2023
-            ['Lube', ['1.A.3.b.iv', 'Lubricants'], 4],  # MCO
-            ['Lubricants in 2-stroke engines', ['1.A.3.b.iv', 'Lubricants'], 4],  # HUN
-            ['Lubricants (two-stroke engines)', ['1.A.3.b.iv', 'Lubricants'], 4],  # ESP
-            ['lubricants', ['1.A.3.b.iv', 'Lubricants'], 4],  # SVN
-            ['Gaseous fuels', ['1.A.3.b.iv', 'Gaseous'], 3],
-            ['Biomass(6)', ['1.A.3.b.iv', 'Biomass'], 3],
-            ['Other fossil fuels (please specify)(4)', ['1.A.3.b.iv', 'OtherFF'], 3],
-            ['Other Fossil Fuels', ['1.A.3.b.iv', 'OFFOther'], 4],  # CYP
-            ['Fossil part of biodiesel or biogasoline', ['1.A.3.b.iv', 'OFFBiofuelFC'], 4],  # PRT
-            ['Fossil part of biodiesel and biogasoline', ['1.A.3.b.iv', 'OFFBiofuelFC'], 4],  # SWE
-            ['fossil part of biofuels', ['1.A.3.b.iv', 'OFFBiofuelFC'], 4],  # CZE
-            ['Fossil part of biodiesel', ['1.A.3.b.iv', 'OFFBiodieselFC'], 4],  # BEL
-            ['Fossil part of biogasoline', ['1.A.3.b.iv', 'OFFBiogasolineFC'], 4],  # BEL
-            ['Fossil part of biodiese', ['1.A.3.b.iv', 'OFFBiodieselFC'], 4],  # LVA
-            ['Fossil part of biofuel', ['1.A.3.b.iv', 'OFFBiofuelFC'], 4],  # IRL
-            ['fossil part of biodiesel', ['1.A.3.b.iv', 'OFFBiofuelFC'], 4],  # HRV
-            # v. Other
-            ['v. Other (please specify)', ['1.A.3.b.v', 'Total'], 2],
-            # TUR
-            ['Road total', ['1.A.3.b.v.1', 'Total'], 3],
-            ['Gasoline', ['1.A.3.b.v.1', 'Gasoline'], 4],
-            ['Diesel Oil', ['1.A.3.b.v.1', 'DieselOil'], 4],
-            ['Liquefied Petroleum Gases (LPG)', ['1.A.3.b.v.1', 'LPG'], 4],
-            ['Gaseous Fuels', ['1.A.3.b.v.1', 'Gaseous'], 4],
-            ['Biomass', ['1.A.3.b.v.1', 'Biomass'], 4],
-            # CYP
-            ['Buses', ['1.A.3.b.v.2', 'Total'], 3],
-            ['Gasoline', ['1.A.3.b.v.2', 'Gasoline'], 4],
-            ['Diesel Oil', ['1.A.3.b.v.2', 'DieselOil'], 4],
-            ['Liquefied Petroleum Gases (LPG)', ['1.A.3.b.v.2', 'LPG'], 4],
-            ['Other Liquid Fuels (please specify)', ['1.A.3.b.v.2', 'OtherLiquid'], 4],
-            ['Gaseous Fuels', ['1.A.3.b.v.2', 'Gaseous'], 4],
-            ['Biomass', ['1.A.3.b.v.2', 'Biomass'], 4],
-            ['Other Fossil Fuels (please specify)', ['1.A.3.b.v.2', 'OtherFF'], 4],
-            # GBK, GBR
-            ['All vehicles - biofuel use', ['1.A.3.b.v.3', 'Total'], 3],
-            ['Biomass', ['1.A.3.b.v.3', 'Biomass'], 4],
-            ['All vehicles - LPG use', ['1.A.3.b.v.4', 'Total'], 4],
-            ['Liquefied Petroleum Gases (LPG)', ['1.A.3.b.v.4', 'LPG'], 4],
-            ['All vehicles - biofuel use (fossil component)', ['1.A.3.b.v.5', 'Total'], 4],
-            ['Other Fossil Fuels (please specify)', ['1.A.3.b.v.5', 'OtherFF'], 4],
-            # CAN
-            ['Propane and Natural Gas Vehicles', ['1.A.3.b.v.6', 'Total'], 3],
-            ['Gasoline', ['1.A.3.b.v.6', 'Gasoline'], 4],
-            ['Diesel Oil', ['1.A.3.b.v.6', 'DieselOil'], 4],
-            ['Liquefied Petroleum Gases (LPG)', ['1.A.3.b.v.6', 'LPG'], 4],
-            ['Other Liquid Fuels (please specify)', ['1.A.3.b.v.6', 'OtherLiquid'], 4],
-            ['Gaseous Fuels', ['1.A.3.b.v.6', 'Gaseous'], 4],
-            ['Biomass', ['1.A.3.b.v.6', 'Biomass'], 4],
-            ['Other Fossil Fuels (please specify)', ['1.A.3.b.v.6', 'OtherFF'], 4],
-            # BEL
-            ['Lubricant Two-Stroke Engines', ['1.A.3.b.v.7', 'Lubricants'], 3],
-            ['Other Liquid Fuels (please specify)', ['1.A.3.b.v.7', 'OtherLiquid'], 4],
-            # ROU
-            ['Gaseous Fuels', ['1.A.3.b.v.8', 'Total'], 3],
-            ['Gaseous Fuels', ['1.A.3.b.v.8', 'Gaseous'], 4],
-            ['Other Liquid Fuels', ['1.A.3.b.v.9', 'Total'], 3],
-            ['Other Liquid Fuels (please specify)', ['1.A.3.b.v.9', 'OtherLiquid'], 4],
-            ['Other Kerosene', ['1.A.3.b.v.9', 'Kerosene'], 5],
-            ['Heating and Other Gasoil', ['1.A.3.b.v.9', 'HeatingGasoil'], 5],
-            ['Biomass', ['1.A.3.b.v.10', 'Total'], 3],
-            ['Biomass', ['1.A.3.b.v.10', 'Biomass'], 4],
-            # DEU
-            ['CO2 from lubricant co-incineration in 2-stroke road vehicles', ['1.A.3.b.v.7', 'Total'], 3],
-            ['Other Liquid Fuels (please specify)', ['1.A.3.b.v.7', 'OtherLiquid'], 4],
-            ['lubricant used in 2-stroke mix', ['1.A.3.b.v.7', 'Lubricants'], 5],
-            # USA
-            ['Evaporative Emissions', ['1.A.3.b.v.11', 'Total'], 3],
-            ['Gasoline', ['1.A.3.b.v.11', 'Gasoline'], 4],
-            ['Diesel Oil', ['1.A.3.b.v.11', 'DieselOil'], 4],
-            ['Liquefied Petroleum Gases (LPG)', ['1.A.3.b.v.11', 'LPG'], 4],
-            ['Other Liquid Fuels (please specify)', ['1.A.3.b.v.11', 'OtherLiquid'], 4],
-            ['Gaseous Fuels', ['1.A.3.b.v.11', 'Gaseous'], 4],
-            ['Biomass', ['1.A.3.b.v.11', 'Biomass'], 4],
-            ['Other Fossil Fuels (please specify)', ['1.A.3.b.v.11', 'OtherFF'], 4],
-            # SVK
-            ['Urea-based catalysts', ['1.A.3.b.v.12', 'Total'], 3],
-            ['Diesel Oil', ['1.A.3.b.v.12', 'DieselOil'], 4],
-            # ESP
-            ['Other non-specified', ['1.A.3.b.v.13', 'Total'], 3],
-            ['Gasoline', ['1.A.3.b.v.13', 'Gasoline'], 4],
-            ['Diesel Oil', ['1.A.3.b.v.13', 'DieselOil'], 4],
-            ['Liquefied Petroleum Gases (LPG)', ['1.A.3.b.v.13', 'LPG'], 4],
-            ['Other Liquid Fuels (please specify)', ['1.A.3.b.v.13', 'OtherLiquid'], 4],
-            ['Gaseous Fuels', ['1.A.3.b.v.13', 'Gaseous'], 4],
-            ['Biomass', ['1.A.3.b.v.13', 'Biomass'], 4],
-            ['Other Fossil Fuels (please specify)', ['1.A.3.b.v.13', 'OtherFF'], 4],
-            # BGR
-            ['Urea', ['1.A.3.b.v.12', 'Total'], 3],
-            ['Other Liquid Fuels (please specify)', ['1.A.3.b.v.12', 'OtherLiquid'], 4],
-            ['Lubricants', ['1.A.3.b.v.7', 'Total'], 3],
-            ['Other Liquid Fuels (please specify)', ['1.A.3.b.v.7', 'OtherLiquid'], 4],
-            # c. Railways
-            ['c. Railways', ['1.A.3.c', 'Total'], 1],
-            ['Liquid fuels', ['1.A.3.c', 'Liquid'], 2],
-            ['Solid fuels', ['1.A.3.c', 'Solid'], 2],
-            ['Gaseous fuels', ['1.A.3.c', 'Gaseous'], 2],
-            ['Biomass(6)', ['1.A.3.c', 'Biomass'], 2],
-            ['Other fossil fuels (please specify)', ['1.A.3.c', 'OtherFF'], 2],
-            ['Biodiesel (fossil component)', ['1.A.3.c', 'OFFBiodieselFC'], 3],  # LUX
-            ['Fossil part of biodiesel and biogasoline', ['1.A.3.c', 'OFFBiodieselFC'], 3],  # SWE
-            ['Fossil part of biodiesel', ['1.A.3.c', 'OFFBiodieselFC'], 3],  # LVA, new in 2022
-            ['Other fossil fuels', ['1.A.3.c', 'OFFOther'], 3],  # ROU, new in 2022
-            # d. Domestic navigation
-            ['d. Domestic Navigation(10)', ['1.A.3.d', 'Total'], 1],
-            ['Residual fuel oil', ['1.A.3.d', 'ResFuelOil'], 2],
-            ['Gas/diesel oil', ['1.A.3.d', 'GasDieselOil'], 2],
-            ['Gasoline', ['1.A.3.d', 'Gasoline'], 2],
-            ['Other liquid fuels (please specify)', ['1.A.3.d', 'OtherLiquid'], 2],
-            ['Lubricants', ['1.A.3.d', 'Lubricants'], 3],  # UKR, JPN
-            ['Biodiesel (5 percent fossil portion)', ['1.A.3.d', 'OLBiodieselFC'], 3],  # CAN
-            ['Light Fuel Oil', ['1.A.3.d', 'LightFuelOil'], 3],  # CAN
-            ['Kerosene and stove oil', ['1.A.3.d', 'KeroseStoveOil'], 3],  # CAN
-            ['Kerosene', ['1.A.3.d', 'Kerosene'], 3],  # DKE, DNK
-            ['Natural Gas Liquids', ['1.A.3.d', 'NGL'], 3],  # DKE, DNK
-            ['Fossil part of biodiesel', ['1.A.3.d', 'OLBiodieselFC'], 3],  # LTU
-            ['Other non-specified', ['1.A.3.d', 'OLOther'], 3],  # SWE
-            ['Other motor fuels', ['1.A.3.d', 'OMotorFuels'], 3],  # RUS
-            ['Fuel oil A', ['1.A.3.d', 'FuelOilA'], 3],  # JPN
-            ['Fuel oil B', ['1.A.3.d', 'FuelOilB'], 3],  # JPN
-            ['Fuel oil C', ['1.A.3.d', 'FuelOilC'], 3],  # JPN
-            ['Diesel Oil', ['1.A.3.d', 'OLDiesel'], 3],  # FIN
-            ['Other Liquid Fuels', ['1.A.3.d', 'OLOther'], 3],  # ROU, new in 2022
-            ['Heating and Other Gasoil', ['1.A.3.d', 'OLHeatingOtherGasoil'], 3],
-            # ROU, new in 2023
-            ['Liquified Petroleum Gas', ['1.A.3.d', 'OLLPG'], 3],  # ROU, new in 2023
-            ['Gaseous fuels', ['1.A.3.d', 'Gaseous'], 2],
-            ['Biomass(6)', ['1.A.3.d', 'Biomass'], 2],
-            ['Other fossil fuels (please specify)(4)', ['1.A.3.d', 'OtherFF'], 2],
-            ['Liquified natural gas', ['1.A.3.d', 'LNG'], 3],  # DKE, DNK, DNM
-            ['Biodiesel (fossil component)', ['1.A.3.d', 'OFFBiodieselFC'], 3],  # LUX
-            ['Coal', ['1.A.3.d', 'OFFCoal'], 3],  # NZL, NDL
-            ['fossil part of biodiesel', ['1.A.3.d', 'OFFBiodieselFC'], 3],  # AUT
-            ['Fossil part of biodiesel and biogasoline', ['1.A.3.d', 'OFFBioGasDieselFC'], 3],  # SWE
-            ['Solid Fuels', ['1.A.3.d', 'OFFSolid'], 3],  # AUS
-            ['Other Fossil Fuels', ['1.A.3.d', 'OFFOther'], 3],  # ROU, new in 2022
-            # e. other transportation
-            # keep details also for top category as it's present
-            ['e. Other transportation (please specify)', ['1.A.3.e', 'Total'], 1],
-            ['Liquid fuels', ['1.A.3.e', 'Liquid'], 2],
-            ['Solid fuels', ['1.A.3.e', 'Solid'], 2],
-            ['Gaseous fuels', ['1.A.3.e', 'Gaseous'], 2],
-            ['Other fossil fuels(4)', ['1.A.3.e', 'OtherFF'], 2],
-            ['Biomass(6)', ['1.A.3.e', 'Biomass'], 2],
-            # i. pipeline
-            ['i. Pipeline transport', ['1.A.3.e.i', 'Total'], 2],
-            ['Liquid fuels', ['1.A.3.e.i', 'Liquid'], 3],
-            ['Solid fuels', ['1.A.3.e.i', 'Solid'], 3],
-            ['Gaseous fuels', ['1.A.3.e.i', 'Gaseous'], 3],
-            ['Other fossil fuels(4)', ['1.A.3.e.i', 'OtherFF'], 3],
-            ['Biomass(6)', ['1.A.3.e.i', 'Biomass'], 3],
-            # ii. Other
-            ['ii. Other (please specify)', ['1.A.3.e.ii', 'Total'], 2],
-            # UKR, SWE
-            ['Off-road vehicles and other machinery', ['1.A.3.e.ii.1', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.3.e.ii.1', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.3.e.ii.1', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.3.e.ii.1', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.3.e.ii.1', 'OtherFF'], 4],
-            ['Biomass', ['1.A.3.e.ii.1', 'Biomass'], 4],
-            # GBR, GBK
-            ['Aircraft support vehicles', ['1.A.3.e.ii.2', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.3.e.ii.2', 'Liquid'], 4],
-            # CAN
-            ['Off Road', ['1.A.3.e.ii.1', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.3.e.ii.1', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.3.e.ii.1', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.3.e.ii.1', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.3.e.ii.1', 'OtherFF'], 4],
-            ['Biomass', ['1.A.3.e.ii.1', 'Biomass'], 4],
-            # LTU
-            ['Off-road transport', ['1.A.3.e.ii.1', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.3.e.ii.1', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.3.e.ii.1', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.3.e.ii.1', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.3.e.ii.1', 'OtherFF'], 4],
-            ['Biomass', ['1.A.3.e.ii.1', 'Biomass'], 4],
-            # BEL
-            ['Other non-specified', ['1.A.3.e.ii.3', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.3.e.ii.3', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.3.e.ii.3', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.3.e.ii.3', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.3.e.ii.3', 'OtherFF'], 4],
-            ['Biomass', ['1.A.3.e.ii.3', 'Biomass'], 4],
-            # AUS
-            ['Off-Road Vehicles', ['1.A.3.e.ii.1', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.3.e.ii.1', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.3.e.ii.1', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.3.e.ii.1', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.3.e.ii.1', 'OtherFF'], 4],
-            ['Biomass', ['1.A.3.e.ii.1', 'Biomass'], 4],
-            # USA
-            ['Non-Transportation Mobile', ['1.A.3.e.ii.4', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.3.e.ii.4', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.3.e.ii.4', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.3.e.ii.4', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.3.e.ii.4', 'OtherFF'], 4],
-            ['Biomass', ['1.A.3.e.ii.4', 'Biomass'], 4],
-            # AUT (new in 2022)
-            ['Airport ground activities', ['1.A.3.e.ii.2', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.3.e.ii.4', 'Liquid'], 4],
-            ['Gaseous Fuels', ['1.A.3.e.ii.4', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.3.e.ii.4', 'OtherFF'], 4],
-            ['Biomass', ['1.A.3.e.ii.4', 'Biomass'], 4],
-            # ROU, new in 2022
-            ['Other', ['1.A.3.e.ii.3', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.3.e.ii.3', 'Liquid'], 4],
-            ['Solid Fuels', ['1.A.3.e.ii.3', 'Solid'], 4],
-            ['Gaseous Fuels', ['1.A.3.e.ii.3', 'Gaseous'], 4],
-            ['Other Fossil Fuels', ['1.A.3.e.ii.3', 'OtherFF'], 4],
-            ['Biomass', ['1.A.3.e.ii.3', 'Biomass'], 4],
-        ],
-        "entity_mapping": {
-            'EMISSIONS CH4': "CH4",
-            'EMISSIONS CO2(2)': "CO2",
-            'EMISSIONS N2O': "N2O",
-        },
-    },  # tested
-    "Table1.A(a)s4": {
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 127,
-            "header": ['group', 'entity', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category", "class"],
-            "cols_to_ignore": [
-                'AGGREGATE ACTIVITY DATA Consumption',
-                'AGGREGATE ACTIVITY DATA Consumption',
-                'IMPLIED EMISSION FACTORS CO2(1)',
-                'IMPLIED EMISSION FACTORS CH4',
-                'IMPLIED EMISSION FACTORS N2O',
-                'EMISSIONS CO2 Amount captured',
-            ],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['1.A.4 Other sectors', ['1.A.4', 'Total'], 0],
-            ['Liquid fuels', ['1.A.4', 'Liquid'], 1],
-            ['Solid fuels', ['1.A.4', 'Solid'], 1],
-            ['Gaseous fuels', ['1.A.4', 'Gaseous'], 1],
-            ['Other fossil fuels(4)', ['1.A.4', 'OtherFF'], 1],
-            ['Peat(5)', ['1.A.4', 'Peat'], 1],
-            ['Biomass(6)', ['1.A.4', 'Biomass'], 1],
-            # a. Commercial/institutional(12)
-            ['a. Commercial/institutional(12)', ['1.A.4.a', 'Total'], 1],
-            ['Liquid fuels', ['1.A.4.a', 'Liquid'], 2],
-            ['Solid fuels', ['1.A.4.a', 'Solid'], 2],
-            ['Gaseous fuels', ['1.A.4.a', 'Gaseous'], 2],
-            ['Other fossil fuels(4)', ['1.A.4.a', 'OtherFF'], 2],
-            ['Peat(5)', ['1.A.4.a', 'Peat'], 2],
-            ['Biomass(6)', ['1.A.4.a', 'Biomass'], 2],
-            # 1.A.4.a.i Stationary combustion
-            ['1.A.4.a.i Stationary combustion', ['1.A.4.a.i', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.4.a.i', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.4.a.i', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.4.a.i', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.4.a.i', 'OtherFF'], 3],
-            ['Peat', ['1.A.4.a.i', 'Peat'], 3],
-            ['Biomass', ['1.A.4.a.i', 'Biomass'], 3],
-            # 1.A.4.a.ii Off-road vehicles and other machinery
-            ['1.A.4.a.ii Off-road vehicles and other machinery', ['1.A.4.a.ii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.4.a.ii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.4.a.ii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.4.a.ii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.4.a.ii', 'OtherFF'], 3],
-            ['Biomass', ['1.A.4.a.ii', 'Biomass'], 3],
-            # 1.A.4.a.iii Other (please specify)
-            ['1.A.4.a.iii Other (please specify)', ['1.A.4.a.iii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.4.a.iii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.4.a.iii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.4.a.iii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.4.a.iii', 'OtherFF'], 3],
-            ['Peat', ['1.A.4.a.iii', 'Peat'], 3],
-            ['Biomass', ['1.A.4.a.iii', 'Biomass'], 3],
-            # b. Residential(13)
-            ['b. Residential(13)', ['1.A.4.b', 'Total'], 1],
-            ['Liquid fuels', ['1.A.4.b', 'Liquid'], 2],
-            ['Solid fuels', ['1.A.4.b', 'Solid'], 2],
-            ['Gaseous fuels', ['1.A.4.b', 'Gaseous'], 2],
-            ['Other fossil fuels(4)', ['1.A.4.b', 'OtherFF'], 2],
-            ['Peat(5)', ['1.A.4.b', 'Peat'], 2],
-            ['Biomass(6)', ['1.A.4.b', 'Biomass'], 2],
-            # 1.A.4.b.i Stationary combustion
-            ['1.A.4.b.i Stationary combustion', ['1.A.4.b.i', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.4.b.i', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.4.b.i', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.4.b.i', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.4.b.i', 'OtherFF'], 3],
-            ['Peat', ['1.A.4.b.i', 'Peat'], 3],
-            ['Biomass', ['1.A.4.b.i', 'Biomass'], 3],
-            # 1.A.4.b.ii Off-road vehicles and other machinery
-            ['1.A.4.b.ii Off-road vehicles and other machinery', ['1.A.4.b.ii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.4.b.ii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.4.b.ii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.4.b.ii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.4.b.ii', 'OtherFF'], 3],
-            ['Biomass', ['1.A.4.b.ii', 'Biomass'], 3],
-            # 1.A.4.b.iii Other (please specify)
-            ['1.A.4.b.iii Other (please specify)', ['1.A.4.b.iii', 'Total'], 2],
-            # CYP, USA
-            ['Residential', ['1.A.4.b.iii.1', 'Total'], 3],
-            ['Liquid Fuels', ['1.A.4.b.iii.1', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.4.b.iii.1', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.4.b.iii.1', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.4.b.iii.1', 'OtherFF'], 3],
-            ['Peat', ['1.A.4.b.iii.1', 'Peat'], 3],
-            ['Biomass', ['1.A.4.b.iii.1', 'Biomass'], 3],
-            # c. Agriculture/forestry/fishing
-            ['c. Agriculture/forestry/fishing', ['1.A.4.c', 'Total'], 1],
-            ['Liquid fuels', ['1.A.4.c', 'Liquid'], 2],
-            ['Solid fuels', ['1.A.4.c', 'Solid'], 2],
-            ['Gaseous fuels', ['1.A.4.c', 'Gaseous'], 2],
-            ['Other fossil fuels(4)', ['1.A.4.c', 'OtherFF'], 2],
-            ['Peat(5)', ['1.A.4.c', 'Peat'], 2],
-            ['Biomass(6)', ['1.A.4.c', 'Biomass'], 2],
-            # i. Stationary
-            ['i. Stationary', ['1.A.4.c.i', 'Total'], 2],
-            ['Liquid fuels', ['1.A.4.c.i', 'Liquid'], 3],
-            ['Solid fuels', ['1.A.4.c.i', 'Solid'], 3],
-            ['Gaseous fuels', ['1.A.4.c.i', 'Gaseous'], 3],
-            ['Other fossil fuels(4)', ['1.A.4.c.i', 'OtherFF'], 3],
-            ['Peat(5)', ['1.A.4.c.i', 'Peat'], 3],
-            ['Biomass(6)', ['1.A.4.c.i', 'Biomass'], 3],
-            # ii. Off-road vehicles and other machinery
-            ['ii. Off-road vehicles and other machinery', ['1.A.4.c.ii', 'Total'], 2],
-            ['Gasoline', ['1.A.4.c.ii', 'Gasoline'], 3],
-            ['Diesel oil', ['1.A.4.c.ii', 'DieselOil'], 3],
-            ['Liquefied petroleum gases (LPG)', ['1.A.4.c.ii', 'LPG'], 3],
-            ['Other liquid fuels (please specify)', ['1.A.4.c.ii', 'OtherLiquid'], 3],
-            ['Other Kerosene', ['1.A.4.c.ii', 'Kerosene'], 4],  # HRV
-            ['Lubricants', ['1.A.4.c.ii', 'Lubricants'], 4],  # HRV
-            ['Gasoil', ['1.A.4.c.ii', 'Gasoil'], 4],  # FIN
-            ['Marine gasoil', ['1.A.4.c.ii', 'MarineGasoil'], 4],  # NOR
-            ['heavy fuel oil', ['1.A.4.c.ii', 'HeavyFuelOil'], 4],  # NOR
-            ['Other motor fuels', ['1.A.4.c.ii', 'OMotorFuels'], 4],  # RUS
-            ['Biodiesel (5 percent fossil portion)', ['1.A.4.c.ii', 'OLBiodieselFC'], 4],  # CAN
-            ['Lubricating Oil (Two-Stroke Engines)', ['1.A.4.c.ii', 'Lubricants'], 4],
-            # CAN new in 2023
-            ['Gaseous fuels', ['1.A.4.c.ii', 'Gaseous'], 3],
-            ['Biomass(6)', ['1.A.4.c.ii', 'Biomass'], 3],
-            ['Other fossil fuels (please specify)(4)', ['1.A.4.c.ii', 'OtherFF'], 3],
-            ['fossil part of biodiesel', ['1.A.4.c.ii', 'OFFBiodieselFC'], 4],
-            ['Fossil part of biodiesel and biogasoline', ['1.A.4.c.ii', 'OFFBiofuelFC'], 4],
-            ['Biodiesel (fossil component)', ['1.A.4.c.ii', 'OFFBiodieselFC'], 4],  # LUX
-            ['Alkylate Gasoline', ['1.A.4.c.ii', 'OFFAlkylateGasoline'], 4],  # LIE
-            # iii. Fishing
-            ['iii. Fishing', ['1.A.4.c.iii', 'Total'], 2],
-            ['Residual fuel oil', ['1.A.4.c.iii', 'ResFuelOil'], 3],
-            ['Gas/diesel oil', ['1.A.4.c.iii', 'GasDieselOil'], 3],
-            ['Gasoline', ['1.A.4.c.iii', 'Gasoline'], 3],
-            ['Other liquid fuels (please specify)', ['1.A.4.c.iii', 'OtherLiquid'], 3],
-            ['Biodiesel (5 percent fossil portion)', ['1.A.4.c.iii', 'OLBiodieselFC'], 4],  # CAN
-            ['Gaseous fuels', ['1.A.4.c.iii', 'Gaseous'], 3],
-            ['Biomass(6)', ['1.A.4.c.iii', 'Biomass'], 3],
-            ['Other fossil fuels (please specify)(4)', ['1.A.4.c.iii', 'OtherFF'], 3],
-            ['Fossil part of biodiesel and biogasoline', ['1.A.4.c.iii', 'OFFBiofuelFC'], 3],
-            # 1.A.5 Other (Not specified elsewhere)(14)
-            ['1.A.5 Other (Not specified elsewhere)(14)', ['1.A.5', 'Total'], 0],
-            # a. Stationary (please specify)
-            ['a. Stationary (please specify)', ['1.A.5.a', 'Total'], 1],
-            # temp
-            ['Liquid Fuels', ['1.A.5.a', 'Liquid'], 2],
-            ['Solid Fuels', ['1.A.5.a', 'Solid'], 2],
-            ['Gaseous Fuels', ['1.A.5.a', 'Gaseous'], 2],
-            ['Other Fossil Fuels', ['1.A.5.a', 'OtherFF'], 2],
-            ['Peat', ['1.A.5.a', 'Peat'], 2],
-            ['Biomass', ['1.A.5.a', 'Biomass'], 2],
-            # temp
-            # GBK, GBR
-            ['Military fuel use', ['1.A.5.a.i', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.a.i', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.a.i', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.a.i', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.a.i', 'OtherFF'], 3],
-            ['Peat', ['1.A.5.a.i', 'Peat'], 3],
-            ['Biomass', ['1.A.5.a.i', 'Biomass'], 3],
-            # TUR
-            ['Liquid fuels', ['1.A.5.a', 'Liquid'], 2],
-            # ESP, FIN, SWE
-            ['Other non-specified', ['1.A.5.a.ii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.a.ii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.a.ii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.a.ii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.a.ii', 'OtherFF'], 3],
-            ['Peat', ['1.A.5.a.ii', 'Peat'], 3],
-            ['Biomass', ['1.A.5.a.ii', 'Biomass'], 3],
-            # ROU, SVK, RUS
-            ['Other', ['1.A.5.a.ii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.a.ii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.a.ii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.a.ii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.a.ii', 'OtherFF'], 3],
-            ['Peat', ['1.A.5.a.ii', 'Peat'], 3],
-            ['Biomass', ['1.A.5.a.ii', 'Biomass'], 3],
-            # FRA, FRK
-            ['Other not specified', ['1.A.5.a.ii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.a.ii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.a.ii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.a.ii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.a.ii', 'OtherFF'], 3],
-            ['Peat', ['1.A.5.a.ii', 'Peat'], 3],
-            ['Biomass', ['1.A.5.a.ii', 'Biomass'], 3],
-            # CYP
-            ['Other (not specified elsewhere)', ['1.A.5.a.ii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.a.ii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.a.ii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.a.ii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.a.ii', 'OtherFF'], 3],
-            ['Peat', ['1.A.5.a.ii', 'Peat'], 3],
-            ['Biomass', ['1.A.5.a.ii', 'Biomass'], 3],
-            # NOR, HUN
-            ['Military', ['1.A.5.a.i', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.a.i', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.a.i', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.a.i', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.a.i', 'OtherFF'], 3],
-            ['Peat', ['1.A.5.a.i', 'Peat'], 3],
-            ['Biomass', ['1.A.5.a.i', 'Biomass'], 3],
-            ['Non-fuel Use', ['1.A.5.a.iii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.a.iii', 'Liquid'], 3],
-            # DNM, DKE, DNK
-            ['Other stationary combustion', ['1.A.5.a.ii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.a.ii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.a.ii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.a.ii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.a.ii', 'OtherFF'], 3],
-            ['Biomass', ['1.A.5.a.ii', 'Biomass'], 3],
-            # LUX
-            ['Stationary', ['1.A.5.a.ii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.a.ii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.a.ii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.a.ii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.a.ii', 'OtherFF'], 3],
-            ['Biomass', ['1.A.5.a.ii', 'Biomass'], 3],
-            # USA
-            ['Incineration of Waste', ['1.A.5.a.iv', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.a.iv', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.a.iv', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.a.iv', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.a.iv', 'OtherFF'], 3],
-            ['Peat', ['1.A.5.a.iv', 'Peat'], 3],
-            ['Biomass', ['1.A.5.a.iv', 'Biomass'], 3],
-            ['U.S. Territories', ['1.A.5.a.v', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.a.v', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.a.v', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.a.v', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.a.v', 'OtherFF'], 3],
-            ['Peat', ['1.A.5.a.v', 'Peat'], 3],
-            ['Biomass', ['1.A.5.a.v', 'Biomass'], 3],
-            ['Non Energy Use', ['1.A.5.a.iii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.a.iii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.a.iii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.a.iii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.a.iii', 'OtherFF'], 3],
-            ['Peat', ['1.A.5.a.iii', 'Peat'], 3],
-            ['Biomass', ['1.A.5.a.iii', 'Biomass'], 3],
-            # b. Mobile (please specify)
-            ['b. Mobile (please specify)', ['1.A.5.b', 'Total'], 1],
-            # temp
-            ['Liquid Fuels', ['1.A.5.b', 'Liquid'], 2],
-            ['Solid Fuels', ['1.A.5.b', 'Solid'], 2],
-            ['Gaseous Fuels', ['1.A.5.b', 'Gaseous'], 2],
-            ['Other Fossil Fuels', ['1.A.5.b', 'OtherFF'], 2],
-            ['Biomass', ['1.A.5.b', 'Biomass'], 2],
-            # temp
-            # GBK, GBR
-            ['Military aviation and naval shipping', ['1.A.5.b.i', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.i', 'Liquid'], 3],
-            # HRV
-            ['Military aviation component', ['1.A.5.b.ii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.ii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.b.ii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.b.ii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.b.ii', 'OtherFF'], 3],
-            ['Biomass', ['1.A.5.b.ii', 'Biomass'], 3],
-            ['Military water-borne component', ['1.A.5.b.iii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.iii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.b.iii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.b.iii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.b.iii', 'OtherFF'], 3],
-            ['Biomass', ['1.A.5.b.iii', 'Biomass'], 3],
-            # ESP, FIN
-            ['Other non-specified', ['1.A.5.b.iv', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.iv', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.b.iv', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.b.iv', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.b.iv', 'OtherFF'], 3],
-            ['Peat', ['1.A.5.b.iv', 'Peat'], 3],
-            ['Biomass', ['1.A.5.b.iv', 'Biomass'], 3],
-            # NLD, DKE, DNM, DNK, SWE, UKR
-            ['Military use', ['1.A.5.b.v', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.v', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.b.v', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.b.v', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.b.v', 'OtherFF'], 3],
-            ['Peat', ['1.A.5.b.v', 'Peat'], 3],
-            ['Biomass', ['1.A.5.b.v', 'Biomass'], 3],
-            # AUT, NOR, USA, CHE, HUN, LTU
-            ['Military', ['1.A.5.b.v', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.v', 'Liquid'], 3],
-            ['Gaseous Fuels', ['1.A.5.b.v', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.b.v', 'OtherFF'], 3],
-            ['Biomass', ['1.A.5.b.v', 'Biomass'], 3],
-            # PRT
-            ['Military Aviation', ['1.A.5.b.ii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.ii', 'Liquid'], 3],
-            # ROU, MLT
-            ['Other', ['1.A.5.b.iv', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.iv', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.b.iv', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.b.iv', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.b.iv', 'OtherFF'], 3],
-            ['Peat', ['1.A.5.b.iv', 'Peat'], 3],
-            ['Biomass', ['1.A.5.b.iv', 'Biomass'], 3],
-            # FRA, FRK
-            ['Other not specified', ['1.A.5.b.iv', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.iv', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.b.iv', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.b.iv', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.b.iv', 'OtherFF'], 3],
-            ['Peat', ['1.A.5.b.iv', 'Peat'], 3],
-            ['Biomass', ['1.A.5.b.iv', 'Biomass'], 3],
-            # CYP
-            ['1A5b i Mobile (aviation component)', ['1.A.5.b.vi', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.vi', 'Liquid'], 3],
-            # GBK, GBR
-            ['Lubricants used in 2-stroke engines', ['1.A.5.b.vii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.vii', 'Liquid'], 3],
-            # DNM, DKE, DNK
-            ['Recreational crafts', ['1.A.5.b.viii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.viii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.b.viii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.b.viii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.b.viii', 'OtherFF'], 3],
-            ['Biomass', ['1.A.5.b.viii', 'Biomass'], 3],
-            # SVK
-            ['Military use Jet Kerosene', ['1.A.5.b.ix', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.ix', 'Liquid'], 3],
-            ['Military Gasoline', ['1.A.5.b.x', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.x', 'Liquid'], 3],
-            ['Biomass', ['1.A.5.b.ix', 'Biomass'], 3],
-            ['Military Diesel Oil', ['1.A.5.b.xi', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.xi', 'Liquid'], 3],
-            ['Biomass', ['1.A.5.b.xi', 'Biomass'], 3],
-            # BEL
-            ['Military Use', ['1.A.5.b.v', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.v', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.b.v', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.b.v', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.b.v', 'OtherFF'], 3],
-            ['Biomass', ['1.A.5.b.v', 'Biomass'], 3],
-            # AUS
-            ['Military Transport', ['1.A.5.b.xii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.xii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.b.xii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.b.xii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.b.xii', 'OtherFF'], 3],
-            ['Biomass', ['1.A.5.b.xii', 'Biomass'], 3],
-            # CZE
-            ['Agriculture and Forestry and Fishing', ['1.A.5.b.xiii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.xiii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.b.xiii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.b.xiii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.b.xiii', 'OtherFF'], 3],
-            ['Biomass', ['1.A.5.b.xiii', 'Biomass'], 3],
-            ['Other mobile sources not included elsewhere', ['1.A.5.b.iv', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.iv', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.b.iv', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.b.iv', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.b.iv', 'OtherFF'], 3],
-            ['Biomass', ['1.A.5.b.iv', 'Biomass'], 3],
-            # SVN
-            ['Military use of fuels', ['1.A.5.b.v', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.v', 'Liquid'], 3],
-            # LUX
-            ['Unspecified Mobile', ['1.A.5.b.iv', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.iv', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.b.iv', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.b.iv', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.b.iv', 'OtherFF'], 3],
-            ['Biomass', ['1.A.5.b.iv', 'Biomass'], 3],
-            # LVA
-            ['Mobile', ['1.A.5.b.iv', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.iv', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.b.iv', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.b.iv', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.b.iv', 'OtherFF'], 3],
-            ['Biomass', ['1.A.5.b.iv', 'Biomass'], 3],
-            # CAN
-            ['Domestic Military (Aviation)', ['1.A.5.b.ii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.ii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.b.ii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.b.ii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.b.ii', 'OtherFF'], 3],
-            ['Biomass', ['1.A.5.b.ii', 'Biomass'], 3],
-            ['Military Water-borne Navigation', ['1.A.5.b.iii', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.iii', 'Liquid'], 3],
-            ['Solid Fuels', ['1.A.5.b.iii', 'Solid'], 3],
-            ['Gaseous Fuels', ['1.A.5.b.iii', 'Gaseous'], 3],
-            ['Other Fossil Fuels', ['1.A.5.b.iii', 'OtherFF'], 3],
-            ['Biomass', ['1.A.5.b.iii', 'Biomass'], 3],
-            # CZE, new in 2022
-            ['i. Mobile (aviation component)', ['1.A.5.b.vi', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.vi', 'Liquid'], 3],
-            ['iii. Mobile (other)', ['1.A.5.b.iv', 'Total'], 2],
-            ['Liquid Fuels', ['1.A.5.b.iv', 'Liquid'], 3],
-            # Information Item
-            ['Information item:(15)', ['\IGNORE', '\IGNORE'], 0],
-            ['Waste incineration with energy recovery included as:', ['\IGNORE', '\IGNORE'], 1],
-            ['Biomass(6)', ['\IGNORE', '\IGNORE'], 1],
-            ['Fossil fuels(4)', ['\IGNORE', '\IGNORE'], 1],
-        ],
-        "entity_mapping": {
-            'EMISSIONS CH4': "CH4",
-            'EMISSIONS CO2(2)': "CO2",
-            'EMISSIONS N2O': "N2O",
-        },
-    },  # tested
-    "Table1.B.1": {
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 19,
-            "header": ['group', 'entity', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [
-                'ACTIVITY DATA Amount of fuel produced',
-                'IMPLIED EMISSION FACTORS CH4(1)',
-                'IMPLIED EMISSION FACTORS CO2',
-                'EMISSIONS CH4 Recovery/Flaring(2)',
-            ],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['1. B. 1. a. Coal mining and handling', ['1.B.1.a'], 0],
-            ['i. Underground mines(4)', ['1.B.1.a.i'], 1],
-            ['Mining activities', ['1.B.1.a.i.1'], 2],
-            ['Post-mining activities', ['1.B.1.a.i.2'], 2],
-            ['Abandoned underground mines', ['1.B.1.a.i.3'], 2],
-            ['ii. Surface mines(4)', ['1.B.1.a.ii'], 1],
-            ['Mining activities', ['1.B.1.a.ii.1'], 2],
-            ['Post-mining activities', ['1.B.1.a.ii.2'], 2],
-            ['1. B. 1. b. Solid fuel transformation(5)', ['1.B.1.b'], 0],
-            ['1. B. 1. c. Other (please specify)(6)', ['1.B.1.c'], 0],
-            ['Flaring', ['1.B.1.c.i'], 1],  # UKR, AUS
-            ['Flaring of gas', ['1.B.1.c.i'], 1],  # SWE
-            ['Coal Dumps', ['1.B.1.c.ii'], 1],  # JPN
-            ['Uncontrolled combustion and burning coal dumps', ['1.B.1.c.ii'], 1],
-            # JPN since 2023
-            ['SO2 scrubbing', ['1.B.1.c.iii'], 1],  # SVN
-            ['Flaring of coke oven gas', ['1.B.1.c.iv'], 1],  # KAZ
-            ['Emisson from Coke Oven Gas Subsystem', ['1.B.1.c.iv'], 1],  # POL
-            ['Other', ['1.B.1.c.v'], 1],  # ROU, new in 2022
-        ],
-        "entity_mapping": {
-            'EMISSIONS CH4 Emissions(3)': 'CH4',
-            'EMISSIONS CO2 Emissions': 'CO2',
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    "Table1.B.2": {
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 33,
-            "header": ['group', 'entity', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [
-                'ACTIVITY DATA(1) Description(1)',
-                'ACTIVITY DATA(1) Unit(1)',
-                'ACTIVITY DATA(1) Value',
-                'IMPLIED EMISSION FACTORS CO2(2)',
-                'IMPLIED EMISSION FACTORS CH4',
-                'IMPLIED EMISSION FACTORS N2O',
-                'EMISSIONS CO2 Amount captured',
-            ],
-            "stop_cats": [".", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['1. B. 2. a. Oil(6)', ['1.B.2.a'], 0],
-            ['1. Exploration', ['1.B.2.a.1'], 1],
-            ['2. Production(7)', ['1.B.2.a.2'], 1],
-            ['3. Transport', ['1.B.2.a.3'], 1],
-            ['4. Refining/storage', ['1.B.2.a.4'], 1],
-            ['5. Distribution of oil products', ['1.B.2.a.5'], 1],
-            ['6. Other', ['1.B.2.a.6'], 1],
-            ['1. B. 2. b. Natural gas', ['1.B.2.b'], 0],
-            ['1. Exploration', ['1.B.2.b.1'], 1],
-            ['2. Production(7)', ['1.B.2.b.2'], 1],
-            ['3. Processing', ['1.B.2.b.3'], 1],
-            ['4. Transmission and storage', ['1.B.2.b.4'], 1],
-            ['5. Distribution', ['1.B.2.b.5'], 1],
-            ['6. Other', ['1.B.2.b.6'], 1],
-            ['1. B. 2. c. Venting and flaring', ['1.B.2.c'], 0],
-            ['Venting', ['1.B.2.c-ven'], 1],
-            ['i. Oil', ['1.B.2.c-ven.i'], 2],
-            ['ii. Gas', ['1.B.2.c-ven.ii'], 2],
-            ['iii. Combined', ['1.B.2.c-ven.iii'], 2],
-            ['Flaring(8)', ['1.B.2.c-fla'], 1],
-            ['i. Oil', ['1.B.2.c-fla.i'], 2],
-            ['ii. Gas', ['1.B.2.c-fla.ii'], 2],
-            ['iii. Combined', ['1.B.2.c-fla.iii'], 2],
-            ['1.B.2.d. Other (please specify)(9)', ['1.B.2.d'], 0],
-            ['Groundwater extraction and CO2 mining', ['1.B.2.d.i'], 1],  # HUN
-            ['Geothermal', ['1.B.2.d.ii'], 1],  # NOR, DEU, PRT, NZL
-            ['Geothermal Energy', ['1.B.2.d.ii'], 1],  # ISL
-            ['Geothermal Generation', ['1.B.2.d.ii'], 1],  # JPN
-            ['Geotherm', ['1.B.2.d.ii'], 1],  # ITA
-            ['City Gas Production', ['1.B.2.d.iii'], 1],  # PRT
-            ['Other', ['1.B.2.d.iv'], 1],  # UKR, ROU
-            ['Other non-specified', ['1.B.2.d.iv'], 1],  # SWE
-            ['Flaring in refineries', ['1.B.2.d.v'], 1],  # ITA
-            ['LPG transport', ['1.B.2.d.vi'], 1],  # GRC
-            ['Distribution of town gas', ['1.B.2.d.vii'], 1],  # FIN
-            ['Petrol distribution', ['1.B.2.d.viii'], 1],  # IRL
-            ['Natural Gas Transport', ['1.B.2.d.ix'], 1],  # BLR
-            ['Natural gas exploration - N2O emissions', ['1.B.2.d.x'], 1],  # GBR, GBK
-            ['flue gas desulfurisation', ['1.B.2.d.xi'], 1],  # GBR, GBK, new in 2022
-        ],
-        "entity_mapping": {
-            'EMISSIONS CH4 (4) Amount captured': 'CH4',
-            'EMISSIONS CO2 Emissions(3)': 'CO2',
-            'EMISSIONS N2O Amount captured': 'N2O',
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    "Table1.C": {
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 24,
-            "header": ['group', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [
-                'ACTIVITY DATA CO2 transported or injected(1)',
-                'IMPLIED EMISSION FACTORS CO2',
-            ],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['1. Transport of CO2', ['1.C.1']],
-            ['a. Pipelines', ['1.C.1.a']],
-            ['b. Ships', ['1.C.1.b']],
-            ['c. Other', ['1.C.1.c']],
-            ['2. Injection and storage(3)', ['1.C.2']],
-            ['a. Injection', ['1.C.2.a']],
-            ['b. Storage', ['1.C.2.b']],
-            ['3. Other', ['1.C.3']],
-            ['Information item(4, 5)', ['\IGNORE']],
-            ['Total amount captured for storage', ['M.Info.A.TACS']],
-            ['Total amount of imports for storage', ['M.Info.A.TAIS']],
-            ['Total A', ['M.Info.A']],
-            ['Total amount of exports for storage', ['M.Info.B.TAES']],
-            ['Total amount of CO2 injected at storage sites', ['M.Info.B.TAI']],
-            ['Total leakage from transport, injection and storage', ['M.Info.B.TLTIS']],
-            ['Total B', ['M.Info.B']],
-            ['Difference (A-B)(6)', ['\IGNORE']],
-        ],
-        "entity_mapping": {
-            'EMISSIONS CO2(2)': 'CO2',
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    "Table1.D": {
-        "status": "TODO",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 20,
-            "header": ['group', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category", "class"],
-            "cols_to_ignore": [],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-
-        ],
-        "entity_mapping": [],
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # TODO
-    "Table2(I)s1": {
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 31,
-            "header": ['entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["industry"],
-        },
-        "sector_mapping": [
-            ['Total industrial processes', ['2']],
-            ['A. Mineral industry', ['2.A']],
-            ['1. Cement production', ['2.A.1']],
-            ['2. Lime production', ['2.A.2']],
-            ['3. Glass production', ['2.A.3']],
-            ['4. Other process uses of carbonates', ['2.A.4']],
-            ['B. Chemical industry', ['2.B']],
-            ['1. Ammonia production', ['2.B.1']],
-            ['2. Nitric acid production', ['2.B.2']],
-            ['3. Adipic acid production', ['2.B.3']],
-            ['4. Caprolactam, glyoxal and glyoxylic acid production', ['2.B.4']],
-            ['5. Carbide production', ['2.B.5']],
-            ['6. Titanium dioxide production', ['2.B.6']],
-            ['7. Soda ash production', ['2.B.7']],
-            ['8. Petrochemical and carbon black production', ['2.B.8']],
-            ['9. Fluorochemical production', ['2.B.9']],
-            ['10. Other (as specified in table 2(I).A-H)', ['2.B.10']],
-            ['C. Metal industry', ['2.C']],
-            ['1. Iron and steel production', ['2.C.1']],
-            ['2. Ferroalloys production', ['2.C.2']],
-            ['3. Aluminium production', ['2.C.3']],
-            ['4. Magnesium production', ['2.C.4']],
-            ['5. Lead production', ['2.C.5']],
-            ['6. Zinc production', ['2.C.6']],
-            ['7. Other (as specified in table 2(I).A-H)', ['2.C.7']],
-        ],
-        "entity_mapping": {
-            'HFCs(1)': f'HFCS ({gwp_to_use})',
-            'PFCs(1)': f'PFCS ({gwp_to_use})',
-            'Unspecified mix of HFCs and PFCs(1)': f'UnspMixOfHFCsPFCs ({gwp_to_use})',
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    "Table2(I)s2": {
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 29,
-            "header": ['entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["industry"],
-        },
-        "sector_mapping": [
-            ['D. Non-energy products from fuels and solvent use', ['2.D']],
-            ['1. Lubricant use', ['2.D.1']],
-            ['2. Paraffin wax use', ['2.D.2']],
-            ['3. Other', ['2.D.3']],
-            ['E. Electronics industry', ['2.E']],
-            ['1. Integrated circuit or semiconductor', ['2.E.1']],
-            ['2. TFT flat panel display', ['2.E.2']],
-            ['3. Photovoltaics', ['2.E.3']],
-            ['4. Heat transfer fluid', ['2.E.4']],
-            ['5. Other (as specified in table 2(II))', ['2.E.5']],
-            ['F. Product uses as substitutes for ODS(2)', ['2.F']],
-            ['1. Refrigeration and air conditioning', ['2.F.1']],
-            ['2. Foam blowing agents', ['2.F.2']],
-            ['3. Fire protection', ['2.F.3']],
-            ['4. Aerosols', ['2.F.4']],
-            ['5. Solvents', ['2.F.5']],
-            ['6. Other applications', ['2.F.6']],
-            ['G. Other product manufacture and use', ['2.G']],
-            ['1. Electrical equipment', ['2.G.1']],
-            ['2. SF6 and PFCs from other product use', ['2.G.2']],
-            ['3. N2O from product uses', ['2.G.3']],
-            ['4. Other', ['2.G.4']],
-            ['H. Other (as specified in tables 2(I).A-H and 2(II))(3)', ['2.H']],
-        ],
-        "entity_mapping": {
-            'HFCs(1)': f'HFCS ({gwp_to_use})',
-            'PFCs(1)': f'PFCS ({gwp_to_use})',
-            'Unspecified mix of HFCs and PFCs(1)': f'UnspMixOfHFCsPFCs ({gwp_to_use})',
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    "Table2(I).A-Hs1": {
-        "status": "TODO",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 40,
-            "header": ['group', 'entity', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-
-        ],
-        "entity_mapping": [],
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # TODO
-    "Table2(I).A-Hs2": {
-        "status": "TODO",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 36,
-            "header": ['group', 'entity', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-
-        ],
-        "entity_mapping": [],
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # TODO
-    "Table2(II)": {
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 38,
-            "header": ['entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [],
-            "stop_cats": [".", np.nan],
-            "unit_info": unit_info["fgases"],
-        },
-        "sector_mapping": [
-            ['Total actual emissions of halocarbons (by chemical) and SF6', ['2']],
-            ['B. Chemical industry', ['2.B']],
-            ['9. Flurochemical production', ['2.B.9']],
-            ['By-product emissions', ['2.B.9.a']],
-            ['Fugitive emissions', ['2.B.9.b']],
-            ['10. Other', ['2.B.10']],
-            ['C. Metal industry', ['2.C']],
-            ['3. Aluminium production', ['2.C.3']],
-            ['4. Magnesium production', ['2.C.4']],
-            ['7. Other', ['2.C.7']],
-            ['E. Electronics industry', ['2.E']],
-            ['1. Integrated circuit or semiconductor', ['2.E.1']],
-            ['2. TFT flat panel display', ['2.E.2']],
-            ['3. Photovoltaics', ['2.E.3']],
-            ['4. Heat transfer fluid', ['2.E.4']],
-            ['5. Other (as specified in table 2(II))', ['2.E.5']],
-            ['F. Product uses as substitutes for ODS(2)', ['2.F']],
-            ['1. Refrigeration and air conditioning', ['2.F.1']],
-            ['2. Foam blowing agents', ['2.F.2']],
-            ['3. Fire protection', ['2.F.3']],
-            ['4. Aerosols', ['2.F.4']],
-            ['5. Solvents', ['2.F.5']],
-            ['6. Other applications', ['2.F.6']],
-            ['G. Other product manufacture and use', ['2.G']],
-            ['1. Electrical equipment', ['2.G.1']],
-            ['2. SF6 and PFCs from other product use', ['2.G.2']],
-            ['4. Other', ['2.G.4']],
-            ['H. Other (please specify)', ['2.H']],
-            ['2.H.1 Pulp and paper', ['2.H.1']],
-            ['2.H.2 Food and beverages industry', ['2.H.2']],
-            ['2.H.3 Other (please specify)', ['2.H.3']],
-        ],
-        "entity_mapping": {
-            'C 3F8': 'C3F8',
-            #'C10F18' 'C2F6' 'C4F10' 'C5F12' 'C6F14' 'CF4'
-            'HFC-125': 'HFC125',
-            'HFC-134': 'HFC134',
-            'HFC-134a': 'HFC134a',
-            'HFC-143': 'HFC143',
-            'HFC-143a': 'HFC143a',
-            'HFC-152': 'HFC152',
-            'HFC-152a': 'HFC152a',
-            'HFC-161': 'HFC161',
-            'HFC-227ea': 'HFC227ea',
-            'HFC-23': 'HFC23',
-            'HFC-236cb': 'HFC236cb',
-            'HFC-236ea': 'HFC236ea',
-            'HFC-236fa': 'HFC236fa',
-            'HFC-245ca': 'HFC245ca',
-            'HFC-245fa': 'HFC245fa',
-            'HFC-32': 'HFC32',
-            'HFC-365mfc': 'HFC365mfc',
-            'HFC-41': 'HFC41',
-            'HFC-43-10mee': 'HFC4310mee',
-            'Unspecified mix of HFCs (1)': f'UnspMixOfHFCs ({gwp_to_use})',
-            'Unspecified mix of HFCs and PFCs(1)': f'UnspMixOfHFCsPFCs ({gwp_to_use})',
-            'Unspecified mix of PFCs (1)': f'UnspMixOfPFCs ({gwp_to_use})',
-            'c-C3F6': 'cC3F6',
-            'c-C4F8': 'cC4F8',
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    "Table3s1": {  # Agriculture summary sheet 1
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 75,
-            "header": ['entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND",
-            "categories": ["category"],
-            "cols_to_ignore": [],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['3. Total agriculture', ['3'], 0],
-            # I. Livestock
-            ['I. Livestock', ['M.3.LV'], 1],
-            # A. Enteric fermentation
-            ['A. Enteric fermentation', ['3.A'], 2],
-            ['1. Cattle(1)', ['3.A.1'], 3],
-            ['Option A:', ['\IGNORE'], 4],
-            ['Dairy cattle', ['3.A.1.Aa'], 5],
-            ['Non-dairy cattle', ['3.A.1.Ab'], 5],
-            ['Option B:', ['\IGNORE'], 4],
-            ['Mature dairy cattle', ['3.A.1.Ba'], 5],
-            ['Other mature cattle', ['3.A.1.Bb'], 5],
-            ['Growing cattle', ['3.A.1.Bc'], 5],
-            ['Option C (country-specific):', ['\IGNORE'], 4],
-            # all countries not specified explicitly
-            ['\C!-AUS-MLT-LUX-POL-SVN-USA\ Other (as specified in table 3(I).A)', ['3.A.1.C'], 5],
-            # Australia
-            ['\C-AUS\ Other (as specified in table 3(I).A)', ['3.A.1.C'], 5],
-            ['\C-AUS\ Dairy Cattle', ['3.A.1.C-AUS-a'], 6],
-            ['\C-AUS\ Beef Cattle - Pasture', ['3.A.1.C-AUS-b'], 6],
-            ['\C-AUS\ Beef Cattle - Feedlot', ['3.A.1.C-AUS-c'], 6],
-            # Malta
-            ['\C-MLT\ Other (as specified in table 3(I).A)', ['3.A.1.C'], 5],
-            ['\C-MLT\ dairy cows', ['3.A.1.C-MLT-a'], 6],
-            ['\C-MLT\ non-lactating cows', ['3.A.1.C-MLT-b'], 6],
-            ['\C-MLT\ bulls', ['3.A.1.C-MLT-c'], 6],
-            ['\C-MLT\ calves', ['3.A.1.C-MLT-d'], 6],
-            ['\C-MLT\ growing cattle 1-2 years', ['3.A.1.C-MLT-e'], 6],
-            # Luxembourg
-            ['\C-LUX\ Other (as specified in table 3(I).A)', ['3.A.1.C'], 5],
-            ['\C-LUX\ Bulls', ['3.A.1.C-LUX-a'], 6],
-            ['\C-LUX\ Calves', ['3.A.1.C-LUX-b'], 6],
-            ['\C-LUX\ Young Cattle', ['3.A.1.C-LUX-c'], 6],
-            ['\C-LUX\ Suckler Cows', ['3.A.1.C-LUX-d'], 6],
-            ['\C-LUX\ Bulls under 2 years', ['3.A.1.C-LUX-e'], 6],
-            ['\C-LUX\ Dairy Cows', ['3.A.1.C-LUX-f'], 6],
-            # Poland
-            ['\C-POL\ Other (as specified in table 3(I).A)', ['3.A.1.C'], 5],
-            ['\C-POL\ Bulls (older than 2 years)', ['3.A.1.C-POL-a'], 6],
-            ['\C-POL\ Non-dairy Heifers (older than 2 years)', ['3.A.1.C-POL-b'], 6],
-            ['\C-POL\ Non-dairy Young Cattle (younger than 1 year)', ['3.A.1.C-POL-c'], 6],
-            ['\C-POL\ Dairy Cattle', ['3.A.1.C-POL-d'], 6],
-            ['\C-POL\ Non-dairy Young Cattle (1-2 years)', ['3.A.1.C-POL-e'], 6],
-            # Slovenia
-            ['\C-SVN\ Other (as specified in table 3(I).A)', ['3.A.1.C'], 5],
-            ['\C-SVN\ Dairy cows', ['3.A.1.C-SVN-a'], 6],
-            ['\C-SVN\ Non-dairy cattle', ['3.A.1.C-SVN-b'], 6],
-            ['\C-SVN\ Other cows', ['3.A.1.C-SVN-c'], 6],
-            # USA
-            ['\C-USA\ Other (as specified in table 3(I).A)', ['3.A.1.C'], 5],
-            ['\C-USA\ Steer Stocker', ['3.A.1.C-USA-a'], 6],
-            ['\C-USA\ Heifer Stocker', ['3.A.1.C-USA-b'], 6],
-            ['\C-USA\ Beef Cows', ['3.A.1.C-USA-c'], 6],
-            ['\C-USA\ Dairy Replacements', ['3.A.1.C-USA-d'], 6],
-            ['\C-USA\ Beef Replacements', ['3.A.1.C-USA-e'], 6],
-            ['\C-USA\ Steer Feedlot', ['3.A.1.C-USA-f'], 6],
-            ['\C-USA\ Heifer Feedlot', ['3.A.1.C-USA-g'], 6],
-            ['\C-USA\ Bulls', ['3.A.1.C-USA-h'], 6],
-            ['\C-USA\ Dairy Cows', ['3.A.1.C-USA-i'], 6],
-            ['\C-USA\ Beef Calves', ['3.A.1.C-USA-j'], 6],
-            ['\C-USA\ Dairy Calves', ['3.A.1.C-USA-k'], 6],
-            # Other livestock
-            ['2. Sheep', ['3.A.2'], 3],
-            ['3. Swine', ['3.A.3'], 3],
-            ['4. Other livestock', ['3.A.4'], 3],
-            ['Buffalo', ['3.A.4.a'], 4],
-            ['Camels', ['3.A.4.b'], 4],
-            ['Deer', ['3.A.4.c'], 4],
-            ['Goats', ['3.A.4.d'], 4],
-            ['Horses', ['3.A.4.e'], 4],
-            ['Mules and Asses', ['3.A.4.f'], 4],
-            ['Poultry', ['3.A.4.g'], 4],
-            ['Other (please specify)', ['3.A.4.h'], 4],
-            ['Rabbit', ['3.A.4.h.i'], 5],
-            ['Reindeer', ['3.A.4.h.ii'], 5],
-            ['Ostrich', ['3.A.4.h.iii'], 5],
-            ['Fur-bearing Animals', ['3.A.4.h.iv'], 5],
-            ['Other', ['3.A.4.h.v'], 5],
-            # Manure Management
-            ['B. Manure management', ['3.B'], 2],
-            ['1. Cattle(1)', ['3.B.1'], 3],
-            ['Option A:', ['\IGNORE'], 4],
-            ['Dairy cattle', ['3.B.1.Aa'], 5],
-            ['Non-dairy cattle', ['3.B.1.Ab'], 5],
-            ['Option B:', ['\IGNORE'], 4],
-            ['Mature dairy cattle', ['3.B.1.Ba'], 5],
-            ['Other mature cattle', ['3.B.1.Bb'], 5],
-            ['Growing cattle', ['3.B.1.Bc'], 5],
-            ['Option C (country-specific):', ['\IGNORE'], 4],
-            # all countries not specified explicitly
-            ['\C!-AUS-MLT-LUX-POL-SVN-USA\ Other (as specified in table 3(I).B)', ['3.B.1.C'], 5],
-            # Australia
-            ['\C-AUS\ Other (as specified in table 3(I).B)', ['3.B.1.C'], 5],
-            ['\C-AUS\ Dairy Cattle', ['3.B.1.C-AUS-a'], 6],
-            ['\C-AUS\ Beef Cattle - Pasture', ['3.B.1.C-AUS-b'], 6],
-            ['\C-AUS\ Beef Cattle - Feedlot', ['3.B.1.C-AUS-c'], 6],
-            # Malta
-            ['\C-MLT\ Other (as specified in table 3(I).B)', ['3.B.1.C'], 5],
-            ['\C-MLT\ dairy cows', ['3.B.1.C-MLT-a'], 6],
-            ['\C-MLT\ non-lactating cows', ['3.B.1.C-MLT-b'], 6],
-            ['\C-MLT\ bulls', ['3.B.1.C-MLT-c'], 6],
-            ['\C-MLT\ calves', ['3.B.1.C-MLT-d'], 6],
-            ['\C-MLT\ growing cattle 1-2 years', ['3.B.1.C-MLT-e'], 6],
-            # Luxembourg
-            ['\C-LUX\ Other (as specified in table 3(I).B)', ['3.B.1.C'], 5],
-            ['\C-LUX\ Bulls', ['3.B.1.C-LUX-a'], 6],
-            ['\C-LUX\ Calves', ['3.B.1.C-LUX-b'], 6],
-            ['\C-LUX\ Young Cattle', ['3.B.1.C-LUX-c'], 6],
-            ['\C-LUX\ Suckler Cows', ['3.B.1.C-LUX-d'], 6],
-            ['\C-LUX\ Bulls under 2 years', ['3.B.1.C-LUX-e'], 6],
-            ['\C-LUX\ Dairy Cows', ['3.B.1.C-LUX-f'], 6],
-            # Poland
-            ['\C-POL\ Other (as specified in table 3(I).B)', ['3.B.1.C'], 5],
-            ['\C-POL\ Non-dairy Cattle', ['3.B.1.C-POL-a'], 6],
-            ['\C-POL\ Dairy Cattle', ['3.B.1.C-POL-b'], 6],
-            # Slovenia
-            ['\C-SVN\ Other (as specified in table 3(I).B)', ['3.B.1.C'], 5],
-            ['\C-SVN\ Dairy cows', ['3.B.1.C-SVN-a'], 6],
-            ['\C-SVN\ Non-dairy cattle', ['3.B.1.C-SVN-b'], 6],
-            ['\C-SVN\ Other cows', ['3.B.1.C-SVN-c'], 6],
-            # USA
-            ['\C-USA\ Other (as specified in table 3(I).B)', ['3.B.1.C'], 5],
-            ['\C-USA\ Dairy Cattle', ['\IGNORE'], 6],
-            ['\C-USA\ Non-Dairy Cattle', ['\IGNORE'], 6],
-            ['\C-USA\ Steer Stocker', ['3.B.1.C-USA-a'], 6],
-            ['\C-USA\ Heifer Stocker', ['3.B.1.C-USA-b'], 6],
-            ['\C-USA\ Beef Cows', ['3.B.1.C-USA-c'], 6],
-            ['\C-USA\ Dairy Replacements', ['3.B.1.C-USA-d'], 6],
-            ['\C-USA\ Beef Replacements', ['3.B.1.C-USA-e'], 6],
-            ['\C-USA\ Steer Feedlot', ['3.B.1.C-USA-f'], 6],
-            ['\C-USA\ Heifer Feedlot', ['3.B.1.C-USA-g'], 6],
-            ['\C-USA\ Bulls', ['3.B.1.C-USA-h'], 6],
-            ['\C-USA\ Dairy Cows', ['3.B.1.C-USA-i'], 6],
-            ['\C-USA\ Beef Calves', ['3.B.1.C-USA-j'], 6],
-            ['\C-USA\ Dairy Calves', ['3.B.1.C-USA-k'], 6],
-            # other animals
-            ['2. Sheep', ['3.B.2'], 3],
-            ['3. Swine', ['3.B.3'], 3],
-            ['4. Other livestock', ['3.B.4'], 3],
-            ['Buffalo', ['3.B.4.a'], 4],
-            ['Camels', ['3.B.4.b'], 4],
-            ['Deer', ['3.B.4.c'], 4],
-            ['Goats', ['3.B.4.d'], 4],
-            ['Horses', ['3.B.4.e'], 4],
-            ['Mules and Asses', ['3.B.4.f'], 4],
-            ['Poultry', ['3.B.4.g'], 4],
-            ['Other (please specify)', ['3.B.4.h'], 4],
-            ['Rabbit', ['3.B.4.h.i'], 5],
-            ['Reindeer', ['3.B.4.h.ii'], 5],
-            ['Ostrich', ['3.B.4.h.iii'], 5],
-            ['Fur-bearing Animals', ['3.B.4.h.iv'], 5],
-            ['Other', ['3.B.4.h.v'], 5],
-            ['5. Indirect N2O emissions', ['3.B.5'], 3],
-        ],
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    "Table3s2": {  # Agriculture summary sheet 2
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 18,
-            "header": ['entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND",
-            "categories": ["category"],
-            "cols_to_ignore": [],
-            "stop_cats": [".", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['C. Rice cultivation', ['3.C']],
-            ['D. Agricultural soils(2) (3) (4)', ['3.D']],
-            ['E. Prescribed burning of savannahs', ['3.E']],
-            ['E. Prescribed burning of savannas', ['3.E']],
-            ['F. Field burning of agricultural residues', ['3.F']],
-            ['G. Liming', ['3.G']],
-            ['H. Urea application', ['3.H']],
-            ['I. Other carbon-containing fertilizers', ['3.I']],
-            ['J. Other (please specify)', ['3.J']],
-            ['NOx from Manure Management', ['3.J.1']],
-            ['3.B NOx Emissions', ['3.J.1']],
-            ['NOx from 3B', ['3.J.1']],
-            ['NOX emissions from manure management', ['3.J.1']],
-            ['NOx from manure management', ['3.J.1']],
-            ['Other', ['3.J.2']],
-            ['Other UK emissions', ['3.J.2']],
-            ['Other non-specified', ['3.J.2']],
-            ['OTs and CDs - Livestock', ['3.J.3']],
-            ['OTs and CDs - soils', ['3.J.4']],
-            ['OTs and CDs - other', ['3.J.5']],
-            ['Digestate renewable raw material (storage of N)', ['3.J.6']],
-            ['Digestate renewable raw material (atmospheric deposition)', ['3.J.7']],
-            ['Digestate renewable raw material (storage of dry matter)', ['3.J.8']],
-            ['NOx from Livestock', ['3.J.9']],
-        ],
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    "Table3.C": {  # rice cultivation details
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 21,
-            "header": ['group', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [
-                'ACTIVITY DATA AND OTHER RELATED INFORMATION Harvested area(2)',
-                'ACTIVITY DATA AND OTHER RELATED INFORMATION Organic amendments added(3)',
-                'IMPLIED EMISSION FACTOR (1) CH4',
-            ],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['1. Irrigated', ['3.C.1']],
-            ['Continuously flooded', ['3.C.1.a']],
-            ['Intermittently flooded Single aeration', ['3.C.1.b.i']],
-            ['Intermittently flooded Multiple aeration', ['3.C.1.b.ii']],
-            ['2. Rainfed', ['3.C.2']],
-            ['Flood prone', ['3.C.2.a']],
-            ['Drought prone', ['3.C.2.b']],
-            ['3. Deep water', ['3.C.3']],
-            ['Water depth 50–100 cm', ['3.C.3.a']],
-            ['Water depth > 100 cm', ['3.C.3.b']],
-            ['4. Other (please specify)', ['3.C.4']],
-            ['Non-specified', ['3.C.4.a']],  # EST
-            ['Other', ['3.C.4.a']],  # DEU
-            ['other', ['3.C.4.a']],  # LVA
-            ['Other cultivation', ['3.C.4.a']],  # CZE
-            ['Upland rice(4)', ['\IGNORE']],
-            ['Total(4)', ['\IGNORE']],
-        ],
-        "entity_mapping": {
-            'EMISSIONS CH4': 'CH4',
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    "Table3.D": {  # direct and indirect N2O from soils
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 21,
-            "header": ['group', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [
-                "ACTIVITY DATA AND OTHER RELATED INFORMATION Description",
-                "ACTIVITY DATA AND OTHER RELATED INFORMATION Value",
-                "IMPLIED EMISSION FACTORS Value",
-            ],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['a. Direct N2O emissions from managed soils', ['3.D.a']],
-            ['1. Inorganic N fertilizers(3)', ['3.D.a.1']],
-            ['2. Organic N fertilizers(3)', ['3.D.a.2']],
-            ['a. Animal manure applied to soils', ['3.D.a.2.a']],
-            ['b. Sewage sludge applied to soils', ['3.D.a.2.b']],
-            ['c. Other organic fertilizers applied to soils', ['3.D.a.2.c']],
-            ['3. Urine and dung deposited by grazing animals', ['3.D.a.3']],
-            ['4. Crop residues', ['3.D.a.4']],
-            ['5. Mineralization/immobilization associated with loss/gain of soil organic matter (4)(5)', ['3.D.a.5']],
-            ['6. Cultivation of organic soils (i.e. histosols)(2)', ['3.D.a.6']],
-            ['7. Other', ['3.D.a.7']],
-            ['b. Indirect N2O Emissions from managed soils', ['3.D.b']],
-            ['1. Atmospheric deposition(6)', ['3.D.b.1']],
-            ['2. Nitrogen leaching and run-off', ['3.D.b.2']],
-        ],
-        "entity_mapping": {
-            'EMISSIONS N2O': 'N2O',
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    "Table3.E": {  # savanna burning details
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 14,
-            "header": ['group', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [
-                'ACTIVITY DATA AND OTHER RELATED INFORMATION Area of savanna burned',
-                'ACTIVITY DATA AND OTHER RELATED INFORMATION Average above-ground biomass density',
-                'ACTIVITY DATA AND OTHER RELATED INFORMATION Biomass burned',
-                'ACTIVITY DATA AND OTHER RELATED INFORMATION Fraction of savanna burned',
-                'ACTIVITY DATA AND OTHER RELATED INFORMATION Nitrogen fraction in biomass',
-                'IMPLIED EMISSION FACTORS CH4',
-                'IMPLIED EMISSION FACTORS N2O',
-            ],
-            "stop_cats": ["", ".", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['Forest land (specify ecological zone)(1)', ['3.E.1'], 0],
-            ['Savanna Grassland', ['3.E.1.b'], 1],  # AUS
-            ['Savanna Woodland', ['3.E.1.a'], 1],  # AUS
-            ['Forest land', ['3.E.1.a'], 1],  # SWE, CHE, CZE, HRV
-            ['Luxembourg', ['3.E.1.c'], 1],  # LUX
-            ['Other non-specified', ['3.E.1.d'], 1],  # EST
-            ['All', ['3.E.1.d'], 1],  # DNK, DNM, DKE
-            ['Unspecified', ['3.E.1.d'], 1],  # DEU
-            ['forest land', ['3.E.1.a'], 1],  # MLT
-            ['Zone', ['3.E.1.d'], 1],  # LVA
-            ['Grassland (specify ecological zone)(1)', ['3.E.2'], 0],
-            ['Savanna Woodland', ['3.E.2.a'], 1],  # AUS
-            ['Savanna Grassland', ['3.E.2.b'], 1],  # AUS
-            ['Temperate Grassland', ['3.E.2.c'], 1],  # AUS
-            ['Grassland', ['3.E.2.d'], 1],  # SWE, CHE, CZE, HRV
-            ['Luxembourg', ['3.E.2.e'], 1],  # LUX
-            ['Other non-specified', ['3.E.2.f'], 1],  # EST
-            ['All', ['3.E.2.f'], 1],  # DNK, DNM, DKE
-            ['Unspecified', ['3.E.2.f'], 1],  # DEU
-            ['Tussock', ['3.E.2.g'], 1],  # NZL
-            ['grassland', ['3.E.2.d'], 1],  # MLT
-            ['Zone_', ['3.E.2.f'], 1],  # LVA
-        ],
-        "entity_mapping": {
-            'EMISSIONS (2) CH4': 'CH4',
-            'EMISSIONS (2) N2O': 'N2O',
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    "Table3.F": {  # field burning details
-        "status": "TODO",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 30,
-            "header": ['group', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-
-        ],
-        "entity_mapping": {},
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # TODO
-    "Table3.G-I": {  # liming, urea, carbon containing fertilizer
-        "status": "TODO",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 13,
-            "header": ['group', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-
-        ],
-        "entity_mapping": {},
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # TODO
-    "Table4": {  # LULUCF overview
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 29,
-            "header": ['entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [],
-            "stop_cats": ["", ".", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['4. Total LULUCF', ['4']],
-            ['A. Forest land', ['4.A']],
-            ['1. Forest land remaining forest land', ['4.A.1']],
-            ['2. Land converted to forest land', ['4.A.2']],
-            ['B. Cropland', ['4.B']],
-            ['1. Cropland remaining cropland', ['4.B.1']],
-            ['2. Land converted to cropland', ['4.B.2']],
-            ['C. Grassland', ['4.C']],
-            ['1. Grassland remaining grassland', ['4.C.1']],
-            ['2. Land converted to grassland', ['4.C.2']],
-            ['D. Wetlands(3)', ['4.D']],
-            ['1. Wetlands remaining wetlands', ['4.D.1']],
-            ['2. Land converted to wetlands', ['4.D.2']],
-            ['E. Settlements', ['4.E']],
-            ['1. Settlements remaining settlements', ['4.E.1']],
-            ['2. Land converted to settlements', ['4.E.2']],
-            ['F. Other land (4)', ['4.F']],
-            ['1. Other land remaining other land', ['4.F.1']],
-            ['2. Land converted to other land', ['4.F.2']],
-            ['G. Harvested wood products (5)', ['4.G']],
-            ['H. Other (please specify)', ['4.H']],
-            ['Land converted to Settlement', ['4.H.1']],
-            ['Reservoir of Petit-Saut in French Guiana', ['4.H.5']],
-            ['Biogenic NMVOCs from managed forest', ['4.H.4']],
-            ['All other', ['4.H.9']],
-            ['Luxembourg', ['4.H.8']],
-            ['Settlements Remaining Settlements', ['4.H.2']],
-            ['4.E Settlements', ['4.H.2']],
-            ['4.C Grassland', ['4.H.3']],
-            ['Settlements', ['4.H.2']],
-            ['Other', ['4.H.9']],
-            ['N2O Emissions from Aquaculture Use', ['4.H.6']],
-            ['CH4 from artificial water bodies', ['4.H.7']],
-        ],
-        "entity_mapping": {
-            'CH4(2)': 'CH4',
-            'N2O(2)': 'N2O',
-            'Net CO2 emissions/removals(1), (2)': 'CO2',
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    # TODO: all other LULUCF tables
-    "Table5": {  # Waste overview
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 27,
-            "header": ['entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['Total waste', ['5']],
-            ['A. Solid waste disposal', ['5.A']],
-            ['1. Managed waste disposal sites', ['5.A.1']],
-            ['2. Unmanaged waste disposal sites', ['5.A.2']],
-            ['3. Uncategorized waste disposal sites', ['5.A.3']],
-            ['B. Biological treatment of solid waste', ['5.B']],
-            ['1. Composting', ['5.B.1']],
-            ['2. Anaerobic digestion at biogas facilities', ['5.B.2']],
-            ['C. Incineration and open burning of waste', ['5.C']],
-            ['1. Waste incineration', ['5.C.1']],
-            ['2. Open burning of waste', ['5.C.2']],
-            ['D. Wastewater treatment and discharge', ['5.D']],
-            ['1. Domestic wastewater', ['5.D.1']],
-            ['2. Industrial wastewater', ['5.D.2']],
-            ['3. Other (as specified in table 5.D)', ['5.D.3']],
-            ['E. Other (please specify)', ['5.E']],
-            ['Other', ['5.E.5']],  # EST, NOR
-            ['Recycling activities', ['5.E.1']],  # NLD
-            ['Mechanical-Biological Treatment MBT', ['5.E.2']],  # DEU
-            ['Accidental fires', ['5.E.3']],  # DEU, DKE, DNK, DNM
-            ['Decomposition of Petroleum-Derived Surfactants', ['5.E.4']],  # JPN
-            ['Decomposition of Fossil-fuel Derived Surfactants', ['5.E.4']],
-            # JPN since 2023
-            ['Other non-specified', ['5.E.5']],  # USA
-            ['Biogas burning without energy recovery', ['5.E.6']],  # PRT
-            ['Sludge spreading', ['5.E.7']],  # ESP
-            ['Accidental combustion', ['5.E.3']],  # ESP
-            ['Other waste', ['5.E.5']],  # CZE
-            ['5.E.1 Industrial Wastewater', ['5.E.8']],  # CAN, new in 2022
-            ['Accidental Fires at SWDS', ['5.E.9']],  # AUS, new in 2022
-            ['Memo item:(2)', ['\IGNORE']],
-            ['Long-term storage of C in waste disposal sites', ['M.Memo.LTSW']],
-            ['Annual change in total long-term C storage', ['M.Memo.ACLT']],
-            ['Annual change in total long-term C storage in HWP waste(3)', ['M.Memo.ACLTHWP']],
-        ],
-        "entity_mapping": {
-            'CO2(1)': 'CO2',
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested; memo items not read because of empty lines
-    "Table5.A": {  # solid waste disposal
-        "status": "tested",
-        "table": {
-            "firstrow": 6,
-            "lastrow": 15,
-            "header": ['group', 'group', 'entity', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [
-                'ACTIVITY DATA AND OTHER RELATED INFORMATION SINK CATEGORIES Annual waste at the SWDS',
-                'ACTIVITY DATA AND OTHER RELATED INFORMATION SINK CATEGORIES MCF',
-                'ACTIVITY DATA AND OTHER RELATED INFORMATION SINK CATEGORIES DOCf',
-                'IMPLIED EMISSION FACTOR SINK CATEGORIES CH4(1)',
-                'IMPLIED EMISSION FACTOR SINK CATEGORIES CO2',
-                'EMISSIONS SINK CATEGORIES CH4 Amount of CH4 flared',
-                'EMISSIONS SINK CATEGORIES CH4 Amount of CH4 for energy recovery(3)',
-            ],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['1. Managed waste disposal sites', ['5.A.1']],
-            ['a. Anaerobic', ['5.A.1.a']],
-            ['b. Semi-aerobic', ['5.A.1.b']],
-            ['2. Unmanaged waste disposal sites', ['5.A.2']],
-            ['3. Uncategorized waste disposal sites', ['5.A.3']],
-        ],
-        "entity_mapping": {
-            'EMISSIONS SINK CATEGORIES CH4 Emissions(2)': 'CH4',
-            'EMISSIONS SINK CATEGORIES CO2(4) Amount of CH4 for energy recovery(3)': 'CO2',
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    "Table5.B": {  # Biological treatment of solid waste
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 16,
-            "header": ['group', 'entity', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [
-                'ACTIVITY DATA AND OTHER RELATED INFORMATION Annual waste amount treated',
-                'IMPLIED EMISSION FACTOR CH4(1)',
-                'IMPLIED EMISSION FACTOR N2O',
-                'EMISSIONS CH4 Amount of CH4 flared',
-                'EMISSIONS CH4 Amount of CH4 for energy recovery(3)',
-            ],
-            "stop_cats": [".", "", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['1. Composting', ['5.B.1'], 0],
-            ['Municipal solid waste', ['5.B.1.a'], 1],
-            ['Other (please specify)(4)', ['5.B.1.b'], 1],
-            ['Organic wastes households', ['5.B.1.b.i'], 2],  # NLD
-            ['Organic wastes from gardens and horticulture', ['5.B.1.b.ii'], 2],  # NLD
-            ['Food and garden waste', ['5.B.1.b.ii'], 2],  # DNM, DNK, DKE
-            ['Industrial Solid Waste', ['5.B.1.b.iii'], 2],  # POL
-            ['Home composting', ['5.B.1.b.iv'], 2],  # NOR
-            ['Mixed waste', ['5.B.1.b.v'], 2],  # LTU
-            ['Other waste', ['5.B.1.b.v'], 2],  # SWE
-            ['Sludge', ['5.B.1.b.vi'], 2],  # HUN, EST
-            ['Textile', ['5.B.1.b.vii'], 2],  # EST
-            ['Wood', ['5.B.1.b.viii'], 2],  # EST
-            ['Organic', ['5.B.1.b.ix'], 2],  # EST
-            ['Paper', ['5.B.1.b.x'], 2],  # EST
-            ['Other_SW', ['5.B.1.b.v'], 2],  # CZE
-            ['MBA treated MSW', ['5.B.1.b.xi'], 2],  # LUX
-            ['Specific Agricultural and Industrial Waste', ['5.B.1.b.xii'], 2],  # UKR
-            ['Industrial solid waste and constr. waste', ['5.B.1.b.xiii'], 2],  # FIN
-            ['Municipal sludge', ['5.B.1.b.xiv'], 2],  # FIN
-            ['Industrial sludge', ['5.B.1.b.xv'], 2],  # FIN
-            ['Open air composting', ['5.B.1.b.xvi'], 2],  # LIE
-            ['Industrial Waste', ['5.B.1.b.xvii'], 2],  # JPN
-            ['Human Waste and Johkasou sludge', ['5.B.1.b.xviii'], 2],  # JPN
-            ['2. Anaerobic digestion at biogas facilities(3)', ['5.B.2'], 0],
-            ['Municipal solid waste', ['5.B.2.a'], 1],
-            ['Other (please specify)(4)', ['5.B.2.b'], 1],
-            ['Organic wastes households', ['5.B.2.b.i'], 2],  # NLD
-            ['Organic wastes from gardens and horticulture', ['5.B.2.b.ii'], 2],  # NLD
-            ['Animal manure and other organic waste', ['5.B.2.b.iii'], 2],  # DNM, DNK, DKE
-            ['sewage sludge', ['5.B.2.b.iv'], 2],  # LTU
-            ['Other waste', ['5.B.2.b.v'], 2],  # SWE
-            ['Agricultural biogas facilities', ['5.B.2.b.vi'], 2],  # CHE
-            ['Other biogases from anaerobic fermentation', ['5.B.2.b.vii'], 2],  # HUN
-            ['Sludge', ['5.B.2.b.iv'], 2],  # EST
-            ['Anaerobic Digestion On-Farm and at Wastewater Treatment Facilities', ['5.B.2.b.viii'], 2],  # USA
-            ['Other_AD', ['5.B.2.b.v'], 2],  # CZE
-            ['Biogenic waste incl. wastes from Agriculture (manure)', ['5.B.2.b.ix'], 2],  # LUX
-            ['Industrial solid waste and constr. waste', ['5.B.2.b.x'], 2],  # FIN
-            ['Municipal sludge', ['5.B.2.b.xi'], 2],  # FIN
-            ['Industrial sludge', ['5.B.2.b.xii'], 2],  # FIN
-            ['Livestock manure co-digested', ['5.B.2.b.xiii'], 2],  # DEU, new in 2022
-            ['Waste water', ['5.B.2.b.xiv'], 2],  # NOR, new in 2022
-        ],
-        "entity_mapping": {
-            'EMISSIONS CH4 Emissions(2)': 'CH4',
-            'EMISSIONS N2O Amount of CH4 for energy recovery(3)': 'N2O',
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    "Table5.C": {  # Waste incineration and open burning
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 38,
-            "header": ['group', 'group', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [
-                'ACTIVITY DATA Amount of wastes (incinerated/open burned)',
-                'IMPLIED EMISSION FACTOR Amount of wastes (incinerated/open burned) CO2',
-                'IMPLIED EMISSION FACTOR Amount of wastes (incinerated/open burned) CH4',
-                'IMPLIED EMISSION FACTOR Amount of wastes (incinerated/open burned) N2O',
-            ],
-            "stop_cats": [".", "", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['1. Waste Incineration', ['5.C.1'], 0],
-            ['Biogenic (1)', ['5.C.1.a'], 1],
-            ['Municipal solid waste', ['5.C.1.a.i'], 2],
-            ['Other (please specify)(2)', ['5.C.1.a.ii'], 2],
-            ['Industrial Solid Wastes', ['5.C.1.a.ii.1'], 3],
-            ['Hazardous Waste', ['5.C.1.a.ii.2'], 3],
-            ['Clinical Waste', ['5.C.1.a.ii.3'], 3],
-            ['Sewage Sludge', ['5.C.1.a.ii.4'], 3],
-            ['Other (please specify)', ['5.C.1.a.ii.5'], 3],
-            ['Animal cremations', ['5.C.1.a.ii.5.a'], 4],  # DKE, DNK, DNM
-            ['Human cremations', ['5.C.1.a.ii.5.b'], 4],  # DKE, DNK, DNM
-            ['Cremation', ['5.C.1.a.ii.5.c'], 4],  # CHE, NOR, FRA, FRK
-            ['cremation', ['5.C.1.a.ii.5.c'], 4],  # DEU
-            ['Industrial waste', ['5.C.1.a.ii.5.d'], 4],  # NOR
-            ['Biogenic other waste', ['5.C.1.a.ii.5.e'], 4],  # EST
-            ['Biogenic waste other than Municipal Solid Waste', ['5.C.1.a.ii.5.e'], 4],  # ROU
-            ['Sludge', ['5.C.1.a.ii.5.f'], 4],  # JPN
-            ['Non-fossile liquid waste', ['5.C.1.a.ii.5.g'], 4],  # JPN
-            ['Non-biogenic', ['5.C.1.b'], 1],
-            ['Municipal solid waste', ['5.C.1.b.i'], 2],
-            ['Other (please specify)(3)', ['5.C.1.b.ii'], 2],
-            ['Industrial Solid Wastes', ['5.C.1.b.ii.1'], 3],
-            ['Hazardous Waste', ['5.C.1.b.ii.2'], 3],
-            ['Clinical Waste', ['5.C.1.b.ii.3'], 3],
-            ['Sewage Sludge', ['5.C.1.b.ii.4'], 3],
-            ['Fossil liquid waste', ['5.C.1.b.ii.5'], 3],
-            ['Other (please specify)', ['5.C.1.b.ii.6'], 3],
-            ['Quarantine and other waste', ['5.C.1.b.ii.6.a'], 4],  # NZL
-            ['Industrial waste', ['5.C.1.b.ii.6.b'], 4],  # CHE
-            ['Chemical waste', ['5.C.1.b.ii.6.c'], 4],  # GBR, GBK
-            ['Flaring in the chemical industry', ['5.C.1.b.ii.6.d'], 4],  # BEL
-            ['Sludge', ['5.C.1.b.ii.6.e'], 4],  # JPN
-            ['Solvents', ['5.C.1.b.ii.6.f'], 4],  # GRC, AUS
-            ['2. Open burning of waste', ['5.C.2'], 0],
-            ['Biogenic (1)', ['5.C.2.a'], 1],
-            ['Municipal solid waste', ['5.C.2.a.i'], 2],
-            ['Other (please specify)', ['5.C.2.a.ii'], 2],
-            ['agricultural waste', ['5.C.2.a.ii.1'], 3],  # ITA
-            ['Agricultural residues', ['5.C.2.a.ii.1'], 3],  # ESP
-            ['Agriculture residues', ['5.C.2.a.ii.1'], 3],  # PRT new in 2023
-            ['Natural residues', ['5.C.2.a.ii.2'], 3],  # CHE
-            ['Wood waste', ['5.C.2.a.ii.3'], 3],  # GBR, GBK
-            ['Bonfires etc.', ['5.C.2.a.ii.4'], 3],  # DEU
-            ['Bonfires', ['5.C.2.a.ii.4'], 3],  # NLD, ISL
-            ['Other', ['5.C.2.a.ii.5'], 3],  # EST
-            ['Other waste', ['5.C.2.a.ii.5'], 3],  # CZE
-            ['Waste', ['5.C.2.a.ii.5'], 3],  # GBR new in 2023
-            ['Industrial Solid Waste', ['5.C.2.a.ii.6'], 3],  # JPN
-            ['Vine', ['5.C.2.a.ii.7'], 3],  # AUT new in 2023
-            ['Non-biogenic', ['5.C.2.b'], 1],
-            ['Municipal solid waste', ['5.C.2.b.i'], 2],
-            ['Other (please specify)', ['5.C.2.b.ii'], 2],
-            ['Rural waste', ['5.C.2.b.ii.1'], 3],  # NZL
-            ['Accidental fires (vehicles)', ['5.C.2.b.ii.2'], 3],  # GBR, GBK
-            ['Accidental fires (buildings)', ['5.C.2.b.ii.3'], 3],  # GBR, GBK
-            ['Bonfires', ['5.C.2.b.ii.4'], 3],  # ISL
-            ['Other', ['5.C.2.b.ii.5'], 3],  # EST
-            ['Other waste', ['5.C.2.b.ii.5'], 3],  # CZE
-            ['Waste', ['5.C.2.b.ii.5'], 3],  # GBR new in 2023
-            ['Industrial Solid Waste', ['5.C.2.b.ii.6'], 3],  # JPN
-        ],
-        "entity_mapping": {
-            'EMISSIONS Amount of wastes (incinerated/open burned) CH4': 'CH4',
-            'EMISSIONS Amount of wastes (incinerated/open burned) CO2': 'CO2',
-            'EMISSIONS Amount of wastes (incinerated/open burned) N2O': 'N2O',
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    "Table5.D": {  # Wastewater treatment and discharge
-        "status": "tested",
-        "table": {
-            "firstrow": 5,
-            "lastrow": 13,
-            "header": ['group', 'entity', 'entity', 'entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [
-                'ACTIVITY DATA AND RELATED INFORMATION Total organic product',
-                'ACTIVITY DATA AND RELATED INFORMATION Sludge removed(1)',
-                'ACTIVITY DATA AND RELATED INFORMATION Sludge removed(1) N in effluent',
-                'IMPLIED EMISSION FACTOR CH4(2) N in effluent',
-                'IMPLIED EMISSION FACTOR N2O(3) N in effluent',
-                'EMISSIONS CH4 Amount of CH4 flared',
-                'EMISSIONS CH4 Amount of CH4 for Energy Recovery(5)',
-            ],
-            "stop_cats": [".", "", np.nan],
-            "unit_info": unit_info["default"],
-        },
-        "sector_mapping": [
-            ['1. Domestic wastewater', ['5.D.1']],
-            ['2. Industrial wastewater', ['5.D.2']],
-            ['3. Other (please specify)', ['5.D.3']],
-            ['Other', ['5.D.3.a']],  # EST
-            ['Septic tanks', ['5.D.3.b']],  # NLD
-            ['Wastewater Effluent', ['5.D.3.c']],  # NLD
-            ['Fish farming', ['5.D.3.d']],  # FIN
-            ['Uncategorized wastewater', ['5.D.3.a']],  # CZE
-        ],
-        "entity_mapping": {
-            'EMISSIONS CH4 Emissions(4)': 'CH4',
-            'EMISSIONS N2O(3) Amount of CH4 for Energy Recovery(5)': 'N2O',
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    "Summary1.As1": {  # Summary 1, sheet 1
-        "status": "tested",
-         "table": {
-            "firstrow": 5,
-            "lastrow": 28,
-            "header": ['entity', 'unit'],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["summary"],
-        },
-        "sector_mapping": [
-            ['Total national emissions and removals', ['0']],
-            ['1. Energy', ['1']],
-            ['A. Fuel combustion Reference approach(2)', ['1.A-ref']],
-            ['Sectoral approach(2)', ['1.A']],
-            ['1. Energy industries', ['1.A.1']],
-            ['2. Manufacturing industries and construction', ['1.A.2']],
-            ['3. Transport', ['1.A.3']],
-            ['4. Other sectors', ['1.A.4']],
-            ['5. Other', ['1.A.5']],
-            ['B. Fugitive emissions from fuels', ['1.B']],
-            ['1. Solid fuels', ['1.B.1']],
-            ['2. Oil and natural gas and other emissions from energy production',
-             ['1.B.2']],
-            ['C. CO2 Transport and storage', ['1.C']],
-            ['2. Industrial processes and product use', ['2']],
-            ['A. Mineral industry', ['2.A']],
-            ['B. Chemical industry', ['2.B']],
-            ['C. Metal industry', ['2.C']],
-            ['D. Non-energy products from fuels and solvent use', ['2.D']],
-            ['E. Electronic industry', ['2.E']],
-            ['F. Product uses as substitutes for ODS', ['2.F']],
-            ['G. Other product manufacture and use', ['2.G']],
-            ['H. Other(3)', ['2.H']],
-        ],
-        "entity_mapping": {
-            'NOX': 'NOx',
-            'Net CO2 emissions/removals': 'CO2',
-            'HFCs(1)': f'HFCS ({gwp_to_use})',
-            'PFCs(1)': f'PFCS ({gwp_to_use})',
-            'Unspecified mix of HFCs and PFCs(1)': f'UnspMixOfHFCsPFCs ({gwp_to_use})',
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    "Summary1.As2": {  # Summary 1, sheet 2
-        "status": "tested",
-         "table": {
-            "firstrow": 5,
-            "lastrow": 34,
-            "header": ['entity', 'entity', 'unit'],
-            "header_fill": [True, False, True],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["summary"],
-        },
-        "sector_mapping": [
-            ['3. Agriculture', ['3']],
-            ['A. Enteric fermentation', ['3.A']],
-            ['B. Manure management', ['3.B']],
-            ['C. Rice cultivation', ['3.C']],
-            ['D. Agricultural soils', ['3.D']],
-            ['E. Prescribed burning of savannas', ['3.E']],
-            ['F. Field burning of agricultural residues', ['3.F']],
-            ['G. Liming', ['3.G']],
-            ['H. Urea application', ['3.H']],
-            ['I. Other carbon-contining fertilizers', ['3.I']],
-            ['J. Other', ['3.J']],
-            ['4. Land use, land-use change and forestry (4)', ['4']],
-            ['A. Forest land (4)', ['4.A']],
-            ['B. Cropland (4)', ['4.B']],
-            ['C. Grassland (4)', ['4.C']],
-            ['D. Wetlands (4)', ['4.D']],
-            ['E. Settlements (4)', ['4.E']],
-            ['F. Other land (4)', ['4.F']],
-            ['G. Harvested wood products', ['4.G']],
-            ['H. Other (4)', ['4.H']],
-            ['5. Waste', ['5']],
-            ['A. Solid waste disposal (5)', ['5.A']],
-            ['B. Biological treatment of solid waste (5)', ['5.B']],
-            ['C. Incineration and open burning of waste (5)', ['5.C']],
-            ['D. Wastewater treatment and discharge', ['5.D']],
-            ['E. Other (5)', ['5.E']],
-            ['6. Other (please specify)(6)', ['6']],
-        ],
-        "entity_mapping": {
-            'NOX': 'NOx',
-            'Net CO2 emissions/removals': 'CO2',
-            'HFCs (1)': f'HFCS ({gwp_to_use})',
-            'PFCs(1)': f'PFCS ({gwp_to_use})',
-            'Unspecified mix of HFCs and PFCs(1)': f'UnspMixOfHFCsPFCs ({gwp_to_use})',
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-    "Summary1.As3": {  # Summary 1, sheet 3
-        "status": "tested",
-         "table": {
-            "firstrow": 5,
-            "lastrow": 17,
-            "header": ['entity', 'entity', 'unit'],
-            "header_fill": [True, False, True],
-            "col_for_categories": "GREENHOUSE GAS SOURCE AND SINK CATEGORIES",
-            "categories": ["category"],
-            "cols_to_ignore": [],
-            "stop_cats": ["", np.nan],
-            "unit_info": unit_info["summary"],
-        },
-        "sector_mapping": [
-            ['Memo items:(7)', [r'\IGNORE']],
-            ['International bunkers', ['M.Memo.Int']],
-            ['Aviation', ['M.Memo.Int.Avi']],
-            ['Navigation', ['M.Memo.Int.Mar']],
-            ['Multilateral operations', ['M.Memo.Mult']],
-            ['CO2 emissions from biomass', ['M.Memo.Bio']],
-            ['CO2 captured', ['M.Memo.CO2Cap']],
-            ['Long-term storage of C in waste disposal sites', ['M.Memo.LTSW']],
-            ['Indirect N2O', ['M.Memo.IndN2O']],
-            ['Indirect CO2', ['M.Memo.IndCO2']],
-        ],
-        "entity_mapping": {
-            'NOX': 'NOx',
-            'Net CO2 emissions/removals': 'CO2',
-            'HFCs(1)': f'HFCS ({gwp_to_use})',
-            'PFCs(1)': f'PFCS ({gwp_to_use})',
-            'Unspecified mix of HFCs and PFCs(1)': f'UnspMixOfHFCsPFCs ({gwp_to_use})',
-        },
-        "coords_defaults": {
-            "class": "Total",
-        },
-    },  # tested
-}

+ 0 - 10
UNFCCC_GHG_data/UNFCCC_CRF_reader/crf_specifications/__init__.py

@@ -1,10 +0,0 @@
-"""
-Define the CRF specifications here for easy access
-"""
-
-from .CRF2021_specification import CRF2021
-from .CRF2022_specification import CRF2022
-from .CRF2023_specification import CRF2023
-from .CRF2023_AUS_specification import CRF2023_AUS
-
-__all__ = ["CRF2021", "CRF2022", "CRF2023", "CRF2023_AUS"]

+ 0 - 31
UNFCCC_GHG_data/UNFCCC_CRF_reader/read_UNFCCC_CRF_submission.py

@@ -1,31 +0,0 @@
-"""
-This script is a wrapper around the read_crf_for_country
-function such that it can be called from datalad
-"""
-
-from UNFCCC_GHG_data.UNFCCC_CRF_reader.UNFCCC_CRF_reader_prod import read_crf_for_country
-import argparse
-
-parser = argparse.ArgumentParser()
-parser.add_argument('--country', help='Country name or code')
-parser.add_argument('--submission_year', help='Submission round to read', type=int)
-parser.add_argument('--submission_date', help='Date of submission to read', default=None)
-parser.add_argument('--re_read', help='Read data also if already read before',
-                    action='store_true')
-
-args = parser.parse_args()
-
-country = args.country
-submission_year = args.submission_year
-submission_date = args.submission_date
-re_read = args.re_read
-if submission_date == 'None':
-    submission_date = None
-
-read_crf_for_country(
-    country,
-    submission_year=submission_year,
-    submission_date=submission_date,
-    re_read=re_read
-)
-
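
The wrapper above compares `submission_date` with the string `'None'` because command-line values arrive as text, so a caller that forwards a Python `None` ends up passing the literal word. A minimal sketch of that normalisation, assuming a hypothetical helper `parse_cli` that is not part of the repository and that takes the argument list explicitly so it can be exercised without a shell:

```python
import argparse


def parse_cli(argv):
    """Parse wrapper-style arguments (hypothetical helper, for illustration only)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--country", help="Country name or code")
    parser.add_argument("--submission_year", help="Submission round to read", type=int)
    parser.add_argument("--submission_date", help="Date of submission to read", default=None)
    parser.add_argument(
        "--re_read", help="Read data also if already read before", action="store_true"
    )
    args = parser.parse_args(argv)
    # Command-line values are strings, so the text "None" is mapped to a real None.
    if args.submission_date == "None":
        args.submission_date = None
    return args


args = parse_cli(
    ["--country", "DEU", "--submission_year", "2023", "--submission_date", "None"]
)
print(args.country, args.submission_year, args.submission_date, args.re_read)
```

Declaring `type=int` on `--submission_year`, as this wrapper does, makes the later `int(...)` cast used in the sibling wrappers unnecessary.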

+ 0 - 33
UNFCCC_GHG_data/UNFCCC_CRF_reader/read_UNFCCC_CRF_submission_datalad.py

@@ -1,33 +0,0 @@
-"""
-wrapper around read_crf_for_country_datalad such that it can be called
-from doit in the current setup where doit runs on system python and
-not in the venv.
-"""
-
-from UNFCCC_GHG_data.UNFCCC_CRF_reader.UNFCCC_CRF_reader_prod import read_crf_for_country_datalad
-import argparse
-
-parser = argparse.ArgumentParser()
-parser.add_argument('--country', help='Country name or code')
-parser.add_argument('--submission_year', help='Submission round to read')
-parser.add_argument('--submission_date', help='Date of submission to read', default=None)
-parser.add_argument('--re_read', help='Read data also if already read before',
-                    action='store_true')
-
-args = parser.parse_args()
-
-country = args.country
-submission_year = args.submission_year
-submission_date = args.submission_date
-re_read = args.re_read
-
-
-if submission_date == "None":
-    submission_date = None
-
-read_crf_for_country_datalad(
-    country,
-    submission_year=int(submission_year),
-    submission_date=submission_date,
-    re_read=re_read
-)

+ 0 - 31
UNFCCC_GHG_data/UNFCCC_CRF_reader/read_new_UNFCCC_CRF_for_year.py

@@ -1,31 +0,0 @@
-"""
-This script is a wrapper around the read_new_crf_for_year
-function such that it can be called from datalad
-"""
-
-from UNFCCC_GHG_data.UNFCCC_CRF_reader.UNFCCC_CRF_reader_prod import \
-    read_new_crf_for_year
-import argparse
-
-parser = argparse.ArgumentParser()
-#parser.add_argument('--countries', help='List of country codes', default=None)
-parser.add_argument('--submission_year', help='Submission round to read', type=int)
-parser.add_argument('--submission_date', help='Date of submission to read', default=None)
-parser.add_argument('--re_read', help='Read data also if already read before',
-                    action='store_true')
-
-args = parser.parse_args()
-
-#countries = args.countries
-#if countries == "None":
-#    countries = None
-submission_year = args.submission_year
-re_read = args.re_read
-print(f"!!!!!!!!!!!!!!!!!!!!script: re_read={re_read}")
-read_new_crf_for_year(
-    submission_year=int(submission_year),
-#    countries=countries,
-    re_read=re_read
-)
-
-

+ 0 - 32
UNFCCC_GHG_data/UNFCCC_CRF_reader/read_new_UNFCCC_CRF_for_year_datalad.py

@@ -1,32 +0,0 @@
-"""
-wrapper around read_new_crf_for_year_datalad such that it can be called
-from doit in the current setup where doit runs on system python and
-not in the venv.
-"""
-
-from UNFCCC_GHG_data.UNFCCC_CRF_reader.UNFCCC_CRF_reader_prod import read_new_crf_for_year_datalad
-from UNFCCC_GHG_data.UNFCCC_CRF_reader.util import NoCRFFilesError
-import argparse
-
-parser = argparse.ArgumentParser()
-#parser.add_argument('--countries', help='List of country codes', default=None)
-parser.add_argument('--submission_year', help='Submission round to read')
-parser.add_argument('--re_read', help='Read data also if already read before',
-                    action='store_true')
-
-args = parser.parse_args()
-
-#countries = args.countries
-#if countries == "None":
-#    countries = None
-submission_year = args.submission_year
-re_read = args.re_read
-print(f"!!!!!!!!!!!!!!!!!!!!script_dl: re_read={re_read}")
-try:
-    read_new_crf_for_year_datalad(
-        submission_year=int(submission_year),
-#        countries=countries,
-        re_read=re_read
-    )
-except NoCRFFilesError as err:
-    print(f"NoCRFFilesError: {err}")

+ 0 - 33
UNFCCC_GHG_data/UNFCCC_CRF_reader/test_read_UNFCCC_CRF_for_year.py

@@ -1,33 +0,0 @@
-"""
-This script is a wrapper around the read_year_to_test_specs
-function such that it can be called from datalad
-"""
-
-from UNFCCC_GHG_data.UNFCCC_CRF_reader.UNFCCC_CRF_reader_devel import read_year_to_test_specs
-import argparse
-
-parser = argparse.ArgumentParser()
-parser.add_argument('--submission_year', help='Submission round to read', type=int)
-parser.add_argument('--data_year', help='Data year to read', type=int, default=2010)
-parser.add_argument('--country', help='Country to read', type=str, default=None)
-parser.add_argument('--totest', help='read tables to test', action='store_true')
-args = parser.parse_args()
-
-
-submission_year = args.submission_year
-data_year = args.data_year
-country = args.country
-#print(f"totest: {args.totest}")
-if args.totest:
-    totest = True
-else:
-    totest = False
-
-read_year_to_test_specs(
-    submission_year=submission_year,
-    data_year=data_year,
-    totest=totest,
-    country_code=country,
-)
-
-

+ 0 - 16
UNFCCC_GHG_data/UNFCCC_CRF_reader/util.py

@@ -1,16 +0,0 @@
-
-all_crf_countries = [
-    'AUS', 'AUT', 'BEL', 'BGR', 'BLR',
-    'CAN', 'CHE', 'CYP', 'CZE', 'DEU', # 10
-    'DKE', 'DNK', 'DNM', 'ESP', 'EST',
-    'EUA', 'EUC', 'FIN', 'FRA', 'FRK', # 20
-    'GBK', 'GBR', 'GRC', 'HRV', 'HUN',
-    'IRL', 'ISL', 'ITA', 'JPN', 'KAZ', # 30
-    'LIE', 'LTU', 'LUX', 'LVA', 'MCO',
-    'MLT', 'NLD', 'NOR', 'NZL', 'POL', # 40
-    'PRT', 'ROU', 'RUS', 'SVK', 'SVN',
-    'SWE', 'TUR', 'UKR', 'USA', # 49
-]
-
-class NoCRFFilesError(Exception):
-    pass
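
`NoCRFFilesError` is caught in the `read_new_UNFCCC_CRF_for_year_datalad.py` wrapper so that a missing submission is reported instead of aborting the run. A minimal sketch of that pattern follows. The function `find_crf_files`, the `available` dictionary, and the shortened country list are assumptions made for illustration; the reader's actual file lookup is not shown in this diff.

```python
# Excerpt only; the full list in util.py has 49 codes.
crf_countries_excerpt = ["AUS", "AUT", "DEU"]


class NoCRFFilesError(Exception):
    """Raised when no CRF files are available for a request."""


def find_crf_files(country_code, available):
    """Return the files for a country or raise NoCRFFilesError (hypothetical helper)."""
    if country_code not in crf_countries_excerpt:
        raise ValueError(f"Unknown CRF country code: {country_code}")
    files = available.get(country_code, [])
    if not files:
        raise NoCRFFilesError(f"No CRF files found for {country_code}")
    return files


# Hypothetical inventory: DEU has a file, AUS has none.
available = {"DEU": ["DEU_2023_1990.xlsx"]}

for code in ["DEU", "AUS"]:
    try:
        print(code, find_crf_files(code, available))
    except NoCRFFilesError as err:
        # Same handling as in the wrapper: report and continue.
        print(f"NoCRFFilesError: {err}")
```

Using a dedicated exception class lets the caller distinguish "nothing to read" from genuine failures such as an unknown country code.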

+ 0 - 2323
UNFCCC_GHG_data/UNFCCC_DI_reader/UNFCCC_DI_reader_config.py

@@ -1,2323 +0,0 @@
-# TODO: check if downscaling respects gas basket resolution for GWP transformation
-# TODO: why is Albania IPPU KYOTOGHG 0 in 2005?
-
-di_query_filters = [
-    'classifications', 'measures', 'gases',
-]
-# category, party are extra
-# measure is preprocessed to find ids
-
-# the activity data and emissions factors have a structure that is incompatible
-# with PRIMAP2.
-# To read it into a primap2 dataframe the information in classification / measure
-# has to be put into "entity" which is currently always "No gas". It's possible,
-# but takes some time, so I have omitted it here
-filter_activity_factors = {
-    "entity": {"gas": ["No gas"]},
-    "unit": {"unit": [
-        'no unit', 'kg/TJ', 't/TJ', '%', 'kg/t',
-        'kg/kt', 't/t', 'kg/head/year', 'kg N2O/kg N handled', 'kg N2O/kg N',
-        'kg N2O-N/kg N handled', 'g/m^2', 'kg N2O-N/kg N', 'kg N2O-N/ha', 'kg/t dm',
-        't CO2-C/t', 't/unit', 't C/ha', 'kg CH4/ha', 'kg CO2/ha',
-        'g/kg', 'kg/kg DC',
-    ]
-    },
-}
-
-# regular expression to match category code in category label
-cat_code_regexp = r'(?P<code>^(([0-9][A-Za-z0-9\.]{0,10}[0-9A-Za-z]))|([0-9]))[' \
-                  r'\s\.].*'
-
-gwp_to_use = 'SARGWP100'
-
-# PRIMAP2 interchange format config
-di_to_pm2if_template_nai = {
-    "coords_cols": {
-        "category": "category",
-        "entity": "gas",
-        "unit": "unit",
-        "area": "party",
-        "sec_cats__class": "classification",
-        "sec_cats__measure": "measure",
-        "data": "numberValue",
-        "time": "year",
-    },
-    # to store the original category name as well as the one mapped to IPCC categories
-    "add_coords_cols": {
-        "orig_cat_name": ["category_copy", "category"],
-    },
-    # terminologies for different coordinates
-    "coords_terminologies": {
-        "area": "ISO3",
-        "scenario": "Access_Date",
-        "category": "BURDI",
-    },
-    # default values for coordinates
-    "coords_defaults": {
-        "provenance": "measured",
-        "source": "UNFCCC",
-    },
-    # mapping of values e.g. gases to the primap2 format
-    "coords_value_mapping": {
-        "entity": {
-            f"Aggregate GHGs ({gwp_to_use})": f"KYOTOGHG ({gwp_to_use})",
-            f"Aggregate F-gases ({gwp_to_use})": f"FGASES ({gwp_to_use})",
-            f"HFCs ({gwp_to_use})": f"HFCS ({gwp_to_use})",
-            f"PFCs ({gwp_to_use})": f"PFCS ({gwp_to_use})",
-            #f"SF6 ({gwp_to_use})": f"SF6 ({gwp_to_use})",
-            #f"CH4 ({gwp_to_use})": f"CH4 ({gwp_to_use})",
-            f"CO2 ({gwp_to_use})": "CO2",
-            #f"N2O ({gwp_to_use})": f"N2O ({gwp_to_use})",
-            #f"Unspecified mix of HFCs and PFCs ({gwp_to_use})":
-            #    f"UnspMixOfHFCsPFCs ({gwp_to_use})",
-            f"Unspecified mix of HFCs ({gwp_to_use})": f"UnspMixOfHFCs ({gwp_to_use})",
-            f"Unspecified mix of PFCs ({gwp_to_use})": f"UnspMixOfPFCs ({gwp_to_use})",
-            "HFC-23": "HFC23",
-            "HFC-32": "HFC32",
-            "HFC-41": "HFC41",
-            "HFC-43-10mee": "HFC4310mee",
-            "HFC-125": "HFC125",
-            "HFC-134": "HFC134",
-            "HFC-134a": "HFC134a",
-            "HFC-143": "HFC143",
-            "HFC-143a": "HFC143a",
-            "HFC-152": "HFC152",
-            "HFC-152a": "HFC152a",
-            "HFC-161": "HFC161",
-            "HFC-227ea": "HFC227ea",
-            "HFC-236ea": "HFC236ea",
-            "HFC-236cb": "HFC236cb",
-            "HFC-236fa": "HFC236fa",
-            "HFC-245ca": "HFC245ca",
-            "HFC-245fa": "HFC245fa",
-            "HFC-365mfc": "HFC365mfc",
-            "c-C4F8": "cC4F8",
-            "c-C3F6": "cC3F6",
-        },
-        "unit": "PRIMAP1",
-        "category": {
-            # NAI
-            "Total GHG emissions excluding LULUCF/LUCF": "15163",
-            "Total GHG emissions including LULUCF/LUCF": "24540",
-            "International Bunkers": "14637",
-            "Marine": "14423",
-            "Aviation": "14424",
-            "CO₂ Emissions from Biomass": "14638",
-        }
-    },
-    # fill missing data from other columns (not needed here)
-    "coords_value_filling": {
-    },
-    # remove data based on filters
-    "filter_remove": {
-    },
-    # keep only the data defined in the filters
-    "filter_keep": {
-    },
-    # define meta data
-    "meta_data": {
-        "references": "https://di.unfccc.int",
-        "title": "XXXX", # to set per country
-        "comment": "Data read from the UNFCCC DI flexible query interface using the API.",
-        "rights": "",
-        "contact": "mail@johannes-guetschow.de",
-        "institution": "United Nations Framework Convention on Climate Change (www.unfccc.int)",
-    },
-    # time format used in the input data
-    "time_format": "%Y",
-}
-
-di_to_pm2if_template_ai = {
-    "coords_cols": {
-        "category": "category",
-        "entity": "gas",
-        "unit": "unit",
-        "area": "party",
-        "sec_cats__class": "classification",
-        "sec_cats__measure": "measure",
-        "data": "numberValue",
-        "time": "year",
-    },
-    # to store the original category name as well as the one mapped to IPCC categories
-    "add_coords_cols": {
-        #"orig_cat_name": ["category_copy", "category"],
-    },
-    # terminologies for different coordinates
-    "coords_terminologies": {
-        "area": "ISO3",
-        "scenario": "Access_Date",
-        "category": "CRFDI",
-    },
-    # default values for coordinates
-    "coords_defaults": {
-        "provenance": "measured",
-        "source": "UNFCCC",
-    },
-    # mapping of values e.g. gases to the primap2 format
-    "coords_value_mapping": {
-        "entity": {
-            "Aggregate F-gases (AR4GWP100)": "FGASES (AR4GWP100)",
-            "Aggregate GHGs (AR4GWP100)": "KYOTOGHG (AR4GWP100)",
-            "HFCs (AR4GWP100)": "HFCS (AR4GWP100)",
-            "PFCs (AR4GWP100)": "PFCS (AR4GWP100)",
-            "Unspecified mix of HFCs and PFCs (AR4GWP100)":
-                "UnspMixOfHFCsPFCs (AR4GWP100)",
-            #"Unspecified mix of HFCs and PFCs":
-            #    "UnspMixOfHFCsPFCs", # this is problematic, mixes should use CO2eq
-            # with GWP
-            "Unspecified mix of HFCs (AR4GWP100)": "UnspMixOfHFCs (AR4GWP100)",
-            "Unspecified mix of PFCs (AR4GWP100)": "UnspMixOfPFCs (AR4GWP100)",
-            "HFC-23": "HFC23",
-            "HFC-32": "HFC32",
-            "HFC-41": "HFC41",
-            "HFC-43-10mee": "HFC4310mee",
-            "HFC-125": "HFC125",
-            "HFC-134": "HFC134",
-            "HFC-134a": "HFC134a",
-            "HFC-143": "HFC143",
-            "HFC-143a": "HFC143a",
-            "HFC-152": "HFC152",
-            "HFC-152a": "HFC152a",
-            "HFC-161": "HFC161",
-            "HFC-227ea": "HFC227ea",
-            "HFC-236ea": "HFC236ea",
-            "HFC-236cb": "HFC236cb",
-            "HFC-236fa": "HFC236fa",
-            "HFC-245ca": "HFC245ca",
-            "HFC-245fa": "HFC245fa",
-            "HFC-365mfc": "HFC365mfc",
-            "c-C4F8": "cC4F8",
-            "c-C3F6": "cC3F6",
-        },
-        "unit": "PRIMAP1",
-        "category": {
-            'Annual Change in Total Long-term C Storage': "11024",
-            'Annual Change in Total Long-term C Storage in HWP Waste': "11025",
-            'HWP in SWDS': "11036",
-            'International Aviation': "10357",
-            'International Navigation': "8828",
-            'Long-term Storage of C in Waste Disposal Sites': "temp",
-            'CO₂ Emissions from Biomass': "8270",
-            'International Bunkers': "8564",
-            'Multilateral Operations': "8987",
-            'Total Amount Captured for Storage': "11030",
-            'Total Amount of CO₂ Injected at Storage Sites': "11033",
-            'Total Amount of Exports for Storage': "11032",
-            'Total Amount of Imports for Storage': "11031",
-            'Total GHG emissions with LULUCF': "8677",
-            'Total GHG emissions with LULUCF including indirect CO₂': "10480",
-            'Total GHG emissions without LULUCF': "10464",
-            'Total GHG emissions without LULUCF including indirect CO₂': "10479",
-            'Total Leakage from Transport, Injection and Storage': "11034",
-            'Waste Incineration with Energy Recovery included as Biomass': "11027",
-            'Waste Incineration with Energy Recovery included as Fossil Fuels':
-                "11028",
-        }
-    },
-    # fill missing data from other columns (not needed here)
-    "coords_value_filling": {
-    },
-    # remove data based on filters
-    "filter_remove": {
-        # some unspecified mixes not reported in CO2eq have to be removed
-        "entity_wrong_unit": {
-            "gas": ["Unspecified mix of HFCs and PFCs"]
-        },
-        # remove data that is not for a gas (partly it currently can't be read and
-        # partly because the dataset is too large because of the many dimensions)
-        "entity_no_gas": {
-            "gas": ["No gas"]
-        },
-    },
-    # keep only the data defined in the filters
-    "filter_keep": {
-        "only_emission_measures": {
-            "measure": [
-                'Net carbon emissions',
-                'Net emissions/removals',
-                'Emissions from disposal',
-                'Emissions from manufacturing',
-                'Emissions from stocks',
-                'Indirect emissions',
-                'Direct emissions per MMS',
-                'Direct emissions per MMS - Anaerobic lagoon',
-                'Direct emissions per MMS - Composting',
-                'Direct emissions per MMS - Daily spread',
-                'Direct emissions per MMS - Digesters',
-                'Direct emissions per MMS - Liquid system',
-                'Direct emissions per MMS - Other',
-                'Direct emissions per MMS - Solid storage and dry lot',
-                'Indirect N2O emissions from atmospheric deposition',
-                'Indirect N2O emissions from nitrogen leaching and run-off',
-                'Net emissions/removals from HWP from domestic harvest',
-            ],
-        },
-    },
-    # define meta data
-    "meta_data": {
-        "references": "https://di.unfccc.int",
-        "title": "XXXX", # to set per country
-        "comment": "Data read from the UNFCCC DI flexible query interface using the API.",
-        "rights": "",
-        "contact": "mail@johannes-guetschow.de",
-        "institution": "United Nations Framework Convention on Climate Change (www.unfccc.int)",
-    },
-    # time format used in the input data
-    "time_format": "%Y",
-}
-
-cat_conversion = {
-    # ANNEXI to come (low priority as we read from CRF files)
-    'BURDI_to_IPCC2006_PRIMAP': {
-        'mapping': {
-            '1': '1',
-            '1.A': '1.A',
-            '1.A.1': '1.A.1',
-            '1.A.2': '1.A.2',
-            '1.A.3': '1.A.3',
-            '1.A.4': '1.A.4',
-            '1.A.5': '1.A.5',
-            '1.B': '1.B',
-            '1.B.1': '1.B.1',
-            '1.B.2': '1.B.2',
-            '2': '2',
-            '2.A': '2.A',
-            '2.B': 'M.2.B_2.B',
-            '2.C': '2.C',
-            '2.D': 'M.2.H.1_2',
-            '2.E': 'M.2.B_2.E',
-            '2.F': '2.F',
-            '2.G': '2.H.3',
-            '4': 'M.AG',
-            '4.A': '3.A.1',
-            '4.B': '3.A.2',
-            '4.C': '3.C.7',
-            '4.D': 'M.3.C.45.AG',
-            '4.E': '3.C.1.c',
-            '4.F': '3.C.1.b',
-            '4.G': '3.C.8',
-            '5': 'M.LULUCF',
-            '6': '4',
-            '6.A': '4.A',
-            '6.B': '4.D',
-            '6.C': '4.C',
-            '6.D': '4.E',
-            '24540': '0',
-            '15163': 'M.0.EL',
-            '14637': 'M.BK',
-            '14424': 'M.BK.A',
-            '14423': 'M.BK.M',
-            '14638': 'M.BIO',
-            '7': '5',
-        }, #5.A-D ignored as not fitting 2006 cats
-        'aggregate': {
-            '2.B': {'sources': ['M.2.B_2.B', 'M.2.B_2.E'], 'name': 'Chemical Industry'},
-            '2.H': {'sources': ['M.2.H.1_2', '2.H.3'], 'name': 'Other'},
-            #'2': {'sources': ['2.A', '2.B', '2.C', '2.F', '2.H'],
-            #      'name': 'Industrial Processes and Product Use'},
-            '3.A': {'sources': ['3.A.1', '3.A.2'], 'name': 'Livestock'},
-            '3.C.1': {'sources': ['3.C.1.b', '3.C.1.c'],
-                         'name': 'Emissions from biomass burning'},
-            'M.3.C.1.AG': {'sources': ['3.C.1.b', '3.C.1.c'],
-                         'name': 'Emissions from biomass burning (Agriculture)'},
-            '3.C': {'sources': ['3.C.1', 'M.3.C.45.AG', '3.C.7', '3.C.8'],
-                         'name': 'Aggregate sources and non-CO2 emissions sources on land'},
-            'M.3.C.AG': {'sources': ['M.3.C.1.AG', 'M.3.C.45.AG', '3.C.7', '3.C.8'],
-                         'name': 'Aggregate sources and non-CO2 emissions sources on land ('
-                                 'Agriculture)'},
-            'M.AG.ELV': {'sources': ['M.3.C.AG'], 'name': 'Agriculture excluding livestock'},
-            '3': {'sources': ['M.AG', 'M.LULUCF'], 'name': 'AFOLU'},
-        },
-    },
-}
-
-di_processing_templates = {
-    # templates for the DI processing. Most processing rules will apply to several
-    # versions. So we store them here and refer to them in the processing info dict
-    # general templates
-    'general': {
-        'copyUnspHFCUnspPFC': {
-            'basket_copy': {
-                'GWPs_to_add': ["AR4GWP100", "AR5GWP100", "AR6GWP100"],
-                'entities': ["UnspMixOfHFCs", "UnspMixOfPFCs"],
-                'source_GWP': gwp_to_use,
-            },
-        },
-        'copyUnspHFC': {
-            'basket_copy': {
-                'GWPs_to_add': ["AR4GWP100", "AR5GWP100", "AR6GWP100"],
-                'entities': ["UnspMixOfHFCs"],
-                'source_GWP': gwp_to_use,
-            },
-        },
-        'copyHFCPFC': {
-            'basket_copy': {
-                'GWPs_to_add': ["AR4GWP100", "AR5GWP100", "AR6GWP100"],
-                'entities': ["HFCS", "PFCS"],
-                'source_GWP': gwp_to_use,
-            },
-        },
-        'copyPFC': {
-            'basket_copy': {
-                'GWPs_to_add': ["AR4GWP100", "AR5GWP100", "AR6GWP100"],
-                'entities': ["PFCS"],
-                'source_GWP': gwp_to_use,
-            },
-        },
-        'copyFGASES': {
-            'basket_copy': {
-                'GWPs_to_add': ["AR4GWP100", "AR5GWP100", "AR6GWP100"],
-                'entities': ["FGASES"],
-                'source_GWP': gwp_to_use,
-            },
-        },
-    },
-    # country templates
-    #AFG: not needed (newer data in BUR1), 2005, 2013 only
-    #AGO: 2000, 2005 only (external key needed for some gases / sectors)
-    'ALB': {
-        # 1990-2009, 1990-1999 need downscaling
-        'DI2023-05-24': {
-            'remove_ts': {
-                '2.A_H': { # looks wrong in 2005
-                    'category': ['2.A', '2.B', '2.C', '2.D', '2.G'],
-                    'entities': ['CO2', f'KYOTOGHG ({gwp_to_use})'],
-                        'time': ['2005'],
-                },
-                'Bunkers': { # Aviation and marine swapped in 2005
-                    'category': ['14423', '14424'],
-                    'entities': [f'KYOTOGHG ({gwp_to_use})'],
-                        'time': ['2005'],
-                },
-                'Bunkers_CH4': { # 2005 looks all wrong (swap in activity data not
-                    # result?)
-                    'category': ['14423', '14424', '14637'],
-                    'entities': ['CH4', f'KYOTOGHG ({gwp_to_use})', 'N2O'],
-                        'time': ['2005'],
-                },
-            },
-            'downscale': { # needed for 1990, 2000, 2005-2012
-                'sectors': {
-                    '1': {
-                        'basket': '1',
-                        'basket_contents': ['1.A', '1.B'],
-                        'entities': ['CO2', 'N2O', 'CH4'],
-                        'dim': 'category (BURDI)',
-                        #'skipna_evaluation_dims': None,
-                        #'skipna': True,
-                    },
-                    '1.A': {
-                        'basket': '1.A',
-                        'basket_contents': ['1.A.1', '1.A.2', '1.A.3', '1.A.4',
-                                            '1.A.5'],
-                        'entities': ['CO2', 'N2O', 'CH4'],
-                        'dim': 'category (BURDI)',
-                        #'skipna_evaluation_dims': None,
-                        #'skipna': True,
-                    },
-                    '1.B': {
-                        'basket': '1.B',
-                        'basket_contents': ['1.B.1', '1.B.2'],
-                        'entities': ['CH4'],
-                        'dim': 'category (BURDI)',
-                        #'skipna_evaluation_dims': None,
-                        #'skipna': True,
-                    },
-                    '2': {
-                        'basket': '2',
-                        'basket_contents': ['2.A', '2.B', '2.C', '2.D', '2.E', '2.F',
-                                            '2.G'],
-                        'entities': ['CO2', 'N2O', 'CH4'],
-                        'dim': 'category (BURDI)',
-                        #'skipna_evaluation_dims': None,
-                        #'skipna': True,
-                    },
-                    '4': {
-                        'basket': '4',
-                        'basket_contents': ['4.A', '4.B', '4.C', '4.D', '4.E', '4.F',
-                                            '4.G'],
-                        'entities': ['N2O', 'CH4'],
-                        'dim': 'category (BURDI)',
-                        #'skipna_evaluation_dims': None,
-                        #'skipna': True,
-                    },
-                    '5': {
-                        'basket': '5',
-                        'basket_contents': ['5.A', '5.B', '5.C', '5.D', '5.E'],
-                        'entities': ['CO2', 'CH4'],
-                        'dim': 'category (BURDI)',
-                        #'skipna_evaluation_dims': None,
-                        #'skipna': True,
-                    },
-                    '6': {
-                        'basket': '6',
-                        'basket_contents': ['6.A', '6.B', '6.C', '6.D'],
-                        'entities': ['N2O', 'CH4'],
-                        'dim': 'category (BURDI)',
-                        #'skipna_evaluation_dims': None,
-                        #'skipna': True,
-                    },
-                    'bunkers': {
-                        'basket': '14637',
-                        'basket_contents': ['14423', '14424'],
-                        'entities': ['CO2'],
-                        'dim': 'category (BURDI)',
-                        #'skipna_evaluation_dims': None,
-                        #'skipna': True,
-                    },
-                },
-            },
-            'basket_copy': {
-                'GWPs_to_add': ["AR4GWP100", "AR5GWP100", "AR6GWP100"],
-                'entities': ["UnspMixOfHFCs", "UnspMixOfPFCs"],
-                'source_GWP': gwp_to_use,
-            },
-        }
-    },
-    #AND: no data
-    'ARE': { # 1990, 2000, 2005, 2014. some aggregation for fgases (pfcs) needed
-        'DI2023-05-24': {
-            'remove_ts': {
-                'AG_1994': { # inconsistent with other data
-                    'category': ['4',  '4.B', '4.D', '15163', '24540'],
-                    'entities': ['N2O', f'KYOTOGHG ({gwp_to_use})'],
-                    'time': ['1994'],
-                },
-            },
-            'agg_tolerance': 0.015,
-            'ignore_entities': ["NMVOC"], #errors when aggregating cats
-            'aggregate_cats': {
-                '2': {'sources': ['2.A', '2.B', '2.C'],
-                     'name': '2.  Industrial Processes'},
-                '15163': {'sources': ['1', '2', '4', '6'],
-                          'name': 'Total GHG emissions excluding LULUCF/LUCF'},
-                '24540': {'sources': ['1', '2', '5', '4', '6'],
-                          'name': 'Total GHG emissions including LULUCF/LUCF'},
-            },
-        },
-    },
-    # ARG newer data in BUR
-    # ARM 1990, 2000, 2006, 2010, no processing needed
-    # ATG 1990, 2000, no processing needed
-    'AZE': {
-        # 1990-2013, but from different submissions and not completely consistent
-        # including different sector coverage
-        # for FGASES emissions are in HFCs for some years and in PFCs for others.
-        # waste data has inconsistent subsectors
-        'DI2023-05-24': {
-            'remove_ts': {
-                '1.A.1': { #contains data for all subsectors
-                    'category': ['1.A.1'],
-                    'entities': ['CH4', f'KYOTOGHG ({gwp_to_use})'],
-                    'time': ['1990', '2000', '2005', '2006', '2007', '2008', '2009',
-                             '2010', '2011', '2012'],
-                },
-                'pfcs': { # only HFCs in other years, likely wrong
-                    'entities': [f'PFCS ({gwp_to_use})'],
-                    'time': ['1991', '1992', '1993', '1994'],
-                },
-            },
-            'downscale': { # needed for 1990, 2000, 2005-2012
-                'sectors': {
-                    '1.A': {
-                        'basket': '1.A',
-                        'basket_contents': ['1.A.1', '1.A.2', '1.A.3', '1.A.4',
-                                            '1.A.5'],
-                        'entities': ['CH4'],
-                        'dim': 'category (BURDI)',
-                        'skipna_evaluation_dims': None,
-                        'skipna': True,
-                    },
-                },
-                'entities': {
-                    'FGASES': {
-                        'basket': f'FGASES ({gwp_to_use})',
-                        'basket_contents': [f'HFCS ({gwp_to_use})'],
-                        'sel': {'time': ['1990', '1991', '1992', '1993', '1994',
-                                         '1995']},
-                    },
-                    'HFC': {
-                        'basket': f'HFCS ({gwp_to_use})',
-                        'basket_contents': [f'UnspMixOfHFCs ({gwp_to_use})'],
-                        'sel': {'time': ['1990', '1991', '1992', '1993', '1994',
-                                         '1995', '2000', '2001', '2002', '2003',
-                                         '2004', '2005', '2006', '2007', '2008',
-                                         '2009', '2010', '2012']},
-                    },
-                },
-            },
-            'basket_copy': {
-                'GWPs_to_add': ["AR4GWP100", "AR5GWP100", "AR6GWP100"],
-                'entities': ["UnspMixOfHFCs"],
-                'source_GWP': gwp_to_use,
-            },
-        },
-    },
-    # BDI 1998, 2005, 2010, 2015 # data coverage is a bit inconsistent
-    'BDI': {
-        'DI2023-05-24': { # 1994, 2000, 2003, 2006, 2009 (energy sector missing in 200X)
-            'remove_ts': {
-                'M.AG.ELV': { # prescribed burning of savannas and agricultural soils
-                    # are missing for all but 1 year
-                    'category': ['4', '4.B', '4.D', '4.E', '4.F', '15163', '24540'],
-                    'entities': ['N2O', f'KYOTOGHG ({gwp_to_use})']
-                },
-            },
-        },
-    },
-    # BEN 1995, 2000 # data coverage a bit inconsistent
-    'BFA': { # 1994, 2007, 2008-2017
-        'DI2023-05-24': {  # remove 2007, seems to have summed sectors (Agri and LULUCF)
-            # and missing sectors (e.g. 1,2 for CH4, N2O), Agri. burning (4.E,
-            # 4.F) missing for 2008-2017
-            # 1994 energy sector is not consistent with other years
-            'remove_years': ['1994', '2007'],
-            'basket_copy': {
-                'GWPs_to_add': ["AR4GWP100", "AR5GWP100", "AR6GWP100"],
-                'entities': ["UnspMixOfHFCs"],
-                'source_GWP': gwp_to_use,
-            },
-        },
-    },
-    # BGD 1994, 2001, 2005; coverage mostly consistent but not fully
-    # BHR 1994, 2000 (with some gaps in 2000)
-    'BHS': { # 1990, 1994, 2000 (differing coverage, might be unusable for some sectors)
-        # TODO: check e.g. 4 and 5
-        'DI2023-05-24': {
-            'downscale': {
-                'sectors': {
-                    '4': { # 1994
-                        'basket': '4',
-                        'basket_contents': ['4.A', '4.B', '4.D', '4.G'],
-                        'entities': ['CH4', 'CO2', f'KYOTOGHG ({gwp_to_use})'], # no N2O but
-                        # CO2 is unusual
-                        'dim': 'category (BURDI)',
-                        'skipna_evaluation_dims': None,
-                        'skipna': True,
-                    },
-                },
-            },
-        }
-    },
-    'BIH': {
-        'DI2023-05-24': {
-            # downscaling in two steps
-            # 1990-2001 has different coverage than 2002-2012 and 2013-2014
-            # do not downscale KyotoGHG for 1990-2001 as that's aggregated
-            # later to avoid inconsistencies
-            'downscale': {
-                'sectors': {
-                    '1.A_1990': {
-                        'basket': '1.A',
-                        'basket_contents': ['1.A.1', '1.A.2', '1.A.3', '1.A.4',
-                                            '1.A.5'],
-                        'entities': ['CH4', 'CO2', 'N2O', 'CO', 'NMVOC', 'NOx', 'SO2'],
-                        'dim': 'category (BURDI)',
-                        'sel': {'time': ['1990', '1991', '1992', '1993', '1994',
-                                         '1995', '1996', '1997', '1998', '1999',
-                                         '2000', '2001']},
-                        'skipna_evaluation_dims': None,
-                        'skipna': True,
-                    },
-                    '1.B_1990': {
-                        'basket': '1.B',
-                        'basket_contents': ['1.B.1', '1.B.2'],
-                        'entities': ['CH4', 'CO2', 'NMVOC', 'SO2'],
-                        'dim': 'category (BURDI)',
-                        'sel': {'time': ['1990', '1991', '1992', '1993', '1994',
-                                         '1995', '1996', '1997', '1998', '1999',
-                                         '2000', '2001']},
-                        'skipna_evaluation_dims': None,
-                        'skipna': True,
-                    },
-                    '2_1990': {
-                        'basket': '2',
-                        'basket_contents': ['2.A', '2.B', '2.C', '2.D'],
-                        'entities': ['CH4', 'CO2', 'N2O', 'CO', 'NMVOC', 'NOx', 'SO2'],
-                        'dim': 'category (BURDI)',
-                        'sel': {'time': ['1990', '1991', '1992', '1993', '1994',
-                                         '1995', '1996', '1997', '1998', '1999',
-                                         '2000', '2001']},
-                        'skipna_evaluation_dims': None,
-                        'skipna': True,
-                    },
-                    '4_1990': {
-                        'basket': '4',
-                        'basket_contents': ['4.A', '4.B', '4.C', '4.D', '4.E'],
-                        'entities': ['CH4', 'N2O'],
-                        'dim': 'category (BURDI)',
-                        'sel': {'time': ['1990', '1991', '1992', '1993', '1994',
-                                         '1995', '1996', '1997', '1998', '1999',
-                                         '2000', '2001']},
-                        'skipna_evaluation_dims': None,
-                        'skipna': True,
-                    },
-                    '5_1990': {
-                        'basket': '5',
-                        'basket_contents': ['5.A'],
-                        'entities': ['CO2'],
-                        'dim': 'category (BURDI)',
-                        'sel': {'time': ['1990', '1991', '1992', '1993', '1994',
-                                         '1995', '1996', '1997', '1998', '1999',
-                                         '2000', '2001']},
-                        'skipna_evaluation_dims': None,
-                        'skipna': True,
-                    },
-                    '6_1990': {
-                        'basket': '6',
-                        'basket_contents': ['6.A'],
-                        'entities': ['CH4'],
-                        'dim': 'category (BURDI)',
-                        'sel': {'time': ['1990', '1991', '1992', '1993', '1994',
-                                         '1995', '1996', '1997', '1998', '1999',
-                                         '2000', '2001']},
-                        'skipna_evaluation_dims': None,
-                        'skipna': True,
-                    },
-                },
-                'entities': {  # 2002-2014
-                    'KYOTO': {
-                        'basket': f'KYOTOGHG ({gwp_to_use})',
-                        'basket_contents': ['CH4', 'CO2', 'N2O'],
-                        'sel': {'category (BURDI)':
-                                    ['1', '1.A', '1.A.1', '1.A.2', '1.A.3', '1.A.4',
-                                     '1.A.5', '1.B', '1.B.1', '1.B.2', '2', '2.A',
-                                     '2.B', '2.C', '2.D', '2.E', '4', '4.A', '4.B',
-                                     '4.C', '4.D', '4.E', '5', '5.A', '6', '6.A',
-                                     '6.B', '6.C', '14423', '14424', '14637',
-                                     '15163', '24540',
-                                     ],
-                                'time': ['2002', '2003', '2004', '2005', '2006',
-                                         '2007', '2008', '2009', '2010', '2011',
-                                         '2012', '2013', '2014'],
-                                },
-                    },
-                },
-            },
-        },
-    },
-    'BLZ': {
-        'DI2023-05-24': { # 1994, 2000, 2003, 2006, 2009 (energy sector missing in 200X)
-            'remove_ts': {
-                'AG': { # inconsistent with other data
-                    'category': ['4', '4.A', '4.B', '4.C', '4.D', '4.E', '4.F',
-                                 '15163', '24540'],
-                    'entities': ['CH4', 'N2O', f'KYOTOGHG ({gwp_to_use})'],
-                    'time': ['1994', '2000'],
-                },
-                'waste_1994': { # inconsistent with other data
-                    'category': ['6', '6.A', '6.B', '15163', '24540'],
-                    'entities': ['CO2', 'CH4', 'N2O', f'KYOTOGHG ({gwp_to_use})'],
-                    'time': ['1994'],
-                },
-            },
-        },
-    },
-    # BOL 1990, 1994, 1998, 2000, 2002 (energy sectors missing for CH4, N2O), 2004 (sm),
-    # BRA 1990-2016 (BUR4)
-    'BRB': {
-        'DI2023-05-24': {
-            #'remove_years': ['1990', '1994', '1997'], # keep as 1997 needed for downscaling
-            'aggregate_cats': {
-                '14637': {'sources': ['14423', '14424'],
-                     'name': 'International Bunkers'},
-            },
-            # downscaling in two steps
-            # 2000 - 2012 LULUCF KYOTOGHG
-            # later KYOTOGHG to gases using 1997 shares (not ideal)
-            # don't use, not consistent with the per gas data available in NC2 (but
-            # not read into the DI portal)
-            # 'downscale': {
-            #     'sectors': {
-            #         '5_2000': {
-            #             'basket': '5',
-            #             'basket_contents': ['5.A', '5.B', '5.C', '5.D'],
-            #             'entities': [f'KYOTOGHG ({gwp_to_use})'],
-            #             'dim': 'category (BURDI)',
-            #             'sel': {'time': ['1997', '2000', '2001', '2002', '2003', '2004',
-            #                              '2005', '2006', '2007', '2009', '2010']},
-            #             'skipna_evaluation_dims': None,
-            #             'skipna': True,
-            #         },
-            #     },
-            #     'entities': {  # 2000-2010 (1997 as key)
-            #         'KYOTO': {
-            #             'basket': f'KYOTOGHG ({gwp_to_use})',
-            #             'basket_contents': ['CO2', 'CH4', 'N2O'],
-            #             'sel': {'category (BURDI)':
-            #                         ['1', '1.A', '1.A.1', '1.A.2', '1.A.3', '1.A.4',
-            #                          '2', '2.A', '5', '14423', '14424',
-            #                          '14637', '4', '4.A', '4.B', '4.D',
-            #                          '6', '6.A', '6.B', '15163', '24540',
-            #                          ],
-            #                     'time': ['1997', '2000', '2001', '2002', '2003', '2004',
-            #                              '2005', '2006', '2007', '2008', '2009',
-            #                              '2010'],
-            #                     },
-            #         },
-            #     },
-            # },
-        },
-    }, # TODO: downscaling using external key
-    # BRN 2010 only (though with full sectors)
-    # BTN 1994, 2000, 2015. patchy coverage but no downscaling needed / possible
-    # BWA 1994, 2000, 2015. inconsistent coverage
-    # TODO CAF 1994, 2003-2010. 1994 has different coverage and might be inconsistent
-    # CHL: more data in BUR4/5
-    'CHN': {
-        'DI2023-05-24': { #1994 (gaps), 2005 (needs downscaling), 2010, 2012, 2014
-            # (relatively complete and consistent)
-            'remove_ts': {
-                '1.A.1': { #contains data for all subsectors
-                    'category': ['1.A.1'],
-                    'entities': ['N2O'],
-                    'time': ['1994'],
-                },
-            },
-            'downscale': { # needed for 2005
-                'sectors': {
-                    '1': { # 2005
-                        'basket': '1',
-                        'basket_contents': ['1.A', '1.B'],
-                        'entities': ['CH4', 'CO2', 'N2O'],
-                        'dim': 'category (BURDI)',
-                        'skipna_evaluation_dims': None,
-                        'skipna': True,
-                    },
-                    '1.B': { # 2005
-                        'basket': '1.B',
-                        'basket_contents': ['1.B.1', '1.B.2'],
-                        'entities': ['CH4'],
-                        'dim': 'category (BURDI)',
-                        'skipna_evaluation_dims': None,
-                        'skipna': True,
-                    },
-                    '1.A': { # 2005
-                        'basket': '1.A',
-                        'basket_contents': ['1.A.1', '1.A.2', '1.A.3', '1.A.4',
-                                            '1.A.5'],
-                        'entities': ['CO2'],
-                        'dim': 'category (BURDI)',
-                        'skipna_evaluation_dims': None,
-                        'skipna': True,
-                    },
-                    # with current functionality we can't downscale 1.A further for
-                    # non-CO2 as it needs several steps and CO2 is present
-                    '2': { # 2005
-                        'basket': '2',
-                        'basket_contents': ['2.A', '2.B', '2.C'],
-                        'entities': ['CO2', 'N2O'],
-                        'dim': 'category (BURDI)',
-                        'skipna_evaluation_dims': None,
-                        'skipna': True,
-                    },
-                    '4': { # 2005
-                        'basket': '4',
-                        'basket_contents': ['4.A', '4.B', '4.C', '4.D', '4.E', '4.F'],
-                        'entities': ['CH4', 'N2O'],
-                        'dim': 'category (BURDI)',
-                        'skipna_evaluation_dims': None,
-                        'skipna': True,
-                    },
-                    '5': { # several years
-                        'basket': '5',
-                        'basket_contents': ['5.A', '5.B'],
-                        'entities': ['CO2', 'CH4', 'N2O'],
-                        'dim': 'category (BURDI)',
-                        'skipna_evaluation_dims': None,
-                        'skipna': True,
-                    },
-                    '6': { # 2005
-                        'basket': '6',
-                        'basket_contents': ['6.A', '6.B', '6.C', '6.D'],
-                        'entities': ['CO2', 'CH4', 'N2O'],
-                        'dim': 'category (BURDI)',
-                        'skipna_evaluation_dims': None,
-                        'skipna': True,
-                    },
-                },
-                'entities': {
-                    'HFC': {
-                        'basket': f'HFCS ({gwp_to_use})',
-                        'basket_contents': ['HFC125', 'HFC134a', 'HFC143a', 'HFC152a',
-                                            'HFC227ea', 'HFC23', 'HFC236fa', 'HFC32',
-                                            f'UnspMixOfHFCs ({gwp_to_use})'],
-                        'sel': {'time': ['2005', '2010']},
-                    },
-                    'PFC': {
-                        'basket': f'PFCS ({gwp_to_use})',
-                        'basket_contents': ['C2F6', 'CF4'],
-                        'sel': {'time': ['2005', '2010']},
-                    },
-                },
-            },
-            'basket_copy': {
-                'GWPs_to_add': ["AR4GWP100", "AR5GWP100", "AR6GWP100"],
-                'entities': ["UnspMixOfHFCs"],
-                'source_GWP': gwp_to_use,
-            },
-        },
-    },
-    'CIV': {
-        'DI2023-05-24': { #1994 (needs some downscaling), 2000
-            'downscale': { # needed for 2005
-                'sectors': {
-                    '1.A': { # 2005
-                        'basket': '1.A',
-                        'basket_contents': ['1.A.1', '1.A.2', '1.A.3', '1.A.4'],
-                        'entities': ['CO2', 'CH4', 'N2O', f'KYOTOGHG ({gwp_to_use})'],
-                        'dim': 'category (BURDI)',
-                        'skipna_evaluation_dims': None,
-                        'skipna': True,
-                    },
-                },
-            },
-            'basket_copy': {
-                'GWPs_to_add': ["AR4GWP100", "AR5GWP100", "AR6GWP100"],
-                'entities': ["FGASES"],
-                'source_GWP': gwp_to_use,
-            },
-        },
-    },
-    # CMR: 1994, 2000, not fully consistent
-    # COD: 1994, 1999-2003, coverage not fully consistent, downscaling complicated
-    # COG: 1994, 2000, not fully consistent
-    # COK: 1994, limited coverage
-    # COL: not needed, more data in BUR3, 1990, 1994, 2000, 2004,
-    # COM: 1994, 2000
-    # CPV: more data in NC3
-    # CRI: more data in NIR
-    'CUB': { # 1990, (1992, 1994, 1996, 1998 dwn needed), 2000, 2002
-        'DI2023-05-24': {
-            # calculate LULUCF from 0 and M.0.EL
-            'subtract_cats': {
-                '5': {'parent': '24540', 'subtract': ['15163'],
-                      'name': '5.  Land-Use Change and Forestry'},
-            },
-            'downscale': { # not tested yet
-                'sectors': {
-                    '0': {
-                        'basket': '24540',
-                        'basket_contents': ['15163', '5'],
-                        'entities': ['CO', 'NMVOC', 'NOx', 'SO2'],
-                        'dim': 'category (BURDI)',
-                    },
-                    'M.0.EL': {
-                        'basket': '15163',
-                        'basket_contents': ['1', '2', '3', '4', '6'],
-                        'entities': ['CH4', 'CO2', 'N2O', 'C2F6', 'CF4', 'HFC134',
-                                     'HFC23', 'SF6', 'CO', 'NMVOC', 'NOx', 'SO2'],
-                        'dim': 'category (BURDI)',
-                    },
-                    '1': {
-                        'basket': '1',
-                        'basket_contents': ['1.A', '1.B'],
-                        'entities': ['CH4', 'CO2', 'N2O', 'CO', 'NMVOC', 'NOx', 'SO2'],
-                        'dim': 'category (BURDI)',
-                    },
-                    '1.A': {
-                        'basket': '1.A',
-                        'basket_contents': ['1.A.1', '1.A.2', '1.A.3', '1.A.4', '1.A.5'],
-                        'entities': ['CH4', 'CO2', 'N2O', 'CO', 'NMVOC', 'NOx', 'SO2'],
-                        'dim': 'category (BURDI)',
-                    },
-                    '1.B': {
-                        'basket': '1.B',
-                        'basket_contents': ['1.B.1', '1.B.2'],
-                        'entities': ['CH4', 'CO2', 'CO', 'NMVOC', 'NOx', 'SO2'],
-                        'dim': 'category (BURDI)',
-                    },
-                    '2': {
-                        'basket': '2',
-                        'basket_contents': ['2.A', '2.B', '2.C', '2.D', '2.E', '2.G'],
-                        'entities': ['CH4', 'CO2', 'N2O', 'C2F6', 'CF4', 'HFC134',
-                                     'HFC23', 'SF6', 'CO', 'NMVOC', 'NOx', 'SO2'],
-                        'dim': 'category (BURDI)',
-                    },
-                    '4': {
-                        'basket': '4',
-                        'basket_contents': ['4.A', '4.B', '4.C', '4.D', '4.E', '4.F'],
-                        'entities': ['CH4', 'CO2', 'N2O', 'CO', 'NMVOC', 'NOx', 'SO2'],
-                        'dim': 'category (BURDI)',
-                    },
-                    '5': {
-                        'basket': '5',
-                        'basket_contents': ['5.A', '5.B', '5.C', '5.D'],
-                        'entities': ['CH4', 'CO2', 'N2O', 'CO', 'NOx'],
-                        'dim': 'category (BURDI)',
-                    },
-                    '6': {
-                        'basket': '6',
-                        'basket_contents': ['6.A', '6.B', '6.C'],
-                        'entities': ['CH4', 'CO2', 'N2O', 'CO', 'NMVOC', 'NOx', 'SO2'],
-                        'dim': 'category (BURDI)',
-                    },
-                },
-            },
-        },
-    },
-    # DJI: 1994, 2000
-    'DMA': {
-        'DI2023-05-24': {  # 1994, 2000, (2001-2017, some dwn)
-            'remove_ts': {
-                'waste_CH4': { # 1994 very inconsistent
-                    'category': ['6', '6.A', '6.B', '15163', '24540'],
-                    'entities': [f'KYOTOGHG ({gwp_to_use})', 'CH4'],
-                    'time': ['1994'],
-                },
-            },
-            # LULUCF has gaps, cat 0 assumes 0 for LULUCF in these years
-            # we omit aerosols and GHG precursors as only SO2 can be downscaled
-            'downscale': {
-                'sectors': {
-                    '1_CH4': {
-                        'basket': '1',
-                        'basket_contents': ['1.A', '1.B'],
-                        'entities': ['CH4', 'N2O'],
-                        'dim': 'category (BURDI)',
-                        'sel': {'time': ['1994', '2000', '2001', '2002', '2003',
-                                         '2004', '2005']},
-                    },
-                    '1_CO2': {
-                        'basket': '1',
-                        'basket_contents': ['1.A', '1.B'],
-                        'entities': ['CO2'],
-                        'dim': 'category (BURDI)',
-                    },
-                    '1.A': {
-                        'basket': '1.A',
-                        'basket_contents': ['1.A.1', '1.A.2', '1.A.3', '1.A.4',
-                                            '1.A.5'],
-                        'entities': ['CH4', 'N2O'],
-                        'dim': 'category (BURDI)',
-                        'sel': {'time': ['1994', '2000', '2001', '2002', '2003',
-                                         '2004', '2005']},
-                    },
-                    '2': {
-                        'basket': '2',
-                        'basket_contents': ['2.A', '2.F'],
-                        'entities': ['CO2', f'HFCS ({gwp_to_use})'],
-                        'dim': 'category (BURDI)',
-                    },
-                    'bunkers': {
-                        'basket': '14637',
-                        'basket_contents': ['14423', '14424'],
-                        'entities': ['CO2'],
-                        'dim': 'category (BURDI)',
-                    },
-                },
-            },
-            'basket_copy': {
-                'GWPs_to_add': ["AR4GWP100", "AR5GWP100", "AR6GWP100"],
-                'entities': ["UnspMixOfHFCs"],
-                'source_GWP': gwp_to_use,
-            },
-        },
-    },
-    # DOM: # 1990, 1994, 1998, 2000, 2010
-    # DZA: 1994, 2000
-    'ECU': {
-        'DI2023-05-24': { # 1990 (1994, 2000), 2010, 2012
-            # omit aerosols / GHG precursors in downscaling
-            'remove_years': ['1990'],
-            'downscale': {
-                'sectors': {
-                    '1': {
-                        'basket': '1',
-                        'basket_contents': ['1.A', '1.B'],
-                        'entities': [f'KYOTOGHG ({gwp_to_use})'],
-                        'dim': 'category (BURDI)',
-                    },
-                    '1.A': {
-                        'basket': '1.A',
-                        'basket_contents': ['1.A.1', '1.A.2', '1.A.3', '1.A.4',
-                                            '1.A.5'],
-                        'entities': [f'KYOTOGHG ({gwp_to_use})'],
-                        'dim': 'category (BURDI)',
-                    },
-                    '1.B': {
-                        'basket': '1.B',
-                        'basket_contents': ['1.B.1', '1.B.2'],
-                        'entities': [f'KYOTOGHG ({gwp_to_use})'],
-                        'dim': 'category (BURDI)',
-                    },
-                    '2': {
-                        'basket': '2',
-                        'basket_contents': ['2.A', '2.B', '2.C', '2.D', '2.G'],
-                        'entities': [f'KYOTOGHG ({gwp_to_use})'],
-                        'dim': 'category (BURDI)',
-                    },
-                    '4': {
-                        'basket': '4',
-                        'basket_contents': ['4.A', '4.B', '4.C', '4.D', '4.E',
-                                            '4.F', '4.G'],
-                        'entities': [f'KYOTOGHG ({gwp_to_use})'],
-                        'dim': 'category (BURDI)',
-                    },
-                    '5': {
-                        'basket': '5',
-                        'basket_contents': ['5.A', '5.B', '5.C', '5.D'],
-                        'entities': [f'KYOTOGHG ({gwp_to_use})'],
-                        'dim': 'category (BURDI)',
-                    },
-                    '6': {
-                        'basket': '6',
-                        'basket_contents': ['6.A', '6.B', '6.D'],
-                        'entities': [f'KYOTOGHG ({gwp_to_use})'],
-                        'dim': 'category (BURDI)',
-                    },
-                },
-                'entities': {
-                    'KYOTO': {
-                        'basket': f'KYOTOGHG ({gwp_to_use})',
-                        'basket_contents': ['CH4', 'CO2', 'N2O'],
-                        'sel': {'category (BURDI)':
-                                    ['15163', '24540',
-                                     '1', '1.A', '1.A.1', '1.A.2', '1.A.3', '1.A.4',
-                                     '1.A.5', '1.B',  '1.B.1',  '1.B.2',
-                                     '2', '2.A', '2.B', '2.C', '2.D', '2.G',
-                                     '4', '4.A', '4.B', '4.C', '4.D', '4.E', '4.F',
-                                     '4.G',
-                                     '5', '5.A', '5.B', '5.C', '5.D',
-                                     '6', '6.A', '6.B', '6.D', '7']}
-                    },
-                },
-            },
-        },
-    },
-    'EGY': {
-        'DI2023-05-24': { # 1990, 2000, 2005
-            # omit aerosols / GHG precursors in downscaling
-            'remove_ts': {
-                '2.G': { # all in 2.G in 1990
-                    'category': ['2.G'],
-                    'entities': [f'KYOTOGHG ({gwp_to_use})', 'CO2', 'N2O'],
-                },
-            },
-            'downscale': {
-                'sectors': {
-                    '2': {
-                        'basket': '2',
-                        'basket_contents': ['2.A', '2.B', '2.C'],
-                        'entities': ['CO2', 'N2O'],
-                        'dim': 'category (BURDI)',
-                    },
-                },
-            },
-            'basket_copy': {
-                'GWPs_to_add': ["AR4GWP100", "AR5GWP100", "AR6GWP100"],
-                'entities': ["UnspMixOfHFCs"],
-                'source_GWP': gwp_to_use,
-            },
-        },
-    },
-    'ERI': {
-        'DI2023-05-24': { # 1994, 1995-1999 (partial coverage, KYOTOGHG and total are incomplete), 2000
-            'remove_ts': {
-                'energy_N2O': { # very high in 1994
-                    'category': ['1', '1.A','15163', '24540'],
-                    'entities': ['N2O', f'KYOTOGHG ({gwp_to_use})'],
-                    'time': ['1994'],
-                },
-            },
-        },
-    },
-    'ETH': {
-        'DI2023-05-24': { # 1990-1993 (downscaling needed), 1994-2013
-            'downscale': {
-                # omit aerosols / ghg precursors as missing for most years
-                'sectors': { # for 1990-1994
-                    '1.A': {
-                        'basket': '1.A',
-                        'basket_contents': ['1.A.1', '1.A.2', '1.A.3', '1.A.4'],
-                        'entities': ['CO2', 'CH4', 'N2O'],
-                        'dim': 'category (BURDI)',
-                    },
-                    '2': {
-                        'basket': '2',
-                        'basket_contents': ['2.A', '2.B', '2.C'],
-                        'entities': ['CO2'],
-                        'dim': 'category (BURDI)',
-                    },
-                    '4': {
-                        'basket': '4',
-                        'basket_contents': ['4.A', '4.B', '4.C', '4.D', '4.E'],
-                        'entities': ['CH4', 'N2O'],
-                        'dim': 'category (BURDI)',
-                    },
-                    '6': {
-                        'basket': '6',
-                        'basket_contents': ['6.A', '6.B', '6.C'],
-                        'entities': ['CH4', 'N2O'],
-                        'dim': 'category (BURDI)',
-                    },
-                    'bunkers': {
-                        'basket': '14637',
-                        'basket_contents': ['14424'],
-                        'entities': ['CO2', f'KYOTOGHG ({gwp_to_use})'],
-                        'dim': 'category (BURDI)',
-                    },
-                },
-                'entities': {
-                    'bunkers': {
-                        'basket': f'KYOTOGHG ({gwp_to_use})',
-                        'basket_contents': ['CH4', 'CO2', 'N2O'],
-                        'sel': {'category (BURDI)': ['14637', '14424']}
-                    },
-                },
-            },
-        },
-    },
-    # FJI: 1994, 2000
-    # FSM: 1994, 2000
-    # GAB: 1994, 2000 (more data in NIR)
-    # from here down aerosols and GHG precursors are always omitted in downscaling
-    # GEO:
-    'GEO': {
-        'DI2023-05-24': { # 1990-1997, 2000, 2000-2013 (more data in NC4)
-            'downscale': {
-                'sectors': { # for 1991-1997
-                    '1.A': {
-                        'basket': '1.A',
-                        'basket_contents': ['1.A.1', '1.A.2', '1.A.3', '1.A.4',
-                                            '1.A.5'],
-                        'entities': ['CO2', 'CH4', 'N2O'],
-                        'dim': 'category (BURDI)',
-                    },
-                    '2': {
-                        'basket': '2',
-                        'basket_contents': ['2.A', '2.B', '2.C', '2.D', '2.E', '2.F',
-                                            '2.G'],
-                        'entities': ['CO2', 'CH4', 'N2O', 'C2F6', 'CF4', 'HFC125',
-                                     'HFC134', 'HFC134a', 'HFC32', 'SF6'],
-                        'dim': 'category (BURDI)',
-                    },
-                    '4': {
-                        'basket': '4',
-                        'basket_contents': ['4.A', '4.B', '4.C', '4.D', '4.E', '4.F',
-                                            '4.G'],
-                        'entities': ['CH4', 'N2O'],
-                        'dim': 'category (BURDI)',
-                    },
-                    # 5 subsectors are chaotic
-                    '6': {
-                        'basket': '6',
-                        'basket_contents': ['6.A', '6.B', '6.D'],
-                        'entities': ['CH4'],
-                        'dim': 'category (BURDI)',
-                    },
-                },
-            },
-        },
-    },
-    # GHA: 1990-2006
-    # GIN: 1994, 2000
-    'GMB': {
-        'DI2023-05-24': { # 1993, 2000
-            'remove_ts': {
-                'waste': { # very high in 1993
-                    'category': ['6', '6.A', '6.B', '15163', '24540'],
-                    'entities': ['CH4', 'N2O', f'KYOTOGHG ({gwp_to_use})'],
-                    'time': ['1993'],
-                },
-            },
-            'basket_copy': {
-                'GWPs_to_add': ["AR4GWP100", "AR5GWP100", "AR6GWP100"],
-                'entities': ["UnspMixOfHFCs"],
-                'source_GWP': gwp_to_use,
-            },
-        }
-    },
-    'GNB': {
-        'DI2023-05-24': {
-            'remove_ts': {
-                'energy_nonCO2': { # very high in 2006
-                    'category': ['1', '1.A', '15163', '24540'],
-                    'entities': ['CH4', 'N2O', f'KYOTOGHG ({gwp_to_use})'],
-                    'time': ['2006'],
-                },
-            },
-        },
-    },
-    # GNQ: no data
-    'GRD': { # 1994, limited coverage
-        'DI2023-05-24': {
-            'remove_ts': {
-                'agri, waste': { # inconsistent with other sources
-                    'category': ['4', '4.A', '4.B', '4.D', '6', '6.A',
-                                 '15163', '24540'],
-                    'entities': ['CH4', 'N2O', f'KYOTOGHG ({gwp_to_use})'],
-                    'time': ['1994'],
-                },
-            },
-        },
-    },
-    # GTM: 1990, 1994, 2000, 2005,
-    # GUY: 1990-2004
-    'HND': {
-        'DI2023-05-24': { # 1995, 2000, 2005, 2015
-            'remove_ts': {
-                'waste': { # inconsistent
-                    'category': ['6', '6.B', '6.C', '6.D', '15163', '24540'],
-                    'entities': ['N2O', 'CH4', 'CO2', f'KYOTOGHG ({gwp_to_use})'],
-                    'time': ['1995', '2000'],
-                },
-                'livestock': { # inconsistent
-                    'category': ['4.B', '4'],
-                    'entities': ['N2O', f'KYOTOGHG ({gwp_to_use})'],
-                    'time': ['2000'],
-                },
-            },
-        },
-    },
-    # HTI: 1994-2000
-    'IDN': {
-        'DI2023-05-24': { # 1990-1994, 2000
-            'downscale': {
-                'sectors': { # for 1990-1993
-                    '1.B': {
-                        'basket': '1.B',
-                        'basket_contents': ['1.B.1', '1.B.2'],
-                        'entities': ['CH4', 'CO2'],
-                        'dim': 'category (BURDI)',
-                        'sel': {'time': ['1990', '1991', '1992', '1993', '1994']},
-                    },
-                    '4': {
-                        'basket': '4',
-                        'basket_contents': ['4.A', '4.B', '4.C', '4.D', '4.E', '4.F',
-                                            '4.G'],
-                        'entities': ['CH4', 'N2O'],
-                        'dim': 'category (BURDI)',
-                        'sel': {'time': ['1990', '1991', '1992', '1993', '1994']},
-                    },
-                },
-            },
-        },
-    },
-    'IND': {
-        'DI2023-05-24': { # 1994,2000, 2010, 2016. Subsectors differ a bit especially
-            # for 1994 and for LULUCF data
-            'remove_ts': {
-                '2C': { # inconsistent with other sources
-                    'category': ['2.C', '2', '15163', '24540'],
-                    'entities': ['CH4', 'N2O', f'KYOTOGHG ({gwp_to_use})', 'CO2',
-                                 'C2F6', 'CF4', f'PFCS ({gwp_to_use})', 'SF6' ],
-                    'time': ['1994'],
-                },
-            },
-            'downscale': {
-                'sectors': { # for 1994
-                    '1.A': {
-                        'basket': '1.A',
-                        'basket_contents': ['1.A.1', '1.A.2', '1.A.3', '1.A.4'],
-                        'entities': ['CH4'],
-                        'dim': 'category (BURDI)',
-                        'sel': {'time': ['1994', '2000']},
-                    },
-                },
-            },
-        },
-    },
-    # ISR: sf6 in 2008 is very high, but it's from BUR1
-    'JAM': { # 1994, 2006-2010, 2012
-        'DI2023-05-24': {
-            'remove_ts': {
-                'agri, waste': { # inconsistent with other sources
-                    'category': ['4', '4.A', '4.B', '4.D', '6', '6.A',
-                                 '15163', '24540'],
-                    'entities': ['CH4', 'N2O', f'KYOTOGHG ({gwp_to_use})'],
-                    'time': ['1994'],
-                },
-            },
-            'basket_copy': {
-                'GWPs_to_add': ["AR4GWP100", "AR5GWP100", "AR6GWP100"],
-                'entities': ["UnspMixOfHFCs", "UnspMixOfPFCs"],
-                'source_GWP': gwp_to_use,
-            },
-        },
-    },
-    # JOR: M.AG in 2000 is very low but it's like that in NC2 and no comment on error
-    # in comparison in NC3
-    'JOR': {
-        'DI2023-05-24': {
-            'remove_ts': {
-                'agri_2000': { # data are like that in NC2, but completely
-                    # inconsistent with NC1,3
-                    'category': ['4', '4.A', '4.B', '24540', '15163'],
-                    'entities': ['CH4', f'KYOTOGHG ({gwp_to_use})'],
-                    'time': ['2000'],
-                },
-                'agri_1994': { # inconsistent with later submissions
-                    'category': ['4', '4.B', '4.C', '4.E', '4.F', '4.G',
-                                 '24540', '15163'],
-                    'entities': ['CH4', f'KYOTOGHG ({gwp_to_use})'],
-                    'time': ['1994'],
-                },
-                'waste': {
-                    'category': ['6', '6.A', '6.B', '6.C', '6.D', '15163', '24540'],
-                    'entities': ['CH4', f'KYOTOGHG ({gwp_to_use})'],
-                    'time': ['1994'],
-                },
-            },
-            'basket_copy': {
-                'GWPs_to_add': ["AR4GWP100", "AR5GWP100", "AR6GWP100"],
-                'entities': ["UnspMixOfHFCs"],
-                'source_GWP': gwp_to_use,
-            },
-        }
-    },
-    'KEN': {
-        'DI2023-05-24': { # 1994, 1995, 2000, 2005, 2010. Subsectors differ a bit
-            # especially for 1994
-            # 1994 data is inconsistent with 1995 and following years and has
-            # unrealistically high N2O emissions from the energy sector
-            'remove_years': ['1994'],
-            'aggregate_cats': {
-                '1.B': {'sources': ['1.B.2'],
-                     'name': '1.B  Fugitive Emissions from Fuels'},
-            },
-        },
-    },
-    # KGZ: 1990-2010
-    # KHM: 1994, 2000 (more data in BUR1)
-    'KIR': { # 1994, (2004,2005 partial coverage), 2006-2008
-        'DI2023-05-24': {
-            'remove_ts': {
-                'agri_n2O': { # very high compared to CH4 and total emissions
-                    'category': ['4', '4.B',
-                                 '15163', '24540'],
-                    'entities': ['N2O', f'KYOTOGHG ({gwp_to_use})'],
-                },
-            },
-        },
-    },
-    # KNA: 1994
-    # KOR: 1990-2018 (more data in 2022 inventory)
-    # KWT: 1994, 2016
-    # LAO: 1990, 2000 (1990 data maybe inconsistent)
-    # LBN: 1994, 2000, 2011-2013
-    # LBR: 2000, 2014 (2000 misses some sectors, e.g. IPPU)
-    # LBY: no data
-    'LCA': {
-        'DI2023-05-24': { #1994, 2000, 2005, 2010, sectors a bit inconsistent for 1994
-            # 1994 data waste CH4
-            'remove_ts': {
-                'waste': { # very high in 1994
-                    'category': ['6', '6.A', '6.B', '6.D'],
-                    'entities': ['CH4', 'N2O', f'KYOTOGHG ({gwp_to_use})'],
-                    'time': ['1994'],
-                },
-            },
-            'basket_copy': {
-                'GWPs_to_add': ["AR4GWP100", "AR5GWP100", "AR6GWP100"],
-                'entities': ["UnspMixOfHFCs"],
-                'source_GWP': gwp_to_use,
-            },
-        },
-    },
-    # LKA: 1994, 2000. a bit inconsistent in subsectors (all emissions in "other"
-    # in 1994 for some sectors)
-    'LSO': {
-        'DI2023-05-24': { # 1994,2000, 2000 needs downscaling
-            'downscale': {
-                'sectors': { # for 2000
-                    '1': {
-                        'basket': '1',
-                        'basket_contents': ['1.A'],
-                        'entities': ['CH4', 'CO2', 'N2O'],
-                        'dim': 'category (BURDI)',
-                    },
-                    '1.A': {
-                        'basket': '1.A',
-                        'basket_contents': ['1.A.2', '1.A.3', '1.A.4', '1.A.5'],
-                        'entities': ['CH4', 'CO2', 'N2O'],
-                        'dim': 'category (BURDI)',
-                    },
-                    '4': {
-                        'basket': '4',
-                        'basket_contents': ['4.A', '4.B', '4.D', '4.E'],
-                        'entities': ['CH4', 'N2O'],
-                        'dim': 'category (BURDI)',
-                    },
-                    '5': {
-                        'basket': '5',
-                        'basket_contents': ['5.A', '5.B', '5.C', '5.D'],
-                        'entities': ['CO2'],
-                        'dim': 'category (BURDI)',
-                    },
-                    '6': {
-                        'basket': '6',
-                        'basket_contents': ['6.A', '6.B'],
-                        'entities': ['CH4', 'N2O'],
-                        'dim': 'category (BURDI)',
-                    },
-                },
-            },
-        },
-    },
-    'MAR': { # TODO IPPU subsectors chaotic (swap between other and metal production)
-        'DI2023-05-24': { # 1994,2000, (2000-2006,2007 needs downscaling), 2010, 2012
-            'downscale': {
-                'sectors': {
-                    '1.B': {
-                        'basket': '1.B',
-                        'basket_contents': ['1.B.1', '1.B.2'],
-                        'entities': [f'KYOTOGHG ({gwp_to_use})'],
-                        'dim': 'category (BURDI)',
-                    },
-                    '5': {
-                        'basket': '5',
-                        'basket_contents': ['5.A', '5.B'],
-                        'entities': [f'KYOTOGHG ({gwp_to_use})'],
-                        'dim': 'category (BURDI)',
-                        'tolerance': 0.018, # LULUCF data inconsistent in 2012
-                    },
-                },
-                'entities': {
-                    'all': {
-                        'basket': f'KYOTOGHG ({gwp_to_use})',
-                        'basket_contents': ['CH4', 'CO2', 'N2O'],
-                        'sel': {'category (BURDI)': [
-                            '1', '2', '4', '5', '6', '15163', '24540',
-                            '1.A', '1.A.1', '1.A.2', '1.A.3', '1.A.4',
-                            '1.B', '1.B.1', '1.B.2',
-                            '2.A', '2.C', '2.D',
-                            '4.A', '4.B', '4.C', '4.D',
-                            '5.A', '5.B',
-                            '6.A', '6.B', '6.D',
-                        ]}
-                    },
-                },
-            },
-        },
-    },
-    # MDA: 1990-2013 (more data in NIR / NC5)
-    'MDG': {
-        'DI2023-05-24': { # 1994,2000, 2005-2010 (2006-2010 needs downscaling)
-            'remove_ts': {
-                'MAGELV_CH4': { # data from NCs 1 and 2 much lower than NC3 data. we
-                    # also have to remove field burning of agricultural residues and
-                    # prescribed burning of savannas as they are summed
-                    'category': ['4', '4.C', '4.E', '4.F', '15163', '24540'],
-                    'entities': ['CH4', 'N2O', f'KYOTOGHG ({gwp_to_use})'],
-                    'time': ['1994', '2000'],
-                },
-            },
-            'downscale': {
-                'sectors': {
-                    '1': {
-                        'basket': '1',
-                        'basket_contents': ['1.A'],
-                        'entities': ['CO2', 'CH4', 'N2O'],
-                        'dim': 'category (BURDI)',
-                        'sel': {'time': ['2005', '2006', '2007', '2008', '2009',
-                                         '2010']},
-                    },
-                    # further downscaling is not possible in a consistent manner with
-                    # current code (and not necessary for primap-hist). Using the
-                    # 2005 subsector information would lead to individual gas
-                    # timeseries which are inconsistent with given kyotoghg subsector
-                    # timeseries while using the kyotoghg subsector information will
-                    # not give individual gas subsector timeseries which add up to
-                    # the individual gas main sector timeseries
-                    # same for 6
-                },
-                'entities': {
-                    'kyotoghg_4': { # in general similar problem to 1.A, but most sectors have
-                        # only one gas and we need the data for PRIMAP-hist,
-                        # so we have to do it anyway
-                        'basket': f'KYOTOGHG ({gwp_to_use})',
-                        'basket_contents': ['CH4', 'N2O'],
-                        'sel': {
-                            'category (BURDI)': [
-                                '4.A', '4.B', '4.C', '4.D', '4.E', '4.F'],
-                            'time': [
-                                '2005', '2006', '2007', '2008', '2009','2010'],
-                        }
-                    },
-                },
-            },
-        },
-    },
-    # MDV: 1994 (only few sectors), 2011-2015
-    # MEX: more data in BURs 2 and 3
-    # MHL: 2000, 2005, 2010
-    # MKD:
-    'MKD': {
-        'DI2023-05-24': {  # 1990-2009
-            'downscale': {
-                'entities': {
-                    'FGASES': {
-                        'basket': f'FGASES ({gwp_to_use})',
-                        'basket_contents': [f'HFCS ({gwp_to_use})'],
-                    },
-                    'HFC': {
-                        'basket': f'HFCS ({gwp_to_use})',
-                        'basket_contents': [f'UnspMixOfHFCs ({gwp_to_use})'],
-                    },
-                },
-            },
-            'basket_copy': {
-                'GWPs_to_add': ["AR4GWP100", "AR5GWP100", "AR6GWP100"],
-                'entities': ["UnspMixOfHFCs"],
-                'source_GWP': gwp_to_use,
-            },
-        },
-    },
-    'MLI': {
-        'DI2023-05-24': {  # 1995,2000, 2005
-            'downscale': {
-                'sectors': {
-                    '1.A': {
-                        'basket': '1.A',
-                        'basket_contents': ['1.A.1', '1.A.2', '1.A.3', '1.A.4'],
-                        'entities': ['CO2', 'CH4', 'N2O'],
-                        'dim': 'category (BURDI)',
-                        'sel': {'time': ['1995', '2000']},
-                    },
-                },
-                'entities': {
-                    'FGASES': {
-                        'basket': f'FGASES ({gwp_to_use})',
-                        'basket_contents': [f'HFCS ({gwp_to_use})'],
-                    },
-                    'HFC': {
-                        'basket': f'HFCS ({gwp_to_use})',
-                        'basket_contents': [f'UnspMixOfHFCs ({gwp_to_use})'],
-                    },
-                },
-            },
-            'basket_copy': {
-                'GWPs_to_add': ["AR4GWP100", "AR5GWP100", "AR6GWP100"],
-                'entities': ["UnspMixOfHFCs"],
-                'source_GWP': gwp_to_use,
-            },
-        },
-    },
-    'MMR': {
-        'DI2023-05-24': {  # 2000-2005
-            'downscale': {
-                'sectors': {
-                    '2': {
-                        'basket': '2',
-                        'basket_contents': ['2.A', '2.B', '2.C', '2.D'],
-                        'entities': ['CO2'],
-                        'dim': 'category (BURDI)',
-                    },
-                },
-                'entities': {
-                    'kyotoghg_5': {
-                        'basket': f'KYOTOGHG ({gwp_to_use})',
-                        'basket_contents': ['CO2', 'CH4', 'N2O'],
-                        'sel': {
-                            'category (BURDI)': [
-                                '5'],
-                        }
-                    },
-                },
-            },
-            'basket_copy': {
-                'GWPs_to_add': ["AR4GWP100", "AR5GWP100", "AR6GWP100"],
-                'entities': ["UnspMixOfHFCs"],
-                'source_GWP': gwp_to_use,
-            },
-        },
-    },
-    # MNE: more data in BUR3
-    # MNG: 1990-1998, 2006. Some details missing in 1990-1998 but too disconnected
-    # from 2006 data to use that for downscaling
-    # MOZ: 1990, 1994
-    # MRT: more data in BUR 1 and 2
-    'MUS': {
-        'DI2023-05-24': { #1995, 200-2006, 2013
-            'remove_ts': {
-                'waste': { # 1995 inconsistent
-                    'category': ['6', '6.A', '6.B', '6.C', '6.D'],
-                    'entities': ['CO2', 'CH4', 'N2O', f'KYOTOGHG ({gwp_to_use})'],
-                    'time': ['1995'],
-                },
-            },
-            'basket_copy': {
-                'GWPs_to_add': ["AR4GWP100", "AR5GWP100", "AR6GWP100"],
-                'entities': ["UnspMixOfHFCs", "UnspMixOfPFCs"],
-                'source_GWP': gwp_to_use,
-            },
-        },
-    },
-    # MWI: 1990, 1994. inconsistency in 1.B.1: 1994: CO2, 1990: CH4
-    # MYS: more data in BUR 3, 4
-    # NAM: more data in BUR 2, 3
-    # NER: 1990, 2000, 2008
-    # NGA: more data in NIR
-    # NIC: 1994, 2000: LU data inconsistent (5.A missing in 2000)
-    # NIU: 1990, 2000, 2005-2009
-    # NPL: 1994, 2000
-    # NRU: 1994, 2000, 2003, 2007, 2010. Subsectors (e.g. 1.A.x) sometimes inconsistent
-    # OMN: more data in BUR1
-    # PAK: 1994, 2008, 2012, 2015 (very limited data)
-    # PAN: more data in NIR, BUR2
-    # PER: 1994, 2000, 2010, 2012
-    # PNG: 1994, 2000 inconsistent sector coverage
-    'PHL': {
-        'DI2023-05-24': {  # 1994, 2000
-            'downscale': {
-                'sectors': {
-                    '6': {
-                        'basket': '6',
-                        'basket_contents': ['6.A', '6.B'],
-                        'entities': [f'KYOTOGHG ({gwp_to_use})'],
-                        'dim': 'category (BURDI)',
-                    },
-                },
-                'entities': {
-                    'kyotoghg_56': {
-                        'basket': f'KYOTOGHG ({gwp_to_use})',
-                        'basket_contents': ['CH4', 'N2O'],
-                        'sel': {
-                            'category (BURDI)': ['6', '6.A', '6.B'],
-                        }
-                    },
-                },
-            },
-        },
-    },
-    'PLW': { # 1994, 1995-1999 (partial), 2000, 2005
-        'DI2023-05-24': {
-            'remove_ts': {
-                'waste': {
-                    'category': ['6', '6.A', '6.B', '6.C', '6.D', '15163', '24540'],
-                    'entities': ['CO2', 'CH4', 'N2O', f'KYOTOGHG ({gwp_to_use})'],
-                    'time': ['1994'],
-                },
-            },
-            'remove_years': ['1995', '1996', '1997', '1998', '1999'],
-            # only few sectors covered and data found neither in NC1 nor NC2
-        },
-    },
-    # PRK: 1990, 1994, 2000, 2002
-    # PRY: 1990, 1994, 2000, 2005, 2011, 2012, 2015, 2017 land use sectors not
-    # consistent, more data in BUR3 but not read yet
-    # PSE: 2011 only
-    # QAT: 2007 only
-    'RWA': {
-        'DI2023-05-24': {  # 2002, 2005
-            'downscale': {
-                'sectors': {
-                    '1.A': {
-                        'basket': '1.A',
-                        'basket_contents': ['1.A.1', '1.A.2', '1.A.3', '1.A.4'],
-                        'entities': ['CO2', 'CH4', 'N2O'],
-                        'dim': 'category (BURDI)',
-                    },
-                },
-            },
-        },
-    },
-    # SAU: 1990, 2000, 2010, 2012
-    # SDN: 1995, 2000 subsectors inconsistent
-    # SEN: 2000, 2005 subsectors inconsistent
-    # SGP: 1994, 2000, 2010, 2012 for 1994 sectors a bit inconsistent
-    'SLB': {
-        'DI2023-05-24': {  # 1994 (energy CO2 only), 2000, 2005, 2010 (5, 10 need downscaling)
-            'downscale': {
-                'entities': {
-                    'kyotoghg': {
-                        'basket': f'KYOTOGHG ({gwp_to_use})',
-                        'basket_contents': ['CO2', 'CH4', 'N2O'],
-                        'sel': {
-                            'category (BURDI)': [
-                                '1', '1.A', '1.A.1', '1.A.3', '1.A.4',
-                                '1.B', '1.B.1', '1.B.2',
-                                '4', '4.A', '4.B', '4.C', '4.D',
-                                '6', '6.A', '6.B',
-                                '14424', '14637', '15163', '24540',
-                            ],
-                        }
-                    },
-                },
-            },
-        },
-    },
-    # SLE: no data
-    # SLV: 1994, 2005 subsectors a bit inconsistent
-    # SMR: 2007, 2010
-    # SOM: no data
-    # SSD: 2012-2015
-    'STP': {
-        'DI2023-05-24': {  # 1998 (dwn), 2005 (dwn), 2012:
-            'downscale': {
-                'entities': {
-                    'kyotoghg': {
-                        'basket': f'KYOTOGHG ({gwp_to_use})',
-                        'basket_contents': ['CO2', 'CH4', 'N2O'],
-                        'sel': {
-                            'category (BURDI)': [
-                                '1', '1.A', '1.A.1', '1.A.3', '1.A.4', '1.A.5',
-                                '1.B',
-                                '4', '4.A', '4.B', '4.D', '4.E', '4.F',
-                                '5', '5.A', '5.B', '5.C', '5.D',
-                                '6', '6.A', '6.B',
-                                '14423', '14424', '14637', '14638', '15163', '24540',
-                            ],
-                        }
-                    },
-                },
-            },
-        },
-    },
-    # SUR: 2003
-    # SYC: 1995 (partial), 2000
-    # SYR: 1994-2005: external key needed
-    'TCD': {
-        'DI2023-05-24': {  # 1993, 1998-2003, 2010 sector coverage inconsistent
-            # LULUCF data with sum errors
-            'downscale': {
-                'sectors': {
-                    '1': {
-                        'basket': '1',
-                        'basket_contents': ['1.A'],
-                        'entities': ['CO2'],
-                        'dim': 'category (BURDI)',
-                    },
-                    '4': {
-                        'basket': '4',
-                        'basket_contents': ['4.A', '4.B', '4.C', '4.D', '4.E', '4.F'],
-                        'entities': ['CH4', 'N2O'],
-                        'dim': 'category (BURDI)',
-                    },
-                },
-            },
-            'remove_ts': {
-                'M.AG.ELV': {
-                    'category': ['4', '4.D', '4.E', '4.F', '15163', '24540'],
-                    'entities': ['N2O', f'KYOTOGHG ({gwp_to_use})'],
-                    'time': ['1993'],
-                },
-            },
-        },
-    },
-    # TGO: more data in BUR / NIR, 1992-1998, 2000, 2005, 2010, 2013-2018 (
-    # downscaling needed for some years, inconsistent detail)
-    # THA: 1994 (2000-2013, extensive downscaling needed for 2000-2012).
-    'THA': {
-        'DI2023-05-24': {
-            'downscale': {
-                # main sectors present as KYOTOGHG sum. subsectors need to be downscaled
-                # TODO: downscale CO, NOx, NMVOC, SO2 (national total present)
-                'sectors': {
-                    '1': {
-                        'basket': '1',
-                        'basket_contents': ['1.A', '1.B'],
-                        'entities': [f'KYOTOGHG ({gwp_to_use})'],
-                        'dim': 'category (BURDI)',
-                        'sel': {'time': ['2000', '2001', '2002', '2003', '2004',
-                                         '2005', '2006', '2007', '2008', '2009',
-                                         '2010', '2011', '2012', '2013']},
-                    },
-                    '1.A': {
-                        'basket': '1.A',
-                        'basket_contents': ['1.A.1', '1.A.2', '1.A.3', '1.A.4'],
-                        'entities': [f'KYOTOGHG ({gwp_to_use})'],
-                        'dim': 'category (BURDI)',
-                        'sel': {'time': ['2000', '2001', '2002', '2003', '2004',
-                                         '2005', '2006', '2007', '2008', '2009',
-                                         '2010', '2011', '2012', '2013']},
-                    },
-                    '1.B': {
-                        'basket': '1.B',
-                        'basket_contents': ['1.B.1', '1.B.2'],
-                        'entities': [f'KYOTOGHG ({gwp_to_use})'],
-                        'dim': 'category (BURDI)',
-                        'sel': {'time': ['2000', '2001', '2002', '2003', '2004',
-                                         '2005', '2006', '2007', '2008', '2009',
-                                         '2010', '2011', '2012', '2013']},
-                    },
-                    '2': {
-                        'basket': '2',
-                        'basket_contents': ['2.A', '2.B', '2.C', '2.D'],
-                        'entities': [f'KYOTOGHG ({gwp_to_use})'],
-                        'dim': 'category (BURDI)',
-                        'sel': {'time': ['2000', '2001', '2002', '2003', '2004',
-                                         '2005', '2006', '2007', '2008', '2009',
-                                         '2010', '2011', '2012', '2013']},
-                    },
-                    '4': {
-                        'basket': '4',
-                        'basket_contents': ['4.A', '4.B', '4.C', '4.D', '4.E',
-                                            '4.F'],
-                        'entities': [f'KYOTOGHG ({gwp_to_use})'],
-                        'dim': 'category (BURDI)',
-                        'sel': {'time': ['2000', '2001', '2002', '2003', '2004',
-                                         '2005', '2006', '2007', '2008', '2009',
-                                         '2010', '2011', '2012', '2013']},
-                    },
-                    '5': {
-                        'basket': '5',
-                        'basket_contents': ['5.A', '5.B', '5.C'],
-                        'entities': [f'KYOTOGHG ({gwp_to_use})'],
-                        'dim': 'category (BURDI)',
-                        'sel': {'time': ['2000', '2001', '2002', '2003', '2004',
-                                         '2005', '2006', '2007', '2008', '2009',
-                                         '2010', '2011', '2012', '2013']},
-                    },
-                    '6': {
-                        'basket': '6',
-                        'basket_contents': ['6.A', '6.B', '6.C'],
-                        'entities': [f'KYOTOGHG ({gwp_to_use})'],
-                        'dim': 'category (BURDI)',
-                        'sel': {'time': ['2000', '2001', '2002', '2003', '2004',
-                                         '2005', '2006', '2007', '2008', '2009',
-                                         '2010', '2011', '2012', '2013']},
-                    },
-                },
-                'entities': {
-                    'KYOTO': {
-                        'basket': f'KYOTOGHG ({gwp_to_use})',
-                        'basket_contents': ['CH4', 'CO2', 'N2O'],
-                        'sel': {
-                            'category (BURDI)': [
-                                '1', '1.A', '1.A.1', '1.A.2', '1.A.3', '1.A.4',
-                                '1.B', '1.B.1', '1.B.2',
-                                '2', '2.A', '2.B', '2.C', '2.D',
-                                '4', '4.A', '4.B', '4.C', '4.D', '4.E', '4.F',
-                                '5', '5.A', '5.B', '5.C',
-                                '6', '6.A', '6.B', '6.C',
-                                '15163', '24540',
-                            ],
-                            'time': ['2000', '2001', '2002', '2003', '2004',
-                                     '2005', '2006', '2007', '2008', '2009',
-                                     '2010', '2011', '2012', '2013']
-                        },
-                    },
-                },
-            },
-        },
-    },
-    # TJK 1990-2010
-    # TKM: 1994, 2000, 2004, 2010. subsectors a bit inconsistent
-    # TLS: 2010, also covered by NC2, but without full detail
-    # TON: 1994, 2000, 2006. subsectors a bit inconsistent
-    # TTO: 1990 only
-    # TUN: 1994, 2000
-    # TUV: 1994, 2014, many sectors missing / 0 (but maybe as there are no emissions)
-    # TZA: 1990, 1994
-    # UGA: 1994, 2000, subcategories a bit inconsistent
-    'URY': {
-        # remove data: CH4, 1998, 2002, 1
-        'DI2023-05-24': {
-            'downscale': {
-                'sectors': {
-                    '1': {
-                        'basket': '1',
-                        'basket_contents': ['1.A', '1.B'],
-                        'entities': ['CO2', 'CH4', 'N2O'],
-                        'dim': 'category (BURDI)',
-                    },
-                    '1.A': {
-                        'basket': '1.A',
-                        'basket_contents': ['1.A.1', '1.A.2', '1.A.3', '1.A.4',
-                                            '1.A.5'],
-                        'entities': ['CO2', 'CH4', 'N2O'],
-                        'dim': 'category (BURDI)',
-                    },
-                    '1.B': {
-                        'basket': '1.B',
-                        'basket_contents': ['1.B.1', '1.B.2'],
-                        'entities': ['CO2', 'CH4', 'N2O'],
-                        'dim': 'category (BURDI)',
-                    },
-                    '2_CO2CH4N2O': {
-                        'basket': '2',
-                        'basket_contents': ['2.A', '2.B', '2.C', '2.D', '2.G'],
-                        'entities': ['CO2', 'CH4', 'N2O'],
-                        'dim': 'category (BURDI)',
-                    },
-                    '2_FGASES': {
-                        'basket': '2',
-                        'basket_contents': ['2.C', '2.E', '2.F'],
-                        'entities': ['C2F6', 'CF4', 'HFC125', 'HFC134a', 'HFC143a',
-                                     'HFC152a', 'HFC227ea', 'HFC23', 'HFC32', 'SF6'],
-                        'dim': 'category (BURDI)',
-                    },
-                    '4': {
-                        'basket': '4',
-                        'basket_contents': ['4.A', '4.B', '4.C', '4.D', '4.E', '4.F',
-                                            '4.G'],
-                        'entities': ['CH4', 'N2O'],
-                        'dim': 'category (BURDI)',
-                    },
-                    '5': {
-                        'basket': '5',
-                        'basket_contents': ['5.A', '5.B', '5.C', '5.D', '5.E'],
-                        'entities': ['CO2', 'CH4', 'N2O'],
-                        'dim': 'category (BURDI)',
-                    },
-                    '6': {
-                        'basket': '6',
-                        'basket_contents': ['6.A', '6.B', '6.C', '6.D'],
-                        'entities': ['CO2', 'CH4', 'N2O'],
-                        'dim': 'category (BURDI)',
-                    },
-                },
-            },
-            'basket_copy': {
-                'GWPs_to_add': ["AR4GWP100", "AR5GWP100", "AR6GWP100"],
-                'entities': ["UnspMixOfPFCs"],
-                'source_GWP': gwp_to_use,
-            },
-        },
-    },
-    # UZB: 1990-2012
-    'VCT': {
-        'DI2023-05-24': { # 1990, 1994, 1997, 2000, 2004. Sector coverage a bit
-            # inconsistent. 1.A.x
-            'remove_ts': {
-                'agri': { # inconsistent probably from two submissions with different
-                    # methodology
-                    'category': ['4', '4.A', '4.B', '4.C', '4.D', '4.E', '4.F',
-                                 '15163', '24540'],
-                    'entities': ['CH4', 'N2O', f'KYOTOGHG ({gwp_to_use})'],
-                    'time': ['1990', '1994', '1997', '2000', '2004'],
-                },
-            },
-        },
-    },
-    # missing for CH4 but present for CO2. IPPU is 0, subsectors missing; downscaling
-    # doesn't work for all-0 / NaN timeseries
-    # VEN: 1999 only
-    # VNM: more data in BUR3
-    # VUT: more data in NC3
-    # WSM: more data in NC2
-    # YEM: 1995, 2000, 2010, 2012. subsectoral data a bit inconsistent, e.g. for 1.A.x
-    # ZAF: 1990, 1994
-    'ZMB': {
-        'DI2023-05-24': {  # 1994, 2000
-            'downscale': { # for 2000
-                'sectors': {
-                    '5': {
-                        'basket': '5',
-                        'basket_contents': ['5.B', '5.C'],
-                        'entities': ['CO2', 'CH4', 'N2O'],
-                        'dim': 'category (BURDI)',
-                    },
-                    '6': {
-                        'basket': '6',
-                        'basket_contents': ['6.A', '6.B'],
-                        'entities': ['CH4'],
-                        'dim': 'category (BURDI)',
-                    },
-                },
-            },
-            'basket_copy': {
-                'GWPs_to_add': ["AR4GWP100", "AR5GWP100", "AR6GWP100"],
-                'entities': ["UnspMixOfHFCs"],
-                'source_GWP': gwp_to_use,
-            },
-        },
-    },
-    # ZWE:
-    # 'ZWE': { # 1994, 2000, 2006 consistency of sectors and coverage does not look good,
-    # # especially for subsectors
-    #     'DI2023-05-24': {  # remove all years
-    #         'remove_years': ['1994', '2000', '2006'],
-    #     },
-    # },
-}
-
-di_processing_info = {
-    # only countries with special processing listed
-    # category conversion is defined on a country group level
-    # the 'default' option is used if no specific option is found such that
-    # processing of new versions can be done before creating a configuration for the
-    # version.
-    'ALB': {
-        'default': di_processing_templates['ALB']['DI2023-05-24'],
-        'DI2023-05-24': di_processing_templates['ALB']['DI2023-05-24'],
-    },
-    'ARE': {
-        'default': di_processing_templates['ARE']['DI2023-05-24'],
-        'DI2023-05-24': di_processing_templates['ARE']['DI2023-05-24'],
-    },
-    'ARG': {
-        'default': di_processing_templates['general']['copyUnspHFCUnspPFC'],
-        'DI2023-05-24': di_processing_templates['general']['copyUnspHFCUnspPFC'],
-    },
-    'AZE': {
-        'default': di_processing_templates['AZE']['DI2023-05-24'],
-        'DI2023-05-24': di_processing_templates['AZE']['DI2023-05-24'],
-    },
-    'BDI': {
-        'default': di_processing_templates['BDI']['DI2023-05-24'],
-        'DI2023-05-24': di_processing_templates['BDI']['DI2023-05-24'],
-    },
-    'BFA': {
-        'default': di_processing_templates['BFA']['DI2023-05-24'],
-        'DI2023-05-24': di_processing_templates['BFA']['DI2023-05-24'],
-    },
-    'BHS': {
-        'default': di_processing_templates['BHS']['DI2023-05-24'],
-        'DI2023-05-24': di_processing_templates['BHS']['DI2023-05-24'],
-    },
-    'BIH': {
-        'default': di_processing_templates['BIH']['DI2023-05-24'],
-        'DI2023-05-24': di_processing_templates['BIH']['DI2023-05-24'],
-    },
-    'BLZ': {
-        'default': di_processing_templates['BLZ']['DI2023-05-24'],
-        'DI2023-05-24': di_processing_templates['BLZ']['DI2023-05-24'],
-    },
-    'BOL': {
-        'default': di_processing_templates['general']['copyUnspHFCUnspPFC'],
-        'DI2023-05-24': di_processing_templates['general']['copyUnspHFCUnspPFC'],
-    },
-    'BRB': {
-        'default': di_processing_templates['BRB']['DI2023-05-24'],
-        'DI2023-05-24': di_processing_templates['BRB']['DI2023-05-24'],
-    },
-    'BRN': {
-        'default': di_processing_templates['general']['copyUnspHFC'],
-        'DI2023-05-24': di_processing_templates['general']['copyUnspHFC'],
-    },
-    'CHL': {
-        'default': di_processing_templates['general']['copyUnspHFC'],
-        'DI2023-05-24': di_processing_templates['general']['copyUnspHFC'],
-    },
-    'CHN': {
-        'default': di_processing_templates['CHN']['DI2023-05-24'],
-        'DI2023-05-24': di_processing_templates['CHN']['DI2023-05-24'],
-    },
-    'CIV': {
-        'default': di_processing_templates['CIV']['DI2023-05-24'],
-        'DI2023-05-24': di_processing_templates['CIV']['DI2023-05-24'],
-    },
-    'CUB': {
-        'default': di_processing_templates['CUB']['DI2023-05-24'],
-        'DI2023-05-24': di_processing_templates['CUB']['DI2023-05-24'],
-    },
-    'DMA': {
-        'default': di_processing_templates['DMA']['DI2023-05-24'],
-        'DI2023-05-24': di_processing_templates['DMA']['DI2023-05-24'],
-    },
-    'ECU': {
-        'default': di_processing_templates['ECU']['DI2023-05-24'],
-        'DI2023-05-24': di_processing_templates['ECU']['DI2023-05-24'],
-    },
-    'EGY': {
-        'default': di_processing_templates['EGY']['DI2023-05-24'],
-        'DI2023-05-24': di_processing_templates['EGY']['DI2023-05-24'],
-    },
-    'ERI': {
-        'default': di_processing_templates['ERI']['DI2023-05-24'],
-        'DI2023-05-24': di_processing_templates['ERI']['DI2023-05-24'],
-    },
-    'ETH': {
-        'default': di_processing_templates['ETH']['DI2023-05-24'],
-        'DI2023-05-24': di_processing_templates['ETH']['DI2023-05-24'],
-    },
-    'GEO': {
-        'default': di_processing_templates['GEO']['DI2023-05-24'],
-        'DI2023-05-24': di_processing_templates['GEO']['DI2023-05-24'],
-    },
-    'GMB': {
-        'default': di_processing_templates['GMB']['DI2023-05-24'],
-        'DI2023-05-24': di_processing_templates['GMB']['DI2023-05-24'],
-    },
-    'GNB': {
-        'default': di_processing_templates['GNB']['DI2023-05-24'],
-        'DI2023-05-24': di_processing_templates['GNB']['DI2023-05-24'],
-    },
-    'GRD': {
-        'default': di_processing_templates['GRD']['DI2023-05-24'],
-        'DI2023-05-24': di_processing_templates['GRD']['DI2023-05-24'],
-    },
-    'HND': {
-        'default': di_processing_templates['HND']['DI2023-05-24'],
-        'DI2023-05-24': di_processing_templates['HND']['DI2023-05-24'],
-    },
-    'IDN': {
-        'default': di_processing_templates['IDN']['DI2023-05-24'],
-        'DI2023-05-24': di_processing_templates['IDN']['DI2023-05-24'],
-    },
-    'IND': {
-        'default': di_processing_templates['IND']['DI2023-05-24'],
-        'DI2023-05-24': di_processing_templates['IND']['DI2023-05-24'],
-    },
-    'ISR': {
-        'default': di_processing_templates['general']['copyHFCPFC'],
-        'DI2023-05-24': di_processing_templates['general']['copyHFCPFC'],
-    },
-    'JAM': {
-        'default': di_processing_templates['JAM']['DI2023-05-24'],
-        'DI2023-05-24': di_processing_templates['JAM']['DI2023-05-24'],
-    },
-    'JOR': {
-        'default': di_processing_templates['JOR']['DI2023-05-24'],
-        'DI2023-05-24': di_processing_templates['JOR']['DI2023-05-24'],
-    },
-    'KEN': {
-        'default': di_processing_templates['KEN']['DI2023-05-24'],
-        'DI2023-05-24': di_processing_templates['KEN']['DI2023-05-24'],
-    },
-    'KIR': {
-        'default': di_processing_templates['KIR']['DI2023-05-24'],
-        'DI2023-05-24': di_processing_templates['KIR']['DI2023-05-24'],
-    },
-    'KGZ': {
-        'default': di_processing_templates['general']['copyUnspHFC'],
-        'DI2023-05-24': di_processing_templates['general']['copyUnspHFC'],
-    },
-    'KOR': {
-        'default': di_processing_templates['general']['copyUnspHFCUnspPFC'],
-        'DI2023-05-24': di_processing_templates['general']['copyUnspHFCUnspPFC'],
-    },
-    'LCA': {
-        'default': di_processing_templates['LCA']['DI2023-05-24'],
-        'DI2023-05-24': di_processing_templates['LCA']['DI2023-05-24'],
-    },
-    'LKA': {
-        'default': di_processing_templates['general']['copyFGASES'],
-        'DI2023-05-24': di_processing_templates['general']['copyFGASES'],
-    },
-    'LSO': {
-        'default': di_processing_templates['LSO']['DI2023-05-24'],
-        'DI2023-05-24': di_processing_templates['LSO']['DI2023-05-24'],
-    },
-    'MAR': {
-        'default': di_processing_templates['MAR']['DI2023-05-24'],
-        'DI2023-05-24': di_processing_templates['MAR']['DI2023-05-24'],
-    },
-    'MDA': {
-        'default': di_processing_templates['general']['copyUnspHFCUnspPFC'],
-        'DI2023-05-24': di_processing_templates['general']['copyUnspHFCUnspPFC'],
-    },
-    'MDG': {
-        'default': di_processing_templates['MDG']['DI2023-05-24'],
-        'DI2023-05-24': di_processing_templates['MDG']['DI2023-05-24'],
-    },
-    'MDV': {
-        'default': di_processing_templates['general']['copyUnspHFCUnspPFC'],
-        'DI2023-05-24': di_processing_templates['general']['copyUnspHFCUnspPFC'],
-    },
-    'MEX': {
-        'default': di_processing_templates['general']['copyHFCPFC'],
-        'DI2023-05-24': di_processing_templates['general']['copyHFCPFC'],
-    },
-    'MHL': {
-        'default': di_processing_templates['general']['copyUnspHFCUnspPFC'],
-        'DI2023-05-24': di_processing_templates['general']['copyUnspHFCUnspPFC'],
-    },
-    'MKD': {
-        'default': di_processing_templates['MKD']['DI2023-05-24'],
-        'DI2023-05-24': di_processing_templates['MKD']['DI2023-05-24'],
-    },
-    'MLI': {
-        'default': di_processing_templates['MLI']['DI2023-05-24'],
-        'DI2023-05-24': di_processing_templates['MLI']['DI2023-05-24'],
-    },
-    'MMR': {
-        'default': di_processing_templates['MMR']['DI2023-05-24'],
-        'DI2023-05-24': di_processing_templates['MMR']['DI2023-05-24'],
-    },
-    'MNE': {
-        'default': di_processing_templates['general']['copyUnspHFC'],
-        'DI2023-05-24': di_processing_templates['general']['copyUnspHFC'],
-    },
-    'MNG': {
-        'default': di_processing_templates['general']['copyUnspHFC'],
-        'DI2023-05-24': di_processing_templates['general']['copyUnspHFC'],
-    },
-    'MOZ': {
-        'default': di_processing_templates['general']['copyPFC'],
-        'DI2023-05-24': di_processing_templates['general']['copyPFC'],
-    },
-    'MUS': {
-        'default': di_processing_templates['MUS']['DI2023-05-24'],
-        'DI2023-05-24': di_processing_templates['MUS']['DI2023-05-24'],
-    },
-    'PHL': {
-        'default': di_processing_templates['PHL']['DI2023-05-24'],
-        'DI2023-05-24': di_processing_templates['PHL']['DI2023-05-24'],
-    },
-    'PLW': {
-        'default': di_processing_templates['PLW']['DI2023-05-24'],
-        'DI2023-05-24': di_processing_templates['PLW']['DI2023-05-24'],
-    },
-    'PRY': {
-        'default': di_processing_templates['general']['copyUnspHFCUnspPFC'],
-        'DI2023-05-24': di_processing_templates['general']['copyUnspHFCUnspPFC'],
-    },
-    'PSE': {
-        'default': di_processing_templates['general']['copyUnspHFCUnspPFC'],
-        'DI2023-05-24': di_processing_templates['general']['copyUnspHFCUnspPFC'],
-    },
-    'RWA': {
-        'default': di_processing_templates['RWA']['DI2023-05-24'],
-        'DI2023-05-24': di_processing_templates['RWA']['DI2023-05-24'],
-    },
-    'SEN': {
-        'default': di_processing_templates['general']['copyHFCPFC'],
-        'DI2023-05-24': di_processing_templates['general']['copyHFCPFC'],
-    },
-    'SGP': {
-        'default': di_processing_templates['general']['copyUnspHFCUnspPFC'],
-        'DI2023-05-24': di_processing_templates['general']['copyUnspHFCUnspPFC'],
-    },
-    'SLB': {
-        'default': di_processing_templates['SLB']['DI2023-05-24'],
-        'DI2023-05-24': di_processing_templates['SLB']['DI2023-05-24'],
-    },
-    'SMR': {
-        'default': di_processing_templates['general']['copyUnspHFCUnspPFC'],
-        'DI2023-05-24': di_processing_templates['general']['copyUnspHFCUnspPFC'],
-    },
-    'STP': {
-        'default': di_processing_templates['STP']['DI2023-05-24'],
-        'DI2023-05-24': di_processing_templates['STP']['DI2023-05-24'],
-    },
-    'SWZ': {
-        'default': di_processing_templates['general']['copyUnspHFC'],
-        'DI2023-05-24': di_processing_templates['general']['copyUnspHFC'],
-    },
-    'TCD': {
-        'default': di_processing_templates['TCD']['DI2023-05-24'],
-        'DI2023-05-24': di_processing_templates['TCD']['DI2023-05-24'],
-    },
-    'THA': {
-        'default': di_processing_templates['THA']['DI2023-05-24'],
-        'DI2023-05-24': di_processing_templates['THA']['DI2023-05-24'],
-    },
-    'URY': {
-        'default': di_processing_templates['URY']['DI2023-05-24'],
-        'DI2023-05-24': di_processing_templates['URY']['DI2023-05-24'],
-    },
-    'UZB': {
-        'default': di_processing_templates['general']['copyUnspHFCUnspPFC'],
-        'DI2023-05-24': di_processing_templates['general']['copyUnspHFCUnspPFC'],
-    },
-    'VCT': {
-        'default': di_processing_templates['VCT']['DI2023-05-24'],
-        'DI2023-05-24': di_processing_templates['VCT']['DI2023-05-24'],
-    },
-    'ZMB': {
-        'default': di_processing_templates['ZMB']['DI2023-05-24'],
-        'DI2023-05-24': di_processing_templates['ZMB']['DI2023-05-24'],
-    },
-    # 'ZWE': {
-    #     'default': di_processing_templates['ZWE']['DI2023-05-24'],
-    #     'DI2023-05-24': di_processing_templates['ZWE']['DI2023-05-24'],
-    # },
-}
-
-basket_copy_HFCPFC = {
-    'GWPs_to_add': ["AR4GWP100", "AR5GWP100", "AR6GWP100"],
-    'entities': ["HFCS", "PFCS"],
-    'source_GWP': gwp_to_use,
-}
-basket_copy_unspHFCPFC = {
-    'GWPs_to_add': ["AR4GWP100", "AR5GWP100", "AR6GWP100"],
-    'entities': ["UnspMixOfHFCs", "UnspMixOfPFCs"],
-    'source_GWP': gwp_to_use,
-}
-
-
-
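The comment at the top of `di_processing_info` describes a per-country lookup that falls back to the `'default'` entry when no configuration exists for a given data version. A minimal sketch of that lookup follows. The helper name `select_processing_config` and the toy dictionary are assumptions made for illustration only; the diff does not show how the repository performs this lookup.

```python
def select_processing_config(processing_info, country_code, version):
    """Return the config for `version`, falling back to 'default'.

    Hypothetical helper for illustration; not taken from the repository.
    """
    country_info = processing_info.get(country_code)
    if country_info is None:
        # Country is not listed, so it needs no special processing.
        return None
    return country_info.get(version, country_info["default"])


# Toy stand-in for di_processing_info; the values are placeholders.
example_info = {
    "ZMB": {
        "default": {"basket_copy": "config for DI2023-05-24"},
        "DI2023-05-24": {"basket_copy": "config for DI2023-05-24"},
    },
}

known = select_processing_config(example_info, "ZMB", "DI2023-05-24")
newer = select_processing_config(example_info, "ZMB", "DI2024-01-01")
unlisted = select_processing_config(example_info, "DEU", "DI2023-05-24")
```

With this shape, a version that has no dedicated entry yet (`newer`) is processed with the default configuration, which matches the stated goal of processing new versions before a configuration is written for them.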

+ 0 - 30
UNFCCC_GHG_data/UNFCCC_DI_reader/__init__.py

@@ -1,30 +0,0 @@
-# submodule to read data from UNFCCC DI API using the unfccc_di_api package
-
-#import unfccc_di_api
-from .UNFCCC_DI_reader_core import read_UNFCCC_DI_for_country,  \
-    convert_DI_data_to_pm2_if, convert_DI_IF_data_to_pm2, \
-    read_UNFCCC_DI_for_country_group
-
-from .UNFCCC_DI_reader_proc import process_UNFCCC_DI_for_country, \
-    process_and_save_UNFCCC_DI_for_country, process_UNFCCC_DI_for_country_group
-
-from .UNFCCC_DI_reader_datalad import read_DI_for_country_datalad, \
-    read_DI_for_country_group_datalad, process_DI_for_country_datalad, \
-    process_DI_for_country_group_datalad
-
-from .UNFCCC_DI_reader_helper import determine_filename
-
-__all__ = [
-    "read_UNFCCC_DI_for_country",
-    "convert_DI_data_to_pm2_if",
-    "convert_DI_IF_data_to_pm2",
-    "read_UNFCCC_DI_for_country_group",
-    "process_UNFCCC_DI_for_country",
-    "process_and_save_UNFCCC_DI_for_country",
-    "process_UNFCCC_DI_for_country_group",
-    "process_DI_for_country_group_datalad",
-    "read_DI_for_country_datalad",
-    "process_DI_for_country_datalad",
-    "read_DI_for_country_group_datalad",
-    "determine_filename",
-]

+ 0 - 26
UNFCCC_GHG_data/UNFCCC_DI_reader/process_UNFCCC_DI_for_country.py

@@ -1,26 +0,0 @@
-"""
-This script is a wrapper around the process_and_save_UNFCCC_DI_for_country
-function such that it can be called from datalad
-"""
-
-import argparse
-from UNFCCC_GHG_data.UNFCCC_DI_reader import \
-    process_and_save_UNFCCC_DI_for_country
-
-
-parser = argparse.ArgumentParser()
-parser.add_argument('--country', help='Country code')
-parser.add_argument('--date', help='String with date to read and process. If not '
-                                   'given latest data will be used', default=None)
-args = parser.parse_args()
-
-country_code = args.country
-date_str = args.date
-
-if date_str == "None":
-    date_str = None
-
-process_and_save_UNFCCC_DI_for_country(
-    country_code=country_code,
-    date_str=date_str,
-)
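The wrapper scripts in this package convert the literal string `"None"` back to `None` after parsing, presumably because the calling tool passes the date as text. One way to keep that conversion in a single place is an argparse `type` converter. This is a sketch under that assumption: `none_or_str` is a hypothetical name, and the argument list is passed explicitly only so that the example can be run without a command line.

```python
import argparse


def none_or_str(value):
    """Map the literal string "None" to None (hypothetical converter)."""
    return None if value == "None" else value


parser = argparse.ArgumentParser()
parser.add_argument("--country", help="Country code")
parser.add_argument(
    "--date",
    type=none_or_str,
    default=None,
    help="Date of the data to process. If not given, the latest data is used.",
)

# Explicit argument list for illustration; a script would call parse_args().
args = parser.parse_args(["--country", "ZMB", "--date", "None"])
```

The converter only runs on values supplied on the command line, so omitting `--date` still yields the `None` default.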

+ 0 - 22
UNFCCC_GHG_data/UNFCCC_DI_reader/process_UNFCCC_DI_for_country_datalad.py

@@ -1,22 +0,0 @@
-"""
-wrapper around process_DI_for_country_datalad such that it can be called
-from doit in the current setup where doit runs on system python and
-not in the venv.
-"""
-
-from UNFCCC_GHG_data.UNFCCC_DI_reader import \
-    process_DI_for_country_datalad
-import argparse
-
-parser = argparse.ArgumentParser()
-parser.add_argument('--country', help='Country name or code')
-parser.add_argument('--date', help='String with date to read and process. If not '
-                                   'given latest data will be used', default=None)
-args = parser.parse_args()
-country = args.country
-date_str = args.date
-
-if date_str == "None":
-    date_str = None
-
-process_DI_for_country_datalad(country, date_str=date_str)

+ 0 - 25
UNFCCC_GHG_data/UNFCCC_DI_reader/process_UNFCCC_DI_for_country_group.py

@@ -1,25 +0,0 @@
-"""
-This script is a wrapper around the process_UNFCCC_DI_for_country_group
-function such that it can be called from datalad
-"""
-
-import argparse
-from UNFCCC_GHG_data.UNFCCC_DI_reader import \
-    process_UNFCCC_DI_for_country_group
-
-
-parser = argparse.ArgumentParser()
-parser.add_argument('--annexI', help='read for AnnexI countries (default is for '
-                                     'non-AnnexI)', action='store_true')
-parser.add_argument('--date', help='date of input data to use (default is None '
-                                       'to read latest data)', default=None)
-args = parser.parse_args()
-annexI = args.annexI
-date_str = args.date
-if date_str == "None":
-    date_str = None
-
-process_UNFCCC_DI_for_country_group(
-    annexI=annexI,
-    date_str=date_str,
-)

+ 0 - 25
UNFCCC_GHG_data/UNFCCC_DI_reader/process_UNFCCC_DI_for_country_group_datalad.py

@@ -1,25 +0,0 @@
-"""
-wrapper around process_DI_for_country_group_datalad such that it can be called
-from doit in the current setup where doit runs on system python and
-not in the venv.
-"""
-
-from UNFCCC_GHG_data.UNFCCC_DI_reader import \
-    process_DI_for_country_group_datalad
-import argparse
-
-parser = argparse.ArgumentParser()
-parser.add_argument('--annexI', help='read for AnnexI countries (default is for '
-                                     'non-AnnexI)', action='store_true')
-parser.add_argument('--date', help='date of input data to use (default is None '
-                                       'to read latest data)', default=None)
-args = parser.parse_args()
-annexI = args.annexI
-date_str = args.date
-if date_str == "None":
-    date_str = None
-
-process_DI_for_country_group_datalad(
-    annexI=annexI,
-    date_str=date_str
-)

+ 0 - 27
UNFCCC_GHG_data/UNFCCC_DI_reader/read_UNFCCC_DI_for_country.py

@@ -1,27 +0,0 @@
-"""
-This script is a wrapper around the read_UNFCCC_DI_for_country
-function such that it can be called from datalad
-"""
-
-import argparse
-from UNFCCC_GHG_data.UNFCCC_DI_reader.UNFCCC_DI_reader_core import \
-    read_UNFCCC_DI_for_country
-
-
-parser = argparse.ArgumentParser()
-parser.add_argument('--country', help='Country code')
-parser.add_argument('--date', help='String with current date')
-args = parser.parse_args()
-
-country_code = args.country
-date_str = args.date
-
-read_UNFCCC_DI_for_country(
-    country_code=country_code,
-    category_groups=None, # read all categories
-    read_subsectors=False, # not applicable as we read all categories
-    date_str=date_str,
-    pm2if_specifications=None, # automatically use the right specs for AI and NAI
-    default_gwp=None, # automatically uses right default GWP for AI and NAI
-    debug=False,
-)

+ 0 - 17
UNFCCC_GHG_data/UNFCCC_DI_reader/read_UNFCCC_DI_for_country_datalad.py

@@ -1,17 +0,0 @@
-"""
-wrapper around read_DI_for_country_datalad such that it can be called
-from doit in the current setup where doit runs on system python and
-not in the venv.
-"""
-
-from UNFCCC_GHG_data.UNFCCC_DI_reader import \
-    read_DI_for_country_datalad
-import argparse
-
-parser = argparse.ArgumentParser()
-parser.add_argument('--country', help='Country name or code')
-
-args = parser.parse_args()
-country = args.country
-
-read_DI_for_country_datalad(country)

+ 0 - 19
UNFCCC_GHG_data/UNFCCC_DI_reader/read_UNFCCC_DI_for_country_group.py

@@ -1,19 +0,0 @@
-"""
-This script is a wrapper around the read_UNFCCC_DI_for_country_group
-function such that it can be called from datalad
-"""
-
-import argparse
-from UNFCCC_GHG_data.UNFCCC_DI_reader import \
-    read_UNFCCC_DI_for_country_group
-
-
-parser = argparse.ArgumentParser()
-parser.add_argument('--annexI', help='read for AnnexI countries (default is for '
-                                     'non-AnnexI)', action='store_true')
-args = parser.parse_args()
-annexI = args.annexI
-
-read_UNFCCC_DI_for_country_group(
-    annexI=annexI,
-)

+ 0 - 19
UNFCCC_GHG_data/UNFCCC_DI_reader/read_UNFCCC_DI_for_country_group_datalad.py

@@ -1,19 +0,0 @@
-"""
-wrapper around read_DI_for_country_group_datalad such that it can be called
-from doit in the current setup where doit runs on system python and
-not in the venv.
-"""
-
-from UNFCCC_GHG_data.UNFCCC_DI_reader import \
-    read_DI_for_country_group_datalad
-import argparse
-
-parser = argparse.ArgumentParser()
-parser.add_argument('--annexI', help='read for AnnexI countries (default is for '
-                                     'non-AnnexI)', action='store_true')
-args = parser.parse_args()
-annexI = args.annexI
-
-read_DI_for_country_group_datalad(
-    annexI=annexI,
-)

+ 0 - 19
UNFCCC_GHG_data/UNFCCC_DI_reader/util.py

@@ -1,19 +0,0 @@
-import unfccc_di_api
-import pandas as pd
-from UNFCCC_GHG_data.helper import code_path
-
-#reader = unfccc_di_api.UNFCCCApiReader()
-#nAI_countries = list(reader.non_annex_one_reader.parties["code"])
-nAI_countries = list(pd.read_csv(code_path / 'UNFCCC_DI_reader' /
-                                 'DI_NAI_parties.conf')["code"])
-#AI_countries = list(reader.annex_one_reader.parties["code"])
-AI_countries = list(pd.read_csv(code_path / 'UNFCCC_DI_reader' /
-                                'DI_AI_parties.conf')["code"])
-
-DI_date_format = '%Y-%m-%d'
-regex_date = r"([0-9]{4}-[0-9]{2}-[0-9]{2})"
-
-class NoDIDataError(Exception):
-    pass
-
-
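`util.py` defines `DI_date_format` and `regex_date` but the diff does not show how they are used. A plausible use is pulling the date out of a file name and parsing it. The sketch below assumes exactly that; the function `extract_date` and the file name are hypothetical.

```python
import re
from datetime import datetime

# Constants copied from util.py above.
DI_date_format = "%Y-%m-%d"
regex_date = r"([0-9]{4}-[0-9]{2}-[0-9]{2})"


def extract_date(filename):
    """Return the first YYYY-MM-DD date in `filename`, or None if absent.

    Hypothetical helper for illustration; not taken from the repository.
    """
    match = re.search(regex_date, filename)
    if match is None:
        return None
    return datetime.strptime(match.group(1), DI_date_format).date()


# The file name is made up for this example.
example = extract_date("ZMB_DI_2023-05-24_raw.nc")
```

Parsing with `strptime` after the regex match also rejects strings that have the right shape but are not valid dates, such as a month of 13, by raising `ValueError`.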

+ 0 - 5
UNFCCC_GHG_data/UNFCCC_downloader/__init__.py

@@ -1,5 +0,0 @@
-from .unfccc_submission_info import get_unfccc_submission_info
-
-__all__ = [
-    "get_unfccc_submission_info",
-]

+ 0 - 195
UNFCCC_GHG_data/UNFCCC_downloader/download_annexI.py

@@ -1,195 +0,0 @@
-import argparse
-import pandas as pd
-import requests
-import shutil
-import time
-import os
-import zipfile
-from datetime import date
-from selenium.webdriver import Firefox
-from selenium.webdriver.firefox.options import Options
-from random import randrange
-from pathlib import Path
-
-from UNFCCC_GHG_data.helper import root_path, downloaded_data_path_UNFCCC
-
-###############
-#
-# TODO
-# download directly via selenium see link below
-# https://sqa.stackexchange.com/questions/2197/
-# how-to-download-a-file-using-seleniums-webdriver
-# for automatic downloading see https://stackoverflow.com/questions/70740163/
-# python-selenium-firefox-driver-dismiss-open-save-file-popup
-###############
-
-descr = 'Download and unzip data from UNFCCC National Inventory Submissions. ' \
-        'Based on download.py from national-inventory-submissions ' \
-        '(https://github.com/openclimatedata/national-inventory-submisions)'
-parser = argparse.ArgumentParser(description=descr)
-parser.add_argument(
-    '--category',
-    help='Category to download, CRF, NIR, SEF'
-)
-parser.add_argument(
-    '--year',
-    help='Year to download'
-)
-
-args = parser.parse_args()
-year = args.year
-category = args.category.upper()
-dataset = category + year
-print(f"Downloading data for {dataset}")
-
-# generate the correct url
-url = (
-    "https://unfccc.int/process/transparency-and-reporting/"
-    "reporting-and-review-under-the-convention/"
-    "greenhouse-gas-inventories-annex-i-parties/"
-    "submissions/national-inventory-submissions-{}".format(year)
-)
-
-# TODO: move to utils as used in two places
-if int(year) == 2019:
-    url = (
-        "https://unfccc.int/process-and-meetings/transparency-and-reporting/"
-        "reporting-and-review-under-the-convention/"
-        "greenhouse-gas-inventories-annex-i-parties/"
-        "national-inventory-submissions-{}".format(year)
-    )
-elif int(year) in range(2020,2023):
-    url = (
-        "https://unfccc.int/ghg-inventories-annex-i-parties/{}".format(year)
-    )
-elif int(year) >= 2023:
-    url = (
-        "https://unfccc.int/process-and-meetings/transparency-and-reporting/"
-        "reporting-and-review-under-the-convention/"
-        "greenhouse-gas-inventories-annex-i-parties/"
-        "national-inventory-submissions-{}".format(year)
-    )
-else:
-    url = (
-        "https://unfccc.int/process/transparency-and-reporting/"
-        "reporting-and-review-under-the-convention/"
-        "greenhouse-gas-inventories-annex-i-parties/"
-        "submissions/national-inventory-submissions-{}".format(year)
-    )
-
-error_file_sizes = [212, 210]
-
-# Read submissions list
-submissions = pd.read_csv(downloaded_data_path_UNFCCC / f"submissions-annexI_{year}.csv")
-
-# filter submissions list for category
-items = submissions[submissions.Kind == category.upper()]
-
-# set options for headless mode
-profile_path = ".firefox"
-options = Options()
-#options.add_argument('-headless')
-
-# create profile for headless mode and automatic downloading
-options.set_preference('profile', profile_path)
-options.set_preference('browser.download.folderList', 2)
-
-# set up selenium driver
-driver = Firefox(options=options)
-# visit the main data page once to create cookies
-driver.get(url)
-
-# wait a bit for the website to load before we get the cookies
-time.sleep(20)
-
-# get the session id cookie
-cookies_selenium = driver.get_cookies()
-cookies = {}
-for cookie in cookies_selenium:
-    cookies[cookie['name']] = cookie['value']
-
-new_downloaded = []
-
-for idx, submission in items.iterrows():
-    print("=" * 60)
-    title = submission.Title
-    url = submission.URL
-    country = submission.Country
-    country = country.replace(' ', '_')
-    print(f"Downloading {title} from {url}")
-
-    country_folder = downloaded_data_path_UNFCCC / country
-    if not country_folder.exists():
-        country_folder.mkdir()
-    local_filename = \
-        country_folder / dataset / \
-        url.split('/')[-1].replace("%20", "_").replace(" ", "_")
-    if not local_filename.parent.exists():
-        local_filename.parent.mkdir()
-
-    if local_filename.exists():
-        # check file size. if 210 or 212 bytes it's the error page
-        if Path(local_filename).stat().st_size in error_file_sizes:
-            # found the error page. delete file
-            os.remove(local_filename)
-    
-    # now we have removed error pages, so a present file should not be overwritten
-    if (not local_filename.exists()) and (not local_filename.is_symlink()):
-        i = 0  # reset counter
-        while not local_filename.exists() and i < 10:
-            # for i = 1 and i = 5 try to get a new session ID
-            if i == 1 or i == 5:
-                driver = Firefox(options=options)
-    
-                # visit the main data page once to create cookies
-                driver.get(url)
-                time.sleep(20)
-
-                # get the session id cookie
-                cookies_selenium = driver.get_cookies()
-                cookies = {}
-                for cookie in cookies_selenium:
-                    cookies[cookie['name']] = cookie['value']
-
-            r = requests.get(url, stream=True, cookies=cookies)
-            with open(str(local_filename), 'wb') as f:
-                shutil.copyfileobj(r.raw, f)
-            
-            # check file size. if 210 or 212 bytes it's the error page
-            if Path(local_filename).stat().st_size in error_file_sizes:
-                # found the error page. delete file
-                os.remove(local_filename)
-            
-            # sleep a bit to avoid running into captchas
-            time.sleep(randrange(5, 15))
-            
-        if local_filename.exists():
-            new_downloaded.append(submission)
-            print(f"Download => {local_filename.relative_to(root_path)}")
-            # unzip data (only for new downloads)
-            if local_filename.suffix == ".zip":
-                try:
-                    zipped_file = zipfile.ZipFile(str(local_filename), 'r')
-                    zipped_file.extractall(str(local_filename.parent))
-                    print(f"Extracted {len(zipped_file.namelist())} files.")
-                    zipped_file.close()
-                # TODO Better error logging/visibility
-                except zipfile.BadZipFile:
-                    print(f"Error while trying to extract "
-                          f"{local_filename.relative_to(root_path)}")
-                except NotImplementedError:
-                    print("Zip format not supported, please unzip on the command line.")
-            else:
-                print(f"Not attempting to extract "
-                      f"{local_filename.relative_to(root_path)}.")
-        else:
-            print(f"Failed to download {local_filename.relative_to(root_path)}")
-
-    else:
-        print(f"=> Already downloaded {local_filename.relative_to(root_path)}")
-
-driver.close()
-
-df = pd.DataFrame(new_downloaded)
-df.to_csv(downloaded_data_path_UNFCCC
-          / f"00_new_downloads_{category}{year}-{date.today()}.csv", index=False)
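
The retry loop in the removed script above initialises `i = 0` and tests `i < 10`, but no increment of `i` appears in the removed lines, so the cap on attempts does not seem to take effect. The following is a minimal sketch of a bounded retry, not the author's implementation. `download_with_retries`, `is_error_page`, `fetch` and `max_tries` are hypothetical names; `fetch` stands in for the `requests.get` plus `shutil.copyfileobj` step, and the random pause between attempts is left out to keep the sketch short.

```python
from pathlib import Path
from typing import Callable

# Sizes (in bytes) of the known error pages, as listed in the script above.
ERROR_FILE_SIZES = (210, 212)


def is_error_page(path: Path) -> bool:
    """Return True if the file exists and has the size of a known error page."""
    return path.exists() and path.stat().st_size in ERROR_FILE_SIZES


def download_with_retries(
    fetch: Callable[[Path], None],
    target: Path,
    max_tries: int = 10,
) -> bool:
    """Call ``fetch(target)`` until a non-error file exists or tries run out.

    ``fetch`` is a hypothetical stand-in for the actual download step.
    """
    for _attempt in range(max_tries):
        fetch(target)
        if is_error_page(target):
            # discard the error page and try again
            target.unlink()
            continue
        if target.exists():
            return True
    return False
```

Using `for ... in range(max_tries)` makes the upper bound explicit, so a submission that only ever returns the error page ends in a reported failure instead of looping.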

+ 0 - 157
UNFCCC_GHG_data/UNFCCC_downloader/download_btr.py

@@ -1,157 +0,0 @@
-import argparse
-import pandas as pd
-import requests
-import shutil
-import time
-import os
-import zipfile
-from datetime import date
-from selenium.webdriver import Firefox
-from selenium.webdriver.firefox.options import Options
-from random import randrange
-from pathlib import Path
-
-from UNFCCC_GHG_data.helper import root_path, downloaded_data_path_UNFCCC
-from unfccc_submission_info import get_BTR_name_and_URL
-
-###############
-#
-# TODO
-# download directly via selenium see link below
-# https://sqa.stackexchange.com/questions/2197/
-# how-to-download-a-file-using-seleniums-webdriver
-# for automatic downloading see https://stackoverflow.com/questions/70740163/
-# python-selenium-firefox-driver-dismiss-open-save-file-popup
-###############
-
-descr = 'Download and unzip data from UNFCCC Biennial Transparency Reports Submissions. ' \
-        'Based on download.py from national-inventory-submissions ' \
-        '(https://github.com/openclimatedata/national-inventory-submisions)'
-parser = argparse.ArgumentParser(description=descr)
-
-parser.add_argument(
-    '--round',
-    help='Submission round to download, e.g. 1'
-)
-
-args = parser.parse_args()
-submission_round = int(args.round)
-
-round_name, url = get_BTR_name_and_URL(submission_round)
-dataset = f"BTR{submission_round}"
-
-print(f"Downloading data for {round_name} BTRs")
-
-error_file_sizes = [212, 210]
-
-# Read submissions list
-submissions = pd.read_csv(downloaded_data_path_UNFCCC / f"submissions-{dataset}.csv")
-
-# set options for headless mode
-profile_path = ".firefox"
-options = Options()
-#options.add_argument('-headless')
-
-# create profile for headless mode and automatic downloading
-options.set_preference('profile', profile_path)
-options.set_preference('browser.download.folderList', 2)
-
-# set up selenium driver
-driver = Firefox(options=options)
-# visit the main data page once to create cookies
-driver.get(url)
-
-# wait a bit for the website to load before we get the cookies
-time.sleep(20)
-
-# get the session id cookie
-cookies_selenium = driver.get_cookies()
-cookies = {}
-for cookie in cookies_selenium:
-    cookies[cookie['name']] = cookie['value']
-
-new_downloaded = []
-
-for idx, submission in submissions.iterrows():
-    print("=" * 60)
-    title = submission.Title
-    url = submission.URL
-    country = submission.Country
-    country = country.replace(' ', '_')
-    print(f"Downloading {title} from {url}")
-
-    country_folder = downloaded_data_path_UNFCCC / country
-    if not country_folder.exists():
-        country_folder.mkdir()
-    local_filename = \
-        country_folder / dataset / \
-        url.split('/')[-1].replace("%20", "_").replace(" ", "_")
-    if not local_filename.parent.exists():
-        local_filename.parent.mkdir()
-
-    if local_filename.exists():
-        # check file size. if 210 or 212 bytes it's the error page
-        if Path(local_filename).stat().st_size in error_file_sizes:
-            # found the error page. delete file
-            os.remove(local_filename)
-    
-    # now we have removed error pages, so a present file should not be overwritten
-    if (not local_filename.exists()) and (not local_filename.is_symlink()):
-        i = 0  # reset counter
-        while not local_filename.exists() and i < 10:
-            # for i = 1 and i = 5 try to get a new session ID
-            if i == 1 or i == 5:
-                driver = Firefox(options=options)
-    
-                # visit the main data page once to create cookies
-                driver.get(url)
-                time.sleep(20)
-
-                # get the session id cookie
-                cookies_selenium = driver.get_cookies()
-                cookies = {}
-                for cookie in cookies_selenium:
-                    cookies[cookie['name']] = cookie['value']
-
-            r = requests.get(url, stream=True, cookies=cookies)
-            with open(str(local_filename), 'wb') as f:
-                shutil.copyfileobj(r.raw, f)
-            
-            # check file size. if 210 or 212 bytes it's the error page
-            if Path(local_filename).stat().st_size in error_file_sizes:
-                # found the error page. delete file
-                os.remove(local_filename)
-            
-            # sleep a bit to avoid running into captchas
-            time.sleep(randrange(5, 15))
-            
-        if local_filename.exists():
-            new_downloaded.append(submission)
-            print(f"Download => {local_filename.relative_to(root_path)}")
-            # unzip data (only for new downloads)
-            if local_filename.suffix == ".zip":
-                try:
-                    zipped_file = zipfile.ZipFile(str(local_filename), 'r')
-                    zipped_file.extractall(str(local_filename.parent))
-                    print(f"Extracted {len(zipped_file.namelist())} files.")
-                    zipped_file.close()
-                # TODO Better error logging/visibility
-                except zipfile.BadZipFile:
-                    print(f"Error while trying to extract "
-                          f"{local_filename.relative_to(root_path)}")
-                except NotImplementedError:
-                    print("Zip format not supported, please unzip on the command line.")
-            else:
-                print(f"Not attempting to extract "
-                      f"{local_filename.relative_to(root_path)}.")
-        else:
-            print(f"Failed to download {local_filename.relative_to(root_path)}")
-
-    else:
-        print(f"=> Already downloaded {local_filename.relative_to(root_path)}")
-
-driver.close()
-
-df = pd.DataFrame(new_downloaded)
-df.to_csv(downloaded_data_path_UNFCCC
-          / f"00_new_downloads_{dataset}-{date.today()}.csv", index=False)

+ 0 - 108
UNFCCC_GHG_data/UNFCCC_downloader/download_ndc.py

@@ -1,108 +0,0 @@
-import pandas as pd
-import requests
-import shutil
-import time
-import os
-import re
-from datetime import date
-from random import randrange
-from UNFCCC_GHG_data.helper import downloaded_data_path_UNFCCC
-from pathlib import Path
-
-"""
-based on download_bur from national-inventory-submissions
-# (https://github.com/openclimatedata/national-inventory-submisions)
-"""
-
-###############
-#
-# TODO
-# download directly via selenium see link below
-# https://sqa.stackexchange.com/questions/2197/
-# how-to-download-a-file-using-seleniums-webdriver
-###############
-
-# we use the ndc package provided by openclimatedata which is updated on
-# a daily basis
-submissions_url = "https://github.com/openclimatedata/ndcs/raw/main/data/ndcs.csv"
-submissions = pd.read_csv(submissions_url)
-
-url = "https://www4.unfccc.int/sites/NDCStaging/Pages/All.aspx"
-
-# if we get files of this size they are error pages and we need to
-# try the download again
-# TODO error page sizes are from BUR and NC and might differ for NDCs
-# if an error page is found instead of a pdf adjust sizes here
-error_file_sizes = [212, 210]
-ndc_regex = r".*\s([A-Za-z]*)\sNDC"
-
-# Ensure download path and subfolders exist
-if not downloaded_data_path_UNFCCC.exists():
-    downloaded_data_path_UNFCCC.mkdir(parents=True)
-
-new_downloaded = []
-
-for idx, submission in submissions.iterrows():
-    print("=" * 60)
-    #ndc = submission.Number
-    title = submission.Title
-    temp = re.findall(ndc_regex, title)
-    ndc = temp[0]
-    url = submission.EncodedAbsUrl
-    submission_date = submission.SubmissionDate
-    country = submission.Party
-    country = country.replace(' ', '_')
-    print(title)
-
-    ndc_folder = "NDC_" + ndc + "_" + submission_date
-
-    country_folder = downloaded_data_path_UNFCCC / country
-    if not country_folder.exists():
-        country_folder.mkdir()
-    local_filename = country_folder / ndc_folder / url.split('/')[-1]
-    local_filename_underscore = \
-        downloaded_data_path_UNFCCC / country / ndc_folder / \
-        url.split('/')[-1].replace("%20", "_").replace(" ", "_")
-    if not local_filename.parent.exists():
-        local_filename.parent.mkdir()
-
-    # this should never be needed but in case anything goes wrong and
-    # an error page is present it should be overwritten
-    if local_filename_underscore.exists():
-        # check file size. if 210 or 212 bytes it's the error page
-        if Path(local_filename_underscore).stat().st_size in error_file_sizes:
-            # found the error page. delete file
-            os.remove(local_filename_underscore)
-    
-    # now we have removed error pages, so a present file should not be overwritten
-    if (not local_filename_underscore.exists()) \
-            and (not local_filename_underscore.is_symlink()):
-        i = 0  # reset counter
-        while not local_filename_underscore.exists() and i < 10:
-
-            r = requests.get(url, stream=True)
-            with open(str(local_filename_underscore), 'wb') as f:
-                shutil.copyfileobj(r.raw, f)
-            
-            # check file size. if 210 or 212 bytes it's the error page
-            if Path(local_filename_underscore).stat().st_size in error_file_sizes:
-                # found the error page. delete file
-                os.remove(local_filename_underscore)
-            
-            # sleep a bit to avoid running into captchas
-            time.sleep(randrange(5, 15))
-            
-        if local_filename_underscore.exists():
-            new_downloaded.append(submission)
-            print("Download => downloaded_data/UNFCCC/" + country + "/" +
-                  ndc_folder + "/" + local_filename_underscore.name)
-        else:
-            print("Failed downloading downloaded_data/UNFCCC/" + country + "/"
-                  + ndc_folder + "/" + local_filename_underscore.name)
-
-    else:
-        print("=> Already downloaded " + local_filename_underscore.name)
-
-
-df = pd.DataFrame(new_downloaded)
-df.to_csv(downloaded_data_path_UNFCCC / "00_new_downloads_ndc-{}.csv".format(date.today()), index=False)
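
The removed script above extracts the NDC label from the submission title with `ndc_regex` and then takes `temp[0]`, which raises an `IndexError` when the title does not match. A small sketch of the same pattern with a guarded lookup follows. The function name `ndc_label` and the sample titles in the usage note are made up for illustration and are not taken from the ndcs dataset.

```python
import re
from typing import Optional

# Pattern copied from the script above: the word directly before " NDC".
NDC_REGEX = r".*\s([A-Za-z]*)\sNDC"


def ndc_label(title: str) -> Optional[str]:
    """Return the word preceding 'NDC' in the title, or None if absent."""
    matches = re.findall(NDC_REGEX, title)
    return matches[0] if matches else None
```

With a hypothetical title such as `"Argentina First NDC"` the function returns `"First"`. A title without the ` NDC` suffix yields `None`, which the caller can skip or log.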

+ 0 - 141
UNFCCC_GHG_data/UNFCCC_downloader/download_non-annexI.py

@@ -1,141 +0,0 @@
-import argparse
-import pandas as pd
-import requests
-import shutil
-import time
-import os
-from datetime import date
-from selenium.webdriver import Firefox
-from selenium.webdriver.firefox.options import Options
-from random import randrange
-from pathlib import Path
-from UNFCCC_GHG_data.helper import root_path, downloaded_data_path_UNFCCC
-
-###############
-#
-# TODO
-# download directly via selenium see link below
-# https://sqa.stackexchange.com/questions/2197/
-# how-to-download-a-file-using-seleniums-webdriver
-# for automatic downloading see https://stackoverflow.com/questions/70740163/
-# python-selenium-firefox-driver-dismiss-open-save-file-popup
-###############
-
-descr = 'Download data from UNFCCC non-AnnexI Submissions. ' \
-        'Based on download_bur.py from national-inventory-submissions ' \
-        '(https://github.com/openclimatedata/national-inventory-submisions)'
-parser = argparse.ArgumentParser(description=descr)
-parser.add_argument(
-    '--category',
-    help='Category to download: BUR, NC'
-)
-
-args = parser.parse_args()
-category = args.category.upper()
-print(f"Downloading {category} submissions")
-
-if category == "BUR":
-    url = "https://unfccc.int/BURs"
-else:
-    url = "https://unfccc.int/non-annex-I-NCs"
-
-# if we get files of this size they are error pages and we need to
-# try the download again
-error_file_sizes = [212, 210]
-
-# Read submissions list
-submissions = pd.read_csv(downloaded_data_path_UNFCCC / f"submissions-{category.lower()}.csv")
-
-# set options for headless mode
-profile_path = ".firefox"
-options = Options()
-#options.add_argument('-headless')
-
-# create profile for headless mode and automatic downloading
-options.set_preference('profile', profile_path)
-options.set_preference('browser.download.folderList', 2)
-
-# set up selenium driver
-driver = Firefox(options=options)
-# visit the main data page once to create cookies
-driver.get(url)
-
-# wait a bit for the website to load before we get the cookies
-time.sleep(20)
-
-# get the session id cookie
-cookies_selenium = driver.get_cookies()
-cookies = {}
-for cookie in cookies_selenium:
-    cookies[cookie['name']] = cookie['value']
-
-new_downloaded = []
-
-for idx, submission in submissions.iterrows():
-    print("=" * 60)
-    kind = submission.Kind
-    title = submission.Title
-    url = submission.URL
-    country = submission.Country
-    country = country.replace(' ', '_')
-    print(f"Downloading {title} from {url}")
-
-    country_folder = downloaded_data_path_UNFCCC / country
-    if not country_folder.exists():
-        country_folder.mkdir()
-    local_filename = \
-        country_folder / kind / \
-        url.split('/')[-1].replace("%20", "_").replace(" ", "_")
-    if not local_filename.parent.exists():
-        local_filename.parent.mkdir()
-
-    if local_filename.exists():
-        # check file size. if 210 or 212 bytes it's the error page
-        if Path(local_filename).stat().st_size in error_file_sizes:
-            # found the error page. delete file
-            os.remove(local_filename)
-    
-    # now we have removed error pages, so a present file should not be overwritten
-    if (not local_filename.exists()) and (not local_filename.is_symlink()):
-        i = 0  # reset counter
-        while not local_filename.exists() and i < 10:
-            # for i = 1 and i = 5 try to get a new session ID
-            if i == 1 or i == 5:
-                driver = Firefox(options=options)
-    
-                # visit the main data page once to create cookies
-                driver.get(url)
-                time.sleep(20)
-
-                # get the session id cookie
-                cookies_selenium = driver.get_cookies()
-                cookies = {}
-                for cookie in cookies_selenium:
-                    cookies[cookie['name']] = cookie['value']
-
-            r = requests.get(url, stream=True, cookies=cookies)
-            with open(str(local_filename), 'wb') as f:
-                shutil.copyfileobj(r.raw, f)
-            
-            # check file size. if 210 or 212 bytes it's the error page
-            if Path(local_filename).stat().st_size in error_file_sizes:
-                # found the error page. delete file
-                os.remove(local_filename)
-            
-            # sleep a bit to avoid running into captchas
-            time.sleep(randrange(5, 15))
-            
-        if local_filename.exists():
-            new_downloaded.append(submission)
-            print(f"Download => {local_filename.relative_to(root_path)}")
-        else:
-            print(f"Failed to download {local_filename.relative_to(root_path)}")
-
-    else:
-        print(f"=> Already downloaded {local_filename.relative_to(root_path)}")
-
-driver.close()
-
-df = pd.DataFrame(new_downloaded)
-df.to_csv(downloaded_data_path_UNFCCC /
-          f"00_new_downloads_{category}-{date.today()}.csv", index=False)

+ 0 - 145
UNFCCC_GHG_data/UNFCCC_downloader/fetch_submissions_annexI.py

@@ -1,145 +0,0 @@
-import argparse
-import time
-import pandas as pd
-
-from pathlib import Path
-from bs4 import BeautifulSoup
-from selenium.webdriver import Firefox
-from selenium.webdriver.firefox.options import Options
-from random import randrange
-from unfccc_submission_info import get_unfccc_submission_info
-from UNFCCC_GHG_data.helper import downloaded_data_path_UNFCCC
-
-max_tries = 10
-
-descr = ("Download UNFCCC National Inventory Submissions lists "
-         "and create list of submissions as CSV file. Based on "
-         "process.py from national-inventory-submissions "
-         "(https://github.com/openclimatedata/national-inventory-submisions)")
-parser = argparse.ArgumentParser(description=descr)
-parser.add_argument(
-    '--year',
-    help='Year to download'
-)
-
-args = parser.parse_args()
-year = args.year
-
-print("Fetching submissions for {}".format(year))
-# TODO: move to utils as used in two places
-if int(year) == 2019:
-    url = (
-        "https://unfccc.int/process-and-meetings/transparency-and-reporting/"
-        "reporting-and-review-under-the-convention/"
-        "greenhouse-gas-inventories-annex-i-parties/"
-        "national-inventory-submissions-{}".format(year)
-    )
-elif int(year) in range(2020,2025):
-    url = (
-        "https://unfccc.int/ghg-inventories-annex-i-parties/{}".format(year)
-    )
-elif int(year) >= 2025:
-    url = (
-        "https://unfccc.int/process-and-meetings/transparency-and-reporting/"
-        "reporting-and-review-under-the-convention/"
-        "greenhouse-gas-inventories-annex-i-parties/"
-        "national-inventory-submissions-{}".format(year)
-    )
-else:
-    url = (
-        "https://unfccc.int/process/transparency-and-reporting/"
-        "reporting-and-review-under-the-convention/"
-        "greenhouse-gas-inventories-annex-i-parties/"
-        "submissions/national-inventory-submissions-{}".format(year)
-    )
-
-print(f"Using {url} to get submissions list")
-
-# set options for headless mode
-profile_path = ".firefox"
-options = Options()
-options.add_argument('-headless')
-
-# create profile for headless mode and automatic downloading
-options.set_preference('profile', profile_path)
-
-# set up selenium driver
-driver = Firefox(options=options)
-driver.get(url)
-
-html = BeautifulSoup(driver.page_source, "html.parser")
-
-table = html.find("table")
-
-# check if table found. if not the get command didn't work, likely because of a captcha on the site
-### TODO replace by error message
-if not table:
-    # try to load html file from disk
-    print('Download failed, trying to load manually downloaded file')
-    file = open("manual_page_downloads/National-Inventory-Submissions-{}.html".format(year))
-    content = file.read()
-    html = BeautifulSoup(content, "html.parser")
-    table = html.find("table")
-    if not table:
-        print(
-            "Manually downloaded file " + "manual_page_downloads/National-Inventory-Submissions-{}.html".format(year) +
-            " not found")
-        exit()
-
-links = table.findAll('a')
-
-targets = []  # sub-pages
-downloads = []
-no_downloads = []
-
-# Check links for Zipfiles or subpages
-for link in links:
-    if "href" not in link.attrs:
-        continue
-    href = link.attrs["href"]
-    if "/documents/" in href:
-        if "title" in link.attrs.keys():
-            title = link.attrs["title"]
-        else:
-            title = link.contents[0]
-        if href.startswith("/documents"):
-            href = "https://unfccc.int" + href
-        # Only add pages in the format https://unfccc.int/documents/65587
-        # to further downloads
-        if str(Path(href).parent).endswith("documents"):
-            targets.append({"title": title, "url": href})
-    elif href.endswith(".zip"):
-        if href.startswith("/files"):
-            href = "https://unfccc.int" + href
-        country = Path(href).name.split("-")[0].upper()
-        title = f"{country} {link.contents[0]}"
-        filename = Path(href).name
-        file_parts = filename.split('-')
-        if len(file_parts) >= 2:
-            kind = file_parts[2].upper()
-        elif filename.startswith('asr'):
-            kind = 'CRF'
-        else:
-            kind = None
-
-        print("\t".join([kind, country, title, href]))
-        downloads.append({"Kind": kind, "Country": country, "Title": title, "URL": href})
-
-# Go through sub-pages.
-for target in targets:
-    time.sleep(randrange(5, 15))
-    url = target["url"]
-
-    submission_info = get_unfccc_submission_info(url, driver, 10)
-
-    if submission_info:
-        downloads = downloads + submission_info
-    else:
-        no_downloads.append({target["title"], url})
-
-if len(no_downloads) > 0:
-    print("No downloads for ", no_downloads)
-
-driver.close()
-df = pd.DataFrame(downloads)
-df.to_csv(downloaded_data_path_UNFCCC / f"submissions-annexI_{year}.csv", index=False)
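
The TODO in the removed script above notes that the year-dependent URL selection is used in two places and should move to a utility module. One possible shape for such a helper is sketched here. The function name `annexi_submissions_url` is an assumption; the URL strings and year ranges are copied from the branches above.

```python
def annexi_submissions_url(year: int) -> str:
    """Return the UNFCCC Annex I submissions page URL for the given year."""
    long_base = (
        "https://unfccc.int/process-and-meetings/transparency-and-reporting/"
        "reporting-and-review-under-the-convention/"
        "greenhouse-gas-inventories-annex-i-parties/"
    )
    # 2019 and 2025 onwards share the same long URL pattern
    if year == 2019 or year >= 2025:
        return f"{long_base}national-inventory-submissions-{year}"
    # 2020 to 2024 use the short pattern (range(2020, 2025) in the script)
    if 2020 <= year <= 2024:
        return f"https://unfccc.int/ghg-inventories-annex-i-parties/{year}"
    # all earlier years
    return (
        "https://unfccc.int/process/transparency-and-reporting/"
        "reporting-and-review-under-the-convention/"
        "greenhouse-gas-inventories-annex-i-parties/"
        f"submissions/national-inventory-submissions-{year}"
    )
```

Taking an `int` rather than the raw command-line string keeps the repeated `int(year)` conversions out of the branch conditions.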

+ 0 - 97
UNFCCC_GHG_data/UNFCCC_downloader/fetch_submissions_btr.py

@@ -1,97 +0,0 @@
-import argparse
-import time
-import pandas as pd
-
-from pathlib import Path
-from bs4 import BeautifulSoup
-from selenium.webdriver import Firefox
-from selenium.webdriver.firefox.options import Options
-from random import randrange
-from unfccc_submission_info import (get_unfccc_submission_info,
-                                    get_BTR_name_and_URL)
-from UNFCCC_GHG_data.helper import downloaded_data_path_UNFCCC
-
-max_tries = 10
-
-descr = ("Download UNFCCC Biennial Transparency Reports Submissions lists "
-         "and create list of submissions as CSV file. Based on "
-         "process.py from national-inventory-submissions "
-         "(https://github.com/openclimatedata/national-inventory-submisions)")
-parser = argparse.ArgumentParser(description=descr)
-parser.add_argument(
-    '--round',
-    help='1 for first BTRs, 2 for second BTRs etc.'
-)
-
-args = parser.parse_args()
-submission_round = int(args.round)
-
-round_name, url = get_BTR_name_and_URL(submission_round)
-
-print(f"Fetching submissions for {round_name} BTRs")
-print(f"Using {url} to get submissions list")
-
-# set options for headless mode
-profile_path = ".firefox"
-options = Options()
-options.add_argument('-headless')
-
-# create profile for headless mode and automatic downloading
-options.set_preference('profile', profile_path)
-
-# set up selenium driver
-driver = Firefox(options=options)
-driver.get(url)
-
-html = BeautifulSoup(driver.page_source, "html.parser")
-
-table = html.find("table")
-
-# check if table found. if not the get command didn't work, likely because of a captcha on the site
-### TODO replace by error message
-if not table:
-    raise RuntimeError('No table found on URL. Possibly due to a captcha.')
-
-links = table.findAll('a')
-
-targets = []  # sub-pages
-downloads = []
-no_downloads = []
-
-# Check links for Zipfiles or subpages
-for link in links:
-    if "href" not in link.attrs:
-        continue
-    href = link.attrs["href"]
-    if "/documents/" in href:
-        if "title" in link.attrs.keys():
-            title = link.attrs["title"]
-        else:
-            title = link.contents[0]
-        if href.startswith("/documents"):
-            href = "https://unfccc.int" + href
-        # Only add pages in the format https://unfccc.int/documents/65587
-        # to further downloads
-        if str(Path(href).parent).endswith("documents"):
-            targets.append({"title": title, "url": href})
-    else:
-        print(f"Ignored link: {href}: not in the right format.")
-
-# Go through sub-pages.
-for target in targets:
-    time.sleep(randrange(5, 15))
-    url = target["url"]
-
-    submission_info = get_unfccc_submission_info(url, driver, 10)
-
-    if submission_info:
-        downloads = downloads + submission_info
-    else:
-        no_downloads.append({target["title"], url})
-
-if len(no_downloads) > 0:
-    print("No downloads for ", no_downloads)
-
-driver.close()
-df = pd.DataFrame(downloads)
-df.to_csv(downloaded_data_path_UNFCCC / f"submissions-BTR{submission_round}.csv", index=False)
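
The fetch scripts above keep only links of the form `https://unfccc.int/documents/65587` and test this by applying `pathlib.Path` to the URL. A sketch of a stricter variant based on `urllib.parse` is shown below. It is an illustration, not the repository's code: `document_page_url` and `BASE_URL` are hypothetical names, and unlike the original check it rejects paths where `documents` is not the first path segment.

```python
from typing import Optional
from urllib.parse import urljoin, urlparse

# Site root used to resolve relative links (same host as in the scripts above).
BASE_URL = "https://unfccc.int"


def document_page_url(href: str) -> Optional[str]:
    """Return the absolute URL if href points to /documents/<id>, else None."""
    absolute = urljoin(BASE_URL, href)
    # split the URL path into its non-empty segments
    parts = [part for part in urlparse(absolute).path.split("/") if part]
    if len(parts) == 2 and parts[0] == "documents":
        return absolute
    return None
```

Parsing the URL as a URL avoids relying on how `Path` normalises the `//` after the scheme, which differs between operating systems.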

+ 0 - 86
UNFCCC_GHG_data/UNFCCC_downloader/fetch_submissions_bur.py

@@ -1,86 +0,0 @@
-#import requests
-import time
-import pandas as pd
-import re
-
-from pathlib import Path
-from bs4 import BeautifulSoup
-from selenium.webdriver import Firefox
-from selenium.webdriver.firefox.options import Options
-from random import randrange
-from unfccc_submission_info import get_unfccc_submission_info
-from UNFCCC_GHG_data.helper import downloaded_data_path_UNFCCC
-
-"""
-Download UNFCCC Biennial Update Report submissions
-from Non-Annex I Parties and create list of submissions as CSV file
-Based on `process_bur` from national-inventory-submissions 
-(https://github.com/openclimatedata/national-inventory-submisions)
-"""
-
-print("Fetching BUR submissions ...")
-
-url = "https://unfccc.int/BURs"
-
-#print(url)
-
-# set options for headless mode
-profile_path = ".firefox"
-options = Options()
-options.add_argument('-headless')
-
-# create profile for headless mode and automatic downloading
-options.set_preference('profile', profile_path)
-
-# set up selenium driver
-driver = Firefox(options=options)
-driver.get(url)
-
-html = BeautifulSoup(driver.page_source, "html.parser")
-table = html.find_all("table")[1]
-links = table.findAll("a")
-
-targets = []  # sub-pages
-downloads = []
-no_downloads = []
-
-# Check links for Zipfiles or subpages
-for link in links:
-    if "href" not in link.attrs:
-        continue
-    href = link.attrs["href"]
-    if "/documents/" in href:
-        if "title" in link.attrs.keys():
-            title = link.attrs["title"]
-        else:
-            title = link.contents[0]
-        if href.startswith("/documents"):
-            href = "https://unfccc.int" + href
-        # Only add pages in the format https://unfccc.int/documents/65587
-        # to further downloads
-        if str(Path(href).parent).endswith("documents"):
-            targets.append({"title": title, "url": href})
-
-
-pattern = re.compile(r"BUR ?\d")
-
-# Go through sub-pages.
-for target in targets:
-    time.sleep(randrange(5, 15))
-    url = target["url"]
-
-    submission_info = get_unfccc_submission_info(url, driver, 10)
-
-    if submission_info:
-        downloads = downloads + submission_info
-    else:
-        no_downloads.append({target["title"], url})
-
-
-if len(no_downloads) > 0:
-    print("No downloads for ", no_downloads)
-
-driver.close()
-df = pd.DataFrame(downloads)
-df = df[["Kind", "Country", "Title", "URL"]]
-df.to_csv(downloaded_data_path_UNFCCC / "submissions-bur.csv", index=False)

+ 0 - 88
UNFCCC_GHG_data/UNFCCC_downloader/fetch_submissions_nc.py

@@ -1,88 +0,0 @@
-#import requests
-import time
-import pandas as pd
-import re
-
-from pathlib import Path
-from bs4 import BeautifulSoup
-from selenium.webdriver import Firefox
-from selenium.webdriver.firefox.options import Options
-from random import randrange
-from UNFCCC_GHG_data.UNFCCC_downloader import \
-    get_unfccc_submission_info
-from UNFCCC_GHG_data.helper import downloaded_data_path_UNFCCC
-
-"""
-Download UNFCCC National Communication submissions
-from Non-Annex I Parties and create list of submissions as CSV file
-Based on `process_bur` from national-inventory-submissions 
-(https://github.com/openclimatedata/national-inventory-submisions)
-"""
-
-print("Fetching NC submissions ...")
-
-url = "https://unfccc.int/non-annex-I-NCs"
-
-#print(url)
-
-# set options for headless mode
-profile_path = ".firefox"
-options = Options()
-options.add_argument('-headless')
-
-# create profile for headless mode and automatic downloading
-options.set_preference('profile', profile_path)
-
-# set up selenium driver
-driver = Firefox(options=options)
-driver.get(url)
-
-html = BeautifulSoup(driver.page_source, "html.parser")
-table = html.find_all("table")[1]
-links = table.findAll("a")
-
-targets = []  # sub-pages
-downloads = []
-no_downloads = []
-
-# Check links for Zipfiles or subpages
-for link in links:
-    if "href" not in link.attrs:
-        continue
-    href = link.attrs["href"]
-    if "/documents/" in href:
-        if "title" in link.attrs.keys():
-            title = link.attrs["title"]
-        else:
-            title = link.contents[0]
-        if href.startswith("/documents"):
-            href = "https://unfccc.int" + href
-        # Only add pages in the format https://unfccc.int/documents/65587
-        # to further downloads
-        if str(Path(href).parent).endswith("documents"):
-            targets.append({"title": title, "url": href})
-
-
-pattern = re.compile(r"NC ?\d")
-
-
-# Go through sub-pages.
-for target in targets:
-    time.sleep(randrange(5, 15))
-    url = target["url"]
-
-    submission_info = get_unfccc_submission_info(url, driver, 10)
-
-    if submission_info:
-        downloads = downloads + submission_info
-    else:
-        no_downloads.append({target["title"], url})
-
-
-if len(no_downloads) > 0:
-    print("No downloads for ", no_downloads)
-
-driver.close()
-df = pd.DataFrame(downloads)
-df = df[["Kind", "Country", "Title", "URL"]]
-df.to_csv(downloaded_data_path_UNFCCC / "submissions-nc.csv", index=False)

+ 0 - 291
UNFCCC_GHG_data/UNFCCC_reader/Argentina/config_ARG_BUR5.py

@@ -1,291 +0,0 @@
-### config for reading and conversion to primap2 format
-time_format = "%Y"
-
-coords_cols = {
-    "category": "id_ipcc",
-    "entity": "tipo_de_gas",
-    "time": "año",
-    "data": "valor_en_toneladas_de_gas",
-}
-
-add_coords_cols = {}
-
-coords_terminologies = {
-    "area": "ISO3",
-    "category": "IPCC2006_PRIMAP",
-    "scenario": "PRIMAP",
-}
-
-coords_defaults = {
-    "source": "ARG-GHG-Inventory",
-    "provenance": "measured",
-    "area": "ARG",
-    "scenario": "BUR5",
-    #"unit": "tonnes" # this might not work as the entity has to be specified
-}
-
-unit = 't'
-
-coords_value_mapping = {
-    "category": "PRIMAP1",
-    "unit": "PRIMAP1",
-    "entity": {
-        'HFC_23': 'HFC23',
-        'HFC_32': 'HFC32',
-        'HFC_125': 'HFC125',
-        'HFC_134a': 'HFC134a',
-        'HFC_152a': 'HFC152a',
-        'HFC_143a': 'HFC143a',
-        'HFC_227ea': 'HFC227ea',
-        'HFC_236fa': 'HFC236fa',
-        'HFC_365mfc': 'HFC365mfc',
-        'HFC_245fa': 'HFC245fa',
-        'PFC_143_CF4': 'CF4',
-        'PFC_116_C2F6': 'C2F6',
-    },
-}
-
-coords_value_filling = {
-}
-
-filter_remove = {
-}
-
-filter_keep = {}
-
-meta_data = {
-    "ref": "https://unfccc.int/documents/634953",
-    "ref2": "https://ciam.ambiente.gob.ar/repositorio.php?tid=9&stid=36&did=394#",
-    "rights": "",
-    "contact": "mail@johannes-guetschow.de",
-    "title": "",
-    "comment": "Read from csv file by Johannes Gütschow",
-    "institution": "United Nations Framework Convention on Climate Change (UNFCCC)",
-}
-
-
-### config for processing
-
-# many custom categories which are not in climate categories, so automatic
-# aggregation would be a lot of coding work
-cats_to_agg = { # name is just for readability, not used
-    '1.A.1.c': {'sources': ['1.A.1.c.ii'],
-                'name': 'Manufacture of Solid Fuels and Other Energy Industries'},
-    '1.A.1': {'sources': ['1.A.1.a', '1.A.1.b', '1.A.1.c'],
-              'name': 'Energy Industries'},
-    '1.A.2': {'sources': ['1.A.2.a', '1.A.2.b', '1.A.2.c', '1.A.2.d',
-                          '1.A.2.e', '1.A.2.f', '1.A.2.g', '1.A.2.j',
-                          '1.A.2.l', '1.A.2.m'],
-              'name': 'Manufacturing Industries and Construction'},
-    '1.A.3.a': {'sources': ['1.A.3.a.ii'],
-                'name': 'Civil Aviation'},
-    '1.A.3.b': {'sources': ['1.A.3.b.iii', '1.A.3.b.vii'],
-                'name': 'Road Transportation'},
-    '1.A.3.d': {'sources': ['1.A.3.d.ii'],
-                'name': 'Water-Borne Navigation'},
-    '1.A.3.e': {'sources': ['1.A.3.e.i'],
-                'name': 'Other Transportation'},
-    '1.A.3': {'sources': ['1.A.3.a', '1.A.3.b', '1.A.3.c', '1.A.3.d',
-                          '1.A.3.e'],
-              'name': 'Transport'},
-    '1.A.4.a': {'sources': ['1.A.4.a.i', '1.A.4.a.ii', '1.A.4.a.iii'],
-                'name': 'Commercial/Institutional'},
-    '1.A.4': {'sources': ['1.A.4.a', '1.A.4.b', '1.A.4.c'],
-              'name': 'Other Sectors'},
-    '1.A': {'sources': ['1.A.1', '1.A.2', '1.A.3', '1.A.4'],
-            'name': 'Fuel Combustion Activities'},
-    '1.B.1.a.i': {'sources': ['1.B.1.a.i.1', '1.B.1.a.i.2'],
-                  'name': 'Underground mines'},
-    '1.B.1.a': {'sources': ['1.B.1.a.i'],
-                'name': 'Coal Mining and Handling'},
-    '1.B.1.c': {'sources': ['1.B.1.c.i'],
-                'name': 'Solid Fuel Transformation'},
-    '1.B.1': {'sources': ['1.B.1.a', '1.B.1.c'],
-              'name': 'Solid Fuels'},
-    '1.B.2.a': {'sources': ['1.B.2.a.i', '1.B.2.a.ii', '1.B.2.a.iii',
-                            '1.B.2.a.iv'],
-                'name': 'Oil'},
-    '1.B.2.b': {'sources': ['1.B.2.b.i', '1.B.2.b.ii', '1.B.2.b.iii',
-                            '1.B.2.b.iv', '1.B.2.b.v', '1.B.2.b.vi'],
-                'name': 'Natural Gas'},
-    '1.B.2': {'sources': ['1.B.2.a', '1.B.2.b'],
-              'name': 'Oil and Natural Gas'},
-    '1.B': {'sources': ['1.B.1', '1.B.2'],
-            'name': 'Fugitive Emissions from Fuels'},
-    '1': {'sources': ['1.A', '1.B'],
-          'name': 'Energy'},
-    '2.A.4': {'sources': ['2.A.4.a', '2.A.4.b', '2.A.4.d'],
-              'name': 'Other Process Uses of Carbonates'},
-    '2.A': {'sources': ['2.A.1', '2.A.2', '2.A.4'],
-            'name': 'Mineral Industry'},
-    '2.B.8': {'sources': ['2.B.8.a', '2.B.8.b', '2.B.8.c', '2.B.8.f'],
-              'name': 'Petrochemical and Carbon Black Production'},
-    '2.B.9': {'sources': ['2.B.9.a'],
-              'name': 'Fluorochemical Production'},
-    '2.B': {'sources': ['2.B.1', '2.B.2', '2.B.5', '2.B.7', '2.B.8', '2.B.9'],
-            'name': 'Chemical Industry'},
-    '2.C': {'sources': ['2.C.1', '2.C.2', '2.C.3', '2.C.6'],
-            'name': 'Metal Industry'},
-    '2.D': {'sources': ['2.D.1', '2.D.2'],
-            'name': 'Non-Energy Products from Fuels and Solvent Use'},
-    '2.F.1': {'sources': ['2.F.1.a', '2.F.1.b'],
-              'name': 'Refrigeration and Air Conditioning'},
-    '2.F': {'sources': ['2.F.1', '2.F.2', '2.F.3', '2.F.4'],
-            'name': 'Product Uses as Substitutes for Ozone Depleting Substances'},
-    '2': {'sources': ['2.A', '2.B', '2.C', '2.D', '2.F'],
-          'name': 'IPPU'},
-    # AFOLU
-    # 3.A - Livestock
-    '3.A.1.a': {'sources': ['3.A.1.a.i', '3.A.1.a.ii'],
-                'name': 'Cattle'},
-    '3.A.1': {'sources': ['3.A.1.a',  '3.A.1.b', '3.A.1.c', '3.A.1.d',
-                          '3.A.1.e', '3.A.1.f', '3.A.1.g', '3.A.1.h'],
-              'name': 'Enteric Fermentation'},
-    '3.A.2.a': {'sources': ['3.A.2.a.i', '3.A.2.a.ii'],
-                'name': 'Cattle'},
-    '3.A.2': {'sources': ['3.A.2.a', '3.A.2.b', '3.A.2.c', '3.A.2.d',
-                          '3.A.2.e', '3.A.2.f', '3.A.2.g', '3.A.2.h',
-                          '3.A.2.i'],
-              'name': 'Manure Management'},
-    '3.A': {'sources': ['3.A.1', '3.A.2'],
-            'name': 'Livestock'},
-    # 3.B - Land
-    '3.B.1.a.i': {'sources': ['3.B.1.a.i.1', '3.B.1.a.i.2'],
-                  'name': ''}, # no name, not the normal IPCC category
-    '3.B.1.a.ii': {'sources': ['3.B.1.a.ii.1', '3.B.1.a.ii.2'],
-                   'name': ''}, # no name, not the normal IPCC category
-    '3.B.1.a': {'sources': ['3.B.1.a.i', '3.B.1.a.ii'],
-                'name': 'Forest Land Remaining Forest Land'},
-    # '3.B.1.b': {'sources': ['3.B.1.b.i', '3.B.1.b.ii'],
-    #             'name': 'Land Converted to Forest Land'},
-    '3.B.1': {'sources': ['3.B.1.a'],#, '3.B.1.b'],
-              'name': 'Forest Land'},
-    '3.B.2.b': {'sources': ['3.B.2.b.i', '3.B.2.b.ii'],
-                'name': 'Land Converted to Cropland'},
-    '3.B.2': {'sources': ['3.B.2.b'],
-              'name': 'Cropland'},
-    '3.B.3.b': {'sources': ['3.B.3.b.i', '3.B.3.b.ii'],
-                'name': 'Land Converted to Grassland'},
-    '3.B.3': {'sources': ['3.B.3.b'],
-              'name': 'Grassland'},
-    '3.B': {'sources': ['3.B.1', '3.B.2', '3.B.3', '3.B.7'],
-            'name': 'Land'},
-    # 3.C - Aggregate Sources and Non-CO2 Emissions Sources on Land
-    '3.C.1.a': {'sources': ['3.C.1.a.i', '3.C.1.a.ii'],
-                'name': 'Biomass Burning in Forest Lands'},
-    '3.C.1.b': {'sources': ['3.C.1.b.i', '3.C.1.b.ii'],
-                'name': 'Biomass Burning in Croplands'},
-    'M.3.C.1.b.AG': {'sources': ['3.C.1.b.i'],
-                     'name': 'Biomass Burning in Croplands - Agriculture'},
-    'M.3.C.1.b.LU': {'sources': ['3.C.1.b.ii'],
-                     'name': 'Biomass Burning in Croplands - LULUCF'},
-    '3.C.1.c': {'sources': ['3.C.1.c.i', '3.C.1.c.ii'],
-                'name': 'Biomass Burning in Grasslands'},
-    'M.3.C.1.c.AG': {'sources': ['3.C.1.c.i'],
-                     'name': 'Biomass Burning in Grasslands - Agriculture'},
-    'M.3.C.1.c.LU': {'sources': ['3.C.1.c.ii'],
-                     'name': 'Biomass Burning in Grasslands - LULUCF'},
-    '3.C.1': {'sources': ['3.C.1.a', '3.C.1.b', '3.C.1.c'],
-              'name': 'Biomass Burning'},
-    'M.3.C.1.AG': {'sources': ['M.3.C.1.b.AG', 'M.3.C.1.c.AG'],
-                   'name': 'Biomass Burning - Agriculture'},
-    'M.3.C.1.LU': {'sources': ['3.C.1.a', 'M.3.C.1.b.LU', 'M.3.C.1.c.LU'],
-                   'name': 'Biomass Burning - LULUCF'},
-    '3.C.4.d': {'sources': ['3.C.4.d.i', '3.C.4.d.ii', '3.C.4.d.iii',
-                            '3.C.4.d.iv', '3.C.4.d.v', '3.C.4.d.vi',
-                            '3.C.4.d.vii'],
-                'name': ''}, # not standard IPCC2006
-    '3.C.4.g': {'sources': ['3.C.4.g.i', '3.C.4.g.ii'],
-                'name': ''}, # not standard IPCC2006
-    '3.C.4': {'sources': ['3.C.4.a', '3.C.4.b', '3.C.4.c', '3.C.4.d',
-                          '3.C.4.e', '3.C.4.f', '3.C.4.g', '3.C.4.n',
-                          '3.C.4.o'],
-              'name': 'Direct N2O Emissions from Managed Soils'},
-    '3.C.5.a': {'sources': ['3.C.5.a.i', '3.C.5.a.ii'],
-                'name': ''}, # not standard IPCC2006
-    '3.C.5.b': {'sources': ['3.C.5.b.i', '3.C.5.b.ii'],
-                'name': ''}, # not standard IPCC2006
-    '3.C.5.c': {'sources': ['3.C.5.c.i', '3.C.5.c.ii'],
-                'name': ''}, # not standard IPCC2006
-    '3.C.5.d.i': {'sources': ['3.C.5.d.i.1', '3.C.5.d.i.2'],
-                  'name': ''}, # not standard IPCC2006
-    '3.C.5.d.ii': {'sources': ['3.C.5.d.ii.1', '3.C.5.d.ii.2'],
-                   'name': ''}, # not standard IPCC2006
-    '3.C.5.d.iii': {'sources': ['3.C.5.d.iii.1', '3.C.5.d.iii.2'],
-                    'name': ''}, # not standard IPCC2006
-    '3.C.5.d.iv': {'sources': ['3.C.5.d.iv.1', '3.C.5.d.iv.2'],
-                   'name': ''}, # not standard IPCC2006
-    '3.C.5.d.v': {'sources': ['3.C.5.d.v.1', '3.C.5.d.v.2'],
-                  'name': ''}, # not standard IPCC2006
-    '3.C.5.d.vi': {'sources': ['3.C.5.d.vi.1', '3.C.5.d.vi.2'],
-                   'name': ''}, # not standard IPCC2006
-    '3.C.5.d.vii': {'sources': ['3.C.5.d.vii.1', '3.C.5.d.vii.2'],
-                    'name': ''}, # not standard IPCC2006
-    '3.C.5.d': {'sources': ['3.C.5.d.i', '3.C.5.d.ii', '3.C.5.d.iii',
-                            '3.C.5.d.iv', '3.C.5.d.v', '3.C.5.d.vi',
-                            '3.C.5.d.vii'],
-                'name': ''}, # not standard IPCC2006
-    '3.C.5.f': {'sources': ['3.C.5.f.ii'],
-                'name': ''}, # not standard IPCC2006
-    '3.C.5.g.i': {'sources': ['3.C.5.g.i.1', '3.C.5.g.i.2'],
-                  'name': ''}, # not standard IPCC2006
-    '3.C.5.g.ii': {'sources': ['3.C.5.g.ii.1', '3.C.5.g.ii.2'],
-                   'name': ''}, # not standard IPCC2006
-    '3.C.5.g': {'sources': ['3.C.5.g.i', '3.C.5.g.ii'],
-                'name': ''}, # not standard IPCC2006
-    '3.C.5.n': {'sources': ['3.C.5.n.i', '3.C.5.n.ii'],
-                'name': ''}, # not standard IPCC2006
-    '3.C.5.o': {'sources': ['3.C.5.o.i', '3.C.5.o.ii'],
-                'name': ''}, # not standard IPCC2006
-    '3.C.5': {'sources': ['3.C.5.a', '3.C.5.b', '3.C.5.c', '3.C.5.d',
-                          '3.C.5.e', '3.C.5.f', '3.C.5.g', '3.C.5.n',
-                          '3.C.5.o'],
-              'name': 'Indirect N2O Emissions from Managed Soils'},
-    '3.C.6.a.i': {'sources': ['3.C.6.a.i.1'],
-                  'name': ''}, # not standard IPCC2006
-    '3.C.6.a.ii': {'sources': ['3.C.6.a.ii.1', '3.C.6.a.ii.2'],
-                   'name': ''}, # not standard IPCC2006
-    '3.C.6.a': {'sources': ['3.C.6.a.i', '3.C.6.a.ii'],
-                'name': ''}, # not standard IPCC2006
-    '3.C.6.h': {'sources': ['3.C.6.h.i', '3.C.6.h.ii'],
-                'name': ''}, # not standard IPCC2006
-    '3.C.6.i': {'sources': ['3.C.6.i.i'],
-                'name': ''}, # not standard IPCC2006
-    '3.C.6': {'sources': ['3.C.6.a', '3.C.6.h', '3.C.6.i'],
-              'name': 'Indirect N2O Emissions from Manure Management'},
-    '3.C': {'sources': ['3.C.1', '3.C.3', '3.C.4', '3.C.5', '3.C.6', '3.C.7'],
-            'name': 'Aggregate Sources and Non-CO2 Emissions Sources on Land'},
-    'M.3.C.AG': {'sources': ['M.3.C.1.AG', '3.C.3', '3.C.4', '3.C.5', '3.C.6',
-                             '3.C.7'],
-                 'name': 'Emissions from Biomass Burning - Agriculture'},
-    'M.AG.ELV': {'sources': ['M.3.C.AG'],
-                 'name': 'Agriculture Excluding Livestock'},
-    'M.3.C.LU': {'sources': ['M.3.C.1.LU'],
-                 'name': 'Emissions from Biomass Burning - LULUCF'},
-    '3.D': {'sources': ['3.D.1'],
-            'name': 'Other'},
-    'M.3.D.LU': {'sources': ['3.D.1'],
-                 'name': 'Other - LULUCF'},
-    '3': {'sources': ['3.A', '3.B', '3.C', '3.D'],
-          'name': 'AFOLU'},
-    'M.AG': {'sources': ['3.A', 'M.3.C.AG'],
-             'name': 'Agriculture'},
-    'M.LULUCF': {'sources': ['3.B', 'M.3.C.LU', '3.D'],
-                 'name': 'LULUCF'},
-    # waste
-    '4.A': {'sources': ['4.A.1', '4.A.3'],
-            'name': 'Solid Waste Disposal'},
-    '4.C': {'sources': ['4.C.1'],
-            'name': 'Incineration and Open Burning of Waste'},
-    '4.D.2': {'sources': ['4.D.2.a', '4.D.2.b', '4.D.2.c', '4.D.2.d', '4.D.2.e'],
-              'name': 'Industrial Wastewater Treatment and Discharge'},
-    '4.D': {'sources': ['4.D.1', '4.D.2'],
-            'name': 'Wastewater Treatment and Discharge'},
-    '4': {'sources': ['4.A', '4.B', '4.C', '4.D'],
-          'name': 'Waste'},
-    # national totals
-    '0': {'sources': ['1', '2', '3', '4'],
-          'name': 'National Total'},
-    'M.0.EL': {'sources': ['1', '2', 'M.AG', '4'],
-               'name': 'National Total Excluding LULUCF'},
-}
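
Note on `cats_to_agg` above: each key is a parent category and `sources` lists the children to sum, with children defined before the parents that use them. The actual aggregation is done by `process_data_for_country`, which is not shown in this diff. The sketch below only illustrates the idea on plain dictionaries; the function `aggregate_categories` and the sample values are hypothetical.

```python
def aggregate_categories(values, cats_to_agg):
    """Sum child categories into parents, following the order of the mapping.

    values: dict mapping category code -> emissions value
    cats_to_agg: dict mapping parent code -> {"sources": [...], "name": ...}
    Sources without data are skipped; a parent with no available source
    is not created.
    """
    result = dict(values)
    for parent, spec in cats_to_agg.items():
        available = [result[src] for src in spec["sources"] if src in result]
        if available:
            result[parent] = sum(available)
    return result


# hypothetical leaf values in tonnes
leaf_values = {"1.A.1.a": 10.0, "1.A.1.b": 5.0, "1.A.1.c.ii": 2.0}
mapping = {
    "1.A.1.c": {"sources": ["1.A.1.c.ii"], "name": "not used"},
    "1.A.1": {"sources": ["1.A.1.a", "1.A.1.b", "1.A.1.c"],
              "name": "Energy Industries"},
}
aggregated = aggregate_categories(leaf_values, mapping)
print(aggregated["1.A.1"])
```

Because the loop runs in insertion order, `1.A.1.c` is filled before `1.A.1` needs it, which is why the ordering of the entries in the configuration matters for this approach.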

+ 0 - 404
UNFCCC_GHG_data/UNFCCC_reader/Argentina/read_ARG_BUR4_from_pdf.py

@@ -1,404 +0,0 @@
-# this script reads data from Argentina's national inventory which is underlying BUR4
-# Data is read from the pdf file
-
-import os
-import sys
-import camelot
-import primap2 as pm2
-from primap2.pm2io._conversion import convert_ipcc_code_primap_to_primap2
-from UNFCCC_GHG_data.helper import downloaded_data_path, extracted_data_path, \
-    process_data_for_country
-from UNFCCC_GHG_data.UNFCCC_DI_reader.UNFCCC_DI_reader_config import gas_baskets
-
-
-# ###
-# configuration
-# ###
-
-# TODO: lots of empty lines are written in csv file. check if solved with new
-#  PRIMAP2 version
-
-# folders and files
-input_folder = downloaded_data_path / 'UNFCCC' / 'Argentina' / \
-               'BUR4'
-output_folder = extracted_data_path / 'UNFCCC' / 'Argentina'
-if not output_folder.exists():
-    output_folder.mkdir()
-
-output_filename = 'ARG_BUR4_2022_'
-
-pdf_file = '4to_Informe_Bienal_de_la_Rep%C3%BAblica_Argentina.pdf'
-
-# definitions part 1: reading data from pdf and preprocessing for conversion to PRIMAP2 format
-# part 1.1 KyotoGHG, CO2, CH4, N2O tables
-#
-pages_to_read = range(232, 244)
-data_start_keyword = "Id#"
-data_end_keyword = "Fuente: Elaboración propia"
-index_cols = ['Id#', 'Nombre']
-col_rename = {
-    index_cols[0]: "category",
-    index_cols[1]: "orig_cat_name"
-}
-metadata = {
-    "entity": [0, 1],
-    "unit": [0, 2]
-}
-
-rows_to_drop = [0]
-
-metadata_mapping = {
-    'unit': {
-        '(GgCO2e)': 'GgCO2e',
-        '(GgCO2)': 'Gg',
-        '(GgN2O)': 'Gg',
-        '(GgCH4)': 'Gg',
-        '(GgGas)': 'Gg',
-    }
-}
-
-# part 1.2: fgases table
-# the f-gases table is in wide format with no sectoral resolution and gases as row header
-pages_to_read_fgases = range(244, 247)
-data_start_keyword_fgases = "Gas"
-index_cols_fgases = ['Gas']
-cols_to_drop_fgases = ["Nombre"]
-metadata_fgases = {
-    "unit": [0, 2],
-    "category": '2',
-    "orig_cat_name": "PROCESOS INDUSTRIALES Y USO DE PRODUCTOS",
-}
-col_rename_fgases = {
-    index_cols_fgases[0]: "entity",
-}
-
-## definitions for conversion to PRIMAP2 format
-# rows to remove
-cats_remove = ["Information Items", "Memo Items (3)"]
-# manual category codes
-cat_codes_manual = {  # conversion to PRIMAP1 format
-    '1A6': 'MBIO',
-    '1A3di': 'MBKM',
-    '1A3ai': 'MBKA',
-    '1A3di Navegación marítima y fluvial internacional': 'MBKM',
-    'S/N': 'MMULTIOP',
-}
-
-cat_code_regexp = r'(?P<UNFCCC_GHG_data>^[A-Z0-9]{1,8}).*'
-
-time_format = "%Y"
-
-coords_cols = {
-    "category": "category",
-    "entity": "entity",
-    "unit": "unit",
-}
-
-add_coords_cols = {
-    "orig_cat_name": ["orig_cat_name", "category"],
-}
-
-coords_terminologies = {
-    "area": "ISO3",
-    "category": "IPCC2006_PRIMAP",
-    "scenario": "PRIMAP",
-}
-
-coords_defaults = {
-    "source": "ARG-GHG-Inventory",
-    "provenance": "measured",
-    "area": "ARG",
-    "scenario": "BUR4",
-}
-
-coords_value_mapping = {
-    #    "category": "PRIMAP1",
-    "entity": {
-        'HFC-23': 'HFC23',
-        'HFC-32': 'HFC32',
-        'HFC-41': 'HFC41',
-        'HFC-43-10mee': 'HFC4310mee',
-        'HFC-125': 'HFC125',
-        'HFC-134': 'HFC134',
-        'HFC-134a': 'HFC134a',
-        'HFC-152a': 'HFC152a',
-        'HFC-143': 'HFC143',
-        'HFC-143a': 'HFC143a',
-        'HFC-227ea': 'HFC227ea',
-        'HFC-236fa': 'HFC236fa',
-        'HFC-245ca': 'HFC245ca',
-        'HFC-365mfc': 'HFC365mfc',
-        'HFC-245fa': 'HFC245fa',
-        'PFC-143 (CF4)': 'CF4',
-        'PFC-116 (C2F6)': 'C2F6',
-        'PFC-218 (C3F8)': 'C3F8',
-        'PFC-31-10 (C4F10)': 'C4F10',
-        'c-C4F8': 'cC4F8',
-        'PFC-51-144 (C6F14)': 'C6F14',
-    },
-    "unit": "PRIMAP1",
-    "orig_cat_name": {
-        "1A3di Navegación marítima y fluvial internacional": "Navegación marítima y fluvial internacional",
-    }
-}
-
-coords_value_filling = {
-    "category": {
-        "orig_cat_name": {
-            "Total de emisiones y absorciones nacionales": "0",
-            "Navegación marítima y fluvial internacional": "M.BK.M",
-            "Operaciones Multilaterales": "M.MULTIOP",
-            "Emisiones de CO2 provenientes del uso de biomasa como combustible": "M.BIO",
-        },
-    },
-    "orig_cat_name": {
-        "category": {
-            "M.BK.M": "Navegación marítima y fluvial internacional",
-        },
-    },
-}
-
-filter_remove = {
-    "f1": {
-        "orig_cat_name": ["Elementos Recordatorios"],
-    },
-}
-
-filter_keep = {}
-
-meta_data = {
-    "references": "https://unfccc.int/documents/419772",
-    "rights": "XXXX",
-    "contact": "mail@johannes-guetschow.de",
-    "title": "Cuarto Informe Bienal de Actualización de la República Argentina a la Convención Marco de las Naciones Unidas Sobre el Cambio Climático",
-    "comment": "Read from pdf file by Johannes Gütschow",
-    "institution": "United Nations Framework Convention on Climate Change (UNFCCC)",
-}
-
-compression = dict(zlib=True, complevel=9)
-
-# ###
-# start data reading
-# ###
-
-# change working directory to script directory for proper folder names
-script_path = os.path.abspath(sys.argv[0])
-script_dir_name = os.path.dirname(script_path)
-os.chdir(script_dir_name)
-
-# read data for KyotoGHG, CO2, CH4, N2O
-data_all = None
-for page in pages_to_read:
-    # read current page
-    tables = camelot.read_pdf(str(input_folder / pdf_file), pages=str(page),
-                              flavor='stream')
-    df_current = tables[0].df
-    rows_to_drop = []
-    for index, data in df_current.iterrows():
-        if data[0] == data_start_keyword:
-            break
-        else:
-            rows_to_drop.append(index)
-
-    end_of_data = False
-    for index, data in df_current.iterrows():
-        if data_end_keyword in list(data):
-            end_of_data = True
-        if end_of_data:
-            rows_to_drop.append(index)
-
-    df_current = df_current.drop(rows_to_drop)
-    idx_header = df_current.index[df_current[0] == index_cols[0]].tolist()
-    df_current = df_current.rename(
-        dict(zip(df_current.columns, list(df_current.loc[idx_header[0]]))), axis=1)
-    df_current = df_current.drop(idx_header)
-
-    # for the aggregate GHG tables (pages 232 to 234) fill entity cell
-    if page in range(232, 235):
-        df_current.iloc[
-            metadata["entity"][0], metadata["entity"][1]] = "KYOTOGHG (SARGWP100)"
-    # drop all rows where the index cols (category code and name) are both NaN
-    # as without one of them there is no category information
-    df_current.dropna(axis=0, how='all', subset=index_cols, inplace=True)
-    # set index. necessary for the stack operation in the conversion to long format
-    # df_current = df_current.set_index(index_cols)
-    # add columns
-    inserted = 0
-    for col in metadata.keys():
-        # print(f"coordinates: {metadata[col][0]}, {metadata[col][1]}")
-        value = df_current.iloc[metadata[col][0], metadata[col][1] + inserted]
-        if col in metadata_mapping.keys():
-            if value in metadata_mapping[col].keys():
-                value = metadata_mapping[col][value]
-        # print(f"Inserting column {col} with value {value}")
-        df_current.insert(2, col, value)
-        inserted += 1
-
-    # drop unit row
-    # for row in rows_to_drop:
-    #    df_current = df_current.drop(df_current.iloc[row].name)
-    df_current = df_current.drop(df_current.index[0])
-
-    # fix number format
-    df_current = df_current.apply(lambda x: x.str.replace('.', '', regex=False), axis=1)
-    df_current = df_current.apply(lambda x: x.str.replace(',', '.', regex=False),
-                                  axis=1)
-
-    df_current.rename(columns=col_rename, inplace=True)
-
-    # reindex
-    df_current.reset_index(inplace=True, drop=True)
-
-    df_current["category"] = df_current["category"].replace(cat_codes_manual)
-    # then the regex replacements
-    repl = lambda m: convert_ipcc_code_primap_to_primap2('IPC' + m.group('UNFCCC_GHG_data'))
-    df_current["category"] = df_current["category"].str.replace(cat_code_regexp, repl,
-                                                                regex=True)
-
-    df_current = df_current.reset_index(drop=True)
-
-    # make sure all col headers are str
-    df_current.columns = df_current.columns.map(str)
-
-    # convert to PRIMAP2 interchange format
-    data_if = pm2.pm2io.convert_wide_dataframe_if(
-        df_current,
-        coords_cols=coords_cols,
-        add_coords_cols=add_coords_cols,
-        coords_defaults=coords_defaults,
-        coords_terminologies=coords_terminologies,
-        coords_value_mapping=coords_value_mapping,
-        coords_value_filling=coords_value_filling,
-        filter_remove=filter_remove,
-        filter_keep=filter_keep,
-        meta_data=meta_data
-    )
-
-    # convert to PRIMAP2 native format
-    data_pm2 = pm2.pm2io.from_interchange_format(data_if)
-
-    # aggregate to one df
-    if data_all is None:
-        data_all = data_pm2
-    else:
-        data_all = data_all.pr.merge(data_pm2)
-
-# read fgases
-for page in pages_to_read_fgases:
-    # read current page
-    tables = camelot.read_pdf(str(input_folder / pdf_file), pages=str(page),
-                              flavor='stream')
-    df_current = tables[0].df
-    rows_to_drop = []
-    for index, data in df_current.iterrows():
-        if data[0] == data_start_keyword_fgases:
-            break
-        else:
-            rows_to_drop.append(index)
-
-    end_of_data = False
-    for index, data in df_current.iterrows():
-        if data_end_keyword in list(data):
-            end_of_data = True
-        if end_of_data:
-            rows_to_drop.append(index)
-
-    df_current = df_current.drop(rows_to_drop)
-    idx_header = df_current.index[df_current[0] == index_cols_fgases[0]].tolist()
-    df_current = df_current.rename(
-        dict(zip(df_current.columns, list(df_current.loc[idx_header[0]]))), axis=1)
-    df_current = df_current.drop(idx_header)
-
-    # drop all rows where the index col (gas name) is NaN
-    # as without it there is no entity information
-    df_current.dropna(axis=0, how='all', subset=index_cols_fgases, inplace=True)
-    # set index. necessary for the stack operation in the conversion to long format
-    # df_current = df_current.set_index(index_cols)
-    # add columns
-    inserted = 0
-    for col in metadata_fgases.keys():
-        # print(f"coordinates: {metadata[col][0]}, {metadata[col][1]}")
-        if isinstance(metadata_fgases[col], str):
-            value = metadata_fgases[col]
-        else:
-            value = df_current.iloc[
-                metadata_fgases[col][0], metadata_fgases[col][1] + inserted]
-            if col in metadata_mapping.keys():
-                if value in metadata_mapping[col].keys():
-                    value = metadata_mapping[col][value]
-        # print(f"Inserting column {col} with value {value}")
-        df_current.insert(2, col, value)
-        inserted += 1
-
-    # remove unnecessary columns
-    df_current = df_current.drop(columns=cols_to_drop_fgases)
-
-    # drop unit row
-    df_current = df_current.drop(df_current.index[0])
-
-    # fix number format
-    df_current = df_current.apply(lambda x: x.str.replace('.', '', regex=False), axis=1)
-    df_current = df_current.apply(lambda x: x.str.replace(',', '.', regex=False),
-                                  axis=1)
-
-    df_current.rename(columns=col_rename_fgases, inplace=True)
-
-    # reindex
-    df_current.reset_index(inplace=True, drop=True)
-
-    df_current["category"] = df_current["category"].replace(cat_codes_manual)
-    # then the regex replacements
-    repl = lambda m: convert_ipcc_code_primap_to_primap2('IPC' + m.group('UNFCCC_GHG_data'))
-    df_current["category"] = df_current["category"].str.replace(cat_code_regexp, repl,
-                                                                regex=True)
-
-    df_current = df_current.reset_index(drop=True)
-
-    # make sure all col headers are str
-    df_current.columns = df_current.columns.map(str)
-
-    # convert to PRIMAP2 interchange format
-    data_if = pm2.pm2io.convert_wide_dataframe_if(
-        df_current,
-        coords_cols=coords_cols,
-        add_coords_cols=add_coords_cols,
-        coords_defaults=coords_defaults,
-        coords_terminologies=coords_terminologies,
-        coords_value_mapping=coords_value_mapping,
-        coords_value_filling=coords_value_filling,
-        filter_remove=filter_remove,
-        filter_keep=filter_keep,
-        meta_data=meta_data
-    )
-
-    # convert to PRIMAP2 native format
-    data_pm2 = pm2.pm2io.from_interchange_format(data_if)
-
-    # aggregate to one df
-    data_all = data_all.pr.merge(data_pm2)
-
-# ###
-# process (aggregate fgases)
-# ###
-data_all = process_data_for_country(
-    data_all,
-    entities_to_ignore=[],
-    gas_baskets=gas_baskets,
-    processing_info_country=None,
-)
-
-
-# ###
-# save data to IF and native format
-# ###
-
-encoding = {var: compression for var in data_all.data_vars}
-data_all.pr.to_netcdf(output_folder / (output_filename + coords_terminologies[
-    "category"] + ".nc"), encoding=encoding)
-
-data_if = data_all.pr.to_interchange_format()
-pm2.pm2io.write_interchange_format(output_folder / (output_filename + coords_terminologies["category"]), data_if)
-
-
-
-
-
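
Note on two preprocessing steps in the removed script above: numbers in the pdf tables use a dot as thousands separator and a comma as decimal separator, and category labels start with a code that is extracted by regular expression. The sketch below shows both steps with the standard library only. The function names and the group name `code` are hypothetical, and the subsequent conversion of the code to PRIMAP2 format by `convert_ipcc_code_primap_to_primap2` is left out.

```python
import re

# same pattern idea as cat_code_regexp in the script; the group name is an assumption
CODE_PATTERN = re.compile(r"^(?P<code>[A-Z0-9]{1,8}).*")


def extract_code(label):
    """Return the leading category code of a table label, or None if there is none."""
    match = CODE_PATTERN.match(label)
    return match.group("code") if match else None


def parse_number(text):
    """Convert '1.234,5' (dot for thousands, comma for decimals) to a float."""
    # remove thousands separators first, then switch the decimal comma to a dot
    return float(text.replace(".", "").replace(",", "."))


print(extract_code("1A1 Industrias de la energía"))
print(parse_number("1.234,5"))
```

The order of the two replacements in `parse_number` matters: swapping them would delete the decimal separator as well. Note also that the character class only covers upper-case letters and digits, so the extracted code stops at the first lower-case character.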

+ 0 - 118
UNFCCC_GHG_data/UNFCCC_reader/Argentina/read_ARG_BUR5_from_csv.py

@@ -1,118 +0,0 @@
-# this script reads data from Argentina's 2023 national inventory which is underlying BUR5
-# https://ciam.ambiente.gob.ar/repositorio.php?tid=9&stid=36&did=394#
-# Data is read from the csv file available for download at the above URL
-# license probably CC-BY 4.0 (see https://datos.gob.ar/dataset/ambiente-emisiones-gases-efecto-invernadero-gei)
-
-import pandas as pd
-import primap2 as pm2
-from UNFCCC_GHG_data.helper import downloaded_data_path, extracted_data_path, \
-    process_data_for_country
-from UNFCCC_GHG_data.helper import gas_baskets, compression
-
-from config_ARG_BUR5 import unit, time_format, filter_keep, filter_remove
-from config_ARG_BUR5 import (coords_cols, coords_defaults, coords_terminologies,
-                              coords_value_mapping, coords_value_filling)
-from config_ARG_BUR5 import cats_to_agg, meta_data
-
-# INFO
-# data is in long format. Columns needed are
-# 'año' 'id_ipcc' 'tipo_de_gas' 'valor_en_toneladas_de_gas'
-# columns to ignore are
-# columns_to_ignore = ['sector', 'actividad', 'subactividad', 'categoria', 'valor_en_toneladas_de_co2e']
-# sector codes are in primap1 format (no dots), reading should be possible directly from CSV into interchange format
-# postprocessing needed is aggregation of gas baskets and categories as only the highest detail categories are present
-
-
-# ###
-# configuration
-# ###
-
-# folders and files
-input_folder = downloaded_data_path / 'UNFCCC' / 'Argentina' / \
-               'BUR5'
-output_folder = extracted_data_path / 'UNFCCC' / 'Argentina'
-if not output_folder.exists():
-    output_folder.mkdir()
-
-output_filename = 'ARG_BUR5_2023_'
-
-csv_file = 'emisiones_gei_inventario_datos_totales_1990_2020.csv'
-
-
-# read the data
-data_pd = pd.read_csv(
-    input_folder / csv_file,
-    sep=';',
-    parse_dates=[coords_cols["time"]],
-    usecols=list(coords_cols.values()),
-)
-
-data_pd["unit"] = unit
-coords_cols["unit"] = "unit"
-
-data_if = pm2.pm2io.convert_long_dataframe_if(
-    data_pd,
-    coords_cols=coords_cols,
-    coords_defaults=coords_defaults,
-    coords_value_mapping=coords_value_mapping,
-    coords_value_filling=coords_value_filling,
-    coords_terminologies=coords_terminologies,
-    filter_remove=filter_remove,
-    filter_keep=filter_keep,
-    meta_data=meta_data,
-    time_format=time_format,
-)
-
-data_pm2 = pm2.pm2io.from_interchange_format(data_if)
-data_if = data_pm2.pr.to_interchange_format()
-
-# ###
-# save data to IF and native format
-# ###
-if not output_folder.exists():
-    output_folder.mkdir()
-pm2.pm2io.write_interchange_format(output_folder /
-                                   (output_filename + coords_terminologies["category"]
-                                    + "_raw"), data_if)
-
-data_pm2 = pm2.pm2io.from_interchange_format(data_if)
-encoding = {var: compression for var in data_pm2.data_vars}
-data_pm2.pr.to_netcdf(output_folder /
-                      (output_filename + coords_terminologies["category"]
-                       + "_raw" + ".nc"), encoding=encoding)
-
-### processing
-data_proc_pm2 = data_pm2
-
-# actual processing
-country_processing = {
-    'aggregate_cats': cats_to_agg,
-}
-data_proc_pm2 = process_data_for_country(
-    data_proc_pm2,
-    entities_to_ignore=[],
-    gas_baskets=gas_baskets,
-    processing_info_country=country_processing,
-)
-
-# adapt source and metadata
-current_source = data_proc_pm2.coords["source"].values[0]
-data_temp = data_proc_pm2.pr.loc[{"source": current_source}]
-data_proc_pm2 = data_proc_pm2.pr.set("source", 'BUR_NIR', data_temp)
-data_proc_pm2 = data_proc_pm2.pr.loc[{"source": ["BUR_NIR"]}]
-
-# ###
-# save data to IF and native format
-# ###
-data_proc_if = data_proc_pm2.pr.to_interchange_format()
-if not output_folder.exists():
-    output_folder.mkdir()
-pm2.pm2io.write_interchange_format(
-    output_folder / (output_filename + coords_terminologies["category"]), data_proc_if)
-
-encoding = {var: compression for var in data_proc_pm2.data_vars}
-data_proc_pm2.pr.to_netcdf(
-    output_folder / (output_filename + coords_terminologies["category"] + ".nc"),
-    encoding=encoding)
-
-
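
Note on the input layout described in the INFO comment of the removed script above: the csv file is in long format, separated by semicolons, and only four columns are needed, with a constant unit added afterwards. The script does this with `pd.read_csv` and primap2. The sketch below reproduces only the column selection with the standard library; the sample rows and the function `read_long_rows` are hypothetical.

```python
import csv
import io

# hypothetical two-row sample using the column names from the INFO comment
SAMPLE = (
    "año;id_ipcc;tipo_de_gas;valor_en_toneladas_de_gas;sector\n"
    "1990;1A1a;CO2;100.5;Energía\n"
    "1990;1A1a;CH4;0.2;Energía\n"
)
NEEDED = ["año", "id_ipcc", "tipo_de_gas", "valor_en_toneladas_de_gas"]


def read_long_rows(handle, unit="t"):
    """Keep only the needed columns and attach a constant unit to every row."""
    rows = []
    for record in csv.DictReader(handle, delimiter=";"):
        row = {column: record[column] for column in NEEDED}
        row["valor_en_toneladas_de_gas"] = float(row["valor_en_toneladas_de_gas"])
        row["unit"] = unit
        rows.append(row)
    return rows


rows = read_long_rows(io.StringIO(SAMPLE))
print(rows[0])
```

Attaching the unit as its own column mirrors the `data_pd["unit"] = unit` step in the script, which is needed because the source file states the unit only in a column name.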

+ 0 - 226
UNFCCC_GHG_data/UNFCCC_reader/Burundi/read_BDI_BUR1_from_pdf.py

@@ -1,226 +0,0 @@
-import camelot
-import primap2 as pm2
-import pandas as pd
-
-from UNFCCC_GHG_data.helper import downloaded_data_path, extracted_data_path
-from UNFCCC_GHG_data.helper.functions import process_data_for_country
-
-from config_BDI_BUR1 import (
-    inv_conf,
-    meta_data,
-    filter_remove,
-    coords_value_mapping,
-    coords_terminologies,
-    coords_defaults,
-    coords_cols,
-    gas_baskets,
-    country_processing_step1,
-    inv_conf_per_year,
-)
-
-# ###
-# configuration
-# ###
-
-input_folder = downloaded_data_path / "UNFCCC" / "Burundi" / "BUR1"
-output_folder = extracted_data_path / "UNFCCC" / "Burundi"
-
-if not output_folder.exists():
-    output_folder.mkdir()
-
-pdf_file = "Burundi_BUR_1_Report__Francais.pdf"
-output_filename = "BDI_BUR1_2023_"
-category_column = f"category ({coords_terminologies['category']})"
-compression = dict(zlib=True, complevel=9)
-
-# ###
-# 1. Read in tables
-# ###
-
-df_all = None
-for year in inv_conf_per_year.keys():
-    print("-" * 60)
-    print(f"Reading year {year}.")
-    print("-" * 60)
-    df_year = None
-    for page in inv_conf_per_year[year]["pages_to_read"]:
-        print(f"Reading table from page {page}.")
-        tables_inventory_original = camelot.read_pdf(
-            str(input_folder / pdf_file),
-            pages=page,
-            flavor="lattice",
-            split_text=True,
-        )
-        print("Reading complete.")
-
-        df_page = tables_inventory_original[0].df
-
-        if df_year is None:
-            df_year = df_page
-        else:
-            df_year = pd.concat(
-                [df_year, df_page],
-                axis=0,
-                join="outer",
-            ).reset_index(drop=True)
-
-    print(f"Concatenating all tables for {year}.")
-    # remove line breaks
-    for column in df_year.columns:
-        df_year[column] = df_year[column].str.replace("\n", "")
-
-    # fix broken values in cells
-    if "fix_values" in inv_conf_per_year[year].keys():
-        for index, column, value in inv_conf_per_year[year]["fix_values"]:
-            df_year.at[index, column] = value
-
-    # delete extra columns
-    if "delete_columns" in inv_conf_per_year[year].keys():
-        for column in inv_conf_per_year[year]["delete_columns"]:
-            df_year = df_year.drop(columns=column)
-        df_year.columns = range(df_year.columns.size)
-
-    df_header = pd.DataFrame([inv_conf["header"], inv_conf["unit"]])
-
-    df_year = pd.concat([df_header, df_year[2:]], axis=0, join="outer").reset_index(
-        drop=True
-    )
-
-    df_year = pm2.pm2io.nir_add_unit_information(
-        df_year,
-        unit_row=inv_conf["unit_row"],
-        entity_row=inv_conf["entity_row"],
-        regexp_entity=".*",
-        regexp_unit=".*",
-        default_unit="Gg",
-    )
-
-    print("Added unit information.")
-
-    # set index
-    df_year = df_year.set_index(inv_conf["index_cols"])
-
-    # convert to long format
-    df_year_long = pm2.pm2io.nir_convert_df_to_long(
-        df_year, year, inv_conf["header_long"]
-    )
-
-    # extract from tuple
-    df_year_long["orig_cat_name"] = df_year_long["orig_cat_name"].str[0]
-
-    # prep for conversion to PM2 IF and native format
-    # make a copy of the categories row
-    df_year_long["category"] = df_year_long["orig_cat_name"]
-
-    # replace cat names by codes in col "category"
-    # first the manual replacements
-    df_year_long["category"] = df_year_long["category"].str.replace("\n", "")
-
-    df_year_long["category"] = df_year_long["category"].replace(
-        inv_conf["cat_codes_manual"]
-    )
-
-    df_year_long["category"] = df_year_long["category"].str.replace(".", "", regex=False)
-
-    # then the regex replacements
-    def repl(m):
-        return m.group("code")
-
-    df_year_long["category"] = df_year_long["category"].str.replace(
-        inv_conf["cat_code_regexp"], repl, regex=True
-    )
-
-    df_year_long = df_year_long.reset_index(drop=True)
-
-    df_year_long["data"] = df_year_long["data"].str.replace(",", ".")
-
-    # TODO: I don't think there are NE1 in the tables.
-    # df_year_long["data"] = df_year_long["data"].str.replace("NE1", "NE")
-
-    # make sure all col headers are str
-    df_year_long.columns = df_year_long.columns.map(str)
-
-    df_year_long = df_year_long.drop(columns=["orig_cat_name"])
-
-    if df_all is None:
-        df_all = df_year_long
-    else:
-        df_all = pd.concat(
-            [df_all, df_year_long],
-            axis=0,
-            join="outer",
-        ).reset_index(drop=True)
-
-### convert to interchange format ###
-print("Converting to interchange format.")
-df_all_IF = pm2.pm2io.convert_long_dataframe_if(
-    df_all,
-    coords_cols=coords_cols,
-    coords_defaults=coords_defaults,
-    coords_terminologies=coords_terminologies,
-    coords_value_mapping=coords_value_mapping,
-    filter_remove=filter_remove,
-    meta_data=meta_data,
-    convert_str=True,
-    time_format="%Y",
-)
-
-
-### convert to primap2 format ###
-print("Converting to primap2 format.")
-data_pm2 = pm2.pm2io.from_interchange_format(df_all_IF)
-
-
-# ###
-# Save raw data to IF and native format.
-# ###
-
-data_if = data_pm2.pr.to_interchange_format()
-
-pm2.pm2io.write_interchange_format(
-    output_folder / (output_filename + coords_terminologies["category"] + "_raw"),
-    data_if,
-)
-
-encoding = {var: compression for var in data_pm2.data_vars}
-data_pm2.pr.to_netcdf(
-    output_folder / (output_filename + coords_terminologies["category"] + "_raw.nc"),
-    encoding=encoding,
-)
-
-
-# ###
-# Processing
-# ###
-
-data_proc_pm2 = process_data_for_country(
-    data_country=data_pm2,
-    entities_to_ignore=[],
-    gas_baskets=gas_baskets,
-    filter_dims=None,
-    cat_terminology_out=None,
-    category_conversion=None,
-    sectors_out=None,
-    processing_info_country=country_processing_step1,
-)
-
-# ###
-# save processed data to IF and native format
-# ###
-
-terminology_proc = coords_terminologies["category"]
-
-data_proc_if = data_proc_pm2.pr.to_interchange_format()
-
-if not output_folder.exists():
-    output_folder.mkdir()
-pm2.pm2io.write_interchange_format(
-    output_folder / (output_filename + terminology_proc), data_proc_if
-)
-
-encoding = {var: compression for var in data_proc_pm2.data_vars}
-data_proc_pm2.pr.to_netcdf(
-    output_folder / (output_filename + terminology_proc + ".nc"), encoding=encoding
-)
-
-print("Saved processed data.")

+ 0 - 186
UNFCCC_GHG_data/UNFCCC_reader/Chile/config_CHL_BUR4.py

@@ -1,186 +0,0 @@
-## parameters for conversion to IPCC2006 categories
-filter_remove_IPCC2006 = {
-    "filter_cats": { # filter cats that have no 1:1 match for IPCC2006 or are additional subsectors
-        "category (IPCC2006_PRIMAP)": [
-            # refrigeration and air conditioning subsectors don't match IPCC2006
-            '2.F.1.a', '2.F.1.b', '2.F.1.c', '2.F.1.d', '2.F.1.e', '2.F.1.f',
-            # additional subsectors for other cattle in enteric fermentation
-            '3.A.1.b.i', '3.A.1.b.ii', '3.A.1.b.iii', '3.A.1.b.iv', '3.A.1.b.v',
-            # additional subcategories for swine in enteric fermentation
-            '3.A.3.a', '3.A.3.b', '3.A.3.c',
-            # other animals in enteric fermentation not fitting the IPCC2006 other animals
-            '3.A.4',
-            # need to be summed to '3.A.4.j'
-            '3.A.4.f', '3.A.4.g', '3.A.4.g.i', '3.A.4.g.ii',
-            # additional subsectors for other cattle in manure management
-            '3.B.1.b.i', '3.B.1.b.ii', '3.B.1.b.iii', '3.B.1.b.iv', '3.B.1.b.v',
-            # additional subcategories for swine in manure management
-            '3.B.3.a', '3.B.3.b', '3.B.3.c',
-            # other animals in manure management not fitting the IPCC2006 other animals
-            '3.B.4',
-            # need to be summed to '3.A.4.j'
-            '3.B.4.f', '3.B.4.g', '3.B.4.g.i', '3.B.4.g.ii',
-            # subsectors of indirect N2O from manure management
-            '3.B.5.a', '3.B.5.b', '3.B.5.c', '3.B.5.d', '3.B.5.d.i', '3.B.5.d.ii',
-            '3.B.5.d.iii', '3.B.5.d.iv', '3.B.5.d.v', '3.B.5.d.vi', '3.B.5.d.vii',
-            # subsectors of rice cultivation
-            '3.C.1', '3.C.2', '3.C.3', '3.C.4',
-            # no direct representation of "agricultural soils" in IPCC 2006
-            '3.D',
-            # subsectors of 3.D.1. not matching subsectors of 3.C.4 (direct emissions from managed soils)
-            # '3.D.1.a.': '3.C.1.a', '3.D.1.b.': '3.C.1.b', '3.D.1.c.': '3.A.4.c', '3.D.1.d.': '3.C.4.d',
-            '3.D.1.a', '3.D.1.b', '3.D.1.b.i', '3.D.1.b.ii', '3.D.1.b.iii', '3.D.1.c',
-            '3.D.1.d', '3.D.1.e', '3.D.1.f', '3.D.1.g',
-            # additional subsector level of 3.D.2.a (3.C.5.a Atmospheric deposition)
-            '3.D.2.a.i', '3.D.2.a.ii', '3.D.2.a.ii.1', '3.D.2.a.ii.2', '3.D.2.a.ii.3', '3.D.2.a.iii',
-            # additional subsector level of 3.D.2.b (3.C.5.b Nitrogen leaching and runoff)
-            '3.D.2.b.i', '3.D.2.b.ii', '3.D.2.b.ii.1', '3.D.2.b.ii.2', '3.D.2.b.ii.3', '3.D.2.b.iii',
-            '3.D.2.b.iv', '3.D.2.b.v',
-            # additional subsector level of 3.F (3.C.1.b Biomass burning in cropland)
-            '3.F.1', '3.F.2', '3.F.3',
-            # additional subsector level of 3.G (3.C.2 Liming)
-            '3.G.1', '3.G.2',
-            # additional subsector levels of 4.A.1 (3.A.1.a Forest land remaining forest land)
-            '4.A.1.a', '4.A.1.a.i', '4.A.1.a.i.1', '4.A.1.a.i.1.a', '4.A.1.a.i.1.b', '4.A.1.a.i.1.c',
-            '4.A.1.a.i.1.d', '4.A.1.a.i.1.e', '4.A.1.a.i.1.f', '4.A.1.a.i.1.g', '4.A.1.a.i.1.h', 
-            '4.A.1.a.i.1.i', '4.A.1.a.i.1.j', '4.A.1.a.i.1.k', '4.A.1.a.i.1.l', '4.A.1.a.i.2', 
-            '4.A.1.a.i.2.a', '4.A.1.a.i.2.b', '4.A.1.a.i.2.c', '4.A.1.a.i.2.d', '4.A.1.a.i.2.e',
-            '4.A.1.a.i.2.f', '4.A.1.a.i.2.g', '4.A.1.a.i.2.h', '4.A.1.a.i.2.i', '4.A.1.a.i.2.j',
-            '4.A.1.a.i.2.k', '4.A.1.a.i.2.l', '4.A.1.a.i.3', '4.A.1.a.i.3.a', '4.A.1.a.i.3.b',
-            '4.A.1.a.i.3.c', '4.A.1.a.i.3.d', '4.A.1.a.i.3.e', '4.A.1.a.i.3.f', '4.A.1.a.i.3.g',
-            '4.A.1.a.i.3.h', '4.A.1.a.i.3.i', '4.A.1.a.i.3.j', '4.A.1.a.i.3.k', '4.A.1.a.i.3.l',
-            '4.A.1.a.ii', '4.A.1.a.ii.1', '4.A.1.a.ii.2', '4.A.1.a.ii.3', '4.A.1.a.ii.4',
-            '4.A.1.a.ii.5', '4.A.1.a.ii.6', '4.A.1.a.ii.7', '4.A.1.b', '4.A.1.b.i', '4.A.1.b.i.1',
-            '4.A.1.b.i.2', '4.A.1.b.i.3', '4.A.1.b.i.4', '4.A.1.b.ii', '4.A.1.b.ii.1', '4.A.1.b.ii.2',
-            '4.A.1.b.iii', '4.A.1.b.iii.1', '4.A.1.b.iii.1.a', '4.A.1.b.iii.1.b', '4.A.1.b.iii.2',
-            '4.A.1.b.iv', '4.A.1.c', '4.A.1.c.i', '4.A.1.c.ii',
-            # additional subsector level in land converted to forest land
-            '4.A.2.a.i', '4.A.2.a.ii', '4.A.2.b.i', '4.A.2.b.ii', '4.A.2.c.i', '4.A.2.c.ii',
-            '4.A.2.d.i', '4.A.2.d.ii', '4.A.2.e.i', '4.A.2.e.ii',
-            # subsectors of solid waste disposal might not match
-            '5.A.1', '5.A.2', '5.A.3',
-        ],
-    },
-}
-
-
-cat_mapping = { # categories not listed here have the same code as in IPCC 2006 specifications
-    '3': 'M.AG',
-    '3.A': '3.A.1',
-    '3.A.1': '3.A.1.a',
-    '3.A.1.a': '3.A.1.a.i',
-    '3.A.1.b': '3.A.1.a.ii',
-    '3.A.2': '3.A.1.c',
-    '3.A.3': '3.A.1.h',
-    '3.A.4.a': '3.A.1.b',
-    '3.A.4.b': '3.A.1.d',
-    '3.A.4.c': '3.A.1.f',
-    '3.A.4.d': '3.A.1.g',
-    '3.A.4.e': '3.A.1.i',
-    '3.B': '3.A.2',
-    '3.B.1': '3.A.2.a',
-    '3.B.1.a': '3.A.2.a.i',
-    '3.B.1.b': '3.A.2.a.ii',
-    '3.B.2': '3.A.2.c',
-    '3.B.3': '3.A.2.h',
-    '3.B.4.a': '3.A.2.b',
-    '3.B.4.b': '3.A.2.d',
-    '3.B.4.c': '3.A.2.f',
-    '3.B.4.d': '3.A.2.g',
-    '3.B.4.e': '3.A.2.i',
-    '3.B.5': '3.C.6',
-    '3.C': '3.C.7',
-    '3.D.1': '3.C.4', 
-    '3.D.2': '3.C.5',
-    '3.D.2.a': '3.C.5.a', # not in climate_categories
-    '3.D.2.b': '3.C.5.b', # not in climate_categories
-    '3.E': '3.C.1.c',
-    '3.F': '3.C.1.b',
-    '3.G': '3.C.2',
-    '3.H': '3.C.3',
-    '3.I': '3.C.8.a', # merge this with cat below
-    '3.J': '3.C.8.b', # merge with cat above
-    '4': 'M.LULUCF',
-    '4.A': '3.B.1',
-    '4.A.1': '3.B.1.a',
-    '4.A.2': '3.B.1.b',
-    '4.A.2.a': '3.B.1.b.i',
-    '4.A.2.b': '3.B.1.b.ii',
-    '4.A.2.c': '3.B.1.b.iii',
-    '4.A.2.d': '3.B.1.b.iv',
-    '4.A.2.e': '3.B.1.b.v',
-    '4.B': '3.B.2',
-    '4.B.1': '3.B.2.a',
-    '4.B.2': '3.B.2.b',
-    '4.B.2.a': '3.B.2.b.i',
-    '4.B.2.b': '3.B.2.b.ii',
-    '4.B.2.c': '3.B.2.b.iii',
-    '4.B.2.d': '3.B.2.b.iv',
-    '4.B.2.e': '3.B.2.b.v',
-    '4.C': '3.B.3',
-    '4.C.1': '3.B.3.a',
-    '4.C.2': '3.B.3.b',
-    '4.C.2.a': '3.B.3.b.i',
-    '4.C.2.b': '3.B.3.b.ii',
-    '4.C.2.c': '3.B.3.b.iii',
-    '4.C.2.d': '3.B.3.b.iv',
-    '4.C.2.e': '3.B.3.b.v',
-    '4.D': '3.B.4',
-    '4.D.1': '3.B.4.a',
-    '4.D.2': '3.B.4.b',
-    '4.D.2.a': '3.B.4.b.i',
-    '4.D.2.b': '3.B.4.b.ii',
-    '4.D.2.c': '3.B.4.b.iii',
-    '4.D.2.d': '3.B.4.b.iv',
-    '4.D.2.e': '3.B.4.b.v',
-    '4.E': '3.B.5',
-    '4.E.1': '3.B.5.a',
-    '4.E.2': '3.B.5.b',
-    '4.E.2.a': '3.B.5.b.i',
-    '4.E.2.b': '3.B.5.b.ii',
-    '4.E.2.c': '3.B.5.b.iii',
-    '4.E.2.d': '3.B.5.b.iv',
-    '4.E.2.e': '3.B.5.b.v',
-    '4.F': '3.B.6',
-    '4.F.1': '3.B.6.a',
-    '4.F.2': '3.B.6.b',
-    '4.F.2.a': '3.B.6.b.i',
-    '4.F.2.b': '3.B.6.b.ii',
-    '4.F.2.c': '3.B.6.b.iii',
-    '4.F.2.d': '3.B.6.b.iv',
-    '4.F.2.e': '3.B.6.b.v',
-    '4.G': '3.D.1',
-    '4.H': '3.D.2',
-    '5': '4',
-    '5.A': '4.A',
-    '5.B': '4.B',
-    '5.C': '4.C',
-    '5.C.1': '4.C.1',
-    '5.C.2': '4.C.2',
-    '5.D': '4.D',
-    '5.D.1': '4.D.1',
-    '5.D.2': '4.D.2',
-    '5.E': '4.E',
-}
-
-# comments
-# '2.F.1.a.': included in '2.F.1.a.3', # not in climate categories
-# '2.F.1.b.': included in '2.F.1.a.2', # not in climate categories
-# '2.F.1.c.': included in '2.F.1.a.1', # not in climate categories 
-# '2.F.1.d.': included in 2.F.1.a (transport refrigeration)
-# '2.F.1.e.': included in 2.F.1.a (stationary air conditioning)
-# '2.F.1.f.': 2.F.1.b, (mobile air conditioning)
-# '3.A.4.f.': included in '3.A.1.j',
-# '3.A.4.g.': included in '3.A.1.j',
-# '3.A.4.g.i.',
-# '3.A.4.g.ii.',
-
-aggregate_cats = {
-    '3.A': {'sources': ['3.A.1', '3.A.2'], 'name': 'Livestock'},
-    '3.B': {'sources': ['3.B.1', '3.B.2', '3.B.3', '3.B.4', '3.B.5', '3.B.6'], 'name': 'Land'},
-    '3.C.1': {'sources': ['3.C.1.b','3.C.1.c'], 'name': 'Emissions from Biomass Burning'},
-    '3.C.8': {'sources': ['3.C.8.a', '3.C.8.b'], 'name': 'Other'},
-    '3.C': {'sources': ['3.C.1', '3.C.2', '3.C.3', '3.C.4', '3.C.5', '3.C.6', '3.C.7', '3.C.8'], 'name': 'Aggregate sources and non-CO2 emissions sources on land'},
-    '3.D': {'sources': ['3.D.1', '3.D.2'], 'name': 'Other'},
-    '3': {'sources': ['3.A', '3.B', '3.C', '3.D'], 'name': 'AFOLU'},
-}

+ 0 - 281
UNFCCC_GHG_data/UNFCCC_reader/Chile/read_CHL_BUR4_from_xlsx.py

@@ -1,281 +0,0 @@
-# this script reads data from Chile's 2020 national inventory which underlies BUR4
-# Data is read from the xlsx file
-
-import os
-import sys
-import pandas as pd
-import primap2 as pm2
-
-from config_CHL_BUR4 import cat_mapping, filter_remove_IPCC2006, aggregate_cats
-from UNFCCC_GHG_data.helper import downloaded_data_path, extracted_data_path
-from primap2.pm2io._data_reading import matches_time_format
-from primap2.pm2io._data_reading import filter_data
-
-# ###
-# configuration
-# ###
-
-# folders and files
-input_folder = downloaded_data_path / 'UNFCCC' / 'Chile' / 'BUR4'
-output_folder = extracted_data_path / 'UNFCCC' / 'Chile'
-if not output_folder.exists():
-    output_folder.mkdir()
-
-output_filename = 'CHL_BUR4_2021_'
-
-inventory_file = 'Inventario_Nacional_de_GEI-1990-2018.xlsx'
-years_to_read = range(1990, 2018 + 1)
-
-# configuration for conversion to PRIMAP2 data format
-unit_row = "header"
-unit_info = {
-    'regexp_entity': r'(.*)\s\(.*\)$',
-    'regexp_unit': r'.*\s\((.*)\)$',
-    'default_unit': 'kt',
-    'manual_repl_unit': {
-        'kt CO₂ eq': 'ktCO2eq',
-        'HFC (kt CO₂ eq)': 'ktCO2eq',
-        'PFC (kt CO₂ eq)': 'ktCO2eq',
-        'SF₆ (kt CO₂ eq)': 'ktCO2eq',
-    },
-    'manual_repl_entity': {
-        'kt CO₂ eq': 'KYOTOGHG (AR4GWP100)',
-        'HFC (kt CO₂ eq)': 'HFCS (AR4GWP100)',
-        'PFC (kt CO₂ eq)': 'PFCS (AR4GWP100)',
-        'SF₆ (kt CO₂ eq)': 'SF6 (AR4GWP100)',
-    }
-}
-cols_to_drop = ['Unnamed: 14', 'Unnamed: 16', 'Código IPCC.1',
-                'Categorías de fuente y sumidero de gases de efecto invernadero.1']
-# columns for category code and original category name
-index_cols = ['Código IPCC', 'Categorías de fuente y sumidero de gases de efecto invernadero']
-
-# operations on long format DF
-cols_for_space_stripping = ['category', 'orig_cat_name', 'entity']
-
-time_format = "%Y"
-
-coords_cols = {
-    "category": "category",
-    "entity": "entity",
-    "unit": "unit",
-}
-
-add_coords_cols = {
-    "orig_cat_name": ["orig_cat_name", "category"],
-}
-
-coords_terminologies = {
-    "area": "ISO3",
-    "category": "IPCC2006_1996_Chile_NIR",
-    "scenario": "PRIMAP",
-}
-
-coords_terminologies_2006 = {
-    "area": "ISO3",
-    "category": "IPCC2006_PRIMAP",
-    "scenario": "PRIMAP",
-}
-
-coords_defaults = {
-    "source": "CHL-GHG-Inventory",
-    "provenance": "measured",
-    "area": "CHL",
-    "scenario": "BUR4"
-}
-
-coords_value_mapping = {
-    "entity": {
-        "COVDM": "NMVOC",
-        "CO₂ neto": "CO2",
-        "CH₄": "CH4",
-        # "HFC": "HFCS",
-        "HFC-125": "HFC125",
-        "HFC-134a": "HFC134a",
-        "HFC-143a": "HFC143a",
-        "HFC-152a": "HFC152a",
-        "HFC-227ea": "HFC227ea",
-        "HFC-23": "HFC23",
-        "HFC-236fa": "HFC236fa",
-        "HFC-245fa": "HFC245fa",
-        "HFC-32": "HFC32",
-        "HFC-365mfc": "HFC365mfc",
-        "HFC-43-10mee": "HFC4310mee",
-        "N₂O": "N2O",
-        # "PFC": "PFCS",
-        "PFC-116": "C2F6",
-        "PFC-14": "CF4",
-        "PFC-218": "C3F8",
-        # "SF₆": "SF6",
-        "SO₂": "SO2",
-    },
-    "unit": "PRIMAP1",
-}
-
-coords_value_filling = {
-    'category': {  # col to fill
-        'orig_cat_name': {  # col to fill from
-            'Todas las emisiones y las absorciones nacionales': '0',  # from value: to value
-            'Tanque internacional': 'M.BK',
-            'Aviación internacional': 'M.BK.A',
-            'Navegación internacional': 'M.BK.M',
-            'Operaciones multilaterales': 'M.MULTIOP',
-            'Emisiones de CO2 de la biomasa': 'M.BIO',
-        }
-    }
-}
-
-filter_remove = {
-    "f1": {
-        "entity": ["Absorciones CO₂", "Emisiones CO₂"],
-    },
-    "f2": {
-        "orig_cat_name": ["Partidas informativas"],
-    },
-}
-
-filter_keep = {}
-
-meta_data = {
-    "references": "https://unfccc.int/documents/267936, https://snichile.mma.gob.cl/wp-content/uploads/2021/03/Inventario_Nacional_de_GEI-1990-2018.xlsx",
-    "rights": "",
-    "contact": "mail@johannes-guetschow.de.de",
-    "title": "Chile: BUR4",
-    "comment": "Read from xlsx file by Johannes Gütschow",
-    "institution": "United Nations Framework Convention on Climate Change (UNFCCC)",
-}
-
-compression = dict(zlib=True, complevel=9)
-
-# ###
-# start data reading
-# ###
-
-# change working directory to script directory for proper folder names
-script_path = os.path.abspath(sys.argv[0])
-script_dir_name = os.path.dirname(script_path)
-os.chdir(script_dir_name)
-
-df_all = None
-
-for year in years_to_read:
-    # read sheet for the year. Each sheet contains several tables,
-    # we only read the upper row as the other tables are summary tables
-    df_current = pd.read_excel(input_folder / inventory_file, sheet_name=str(year), skiprows=2, nrows=442, engine="openpyxl")
-    # drop the columns which are empty and repetition of the metadata for the second block
-    df_current.drop(cols_to_drop, axis=1, inplace=True)
-    # drop all rows where the index cols (category code and name) are both NaN
-    # as without one of them there is no category information
-    df_current.dropna(axis=0, how='all', subset=index_cols, inplace=True)
-    # set multi-index. necessary for the stack operation in the conversion to long format
-    df_current = df_current.set_index(index_cols)
-    # add unit row using information from entity row and add to index
-    df_current = pm2.pm2io.nir_add_unit_information(df_current, unit_row=unit_row, **unit_info)
-    # actual conversion to long format
-    df_current = pm2.pm2io.nir_convert_df_to_long(df_current, year)
-    # aggregate to one df
-    if df_all is None:
-        df_all = df_current
-    else:
-        df_all = pd.concat([df_all, df_current])
-
-df_all = df_all.reset_index(drop=True)
-
-# ###
-# postprocessing
-# ###
-# strip trailing and leading spaces
-for col in cols_for_space_stripping:
-    df_all[col] = df_all[col].str.strip()
-
-df_all["category"] = df_all["category"].str.rstrip('.')
-
-data_if = pm2.pm2io.convert_long_dataframe_if(
-    df_all,
-    coords_cols=coords_cols,
-    add_coords_cols=add_coords_cols,
-    coords_defaults=coords_defaults,
-    coords_terminologies=coords_terminologies,
-    coords_value_mapping=coords_value_mapping,
-    coords_value_filling=coords_value_filling,
-    filter_remove=filter_remove,
-    filter_keep=filter_keep,
-    meta_data=meta_data
-)
-
-
-# conversion to PRIMAP2 native format
-data_pm2 = pm2.pm2io.from_interchange_format(data_if)
-# convert back to IF to have units in the fixed format
-data_if = data_pm2.pr.to_interchange_format()
-
-# ###
-# save data to IF and native format
-# ###
-pm2.pm2io.write_interchange_format(output_folder / (output_filename + coords_terminologies["category"]), data_if)
-
-data_pm2 = pm2.pm2io.from_interchange_format(data_if)
-encoding = {var: compression for var in data_pm2.data_vars}
-data_pm2.pr.to_netcdf(output_folder / (output_filename + coords_terminologies["category"] + ".nc"), encoding=encoding)
-
-# ###
-# conversion to ipcc 2006 categories
-# ###
-
-data_if_2006 = pm2.pm2io.convert_long_dataframe_if(
-    df_all,
-    coords_cols=coords_cols,
-    add_coords_cols=add_coords_cols,
-    coords_defaults=coords_defaults,
-    coords_terminologies=coords_terminologies_2006,
-    coords_value_mapping=coords_value_mapping,
-    coords_value_filling=coords_value_filling,
-    filter_remove=filter_remove,
-    filter_keep=filter_keep,
-    meta_data=meta_data
-)
-
-cat_label = 'category (' + coords_terminologies_2006["category"] + ')'
-filter_data(data_if_2006, filter_remove=filter_remove_IPCC2006)
-data_if_2006 = data_if_2006.replace({cat_label: cat_mapping})
-
-# aggregate categories
-for cat_to_agg in aggregate_cats:
-    mask = data_if_2006[cat_label].isin(aggregate_cats[cat_to_agg]["sources"])
-    df_test = data_if_2006[mask]
-
-    if len(df_test) > 0:
-        print(f"Aggregating category {cat_to_agg}")
-        df_combine = df_test.copy(deep=True)
-
-        time_format = '%Y'
-        time_columns = [
-            col
-            for col in df_combine.columns.values
-            if matches_time_format(col, time_format)
-        ]
-
-        for col in time_columns:
-            df_combine[col] = pd.to_numeric(df_combine[col], errors="coerce")
-
-        df_combine = df_combine.groupby(
-            by=['source', 'scenario (PRIMAP)', 'provenance', 'area (ISO3)', 'entity', 'unit']).sum()
-
-        df_combine.insert(0, cat_label, cat_to_agg)
-        df_combine.insert(1, "orig_cat_name", aggregate_cats[cat_to_agg]["name"])
-
-        df_combine = df_combine.reset_index()
-
-        data_if_2006 = pd.concat([data_if_2006, df_combine])
-    else:
-        print(f"no data to aggregate category {cat_to_agg}")
-
-# conversion to PRIMAP2 native format
-data_pm2_2006 = pm2.pm2io.from_interchange_format(data_if_2006)
-# convert back to IF to have units in the fixed format
-data_if_2006 = data_pm2_2006.pr.to_interchange_format()
-
-pm2.pm2io.write_interchange_format(output_folder / (output_filename + coords_terminologies_2006["category"]), data_if_2006)
-
-encoding = {var: compression for var in data_pm2_2006.data_vars}
-data_pm2_2006.pr.to_netcdf(output_folder / (output_filename + coords_terminologies_2006["category"] + ".nc"), encoding=encoding)

+ 0 - 285
UNFCCC_GHG_data/UNFCCC_reader/Chile/read_CHL_BUR5_from_xlsx.py

@@ -1,285 +0,0 @@
-# this script reads data from Chile's 2022 national inventory which underlies BUR5
-# Data is read from the xlsx file
-
-import os
-import sys
-import pandas as pd
-import primap2 as pm2
-
-from config_CHL_BUR4 import cat_mapping, filter_remove_IPCC2006, aggregate_cats
-from UNFCCC_GHG_data.helper import downloaded_data_path, extracted_data_path
-from primap2.pm2io._data_reading import matches_time_format
-from primap2.pm2io._data_reading import filter_data
-
-# ###
-# configuration
-# ###
-
-# folders and files
-input_folder = downloaded_data_path / 'UNFCCC' / 'Chile' / 'BUR5'
-output_folder = extracted_data_path / 'UNFCCC' / 'Chile'
-if not output_folder.exists():
-    output_folder.mkdir()
-
-output_filename = 'CHL_BUR5_2022_'
-
-inventory_file = '2022_GEI_CL.xlsx'
-years_to_read = range(1990, 2020 + 1)
-time_format='%Y'
-
-# configuration for conversion to PRIMAP2 data format
-unit_row = "header"
-unit_info = {
-    'regexp_entity': r'(.*)\s\(.*\)$',
-    'regexp_unit': r'.*\s\((.*)\)$',
-    'default_unit': 'kt',
-    'manual_repl_unit': {
-        'kt CO₂ eq': 'ktCO2eq',
-        'HFC (kt CO₂ eq)': 'ktCO2eq',
-        'PFC (kt CO₂ eq)': 'ktCO2eq',
-        'SF₆ (kt CO₂ eq)': 'ktCO2eq',
-    },
-    'manual_repl_entity': {
-        'kt CO₂ eq': 'KYOTOGHG (AR4GWP100)',
-        'HFC (kt CO₂ eq)': 'HFCS (AR4GWP100)',
-        'PFC (kt CO₂ eq)': 'PFCS (AR4GWP100)',
-        'SF₆ (kt CO₂ eq)': 'SF6 (AR4GWP100)',
-    }
-}
-cols_to_drop = ['Unnamed: 14', 'Unnamed: 16', 'Código IPCC.1',
-                'Categorías de fuente y sumidero de gases de efecto invernadero.1']
-# columns for category code and original category name
-index_cols = ['Código IPCC', 'Categorías de fuente y sumidero de gases de efecto invernadero']
-
-# operations on long format DF
-cols_for_space_stripping = ['category', 'orig_cat_name', 'entity']
-
-time_format = "%Y"
-
-coords_cols = {
-    "category": "category",
-    "entity": "entity",
-    "unit": "unit",
-}
-
-add_coords_cols = {
-    "orig_cat_name": ["orig_cat_name", "category"],
-}
-
-coords_terminologies = {
-    "area": "ISO3",
-    "category": "IPCC2006_1996_Chile_NIR",
-    "scenario": "PRIMAP",
-}
-
-coords_terminologies_2006 = {
-    "area": "ISO3",
-    "category": "IPCC2006_PRIMAP",
-    "scenario": "PRIMAP",
-}
-
-coords_defaults = {
-    "source": "CHL-GHG-Inventory",
-    "provenance": "measured",
-    "area": "CHL",
-    "scenario": "BUR5"
-}
-
-coords_value_mapping = {
-    "entity": {
-        "COVDM": "NMVOC",
-        "CO₂ neto": "CO2",
-        "CH₄": "CH4",
-        # "HFC": "HFCS",
-        "HFC-125": "HFC125",
-        "HFC-134a": "HFC134a",
-        "HFC-143a": "HFC143a",
-        "HFC-152a": "HFC152a",
-        "HFC-227ea": "HFC227ea",
-        "HFC-23": "HFC23",
-        "HFC-236fa": "HFC236fa",
-        "HFC-245fa": "HFC245fa",
-        "HFC-32": "HFC32",
-        "HFC-365mfc": "HFC365mfc",
-        "HFC-43-10mee": "HFC4310mee",
-        "N₂O": "N2O",
-        # "PFC": "PFCS",
-        "PFC-116": "C2F6",
-        "PFC-14": "CF4",
-        "PFC-218": "C3F8",
-        # "SF₆": "SF6",
-        "SO₂": "SO2",
-    },
-    "unit": "PRIMAP1",
-}
-
-coords_value_filling = {
-    'category': {  # col to fill
-        'orig_cat_name': {  # col to fill from
-            'Todas las emisiones y las absorciones nacionales': '0',  # from value: to value
-            'Tanque internacional': 'M.BK',
-            'Aviación internacional': 'M.BK.A',
-            'Navegación internacional': 'M.BK.M',
-            'Operaciones multilaterales': 'M.MULTIOP',
-            'Emisiones de CO2 de la biomasa': 'M.BIO',
-        }
-    }
-}
-
-filter_remove = {
-    "f1": {
-        "entity": ["Absorciones CO₂", "Emisiones CO₂"],
-    },
-    "f2": {
-        "orig_cat_name": ["Partidas informativas", "Todas las emisiones nacionales"],
-    },
-}
-
-filter_keep = {}
-
-meta_data = {
-    "references": "https://unfccc.int/documents/624735, https://snichile.mma.gob.cl/wp-content/uploads/2023/04/2022_GEI_CL.xlsx",
-    "rights": "",
-    "contact": "mail@johannes-guetschow.de.de",
-    "title": "Chile: BUR5",
-    "comment": "Read from xlsx file by Johannes Gütschow",
-    "institution": "United Nations Framework Convention on Climate Change (UNFCCC)",
-}
-
-compression = dict(zlib=True, complevel=9)
-
-# ###
-# start data reading
-# ###
-
-# change working directory to script directory for proper folder names
-script_path = os.path.abspath(sys.argv[0])
-script_dir_name = os.path.dirname(script_path)
-os.chdir(script_dir_name)
-
-df_all = None
-
-for year in years_to_read:
-    # read sheet for the year. Each sheet contains several tables,
-    # we only read the upper row as the other tables are summary tables
-    df_current = pd.read_excel(input_folder / inventory_file, sheet_name=str(year), skiprows=2, nrows=442, engine="openpyxl")
-    # drop the columns which are empty and repetition of the metadata for the second block
-    df_current.drop(cols_to_drop, axis=1, inplace=True)
-    # drop all rows where the index cols (category code and name) are both NaN
-    # as without one of them there is no category information
-    df_current.dropna(axis=0, how='all', subset=index_cols, inplace=True)
-    # set multi-index. necessary for the stack operation in the conversion to long format
-    df_current = df_current.set_index(index_cols)
-    # add unit row using information from entity row and add to index
-    df_current = pm2.pm2io.nir_add_unit_information(df_current, unit_row=unit_row, **unit_info)
-    # actual conversion to long format
-    df_current = pm2.pm2io.nir_convert_df_to_long(df_current, year)
-    # aggregate to one df
-    if df_all is None:
-        df_all = df_current
-    else:
-        df_all = pd.concat([df_all, df_current])
-
-df_all = df_all.reset_index(drop=True)
-
-# ###
-# postprocessing
-# ###
-# strip trailing and leading spaces
-for col in cols_for_space_stripping:
-    df_all[col] = df_all[col].str.strip()
-
-df_all["category"] = df_all["category"].str.rstrip('.')
-
-data_if = pm2.pm2io.convert_long_dataframe_if(
-    df_all,
-    coords_cols=coords_cols,
-    add_coords_cols=add_coords_cols,
-    coords_defaults=coords_defaults,
-    coords_terminologies=coords_terminologies,
-    coords_value_mapping=coords_value_mapping,
-    coords_value_filling=coords_value_filling,
-    filter_remove=filter_remove,
-    filter_keep=filter_keep,
-    meta_data=meta_data,
-    time_format=time_format,
-)
-
-
-# conversion to PRIMAP2 native format
-data_pm2 = pm2.pm2io.from_interchange_format(data_if)
-# convert back to IF to have units in the fixed format
-data_if = data_pm2.pr.to_interchange_format()
-
-# ###
-# save data to IF and native format
-# ###
-pm2.pm2io.write_interchange_format(output_folder / (output_filename + coords_terminologies["category"]), data_if)
-
-data_pm2 = pm2.pm2io.from_interchange_format(data_if)
-encoding = {var: compression for var in data_pm2.data_vars}
-data_pm2.pr.to_netcdf(output_folder / (output_filename + coords_terminologies["category"] + ".nc"), encoding=encoding)
-
-# ###
-# conversion to ipcc 2006 categories
-# ###
-
-data_if_2006 = pm2.pm2io.convert_long_dataframe_if(
-    df_all,
-    coords_cols=coords_cols,
-    add_coords_cols=add_coords_cols,
-    coords_defaults=coords_defaults,
-    coords_terminologies=coords_terminologies_2006,
-    coords_value_mapping=coords_value_mapping,
-    coords_value_filling=coords_value_filling,
-    filter_remove=filter_remove,
-    filter_keep=filter_keep,
-    meta_data=meta_data,
-    time_format=time_format
-)
-
-cat_label = 'category (' + coords_terminologies_2006["category"] + ')'
-filter_data(data_if_2006, filter_remove=filter_remove_IPCC2006)
-data_if_2006 = data_if_2006.replace({cat_label: cat_mapping})
-
-# aggregate categories
-for cat_to_agg in aggregate_cats:
-    mask = data_if_2006[cat_label].isin(aggregate_cats[cat_to_agg]["sources"])
-    df_test = data_if_2006[mask]
-
-    if len(df_test) > 0:
-        print(f"Aggregating category {cat_to_agg}")
-        df_combine = df_test.copy(deep=True)
-
-        time_format = '%Y'
-        time_columns = [
-            col
-            for col in df_combine.columns.values
-            if matches_time_format(col, time_format)
-        ]
-
-        for col in time_columns:
-            df_combine[col] = pd.to_numeric(df_combine[col], errors="coerce")
-
-        df_combine = df_combine.groupby(
-            by=['source', 'scenario (PRIMAP)', 'provenance', 'area (ISO3)', 'entity', 'unit']).sum()
-        df_combine = df_combine.drop(columns=["category (IPCC2006_PRIMAP)", "orig_cat_name"])
-
-        df_combine.insert(0, cat_label, cat_to_agg)
-        df_combine.insert(1, "orig_cat_name", aggregate_cats[cat_to_agg]["name"])
-
-        df_combine = df_combine.reset_index()
-
-        data_if_2006 = pd.concat([data_if_2006, df_combine])
-    else:
-        print(f"no data to aggregate category {cat_to_agg}")
-
-# conversion to PRIMAP2 native format
-data_pm2_2006 = pm2.pm2io.from_interchange_format(data_if_2006)
-# convert back to IF to have units in the fixed format
-data_if_2006 = data_pm2_2006.pr.to_interchange_format()
-
-pm2.pm2io.write_interchange_format(output_folder / (output_filename + coords_terminologies_2006["category"]), data_if_2006)
-
-encoding = {var: compression for var in data_pm2_2006.data_vars}
-data_pm2_2006.pr.to_netcdf(output_folder / (output_filename + coords_terminologies_2006["category"] + ".nc"), encoding=encoding)
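Note on the aggregation step in the removed script above: it groups the rows and calls a plain `.sum()` on the time columns. The sketch below uses made-up toy data (the column names and values are illustrative assumptions, not taken from the repository) to show how pandas treats a group whose values are all missing. Plain `sum()` reports 0 for such a group, while `sum(min_count=1)` keeps it as NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two rows per entity, CH4 has no values at all.
df = pd.DataFrame(
    {
        "entity": ["CO2", "CO2", "CH4", "CH4"],
        "unit": ["Gg", "Gg", "Gg", "Gg"],
        "2018": [1.0, 2.0, np.nan, np.nan],
    }
)

keys = ["entity", "unit"]

# Plain sum: an all-NaN group is reported as 0.0.
plain = df.groupby(by=keys).sum().reset_index()

# min_count=1: a group needs at least one valid value, otherwise it stays NaN.
strict = df.groupby(by=keys).sum(min_count=1).reset_index()

print(plain)
print(strict)
```

Whether a missing aggregate should read as 0 or as NaN depends on the inventory, so this is only meant to make the difference visible.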

+ 0 - 249
UNFCCC_GHG_data/UNFCCC_reader/Colombia/read_COL_BUR3_from_xlsx.py

@@ -1,249 +0,0 @@
-# this script reads data from Colombia's BUR3
-# Data is read from the xlsx file which has been exported from the google docs
-# spreadsheet which is linked in the BUR
-
-import pandas as pd
-import primap2 as pm2
-from primap2.pm2io._data_reading import matches_time_format
-from UNFCCC_GHG_data.helper import downloaded_data_path, extracted_data_path
-
-# TODO: add fgases, sector summing (proc version)
-
-# ###
-# configuration
-# ###
-input_folder = downloaded_data_path / 'UNFCCC' / 'Colombia' / 'BUR3'
-output_folder = extracted_data_path / 'UNFCCC' / 'Colombia'
-if not output_folder.exists():
-    output_folder.mkdir()
-
-output_filename = 'COL_BUR3_2022_'
-
-inventory_file = 'TR_1990-2018_BUR3-AR5_VF.xlsx'
-years_to_read = range(1990, 2018 + 1)
-
-sheet_to_read = 'TR 1990-2018'
-cols_to_read = range(0, 47)
-
-compression = dict(zlib=True, complevel=9)
-
-unit_row = 0
-
-coords_cols = {
-    "category": "category",
-    "entity": "entity",
-    "unit": "unit",
-}
-
-
-coords_terminologies = {
-    "area": "ISO3",
-    "category": "IPCC2006",
-    "scenario": "PRIMAP",
-}
-
-coords_defaults = {
-    "source": "COL-GHG-Inventory",
-    "provenance": "measured",
-    "area": "COL",
-    "scenario": "BUR3",
-}
-
-coords_value_mapping = {
-    "unit": "PRIMAP1",
-    "entity": {
-        'Absorciones CO2': 'CO2 Absorptions',
-        'Emisiones CO2': 'CO2 Emissions',
-        'Emisiones netas (AR5GWP100)': 'KYOTOGHG (AR5GWP100)',
-        'HFC-23': 'HFC23',
-        'HFC-32': 'HFC32',
-        #'HFC-41': 'HFC41',
-        'HFC-43-10mee': 'HFC4310mee',
-        'HFC-125': 'HFC125',
-        #'HFC-134': 'HFC134',
-        'HFC-134a': 'HFC134a',
-        'HFC-152a': 'HFC152a',
-        #'HFC-143': 'HFC143',
-        'HFC-143a': 'HFC143a',
-        'HFC-227ea': 'HFC227ea',
-        'HFC-236fa': 'HFC236fa',
-        #'HFC-245ca': 'HFC245ca',
-        'HFC-245fa': 'HFC245fa',
-        'HFC-365mfc': 'HFC365mfc',
-        'PFC-116': 'C2F6',
-        'PFC-14': 'CF4',
-    },
-}
-
-
-filter_remove = {
-    "fGWP": {
-        "entity": [
-            'Absorciones CO2 (AR5GWP100)',
-            'Absorciones totales (AR5GWP100)',
-            'CH4 (AR5GWP100)',
-            'Emisiones CO2 (AR5GWP100)',
-            'Total emisiones (AR5GWP100)',
-            'HFC-125 (AR5GWP100)',
-            'HFC-134a (AR5GWP100)',
-            'HFC-143a (AR5GWP100)',
-            'HFC-152a (AR5GWP100)',
-            'HFC-227ea (AR5GWP100)',
-            'HFC-23 (AR5GWP100)',
-            'HFC-236fa (AR5GWP100)',
-            'HFC-245fa (AR5GWP100)',
-            'HFC-32 (AR5GWP100)',
-            'HFC-365mfc (AR5GWP100)',
-            'HFC-43-10mee (AR5GWP100)',
-            'N2O (AR5GWP100)',
-            'PFC-116 (AR5GWP100)',
-            'PFC-14 (AR5GWP100)',
-            'SF6 (AR5GWP100)',
-        ],
-    },
-}
-
-filter_keep = {}
-
-meta_data = {
-    "references": "https://unfccc.int/documents/424157",
-    "rights": "",
-    "contact": "mail@johannes-guestchow.de",
-    "title": "Colombia. Biennial update report (BUR). BUR3",
-    "comment": "Read fom xlsx file (exported from google docs) by Johannes Gütschow",
-    "institution": "UNFCCC",
-}
-
-
-# read the data
-data_raw = pd.read_excel(input_folder / inventory_file, sheet_name=sheet_to_read,
-                         skiprows=0, nrows=15025, usecols=cols_to_read,
-                         engine="openpyxl", header=None)
-
-# fill the units to the right as for merged cells the unit is only in the first cell
-data_raw.iloc[unit_row] = data_raw.iloc[unit_row].fillna(axis=0, method="ffill")
-merge_rows = [1, 2]
-for row in merge_rows:
-    data_raw.iloc[row] = data_raw.iloc[row].astype(str).str.replace("nan", "")
-data_raw.iloc[merge_rows[0]] = (
-data_raw.iloc[merge_rows[0]].astype(str) + " " + data_raw.iloc[
-        merge_rows[1]].astype(str))
-data_raw.iloc[merge_rows[0]] = data_raw.iloc[merge_rows[0]].str.strip()
-data_raw = data_raw.drop(index=data_raw.index[merge_rows[1]])
-
-# merge the category cols
-def join_code_parts(series):
-    code = series.iloc[0]
-    for part in series.iloc[1:]:
-        if part != "nan":
-            code = code + "." + part
-    if code == "nan":
-        code = "0"
-    return code
-
-# xlsx cols are ["MOD", "CAP", "CAT", "SCAT", "NROM", "NUM"]
-cat_columns = [0, 1, 2, 3, 4, 5]
-data_raw["category"] = data_raw[cat_columns].astype(str).agg(func=join_code_parts,
-                                                             axis=1)
-data_raw = data_raw.drop(columns=cat_columns)
-
-# prepare the dataframe for processing with primap2 functions
-col_index = pd.MultiIndex.from_tuples(zip(data_raw.iloc[0], data_raw.iloc[1]))
-data_raw.columns = col_index
-data_raw = data_raw.drop(index=data_raw.index[0:2])
-
-data_raw = data_raw.set_index("MOD.CAP.CAT.SCAT.NROM.NUM")
-
-# loop over years to use pm2 stack operation
-years = data_raw["ANO"].unique()
-df_all = None
-for year in years:
-    data_year = data_raw[data_raw["ANO"] == year]
-    data_year = data_year.drop(columns=["ANO", "Categorías de fuente y sumideros"])
-    df_long_new = pm2.pm2io.nir_convert_df_to_long(data_year, year,
-                                                   ["category", "unit", "entity",
-                                                    "time", "data"])
-    if df_all is None:
-        df_all = df_long_new
-    else:
-        df_all = pd.concat([df_all, df_long_new], axis=0, join='outer')
-
-df_all["category"] = df_all["category"].str[0]
-
-# map units
-df_all["unit"] = df_all["unit"].replace({
-    'GEI DIRECTOS - Gg ': 'Gg',
-    'GEI DIRECTOS - Gg CO2 equivalente': 'GgCO2eq',
-}
-)
-
-# add GWP information to entity
-for entity in df_all["entity"].unique():
-    mask = (df_all["entity"] == entity) & (df_all["unit"] == "GgCO2eq")
-    df_all.loc[mask, "entity"] = f"{entity} (AR5GWP100)"
-
-# reset index before conversion to pm2 IF
-df_all = df_all.reset_index(drop=True)
-
-# make sure all col headers are str
-df_all.columns = df_all.columns.map(str)
-
-# ###
-# convert to PRIMAP2 interchange format
-# ###
-data_if = pm2.pm2io.convert_long_dataframe_if(
-    df_all,
-    coords_cols=coords_cols,
-    #add_coords_cols=add_coords_cols,
-    coords_defaults=coords_defaults,
-    coords_terminologies=coords_terminologies,
-    coords_value_mapping=coords_value_mapping,
-    #coords_value_filling=coords_value_filling,
-    filter_remove=filter_remove,
-    #filter_keep=filter_keep,
-    meta_data=meta_data,
-    convert_str=True
-    )
-
-
-# combine CO2 emissions and absorptions
-data_CO2 = data_if[data_if["entity"].isin([
-    'CO2 Absorptions', 'CO2 Emissions'])].copy()
-
-time_format = '%Y'
-time_columns = [
-    col
-    for col in data_CO2.columns.values
-    if matches_time_format(col, time_format)
-]
-
-for col in time_columns:
-    data_CO2[col] = pd.to_numeric(data_CO2[col], errors="coerce")
-
-data_CO2 = data_CO2.groupby(
-    by=['source', 'scenario (PRIMAP)', 'provenance', 'area (ISO3)',
-        f"category ({coords_terminologies['category']})",
-        'unit']).sum(min_count = 1)
-
-data_CO2.insert(0, 'entity', 'CO2')
-data_CO2 = data_CO2.reset_index()
-
-data_if = pd.concat([data_if, data_CO2])
-
-
-
-data_pm2 = pm2.pm2io.from_interchange_format(data_if)
-
-
-# convert back to IF to have units in the fixed format
-data_if = data_pm2.pr.to_interchange_format()
-
-# ###
-# save data to IF and native format
-# ###
-if not output_folder.exists():
-    output_folder.mkdir()
-pm2.pm2io.write_interchange_format(output_folder / (output_filename + coords_terminologies["category"]), data_if)
-
-encoding = {var: compression for var in data_pm2.data_vars}
-data_pm2.pr.to_netcdf(output_folder / (output_filename + coords_terminologies["category"] + ".nc"), encoding=encoding)
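Note on the `join_code_parts` helper in the removed Colombia script above: it joins the six category columns into one dotted code and maps a fully empty row to `"0"`. A minimal sketch with hypothetical rows follows. Empty cells are written directly as the string `"nan"`, which is the representation the helper checks for; the sample values are assumptions for illustration only.

```python
import pandas as pd


def join_code_parts(series: pd.Series) -> str:
    """Join non-empty code parts with dots; a fully empty row becomes "0"."""
    code = series.iloc[0]
    for part in series.iloc[1:]:
        if part != "nan":
            code = code + "." + part
    if code == "nan":
        code = "0"
    return code


# Hypothetical rows standing in for the columns MOD, CAP, CAT, SCAT, NROM, NUM.
raw = pd.DataFrame(
    [
        ["3", "B", "1", "nan", "nan", "nan"],
        ["1", "A", "nan", "nan", "nan", "nan"],
        ["nan", "nan", "nan", "nan", "nan", "nan"],
    ]
)

codes = raw.agg(func=join_code_parts, axis=1)
print(codes.tolist())  # ['3.B.1', '1.A', '0']
```

The fully empty row corresponds to the national total, which the script codes as `"0"`.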

+ 0 - 679
UNFCCC_GHG_data/UNFCCC_reader/Guinea/read_GIN_BUR1_from_pdf.py

@@ -1,679 +0,0 @@
-import camelot
-import primap2 as pm2
-import pandas as pd
-
-from UNFCCC_GHG_data.helper import downloaded_data_path, extracted_data_path
-from UNFCCC_GHG_data.helper.functions import process_data_for_country
-from UNFCCC_GHG_data.helper.functions_temp import find_and_replace_values
-from config_GIN_BUR1 import coords_cols, coords_defaults, coords_terminologies
-from config_GIN_BUR1 import (
-    coords_value_mapping,
-    filter_remove,
-    meta_data,
-    page_def_templates,
-    delete_rows_by_category,
-)
-from config_GIN_BUR1 import (
-    inv_conf,
-    country_processing_step1,
-    gas_baskets,
-    replace_info,
-    replace_categories,
-    set_value,
-    delete_row,
-)
-
-# ###
-# configuration
-# ###
-
-input_folder = downloaded_data_path / "UNFCCC" / "Guinea" / "BUR1"
-output_folder = extracted_data_path / "UNFCCC" / "Guinea"
-if not output_folder.exists():
-    output_folder.mkdir()
-
-pdf_file = "Rapport_IGES-Guinee-BUR1_VF.pdf"
-output_filename = "GIN_BUR1_2023_"
-category_column = f"category ({coords_terminologies['category']})"
-compression = dict(zlib=True, complevel=9)
-
-# ###
-# 1. Read in main tables
-# ###
-
-df_main = None
-for page in inv_conf["pages_to_read"]["main"]:
-    print("-" * 45)
-    print(f"Reading table from page {page}.")
-
-    tables_inventory_original = camelot.read_pdf(
-        str(input_folder / pdf_file),
-        pages=page,
-        table_areas=page_def_templates[page]["area"],
-        columns=page_def_templates[page]["cols"],
-        flavor="stream",
-        split_text=True,
-    )
-
-    print("Reading complete.")
-
-    df_inventory = tables_inventory_original[0].df.copy()
-
-    # set category names (they moved one row up)
-    if page in set_value["main"].keys():
-        for idx, col, value in set_value["main"][page]:
-            df_inventory.at[idx, col] = value
-    # delete empty row
-    if page in delete_row["main"].keys():
-        for idx in delete_row["main"][page]:
-            df_inventory = df_inventory.drop(index=idx)
-
-    # add header and unit
-    df_header = pd.DataFrame([inv_conf["header"], inv_conf["unit"]])
-    df_inventory = pd.concat(
-        [df_header, df_inventory], axis=0, join="outer"
-    ).reset_index(drop=True)
-    df_inventory = pm2.pm2io.nir_add_unit_information(
-        df_inventory,
-        unit_row=inv_conf["unit_row"],
-        entity_row=inv_conf["entity_row"],
-        regexp_entity=".*",
-        regexp_unit=".*",
-        default_unit="Gg",
-    )
-
-    print("Added unit information.")
-
-    # set index
-    df_inventory = df_inventory.set_index(inv_conf["index_cols"])
-
-    # convert to long format
-    df_inventory_long = pm2.pm2io.nir_convert_df_to_long(
-        df_inventory, inv_conf["year"][page], inv_conf["header_long"]
-    )
-
-    # extract category from tuple
-    df_inventory_long["orig_cat_name"] = df_inventory_long["orig_cat_name"].str[0]
-
-    # prep for conversion to PM2 IF and native format
-    df_inventory_long["category"] = df_inventory_long["orig_cat_name"]
-
-    df_inventory_long["category"] = df_inventory_long["category"].replace(
-        inv_conf["cat_codes_manual"]["main"]
-    )
-
-    df_inventory_long["category"] = df_inventory_long["category"].str.replace(".", "")
-
-    # regex replacements
-    def repl(m):
-        return m.group("code")
-
-    df_inventory_long["category"] = df_inventory_long["category"].str.replace(
-        inv_conf["cat_code_regexp"], repl, regex=True
-    )
-
-    df_inventory_long = df_inventory_long.reset_index(drop=True)
-
-    df_inventory_long["data"] = df_inventory_long["data"].str.replace(",", ".")
-    df_inventory_long["data"] = df_inventory_long["data"].str.replace("NE1", "NE")
-
-    # make sure all col headers are str
-    df_inventory_long.columns = df_inventory_long.columns.map(str)
-    df_inventory_long = df_inventory_long.drop(columns=["orig_cat_name"])
-
-    if df_main is None:
-        df_main = df_inventory_long
-    else:
-        df_main = pd.concat(
-            [df_main, df_inventory_long],
-            axis=0,
-            join="outer",
-        ).reset_index(drop=True)
-
-print("Converting to interchange format.")
-df_all_IF = pm2.pm2io.convert_long_dataframe_if(
-    df_main,
-    coords_cols=coords_cols,
-    coords_defaults=coords_defaults,
-    coords_terminologies=coords_terminologies,
-    coords_value_mapping=coords_value_mapping["main"],
-    filter_remove=filter_remove,
-    meta_data=meta_data,
-    convert_str=True,
-    time_format="%Y",
-)
-
-df_all_IF = find_and_replace_values(
-    df=df_all_IF, replace_info=replace_info["main"], category_column=category_column
-)
-
-### convert to primap2 format ###
-data_pm2_main = pm2.pm2io.from_interchange_format(df_all_IF)
-
-# ###
-# 2. Read energy sector tables
-# ###
-
-df_energy = None
-for page in inv_conf["pages_to_read"]["energy"]:
-    print("-" * 45)
-    print(f"Reading table from page {page}.")
-
-    tables_inventory_original = camelot.read_pdf(
-        str(input_folder / pdf_file), pages=page, flavor="lattice", split_text=True
-    )
-
-    print("Reading complete.")
-
-    df_energy_year = pd.concat(
-        [tables_inventory_original[0].df[2:], tables_inventory_original[1].df[3:]],
-        axis=0,
-        join="outer",
-    ).reset_index(drop=True)
-
-    # TODO This step should be done in pm2.pm2io.convert_long_dataframe_if()
-    for row in delete_rows_by_category["energy"][page]:
-        row_to_delete = df_energy_year.index[df_energy_year[0] == row][0]
-        df_energy_year = df_energy_year.drop(index=row_to_delete)
-
-    # add header and unit
-    df_header = pd.DataFrame([inv_conf["header_energy"], inv_conf["unit_energy"]])
-
-    df_energy_year = pd.concat(
-        [df_header, df_energy_year], axis=0, join="outer"
-    ).reset_index(drop=True)
-
-    df_energy_year = pm2.pm2io.nir_add_unit_information(
-        df_energy_year,
-        unit_row=inv_conf["unit_row"],
-        entity_row=inv_conf["entity_row"],
-        regexp_entity=".*",
-        regexp_unit=".*",
-        default_unit="Gg",
-    )
-
-    print("Added unit information.")
-    # set index
-    df_energy_year = df_energy_year.set_index(inv_conf["index_cols"])
-
-    # convert to long format
-    df_energy_year_long = pm2.pm2io.nir_convert_df_to_long(
-        df_energy_year, inv_conf["year"][page], inv_conf["header_long"]
-    )
-
-    # extract from tuple
-    df_energy_year_long["orig_cat_name"] = df_energy_year_long["orig_cat_name"].str[0]
-
-    # prep for conversion to PM2 IF and native format
-    # make a copy of the categories row
-    df_energy_year_long["category"] = df_energy_year_long["orig_cat_name"]
-
-    # replace cat names by codes in col "category"
-    # first the manual replacements
-    df_energy_year_long["category"] = df_energy_year_long["category"].str.replace(
-        "\n", ""
-    )
-    df_energy_year_long["category"] = df_energy_year_long["category"].replace(
-        inv_conf["cat_codes_manual"]["energy"]
-    )
-
-    df_energy_year_long["category"] = df_energy_year_long["category"].str.replace(
-        ".", ""
-    )
-
-    # then the regex replacements
-    def repl(m):
-        return m.group("code")
-
-    df_energy_year_long["category"] = df_energy_year_long["category"].str.replace(
-        inv_conf["cat_code_regexp"], repl, regex=True
-    )
-
-    df_energy_year_long = df_energy_year_long.reset_index(drop=True)
-
-    df_energy_year_long["data"] = df_energy_year_long["data"].str.replace(",", ".")
-    df_energy_year_long["data"] = df_energy_year_long["data"].str.replace("NE1", "NE")
-
-    # make sure all col headers are str
-    df_energy_year_long.columns = df_energy_year_long.columns.map(str)
-    df_energy_year_long = df_energy_year_long.drop(columns=["orig_cat_name"])
-
-    if df_energy is None:
-        df_energy = df_energy_year_long
-    else:
-        df_energy = pd.concat(
-            [df_energy, df_energy_year_long],
-            axis=0,
-            join="outer",
-        ).reset_index(drop=True)
-
-print("Converting to interchange format.")
-df_energy_IF = pm2.pm2io.convert_long_dataframe_if(
-    df_energy,
-    coords_cols=coords_cols,
-    coords_defaults=coords_defaults,
-    coords_terminologies=coords_terminologies,
-    coords_value_mapping=coords_value_mapping["energy"],
-    filter_remove=filter_remove,
-    meta_data=meta_data,
-    convert_str=True,
-    time_format="%Y",
-)
-
-### convert to primap2 format ###
-data_pm2_energy = pm2.pm2io.from_interchange_format(df_energy_IF)
-
-# ###
-# 3. Read in afolu table
-# ###
-
-df_afolu = None
-for page in inv_conf["pages_to_read"]["afolu"]:
-    print("-" * 45)
-    print(f"Reading table from page {page}.")
-
-    tables_inventory_original = camelot.read_pdf(
-        str(input_folder / pdf_file), pages=page, flavor="lattice", split_text=True
-    )
-    print("Reading complete.")
-
-    if page == "127":
-        # table on page 127 has one extra row at the top
-        # and one extra category 3.A.1.j
-        df_afolu_year = tables_inventory_original[0].df[3:]
-        # 3.A.1.a.i to 3.A.1.j exist twice.
-        # Rename duplicate categories in tables.
-        for index, category_name in replace_categories["afolu"]["127"]:
-            df_afolu_year.at[index, 0] = category_name
-    else:
-        # cut first two lines
-        df_afolu_year = tables_inventory_original[0].df[2:]
-        # On pages 124-126 the wrong categories are slightly different
-        for index, category_name in replace_categories["afolu"]["124-126"]:
-            df_afolu_year.at[index, 0] = category_name
-
-    # add header and unit
-    df_header = pd.DataFrame([inv_conf["header_afolu"], inv_conf["unit_afolu"]])
-
-    df_afolu_year = pd.concat(
-        [df_header, df_afolu_year], axis=0, join="outer"
-    ).reset_index(drop=True)
-
-    df_afolu_year = pm2.pm2io.nir_add_unit_information(
-        df_afolu_year,
-        unit_row=inv_conf["unit_row"],
-        entity_row=inv_conf["entity_row"],
-        regexp_entity=".*",
-        regexp_unit=".*",
-        default_unit="Gg",
-    )
-
-    print("Added unit information.")
-
-    # set index
-    df_afolu_year = df_afolu_year.set_index(inv_conf["index_cols"])
-
-    # convert to long format
-    df_afolu_year_long = pm2.pm2io.nir_convert_df_to_long(
-        df_afolu_year, inv_conf["year"][page], inv_conf["header_long"]
-    )
-
-    df_afolu_year_long["orig_cat_name"] = df_afolu_year_long["orig_cat_name"].str[0]
-
-    # prep for conversion to PM2 IF and native format
-    # make a copy of the categories row
-    df_afolu_year_long["category"] = df_afolu_year_long["orig_cat_name"]
-
-    # regex replacements
-    def repl(m):
-        return m.group("code")
-
-    df_afolu_year_long["category"] = df_afolu_year_long["category"].str.replace(
-        inv_conf["cat_code_regexp"], repl, regex=True
-    )
-
-    df_afolu_year_long = df_afolu_year_long.reset_index(drop=True)
-
-    df_afolu_year_long["data"] = df_afolu_year_long["data"].str.replace(",", ".")
-    df_afolu_year_long["data"] = df_afolu_year_long["data"].str.replace("NE1", "NE")
-
-    # make sure all col headers are str
-    df_afolu_year_long.columns = df_afolu_year_long.columns.map(str)
-    df_afolu_year_long = df_afolu_year_long.drop(columns=["orig_cat_name"])
-
-    if df_afolu is None:
-        df_afolu = df_afolu_year_long
-    else:
-        df_afolu = pd.concat(
-            [df_afolu, df_afolu_year_long],
-            axis=0,
-            join="outer",
-        ).reset_index(drop=True)
-
-print("Converting to interchange format.")
-df_afolu_IF = pm2.pm2io.convert_long_dataframe_if(
-    df_afolu,
-    coords_cols=coords_cols,
-    coords_defaults=coords_defaults,
-    coords_terminologies=coords_terminologies,
-    coords_value_mapping=coords_value_mapping["afolu"],
-    filter_remove=filter_remove,
-    meta_data=meta_data,
-    convert_str=True,
-    time_format="%Y",
-)
-
-### convert to primap2 format ###
-data_pm2_afolu = pm2.pm2io.from_interchange_format(df_afolu_IF)
-
-# ###
-# 4. Read in Waste tables - pages 128, 130
-# ###
-
-# There are three tables for three years on page 128
-# and another table for the last year on page 130
-
-# read the first three tables
-page = inv_conf["pages_to_read"]["waste"][0]
-tables_inventory_original_128 = camelot.read_pdf(
-    str(input_folder / pdf_file), pages=page, flavor="lattice", split_text=True
-)
-
-# read last table
-page = inv_conf["pages_to_read"]["waste"][1]
-tables_inventory_original_130 = camelot.read_pdf(
-    str(input_folder / pdf_file), pages=page, flavor="lattice", split_text=True
-)
-
-# combine in a dict
-df_waste_years = {
-    "1990": tables_inventory_original_128[0].df,
-    "2000": tables_inventory_original_128[1].df,
-    "2010": tables_inventory_original_128[2].df,
-    "2019": tables_inventory_original_130[0].df,
-}
-
-df_waste = None
-for year in df_waste_years.keys():
-    print("-" * 45)
-    print(f"Processing table for {year}.")
-
-    df_waste_year = df_waste_years[year][2:]
-
-    # add header and unit
-    df_header = pd.DataFrame([inv_conf["header_waste"], inv_conf["unit_waste"]])
-
-    df_waste_year = pd.concat(
-        [df_header, df_waste_year], axis=0, join="outer"
-    ).reset_index(drop=True)
-
-    df_waste_year = pm2.pm2io.nir_add_unit_information(
-        df_waste_year,
-        unit_row=inv_conf["unit_row"],
-        entity_row=inv_conf["entity_row"],
-        regexp_entity=".*",
-        regexp_unit=".*",
-        default_unit="Gg",
-    )
-
-    print("Added unit information.")
-
-    # set index
-    df_waste_year = df_waste_year.set_index(inv_conf["index_cols"])
-
-    # convert to long format
-    df_waste_year_long = pm2.pm2io.nir_convert_df_to_long(
-        df_waste_year, year, inv_conf["header_long"]
-    )
-
-    df_waste_year_long["orig_cat_name"] = df_waste_year_long["orig_cat_name"].str[0]
-
-    # prep for conversion to PM2 IF and native format
-    # make a copy of the categories row
-    df_waste_year_long["category"] = df_waste_year_long["orig_cat_name"]
-
-    # regex replacements
-    def repl(m):
-        return m.group("code")
-
-    df_waste_year_long["category"] = df_waste_year_long["category"].str.replace(
-        inv_conf["cat_code_regexp"], repl, regex=True
-    )
-
-    df_waste_year_long = df_waste_year_long.reset_index(drop=True)
-
-    df_waste_year_long["category"] = df_waste_year_long["category"].str.replace(".", "")
-    df_waste_year_long["data"] = df_waste_year_long["data"].str.replace(",", ".")
-    df_waste_year_long["data"] = df_waste_year_long["data"].str.replace("NE1", "NE")
-
-    # make sure all col headers are str
-    df_waste_year_long.columns = df_waste_year_long.columns.map(str)
-    df_waste_year_long = df_waste_year_long.drop(columns=["orig_cat_name"])
-
-    if df_waste is None:
-        df_waste = df_waste_year_long
-    else:
-        df_waste = pd.concat(
-            [df_waste, df_waste_year_long],
-            axis=0,
-            join="outer",
-        ).reset_index(drop=True)
-
-print("Converting to interchange format.")
-df_waste_IF = pm2.pm2io.convert_long_dataframe_if(
-    df_waste,
-    coords_cols=coords_cols,
-    coords_defaults=coords_defaults,
-    coords_terminologies=coords_terminologies,
-    coords_value_mapping=coords_value_mapping["waste"],
-    filter_remove=filter_remove,
-    meta_data=meta_data,
-    convert_str=True,
-    time_format="%Y",
-)
-
-### convert to primap2 format ###
-data_pm2_waste = pm2.pm2io.from_interchange_format(df_waste_IF)
-
-# ###
-# 5. Read in trend tables - pages 131 - 137
-# ###
-
-df_trend = None
-pages = inv_conf["pages_to_read"]["trend"]
-entities = inv_conf["entity_for_page"]["trend"]
-
-# for this set of tables every page is a different entity
-for page, entity in zip(pages, entities):
-    print("-" * 45)
-    print(f"Reading table for page {page} and entity {entity}.")
-
-    # The first table must be read with flavor="stream", as
-    # flavor="lattice" raises an error. This may be a camelot issue
-    # (see https://github.com/atlanhq/camelot/issues/306)
-    # or happen because characters in the first row almost touch
-    # the table grid.
-    if page == "131":
-        tables_inventory_original = camelot.read_pdf(
-            str(input_folder / pdf_file),
-            pages=page,
-            table_areas=page_def_templates[page]["area"],
-            columns=page_def_templates[page]["cols"],
-            flavor="stream",
-            split_text=True,
-        )
-
-        df_trend_entity = tables_inventory_original[0].df[1:]
-
-    else:
-        tables_inventory_original = camelot.read_pdf(
-            str(input_folder / pdf_file), pages=page, flavor="lattice", split_text=True
-        )
-        df_trend_entity = tables_inventory_original[0].df[3:]
-
-    print("Reading complete.")
-
-    if page in delete_rows_by_category["trend"].keys():
-        for category in delete_rows_by_category["trend"][page]:
-            row_to_delete = df_trend_entity.index[df_trend_entity[0] == category][0]
-            df_trend_entity = df_trend_entity.drop(index=row_to_delete)
-
-    df_trend_entity.columns = inv_conf["header_trend"]
-
-    df_trend_entity = df_trend_entity.copy()
-
-    # unit is always Gg
-    df_trend_entity.loc[:, "unit"] = "Gg"
-
-    # only one entity per table
-    df_trend_entity.loc[:, "entity"] = entity
-
-    df_trend_entity.loc[:, "category"] = df_trend_entity["orig_cat_name"]
-
-    df_trend_entity["category"] = df_trend_entity["category"].replace(
-        inv_conf["cat_codes_manual"]["trend"]
-    )
-
-    df_trend_entity.loc[:, "category"] = df_trend_entity["category"].str.replace(
-        ".", ""
-    )
-    df_trend_entity.loc[:, "category"] = df_trend_entity["category"].str.replace(
-        "\n", ""
-    )
-
-    def repl(m):
-        return m.group("code")
-
-    df_trend_entity.loc[:, "category"] = df_trend_entity["category"].str.replace(
-        inv_conf["cat_code_regexp"], repl, regex=True
-    )
-
-    df_trend_entity = df_trend_entity.reset_index(drop=True)
-
-    print("Created category codes.")
-
-    for year in inv_conf["header_trend"][1:]:
-        df_trend_entity.loc[:, year] = df_trend_entity[year].str.replace(",", ".")
-        df_trend_entity.loc[:, year] = df_trend_entity[year].str.replace("NE1", "NE")
-
-    # make sure all col headers are str
-    df_trend_entity.columns = df_trend_entity.columns.map(str)
-
-    df_trend_entity = df_trend_entity.drop(columns=["orig_cat_name"])
-
-    # TODO better to use pm2.pm2io.convert_wide_dataframe_if
-    df_trend_entity_long = pd.wide_to_long(
-        df_trend_entity, stubnames="data", i="category", j="time"
-    )
-
-    print("Converted to long format.")
-
-    df_trend_entity_long = df_trend_entity_long.reset_index()
-
-    if df_trend is None:
-        df_trend = df_trend_entity_long
-    else:
-        df_trend = pd.concat(
-            [df_trend, df_trend_entity_long],
-            axis=0,
-            join="outer",
-        ).reset_index(drop=True)
-
-print("Converting to interchange format.")
-
-df_trend_IF = pm2.pm2io.convert_long_dataframe_if(
-    df_trend,
-    coords_cols=coords_cols,
-    coords_defaults=coords_defaults,
-    coords_terminologies=coords_terminologies,
-    coords_value_mapping=coords_value_mapping["trend"],
-    filter_remove=filter_remove,
-    meta_data=meta_data,
-    convert_str=True,
-    time_format="%Y",
-)
-
-df_trend_IF = find_and_replace_values(
-    df=df_trend_IF, replace_info=replace_info["trend"], category_column=category_column
-)
-
-### convert to primap2 format ###
-data_pm2_trend = pm2.pm2io.from_interchange_format(df_trend_IF)
-
-# ###
-# Combine tables
-# ###
-
-# merge main and energy
-# There are discrepancies larger than 0.86 for area category 1.A.2, entity NMVOC,
-# years 1990, 2000, 2010, 2019
-# It is assumed the main table has the correct values.
-print("Merging main and energy table.")
-data_pm2 = data_pm2_main.pr.merge(data_pm2_energy, tolerance=1)
-
-# merge afolu
-print("Merging afolu table.")
-data_pm2 = data_pm2.pr.merge(data_pm2_afolu, tolerance=0.11)
-
-# merge waste
-# increasing tolerance to merge values for 4.C, 1990, N2O - 0.003 in sector table, 0.0034 in main table
-print("Merging waste table.")
-data_pm2 = data_pm2.pr.merge(data_pm2_waste, tolerance=0.15)
-
-# merge trend
-print("Merging trend table.")
-data_pm2 = data_pm2.pr.merge(data_pm2_trend, tolerance=0.11)
-
-# convert back to IF to have units in the fixed format ( per year / per a / per annum)
-data_if = data_pm2.pr.to_interchange_format()
-
-# ###
-# Save raw data to IF and native format.
-# ###
-
-pm2.pm2io.write_interchange_format(
-    output_folder / (output_filename + coords_terminologies["category"] + "_raw"),
-    data_if,
-)
-
-encoding = {var: compression for var in data_pm2.data_vars}
-data_pm2.pr.to_netcdf(
-    output_folder / (output_filename + coords_terminologies["category"] + "_raw.nc"),
-    encoding=encoding,
-)
-
-# ###
-# Processing
-# ###
-
-data_proc_pm2 = process_data_for_country(
-    data_country=data_pm2,
-    entities_to_ignore=[],
-    gas_baskets=gas_baskets,
-    filter_dims=None,  # leaving this explicit for now
-    cat_terminology_out=None,
-    category_conversion=None,
-    sectors_out=None,
-    processing_info_country=country_processing_step1,
-)
-
-# ###
-# save processed data to IF and native format
-# ###
-
-terminology_proc = coords_terminologies["category"]
-
-data_proc_if = data_proc_pm2.pr.to_interchange_format()
-
-if not output_folder.exists():
-    output_folder.mkdir()
-pm2.pm2io.write_interchange_format(
-    output_folder / (output_filename + terminology_proc), data_proc_if
-)
-
-encoding = {var: compression for var in data_proc_pm2.data_vars}
-data_proc_pm2.pr.to_netcdf(
-    output_folder / (output_filename + terminology_proc + ".nc"), encoding=encoding
-)
-
-print("Saved processed data.")
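Note on the repeated `repl` / `str.replace(..., regex=True)` pattern in the removed Guinea script above: it reduces a category label to its code by returning the named group `code` from each match. The actual pattern is stored in `config_GIN_BUR1` and is not part of this diff, so the sketch below assumes the pattern used by the Indonesia script further down; the sample labels are hypothetical.

```python
import pandas as pd

# Assumed pattern: a leading code of 1-4 alphanumeric characters,
# followed by whitespace and the category name.
cat_code_regexp = r"(?P<code>^[a-zA-Z0-9]{1,4})\s.*"


def repl(m):
    # Keep only the named group "code" from each match.
    return m.group("code")


categories = pd.Series(
    ["1A2 Manufacturing industries", "3B Land", "M.3.B.4.APD"]
)
codes = categories.str.replace(cat_code_regexp, repl, regex=True)
print(codes.tolist())  # ['1A2', '3B', 'M.3.B.4.APD']
```

Labels that were already mapped to a code by the manual replacements do not match the pattern and pass through unchanged, which is why the manual step runs first.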

+ 0 - 377
UNFCCC_GHG_data/UNFCCC_reader/Indonesia/read_IDN_BUR3_from_pdf.py

@@ -1,377 +0,0 @@
-# this script reads data from Indonesia's BUR3
-# Data is read from pdf
-# only the 2019 inventory is read as the BUR refers to BUR2 for earlier years
-
-import pandas as pd
-import primap2 as pm2
-import camelot
-import numpy as np
-from primap2.pm2io._data_reading import matches_time_format
-from UNFCCC_GHG_data.helper import downloaded_data_path, extracted_data_path
-
-# ###
-# configuration
-# ###
-input_folder = downloaded_data_path / 'UNFCCC' / 'Indonesia' / 'BUR3'
-output_folder = extracted_data_path / 'UNFCCC' / 'Indonesia'
-if not output_folder.exists():
-    output_folder.mkdir()
-
-output_filename = 'IDN_BUR3_2021_'
-
-inventory_file = 'IndonesiaBUR_3_FINAL_REPORT_2.pdf'
-
-gwp_to_use = 'SARGWP100'
-
-# page 65 is not read properly but contains almost no data, so it is added by hand
-pages_to_read = range(61, 65)  # reads pages 61-64
-
-compression = dict(zlib=True, complevel=9)
-
-year = 2019
-entity_row = 0
-unit_row = 1
-index_cols = "Categories"
-# special header as category code and name are in one column
-header_long = ["orig_cat_name", "entity", "unit", "time", "data"]
-
-
-# manual category codes
-cat_codes_manual = {
-    'Total National Emissions and Removals': '0',
-    'Peat Decomposition': 'M.3.B.4.APD',
-    'Peat Fire': 'M.3.B.4.APF',
-    '4A1.2 Industrial Solid Waste Disposal': 'M.4.A.Ind',
-    #'3A2b Direct N2O Emissions from Manure Management': '3.A.2',
-}
-
-cat_code_regexp = r'(?P<code>^[a-zA-Z0-9]{1,4})\s.*'
-
-coords_cols = {
-    "category": "category",
-    "entity": "entity",
-    "unit": "unit",
-}
-
-add_coords_cols = {
-    "orig_cat_name": ["orig_cat_name", "category"],
-}
-
-coords_terminologies = {
-    "area": "ISO3",
-    "category": "IPCC2006",
-    "scenario": "PRIMAP",
-}
-
-coords_defaults = {
-    "source": "IDN-GHG-Inventory",
-    "provenance": "measured",
-    "area": "IDN",
-    "scenario": "BUR3",
-}
-
-coords_value_mapping = {
-    "unit": "PRIMAP1",
-    "category": "PRIMAP1",
-    "entity": {
-        'Total 3 Gases': f"CO2CH4N2O ({gwp_to_use})",
-        'Net CO2 (1) (2)': 'CO2',
-        'CH4': f"CH4 ({gwp_to_use})",
-        'N2O': f"N2O ({gwp_to_use})",
-        'HFCs': f"HFCS ({gwp_to_use})",
-        'PFCs': f"PFCS ({gwp_to_use})",
-        'SF6': f"SF6 ({gwp_to_use})",
-        'NOx': 'NOX',
-        'CO': 'CO', # no mapping, just added for completeness here
-        'NMVOCs': 'NMVOC',
-        'SO2': 'SO2', # no mapping, just added for completeness here
-        'Other halogenated gases with CO2 equivalent conversion factors (3)': f"OTHERHFCS ({gwp_to_use})",
-    },
-}
-
-
-filter_remove = {
-    "fHFC": {"entity": 'Other halogenated gases without CO2 equivalent conversion factors (4)'}
-}
-
-filter_keep = {}
-
-meta_data = {
-    "references": "https://unfccc.int/documents/403577",
-    "rights": "",
-    "contact": "mail@johannes-guestchow.de",
-    "title": "Indonesia. Biennial update report (BUR). BUR3",
-    "comment": "Read from pdf by Johannes Gütschow",
-    "institution": "UNFCCC",
-}
-
-# convert to mass units where possible
-entities_to_convert_to_mass = [
-    'CH4', 'N2O', 'SF6'
-]
-
-# CO2 equivalents don't make sense for these substances, so unit has to be Gg instead of Gg CO2 equivalents as indicated in the table
-entities_to_fix_unit = [
-    'NOx', 'CO', 'NMVOCs', 'SO2'
-]
-
-# add the data for the last page by hand as it's only one row
-data_last_page = [
-    ['5B Other (please specify)', 'Total 3 Gases', 'GgCO2eq', '2019', 'NE'],
-    ['5B Other (please specify)', 'Net CO2 (1) (2)', 'GgCO2eq', '2019', np.nan],
-    ['5B Other (please specify)', 'CH4', 'GgCO2eq', '2019', np.nan],
-    ['5B Other (please specify)', 'N2O', 'GgCO2eq', '2019', np.nan],
-    ['5B Other (please specify)', 'HFCs', 'GgCO2eq', '2019', np.nan],
-    ['5B Other (please specify)', 'PFCs', 'GgCO2eq', '2019', np.nan],
-    ['5B Other (please specify)', 'SF6', 'GgCO2eq', '2019', np.nan],
-    ['5B Other (please specify)', 'Other halogenated gases with CO2 equivalent conversion factors (3)', 'GgCO2eq', '2019', np.nan],
-    ['5B Other (please specify)', 'Other halogenated gases without CO2 equivalent conversion factors (4)', 'GgCO2eq', '2019', np.nan],
-    ['5B Other (please specify)', 'NOx', 'GgCO2eq', '2019', np.nan],
-    ['5B Other (please specify)', 'CO', 'GgCO2eq', '2019', np.nan],
-    ['5B Other (please specify)', 'NMVOCs', 'GgCO2eq', '2019', np.nan],
-    ['5B Other (please specify)', 'SO2', 'GgCO2eq', '2019', np.nan],
-]
-
-df_last_page = pd.DataFrame(data_last_page, columns=header_long)
-
-aggregate_cats = {
-    '1.A.4': {'sources': ['1.A.4.a', '1.A.4.b'], 'name': 'Other Sectors (calculated)'},
-    '2.A.4': {'sources': ['2.A.4.a', '2.A.4.b', '2.A.4.d'], 'name': 'Other Process uses of Carbonates (calculated)'},
-    '2.B.8': {'sources': ['2.B.8.a', '2.B.8.b', '2.B.8.c', '2.B.8.f'], 'name': 'Petrochemical and Carbon Black production (calculated)'},
-    '4.A': {'sources': ['4.A.2', 'M.4.A.Ind'], 'name': 'Solid Waste Disposal (calculated)'},
-}
-
-aggregate_cats_N2O = {
-    '3.A.2': {'sources': ['3.A.2.b'], 'name': '3A2 Manure Management'},
-    '3.A': {'sources': ['3.A.2'], 'name': '3A Livestock'},
-}
-
-aggregate_cats_CO2CH4N2O = {
-    '3.A.2': {'sources': ['3.A.2', '3.A.2.b'], 'name': '3A2 Manure Management'},
-}
-
-df_all = None
-
-for page in pages_to_read:
-    tables = camelot.read_pdf(str(input_folder / inventory_file), pages=str(page),
-                              flavor='lattice')
-    df_this_table = tables[0].df
-    # replace line breaks, double, and triple spaces in category names
-    df_this_table.iloc[:, 0] = df_this_table.iloc[:, 0].str.replace("\n", " ")
-    df_this_table.iloc[:, 0] = df_this_table.iloc[:, 0].str.replace("   ", " ")
-    df_this_table.iloc[:, 0] = df_this_table.iloc[:, 0].str.replace("  ", " ")
-    # replace line breaks in units and entities
-    df_this_table.iloc[entity_row] = df_this_table.iloc[entity_row].str.replace('\n',
-                                                                                '')
-    df_this_table.iloc[unit_row] = df_this_table.iloc[unit_row].str.replace('\n', '')
-
-    df_this_table = pm2.pm2io.nir_add_unit_information(df_this_table, unit_row=unit_row,
-                                                       entity_row=entity_row,
-                                                       regexp_entity=".*",
-                                                       default_unit="GgCO2eq")  # , **unit_info)
-
-    # set index and convert to long format
-    df_this_table = df_this_table.set_index(index_cols)
-    df_this_table_long = pm2.pm2io.nir_convert_df_to_long(df_this_table, year,
-                                                          header_long)
-    df_this_table_long["orig_cat_name"] = df_this_table_long["orig_cat_name"].str[0]
-
-    # combine with tables for other sectors (merge not append)
-    if df_all is None:
-        df_all = df_this_table_long
-    else:
-        df_all = pd.concat([df_all, df_this_table_long], axis=0, join='outer')
-
-# add the last page manually
-df_all = pd.concat([df_all, df_last_page], axis=0, join='outer')
-
-# fix the units of aerosols and precursors
-for entity in entities_to_fix_unit:
-    df_all.loc[df_all["entity"] == entity, "unit"] = "Gg"
-
-# make a copy of the categories column
-df_all["category"] = df_all["orig_cat_name"]
-
-# replace cat names by codes in col "category"
-# first the manual replacements
-df_all["category"] = df_all["category"].replace(cat_codes_manual)
-# then the regex replacements
-repl = lambda m: m.group('code')
-df_all["category"] = df_all["category"].str.replace(cat_code_regexp, repl, regex=True)
-df_all = df_all.reset_index(drop=True)
-
-###### convert to primap2 IF
-
-# replace "," with "" in data
-df_all.loc[:, "data"] = df_all.loc[:, "data"].str.replace(',','', regex=False)
-
-# make sure all col headers are str
-df_all.columns = df_all.columns.map(str)
-
-
-
-# ###
-# convert to PRIMAP2 interchange format
-# ###
-data_if = pm2.pm2io.convert_long_dataframe_if(
-    df_all,
-    coords_cols=coords_cols,
-    add_coords_cols=add_coords_cols,
-    coords_defaults=coords_defaults,
-    coords_terminologies=coords_terminologies,
-    coords_value_mapping=coords_value_mapping,
-    #coords_value_filling=coords_value_filling,
-    filter_remove=filter_remove,
-    #filter_keep=filter_keep,
-    meta_data=meta_data,
-    convert_str=True
-    )
-
-cat_label = "category (IPCC2006)"
-
-# fix error cats
-data_if[cat_label] = data_if[cat_label].str.replace("error_", "")
-
-# aggregate categories
-attrs = data_if.attrs
-for cat_to_agg in aggregate_cats:
-    mask = data_if[cat_label].isin(aggregate_cats[cat_to_agg]["sources"])
-    df_test = data_if[mask]
-
-    if len(df_test) > 0:
-        print(f"Aggregating category {cat_to_agg}")
-        df_combine = df_test.copy(deep=True)
-
-        time_format = '%Y'
-        time_columns = [
-            col
-            for col in df_combine.columns.values
-            if matches_time_format(col, time_format)
-        ]
-
-        for col in time_columns:
-            df_combine[col] = pd.to_numeric(df_combine[col], errors="coerce")
-
-        df_combine = df_combine.groupby(
-            by=['source', 'scenario (PRIMAP)', 'provenance', 'area (ISO3)', 'entity',
-                'unit']).sum(min_count=1)
-
-        df_combine.insert(0, cat_label, cat_to_agg)
-        df_combine.insert(1, "orig_cat_name", aggregate_cats[cat_to_agg]["name"])
-
-        df_combine = df_combine.reset_index()
-
-        data_if = pd.concat([data_if, df_combine])
-    else:
-        print(f"no data to aggregate category {cat_to_agg}")
-
-
-# delete cat 3 for N2O as it's wrong
-index_3A_N2O = data_if[(data_if[cat_label] == '3') &
-                       (data_if['entity'] == 'N2O')].index
-data_if = data_if.drop(index_3A_N2O)
-
-# aggregate cat 3 for N2O
-for cat_to_agg in aggregate_cats_N2O:
-    mask = data_if[cat_label].isin(aggregate_cats_N2O[cat_to_agg]["sources"])
-    df_test = data_if[mask]
-    df_test = df_test[df_test["entity"] == "N2O"]
-
-    if len(df_test) > 0:
-        print(f"Aggregating category {cat_to_agg}")
-        df_combine = df_test.copy(deep=True)
-
-        time_format = '%Y'
-        time_columns = [
-            col
-            for col in df_combine.columns.values
-            if matches_time_format(col, time_format)
-        ]
-
-        for col in time_columns:
-            df_combine[col] = pd.to_numeric(df_combine[col], errors="coerce")
-
-        df_combine = df_combine.groupby(
-            by=['source', 'scenario (PRIMAP)', 'provenance', 'area (ISO3)', 'entity',
-                'unit']).sum(min_count=1)
-
-        df_combine.insert(0, cat_label, cat_to_agg)
-        df_combine.insert(1, "orig_cat_name", aggregate_cats_N2O[cat_to_agg]["name"])
-
-        df_combine = df_combine.reset_index()
-
-        data_if = pd.concat([data_if, df_combine])
-    else:
-        print(f"no data to aggregate category {cat_to_agg}")
-
-# delete cat 3.A.2 for CO2CH4N2O as it's wrong
-index_3A2_CO2CH4N2O = data_if[(data_if[cat_label] == '3.A.2') &
-                       (data_if['entity'] == 'CH4CO2N2O (SARGWP100)')].index
-data_if = data_if.drop(index_3A2_CO2CH4N2O)
-
-# aggregate cat 3.A.2 for CO2CH4N2O
-for cat_to_agg in aggregate_cats_CO2CH4N2O:
-    mask = data_if[cat_label].isin(aggregate_cats_CO2CH4N2O[cat_to_agg]["sources"])
-    df_test = data_if[mask]
-    df_test = df_test[df_test["entity"] == "CO2CH4N2O (SARGWP100)"]
-
-    if len(df_test) > 0:
-        print(f"Aggregating category {cat_to_agg}")
-        df_combine = df_test.copy(deep=True)
-
-        time_format = '%Y'
-        time_columns = [
-            col
-            for col in df_combine.columns.values
-            if matches_time_format(col, time_format)
-        ]
-
-        for col in time_columns:
-            df_combine[col] = pd.to_numeric(df_combine[col], errors="coerce")
-
-        df_combine = df_combine.groupby(
-            by=['source', 'scenario (PRIMAP)', 'provenance', 'area (ISO3)', 'entity',
-                'unit']).sum(min_count=1)
-
-        df_combine.insert(0, cat_label, cat_to_agg)
-        df_combine.insert(1, "orig_cat_name", aggregate_cats_CO2CH4N2O[cat_to_agg]["name"])
-
-        df_combine = df_combine.reset_index()
-
-        data_if = pd.concat([data_if, df_combine])
-    else:
-        print(f"no data to aggregate category {cat_to_agg}")
-
-
-data_if.attrs = attrs
-
-data_pm2 = pm2.pm2io.from_interchange_format(data_if)
-
-# # convert to mass units from CO2eq
-# entities_to_convert = [f"{entity} ({gwp_to_use})" for entity in
-#                        entities_to_convert_to_mass]
-#
-# for entity in entities_to_convert:
-#     converted = data_pm2[entity].pr.convert_to_mass()
-#     basic_entity = entity.split(" ")[0]
-#     converted = converted.to_dataset(name=basic_entity)
-#     data_pm2 = data_pm2.pr.merge(converted)
-#     data_pm2[basic_entity].attrs["entity"] = basic_entity
-#
-# # drop the GWP data
-# data_pm2 = data_pm2.drop_vars(entities_to_convert)
-
-# convert back to IF to have units in the fixed format
-data_if = data_pm2.pr.to_interchange_format()
-
-# ###
-# save data to IF and native format
-# ###
-if not output_folder.exists():
-    output_folder.mkdir()
-pm2.pm2io.write_interchange_format(
-    output_folder / (output_filename + coords_terminologies["category"]), data_if)
-
-encoding = {var: compression for var in data_pm2.data_vars}
-data_pm2.pr.to_netcdf(
-    output_folder / (output_filename + coords_terminologies["category"] + ".nc"),
-    encoding=encoding)
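
The category-code step in the removed Indonesia script applies the manual mapping first and the regular expression second. A minimal standalone sketch of that order follows, using only the standard library `re` module instead of pandas. The helper name `to_category_code` is hypothetical and not part of the repository; the mapping and pattern are copied from the script.

```python
import re

# Copied from the removed script.
cat_codes_manual = {
    "Total National Emissions and Removals": "0",
    "Peat Decomposition": "M.3.B.4.APD",
    "Peat Fire": "M.3.B.4.APF",
    "4A1.2 Industrial Solid Waste Disposal": "M.4.A.Ind",
}
cat_code_regexp = r"(?P<code>^[a-zA-Z0-9]{1,4})\s.*"


def to_category_code(name: str) -> str:
    """Hypothetical helper: turn an original category name into its code."""
    # Step 1: manual replacements for names the pattern cannot handle.
    name = cat_codes_manual.get(name, name)
    # Step 2: keep only the leading code of 1-4 alphanumeric characters.
    # Names that do not match the pattern are returned unchanged.
    return re.sub(cat_code_regexp, lambda m: m.group("code"), name)


if __name__ == "__main__":
    for original in [
        "3A2b Direct N2O Emissions from Manure Management",
        "4A1.2 Industrial Solid Waste Disposal",
        "Peat Fire",
    ]:
        print(original, "->", to_category_code(original))
```

The entry `'4A1.2 Industrial Solid Waste Disposal'` needs a manual code because the pattern expects whitespace directly after at most four alphanumeric characters, and the `.` in `4A1.2` prevents a match.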

+ 0 - 430
UNFCCC_GHG_data/UNFCCC_reader/Israel/config_ISR_BUR2.py

@@ -1,430 +0,0 @@
-#### configuration for trend tables
-import locale
-gwp_to_use = 'SARGWP100'
-terminology_proc = 'IPCC2006_PRIMAP'
-# bunkers [0,1] need different specs
-trend_table_def = {
-    # only GHG read, rest dropped
-    'GHG': {
-        'tables': [2],
-        'cols_add': {
-            'unit': 'ktCO2eq',
-            'category': '0',
-        },
-        'given_col': 'entity',
-        'take_only': ['Total GHG'],
-    },
-    'CO2': {
-        'tables': [3],
-        'cols_add': {
-            'unit': 'kt',
-            'entity': 'CO2',
-        },
-        'given_col': 'category',
-    },
-    'CH4': {
-        'tables': [5],
-        'cols_add': {
-            'unit': 'kt',
-            'entity': 'CH4',
-        },
-        'given_col': 'category',
-        'take_only': [
-            'Total emissions', 'From fuel combustion',
-            'From Industrial processes', 'From Agriculture'
-        ], # ignore the waste time series as they don't cover the full sector
-        # and lead to problems because of the methodology change in the inventory
-    },
-    'N2O': {
-        'tables': [6],
-        'cols_add': {
-            'unit': 'kt',
-            'entity': 'N2O',
-        },
-        'given_col': 'category',
-    },
-    'FGases': {
-        'tables': [7],
-        'cols_add': {
-            'unit': 'ktCO2eq',
-            'category': '0',
-        },
-        'given_col': 'entity',
-    },
-}
-
-#### configuration for inventory tables
-inv_tab_conf = {
-    'unit_row': 0,
-    'entity_row': 0,
-    'regex_unit': r"\((.*)\)",
-    'regex_entity': r"^(.*)\s\(",
-    'index_cols': 'category',
-    'cat_pos': (0, 0),
-    'header_long': ["category", "entity", "unit", "time", "data"],
-    'header_2010': ["2010", "CO2 emissions (Gg)", "CO2 removals (Gg)",
-                  "CH4 (Gg)", "N2O (Gg)", "CO (Gg)", "NOx (Gg)",
-                  "NMVOCs (Gg)", "SOx (Gg)", "SF6 (CO2eq Gg)",
-                  "HFCs (CO2eq Gg)", "PFCs (CO2eq Gg)"],
-    'unit_repl': {
-        "SF6 (CO2e Gg)": "GgCO2eq",
-        "HFCs (CO2eGg)": "GgCO2eq",
-        "PFCs (CO2e Gg)": "GgCO2eq",
-        "SF6 (CO2eq Gg)": "GgCO2eq",
-        "HFCs (CO2eq Gg)": "GgCO2eq",
-        "PFCs (CO2eq Gg)": "GgCO2eq",
-    },
-}
-
-inv_table_def = {
-    '1996': {'tables': [1, 2]},
-    '2000': {'tables': [3, 4]},
-    '2005': {'tables': [5, 6]},
-    '2010': {'tables': [7, 8]},
-    '2015': {'tables': [9, 10, 11]},
-    '2019': {'tables': [12, 13, 14]},
-    '2020': {'tables': [15, 16]},
-}
-
-#### configuration for PM2 format
-coords_cols = {
-    "category": "category",
-    "entity": "entity",
-    "unit": "unit",
-}
-
-coords_terminologies = {
-    "area": "ISO3",
-    "category": "BURDI_ISRBUR2",
-    "scenario": "PRIMAP",
-}
-
-coords_defaults = {
-    "source": "ISR-GHG-Inventory",
-    "provenance": "measured",
-    "area": "ISR",
-    "scenario": "BUR2",
-}
-
-coords_value_mapping = {
-    "unit": "PRIMAP1",
-    "category": {
-        'Total national emissions and removals': '24540',
-        '0': '24540', # no mapping, just for completeness
-        'Total emissions and removals': '24540',
-        'Total emissions': '24540',
-        '1. Energy': '1',
-        'A. Fuel combustion (sectoral approach)': '1.A',
-        'A. From fuel combustion': '1.A',
-        'From fuel combustion': '1.A',
-        '1. Energy industries': '1.A.1',
-        '2. Manufacturing industries and construction': '1.A.2',
-        '2. Manufacturing, industries and construction': '1.A.2',
-        '3. Transport': '1.A.3',
-        '4. Other sectors': '1.A.4',
-        '4. Other': '1.A.4',
-        'Commercial, institutional residential sectors': '1.A.4.ab', # not BURDI
-        'Commercial, institutional': '1.A.4.a', #not BURDI
-        'residential sectors': '1.A.4.b', #not BURDI
-        'Agriculture, forestry and fishing': '1.A.4.c', # not BURDI
-        '5. Other (please specify)': '1.A.5',
-        'B. Fugitive emissions from fuels': '1.B',
-        '1. Solid fuels': '1.B.1',
-        '2. Oil and natural gas': '1.B.2',
-        '2. Industrial processes': '2',
-        'B. industrial processes': '2',
-        'From Industrial processes': '2',
-        'A. Mineral products': '2.A',
-        'CEMENT PRODUCTION': '2.A.1',
-        'PRODUCTION OF LIME': '2.A.2',
-        'SODA ASH USE': '2.A.4.b',
-        'ROAD PAVING WITH ASPHALT': '2.A.6',
-        'Container Glass': '2.A.7.a',
-        'B. Chemical industry': '2.B',
-        'NITRIC ACID PRODUCTION': '2.B.2',
-        'Ethylene': '2.B.5.b',
-        'PRODUCTION OF OTHER CHEMICALS': '2.B.5.g', #not BURDI
-        'Sulphuric Acid': '2.B.5.f', #not BURDI
-        'C. Metal production': '2.C',
-        'D. Other production': '2.D',
-        'E. Production of halocarbons and sulphur hexafluoride': '2.E',
-        'F. Consumption of halocarbons and sulphur hexafluoride': '2.F',
-        'G. Other (IPPU)': '2.G',
-        '3. Solvent and other product use': '3',
-        '4. Agriculture': '4',
-        'From Agriculture': '4',
-        'From agriculture': '4',
-        'A. Enteric fermentation': '4.A',
-        'B. Manure management': '4.B',
-        'C. Rice cultivation': '4.C',
-        'D. Agricultural soils': '4.D',
-        'E. Prescribed burning of savannahs': '4.E',
-        'F. Field burning of agricultural residues': '4.F',
-        'G. Other (Agri)': '4.G',
-        '5. Land-use change and forestry': '5',
-        'C. Land-use change and forestry': '5',
-        'A. Changes in forest and other woody biomass stocks': '5.A',
-        '2. Changes in forest and other woody biomass stocks': '5.A',
-        'B. Forest and grassland conversion': '5.B',
-        'C. Abandonment of managed lands': '5.C',
-        'D. CO2 emissions and removals from soil': '5.D',
-        '1. CO2 emissions and removals from soil': '5.D',
-        'E. Other (LULUCF)': '5.E',
-        # waste in 2006 categories, not BURDI as we will lose info if we map to BURDI and back
-        '6. Waste': '6',
-        'A. Solid waste disposal on land': '6.A',
-        'From solid waste disposal on land': '6.A',
-        'B. Waste-water handling': '6X.B', # combine with 6.D
-        'From waste-water treatment': '6X.B', # not BURDI
-        'C. Waste incineration': '6.C',
-        'D. Other (please specify)': '6X.D', # combine with 6.E
-        'B. Biological Treatment of Solid Waste': '6.B', # not BURDI
-        'D.Waste-water handling': '6.D', # not BURDI
-        'D. Waste-water handling': '6.D', # not BURDI
-        'E. Other (Waste)': '6.E', # not BURDI
-        '7. Other (please specify)': '7',
-        'International bunkers': '14637',
-        'Aviation': '14424',
-        'Marine': '14423',
-        'CO2 emissions from biomass': '14638',
-    },
-    "entity": {
-        'Total GHG': f'KYOTOGHG ({gwp_to_use})',
-        'Carbon Dioxide (CO2)': 'CO2',
-        'CO2': 'CO2', # no mapping, just added for completeness here
-        'CO2 emissions': 'CO2 emissions', # no mapping, just added for completeness here
-        'CO2 removals': 'CO2 removals', # no mapping, just added for completeness here
-        'CO2 Emissions': 'CO2 emissions',
-        'CO2 Removals': 'CO2 removals',
-        'Methane (CH4)': 'CH4',
-        'CH4': 'CH4', # no mapping, just added for completeness here
-        'Nitrous Oxides (N2O)': 'N2O',
-        'NO2': 'NO2', # no mapping, just added for completeness here
-        'Sulfur hexafluoride (SF6)': f'SF6 ({gwp_to_use})',
-        'SF6': f'SF6 ({gwp_to_use})',
-        "Hydrofluorocarbons (HFC'S)": f'HFCS ({gwp_to_use})',
-        "HFCs": f'HFCS ({gwp_to_use})',
-        "Perfluorocarbons (PFC'S)": f'PFCS ({gwp_to_use})',
-        "PFCs": f'PFCS ({gwp_to_use})',
-        'NOx': 'NOX',
-        'Nox': 'NOX',
-        'Co': 'CO',
-        'CO': 'CO', # no mapping, just added for completeness here
-        'NMVOCs': 'NMVOC',
-        'SOx': 'SOX', # no mapping, just added for completeness here
-    },
-}
-
-filter_remove = {
-    'rem_cat': {'category': ['Memo items', 'G. Other (please specify)']},
-    #'rem_ent': {'entity': ['GHG per capita', 'GHG per GDP (2015 prices)']},
-}
-
-filter_keep = {}
-
-meta_data = {
-    "references": "https://unfccc.int/documents/627150",
-    "rights": "",
-    "contact": "mail@johannes-guestchow.de",
-    "title": "Israel. Biennial update report (BUR). BUR2",
-    "comment": "Read from pdf by Johannes Gütschow",
-    "institution": "UNFCCC",
-}
-
-#### for processing
-# aggregate categories
-cats_to_agg = {
-    '1': {'sources': ['1.A'], 'name': 'Energy'}, # for trends
-    '1.A.4': {'sources': ['1.A.4.a', '1.A.4.b', '1.A.4.c', '1.A.4.ab'],
-              'name': 'Other sectors'},
-    '2.A.4': {'sources': ['2.A.4.b'], 'name': 'Soda Ash'},
-    '2.A.7': {'sources': ['2.A.7.a'], 'name': 'Other'},
-    '2.A': {'sources': ['2.A.1', '2.A.2', '2.A.4', '2.A.6', '2.A.7'], 'name': 'Mineral Products'},
-    '2.B.5': {'sources': ['2.B.5.f', '2.B.5.g'], 'name': 'Other'},
-    '2.B': {'sources': ['2.B.2', '2.B.5'], 'name': 'Chemical Industry'},
-    '6.D': {'sources': ['6.D', '6X.B'], 'name': 'Wastewater Treatment and Discharge'},
-    #'6.E': {'sources': ['6.E', '6X.D'], 'Other'}, # currently empty
-}
-
-# downscale
-# 1.A.4.ab
-downscaling = {
-    'sectors': {
-        '24540': {
-            'basket': '24540',
-            'basket_contents': ['2'],
-            'entities': ['SF6', 'HFCS (SARGWP100)', 'PFCS (SARGWP100)'],
-            'dim': f"category ({coords_terminologies['category']})",
-        },
-        '1.A': {
-            'basket': '1.A',
-            'basket_contents': ['1.A.1', '1.A.2', '1.A.3', '1.A.4'],
-            'entities': ['CO2', 'CH4', 'N2O'],
-            'dim': f"category ({coords_terminologies['category']})",
-            'tolerance': 0.05, # some inconsistencies (rounding?)
-        },
-        '1.A.4.ab': {
-            'basket': '1.A.4.ab',
-            'basket_contents': ['1.A.4.a', '1.A.4.b'],
-            'entities': ['CO2', 'CH4', 'N2O', 'SOX', 'NOX', 'CO'],
-            'dim': f"category ({coords_terminologies['category']})",
-        },
-        '1.A.4': {
-            'basket': '1.A.4',
-            'basket_contents': ['1.A.4.a', '1.A.4.b', '1.A.4.c'],
-            'entities': ['CO2', 'CH4', 'N2O'],
-            'dim': f"category ({coords_terminologies['category']})",
-        },
-        '2': {
-            'basket': '2',
-            'basket_contents': ['2.A', '2.B', '2.F'],
-            'entities': ['CO2', 'CH4', 'N2O', 'SF6', 'PFCS (SARGWP100)', 'HFCS (SARGWP100)'],
-            'dim': f"category ({coords_terminologies['category']})",
-        },
-        '2.A': {
-            'basket': '2.A',
-            'basket_contents': ['2.A.1', '2.A.2', '2.A.4', '2.A.7'],
-            'entities': ['CO2', 'CH4', 'N2O'],
-            'dim': f"category ({coords_terminologies['category']})",
-        },
-        '2.B': {
-            'basket': '2.B',
-            'basket_contents': ['2.B.2', '2.B.5'],
-            'entities': ['CO2', 'CH4', 'N2O'],
-            'dim': f"category ({coords_terminologies['category']})",
-        },
-        '4': {
-            'basket': '4',
-            'basket_contents': ['4.A', '4.B', '4.C', '4.D', '4.E', '4.F', '4.G'],
-            'entities': ['CH4', 'N2O'],
-            'dim': f"category ({coords_terminologies['category']})",
-        },
-        '5': {
-            'basket': '5',
-            'basket_contents': ['5.A', '5.D'], # the other sectors are 0
-            'entities': ['CO2'],
-            'dim': f"category ({coords_terminologies['category']})",
-        },
-    },
-}
-
-# map to IPCC2006
-cat_conversion = {
-    # ANNEXI to come (low priority as we read from CRF files)
-    'mapping': {
-        '1': '1',
-        '1.A': '1.A',
-        '1.A.1': '1.A.1',
-        '1.A.2': '1.A.2',
-        '1.A.3': '1.A.3',
-        '1.A.4': '1.A.4',
-        '1.A.4.a': '1.A.4.a',
-        '1.A.4.b': '1.A.4.b',
-        '1.A.4.c': '1.A.4.c',
-        '1.A.5': '1.A.5', # currently not needed
-        '1.B': '1.B', # currently not needed
-        '1.B.1': '1.B.1', # currently not needed
-        '1.B.2': '1.B.2', # currently not needed
-        '2': '2',
-        '2.A': '2.A',
-        '2.A.1': '2.A.1', # cement
-        '2.A.2': '2.A.2', # lime
-        '2.A.4': '2.A.4.b', # soda ash
-        '2.A.6': '2.A.5', # road paving with asphalt -> other
-        '2.A.7.a': '2.A.3', # glass
-        '2.B': 'M.2.B_2.B',
-        '2.B.2': '2.B.2', # nitric acid
-        '2.B.5.b': '2.B.8.b', # Ethylene
-        '2.B.5.f': 'M.2.B.10.a', # sulphuric acid
-        '2.B.5.g': 'M.2.B.10.b', # other chemicals
-        '2.C': '2.C',
-        '2.D': 'M.2.H.1_2',
-        '2.E': '2.B.9',
-        '2.F': '2.F',
-        '2.G': '2.H.3',
-        '4': 'M.AG',
-        '4.A': '3.A.1',
-        '4.B': '3.A.2',
-        '4.C': '3.C.7',
-        '4.D': 'M.3.C.45.AG',
-        '4.E': '3.C.1.c',
-        '4.F': '3.C.1.b',
-        '4.G': '3.C.8',
-        '5': 'M.LULUCF',
-        '6': '4',
-        '6.A': '4.A',
-        '6.B': '4.B',
-        '6.C': '4.C',
-        '6.D': '4.D',
-        '24540': '0',
-        '15163': 'M.0.EL',
-        '14637': 'M.BK',
-        '14424': 'M.BK.A',
-        '14423': 'M.BK.M',
-        '14638': 'M.BIO',
-        '7': '5',
-    }, #5.A-D ignored as not fitting 2006 cats
-
-    'aggregate': {
-        '2.A.4': {'sources': ['2.A.4.b'], 'name': 'Other uses of soda ashes'},
-        '2.B.8': {'sources': ['2.B.8.b'], 'name': 'Petrochemical and Carbon Black production'},
-        '2.B.10': {'sources': ['M.2.B.10.a', 'M.2.B.10.b'], 'name': 'Other'},
-        '2.B': {'sources': ['2.B.2', '2.B.8', '2.B.9', '2.B.10'], 'name': 'Chemical Industry'},
-        '2.H': {'sources': ['M.2.H.1_2', '2.H.3'], 'name': 'Other'},
-        # '2': {'sources': ['2.A', '2.B', '2.C', '2.F', '2.H'],
-        #       'name': 'Industrial Processes and Product Use'},
-        '3.A': {'sources': ['3.A.1', '3.A.2'], 'name': 'Livestock'},
-        '3.C.1': {'sources': ['3.C.1.b', '3.C.1.c'],
-                     'name': 'Emissions from biomass burning'},
-        'M.3.C.1.AG': {'sources': ['3.C.1.b', '3.C.1.c'],
-                     'name': 'Emissions from biomass burning (Agriculture)'},
-        '3.C': {'sources': ['3.C.1', 'M.3.C.45.AG', '3.C.7', '3.C.8'],
-                     'name': 'Aggregate sources and non-CO2 emissions sources on land'},
-        'M.3.C.AG': {'sources': ['M.3.C.1.AG', 'M.3.C.45.AG', '3.C.7', '3.C.8'],
-                     'name': 'Aggregate sources and non-CO2 emissions sources on land ('
-                             'Agriculture)'},
-        'M.AG.ELV': {'sources': ['M.3.C.AG'], 'name': 'Agriculture excluding livestock'},
-        '3': {'sources': ['M.AG', 'M.LULUCF'], 'name': 'AFOLU'},
-        'M.0.EL': {'sources': ['1', '2', 'M.AG', '4', '5'], 'name': 'National total '
-                                                                    'excluding LULUCF'},
-    },
-}
-
-sectors_to_save = [
-    '1', '1.A', '1.A.1', '1.A.2', '1.A.3', '1.A.4', '1.A.4.a', '1.A.4.b', '1.A.4.c',
-    '1.A.5',
-    '1.B', '1.B.1', '1.B.2',
-    '2', '2.A', '2.A.1', '2.A.2', '2.A.3', '2.A.4', '2.A.5',
-    '2.B', '2.B.2', '2.B.8', '2.B.9', '2.B.10', '2.C', '2.F', '2.H',
-    '3', 'M.AG', '3.A', '3.A.1', '3.A.2',
-    '3.C', '3.C.1', 'M.3.C.1.AG', '3.C.7', 'M.3.C.45.AG', '3.C.8', 'M.3.C.AG',
-    'M.LULUCF', 'M.AG.ELV',
-    '4', '4.A', '4.B', '4.C', '4.D',
-    '0', 'M.0.EL', 'M.BK', 'M.BK.A', 'M.BK.M', 'M.BIO', '5']
-
-
-# gas baskets
-gas_baskets = {
-    'FGASES (SARGWP100)': ['HFCS (SARGWP100)', 'PFCS (SARGWP100)', 'SF6', 'NF3'],
-    'FGASES (AR4GWP100)': ['HFCS (AR4GWP100)', 'PFCS (AR4GWP100)', 'SF6', 'NF3'],
-    'FGASES (AR5GWP100)': ['HFCS (AR5GWP100)', 'PFCS (AR5GWP100)', 'SF6', 'NF3'],
-    'FGASES (AR6GWP100)': ['HFCS (AR6GWP100)', 'PFCS (AR6GWP100)', 'SF6', 'NF3'],
-    'KYOTOGHG (SARGWP100)': ['CO2', 'CH4', 'N2O', 'FGASES (SARGWP100)'],
-    'KYOTOGHG (AR4GWP100)': ['CO2', 'CH4', 'N2O', 'FGASES (AR4GWP100)'],
-    'KYOTOGHG (AR5GWP100)': ['CO2', 'CH4', 'N2O', 'FGASES (AR5GWP100)'],
-    'KYOTOGHG (AR6GWP100)': ['CO2', 'CH4', 'N2O', 'FGASES (AR6GWP100)'],
-}
-
-basket_copy = {
-    'GWPs_to_add': ["AR4GWP100", "AR5GWP100", "AR6GWP100"],
-    'entities': ["HFCS", "PFCS"],
-    'source_GWP': gwp_to_use,
-}
-
-#### functions
-def is_int(input: str) -> bool:
-    """Return True if `input` parses as an integer under the current locale."""
-    try:
-        locale.atoi(input)
-        return True
-    except Exception:
-        return False
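
The reading script uses `is_int` to pick the year columns out of a table header. A small sketch of that use follows, assuming a made-up list of column names and whatever locale is active when the code runs. The script itself sets `en_IL.UTF-8`, which has to be installed on the system; the sketch deliberately does not set a locale.

```python
import locale


def is_int(value: str) -> bool:
    """Return True if `value` parses as an integer under the current locale."""
    try:
        locale.atoi(value)
        return True
    except (ValueError, AttributeError):
        # ValueError: not an integer string; AttributeError: not a string at all
        return False


# Hypothetical header of a trend table after the first row became the column names.
columns = ["category", "1996", "2000", "2019", "unit"]

# Keep only the columns whose label is a year.
time_cols = [col for col in columns if is_int(col)]

if __name__ == "__main__":
    print(time_cols)
```

Catching specific exception types rather than using a bare `except` keeps `KeyboardInterrupt` and similar signals from being swallowed.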

+ 0 - 301
UNFCCC_GHG_data/UNFCCC_reader/Israel/read_ISR_BUR2_from_pdf.py

@@ -1,301 +0,0 @@
-# read Israel's BUR2 from pdf
-
-# TODO: bunkers trend tables not read because of special format
-
-from UNFCCC_GHG_data.helper import process_data_for_country, GWP_factors
-from UNFCCC_GHG_data.helper import downloaded_data_path, extracted_data_path
-import camelot
-import primap2 as pm2
-import pandas as pd
-import locale
-
-# configuration import
-from config_ISR_BUR2 import trend_table_def, gwp_to_use
-from config_ISR_BUR2 import inv_tab_conf, inv_table_def
-from config_ISR_BUR2 import coords_cols, coords_terminologies, coords_defaults, \
-    coords_value_mapping, filter_remove, filter_keep, meta_data
-from config_ISR_BUR2 import cat_conversion, sectors_to_save, downscaling, \
-    cats_to_agg, gas_baskets, terminology_proc
-from config_ISR_BUR2 import is_int, basket_copy
-
-### general configuration
-input_folder = downloaded_data_path / 'UNFCCC' / 'Israel' / 'BUR2'
-output_folder = extracted_data_path / 'UNFCCC' / 'Israel'
-if not output_folder.exists():
-    output_folder.mkdir()
-
-output_filename = 'ISR_BUR2_2021_'
-inventory_file_pdf = '2nd_Biennial_Update_Report_2021_final.pdf'
-#years_to_read = range(1990, 2018 + 1)
-pages_to_read_trends = range(48, 54)
-pages_to_read_inventory = range(54, 66)
-
-# define locale to use for str to float conversion
-locale_to_use = 'en_IL.UTF-8'
-locale.setlocale(locale.LC_NUMERIC, locale_to_use)
-
-compression = dict(zlib=True, complevel=9)
-
-#### trend tables
-
-# read
-tables_trends = camelot.read_pdf(str(input_folder / inventory_file_pdf), pages=','.join(
-    [str(page) for page in pages_to_read_trends]), flavor='lattice')
-
-# convert to pm2
-table_trends = None
-for table in trend_table_def.keys():
-    current_def = trend_table_def[table]
-    new_table = None
-    for subtable in current_def['tables']:
-        if new_table is None:
-            new_table = tables_trends[subtable].df
-        else:
-            new_table = pd.concat([new_table, tables_trends[subtable].df])
-
-    for col in new_table.columns.values:
-        new_table[col] = new_table[col].str.replace("\n", "")
-
-    new_table.iloc[0, 0] = current_def['given_col']
-    new_table.columns = new_table.iloc[0]
-    new_table = new_table.drop(labels=[0])
-    new_table = new_table.reset_index(drop=True)
-
-    if 'take_only' in current_def.keys():
-        new_table = new_table[
-            new_table[current_def['given_col']].isin(current_def['take_only'])]
-
-    time_cols = [col for col in new_table.columns.values if is_int(col)]
-    for col in time_cols:
-        # no NE,NA etc, just numbers, so we can just remove the ','
-        new_table[col] = new_table[col].str.replace(',', '')
-        new_table[col] = new_table[col].str.replace(' ', '')
-
-    for col in current_def['cols_add']:
-        new_table[col] = current_def['cols_add'][col]
-
-    if table_trends is None:
-        table_trends = new_table
-    else:
-        table_trends = pd.concat([table_trends, new_table])
-
-# ###
-# convert to PRIMAP2 interchange format
-# ###
-data_if_trends = pm2.pm2io.convert_wide_dataframe_if(
-    table_trends,
-    coords_cols=coords_cols,
-    # add_coords_cols=add_coords_cols,
-    coords_defaults=coords_defaults,
-    coords_terminologies=coords_terminologies,
-    coords_value_mapping=coords_value_mapping,
-    # coords_value_filling=coords_value_filling,
-    filter_remove=filter_remove,
-    # filter_keep=filter_keep,
-    meta_data=meta_data,
-    convert_str=True,
-    time_format='%Y'
-)
-
-
-data_pm2_trends = pm2.pm2io.from_interchange_format(data_if_trends)
-
-#### inventory tables
-# read inventory tables
-tables_inv = camelot.read_pdf(
-    str(input_folder / inventory_file_pdf),
-    pages=','.join([str(page) for page in pages_to_read_inventory]),
-    flavor='lattice')
-
-# process
-table_inv = None
-for table in inv_table_def.keys():
-    new_table = None
-    print(f"working on year {table}")
-    for subtable in inv_table_def[table]['tables']:
-        print(f"adding table {subtable}")
-        if new_table is None:
-            new_table = tables_inv[subtable].df
-        else:
-            new_table = pd.concat([new_table, tables_inv[subtable].df], axis=0,
-                                  join='outer')
-            new_table = new_table.reset_index(drop=True)
-
-        # replace line breaks, double, and triple spaces in category names
-        new_table.iloc[:, 0] = new_table.iloc[:, 0].str.replace("\n", " ")
-        new_table.iloc[:, 0] = new_table.iloc[:, 0].str.replace("   ", " ")
-        new_table.iloc[:, 0] = new_table.iloc[:, 0].str.replace("  ", " ")
-
-    if table == "2010":
-        # table has a broken header. use last one
-        new_table.iloc[inv_tab_conf["entity_row"]] = inv_tab_conf["header_2010"]
-    else:
-        # replace line breaks in units and entities
-        new_table.iloc[inv_tab_conf["entity_row"]] = new_table.iloc[
-            inv_tab_conf["entity_row"]].str.replace('\n', '')
-
-    # get year
-    year = new_table.iloc[inv_tab_conf["cat_pos"][0], inv_tab_conf["cat_pos"][1]]
-
-    # set category col label
-    new_table.iloc[inv_tab_conf["cat_pos"][0], inv_tab_conf["cat_pos"][1]] = 'category'
-
-    new_table = pm2.pm2io.nir_add_unit_information(
-        new_table,
-        unit_row=inv_tab_conf["unit_row"], entity_row=inv_tab_conf["entity_row"],
-        regexp_entity=inv_tab_conf["regex_entity"], regexp_unit=inv_tab_conf[
-            "regex_unit"],
-        default_unit="", manual_repl_unit=inv_tab_conf["unit_repl"])
-
-    # fix individual values
-    if table == '1996':
-        loc = new_table[new_table["category"] == "NITRIC ACID PRODUCTION"].index
-        value = new_table.loc[loc, "CH4"].values
-        new_table.loc[loc, "N2O"] = value[0, 0]
-        new_table.loc[loc, "CH4"] = ''
-    if table == '2015':
-        loc_total = new_table[
-            new_table["category"] == "Total national emissions and removals"].index
-        loc_IPPU = new_table[new_table["category"] == "2. Industrial processes"].index
-        value = new_table.loc[loc_IPPU, "PFCs"].values
-        new_table.loc[loc_total, "PFCs"] = value[0, 0]
-
-    # remove lines with empty category
-    new_table = new_table.drop(new_table[new_table["category"] == ""].index)
-
-    # rename E. Other (please specify) according to row above
-    e_locs = list(new_table[new_table["category"] == "E. Other (please specify)"].index)
-    for loc in e_locs:
-        iloc = new_table.index.get_loc(loc)
-        if new_table.iloc[iloc - 1]["category"][
-            0] == "D. CO2 emissions and removals from soil":
-            new_table.loc[loc]["category"] = "E. Other (LULUCF)"
-        elif new_table.iloc[iloc - 1]["category"][0] in ["D.Waste-water handling",
-                                                         'D. Waste-water handling']:
-            new_table.loc[loc]["category"] = "E. Other (Waste)"
-
-    # rename G. Other (please specify) according to row above
-    g_locs = list(new_table[new_table["category"] == "G. Other (please specify)"].index)
-    for loc in g_locs:
-        iloc = new_table.index.get_loc(loc)
-        if new_table.iloc[iloc - 1]["category"][
-            0] == "F. Field burning of agricultural residues":
-            new_table.loc[loc]["category"] = "G. Other (Agri)"
-        elif new_table.iloc[iloc - 1]["category"][
-            0] == "F. Consumption of halocarbons and sulphur hexafluoride":
-            new_table.loc[loc]["category"] = "G. Other (IPPU)"
-
-    # set index and convert to long format
-    new_table = new_table.set_index(inv_tab_conf["index_cols"])
-    new_table_long = pm2.pm2io.nir_convert_df_to_long(new_table, year,
-                                                      inv_tab_conf["header_long"])
-    # remove line breaks in values
-    new_table_long["data"] = new_table_long["data"].str.replace("\n", "")
-
-    if table_inv is None:
-        table_inv = new_table_long
-    else:
-        table_inv = pd.concat([table_inv, new_table_long], axis=0, join='outer')
-        table_inv = table_inv.reset_index(drop=True)
-
-# no NE,NA etc, just numbers, so we can just remove the ','
-table_inv["data"] = table_inv["data"].str.replace(',', '')
-table_inv["data"] = table_inv["data"].str.replace(' ', '')
-
-# ###
-# convert to PRIMAP2 interchange format
-# ###
-data_if_inv = pm2.pm2io.convert_long_dataframe_if(
-    table_inv,
-    coords_cols=coords_cols,
-    # add_coords_cols=add_coords_cols,
-    coords_defaults=coords_defaults,
-    coords_terminologies=coords_terminologies,
-    coords_value_mapping=coords_value_mapping,
-    # coords_value_filling=coords_value_filling,
-    filter_remove=filter_remove,
-    # filter_keep=filter_keep,
-    meta_data=meta_data,
-    convert_str=True,
-    time_format='%Y',
-)
-
-data_pm2_inv = pm2.pm2io.from_interchange_format(data_if_inv)
-
-#### combine
-# tolerance needs to be high as rounding in trend tables leads to inconsistent data
-data_pm2 = data_pm2_inv.pr.merge(data_pm2_trends, tolerance=0.11)
-# convert back to IF to have units in the fixed format
-data_if = data_pm2.pr.to_interchange_format()
-
-# ###
-# save data to IF and native format
-# ###
-if not output_folder.exists():
-    output_folder.mkdir()
-pm2.pm2io.write_interchange_format(
-    output_folder / (output_filename + coords_terminologies["category"] + "_raw"), data_if)
-
-encoding = {var: compression for var in data_pm2.data_vars}
-data_pm2.pr.to_netcdf(
-    output_folder / (output_filename + coords_terminologies["category"] + "_raw.nc"),
-    encoding=encoding)
-
-
-#### processing
-data_proc_pm2 = data_pm2
-
-# combine CO2 emissions and removals
-temp_CO2 = data_proc_pm2["CO2"].copy()
-#data_proc_pm2["CO2"] = data_proc_pm2[["CO2 emissions", "CO2 removals"]].to_array()
-# .pr.sum(dim="variable", skipna=True, min_count=1)
-data_proc_pm2["CO2"] = data_proc_pm2[["CO2 emissions", "CO2 removals"]].pr.sum(
-    dim="entity", skipna=True, min_count=1)
-data_proc_pm2["CO2"].attrs = temp_CO2.attrs
-data_proc_pm2["CO2"] = data_proc_pm2["CO2"].fillna(temp_CO2)
-
-# actual processing
-country_processing_step1 = {
-    'aggregate_cats': cats_to_agg,
-}
-data_proc_pm2 = process_data_for_country(
-    data_proc_pm2,
-    entities_to_ignore=['CO2 emissions', 'CO2 removals'],
-    gas_baskets={},
-    processing_info_country=country_processing_step1,
-)
-
-country_processing_step2 = {
-    'downscale': downscaling,
-    'basket_copy': basket_copy,
-}
-
-data_proc_pm2 = process_data_for_country(
-    data_proc_pm2,
-    entities_to_ignore=[],
-    gas_baskets=gas_baskets,
-    processing_info_country=country_processing_step2,
-    cat_terminology_out=terminology_proc,
-    category_conversion=cat_conversion,
-    sectors_out=sectors_to_save,
-)
-
-# adapt source and metadata
-# TODO: processing info is present twice
-current_source = data_proc_pm2.coords["source"].values[0]
-data_temp = data_proc_pm2.pr.loc[{"source": current_source}]
-data_proc_pm2 = data_proc_pm2.pr.set("source", 'BUR_NIR', data_temp)
-
-# ###
-# save data to IF and native format
-# ###
-data_proc_if = data_proc_pm2.pr.to_interchange_format()
-if not output_folder.exists():
-    output_folder.mkdir()
-pm2.pm2io.write_interchange_format(
-    output_folder / (output_filename + terminology_proc), data_proc_if)
-
-encoding = {var: compression for var in data_proc_pm2.data_vars}
-data_proc_pm2.pr.to_netcdf(
-    output_folder / (output_filename + terminology_proc + ".nc"),
-    encoding=encoding)
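
The removed script above renames ambiguous "Other (please specify)" rows according to the category in the row directly above them. A minimal sketch of that idea, assuming plain Python lists instead of the DataFrame the script works on; `PARENT_TO_NAME` and `disambiguate_other` are hypothetical names introduced only for this illustration:

```python
# Hypothetical mapping: category of the preceding row -> disambiguated name.
# The category strings are taken from the script; the dict itself is an assumption.
PARENT_TO_NAME = {
    "D. CO2 emissions and removals from soil": "E. Other (LULUCF)",
    "D. Waste-water handling": "E. Other (Waste)",
    "D.Waste-water handling": "E. Other (Waste)",
}


def disambiguate_other(categories, ambiguous="E. Other (please specify)",
                       mapping=PARENT_TO_NAME):
    """Return a copy of `categories` where each ambiguous entry is renamed
    based on the entry directly before it; unknown predecessors are left as-is."""
    result = list(categories)
    for i, name in enumerate(categories):
        if name == ambiguous and i > 0:
            # look at the original (unmodified) preceding row
            result[i] = mapping.get(categories[i - 1], name)
    return result


cats = [
    "D. CO2 emissions and removals from soil",
    "E. Other (please specify)",
    "D. Waste-water handling",
    "E. Other (please specify)",
]
renamed = disambiguate_other(cats)
```

Keeping the lookup in a dictionary means a new predecessor spelling (such as the missing space in `D.Waste-water handling`) can be handled by adding an entry rather than another `elif` branch.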

+ 0 - 676
UNFCCC_GHG_data/UNFCCC_reader/Malaysia/config_MYS_BUR3.py

@@ -1,676 +0,0 @@
-import pandas as pd
-gwp_to_use = "AR4GWP100"
-
-
-cat_names_fix = {
-    '2A3 Glass Prod.': '2A3 Glass Production',
-    '2F6 Other Applications': '2F6 Other Applications (please specify)',
-    '3A2 Manure Mngmt': '3A2 Manure Mngmt.',
-    '3C7 Rice Cultivations': '3C7 Rice Cultivation',
-}
-
-values_replacement = {
-    '': '-',
-    ' ': '-',
-}
-
-cols_for_space_stripping = ["Categories"]
-
-index_cols = ["Categories", "entity", "unit"]
-
-# parameters part 2: conversion to interchange format
-cats_remove = ['Memo items', 'Information items']
-
-cat_codes_manual = {
-    'Annual change in long-term storage of carbon in HWP waste': 'M.LTS.AC.HWP',
-    'Annual change in total long-term storage of carbon stored': 'M.LTS.AC.TOT',
-    'CO2 captured': 'M.CCS',
-    'CO2 from Biomass Burning for Energy Production': 'M.BIO',
-    'For domestic storage': 'M.CCS.DOM',
-    'For storage in other countries': 'M.CCS.OCT',
-    'International Aviation (International Bunkers)': 'M.BK.A',
-    'International Bunkers': 'M.BK',
-    'International Water-borne Transport (International Bunkers)': 'M.BK.M',
-    'Long-term storage of carbon in waste disposal sites': 'M.LTS.WASTE',
-    'Multilateral Operations': 'M.MULTIOP',
-    'Other (please specify)': 'M.OTHER',
-    'Total National Emissions and Removals': '0',
-}
-
-cat_code_regexp = r'(?P<code>^[A-Z0-9]{1,4})\s.*'
-
-coords_terminologies = {
-    "area": "ISO3",
-    "category": "IPCC2006_PRIMAP",
-    "scenario": "PRIMAP",
-}
-
-coords_defaults = {
-    "source": "MYS-GHG-inventory",
-    "provenance": "measured",
-    "area": "MYS",
-    "scenario": "BUR3"
-}
-
-coords_value_mapping = {
-}
-
-coords_cols = {
-    "category": "Categories",
-    "entity": "entity",
-    "unit": "unit"
-}
-
-add_coords_cols = {
-    "orig_cat_name": ["orig_cat_name", "category"],
-}
-
-meta_data = {
-    "references": "https://unfccc.int/documents/267685",
-    "rights": "",
-    "contact": "mail@johannes-guetschow.de",
-    "title": "Malaysia - Third Biennial Update Report to the UNFCCC",
-    "comment": "Read from pdf file by Johannes Gütschow",
-    "institution": "United Nations Framework Convention on Climate Change (UNFCCC)",
-}
-
-terminology_proc = coords_terminologies["category"]
-
-table_def_templates = {
-    '184': { #184
-        "area": ['54,498,793,100'],
-        "cols": ['150,197,250,296,346,394,444,493,540,587,637,685,738'],
-        "rows_to_fix": {
-            3: ['Total National', '1A Fuel Combustion', '1A1 Energy', '1A2 Manufacturing',
-                '1B Fugitive', '1B2 Oil and Natural', '1B3 Other emissions',
-                '1C Carbon Dioxide', '2 INDUSTRIAL', '2A1 Cement',
-               ],
-        },
-    },
-    '185': { #185
-        "area": ['34,504,813,99'],
-        "cols": ['128,177,224,273,321,373,425,473,519,564,611,661,713,765'],
-        "rows_to_fix": {
-            3: ['Total National', '1A Fuel', '1A1 Energy', '1A2 Manufacturing',
-                '1B Fugitive', '1B2 Oil and Natural', '1B3 Other',
-                '1C Carbon Dioxide', '2 INDUSTRIAL', '2A Mineral',
-                '2A1 Cement', '2A2 Lime',
-               ],
-        },
-    },
-    '186': { #also 200
-        "area": ['53,498,786,104'],
-        "cols": ['150,197,238,296,347,396,444,489,540,587,634,686,739'],
-        "rows_to_fix": {
-            3: ['2A3 Glass', '2A4 Other Process', '2A5 Other (please',
-                '2B Chemical', '2B1 Ammonia', '2B2 Nitric Acid',
-                '2B3 Adipic Acid', '2B4 Caprolactam,', '2B5 Carbide',
-                '2B6 Titanium', '2B7 Soda Ash', '2B8 Petrochemical',
-                '2B10 Other (Please', '2C1 Iron and Steel', '2C2 Ferroalloys'
-               ],
-            2: ['2B9 Fluorochemical'],
-        },
-    },
-    '187': { # also 201
-        "area": ['39,499,807,91'],
-        "cols": ['132,185,232,280,327,375,425,470,522,568,613,664,713,763'],
-        "rows_to_fix": {
-            3: ['2A3 Glass', '2A4 Other Process', '2A5 Other (please',
-                '2B Chemical', '2B1 Ammonia', '2B2 Nitric Acid',
-                '2B3 Adipic Acid', '2B5 Carbide',
-                '2B6 Titanium', '2B7 Soda Ash', '2B8 Petrochemical',
-                '2B10 Other (Please', '2C1 Iron and Steel', '2C2 Ferroalloys',
-               ],
-            2: ['2B9 Fluorochemical'],
-            5: ['2B4 Caprolactam,'],
-        },
-    },
-    '188': {
-        "area": ['48,503,802,92'],
-        "cols": ['146,194,245,295,346,400,452,500,549,596,642,695,746'],
-        "rows_to_fix": {
-            3: ['2C3 Aluminium', '2C4 Magnesium', '2C7 Other (please',
-                '2D Non-Energy', '2D2 Paraffin Wax', '2D4 Other (please',
-                '2E Electronics', '2E1 Integrated', '2E5 Other (please',
-                '2F1 Refrigeration',
-               ],
-            2: ['2E2 TFT Flat Panel', '2E4 Heat Transfer'],
-            5: ['2F Product Uses as'],
-        },
-    },
-    '189': {
-        "area": ['41,499,806,95'],
-        "cols": ['141,184,233,282,331,376,427,472,520,567,618,665,717,760'],
-        "rows_to_fix": {
-            3: ['2C3 Aluminium', '2C4 Magnesium', '2C7 Other (please',
-                '2D Non-Energy', '2D2 Paraffin Wax', '2D4 Other (please',
-                '2E Electronics', '2E1 Integrated', '2E5 Other (please',
-                '2F1 Refrigeration',
-               ],
-            2: ['2E2 TFT Flat Panel', '2E4 Heat Transfer'],
-            5: ['2F Product Uses as'],
-        },
-    },
-    '190': {
-        "area": ['45,500,802,125'],
-        "cols": ['146,193,243,295,349,400,453,501,549,595,644,696,748'],
-        "rows_to_fix": {
-            3: ['2F2 Foam Blowing', '2F6 Other', '2G Other Product',
-                '2G2 SF6 and PFCs', '2G4 Other (Please', '2H1 Pulp and Paper',
-                '2H2 Food and', '2H3 Other (please', '3 AGRICULTURE,',
-               ],
-            2: ['2G1 Electrical', '2G3 N2O from', '3A1 Enteric'],
-        },
-    },
-    '191': {
-        "area": ['38,498,814,120'],
-        "cols": ['130,180,229,277,326,381,429,477,526,570,620,669,717,765'],
-        "rows_to_fix": {
-            3: ['2F2 Foam Blowing', '2F6 Other', '2G Other Product',
-                '2G2 SF6 and PFCs', '2G4 Other (Please', '2H1 Pulp and Paper',
-                '2H2 Food and', '2H3 Other (please', '3 AGRICULTURE,',
-               ],
-            2: ['2G1 Electrical', '2G3 N2O from', '3A1 Enteric'],
-        },
-    },
-    '192': {
-        "area": ['39,502,807,106'],
-        "cols": ['134,193,245,296,346,400,455,507,556,602,650,701,755'],
-        "rows_to_fix": {
-            3: ['3C1 Emissions from', '3C4 Direct N2O', '3C5 Indirect N2O',
-                '3C6 Indirect N2O', '3C8 Other (please', '3D1 Harvested Wood',
-                '3D2 Other (please',
-               ],
-            5: ['3C Aggregate',],
-        },
-    },
-    '193': {
-        "area": ['36,508,815,119'],
-        "cols": ['128,179,228,278,327,379,428,476,525,571,622,670,717,766'],
-        "rows_to_fix": {
-            3: ['3C1 Emissions from', '3C4 Direct N2O', '3C5 Indirect N2O',
-                '3C6 Indirect N2O', '3C8 Other (please', '3D1 Harvested',
-                '3D2 Other (please',
-               ],
-            5: ['3C Aggregate',],
-        },
-    },
-    '194': {
-        "area": ['80,502,762,151'],
-        "cols": ['201,243,285,329,376,419,462,502,551,591,635,679,724'],
-        "rows_to_fix": {
-            3: ['4C Incineration and', '4C2 Open Burning of', '4E Other',],
-            2: ['4A1 Managed Waste', '4A2 Unmanaged Waste', '4A3 Uncategorised Waste',
-                '4B Biological Treatment', '4D Wastewater', '4D1 Domestic Wastewater',
-                '4D2 Industrial Wastewater',
-               ],
-            5: ['5A Indirect N2O'],
-        },
-    },
-    '195': {
-        "area": ['78,508,765,103'],
-        "cols": ['191,230,271,314,352,400,438,475,519,566,600,645,686,730'],
-        "rows_to_fix": {
-            3: ['4C Incineration and', '4C2 Open Burning of', '4E Other',
-                '4B Biological', '4D Wastewater', '4D1 Domestic',
-                '4D2 Industrial', '5B Other (please'
-               ],
-            2: ['4A1 Managed Waste', '4A2 Unmanaged Waste', '4A3 Uncategorised',
-                '4A Solid Waste',
-               ],
-            5: ['5A Indirect N2O'],
-        },
-    },
-    '196': {
-        "area": ['80,502,762,151'],
-        "cols": ['201,243,285,329,376,419,462,502,551,591,635,679,724'],
-        "rows_to_fix": {
-            3: ['International Aviation', 'International Water-borne',
-                'CO2 from Biomass Burning', 'For storage in other',
-                'Long-term storage of', 'Annual change in total',
-                'Annual change in long-',
-               ],
-        },
-    },
-    '197': {
-        "area": ['74,507,779,201'],
-        "cols": ['182,226,268,311,354,398,444,482,524,565,610,654,693,733'],
-        "rows_to_fix": {
-            3: ['International Aviation', 'International Water-',
-                'CO2 from Biomass', 'For storage in other',
-                'Long-term storage of', 'Annual change in total',
-                'Annual change in long-',
-               ],
-        },
-    },
-    '198': { # first CH4 table
-        "area": ['54,498,793,100'],
-        "cols": ['140,197,250,296,346,394,444,493,540,587,637,685,738'],
-        "rows_to_fix": {
-            3: ['Total National', '1A Fuel Combustion', '1A1 Energy', '1A2 Manufacturing',
-                '1B Fugitive', '1B2 Oil and Natural', '1B3 Other emissions',
-                '1C Carbon Dioxide', '2 INDUSTRIAL', '2A1 Cement',
-               ],
-            -3: ['2A Mineral Industry'],
-        },
-    },
-    '199': {
-        "area": ['34,506,818,97'],
-        "cols": ['132,177,228,276,329,377,432,479,528,574,618,667,722,774'],
-        "rows_to_fix": {
-            3: ['Total National', '1A Fuel', '1A1 Energy', '1A2 Manufacturing',
-                '1B Fugitive', '1B2 Oil and Natural', '1B3 Other',
-                '1C Carbon Dioxide', '2 INDUSTRIAL', '2A1 Cement',
-                '2A Mineral', '2A2 Lime',
-               ],
-        },
-    },
-    '202': {
-        "area": ['48,503,802,92'],
-        "cols": ['146,194,245,295,346,400,452,500,549,596,642,695,746'],
-        "rows_to_fix": {
-            3: ['2C3 Aluminium', '2C7 Other (please',
-                '2D Non-Energy', '2D2 Paraffin Wax', '2D4 Other (please',
-                '2E Electronics', '2E1 Integrated', '2E5 Other (please',
-               ],
-            2: ['2C4 Magnesium', '2E2 TFT Flat Panel', '2E4 Heat Transfer',
-                '2F1 Refrigeration',
-               ],
-            5: ['2F Product Uses as'],
-        },
-    },
-    '203': {
-        "area": ['41,499,806,95'],
-        "cols": ['141,184,233,282,331,376,427,472,520,567,618,665,717,760'],
-        "rows_to_fix": {
-            3: ['2C3 Aluminium', '2C7 Other (please',
-                '2D Non-Energy', '2D2 Paraffin Wax', '2D4 Other (please',
-                '2E Electronics', '2E1 Integrated', '2E5 Other (please',
-               ],
-            2: ['2C4 Magnesium', '2E2 TFT Flat Panel', '2E4 Heat Transfer',
-                '2F1 Refrigeration'
-               ],
-            5: ['2F Product Uses as'],
-        },
-    },
-    '204': {
-        "area": ['45,500,802,125'],
-        "cols": ['146,193,243,295,349,400,455,501,549,595,644,696,748'],
-        "rows_to_fix": {
-            3: ['2F6 Other', '2G Other Product',
-                '2G2 SF6 and PFCs', '2G4 Other (Please', '2H1 Pulp and Paper',
-                '2H2 Food and', '2H3 Other (please', '3 AGRICULTURE,',
-                '3A1 Enteric',
-               ],
-            2: ['2F2 Foam Blowing', '2G1 Electrical', '2G3 N2O from'],
-        },
-    },
-    '205': {
-        "area": ['38,498,814,120'],
-        "cols": ['130,180,229,277,326,381,429,477,526,570,620,669,717,765'],
-        "rows_to_fix": {
-            3: ['2F6 Other', '2G Other Product',
-                '2G2 SF6 and PFCs', '2G4 Other (Please', '2H1 Pulp and Paper',
-                '2H2 Food and', '2H3 Other (please', '3 AGRICULTURE,',
-                '3A1 Enteric',
-               ],
-            2: ['2F2 Foam Blowing', '2G1 Electrical', '2G3 N2O from'],
-        },
-    },
-    '206': { #also 220
-        "area": ['39,502,807,106'],
-        "cols": ['134,193,245,296,346,400,455,507,556,602,650,701,755'],
-        "rows_to_fix": {
-            3: ['3C1 Emissions from', '3C4 Direct N2O', '3C5 Indirect N2O',
-                '3C6 Indirect N2O', '3C8 Other (please',
-                '3D2 Other (please',
-               ],
-            2: ['3D1 Harvested Wood',],
-            5: ['3C Aggregate',],
-        },
-    },
-    '207': { # also 221
-        "area": ['36,508,815,110'],
-        "cols": ['128,179,228,278,327,379,428,476,527,571,622,670,717,766'],
-        "rows_to_fix": {
-            3: ['3C1 Emissions from', '3C4 Direct N2O', '3C5 Indirect N2O',
-                '3C6 Indirect N2O', '3C8 Other (please',
-                '3D2 Other (please',
-               ],
-            2: ['3D1 Harvested',],
-            5: ['3C Aggregate',],
-        },
-    },
-    '208': { # also 222
-        "area": ['80,502,762,151'],
-        "cols": ['201,243,285,329,376,419,462,502,551,591,635,679,724'],
-        "rows_to_fix": {
-            3: ['4C Incineration and', '4C2 Open Burning of', '4E Other',
-                '4A1 Managed Waste', '4A2 Unmanaged Waste', '4A3 Uncategorised Waste',
-                '4B Biological Treatment', '4D Wastewater', '4D1 Domestic Wastewater',
-                '4D2 Industrial Wastewater'
-               ],
-            5: ['5A Indirect N2O'],
-        },
-    },
-    '209': { # also 223
-        "area": ['78,508,765,103'],
-        "cols": ['191,230,271,314,352,400,438,475,519,560,600,645,686,730'],
-        "rows_to_fix": {
-            3: ['4C Incineration and', '4C2 Open Burning of', '4E Other',
-                '4B Biological', '4D Wastewater', '4D1 Domestic',
-                '4D2 Industrial', '5B Other (please',
-                '4A1 Managed Waste', '4A2 Unmanaged Waste', '4A3 Uncategorised',
-                '4A Solid Waste'
-               ],
-            5: ['5A Indirect N2O'],
-        },
-    },
-    '210': { # also 224
-        "area": ['80,502,762,151'],
-        "cols": ['201,243,285,329,376,419,462,502,551,591,635,679,724'],
-        "rows_to_fix": {
-            3: ['International Aviation', 'International Water-borne',
-                'Long-term storage of', 'Annual change in total',
-                'Annual change in long-',
-               ],
-            2: ['CO2 from Biomass Burning', 'For storage in other',],
-        },
-    },
-    '211': { # also 225
-        "area": ['74,507,779,201'],
-        "cols": ['182,226,268,311,354,398,444,482,524,565,610,654,693,733'],
-        "rows_to_fix": {
-            3: ['International Aviation', 'International Water-',
-                'Long-term storage of', 'Annual change in total',
-                'Annual change in long-', 'CO2 from Biomass',
-               ],
-            2: ['For storage in other',],
-        },
-    },
-    '212': {
-        "area": ['54,498,793,100'],
-        "cols": ['150,197,250,296,346,394,444,493,540,587,637,685,738'],
-        "rows_to_fix": {
-            3: ['Total National', '1A Fuel Combustion', '1A1 Energy', '1A2 Manufacturing',
-                '1B Fugitive', '1B2 Oil and Natural', '1B3 Other emissions',
-                '1C Carbon Dioxide', '2 INDUSTRIAL',
-               ],
-            2: ['2A1 Cement',],
-        },
-    },
-    '213': {
-        "area": ['34,504,813,99'],
-        "cols": ['128,177,224,273,321,373,425,473,519,564,611,661,713,765'],
-        "rows_to_fix": {
-            3: ['Total National', '1A Fuel', '1A1 Energy', '1A2 Manufacturing',
-                '1B Fugitive', '1B2 Oil and Natural', '1B3 Other',
-                '1C Carbon Dioxide', '2 INDUSTRIAL', '2A Mineral',
-               ],
-            2: ['2A1 Cement', '2A2 Lime',],
-        },
-    },
-    '214': {
-        "area": ['47,499,801,93'],
-        "cols": ['141,197,246,297,350,396,453,502,550,595,642,692,748'],
-        "rows_to_fix": {
-            3: ['2A5 Other (please',
-                '2B Chemical', '2B1 Ammonia', '2B2 Nitric Acid',
-                '2B3 Adipic Acid', '2B4 Caprolactam,', '2B5 Carbide',
-                '2B6 Titanium', '2B7 Soda Ash', '2B8 Petrochemical',
-                '2B10 Other (Please', '2C1 Iron and Steel', '2C2 Ferroalloys'
-               ],
-            2: ['2A3 Glass', '2A4 Other Process', '2B9 Fluorochemical'],
-            -3: ['2C Metal Industry'],
-        },
-    },
-    '215': {
-        "area": ['39,499,807,91'],
-        "cols": ['132,180,232,280,327,375,425,470,522,568,613,664,713,763'],
-        "rows_to_fix": {
-            3: ['2A5 Other (please',
-                '2B Chemical', '2B1 Ammonia', '2B2 Nitric Acid',
-                '2B3 Adipic Acid', '2B4 Caprolactam,', '2B5 Carbide',
-                '2B6 Titanium Dioxide', '2B7 Soda Ash', '2B8 Petrochemical',
-                '2B10 Other (Please', '2C1 Iron and Steel', '2C2 Ferroalloys'
-               ],
-            2: ['2A4 Other Process', '2B9 Fluorochemical'],
-            -3: ['2C Metal Industry'],
-        },
-    },
-    '216': {
-        "area": ['48,503,802,92'],
-        "cols": ['146,194,245,295,346,400,452,500,549,596,642,695,746'],
-        "rows_to_fix": {
-            3: ['2C7 Other (please', '2D Non-Energy', '2D2 Paraffin Wax',
-                '2D4 Other (please', '2E Electronics', '2E1 Integrated',
-                '2E5 Other (please',
-               ],
-            2: ['2C3 Aluminium', '2C4 Magnesium', '2E2 TFT Flat Panel',
-                '2E4 Heat Transfer', '2F1 Refrigeration',
-               ],
-            5: ['2F Product Uses as'],
-        },
-    },
-    '217': {
-        "area": ['41,499,806,95'],
-        "cols": ['141,184,233,282,331,376,427,472,520,567,618,665,717,760'],
-        "rows_to_fix": {
-            3: ['2C7 Other (please', '2D Non-Energy', '2D2 Paraffin Wax',
-                '2D4 Other (please', '2E Electronics', '2E1 Integrated',
-                '2E5 Other (please',
-               ],
-            2: ['2C3 Aluminium', '2C4 Magnesium', '2E2 TFT Flat Panel',
-                '2E4 Heat Transfer', '2F1 Refrigeration',
-               ],
-            5: ['2F Product Uses as'],
-        },
-    },
-    '218': {
-        "area": ['45,500,802,125'],
-        "cols": ['146,193,243,295,349,400,455,501,549,595,644,696,748'],
-        "rows_to_fix": {
-            3: ['2F6 Other', '2G Other Product', '2G2 SF6 and PFCs',
-                '2G3 N2O from', '2H3 Other (please', '3 AGRICULTURE,',
-               ],
-            2: ['2F2 Foam Blowing', '2G1 Electrical', '2G4 Other (Please',
-                '2H1 Pulp and Paper', '2H2 Food and', '3A1 Enteric',],
-        },
-    },
-    '219': {
-        "area": ['38,498,814,120'],
-        "cols": ['130,180,229,277,326,381,429,477,526,570,620,669,717,765'],
-        "rows_to_fix": {
-            3: ['2F6 Other', '2G Other Product', '2G2 SF6 and PFCs',
-                '2G3 N2O from', '2H3 Other (please', '3 AGRICULTURE,',
-               ],
-            2: ['2F2 Foam Blowing', '2G1 Electrical', '2G4 Other (Please',
-                '2H1 Pulp and Paper', '2H2 Food and', '3A1 Enteric',],
-        },
-    },
-    '226': { # also 234, 238
-        "area": ['48,510,797,99'],
-        "cols": ['271,310,350,393,435,475,514,557,594,640,678,719,760'],
-        "rows_to_fix": {
-            2: ['2B4 Caprolactam, Glyoxal and Glyoxylic Acid'],
-        }
-    },
-    '227': { # also 231, 235, 239
-        "area": ['27,510,818,99'],
-        "cols": ['250,290,333,372,413,452,494,536,576,616,656,699,739,781'],
-        "rows_to_fix": {
-            2: ['2B4 Caprolactam, Glyoxal and Glyoxylic Acid'],
-        }
-    },
-    '228': {
-        "area": ['48,510,797,99'],
-        "cols": ['271,310,350,393,435,475,514,557,594,640,678,719,760'],
-        "rows_to_fix": {
-            3: ['2F Product Uses as Substitutes for Ozone'],
-            2: ['2D Non-Energy Products from Fuels and Solvent'],
-        },
-    },
-    '229': {
-        "area": ['25,512,819,86'],
-        "cols": ['246,291,331,370,412,454,495,536,577,619,656,699,740,777'],
-        "rows_to_fix": {
-            3: ['2F Product Uses as Substitutes for Ozone'],
-            2: ['2D Non-Energy Products from Fuels and Solvent'],
-        },
-    },
-    '230': {
-        "area": ['48,510,797,99'],
-        "cols": ['271,310,350,393,435,475,514,557,594,640,678,719,760'],
-        "rows_to_fix": {
-            -3: ['Total National Emissions and Removals', '2 INDUSTRIAL PROCESSES AND PRODUCT USE'],
-            2: ['2B4 Caprolactam, Glyoxal and Glyoxylic Acid'],
-        }
-    },
-    '232': { # also 236
-        "area": ['48,510,797,99'],
-        "cols": ['271,310,350,393,435,475,514,557,594,640,678,719,760'],
-        "rows_to_fix": {
-            -3: ['2G2 SF6 and PFCs from Other Product Uses',],
-            2: ['2D Non-Energy Products from Fuels and Solvent',
-                '2F Product Uses as Substitutes for Ozone',]
-        },
-    },
-    '233': {
-        "area": ['25,512,819,86'],
-        "cols": ['246,291,331,370,412,454,495,536,577,619,656,699,740,777'],
-        "rows_to_fix": {
-            -5: ['2F Product Uses as Substitutes for Ozone'],
-            2: ['2D Non-Energy Products from Fuels and Solvent'],
-            -3: ['2G Other Product Manufacture and Use',
-                 '2G2 SF6 and PFCs from Other Product Uses',]
-        },
-    },
-    '237': {
-        "area": ['25,512,819,86'],
-        "cols": ['246,291,331,370,412,454,495,536,577,619,656,699,740,777'],
-        "rows_to_fix": {
-            2: ['2D Non-Energy Products from Fuels and Solvent',
-                '2F Product Uses as Substitutes for Ozone'],
-        },
-    },
-    '240': {
-        "area": ['48,510,797,99'],
-        "cols": ['271,310,350,393,435,475,514,557,594,640,678,719,760'],
-        "rows_to_fix": {
-            2: ['2D Non-Energy Products from Fuels and Solvent',
-                '2F Product Uses as Substitutes for Ozone'],
-            -3: ['2E Electronics Industry',
-                 '2F1 Refrigeration and Air Conditioning',
-                 '2G2 SF6 and PFCs from Other Product Uses',],
-        },
-    },
-    '241': {
-        "area": ['25,512,819,86'],
-        "cols": ['246,291,331,370,412,454,495,536,577,619,656,699,740,777'],
-        "rows_to_fix": {
-            2: ['2D Non-Energy Products from Fuels and Solvent',
-                '2F Product Uses as Substitutes for Ozone',
-                '2E1 Integrated Circuit or Semiconductor',],
-            -3: ['2F1 Refrigeration and Air Conditioning',
-                 '2G2 SF6 and PFCs from Other Product Uses',],
-        },
-    },
-}
-
-table_defs = {
-    '184': {"template": '184', "entity": "CO2", "unit": "Gg CO2 / yr"}, #CO2
-    '185': {"template": '185', "entity": "CO2", "unit": "Gg CO2 / yr"},
-    '186': {"template": '186', "entity": "CO2", "unit": "Gg CO2 / yr"},
-    '187': {"template": '187', "entity": "CO2", "unit": "Gg CO2 / yr"},
-    '188': {"template": '188', "entity": "CO2", "unit": "Gg CO2 / yr"},
-    '189': {"template": '189', "entity": "CO2", "unit": "Gg CO2 / yr"},
-    '190': {"template": '190', "entity": "CO2", "unit": "Gg CO2 / yr"},
-    '191': {"template": '191', "entity": "CO2", "unit": "Gg CO2 / yr"},
-    '192': {"template": '192', "entity": "CO2", "unit": "Gg CO2 / yr"},
-    '193': {"template": '193', "entity": "CO2", "unit": "Gg CO2 / yr"},
-    '194': {"template": '194', "entity": "CO2", "unit": "Gg CO2 / yr"},
-    '195': {"template": '195', "entity": "CO2", "unit": "Gg CO2 / yr"},
-    '196': {"template": '196', "entity": "CO2", "unit": "Gg CO2 / yr"},
-    '197': {"template": '197', "entity": "CO2", "unit": "Gg CO2 / yr"},
-    '198': {"template": '198', "entity": "CH4", "unit": "Gg CH4 / yr"}, #CH4
-    '199': {"template": '199', "entity": "CH4", "unit": "Gg CH4 / yr"},
-    '200': {"template": '186', "entity": "CH4", "unit": "Gg CH4 / yr"},
-    '201': {"template": '187', "entity": "CH4", "unit": "Gg CH4 / yr"},
-    '202': {"template": '202', "entity": "CH4", "unit": "Gg CH4 / yr"},
-    '203': {"template": '203', "entity": "CH4", "unit": "Gg CH4 / yr"},
-    '204': {"template": '204', "entity": "CH4", "unit": "Gg CH4 / yr"},
-    '205': {"template": '205', "entity": "CH4", "unit": "Gg CH4 / yr"},
-    '206': {"template": '206', "entity": "CH4", "unit": "Gg CH4 / yr"},
-    '207': {"template": '207', "entity": "CH4", "unit": "Gg CH4 / yr"},
-    '208': {"template": '208', "entity": "CH4", "unit": "Gg CH4 / yr"},
-    '209': {"template": '209', "entity": "CH4", "unit": "Gg CH4 / yr"},
-    '210': {"template": '210', "entity": "CH4", "unit": "Gg CH4 / yr"},
-    '211': {"template": '211', "entity": "CH4", "unit": "Gg CH4 / yr"},
-    '212': {"template": '212', "entity": "N2O", "unit": "Gg N2O / yr"}, #N2O
-    '213': {"template": '213', "entity": "N2O", "unit": "Gg N2O / yr"},
-    '214': {"template": '214', "entity": "N2O", "unit": "Gg N2O / yr"},
-    '215': {"template": '215', "entity": "N2O", "unit": "Gg N2O / yr"},
-    '216': {"template": '216', "entity": "N2O", "unit": "Gg N2O / yr"},
-    '217': {"template": '217', "entity": "N2O", "unit": "Gg N2O / yr"},
-    '218': {"template": '218', "entity": "N2O", "unit": "Gg N2O / yr"},
-    '219': {"template": '219', "entity": "N2O", "unit": "Gg N2O / yr"},
-    '220': {"template": '206', "entity": "N2O", "unit": "Gg N2O / yr"},
-    '221': {"template": '207', "entity": "N2O", "unit": "Gg N2O / yr"},
-    '222': {"template": '208', "entity": "N2O", "unit": "Gg N2O / yr"},
-    '223': {"template": '209', "entity": "N2O", "unit": "Gg N2O / yr"},
-    '224': {"template": '210', "entity": "N2O", "unit": "Gg N2O / yr"},
-    '225': {"template": '211', "entity": "N2O", "unit": "Gg N2O / yr"},
-    '226': {"template": '226', "entity": "HFCS (AR4GWP100)", "unit": "Gg CO2 / yr"}, #HFCs
-    '227': {"template": '227', "entity": "HFCS (AR4GWP100)", "unit": "Gg CO2 / yr"},
-    '228': {"template": '228', "entity": "HFCS (AR4GWP100)", "unit": "Gg CO2 / yr"},
-    '229': {"template": '229', "entity": "HFCS (AR4GWP100)", "unit": "Gg CO2 / yr"},
-    '230': {"template": '230', "entity": "PFCS (AR4GWP100)", "unit": "Gg CO2 / yr"}, #PFCs
-    '231': {"template": '227', "entity": "PFCS (AR4GWP100)", "unit": "Gg CO2 / yr"},
-    '232': {"template": '232', "entity": "PFCS (AR4GWP100)", "unit": "Gg CO2 / yr"},
-    '233': {"template": '233', "entity": "PFCS (AR4GWP100)", "unit": "Gg CO2 / yr"},
-    '234': {"template": '226', "entity": "SF6 (AR4GWP100)", "unit": "Gg CO2 / yr"}, #SF6
-    '235': {"template": '227', "entity": "SF6 (AR4GWP100)", "unit": "Gg CO2 / yr"},
-    '236': {"template": '232', "entity": "SF6 (AR4GWP100)", "unit": "Gg CO2 / yr"},
-    '237': {"template": '237', "entity": "SF6 (AR4GWP100)", "unit": "Gg CO2 / yr"},
-    '238': {"template": '226', "entity": "NF3 (AR4GWP100)", "unit": "Gg CO2 / yr"}, #NF3
-    '239': {"template": '227', "entity": "NF3 (AR4GWP100)", "unit": "Gg CO2 / yr"},
-    '240': {"template": '240', "entity": "NF3 (AR4GWP100)", "unit": "Gg CO2 / yr"},
-    '241': {"template": '241', "entity": "NF3 (AR4GWP100)", "unit": "Gg CO2 / yr"},
-}
-
-country_processing_step1 = {
-    'aggregate_cats': {
-        'M.3.C.AG': {'sources': ['3.C.1', '3.C.2', '3.C.3', '3.C.4', '3.C.5',
-                                 '3.C.6', '3.C.7', '3.C.8'],
-                     'name': 'Aggregate sources and non-CO2 emissions sources on land '
-                             '(Agriculture)'},
-        'M.3.D.AG': {'sources': ['3.D.2'],
-                     'name': 'Other (Agriculture)'},
-        'M.AG.ELV': {'sources': ['M.3.C.AG', 'M.3.D.AG'],
-                     'name': 'Agriculture excluding livestock'},
-        'M.AG': {'sources': ['3.A', 'M.AG.ELV'],
-                     'name': 'Agriculture'},
-        'M.3.D.LU': {'sources': ['3.D.1'],
-                     'name': 'Other (LULUCF)'},
-        'M.LULUCF': {'sources': ['3.B', 'M.3.D.LU'],
-                     'name': 'LULUCF'},
-        'M.0.EL': {'sources': ['1', '2', 'M.AG', '4', '5'],
-                     'name': 'National total emissions excluding LULUCF'},
-    },
-    'basket_copy': {
-        'GWPs_to_add': ["SARGWP100", "AR5GWP100", "AR6GWP100"],
-        'entities': ["HFCS", "PFCS"],
-        'source_GWP': gwp_to_use,
-    },
-}
-
-gas_baskets = {
-    'FGASES (SARGWP100)': ['HFCS (SARGWP100)', 'PFCS (SARGWP100)', 'SF6', 'NF3'],
-    'FGASES (AR4GWP100)': ['HFCS (AR4GWP100)', 'PFCS (AR4GWP100)', 'SF6', 'NF3'],
-    'FGASES (AR5GWP100)':['HFCS (AR5GWP100)', 'PFCS (AR5GWP100)', 'SF6', 'NF3'],
-    'FGASES (AR6GWP100)':['HFCS (AR6GWP100)', 'PFCS (AR6GWP100)', 'SF6', 'NF3'],
-    'KYOTOGHG (SARGWP100)': ['CO2', 'CH4', 'N2O', 'FGASES (SARGWP100)'],
-    'KYOTOGHG (AR4GWP100)': ['CO2', 'CH4', 'N2O', 'FGASES (AR4GWP100)'],
-    'KYOTOGHG (AR5GWP100)': ['CO2', 'CH4', 'N2O', 'FGASES (AR5GWP100)'],
-    'KYOTOGHG (AR6GWP100)': ['CO2', 'CH4', 'N2O', 'FGASES (AR6GWP100)'],
-}
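
Review note: each `table_defs` entry points a PDF page at a layout template in `table_def_templates`, so several pages can share one area and column definition. Below is a minimal sketch of a consistency check, assuming both dicts have the shape shown in the config above. The helper name `check_table_defs` and the trimmed sample data (with placeholder area values) are illustrative and not part of the repository.

```python
def check_table_defs(table_defs, table_def_templates):
    """Return (page, reason) pairs for pages that cannot be read."""
    problems = []
    for page, definition in table_defs.items():
        template = table_def_templates.get(definition["template"])
        if template is None:
            problems.append((page, "unknown template"))
        elif "area" not in template:
            problems.append((page, "template has no area"))
    return problems


# Trimmed sample in the shape of the config; area/cols values are placeholders.
sample_templates = {
    "226": {"area": ["0,100,100,0"]},
    "227": {"area": ["0,100,100,0"], "cols": ["10,20,30"]},
}
sample_defs = {
    "226": {"template": "226", "entity": "HFCS (AR4GWP100)", "unit": "Gg CO2 / yr"},
    "231": {"template": "227", "entity": "PFCS (AR4GWP100)", "unit": "Gg CO2 / yr"},
    "232": {"template": "232", "entity": "PFCS (AR4GWP100)", "unit": "Gg CO2 / yr"},
}

problems = check_table_defs(sample_defs, sample_templates)
print(problems)
```

Running a check like this before the page loop would surface a mistyped template key as a readable message rather than a `KeyError` partway through the PDF.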

+ 0 - 402
UNFCCC_GHG_data/UNFCCC_reader/Malaysia/config_MYS_BUR4.py

@@ -1,402 +0,0 @@
-import pandas as pd
-gwp_to_use = "AR4GWP100"
-
-
-cat_names_fix = {
-    #'2A3 Glass Prod.': '2A3 Glass Production',
-    #'2F6 Other Applications': '2F6 Other Applications (please specify)',
-    #'3A2 Manure Mngmt': '3A2 Manure Mngmt.',
-    #'3C7 Rice Cultivations': '3C7 Rice Cultivation',
-}
-
-values_replacement = {
-    '': '-',
-    ' ': '-',
-}
-
-cols_for_space_stripping = ["Categories"]
-
-index_cols = ["Categories", "entity", "unit"]
-
-# parameters part 2: conversion to interchange format
-cats_remove = ['Memo items', 'Information items',  'Information items (1)']
-
-cat_codes_manual = {
-    'Annual change in long-term storage of carbon in HWP waste': 'M.LTS.AC.HWP',
-    'Annual change in total long-term storage of carbon stored': 'M.LTS.AC.TOT',
-    'CO2 captured': 'M.CCS',
-    'CO2 from Biomass Burning for Energy Production': 'M.BIO',
-    'For domestic storage': 'M.CCS.DOM',
-    'For storage in other countries': 'M.CCS.OCT',
-    'International Aviation (International Bunkers)': 'M.BK.A',
-    'International Bunkers': 'M.BK',
-    'International Water-borne Transport (International Bunkers)': 'M.BK.M',
-    'Long-term storage of carbon in waste disposal sites': 'M.LTS.WASTE',
-    'Multilateral Operations': 'M.MULTIOP',
-    'Other (please specify)': 'M.OTHER',
-    'Total National Emissions and Removals': '0',
-}
-
-cat_code_regexp = r'(?P<code>^[A-Z0-9]{1,4})\s.*'
-
-
-coords_terminologies = {
-    "area": "ISO3",
-    "category": "IPCC2006_PRIMAP",
-    "scenario": "PRIMAP",
-}
-
-coords_defaults = {
-    "source": "MYS-GHG-inventory",
-    "provenance": "measured",
-    "area": "MYS",
-    "scenario": "BUR4"
-}
-
-coords_value_mapping = {
-}
-
-coords_cols = {
-    "category": "Categories",
-    "entity": "entity",
-    "unit": "unit"
-}
-
-add_coords_cols = {
-    "orig_cat_name": ["orig_cat_name", "category"],
-}
-
-#filter_remove = {
-#    "f1": {
-#        "entity": ["CO2(grossemissions)", "CO2(removals)"],
-#    },
-#}
-
-meta_data = {
-    "references": "https://unfccc.int/documents/624776",
-    "rights": "",
-    "contact": "mail@johannes-guetschow.de",
-    "title": "Malaysia - Fourth Biennial Update Report under the UNFCCC",
-    "comment": "Read from pdf file by Johannes Gütschow",
-    "institution": "United Nations Framework Convention on Climate Change (UNFCCC)",
-}
-
-terminology_proc = coords_terminologies["category"]
-
-table_def_templates = {
-    # CO2
-    '203': {  # 203, 249
-        "area": ['70,480,768,169'],
-    },
-    '204': {  # 204
-        "area": ['70,500,763,141'],
-    },
-    '205': {  # 205, 209, 214, 218
-        "area": ['70,495,763,95'],
-        "rows_to_fix": {
-            2: ['5A Indirect N2O emissions from the atmospheric deposition of'],
-        },
-    },
-    '206': {  # 206, 210
-        "area": ['70,495,763,353'],
-    },
-    '207': {  # 207, 208, 211, 212, 213, 215, 217, 223, 227, 231,
-        # 235, 239, 251, 257, 259, 263, 265
-        "area": ['70,495,763,95'],
-    },
-    '216': {  #  216
-        "area": ['70,500,763,95'],
-    },
-    # CH4
-    '219': {  # 219, 255
-        "area": ['70,480,768,100'],
-    },
-    '220': {  # 220, 224, 228
-        "area": ['70,495,763,95'],
-        "rows_to_fix": {
-            3: ['2F Product Uses as Substitutes for Ozone Depleting'],
-        },
-    },
-    '221': {  # 221
-        "area": ['92,508,748,92'],
-        "cols": ['298,340,380,422,462,502,542,582,622,662,702'],
-        "rows_to_fix": {
-            3: ['3C Aggregate sources and Non-CO2 emissions'],
-            2: ['5A Indirect N2O emissions from the atmospheric'],
-        },
-    },
-    '222': {  # 222
-        "area": ['70,495,763,323'],
-        "rows_to_fix": {
-            2: ['Annual change in long-term storage of carbon in HWP'],
-        },
-    },
-    '225': {  # 225
-        "area": ['92,508,748,92'],
-        "cols": ['311,357,400,443,486,529,572,615,658,701'],
-        "rows_to_fix": {
-            3: ['3C Aggregate sources and Non-CO2 emissions'],
-        },
-    },
-    '226': {  # 226, 230
-        "area": ['70,495,763,95'],
-        "rows_to_fix": {
-            2: ['5A Indirect N2O emissions from the atmospheric',
-                'Annual change in long-term storage of carbon in HWP'],
-        },
-    },
-    '229': {  # 229
-        "area": ['114,508,725,92'],
-        "cols": ['333,379,421,464,506,548,590,632,674'],
-        "rows_to_fix": {
-            3: ['3C Aggregate sources and Non-CO2 emissions'],
-        },
-    },
-    # N2O
-    '232': {  # 232
-        "area": ['70,495,763,95'],
-        "cols": ['315,366,416,466,516,566,616,666,716'],
-        "rows_to_fix": {
-            3: ['2F Product Uses as Substitutes for Ozone Depleting'],
-        },
-    },
-    '233': {  # 233, 237, 241
-        "area": ['70,495,763,95'],
-        "rows_to_fix": {
-            3: ['3C Aggregate sources and Non-CO2 emissions'],
-        },
-    },
-    '234': {  # 234, 238, 242
-        "area": ['70,495,763,95'],
-        "rows_to_fix": {
-            3: ['International Water-borne Transport (International'],
-        },
-    },
-    '236': {  # 236
-        "area": ['70,495,763,95'],
-        "cols": ['298,344,392,439,487,534,580,629,675,721'],
-        "rows_to_fix": {
-            3: ['2F Product Uses as Substitutes for Ozone Depleting'],
-        },
-    },
-    '240': {  # 240
-        "area": ['70,495,763,95'],
-        "cols": ['283,329,372,416,459,504,550,594,639,682,726'],
-        "rows_to_fix": {
-            3: ['2F Product Uses as Substitutes for Ozone Depleting'],
-        },
-    },
-    # HFCs
-    '243': {  # 243
-        "area": ['70,480,763,95'],
-        "cols": ['408,449,489,527,567,604,644,681,721'],
-    },
-    '244': {  # 244
-        "area": ['70,495,763,95'],
-        "cols": ['408,449,489,527,567,604,644,681,721'],
-    },
-    '245': {  # 245, 246
-        "area": ['70,495,763,95'],
-        "cols": ['405,442,478,515,550,587,621,657,693,729'],
-    },
-    '247': {  # 247, 248
-        "area": ['70,495,763,95'],
-        "cols": ['384,426,459,493,531,564,597,633,666,700,735'],
-    },
-    # PFCs
-    '250': {  # 250
-        "area": ['70,495,763,95'],
-        "cols": ['341,389,436,485,531,579,626,674,723'],
-    },
-    '252': {  # 252
-        "area": ['70,495,763,95'],
-        "cols": ['323,370,415,459,504,547,590,636,680,726'],
-    },
-    '253': {  # 253
-        "area": ['70,495,763,95'],
-        "cols": ['334,378,419,464,511,554,597,636,668,702,735'],
-    },
-    '254': {  # 254
-        "area": ['70,495,763,95'],
-        "cols": ['330,378,419,464,511,554,597,636,668,702,735'],
-        "rows_to_fix": {
-            -3: ['2F Product Uses as Substitutes for Ozone Depleting Substances'],
-        },
-    },
-    # SF6
-    '256': {  # 256
-        "area": ['70,495,763,95'],
-        "cols": ['382,420,462,504,546,588,630,672,714'],
-        "rows_to_fix": {
-            3: ['2F Product Uses as Substitutes for Ozone Depleting'],
-        },
-    },
-    '258': {  # 258
-        "area": ['70,495,763,95'],
-        "cols": ['363,399,441,481,522,564,606,646,688,728'],
-        "rows_to_fix": {
-            3: ['2F Product Uses as Substitutes for Ozone Depleting'],
-        },
-    },
-    '260': {  # 260
-        "area": ['70,495,763,95'],
-        "cols": ['346,381,419,458,498,536,576,614,652,692,732'],
-        "rows_to_fix": {
-            3: ['2F Product Uses as Substitutes for Ozone Depleting'],
-        },
-    },
-    # NF3
-    '261': {  # 261
-        "area": ['70,490,768,100'],
-        "cols": ['364,412,454,496,538,581,623,667,710'],
-    },
-    '262': {  # 262
-        "area": ['70,495,763,95'],
-        "cols": ['376,420,462,504,545,591,633,676,718'],
-        "rows_to_fix": {
-            3: ['2F Product Uses as Substitutes for Ozone Depleting'],
-        },
-    },
-    '264': {  # 264
-        "area": ['70,495,763,95'],
-        "cols": ['370,415,451,491,530,569,609,651,689,729'],
-        "rows_to_fix": {
-            3: ['2F Product Uses as Substitutes for Ozone Depleting'],
-        },
-    },
-    '266': {  # 266
-        "area": ['70,495,763,95'],
-        "cols": ['355,392,430,467,505,544,580,619,656,695,732'],
-        "rows_to_fix": {
-            3: ['2F Product Uses as Substitutes for Ozone Depleting'],
-        },
-    },
-}
-
-table_defs = {
-    '203': {"template": '203', "entity": "CO2", "unit": "Gg CO2 / yr"},  # CO2
-    '204': {"template": '204', "entity": "CO2", "unit": "Gg CO2 / yr"},
-    '205': {"template": '205', "entity": "CO2", "unit": "Gg CO2 / yr"},
-    '206': {"template": '206', "entity": "CO2", "unit": "Gg CO2 / yr"},
-    '207': {"template": '207', "entity": "CO2", "unit": "Gg CO2 / yr"},
-    '208': {"template": '207', "entity": "CO2", "unit": "Gg CO2 / yr"},
-    '209': {"template": '205', "entity": "CO2", "unit": "Gg CO2 / yr"},
-    '210': {"template": '206', "entity": "CO2", "unit": "Gg CO2 / yr"},
-    '211': {"template": '207', "entity": "CO2", "unit": "Gg CO2 / yr"},
-    '212': {"template": '207', "entity": "CO2", "unit": "Gg CO2 / yr"},
-    '213': {"template": '207', "entity": "CO2", "unit": "Gg CO2 / yr"},
-    '214': {"template": '205', "entity": "CO2", "unit": "Gg CO2 / yr"},
-    '215': {"template": '207', "entity": "CO2", "unit": "Gg CO2 / yr"},
-    '216': {"template": '216', "entity": "CO2", "unit": "Gg CO2 / yr"},
-    '217': {"template": '207', "entity": "CO2", "unit": "Gg CO2 / yr"},
-    '218': {"template": '205', "entity": "CO2", "unit": "Gg CO2 / yr"},
-    '219': {"template": '219', "entity": "CH4", "unit": "Gg CH4 / yr"},  # CH4
-    '220': {"template": '220', "entity": "CH4", "unit": "Gg CH4 / yr"},
-    '221': {"template": '221', "entity": "CH4", "unit": "Gg CH4 / yr"},
-    '222': {"template": '222', "entity": "CH4", "unit": "Gg CH4 / yr"},
-    '223': {"template": '207', "entity": "CH4", "unit": "Gg CH4 / yr"},
-    '224': {"template": '220', "entity": "CH4", "unit": "Gg CH4 / yr"},
-    '225': {"template": '225', "entity": "CH4", "unit": "Gg CH4 / yr"},
-    '226': {"template": '226', "entity": "CH4", "unit": "Gg CH4 / yr"},
-    '227': {"template": '207', "entity": "CH4", "unit": "Gg CH4 / yr"},
-    '228': {"template": '220', "entity": "CH4", "unit": "Gg CH4 / yr"},
-    '229': {"template": '229', "entity": "CH4", "unit": "Gg CH4 / yr"},
-    '230': {"template": '226', "entity": "CH4", "unit": "Gg CH4 / yr"},
-    '231': {"template": '207', "entity": "N2O", "unit": "Gg N2O / yr"},  # N2O
-    '232': {"template": '232', "entity": "N2O", "unit": "Gg N2O / yr"},
-    '233': {"template": '233', "entity": "N2O", "unit": "Gg N2O / yr"},
-    '234': {"template": '234', "entity": "N2O", "unit": "Gg N2O / yr"},
-    '235': {"template": '207', "entity": "N2O", "unit": "Gg N2O / yr"},
-    '236': {"template": '236', "entity": "N2O", "unit": "Gg N2O / yr"},
-    '237': {"template": '233', "entity": "N2O", "unit": "Gg N2O / yr"},
-    '238': {"template": '234', "entity": "N2O", "unit": "Gg N2O / yr"},
-    '239': {"template": '207', "entity": "N2O", "unit": "Gg N2O / yr"},
-    '240': {"template": '240', "entity": "N2O", "unit": "Gg N2O / yr"},
-    '241': {"template": '233', "entity": "N2O", "unit": "Gg N2O / yr"},
-    '242': {"template": '234', "entity": "N2O", "unit": "Gg N2O / yr"},
-    '243': {"template": '243', "entity": f"HFCS ({gwp_to_use})",
-            "unit": "Gg CO2 / yr"},  # HFCs
-    '244': {"template": '244', "entity": f"HFCS ({gwp_to_use})",
-            "unit": "Gg CO2 / yr"},
-    '245': {"template": '245', "entity": f"HFCS ({gwp_to_use})",
-            "unit": "Gg CO2 / yr"},
-    '246': {"template": '245', "entity": f"HFCS ({gwp_to_use})",
-            "unit": "Gg CO2 / yr"},
-    '247': {"template": '247', "entity": f"HFCS ({gwp_to_use})",
-            "unit": "Gg CO2 / yr"},
-    '248': {"template": '247', "entity": f"HFCS ({gwp_to_use})",
-            "unit": "Gg CO2 / yr"},
-    '249': {"template": '203', "entity": f"PFCS ({gwp_to_use})",
-            "unit": "Gg CO2 / yr"},  # PFCs
-    '250': {"template": '250', "entity": f"PFCS ({gwp_to_use})",
-            "unit": "Gg CO2 / yr"},
-    '251': {"template": '207', "entity": f"PFCS ({gwp_to_use})",
-            "unit": "Gg CO2 / yr"},
-    '252': {"template": '252', "entity": f"PFCS ({gwp_to_use})",
-            "unit": "Gg CO2 / yr"},
-    '253': {"template": '253', "entity": f"PFCS ({gwp_to_use})",
-            "unit": "Gg CO2 / yr"},
-    '254': {"template": '254', "entity": f"PFCS ({gwp_to_use})",
-            "unit": "Gg CO2 / yr"},
-    '255': {"template": '219', "entity": f"SF6 ({gwp_to_use})",
-            "unit": "Gg CO2 / yr"},  # SF6
-    '256': {"template": '256', "entity": f"SF6 ({gwp_to_use})",
-            "unit": "Gg CO2 / yr"},
-    '257': {"template": '207', "entity": f"SF6 ({gwp_to_use})",
-            "unit": "Gg CO2 / yr"},
-    '258': {"template": '258', "entity": f"SF6 ({gwp_to_use})",
-            "unit": "Gg CO2 / yr"},
-    '259': {"template": '207', "entity": f"SF6 ({gwp_to_use})",
-            "unit": "Gg CO2 / yr"},
-    '260': {"template": '260', "entity": f"SF6 ({gwp_to_use})",
-            "unit": "Gg CO2 / yr"},
-    '261': {"template": '261', "entity": f"NF3 ({gwp_to_use})",
-            "unit": "Gg CO2 / yr"},  # NF3
-    '262': {"template": '262', "entity": f"NF3 ({gwp_to_use})",
-            "unit": "Gg CO2 / yr"},
-    '263': {"template": '207', "entity": f"NF3 ({gwp_to_use})",
-            "unit": "Gg CO2 / yr"},
-    '264': {"template": '264', "entity": f"NF3 ({gwp_to_use})",
-            "unit": "Gg CO2 / yr"},
-    '265': {"template": '207', "entity": f"NF3 ({gwp_to_use})",
-            "unit": "Gg CO2 / yr"},
-    '266': {"template": '266', "entity": f"NF3 ({gwp_to_use})",
-            "unit": "Gg CO2 / yr"},
-}
-
-country_processing_step1 = {
-    'aggregate_cats': {
-        'M.3.C.AG': {'sources': ['3.C.1', '3.C.2', '3.C.3', '3.C.4', '3.C.5',
-                                 '3.C.6', '3.C.7', '3.C.8'],
-                     'name': 'Aggregate sources and non-CO2 emissions sources on land '
-                             '(Agriculture)'},
-        'M.3.D.AG': {'sources': ['3.D.2'],
-                     'name': 'Other (Agriculture)'},
-        'M.AG.ELV': {'sources': ['M.3.C.AG', 'M.3.D.AG'],
-                     'name': 'Agriculture excluding livestock'},
-        'M.AG': {'sources': ['3.A', 'M.AG.ELV'],
-                     'name': 'Agriculture'},
-        'M.3.D.LU': {'sources': ['3.D.1'],
-                     'name': 'Other (LULUCF)'},
-        'M.LULUCF': {'sources': ['3.B', 'M.3.D.LU'],
-                     'name': 'LULUCF'},
-        'M.0.EL': {'sources': ['1', '2', 'M.AG', '4', '5'],
-                     'name': 'National total emissions excluding LULUCF'},
-    },
-    'basket_copy': {
-        'GWPs_to_add': ["SARGWP100", "AR5GWP100", "AR6GWP100"],
-        'entities': ["HFCS", "PFCS"],
-        'source_GWP': gwp_to_use,
-    },
-}
-
-gas_baskets = {
-    'FGASES (SARGWP100)': ['HFCS (SARGWP100)', 'PFCS (SARGWP100)', 'SF6', 'NF3'],
-    'FGASES (AR4GWP100)': ['HFCS (AR4GWP100)', 'PFCS (AR4GWP100)', 'SF6', 'NF3'],
-    'FGASES (AR5GWP100)':['HFCS (AR5GWP100)', 'PFCS (AR5GWP100)', 'SF6', 'NF3'],
-    'FGASES (AR6GWP100)':['HFCS (AR6GWP100)', 'PFCS (AR6GWP100)', 'SF6', 'NF3'],
-    'KYOTOGHG (SARGWP100)': ['CO2', 'CH4', 'N2O', 'FGASES (SARGWP100)'],
-    'KYOTOGHG (AR4GWP100)': ['CO2', 'CH4', 'N2O', 'FGASES (AR4GWP100)'],
-    'KYOTOGHG (AR5GWP100)': ['CO2', 'CH4', 'N2O', 'FGASES (AR5GWP100)'],
-    'KYOTOGHG (AR6GWP100)': ['CO2', 'CH4', 'N2O', 'FGASES (AR6GWP100)'],
-}
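
Review note: the config combines `cat_codes_manual` with `cat_code_regexp`, and the order in which the reader scripts apply them matters. A minimal sketch using only the standard library follows. `dotted_code` is a hypothetical stand-in for the primap2 converter that the scripts call with an `'IPC'` prefix, and `to_category_code` is an illustrative helper, not repository code.

```python
import re

cat_code_regexp = r'(?P<code>^[A-Z0-9]{1,4})\s.*'
# Trimmed copy of the manual mapping from the config.
cat_codes_manual = {"CO2 captured": "M.CCS", "International Bunkers": "M.BK"}


def dotted_code(code):
    # Hypothetical stand-in for the primap2 converter: '2F1' -> '2.F.1'.
    return ".".join(code)


def to_category_code(name):
    # Manual mapping first, so 'CO2 captured' is not read as code 'CO2'.
    if name in cat_codes_manual:
        return cat_codes_manual[name]
    # Names that do not start with a 1-4 character code are left unchanged.
    return re.sub(cat_code_regexp, lambda m: dotted_code(m.group("code")), name)


codes = [to_category_code(n) for n in (
    "2F1 Refrigeration and Air Conditioning",
    "CO2 captured",
    "International Bunkers",
    "Multilateral Operations",
)]
print(codes)
```

Names that match neither the manual mapping nor the regular expression pass through unchanged, so a category missing from `cat_codes_manual` would reach the conversion step under its original name.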

+ 0 - 211
UNFCCC_GHG_data/UNFCCC_reader/Malaysia/read_MYS_BUR3_from_pdf.py

@@ -1,211 +0,0 @@
-# this script reads data from Malaysia's BUR3
-
-import camelot
-import primap2 as pm2
-from primap2.pm2io._conversion import convert_ipcc_code_primap_to_primap2
-
-from UNFCCC_GHG_data.helper import process_data_for_country, fix_rows
-from UNFCCC_GHG_data.helper import downloaded_data_path, extracted_data_path
-from config_MYS_BUR3 import coords_cols, coords_defaults, coords_terminologies, \
-    meta_data, add_coords_cols
-from config_MYS_BUR3 import gas_baskets, terminology_proc, country_processing_step1
-from config_MYS_BUR3 import table_def_templates, table_defs, index_cols
-from config_MYS_BUR3 import values_replacement, cat_names_fix, cols_for_space_stripping
-from config_MYS_BUR3 import cat_codes_manual, cats_remove, cat_code_regexp
-
-# ###
-# configuration
-# ###
-input_folder = downloaded_data_path / 'UNFCCC' / 'Malaysia' / 'BUR3'
-output_folder = extracted_data_path / 'UNFCCC' / 'Malaysia'
-if not output_folder.exists():
-    output_folder.mkdir()
-
-pdf_file = "MALAYSIA_BUR3-UNFCCC_Submission.pdf"
-pdf_pages = range(184, 242)
-# CH4: 198 - 211
-# N2O: 212 - 225
-# HFCS: 226 - 229
-# PFCs: 230 - 233
-# SF6: 234 - 237
-# NF3: 238 - 241
-
-output_filename = 'MYS_BUR3_2020_'
-compression = dict(zlib=True, complevel=9)
-
-# ###
-# reading data and aggregation into one dataframe
-# ###
-df_all = None
-for page in pdf_pages:
-    print(f"++++++++++++++++++++++++++++++++")
-    print(f"+++++ Working on page {page} ++++++")
-    print(f"++++++++++++++++++++++++++++++++")
-    page_template_nr = table_defs[str(page)]["template"]
-    area = table_def_templates[page_template_nr]["area"]
-    if "cols" in table_def_templates[page_template_nr].keys():
-        cols = table_def_templates[page_template_nr]["cols"]
-        tables = camelot.read_pdf(str(input_folder / pdf_file), pages=str(page), \
-                                  flavor='stream', table_areas=area, columns=cols,
-                                  split_text=True)
-    else:
-        tables = camelot.read_pdf(str(input_folder / pdf_file), pages=str(page), \
-                                  flavor='stream', table_areas=area)
-
-    df_current = tables[0].df.copy()
-    df_current.iloc[0,0] = 'Categories'
-    df_current.columns = df_current.iloc[0]
-    df_current = df_current.drop(0)
-    # replace line breaks with spaces
-    df_current[index_cols[0]] = \
-        df_current[index_cols[0]].str.replace("\n", " ")
-    # replace double and triple spaces
-    df_current[index_cols[0]] = \
-        df_current[index_cols[0]].str.replace("   ", " ")
-    df_current[index_cols[0]] = \
-        df_current[index_cols[0]].str.replace("  ", " ")
-
-    # fix the split rows
-    if "rows_to_fix" in table_def_templates[page_template_nr].keys():
-        for n_rows in table_def_templates[page_template_nr]["rows_to_fix"].keys():
-            df_current = fix_rows(df_current,
-                                  table_def_templates[page_template_nr]["rows_to_fix"][
-                                      n_rows], index_cols[0], n_rows)
-
-    # replace category names with typos
-    df_current[index_cols[0]] = \
-        df_current[index_cols[0]].replace(cat_names_fix)
-
-    # replace empty strings
-    df_current = df_current.replace(values_replacement)
-
-    # add entity and unit information
-    df_current.insert(1, "unit", table_defs[str(page)]["unit"])
-    df_current.insert(1, "entity", table_defs[str(page)]["entity"])
-
-    # set index
-    # df_current = df_current.set_index(index_cols)
-    # strip trailing and leading spaces
-    for col in cols_for_space_stripping:
-        df_current[col] = df_current[col].str.strip()
-
-    # print(df_current.columns.values)
-
-    # aggregate dfs
-    if df_all is None:
-        df_all = df_current
-    else:
-        # find intersecting cols
-        cols_all = df_all.columns.values
-        cols_current = df_current.columns.values
-        cols_both = list(set(cols_all).intersection(set(cols_current)))
-        # print(cols_both)
-        if len(cols_both) > 0:
-            df_all = df_all.merge(df_current, how='outer', on=cols_both,
-                                  suffixes=(None, None))
-        else:
-            df_all = df_all.merge(df_current, how='outer', suffixes=(None, None))
-        df_all = df_all.groupby(index_cols).first().reset_index()
-        # df_all = df_all.join(df_current, how='outer')
-
-# ###
-# conversion to primap2 interchange format
-# ###
-# drop the rows with memo items etc
-for cat in cats_remove:
-    df_all = df_all.drop(df_all[df_all["Categories"] == cat].index)
-# make a copy of the categories column
-df_all["orig_cat_name"] = df_all["Categories"]
-
-# replace cat names by codes in col "Categories"
-# first the manual replacements
-df_all["Categories"] = df_all["Categories"].replace(cat_codes_manual)
-# then the regex replacements
-repl = lambda m: convert_ipcc_code_primap_to_primap2('IPC' + m.group('code'))
-df_all["Categories"] = df_all["Categories"].str.replace(cat_code_regexp, repl, regex=True)
-
-# make sure all col headers are str
-df_all.columns = df_all.columns.map(str)
-
-# remove thousands separators as pd.to_numeric can't deal with that
-# also replace None with NaN
-year_cols = list(set(df_all.columns) - set(['Categories', 'entity', 'unit', 'orig_cat_name']))
-for col in year_cols:
-    df_all.loc[:, col] = df_all.loc[:, col].str.strip()
-    repl = lambda m: m.group('part1') + m.group('part2')
-    df_all.loc[:, col] = df_all.loc[:, col].str.replace(r'(?P<part1>[0-9]+),(?P<part2>[0-9\.]+)$', repl, regex=True)
-    df_all.loc[df_all[col].isnull(), col] = 'NaN'
-    # manually map code NENO to nan
-    df_all.loc[:, col] = df_all.loc[:, col].str.replace('NENO','NaN')
-    df_all.loc[:, col] = df_all.loc[:, col].str.replace('O NANaN','NaN')
-    df_all.loc[:, col] = df_all.loc[:, col].str.replace('IE NO','0')
-    df_all.loc[:, col] = df_all.loc[:, col].str.replace('IE NA NO I','0')
-    # TODO: add code to PRIMAP2
-
-# drop orig_cat_name as it's non-unique per category
-df_all = df_all.drop(columns=["orig_cat_name"])
-
-data_if = pm2.pm2io.convert_wide_dataframe_if(
-    df_all,
-    coords_cols=coords_cols,
-    #add_coords_cols=add_coords_cols,
-    coords_defaults=coords_defaults,
-    coords_terminologies=coords_terminologies,
-    #coords_value_mapping=coords_value_mapping,
-    #coords_value_filling=coords_value_filling,
-    #filter_remove=filter_remove,
-    #filter_keep=filter_keep,
-    meta_data=meta_data,
-    convert_str=True,
-    time_format="%Y",
-    )
-
-data_pm2 = pm2.pm2io.from_interchange_format(data_if)
-
-data_if = data_pm2.pr.to_interchange_format()
-
-# ###
-# save raw data to IF and native format
-# ###
-if not output_folder.exists():
-    output_folder.mkdir()
-pm2.pm2io.write_interchange_format(
-    output_folder / (output_filename + coords_terminologies["category"] + "_raw"),
-    data_if)
-
-encoding = {var: compression for var in data_pm2.data_vars}
-data_pm2.pr.to_netcdf(
-    output_folder / (output_filename + coords_terminologies["category"] + "_raw.nc"),
-    encoding=encoding)
-
-# ###
-# ## process the data
-# ###
-data_proc_pm2 = data_pm2
-
-# actual processing
-data_proc_pm2 = process_data_for_country(
-    data_proc_pm2,
-    gas_baskets=gas_baskets,
-    entities_to_ignore=[],
-    processing_info_country=country_processing_step1,
-)
-
-# adapt source and metadata
-current_source = data_proc_pm2.coords["source"].values[0]
-data_temp = data_proc_pm2.pr.loc[{"source": current_source}]
-data_proc_pm2 = data_proc_pm2.pr.set("source", 'BUR_NIR', data_temp)
-
-# ###
-# save data to IF and native format
-# ###
-data_proc_if = data_proc_pm2.pr.to_interchange_format()
-if not output_folder.exists():
-    output_folder.mkdir()
-pm2.pm2io.write_interchange_format(
-    output_folder / (output_filename + terminology_proc), data_proc_if)
-
-encoding = {var: compression for var in data_proc_pm2.data_vars}
-data_proc_pm2.pr.to_netcdf(
-    output_folder / (output_filename + terminology_proc + ".nc"),
-    encoding=encoding)
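
Review note: the thousands-separator cleanup in the script above is anchored to the end of the cell with `$`, so one call removes only the last comma. A minimal standard-library sketch of that behaviour follows. `strip_all_separators` is an illustrative variant that is not in the script, and whether the BUR tables contain values with more than one separator is not established here.

```python
import re

# Pattern copied from the script; `$` anchors it to the end of the cell.
pattern = re.compile(r'(?P<part1>[0-9]+),(?P<part2>[0-9\.]+)$')


def strip_last_separator(value):
    # Same replacement as the script: join the digits around the last comma.
    return pattern.sub(lambda m: m.group('part1') + m.group('part2'), value)


def strip_all_separators(value):
    # Illustrative variant (not in the script): drop every comma between digits.
    return re.sub(r'(?<=[0-9]),(?=[0-9])', '', value)


for cell in ("1,234.56", "1,234,567.8", "NO"):
    print(cell, strip_last_separator(cell), strip_all_separators(cell))
```

The script's pattern is enough for values below one million. The lookaround variant is one possible alternative if larger values ever appear.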

+ 0 - 214
UNFCCC_GHG_data/UNFCCC_reader/Malaysia/read_MYS_BUR4_from_pdf.py

@@ -1,214 +0,0 @@
-# this script reads data from Malaysia's BUR4
-# code is mostly identical to BUR3
-
-
-import camelot
-import primap2 as pm2
-from primap2.pm2io._conversion import convert_ipcc_code_primap_to_primap2
-
-from UNFCCC_GHG_data.helper import process_data_for_country, fix_rows
-from UNFCCC_GHG_data.helper import downloaded_data_path, extracted_data_path
-from config_MYS_BUR4 import coords_cols, coords_defaults, coords_terminologies, \
-    meta_data, add_coords_cols
-from config_MYS_BUR4 import gas_baskets, terminology_proc, country_processing_step1
-from config_MYS_BUR4 import table_def_templates, table_defs, index_cols
-from config_MYS_BUR4 import values_replacement, cat_names_fix, cols_for_space_stripping
-from config_MYS_BUR4 import cat_codes_manual, cats_remove, cat_code_regexp
-
-# ###
-# configuration
-# ###
-input_folder = downloaded_data_path / 'UNFCCC' / 'Malaysia' / 'BUR4'
-output_folder = extracted_data_path / 'UNFCCC' / 'Malaysia'
-if not output_folder.exists():
-    output_folder.mkdir()
-
-pdf_file = "MY_BUR4_2022.pdf"
-pdf_pages = range(203, 267)
-# CO2: 203 - 218
-# CH4: 219 - 230
-# N2O: 231 - 242
-# HFCS: 243 - 248
-# PFCs: 249 - 254
-# SF6: 255 - 260
-# NF3: 261 - 266
-
-output_filename = 'MYS_BUR4_2022_'
-compression = dict(zlib=True, complevel=9)
-
-# ###
-# reading data and aggregation into one dataframe
-# ###
-df_all = None
-for page in pdf_pages:
-    print(f"++++++++++++++++++++++++++++++++")
-    print(f"+++++ Working on page {page} ++++++")
-    print(f"++++++++++++++++++++++++++++++++")
-    page_template_nr = table_defs[str(page)]["template"]
-    area = table_def_templates[page_template_nr]["area"]
-    if "cols" in table_def_templates[page_template_nr].keys():
-        cols = table_def_templates[page_template_nr]["cols"]
-        tables = camelot.read_pdf(str(input_folder / pdf_file), pages=str(page), \
-                                  flavor='stream', table_areas=area, columns=cols,
-                                  split_text=True)
-    else:
-        tables = camelot.read_pdf(str(input_folder / pdf_file), pages=str(page), \
-                                  flavor='stream', table_areas=area)
-
-    df_current = tables[0].df.copy()
-    df_current.iloc[0,0] = 'Categories'
-    df_current.columns = df_current.iloc[0]
-    df_current = df_current.drop(0)
-    # replace line breaks with spaces
-    df_current[index_cols[0]] = \
-        df_current[index_cols[0]].str.replace("\n", " ")
-    # replace double and triple spaces
-    df_current[index_cols[0]] = \
-        df_current[index_cols[0]].str.replace("   ", " ")
-    df_current[index_cols[0]] = \
-        df_current[index_cols[0]].str.replace("  ", " ")
-
-    # fix the split rows
-    if "rows_to_fix" in table_def_templates[page_template_nr].keys():
-        for n_rows in table_def_templates[page_template_nr]["rows_to_fix"].keys():
-            df_current = fix_rows(df_current,
-                                  table_def_templates[page_template_nr]["rows_to_fix"][
-                                      n_rows], index_cols[0], n_rows)
-
-    # replace category names with typos
-    df_current[index_cols[0]] = \
-        df_current[index_cols[0]].replace(cat_names_fix)
-
-    # replace empty strings
-    df_current = df_current.replace(values_replacement)
-
-    # add entity and unit information
-    df_current.insert(1, "unit", table_defs[str(page)]["unit"])
-    df_current.insert(1, "entity", table_defs[str(page)]["entity"])
-
-    # set index
-    # df_current = df_current.set_index(index_cols)
-    # strip trailing and leading spaces
-    for col in cols_for_space_stripping:
-        df_current[col] = df_current[col].str.strip()
-
-    # print(df_current.columns.values)
-
-    # aggregate dfs
-    if df_all is None:
-        df_all = df_current
-    else:
-        # find intersecting cols
-        cols_all = df_all.columns.values
-        cols_current = df_current.columns.values
-        cols_both = list(set(cols_all).intersection(set(cols_current)))
-        # print(cols_both)
-        if len(cols_both) > 0:
-            df_all = df_all.merge(df_current, how='outer', on=cols_both,
-                                  suffixes=(None, None))
-        else:
-            df_all = df_all.merge(df_current, how='outer', suffixes=(None, None))
-        df_all = df_all.groupby(index_cols).first().reset_index()
-        # df_all = df_all.join(df_current, how='outer')
-
-# ###
-# conversion to primap2 interchange format
-# ###
-# drop the rows with memo items etc
-for cat in cats_remove:
-    df_all = df_all.drop(df_all[df_all["Categories"] == cat].index)
-# make a copy of the categories row
-df_all["orig_cat_name"] = df_all["Categories"]
-
-# replace cat names by codes in col "Categories"
-# first the manual replacements
-df_all["Categories"] = df_all["Categories"].replace(cat_codes_manual)
-# then the regex replacements
-repl = lambda m: convert_ipcc_code_primap_to_primap2('IPC' + m.group('code'))
-df_all["Categories"] = df_all["Categories"].str.replace(cat_code_regexp, repl, regex=True)
-
-# make sure all col headers are str
-df_all.columns = df_all.columns.map(str)
-
-# remove thousands separators as pd.to_numeric can't deal with that
-# also replace None with NaN
-year_cols = list(set(df_all.columns) - set(['Categories', 'entity', 'unit', 'orig_cat_name']))
-for col in year_cols:
-    df_all.loc[:, col] = df_all.loc[:, col].str.strip()
-    repl = lambda m: m.group('part1') + m.group('part2')
-    df_all.loc[:, col] = df_all.loc[:, col].str.replace('(?P<part1>[0-9]+),(?P<part2>[0-9\.]+)$', repl, regex=True)
-    # use .loc to avoid chained assignment, which may silently write to a copy
-    df_all.loc[df_all[col].isnull(), col] = 'NaN'
-    # manually map code NENO to nan
-    df_all.loc[:, col] = df_all.loc[:, col].str.replace('NENO','NaN')
-    df_all.loc[:, col] = df_all.loc[:, col].str.replace('O NANaN','NaN')
-    df_all.loc[:, col] = df_all.loc[:, col].str.replace('IE NO','0')
-    df_all.loc[:, col] = df_all.loc[:, col].str.replace('IE NA NO I','0')
-    # TODO: add code to PRIMAP2
-
-# drop orig_cat_name as it's non-unique per category
-df_all = df_all.drop(columns=["orig_cat_name"])
-
-data_if = pm2.pm2io.convert_wide_dataframe_if(
-    df_all,
-    coords_cols=coords_cols,
-    #add_coords_cols=add_coords_cols,
-    coords_defaults=coords_defaults,
-    coords_terminologies=coords_terminologies,
-    #coords_value_mapping=coords_value_mapping,
-    #coords_value_filling=coords_value_filling,
-    #filter_remove=filter_remove,
-    #filter_keep=filter_keep,
-    meta_data=meta_data,
-    convert_str=True,
-    time_format="%Y",
-    )
-
-data_pm2 = pm2.pm2io.from_interchange_format(data_if)
-
-data_if = data_pm2.pr.to_interchange_format()
-
-# ###
-# save raw data to IF and native format
-# ###
-if not output_folder.exists():
-    output_folder.mkdir()
-pm2.pm2io.write_interchange_format(
-    output_folder / (output_filename + coords_terminologies["category"] + "_raw"),
-    data_if)
-
-encoding = {var: compression for var in data_pm2.data_vars}
-data_pm2.pr.to_netcdf(
-    output_folder / (output_filename + coords_terminologies["category"] + "_raw.nc"),
-    encoding=encoding)
-
-# ###
-# ## process the data
-# ###
-data_proc_pm2 = data_pm2
-
-# actual processing
-data_proc_pm2 = process_data_for_country(
-    data_proc_pm2,
-    gas_baskets=gas_baskets,
-    entities_to_ignore=[],
-    processing_info_country=country_processing_step1,
-)
-
-# adapt source and metadata
-current_source = data_proc_pm2.coords["source"].values[0]
-data_temp = data_proc_pm2.pr.loc[{"source": current_source}]
-data_proc_pm2 = data_proc_pm2.pr.set("source", 'BUR_NIR', data_temp)
-
-# ###
-# save data to IF and native format
-# ###
-data_proc_if = data_proc_pm2.pr.to_interchange_format()
-if not output_folder.exists():
-    output_folder.mkdir()
-pm2.pm2io.write_interchange_format(
-    output_folder / (output_filename + terminology_proc), data_proc_if)
-
-encoding = {var: compression for var in data_proc_pm2.data_vars}
-data_proc_pm2.pr.to_netcdf(
-    output_folder / (output_filename + terminology_proc + ".nc"),
-    encoding=encoding)
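
The thousands-separator cleanup in the Malaysia reader above can be tried in isolation. The following is a minimal sketch that uses only Python's standard `re` module instead of pandas; the helper name `strip_thousands_separator` is hypothetical and not part of the repository. Because the pattern is anchored at the end of the string, it removes only the comma in front of the last digit group, so a value with two separators keeps its first comma.

```python
import re

# Same pattern as in the reader script, written as a raw string.
THOUSANDS_PATTERN = re.compile(r"(?P<part1>[0-9]+),(?P<part2>[0-9\.]+)$")


def strip_thousands_separator(value: str) -> str:
    """Remove the thousands separator in front of the last digit group.

    Hypothetical helper for illustration; the original script applies the
    same substitution column-wise through pandas' ``str.replace``.
    """
    # Strip surrounding whitespace first, as the script does per column.
    return THOUSANDS_PATTERN.sub(
        lambda m: m.group("part1") + m.group("part2"), value.strip()
    )


if __name__ == "__main__":
    for raw in ["1,234.5", " 12,345 ", "NENO", "1,234,567"]:
        print(repr(raw), "->", repr(strip_thousands_separator(raw)))
```

Values with more than one separator would need a different approach, for example removing every comma, as the Mexico reader does.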

+ 0 - 227
UNFCCC_GHG_data/UNFCCC_reader/Mexico/read_MEX_BUR3_from_pdf.py

@@ -1,227 +0,0 @@
-# this script reads data from Mexico's BUR3
-# Data is read from the pdf file
-
-import pandas as pd
-import primap2 as pm2
-import camelot
-from config_MEX_BUR3 import page_defs, fix_rows
-from UNFCCC_GHG_data.helper import downloaded_data_path, extracted_data_path
-
-# ###
-# configuration
-# ###
-input_folder = downloaded_data_path / 'UNFCCC' / 'Mexico' / 'BUR3'
-output_folder = extracted_data_path / 'UNFCCC' / 'Mexico'
-if not output_folder.exists():
-    output_folder.mkdir()
-
-output_filename = 'MEX_BUR3_2022_'
-compression = dict(zlib=True, complevel=9)
-inventory_file = 'Mexico_3er_BUR.pdf'
-
-gwp_to_use = 'AR5GWP100'
-year = 2019
-entity_row = 0
-unit_row = 1
-
-index_cols = "Categorías de fuentes y sumideros de GEI"
-# special header as category code and name in one column
-header_long = ["orig_cat_name", "entity", "unit", "time", "data"]
-
-units = {
-    "CO₂": "Gg",
-    "CH₄": "Gg",
-    "N₂O": "Gg",
-    "HFC": "GgCO2eq",
-    "PFC": "GgCO2eq",
-    "NF₃": "GgCO2eq",
-    "SF₆": "GgCO2eq",
-    "EMISIONES NETAS PCG AR5": "GgCO2eq",
-}
-
-# manual category codes
-cat_codes_manual = {
-    'Todas las emisiones y las absorciones nacionales': '0',
-    'Todas las emisiones (sin [3B] Tierra ni [3D1] Productos de madera recolectada': 'M0EL',
-    '2F6 Otras aplicaciones': '2F6',
-}
-
-cat_code_regexp = r'^\[(?P<UNFCCC_GHG_data>[a-zA-Z0-9]{1,3})\].*'
-
-coords_cols = {
-    "category": "category",
-    "entity": "entity",
-    "unit": "unit",
-}
-
-add_coords_cols = {
-    "orig_cat_name": ["orig_cat_name", "category"],
-}
-
-coords_terminologies = {
-    "area": "ISO3",
-    "category": "IPCC2006",
-    "scenario": "PRIMAP",
-}
-
-coords_defaults = {
-    "source": "MEX-GHG-Inventory",
-    "provenance": "measured",
-    "area": "MEX",
-    "scenario": "BUR3",
-}
-
-coords_value_mapping = {
-    "unit": "PRIMAP1",
-    "category": "PRIMAP1",
-    "entity": {
-        'CH₄': 'CH4',
-        'CO₂': 'CO2',
-        'EMISIONES NETAS PCG AR5': 'KYOTOGHG (AR5GWP100)',
-        'HFC': f"HFCS ({gwp_to_use})",
-        'NF₃': f"NF3 ({gwp_to_use})",
-        'N₂O': 'N2O',
-        'PFC': f"PFCS ({gwp_to_use})",
-        'SF₆': f"SF6 ({gwp_to_use})",
-    },
-}
-
-
-filter_remove = {}
-
-filter_keep = {}
-
-meta_data = {
-    "references": "https://unfccc.int/documents/512231",
-    "rights": "",
-    "contact": "mail@johannes-guetschow.de",
-    "title": "Mexico. Biennial update report (BUR). BUR3",
-    "comment": "Read from pdf by Johannes Gütschow",
-    "institution": "UNFCCC",
-}
-
-# convert to mass units where possible
-entities_to_convert_to_mass = [
-    'NF3', 'SF6'
-]
-
-# ###
-# read the data from pdf into one long format dataframe
-# ###
-df_all = None
-for page in page_defs.keys():
-    print(f"Working on page {page}")
-    page_def = page_defs[page]
-    tables = camelot.read_pdf(str(input_folder / inventory_file), pages=page,
-                              **page_def["camelot"])
-    df_this_table = tables[0].df
-
-    # fix rows
-    for n_rows in page_def["rows_to_fix"].keys():
-        # replace line breaks, long hyphens, double, and triple spaces in category names
-        df_this_table.iloc[:, 0] = df_this_table.iloc[:, 0].str.replace("\n", " ")
-        df_this_table.iloc[:, 0] = df_this_table.iloc[:, 0].str.replace("   ", " ")
-        df_this_table.iloc[:, 0] = df_this_table.iloc[:, 0].str.replace("  ", " ")
-        df_this_table.iloc[:, 0] = df_this_table.iloc[:, 0].str.replace("–", "-")
-        # replace double space in entity
-        df_this_table.iloc[0, :] = df_this_table.iloc[0, :].str.replace("  ", " ")
-        df_this_table = fix_rows(df_this_table, page_def["rows_to_fix"][n_rows], 0,
-                                 n_rows)
-
-    # add units
-    for col in df_this_table.columns.values:
-        if df_this_table[col].iloc[0] in units.keys():
-            df_this_table[col].iloc[1] = units[df_this_table[col].iloc[0]]
-
-    # bring in right format for conversion to long format
-    df_this_table = pm2.pm2io.nir_add_unit_information(df_this_table, unit_row=unit_row,
-                                                       entity_row=entity_row,
-                                                       regexp_unit=".*",
-                                                       regexp_entity=".*",
-                                                       default_unit="GgCO2eq")
-
-    # set index and convert to long format
-    df_this_table = df_this_table.set_index(index_cols)
-    df_this_table_long = pm2.pm2io.nir_convert_df_to_long(df_this_table, year,
-                                                          header_long)
-
-    # combine with tables for other sectors (merge not append)
-    if df_all is None:
-        df_all = df_this_table_long
-    else:
-        df_all = pd.concat([df_all, df_this_table_long], axis=0, join='outer')
-
-# ###
-# conversion to PM2 IF
-# ###
-# make a copy of the categories row
-df_all["category"] = df_all["orig_cat_name"]
-
-# replace cat names by codes in col "category"
-# first the manual replacements
-df_all["category"] = df_all["category"].replace(cat_codes_manual)
-# then the regex replacements
-repl = lambda m: m.group('UNFCCC_GHG_data')
-df_all["category"] = df_all["category"].str.replace(cat_code_regexp, repl, regex=True)
-df_all = df_all.reset_index(drop=True)
-
-# replace "," and " " with "" in data
-df_all.loc[:, "data"] = df_all.loc[:, "data"].str.replace(',','', regex=False)
-df_all.loc[:, "data"] = df_all.loc[:, "data"].str.replace(' ','', regex=False)
-
-# make sure all col headers are str
-df_all.columns = df_all.columns.map(str)
-
-# ###
-# convert to PRIMAP2 interchange format
-# ###
-data_if = pm2.pm2io.convert_long_dataframe_if(
-    df_all,
-    coords_cols=coords_cols,
-    add_coords_cols=add_coords_cols,
-    coords_defaults=coords_defaults,
-    coords_terminologies=coords_terminologies,
-    coords_value_mapping=coords_value_mapping,
-    #coords_value_filling=coords_value_filling,
-    filter_remove=filter_remove,
-    #filter_keep=filter_keep,
-    meta_data=meta_data,
-    convert_str=True
-    )
-
-cat_label = "category (IPCC2006)"
-# fix error cats
-data_if[cat_label] = data_if[cat_label].str.replace("error_", "")
-
-data_pm2 = pm2.pm2io.from_interchange_format(data_if)
-
-# convert to mass units from CO2eq
-
-entities_to_convert = [f"{entity} ({gwp_to_use})" for entity in
-                       entities_to_convert_to_mass]
-
-for entity in entities_to_convert:
-    converted = data_pm2[entity].pr.convert_to_mass()
-    basic_entity = entity.split(" ")[0]
-    converted = converted.to_dataset(name=basic_entity)
-    data_pm2 = data_pm2.pr.merge(converted)
-    data_pm2[basic_entity].attrs["entity"] = basic_entity
-
-# drop the GWP data
-data_pm2 = data_pm2.drop_vars(entities_to_convert)
-
-# convert back to IF to have units in the fixed format
-data_if = data_pm2.pr.to_interchange_format()
-
-# ###
-# save data to IF and native format
-# ###
-if not output_folder.exists():
-    output_folder.mkdir()
-pm2.pm2io.write_interchange_format(
-    output_folder / (output_filename + coords_terminologies["category"]), data_if)
-
-encoding = {var: compression for var in data_pm2.data_vars}
-data_pm2.pr.to_netcdf(
-    output_folder / (output_filename + coords_terminologies["category"] + ".nc"),
-    encoding=encoding)
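
The two-step category mapping in the Mexico reader above (manual replacements first, then a regular expression that keeps only the bracketed code) can be sketched without pandas. This is an illustration under assumptions: the function name `category_code` is hypothetical, the named group is called `code` here for readability, the manual mapping is a subset of the one in the script, and the sample category names in the demo are made up.

```python
import re

# Bracketed code of one to three characters at the start of the name.
CAT_CODE_REGEXP = re.compile(r"^\[(?P<code>[a-zA-Z0-9]{1,3})\].*")

# Subset of the manual mapping from the script, for illustration only.
CAT_CODES_MANUAL = {
    "Todas las emisiones y las absorciones nacionales": "0",
    "2F6 Otras aplicaciones": "2F6",
}


def category_code(name: str) -> str:
    """Map an original category name to its code.

    Manual replacements take precedence; otherwise the bracketed code is
    extracted. Names matching neither rule are returned unchanged.
    """
    if name in CAT_CODES_MANUAL:
        return CAT_CODES_MANUAL[name]
    return CAT_CODE_REGEXP.sub(lambda m: m.group("code"), name)


if __name__ == "__main__":
    # Sample names are illustrative, not taken from the report.
    for name in ["[1A] Quema de combustible", "2F6 Otras aplicaciones", "Otros"]:
        print(name, "->", category_code(name))
```

Returning unmatched names unchanged mirrors how `str.replace` behaves in the script, so unexpected names stay visible downstream instead of being dropped.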

+ 0 - 335
UNFCCC_GHG_data/UNFCCC_reader/Mongolia/read_MNG_BUR2_from_pdf.py

@@ -1,335 +0,0 @@
-import camelot
-import primap2 as pm2
-import pandas as pd
-
-from UNFCCC_GHG_data.helper import (
-    downloaded_data_path,
-    extracted_data_path,
-    fix_rows,
-    process_data_for_country,
-)
-from config_MNG_BUR2 import (
-    inv_conf,
-    inv_conf_per_year,
-    inv_conf_per_entity,
-    coords_cols,
-    coords_defaults,
-    coords_terminologies,
-    coords_value_mapping,
-    filter_remove,
-    meta_data,
-    country_processing_step1,
-    gas_baskets,
-)
-
-# ###
-# configuration
-# ###
-
-input_folder = downloaded_data_path / "UNFCCC" / "Mongolia" / "BUR2"
-output_folder = extracted_data_path / "UNFCCC" / "Mongolia"
-
-if not output_folder.exists():
-    output_folder.mkdir()
-
-pdf_file = "20231112_NIR_MGL.pdf"
-output_filename = "MNG_BUR2_2023_"
-category_column = f"category ({coords_terminologies['category']})"
-compression = dict(zlib=True, complevel=9)
-
-# ###
-# 1. Read in main tables
-# ###
-
-df_main = None
-for year in inv_conf_per_year.keys():
-    print("-" * 60)
-    print(f"Reading year {year}.")
-    print("-" * 60)
-    df_year = None
-    for page in inv_conf_per_year[year]["page_defs"].keys():
-        print(f"Reading table from page {page}.")
-        tables_inventory_original = camelot.read_pdf(
-            str(input_folder / pdf_file),
-            pages=page,
-            table_areas=inv_conf_per_year[year]["page_defs"][page]["area"],
-            columns=inv_conf_per_year[year]["page_defs"][page]["cols"],
-            flavor="stream",
-            split_text=True,
-        )
-        print("Reading complete.")
-
-        df_page = tables_inventory_original[0].df
-
-        if df_year is None:
-            df_year = df_page
-        else:
-            df_year = pd.concat(
-                [df_year, df_page],
-                axis=0,
-                join="outer",
-            ).reset_index(drop=True)
-
-    print(f"Concatenating all tables for {year}.")
-
-    # fix content that spreads across multiple rows
-    if "rows_to_fix" in inv_conf_per_year[year]:
-        for n_rows in inv_conf_per_year[year]["rows_to_fix"].keys():
-            print(f"Merge content for {n_rows=}")
-            df_year = fix_rows(
-                df_year,
-                rows_to_fix=inv_conf_per_year[year]["rows_to_fix"][n_rows],
-                col_to_use=0,
-                n_rows=n_rows,
-            )
-
-    df_header = pd.DataFrame([inv_conf["header"], inv_conf["unit"]])
-
-    skip_rows = 11
-    df_year = pd.concat(
-        [df_header, df_year[skip_rows:]], axis=0, join="outer"
-    ).reset_index(drop=True)
-
-    df_year = pm2.pm2io.nir_add_unit_information(
-        df_year,
-        unit_row=inv_conf["unit_row"],
-        entity_row=inv_conf["entity_row"],
-        regexp_entity=".*",
-        regexp_unit=".*",
-        default_unit="Gg",
-    )
-
-    print("Added unit information.")
-
-    # set index
-    df_year = df_year.set_index(inv_conf["index_cols"])
-
-    # convert to long format
-    df_year_long = pm2.pm2io.nir_convert_df_to_long(
-        df_year, year, inv_conf["header_long"]
-    )
-
-    # extract from tuple
-    df_year_long["orig_cat_name"] = df_year_long["orig_cat_name"].str[0]
-
-    # prep for conversion to PM2 IF and native format
-    # make a copy of the categories row
-    df_year_long["category"] = df_year_long["orig_cat_name"]
-
-    # replace cat names by codes in col "category"
-    # first the manual replacements
-
-    df_year_long["category"] = df_year_long["category"].replace(
-        inv_conf["cat_codes_manual"]
-    )
-
-    # regex=False so that "." is treated as a literal dot in every pandas version
-    df_year_long["category"] = df_year_long["category"].str.replace(
-        ".", "", regex=False
-    )
-
-    # then the regex replacements
-    def repl(m):
-        return m.group("code")
-
-    df_year_long["category"] = df_year_long["category"].str.replace(
-        inv_conf["cat_code_regexp"], repl, regex=True
-    )
-
-    df_year_long = df_year_long.reset_index(drop=True)
-
-    df_year_long["data"] = df_year_long["data"].str.replace(",", "")
-
-    # make sure all col headers are str
-    df_year_long.columns = df_year_long.columns.map(str)
-
-    df_year_long = df_year_long.drop(columns=["orig_cat_name"])
-
-    if df_main is None:
-        df_main = df_year_long
-    else:
-        df_main = pd.concat(
-            [df_main, df_year_long],
-            axis=0,
-            join="outer",
-        ).reset_index(drop=True)
-
-### convert to interchange format ###
-print("Converting to interchange format.")
-df_main_IF = pm2.pm2io.convert_long_dataframe_if(
-    df_main,
-    coords_cols=coords_cols,
-    coords_defaults=coords_defaults,
-    coords_terminologies=coords_terminologies,
-    coords_value_mapping=coords_value_mapping,
-    filter_remove=filter_remove,
-    meta_data=meta_data,
-    convert_str=True,
-    time_format="%Y",
-)
-
-### convert to primap2 format ###
-print("Converting to primap2 format.")
-data_main_pm2 = pm2.pm2io.from_interchange_format(df_main_IF)
-
-# ###
-# 2. Read in trend tables
-# ###
-
-df_trend = None
-for entity in inv_conf_per_entity.keys():
-    print("-" * 60)
-    print(f"Reading entity {entity}.")
-
-    df_entity = None
-
-    for page in inv_conf_per_entity[entity]["page_defs"].keys():
-        print(f"Reading page {page}.")
-
-        tables_inventory_original = camelot.read_pdf(
-            str(input_folder / pdf_file),
-            pages=page,
-            table_areas=inv_conf_per_entity[entity]["page_defs"][page]["area"],
-            columns=inv_conf_per_entity[entity]["page_defs"][page]["cols"],
-            flavor="stream",
-            split_text=True,
-        )
-        df_page = tables_inventory_original[0].df
-
-        if df_entity is None:
-            df_entity = df_page
-        else:
-            df_entity = pd.concat(
-                [df_entity, df_page],
-                axis=0,
-                join="outer",
-            ).reset_index(drop=True)
-        print(f"adding table from page {page}.")
-
-    if "rows_to_fix" in inv_conf_per_entity[entity]:
-        for n_rows in inv_conf_per_entity[entity]["rows_to_fix"].keys():
-            print(f"Merge content for {n_rows=}")
-            df_entity = fix_rows(
-                df_entity,
-                rows_to_fix=inv_conf_per_entity[entity]["rows_to_fix"][n_rows],
-                col_to_use=0,
-                n_rows=n_rows,
-            )
-
-    df_entity.columns = df_entity.iloc[0, :]
-    df_entity = df_entity[1:]
-
-    # unit is always Gg
-    df_entity.loc[:, "unit"] = inv_conf_per_entity[entity]["unit"]
-
-    # only one entity per table
-    df_entity.loc[:, "entity"] = entity
-
-    # TODO: Fix pandas "set value on slice of copy" warning
-    df_entity.loc[:, "category"] = df_entity.loc[
-        :, inv_conf_per_entity[entity]["category_column"]
-    ]
-
-    if "rows_to_drop" in inv_conf_per_entity[entity]:
-        for row in inv_conf_per_entity[entity]["rows_to_drop"]:
-            row_to_delete = df_entity.index[df_entity["category"] == row][0]
-            df_entity = df_entity.drop(index=row_to_delete)
-
-    df_entity.loc[:, "category"] = df_entity.loc[:, "category"].replace(
-        inv_conf_per_entity[entity]["cat_codes_manual"]
-    )
-
-    def repl(m):
-        return m.group("code")
-
-    df_entity.loc[:, "category"] = df_entity["category"].str.replace(
-        inv_conf["cat_code_regexp"], repl, regex=True
-    )
-
-    df_entity = df_entity.drop(columns=inv_conf_per_entity[entity]["columns_to_drop"])
-
-    for year in inv_conf_per_entity[entity]["years"]:
-        df_entity.loc[:, year] = df_entity[year].str.replace(",", "")
-
-    if df_trend is None:
-        df_trend = df_entity
-    else:
-        df_trend = pd.concat(
-            [df_trend, df_entity],
-            axis=0,
-            join="outer",
-        ).reset_index(drop=True)
-
-### convert to interchange format ###
-df_trend_IF = pm2.pm2io.convert_wide_dataframe_if(
-    data_wide=df_trend,
-    coords_cols=coords_cols,
-    coords_defaults=coords_defaults,
-    coords_terminologies=coords_terminologies,
-    coords_value_mapping=coords_value_mapping,
-    # filter_remove=filter_remove,
-    meta_data=meta_data,
-    convert_str=True,
-    time_format="%Y",
-)
-
-### convert to primap2 format ###
-print("Converting to primap2 format.")
-data_trend_pm2 = pm2.pm2io.from_interchange_format(df_trend_IF)
-
-# ###
-# Merge main and trend tables.
-# ###
-
-print("Merging main and trend table.")
-data_pm2 = data_main_pm2.pr.merge(data_trend_pm2, tolerance=1)
-
-# ###
-# Save raw data to IF and native format.
-# ###
-
-data_if = data_pm2.pr.to_interchange_format()
-
-pm2.pm2io.write_interchange_format(
-    output_folder / (output_filename + coords_terminologies["category"] + "_raw"),
-    data_if,
-)
-
-encoding = {var: compression for var in data_pm2.data_vars}
-data_pm2.pr.to_netcdf(
-    output_folder / (output_filename + coords_terminologies["category"] + "_raw.nc"),
-    encoding=encoding,
-)
-
-# ###
-# Processing
-# ###
-
-data_proc_pm2 = process_data_for_country(
-    data_country=data_pm2,
-    entities_to_ignore=[],
-    gas_baskets=gas_baskets,
-    filter_dims=None,
-    cat_terminology_out=None,
-    category_conversion=None,
-    sectors_out=None,
-    processing_info_country=country_processing_step1,
-)
-
-# ###
-# save processed data to IF and native format
-# ###
-
-terminology_proc = coords_terminologies["category"]
-
-data_proc_if = data_proc_pm2.pr.to_interchange_format()
-
-if not output_folder.exists():
-    output_folder.mkdir()
-pm2.pm2io.write_interchange_format(
-    output_folder / (output_filename + terminology_proc), data_proc_if
-)
-
-encoding = {var: compression for var in data_proc_pm2.data_vars}
-data_proc_pm2.pr.to_netcdf(
-    output_folder / (output_filename + terminology_proc + ".nc"), encoding=encoding
-)
-
-print("Saved processed data.")

+ 0 - 67
UNFCCC_GHG_data/UNFCCC_reader/Montenegro/config_MNE_BUR3.py

@@ -1,67 +0,0 @@
-# most time series are contained twice and 2005 data is also contained twice. Some
-# data is inconsistent and we remove the time series with errors
-drop_data = {
-    2: { # individual sector time series are (mostly) wrong, leave only 0.EL timeseries
-        "cats": ["1", "1.A", "1.A.1", "1.A.2", "1.A.3", "1.A.4", "1.A.5", "1.B", "1.B.1", "1.B.2",
-                 "2", "2.A", "2.B", "2.C", "2.D", "2.E", "2.F", "2.G", "2.H",
-                 "3", "3.A", "3.B"],
-        #"years": ["2005"], # 2005 data copy of 2019
-    },
-    3: { # individual sector time series are (mostly) wrong, leave only 0.EL timeseries
-        "cats": ["3.C", "3.D", "3.E", "3.F", "3.G", "5", "5.A", "5.B", "5.C", "5.D", "6"]
-        #"years": ["2005"],
-    },
-    6: { #2005 data copy of 2019
-        "years": ["2005"],
-    },
-    7: { # 2005 data copy of 2019 for 3.G
-        "years": ["2005"],
-    },
-    25: { # 2005 data copy of 2019 (CO2, 2005-2019, first table)
-        "years": ["2005"],
-    },
-    26: { # 2005 data copy of 2019 (CO2, 2005-2019, second table)
-        "years": ["2005"],
-    },
-}
-
-cat_mapping = {
-    '3': 'M.AG',
-    '3.A': '3.A.1',
-    '3.B': '3.A.2',
-    '3.C': '3.C.7', # rice
-    '3.D': 'M.3.C.45AG', # Agricultural soils
-    '3.E': '3.C.1.c', # prescribed burning of savanna
-    '3.F': '3.C.1.b', # field burning of agricultural residues
-    '3.G': '3.C.3', # urea application
-    '4': 'M.LULUCF',
-    '4.A': '3.B.1', # forest
-    '4.B': '3.B.2', # cropland
-    '4.C': '3.B.3', # grassland
-    '4.D': '3.B.4', # wetland
-    '4.E': '3.B.5', # Settlements
-    '4.F': '3.B.6', # other land
-    '4.G': '3.D.1', # HWP
-    '5': '4',
-    '5.A': '4.A',
-    '5.B': '4.B',
-    '5.C': '4.C',
-    '5.D': '4.D',
-    '6': '5',
-}
-
-aggregate_cats = {
-    '3.A': {'sources': ['3.A.1', '3.A.2'], 'name': 'Livestock'},
-    '3.B': {'sources': ['3.B.1', '3.B.2', '3.B.3', '3.B.4', '3.B.5', '3.B.6'], 'name': 'Land'},
-    'M.3.C.1.AG': {'sources': ['3.C.1.c', '3.C.1.b'], 'name': 'Emissions from Biomass '
-                                                          'Burning (Agriculture)'},
-    '3.C.1': {'sources': ['3.C.1.c', '3.C.1.b'], 'name': 'Emissions from Biomass Burning'},
-    '3.C': {'sources': ['3.C.1', '3.C.3', 'M.3.C.45AG', '3.C.7'],
-            'name': 'Aggregate sources and non-CO2 emissions sources on land'},
-    'M.3.C.AG': {'sources': ['M.3.C.1.AG', '3.C.3', 'M.3.C.45AG', '3.C.7'],
-            'name': 'Aggregate sources and non-CO2 emissions sources on land (Agriculture)'},
-    '3.D': {'sources': ['3.D.1'], 'name': 'Other'},
-    '3': {'sources': ['M.AG', 'M.LULUCF'], 'name': 'AFOLU'},
-    'M.AG.ELV': {'sources': ['M.3.C.AG'], 'name': 'Agriculture excluding livestock emissions'},
-    '0': {'sources': ['1', '2', '3', '4', '5']},
-}
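
How an `aggregate_cats` definition like the one above is evaluated can be sketched with plain dictionaries. This is a simplified illustration, not the reader's implementation: the function `aggregate_categories` is hypothetical, it handles a single value per category rather than full time series, and it overwrites an existing target instead of appending rows. Like `sum(min_count=1)` in the reader, it creates no aggregate when none of the sources has data.

```python
from typing import Dict, List, Optional

Values = Dict[str, Optional[float]]


def aggregate_categories(
    data: Values, aggregate_cats: Dict[str, Dict[str, List[str]]]
) -> Values:
    """Sum source categories into target categories.

    Targets are processed in definition order, so an aggregate defined
    earlier (e.g. '3.C.1') can be used as a source later (e.g. '3.C').
    """
    result = dict(data)
    for target, spec in aggregate_cats.items():
        # Only sources that are present and not None contribute.
        values = [
            result[src] for src in spec["sources"] if result.get(src) is not None
        ]
        if values:
            result[target] = sum(values)
    return result


if __name__ == "__main__":
    data = {"3.A.1": 1.0, "3.A.2": 2.0, "3.C.1.b": 0.5}
    cats = {
        "3.A": {"sources": ["3.A.1", "3.A.2"]},
        "3.C.1": {"sources": ["3.C.1.c", "3.C.1.b"]},
        "3.C": {"sources": ["3.C.1", "3.C.3"]},
        "3.D": {"sources": ["3.D.1"]},
    }
    print(aggregate_categories(data, cats))
```

Because evaluation follows definition order, a source name that does not match any earlier target is silently ignored, so target and source names need to agree exactly.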

+ 0 - 286
UNFCCC_GHG_data/UNFCCC_reader/Montenegro/read_MNE_BUR3_from_pdf.py

@@ -1,286 +0,0 @@
-# Montenegro BUR 3
-# Code to read the emissions inventory contained in Montenegro's third BUR from pdf
-# and convert into PRIMAP2 format
-
-# ###
-# imports
-# ###
-import camelot
-import primap2 as pm2
-import pandas as pd
-from pathlib import Path
-import re
-import copy
-
-from config_MNE_BUR3 import drop_data, cat_mapping, aggregate_cats
-from primap2.pm2io._data_reading import matches_time_format
-
-# ###
-# configuration
-# ###
-
-# folders and files
-root_path = Path(__file__).parents[3].absolute()
-root_path = root_path.resolve()
-downloaded_data_path = root_path / "downloaded_data"
-extracted_data_path = root_path / "extracted_data"
-
-input_folder = downloaded_data_path / 'UNFCCC' / 'Montenegro' / 'BUR3'
-output_folder = extracted_data_path / 'UNFCCC' / 'Montenegro'
-output_filename = 'MNE_BUR3_2022_'
-compression = dict(zlib=True, complevel=9)
-
-inventory_file_pdf = 'NIR-2021_MNE_Finalversion.pdf'
-
-# reading and processing
-years_to_read = range(1990, 2018 + 1)
-pages_to_read = range(535,583)
-
-pos_entity = [0, 0]
-cat_code_col = 0
-cat_name_col = 1
-regex_unit = r"\((.*)\)"
-regex_entity = r"^(.*)\s\("
-
-gwp_to_use = 'AR4GWP100'
-
-# conversion to PRIMAP2 format
-
-coords_terminologies = {
-    "area": "ISO3",
-    "category": "IPCC1996_2006_MNE_Inv",
-    "scenario": "PRIMAP",
-}
-
-coords_defaults = {
-    "source": "MNE-GHG-inventory ",
-    "provenance": "measured",
-    "area": "MNE",
-    "scenario": "BUR3",
-}
-
-coords_value_mapping = {
-    'unit': 'PRIMAP1',
-    'entity': {
-        f"GHG ({gwp_to_use})": f"KYOTOGHG ({gwp_to_use})",
-        f"HFC ({gwp_to_use})": f"HFCS ({gwp_to_use})",
-        f"PFC ({gwp_to_use})": f"PFCS ({gwp_to_use})",
-    },
-    'category': {
-        'Total national GHG emissions (with LULUCF)': '0',
-        'Total national GHG emissions (without LULUCF)': 'M.0.EL',
-        'International Bunkers': 'M.BK',
-        '1.A.3.a.i': 'M.BK.A',
-        '1.A.3.d.i': 'M.BK.M',
-        'CO2 from Biomass Combustion for Energy Production': 'M.BIO',
-        '6 Other': '6',
-        '2 H': '2.H',
-    },
-}
-
-coords_value_filling = {
-    "category": {
-        "orig_cat_name": {
-            'International Bunkers': 'M.BK',
-        },
-    },
-}
-
-coords_cols = {
-    "category": "category",
-    "entity": "entity",
-    "unit": "unit",
-}
-
-filter_remove = {
-    "f1": {
-        "category": ["Memo items"],
-    },
-}
-
-meta_data = {
-    "references": "https://unfccc.int/documents/461972",
-    "rights": "",
-    "contact": "mail@johannes-guetschow.de",
-    "title": "Montenegro. Biennial update report (BUR). BUR 3. National inventory report.",
-    "comment": "Read from pdf file by Johannes Gütschow",
-    "institution": "United Nations Framework Convention on Climate Change (UNFCCC)",
-}
-
-# ###
-# Read all time series table from pdf
-# ###
-tables = camelot.read_pdf(str(input_folder / inventory_file_pdf), pages=','.join([str(page) for page in pages_to_read]), flavor='lattice')
-
-# ###
-# process tables and combine them using the pm2 pr.merge function
-# ###
-data_all = None
-for i, table in enumerate(tables):
-    df_current_table = table.df.copy(deep=True)
-    # get entity and unit
-    entity_unit = df_current_table.iloc[0, 0]
-    match = re.search(regex_unit, entity_unit)
-    unit = match.group(1)
-    match = re.search(regex_entity, entity_unit)
-    entity = match.group(1)
-    if "CO2 equivalent" in unit:
-        entity = f"{entity} ({gwp_to_use})"
-        unit_parts = unit.split(" ")
-        unit = f"{unit_parts[0]} CO2eq"
-
-    # remove "\n" from category code and name columns
-    df_current_table.iloc[:, 0] = df_current_table.iloc[:, 0].str.replace("\n", "")
-    df_current_table.iloc[:, 1] = df_current_table.iloc[:, 1].str.replace("\n", "")
-
-    # fix header
-    df_current_table.iloc[0, 0] = "category"
-    df_current_table.iloc[0, 1] = "orig_cat_name"
-    df_current_table.columns = df_current_table.iloc[0]
-    df_current_table = df_current_table.drop(0, axis=0)
-
-    # remove ',' in numbers
-    years = df_current_table.columns[2:]
-    repl = lambda m: m.group('part1') + m.group('part2')
-    for year in years:
-        df_current_table.loc[:, year] = df_current_table.loc[:, year].str.replace(
-            '(?P<part1>[0-9]+),(?P<part2>[0-9\.]+)$', repl, regex=True)
-
-    # add entity and unit cols
-    df_current_table["entity"] = entity
-    df_current_table["unit"] = unit
-
-    if i in drop_data:
-        to_drop = drop_data[i]
-        if "cats" in to_drop.keys():
-            mask = df_current_table["category"].isin(to_drop["cats"])
-            df_current_table = df_current_table.drop(df_current_table[mask].index,
-                                                     axis=0)
-        if "years" in to_drop.keys():
-            df_current_table = df_current_table.drop(columns=to_drop["years"])
-
-    df_current_table["category"] = df_current_table["category"].fillna(
-        value=df_current_table["orig_cat_name"])
-
-    df_current_table = df_current_table.drop(columns="orig_cat_name")
-
-    df_current_table_IF = pm2.pm2io.convert_wide_dataframe_if(
-        df_current_table,
-        coords_cols=coords_cols,
-        coords_defaults=coords_defaults,
-        coords_terminologies=coords_terminologies,
-        coords_value_mapping=coords_value_mapping,
-        filter_remove=filter_remove,
-        meta_data=meta_data,
-        convert_str=True,
-    )
-
-    current_table_pm2 = pm2.pm2io.from_interchange_format(df_current_table_IF)
-
-    if data_all is None:
-        data_all = current_table_pm2
-    else:
-        data_all = data_all.pr.merge(current_table_pm2, tolerance=0.001)
-
-    print(f"{entity}, {unit}: {years[0]}-{years[-1]}")
-
-# ###
-# postprocessing
-# ###
-
-# convert to mass units from CO2eq
-entities_to_convert = ['N2O', 'SF6', 'CH4']
-entities_to_convert = [f"{entity} ({gwp_to_use})" for entity in entities_to_convert]
-
-# for entity in entities_to_convert:
-#     converted = data_all[entity].pr.convert_to_mass()
-#     basic_entity = entity.split(" ")[0]
-#     converted = converted.to_dataset(name=basic_entity)
-#     data_all = data_all.pr.merge(converted)
-#     data_all[basic_entity].attrs["entity"] = basic_entity
-#
-# # drop the GWP data
-# data_all = data_all.drop_vars(entities_to_convert)
-
-# convert back to IF
-data_if = data_all.pr.to_interchange_format()
-
-# ###
-# convert to IPCC2006 categories
-# ###
-data_if_2006 = copy.deepcopy(data_if)
-data_if_2006.attrs = copy.deepcopy(data_if.attrs)
-
-# map categories
-data_if_2006 = data_if_2006.replace(
-    {f"category ({coords_terminologies['category']})": cat_mapping})
-data_if_2006[f"category ({coords_terminologies['category']})"].unique()
-
-# rename the category col
-data_if_2006.rename(columns={
-    f"category ({coords_terminologies['category']})": 'category (IPCC2006_PRIMAP)'},
-                    inplace=True)
-data_if_2006.attrs['attrs']['cat'] = 'category (IPCC2006_PRIMAP)'
-data_if_2006.attrs['dimensions']['*'] = [
-    'category (IPCC2006_PRIMAP)' if item == f"category ({coords_terminologies['category']})"
-    else item for item in data_if_2006.attrs['dimensions']['*']]
-# aggregate categories
-for cat_to_agg in aggregate_cats:
-    mask = data_if_2006["category (IPCC2006_PRIMAP)"].isin(
-        aggregate_cats[cat_to_agg]["sources"])
-    df_test = data_if_2006[mask]
-    # print(df_test)
-
-    if len(df_test) > 0:
-        print(f"Aggregating category {cat_to_agg}")
-        df_combine = df_test.copy(deep=True)
-
-        time_format = '%Y'
-        time_columns = [
-            col
-            for col in df_combine.columns.values
-            if matches_time_format(col, time_format)
-        ]
-
-        for col in time_columns:
-            df_combine[col] = pd.to_numeric(df_combine[col], errors="coerce")
-
-        df_combine = df_combine.groupby(
-            by=['source', 'scenario (PRIMAP)', 'provenance', 'area (ISO3)', 'entity',
-                'unit']).sum(min_count=1)
-
-        df_combine.insert(0, "category (IPCC2006_PRIMAP)", cat_to_agg)
-        # df_combine.insert(1, "cat_name_translation", aggregate_cats[cat_to_agg]["name"])
-        # df_combine.insert(2, "orig_cat_name", "computed")
-
-        df_combine = df_combine.reset_index()
-
-        data_if_2006 = pd.concat([data_if_2006, df_combine], axis=0, join='outer')
-        data_if_2006 = data_if_2006.reset_index(drop=True)
-    else:
-        print(f"no data to aggregate category {cat_to_agg}")
-
-# conversion to PRIMAP2 native format
-data_pm2_2006 = pm2.pm2io.from_interchange_format(data_if_2006)
-
-# convert back to IF to have units in the fixed format
-data_if_2006 = data_pm2_2006.pr.to_interchange_format()
-
-
-# ###
-# save data to IF and native format
-# ###
-if not output_folder.exists():
-    output_folder.mkdir()
-
-# data in original categories
-pm2.pm2io.write_interchange_format(output_folder / (output_filename + coords_terminologies["category"]), data_if)
-
-encoding = {var: compression for var in data_all.data_vars}
-data_all.pr.to_netcdf(output_folder / (output_filename + coords_terminologies["category"] + ".nc"), encoding=encoding)
-
-# data in 2006 categories
-pm2.pm2io.write_interchange_format(output_folder / (output_filename + "IPCC2006_PRIMAP"), data_if_2006)
-
-encoding = {var: compression for var in data_pm2_2006.data_vars}
-data_pm2_2006.pr.to_netcdf(output_folder / (output_filename + "IPCC2006_PRIMAP" + ".nc"), encoding=encoding)

+ 0 - 141
UNFCCC_GHG_data/UNFCCC_reader/Morocco/config_MAR_BUR3.py

@@ -1,141 +0,0 @@
-# define which raw tables to combine
-table_defs = {
-    2010: {
-        'Energy': [0, 1],
-        'Agriculture': [10],
-        'IPPU': [15, 16, 17],
-        'LULUCF': [30],
-        'Waste': [35],
-    },
-    2012: {
-        'Energy': [2, 3],
-        'Agriculture': [11],
-        'IPPU': [18, 19, 20],
-        'LULUCF': [31],
-        'Waste': [36],
-    },
-    2014: {
-        'Energy': [4, 5],
-        'Agriculture': [10],
-        'IPPU': [21, 22, 23],
-        'LULUCF': [32],
-        'Waste': [37],
-    },
-    2016: {
-        'Energy': [6, 7],
-        'Agriculture': [10],
-        'IPPU': [24, 25, 26],
-        'LULUCF': [33],
-        'Waste': [38],
-    },
-    2018: {
-        'Energy': [8, 9],
-        'Agriculture': [14],
-        'IPPU': [27, 28, 29],
-        'LULUCF': [34],
-        'Waste': [39],
-    },
-}
-
-header_defs = {
-    'Energy': [['Catégories', 'CO2', 'CH4', 'N2O', 'NOx', 'CO', 'COVNM', 'SO2'],
-        ['', 'Gg', 'Gg', 'Gg', 'Gg', 'Gg', 'Gg', 'Gg']],
-    'Agriculture': [['Catégories', 'CO2', 'CH4', 'N2O', 'NOx', 'CO', 'COVNM', 'SO2'],
-        ['', 'Gg', 'GgCO2eq', 'GgCO2eq', 'Gg', 'Gg', 'Gg', 'Gg']], # units are wrong
-    # in BUR pdf
-    'IPPU': [['Catégories', 'CO2', 'CH4', 'N2O', 'HFCs', 'PFCs', 'SF6', 'NOx', 'CO', 'COVNM', 'SO2'],
-        ['', 'GgCO2eq', 'GgCO2eq', 'GgCO2eq', 'GgCO2eq', 'GgCO2eq', 'GgCO2eq', 'Gg', 'Gg', 'Gg', 'Gg']],
-    'LULUCF': [['Catégories', 'CO2', 'CH4', 'N2O', 'NOx', 'CO', 'COVNM', 'SO2'],
-        ['', 'GgCO2eq', 'GgCO2eq', 'GgCO2eq', 'Gg', 'Gg', 'Gg', 'Gg']],
-    'Waste': [['Catégories', 'CO2', 'CH4', 'N2O', 'NOx', 'CO', 'COVNM', 'SO2'],
-        ['', 'GgCO2eq', 'GgCO2eq', 'GgCO2eq', 'Gg', 'Gg', 'Gg', 'Gg']],
-}
-
-remove_cats = ['3.A.4', '3.B', '3.B.4', '1.B.2.a', '1.B.2.b', '1.B.2.c']
-
-cat_mapping = {
-    "1.B.2.a.4": "1.B.2.a.iii.4",
-    "1.B.2.a.5": "1.B.2.a.iii.5",
-    "1.B.2.a.6": "1.B.2.a.iii.6",
-    "1.B.2.b.2": "1.B.2.b.iii.2",
-    "1.B.2.b.4": "1.B.2.b.iii.4",
-    "1.B.2.b.5": "1.B.2.b.iii.5",
-    "1.B.2.b.6": "1.B.2.b.iii.6",
-    "1.B.2.c.1": "1.B.2.b.i", # simplification, split to oil and gas ("1.B.2.X.i")
-    "1.B.2.c.2": "1.B.2.b.ii", # simplification, split to oil and gas ("1.B.2.X.ii")
-    '1.A.2.g': '1.A.2.m', # other industry
-    '3.A': '3.A.1', # enteric fermentation
-    '3.A.1': '3.A.1.a', # cattle
-    '3.A.1.a': '3.A.1.a.i',
-    '3.A.1.b': '3.A.1.a.ii',
-    '3.A.2': '3.A.1.c',
-    '3.A.3': '3.A.1.h', # Swine
-    '3.A.4.a': '3.A.1.d', # goats
-    '3.A.4.b': '3.A.1.e', # camels
-    '3.A.4.c': '3.A.1.f', # horses
-    '3.A.4.d': '3.A.1.g', # Mules and asses
-    '3.A.4.e': '3.A.1.i', # poultry
-#    '3.B': '3.A.2', # Manure Management
-    '3.B.1': '3.A.2.a', # cattle
-    '3.B.1.a': '3.A.2.a.i',
-    '3.B.1.b': '3.A.2.a.ii',
-    '3.B.2': '3.A.2.c', # Sheep
-    '3.B.3': '3.A.2.h', # Swine
-    '3.B.4.a': '3.A.2.d', # Goats
-    '3.B.4.b': '3.A.2.e', # Camels
-    '3.B.4.c': '3.A.2.f', # Horses
-    '3.B.4.d': '3.A.2.g', # Mules and Asses
-    '3.B.4.e': '3.A.2.i', # Poultry
-    '3.B.5': '3.C.6', # indirect N2O from manure management
-    '3.C': '3.C.7', # rice
-    '3.D': 'M.3.C.45AG', # Agricultural soils
-    '3.D.a': '3.C.4', #direct N2O from agri soils
-    '3.D.a.1': '3.C.4.a', # inorganic fertilizers
-    '3.D.a.2': '3.C.4.b', # organic fertilizers
-    '3.D.a.3': '3.C.4.c', # urine and dung by grazing animals
-    '3.D.a.4': '3.C.4.d', # N in crop residues
-    '3.D.b': '3.C.5', # indirect N2O from managed soils
-    '3.D.b.1': '3.C.5.a', # Atmospheric deposition
-    '3.D.b.2': '3.C.5.b', # nitrogen leaching and runoff
-    '3.H': '3.C.3', # urea application
-    'LU.3.B.1': '3.B.1', # forest
-    'LU.3.B.2': '3.B.2', # cropland
-    'LU.3.B.3': '3.B.3', # grassland
-    'LU.3.B.4': '3.B.4', # wetland
-    'LU.3.B.5': '3.B.5', # Settlements
-    'LU.3.B.6': '3.B.6', # other land
-}
-
-aggregate_cats = {
-    '1.B.2.a.iii': {'sources': ['1.B.2.a.iii.4', '1.B.2.a.iii.5', '1.B.2.a.iii.6'],
-                    'name': 'All Other'},
-    '1.B.2.b.iii': {'sources': ['1.B.2.b.iii.2', '1.B.2.b.iii.4', '1.B.2.b.iii.5',
-                                '1.B.2.b.iii.6',],
-                    'name': 'All Other'},
-    '1.B.2.a': {'sources': ['1.B.2.a.iii'], 'name': 'Oil'},
-    '1.B.2.b': {'sources': ['1.B.2.b.i', '1.B.2.b.ii', '1.B.2.b.iii'],
-                'name': 'Natural Gas'},
-    '2.D':  {'sources': ['2.D.4'], 'name': 'Non-Energy Products from Fuels and Solvent Use'},
-    '2.F.1':  {'sources': ['2.F.1.a', '2.F.1.b'], 'name': 'Refrigeration and Air Conditioning'},
-    '2.F':  {'sources': ["2.F.1", "2.F.2", "2.F.3", "2.F.4", "2.F.5", "2.F.6"],
-             'name': 'Product uses as Substitutes for Ozone Depleting Substances'},
-    '2.H':  {'sources': ["2.H.1", "2.H.2", "2.H.3"], 'name': 'Other'},
-    '3.A.2': {'sources': ['3.A.2.a', '3.A.2.c', '3.A.2.d', '3.A.2.e', '3.A.2.f',
-                          '3.A.2.g', '3.A.2.h', '3.A.2.i'],
-              'name': 'Manure Management'},
-    '3.A': {'sources': ['3.A.1', '3.A.2'], 'name': 'Livestock'},
-    '3.B': {'sources': ['3.B.1', '3.B.2', '3.B.3', '3.B.4', '3.B.5', '3.B.6'], 'name': 'Land'},
-    '3.C': {'sources': ['3.C.3', '3.C.4', '3.C.5', '3.C.6', '3.C.7'],
-            'name': 'Aggregate sources and non-CO2 emissions sources on land'},
-    'M.3.C.AG': {'sources': ['3.C.3', '3.C.4', '3.C.5', '3.C.6', '3.C.7'],
-            'name': 'Aggregate sources and non-CO2 emissions sources on land (Agriculture)'},
-    'M.AG': {'sources': ['3.A', 'M.3.C.AG'], 'name': 'Agriculture'},
-    '3': {'sources': ['M.AG', 'M.LULUCF'], 'name': 'AFOLU'},
-    'M.AG.ELV': {'sources': ['M.3.C.AG'], 'name': 'Agriculture excluding livestock emissions'},
-    '4': {'sources': ['4.A', '4.D'], 'name': 'Waste'},
-    '0': {'sources': ['1', '2', '3', '4']},
-    'M.0.EL': {'sources': ['1', '2', 'M.AG', '4']},
-}
-
-zero_cats = ['1.B.2.a.i', '1.B.2.a.ii'] # venting and flaring with 0 for oil as
-# all mapped to natural gas

+ 0 - 324
UNFCCC_GHG_data/UNFCCC_reader/Morocco/read_MAR_BUR3_from_pdf.py

@@ -1,324 +0,0 @@
-# this script reads data from Morocco's BUR3
-# Data is read from pdf
-
-import camelot
-import primap2 as pm2
-import pandas as pd
-import copy
-
-from config_MAR_BUR3 import zero_cats, cat_mapping, aggregate_cats, remove_cats, \
-    table_defs, header_defs
-from primap2.pm2io._data_reading import matches_time_format, filter_data
-from UNFCCC_GHG_data.helper import extracted_data_path, downloaded_data_path
-
-# ###
-# configuration
-# ###
-input_folder = downloaded_data_path / 'UNFCCC' / 'Morocco' / 'BUR3'
-output_folder = extracted_data_path / 'UNFCCC' / 'Morocco'
-output_filename = 'MAR_BUR3_2022_'
-inventory_file = 'Morocco_BUR3_Fr.pdf'
-gwp_to_use = 'AR4GWP100'
-
-# years to read
-years = [2010, 2012, 2014, 2016, 2018]
-pages_to_read = range(104, 138)
-
-compression = dict(zlib=True, complevel=9)
-
-# special header as category code and name in one column
-header_long = ["orig_cat_name", "entity", "unit", "time", "data"]
-
-index_cols = ['Catégories']
-
-# rows to remove
-cats_remove = [
-    'Agriculture' # always empty
-]
-
-# manual category codes
-cat_codes_manual = {
-    '1.A.2.e -Industries agro-alimentaires et du tabac': '1.A.2.e',
-    '1.A.2.f -Industries des minéraux non- métalliques': '1.A.2.f',
-    #'Agriculture': 'M.AG',
-    '2. PIUP': '2',
-    'UTCATF': 'M.LULUCF',
-    '3.B.1 Terres forestières': 'LU.3.B.1',
-    '3.B.2 Terres cultivées': 'LU.3.B.2',
-    '3.B.3 Prairies': 'LU.3.B.3',
-    '3.B.4 Terres humides': 'LU.3.B.4',
-    '3.B.5 Etablissements': 'LU.3.B.5',
-    '3.B.6 Autres terres': 'LU.3.B.6',
-    '1.B.1.a.i.1 -Exploitation minière': '1.A.1.a.i.1',
-}
-
-cat_code_regexp = r'(?P<UNFCCC_GHG_data>^[a-zA-Z0-9\.]{1,14})\s-\s.*'
-
-coords_terminologies = {
-    "area": "ISO3",
-    "category": "IPCC1996_2006_MAR_Inv",
-    "scenario": "PRIMAP",
-}
-
-coords_defaults = {
-    "source": "MAR-GHG-inventory ",
-    "provenance": "measured",
-    "area": "MAR",
-    "scenario": "BUR3"
-}
-
-coords_value_mapping = {
-    "unit": "PRIMAP1",
-    "entity": {
-        'HFCs (AR4GWP100)': 'HFCS (AR4GWP100)',
-        'PFCs (AR4GWP100)': 'PFCS (AR4GWP100)',
-        'COVNM': 'NMVOC',
-    }
-}
-
-
-coords_cols = {
-    "category": "category",
-    "entity": "entity",
-    "unit": "unit"
-}
-
-#add_coords_cols = {
-#    "orig_cat_name": ["orig_cat_name", "category"],
-#}
-
-filter_remove = {
-    "f1": {
-        "entity": ['Other halogenated gases without CO2 equivalent conversion factors (2)'],
-    },
-}
-
-meta_data = {
-    "references": "https://unfccc.int/documents/470340",
-    "rights": "XXXX",
-    "contact": "mail@johannes-guetschow.de",
-    "title": "Morocco. Biennial update report (BUR). BUR 3.",
-    "comment": "Read from pdf file by Johannes Gütschow.",
-    "institution": "United Nations Framework Convention on Climate Change (UNFCCC)",
-}
-
-##### read the raw data from pdf #####
-tables = camelot.read_pdf(
-    str(input_folder / inventory_file),
-    pages=','.join([str(page) for page in pages_to_read]),
-    flavor='lattice')
-
-##### combine tables and convert to long format #####
-df_all = None
-for year in table_defs.keys():
-    current_def = table_defs[year]
-    for sector in current_def.keys():
-        sector_tables = current_def[sector]
-        # print(f"{year}, {sector}")
-        df_first = tables[sector_tables[0]].df
-        # combine all raw tables of the sector, not just the first and last one
-        df_this_table = df_first
-        for table in sector_tables[1:]:
-            df_this_table = pd.concat(
-                [df_this_table, tables[table].df], axis=0, join='outer'
-            )
-
-        # fix the header
-        df_this_table = df_this_table.drop(df_this_table.iloc[0:2].index)
-        df_this_table.columns = header_defs[sector]
-
-        # fix 2018 agri table
-        if (year == 2018) & (sector == "Agriculture"):
-            last_shift_row = 25
-            df_temp = df_this_table.iloc[0: last_shift_row, 1:].copy()
-            df_this_table.iloc[0, 1:] = ''
-            df_this_table.iloc[1: last_shift_row + 1, 1:] = df_temp
-
-        # replace line breaks, long hyphens, double, and triple spaces in category names
-        df_this_table.iloc[:, 0] = df_this_table.iloc[:, 0].str.replace("\n", " ")
-        df_this_table.iloc[:, 0] = df_this_table.iloc[:, 0].str.replace("   ", " ")
-        df_this_table.iloc[:, 0] = df_this_table.iloc[:, 0].str.replace("  ", " ")
-        df_this_table.iloc[:, 0] = df_this_table.iloc[:, 0].str.replace("–", "-")
-
-        # set index and convert to long format
-        df_this_table = df_this_table.set_index(index_cols)
-        df_this_table_long = pm2.pm2io.nir_convert_df_to_long(df_this_table, year,
-                                                              header_long)
-
-        # print(df_this_table_long.head())
-        if df_all is None:
-            df_all = df_this_table_long
-        else:
-            df_all = pd.concat([df_all, df_this_table_long], axis=0, join='outer')
-
-df_all = df_all.reset_index(drop=True)
-
-##### conversion to PRIMAP2 interchange format #####
-# drop the rows with memo items etc
-for cat in cats_remove:
-    df_all = df_all.drop(df_all[df_all["orig_cat_name"] == cat].index)
-
-# make a copy of the categories column
-df_all["category"] = df_all["orig_cat_name"]
-
-# replace cat names by codes in col "category"
-# first the manual replacements
-df_all["category"] = df_all["category"].replace(cat_codes_manual)
-# then the regex replacements
-repl = lambda m: m.group('UNFCCC_GHG_data')
-df_all["category"] = df_all["category"].str.replace(cat_code_regexp, repl, regex=True)
-df_all = df_all.reset_index(drop=True)
-
-# prepare numbers for pd.to_numeric
-df_all.loc[:, "data"] = df_all.loc[:, "data"].str.replace(' ', '')
-repl = lambda m: m.group('part1') + '.' +  m.group('part2')
-df_all.loc[:, 'data'] = df_all.loc[:, 'data'].str.replace(
-    r'(?P<part1>[0-9]+),(?P<part2>[0-9\.]+)$', repl, regex=True)
-df_all.loc[df_all['data'].isnull(), 'data'] = 'NaN'
-
-# add GWP information to entity
-for entity in df_all["entity"].unique():
-    df_all.loc[(df_all["entity"] == entity) & (
-                df_all["unit"] == "GgCO2eq"), "entity"] = f"{entity} ({gwp_to_use})"
-
-# drop "orig_cat_name" as it has non-unique values per category
-df_all = df_all.drop(columns="orig_cat_name")
-
-data_if = pm2.pm2io.convert_long_dataframe_if(
-    df_all,
-    coords_cols=coords_cols,
-    coords_defaults=coords_defaults,
-    coords_terminologies=coords_terminologies,
-    coords_value_mapping=coords_value_mapping,
-    filter_remove=filter_remove,
-    meta_data=meta_data,
-    convert_str=True
-)
-
-# make sure all col headers are str
-df_all.columns = df_all.columns.map(str)
-
-# conversion to PRIMAP2 native format
-data_pm2 = pm2.pm2io.from_interchange_format(data_if)
-
-entities_to_convert = ['CO2'] #['N2O', 'SF6', 'CO2', 'CH4'] # CO2 is not converted on
-# conversion to IF as data with and without GWP exists. needs to be fixed in primap2
-entities_to_convert = [f"{entity} (AR4GWP100)" for entity in entities_to_convert]
-
-# convert GWP units to mass units
-for entity in entities_to_convert:
-    converted = data_pm2[entity].pr.convert_to_mass()
-    basic_entity = entity.split(" ")[0]
-    converted = converted.to_dataset(name=basic_entity)
-    data_pm2 = data_pm2.pr.merge(converted)
-    data_pm2[basic_entity].attrs["entity"] = basic_entity
-
-# drop the GWP data
-data_pm2 = data_pm2.drop_vars(entities_to_convert)
-
-# convert back to IF to have units in the fixed format
-data_if = data_pm2.pr.to_interchange_format()
-
-# ###
-# convert to IPCC2006 categories
-# ###
-data_if_2006 = copy.deepcopy(data_if)
-data_if_2006.attrs = copy.deepcopy(data_if.attrs)
-
-filter_remove_cats = {
-    "cat": {
-        f"category ({coords_terminologies['category']})":
-    remove_cats
-    },
-}
-
-filter_data(data_if_2006, filter_remove=filter_remove_cats)
-
-# map categories
-data_if_2006 = data_if_2006.replace(
-    {f"category ({coords_terminologies['category']})": cat_mapping})
-data_if_2006[f"category ({coords_terminologies['category']})"].unique()
-
-# rename the category col
-data_if_2006.rename(columns={
-    f"category ({coords_terminologies['category']})": 'category (IPCC2006_PRIMAP)'},
-                    inplace=True)
-data_if_2006.attrs['attrs']['cat'] = 'category (IPCC2006_PRIMAP)'
-data_if_2006.attrs['dimensions']['*'] = [
-    'category (IPCC2006_PRIMAP)' if item == f"category ({coords_terminologies['category']})"
-    else item for item in data_if_2006.attrs['dimensions']['*']]
-# aggregate categories
-time_format = '%Y'
-time_columns = [
-    col
-    for col in data_if_2006.columns.values
-    if matches_time_format(col, time_format)
-]
-
-for cat_to_agg in aggregate_cats:
-    mask = data_if_2006["category (IPCC2006_PRIMAP)"].isin(
-        aggregate_cats[cat_to_agg]["sources"])
-    df_test = data_if_2006[mask]
-    # print(df_test)
-
-    if len(df_test) > 0:
-        print(f"Aggregating category {cat_to_agg}")
-        df_combine = df_test.copy(deep=True)
-
-        for col in time_columns:
-            df_combine[col] = pd.to_numeric(df_combine[col], errors="coerce")
-
-        df_combine = df_combine.groupby(
-            by=['source', 'scenario (PRIMAP)', 'provenance', 'area (ISO3)', 'entity',
-                'unit']).sum(min_count=1)
-
-        df_combine.insert(0, "category (IPCC2006_PRIMAP)", cat_to_agg)
-        # df_combine.insert(1, "cat_name_translation", aggregate_cats[cat_to_agg]["name"])
-        # df_combine.insert(2, "orig_cat_name", "computed")
-
-        df_combine = df_combine.reset_index()
-
-        data_if_2006 = pd.concat([data_if_2006, df_combine], axis=0, join='outer')
-        data_if_2006 = data_if_2006.reset_index(drop=True)
-    else:
-        print(f"no data to aggregate category {cat_to_agg}")
-
-for cat in zero_cats:
-    entities = data_if_2006["entity"].unique()
-    data_zero = data_if_2006[data_if_2006["category (IPCC2006_PRIMAP)"]=="1"].copy(
-        deep=True)
-    data_zero["category (IPCC2006_PRIMAP)"] = cat
-    for col in time_columns:
-        data_zero[col] = 0
-
-    data_if_2006 = pd.concat([data_if_2006, data_zero])
-
-# conversion to PRIMAP2 native format
-data_pm2_2006 = pm2.pm2io.from_interchange_format(data_if_2006)
-
-# convert back to IF to have units in the fixed format
-data_if_2006 = data_pm2_2006.pr.to_interchange_format()
-
-
-# ###
-# save data to IF and native format
-# ###
-if not output_folder.exists():
-    output_folder.mkdir()
-
-# data in original categories
-pm2.pm2io.write_interchange_format(
-    output_folder / (output_filename + coords_terminologies["category"]), data_if)
-
-encoding = {var: compression for var in data_pm2.data_vars}
-data_pm2.pr.to_netcdf(
-    output_folder / (output_filename + coords_terminologies["category"] + ".nc"),
-    encoding=encoding)
-
-# data in 2006 categories
-pm2.pm2io.write_interchange_format(
-    output_folder / (output_filename + "IPCC2006_PRIMAP"), data_if_2006)
-
-encoding = {var: compression for var in data_pm2_2006.data_vars}
-data_pm2_2006.pr.to_netcdf(
-    output_folder / (output_filename + "IPCC2006_PRIMAP" + ".nc"), encoding=encoding)
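
Review note on the number handling in `read_MAR_BUR3_from_pdf.py`: the script strips spaces and swaps the French decimal comma for a point before the values are parsed. The snippet below is a minimal, standalone sketch of that step and is not part of the removed file. The helper name `normalise_number_strings` and the sample values are hypothetical and chosen only for illustration.

```python
import pandas as pd


def normalise_number_strings(values: pd.Series) -> pd.Series:
    """Turn strings such as '1 234,5' into '1234.5' so pd.to_numeric can parse them."""
    # remove spaces used as thousands separators
    cleaned = values.str.replace(" ", "", regex=False)
    # replace a decimal comma at the end of the string by a decimal point
    cleaned = cleaned.str.replace(
        r"(?P<part1>[0-9]+),(?P<part2>[0-9\.]+)$",
        lambda m: m.group("part1") + "." + m.group("part2"),
        regex=True,
    )
    # missing cells become the string 'NaN', as in the script
    return cleaned.fillna("NaN")


# made-up sample values, not taken from the BUR
raw = pd.Series(["1 234,5", "0,75", "12", None])
parsed = pd.to_numeric(normalise_number_strings(raw), errors="coerce")
```

Keeping the step in a small function like this would make it possible to unit-test the parsing separately from the PDF reading.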

+ 0 - 458
UNFCCC_GHG_data/UNFCCC_reader/Nigeria/config_NGA_BUR2.py

@@ -1,458 +0,0 @@
-gwp_to_use = 'AR5GWP100'
-
-tables_trends = {
-    '70': { # GHG by main sector
-        'page': '70',
-        'area': ['177,430,450,142'],
-        'cols': ['208,260,311,355,406'],
-        'coords_defaults': {
-            'unit': 'GgCO2eq',
-        },
-        'coords_cols': {
-            "category": "Year",
-            "entity": "entity",
-        },
-        'copy_cols': {
-            # to: from
-            'entity': 'Year',
-        },
-        'coords_value_mapping': {
-            "unit": "PRIMAP1",
-            'category': {
-                'Total emissions': '0',
-                'Energy': '1',
-                'IPPU': '2',
-                'AFOLU': '3',
-                'Waste': '4',
-            },
-            'entity': {
-                'Total emissions': f'KYOTOGHG emissions ({gwp_to_use})',
-                'Energy': f'KYOTOGHG ({gwp_to_use})',
-                'IPPU': f'KYOTOGHG ({gwp_to_use})',
-                'AFOLU': f'KYOTOGHG emissions ({gwp_to_use})',
-                'Waste': f'KYOTOGHG ({gwp_to_use})',
-            },
-        },
-        'label_rows': [0, 1, 2],
-    },
-    '71': { # main gases by sector
-    'page': '71',
-        'area': ['82,760,509,454'],
-        'cols': ['124,186,249,326,388,454'],
-        'coords_defaults': {
-            'category': '0',
-            'unit': 'GgCO2eq',
-        },
-        'coords_cols': {
-            "entity": "Year",
-        },
-        'remove_cols': [],
-        'coords_value_mapping': {
-            "unit": "PRIMAP1",
-            'entity': {
-                'Total GHG emissions (CO₂-eq)': f'KYOTOGHG emissions ({gwp_to_use})',
-                'Removals (CO₂) (CO₂-eq)': 'CO2 removals',
-                'Net emissions (CO₂-eq)': f'KYOTOGHG ({gwp_to_use})',
-                'CO₂ (Gg)': 'CO2 emissions',
-                'CH₄ (CO₂-eq)': f'CH4 ({gwp_to_use})',
-                'N₂O (CO₂-eq)': f'N2O ({gwp_to_use})',
-            },
-        },
-        'label_rows':  [0, 1, 2, 3, 4],
-    },
-    '72_1': { # CO2 by main sector
-    'page': '72',
-        'area': ['122,760,496,472'],
-        'cols': ['159,212,265,311,355,406,456'],
-        'coords_defaults': {
-            #'entity': 'CO2',
-            'unit': 'Gg',
-        },
-        'coords_cols': {
-            "category": "Year",
-            'entity': 'entity',
-        },
-        'remove_cols': ['Total emissions'],
-        'copy_cols': {
-            # to: from
-            'entity': 'Year',
-        },
-        'coords_value_mapping': {
-            "unit": "PRIMAP1",
-            'category': {
-                'Total net emissions': '0',
-                'Energy': '1',
-                'IPPU': '2',
-                'AFOLU - emissions': '3',
-                'AFOLU - removals': '3',
-                'Waste': '4',
-            },
-            'entity': {
-                'Total net emissions': 'CO2',
-                'Energy': 'CO2',
-                'IPPU': 'CO2',
-                'AFOLU - emissions': 'CO2 emissions',
-                'AFOLU - removals': 'CO2 removals',
-                'Waste': 'CO2',
-            },
-        },
-        'label_rows':  [0, 1, 2],
-    },
-    '72_2': { # CH4 by sector
-    'page': '72',
-        'area': ['133,333,483,41'],
-        'cols': ['172,230,280,333,384,439'],
-        'coords_defaults': {
-            'entity': 'CH4',
-            'unit': 'Gg',
-        },
-        'coords_cols': {
-            "category": "Year",
-        },
-        'remove_cols': ['Total (Gg CO₂-eq)'],
-        'coords_value_mapping': {
-            "unit": "PRIMAP1",
-            'category': {
-                'Total': '0',
-                'Energy': '1',
-                'IPPU': '2',
-                'AFOLU - emissions': '3',
-                'Waste': '4',
-            },
-        },
-        'label_rows':  [0, 1, 2],
-    },
-    '73': { # N2O by sector
-    'page': '73',
-        'area': ['155,666,643,364'],
-        'cols': ['194,265,309,366,419'],
-        'coords_defaults': {
-            'entity': 'N2O',
-            'unit': 'Gg',
-        },
-        'coords_cols': {
-            "category": "Year",
-        },
-        'remove_cols': ['Total emissions (Gg CO₂-eq)'],
-        'coords_value_mapping': {
-            "unit": "PRIMAP1",
-            'category': {
-                'Total': '0',
-                'Energy': '1',
-                'AFOLU': '3',
-                'Waste': '4',
-            },
-        },
-        'label_rows':  [0, 1, 2],
-    },
-    '74': { # NOx by sector
-    'page': '74',
-        'area': ['148,457,467,166'],
-        'cols': ['190,254,304,359,421'],
-        'coords_defaults': {
-            'entity': 'NOX',
-            'unit': 'Gg',
-        },
-        'coords_cols': {
-            "category": "Year",
-        },
-        #'remove_cols': [],
-        'coords_value_mapping': {
-            "unit": "PRIMAP1",
-            'category': {
-                'Total emissions': '0',
-                'Energy': '1',
-                'IPPU': '2',
-                'AFOLU': '3',
-                'Waste': '4',
-            },
-        },
-        'label_rows':  [0, 1, 2],
-    },
-    '75': { # CO by sector
-    'page': '75',
-        'area': ['161,763,456,472'],
-        'cols': ['199,256,307,359,410'],
-        'coords_defaults': {
-            'entity': 'CO',
-            'unit': 'Gg',
-        },
-        'coords_cols': {
-            "category": "Year",
-        },
-        #'remove_cols': ['Total emissions (Gg CO2-eq)'],
-        'coords_value_mapping': {
-            "unit": "PRIMAP1",
-            'category': {
-                'Total emissions': '0',
-                'Energy': '1',
-                'IPPU': '2',
-                'AFOLU': '3',
-                'Waste': '4',
-            },
-        },
-        'label_rows':  [0, 1, 2],
-    },
-    '75_2': { # NMVOC by sector
-    'page': '75',
-        'area': ['177,325,441,50'],
-        'cols': ['219,287,340,395'],
-        'coords_defaults': {
-            'entity': 'NMVOC',
-            'unit': 'Gg',
-        },
-        'coords_cols': {
-            "category": "Year",
-        },
-        #'remove_cols': ['Total emissions (Gg CO2-eq)'],
-        'coords_value_mapping': {
-            "unit": "PRIMAP1",
-            'category': {
-                'Total emissions': '0',
-                'Energy': '1',
-                'IPPU': '2',
-                'Waste': '4',
-            },
-        },
-        'label_rows':  [0, 1, 2],
-    },
-    '76_1': { # NMVOC by sector
-    'page': '76',
-        'area': ['175,782,448,675'],
-        'cols': ['216,282,340,390'],
-        'coords_defaults': {
-            'entity': 'NMVOC',
-            'unit': 'Gg',
-        },
-        'coords_cols': {
-            "category": "Year",
-        },
-        #'remove_cols': ['Total emissions (Gg CO2-eq)'],
-        'coords_value_mapping': {
-            "unit": "PRIMAP1",
-            'category': {
-                'Total emissions': '0',
-                'Energy': '1',
-                'IPPU': '2',
-                'Waste': '4',
-            },
-        },
-        'label_rows':  [0, 1, 2],
-    },
-    '76_2': { # SO2 by sector
-    'page': '76',
-        'area': ['197,562,421,226'],
-        'cols': ['243,331,381'],
-        'coords_defaults': {
-            'entity': 'SO2',
-            'unit': 'Gg',
-        },
-        'coords_cols': {
-            "category": "Year",
-        },
-        #'remove_cols': ['Total emissions (Gg CO2-eq)'],
-        'coords_value_mapping': {
-            "unit": "PRIMAP1",
-            'category': {
-                'Total emissions': '0',
-                'Energy': '1',
-                'Waste': '4',
-            },
-        },
-        'label_rows':  [0],
-    },
-}
-
-pages_inventory = {
-    '78': 1,
-    '79': 0,
-    '80': 0,
-    '81': 0,
-    '82': 0,
-}
-
-year_inventory = 2017
-entity_row = 1
-unit_row = 0
-
-
-###
-index_cols = "Categories"
-units_inv = {
-    'Emissions (Gg)': 'Gg',
-    'Emissions CO2 Equivalents (Gg)': 'GgCO2eq',
-}
-# special header as category code and name in one column
-header_long = ["category", "entity", "unit", "time", "data"]
-
-
-# manual category codes
-cat_codes_manual = {
-    'Total National Emissions and Removals': '0',
-    'International Bunkers': 'M.BK',
-}
-
-cat_code_regexp = r'(?P<code>^[a-zA-Z0-9\.]{1,9})\s.*'
-
-coords_cols = {
-    "category": "category",
-    "entity": "entity",
-    "unit": "unit",
-}
-
-# add_coords_cols = {
-#     "orig_cat_name": ["orig_cat_name", "category"],
-# }
-
-coords_terminologies = {
-    "area": "ISO3",
-    "category": "IPCC2006_PRIMAP",
-    "scenario": "PRIMAP",
-}
-
-coords_defaults = {
-    "source": "NGA-GHG-Inventory",
-    "provenance": "measured",
-    "area": "NGA",
-    "scenario": "BUR2",
-}
-
-coords_value_mapping = {
-    "unit": "PRIMAP1",
-    "category": "PRIMAP1",
-    "entity": {
-        'Net CO2 (1)(2)': 'CO2',
-        'CH4': f"CH4",
-        'N2O': f"N2O",
-        'HFCs': f"HFCS ({gwp_to_use})",
-        'PFCs': f"PFCS ({gwp_to_use})",
-        'SF6': f"SF6 ({gwp_to_use})",
-        #'NOx': 'NOX',
-        'CO': 'CO', # no mapping, just added for completeness here
-        'NMVOCs': 'NMVOC',
-        'SO2': 'SO2', # no mapping, just added for completeness here
-        'Other halogenated gases with CO2 eq conversion factors (3)':
-            f"UnspMixOfHFCs ({gwp_to_use})",
-    },
-}
-
-
-filter_remove = {
-    'f1': {
-        'entity': ['Other halogenated gases without CO2 eq conversion factors (4)']
-    },
-    'f2': {
-        'category': 'Memo'
-    },
-}
-
-filter_keep = {}
-
-meta_data = {
-    "references": "https://unfccc.int/documents/307085",
-    "rights": "",
-    "contact": "mail@johannes-guetschow.de",
-    "title": "Nigeria. Second Biennial Update Report (BUR2) to the United Nations "
-             "Framework Convention on Climate Change",
-    "comment": "Read from pdf by Johannes Gütschow",
-    "institution": "UNFCCC",
-}
-
-# convert to mass units where possible
-entities_to_convert_to_mass = [
-    'CH4', 'N2O', 'SF6'
-]
-
-# CO2 equivalents don't make sense for these substances, so unit has to be Gg instead of Gg CO2 equivalents as indicated in the table
-entities_to_fix_unit = [
-    'NOx', 'CO', 'NMVOCs', 'SO2'
-]
-
-### processing
-
-processing_info_step1 = {
-    'aggregate_cats': {
-        '2.F': {'sources': ['2.F.2', '2.F.6'], # all 0, but for completeness
-              'name': 'Product uses as Substitutes for Ozone Depleting Substances'},
-        '2': {'sources': ['2.A', '2.B', '2.C', '2.D', '2.E', '2.F', '2.G'],
-              'name': 'IPPU'}, # for HFCs, PFCs, SO2, SF6, N2O (all 0)
-    },
-}
-
-processing_info_step2 =  {
-    'aggregate_cats': {
-        'M.AG.ELV': {'sources': ['3.C'], 'name': 'Agriculture excluding livestock emissions'},
-        'M.AG': {'sources': ['M.AG.ELV', '3.A'], 'name': 'Agriculture'},
-        'M.LULUCF': {'sources': ['3.B', '3.D'],
-                     'name': 'Land Use, Land Use Change, and Forestry'},
-        'M.0.EL': {'sources': ['1', '2', 'M.AG', '4', '5'], 'name': 'National Total Excluding LULUCF'},
-        '0': {'sources': ['1', '2', '3', '4', '5'], 'name': 'National Total'},
-    },
-    'downscale': {
-        'sectors': {
-            '1': {
-                'basket': '1',
-                'basket_contents': ['1.A', '1.B', '1.C'],
-                'entities': ['CO2', 'N2O', 'CH4'],
-                'dim': 'category (IPCC2006_PRIMAP)',
-            },
-            '1.A': {
-                'basket': '1.A',
-                'basket_contents': ['1.A.1', '1.A.2', '1.A.3', '1.A.4'],
-                'entities': ['CO2', 'N2O', 'CH4'],
-                'dim': 'category (IPCC2006_PRIMAP)',
-            },
-            '1.B': {
-                'basket': '1.B',
-                'basket_contents': ['1.B.1', '1.B.2', '1.B.3'],
-                'entities': ['CO2', 'N2O', 'CH4'],
-                'dim': 'category (IPCC2006_PRIMAP)',
-            },
-            'IPPU': {
-                'basket': '2',
-                'basket_contents': ['2.A', '2.B', '2.C', '2.D', '2.E',
-                                    '2.F', '2.G', '2.H'],
-                'entities': ['CO2', 'N2O', 'CH4'],
-                'dim': 'category (IPCC2006_PRIMAP)',
-            },
-            '3': {
-                'basket': '3',
-                'basket_contents': ['3.A', '3.B', '3.C', '3.D'],
-                'entities': ['CO2', 'CH4', 'N2O'],
-                'dim': 'category (IPCC2006_PRIMAP)',
-            },
-            # '3A': {
-            #     'basket': '3.A',
-            #     'basket_contents': ['3.A.1', '3.A.2'],
-            #     'entities': ['CH4', 'N2O'],
-            #     'dim': 'category (IPCC2006_PRIMAP)',
-            # },
-            # '3C': {
-            #     'basket': '3.C',
-            #     'basket_contents': ['3.C.1', '3.C.2', '3.C.3', '3.C.4', '3.C.5',
-            #                         '3.C.6', '3.C.7', '3.C.8'],
-            #     'entities': ['CO2', 'CH4', 'N2O'],
-            #     'dim': 'category (IPCC2006_PRIMAP)',
-            # },
-            # '3D': {
-            #     'basket': '3.D',
-            #     'basket_contents': ['3.D.1', '3.D.2'],
-            #     'entities': ['CO2', 'CH4', 'N2O'],
-            #     'dim': 'category (IPCC2006_PRIMAP)',
-            # },
-        },
-    },
-    'remove_ts': {
-        'fgases': { # unnecessary and complicates aggregation for
-            # other gases
-            'category': ['5'],
-            'entities': [f'HFCS ({gwp_to_use})', f'PFCS ({gwp_to_use})', 'SF6',
-                         f'UnspMixOfHFCs ({gwp_to_use})'],
-        },
-    },
-    'basket_copy': {
-        'GWPs_to_add': ["SARGWP100", "AR4GWP100", "AR6GWP100"],
-        'entities': ["HFCS", "PFCS", "UnspMixOfHFCs"],
-        'source_GWP': gwp_to_use,
-    },
-}
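
Each `aggregate_cats` entry in the config above maps a target category code to a list of `sources`. The following is a minimal sketch of what such an aggregation amounts to. It assumes plain dictionaries instead of the PRIMAP2 datasets that `process_data_for_country` operates on; `aggregate_categories` is a hypothetical helper, not part of `UNFCCC_GHG_data`, and the "at least one valid source" rule is an assumption.

```python
import math


def aggregate_categories(values, aggregate_cats):
    """Sum source categories into target categories (hypothetical helper).

    values: dict mapping category code -> float (NaN means missing).
    aggregate_cats: dict shaped like the config above.
    Targets are processed in insertion order, so a later target can use
    an earlier one as a source (e.g. 'M.AG' uses 'M.AG.ELV').
    """
    result = dict(values)
    for target, spec in aggregate_cats.items():
        present = [
            result[cat]
            for cat in spec["sources"]
            if cat in result and not math.isnan(result[cat])
        ]
        # assumption: a target is only written if at least one source has data
        if present:
            result[target] = sum(present)
    return result


example = aggregate_categories(
    {"3.C": 10.0, "3.A": 5.0},
    {
        "M.AG.ELV": {"sources": ["3.C"], "name": "Agriculture excluding livestock emissions"},
        "M.AG": {"sources": ["M.AG.ELV", "3.A"], "name": "Agriculture"},
    },
)
```

The order of the entries matters in this sketch: `M.AG` can only be built after `M.AG.ELV` exists, which matches the order used in `processing_info_step2`.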

+ 0 - 260
UNFCCC_GHG_data/UNFCCC_reader/Nigeria/read_NGA_BUR2_from_pdf.py

@@ -1,260 +0,0 @@
-# this script reads data from Nigeria's BUR2
-# Data is read from the pdf file
-
-import pandas as pd
-import primap2 as pm2
-import xarray as xr
-import numpy as np
-import camelot
-import locale
-from copy import deepcopy
-from UNFCCC_GHG_data.helper import downloaded_data_path, extracted_data_path
-from UNFCCC_GHG_data.helper import process_data_for_country, gas_baskets
-from config_NGA_BUR2 import tables_trends
-from config_NGA_BUR2 import pages_inventory, year_inventory, entity_row, unit_row, \
-   index_cols, header_long, units_inv
-from config_NGA_BUR2 import cat_code_regexp, cat_codes_manual
-from config_NGA_BUR2 import coords_cols, coords_defaults, coords_terminologies, \
-    coords_value_mapping, meta_data, filter_remove #, add_coords_cols
-from config_NGA_BUR2 import processing_info_step1, processing_info_step2
-
-# ###
-# configuration
-# ###
-# define locale to use for str to float conversion
-locale_to_use = 'en_NG.UTF-8'
-locale.setlocale(locale.LC_NUMERIC, locale_to_use)
-
-input_folder = downloaded_data_path / 'UNFCCC' / 'Nigeria' / 'BUR2'
-output_folder = extracted_data_path / 'UNFCCC' / 'Nigeria'
-if not output_folder.exists():
-   output_folder.mkdir()
-
-output_filename = 'NGA_BUR2_2021_'
-compression = dict(zlib=True, complevel=9)
-inventory_file = 'NIGERIA_BUR_2_-_Second_Biennial_Update_Report_%28BUR2%29.pdf'
-
-## read 2019 inventory
-df_inventory = None
-for page in pages_inventory.keys():
-    tables = camelot.read_pdf(str(input_folder / inventory_file), pages=str(page),
-                              flavor='lattice')
-    df_this_table = tables[pages_inventory[page]].df
-    # replace line breaks, double, and triple spaces in category names
-    df_this_table.iloc[:, 0] = df_this_table.iloc[:, 0].str.replace("\n", " ")
-    df_this_table.iloc[:, 0] = df_this_table.iloc[:, 0].str.replace("   ", " ")
-    df_this_table.iloc[:, 0] = df_this_table.iloc[:, 0].str.replace("  ", " ")
-    # replace line breaks in units and entities
-    df_this_table.iloc[entity_row] = df_this_table.iloc[entity_row].str.replace('\n',
-                                                                                '')
-    df_this_table.iloc[unit_row] = df_this_table.iloc[unit_row].str.replace('\n', '')
-
-    # fillna in unit row
-    df_this_table.iloc[unit_row] = df_this_table.iloc[unit_row].replace("", np.nan)
-    df_this_table.iloc[unit_row] = df_this_table.iloc[unit_row].fillna(
-        method='ffill')
-    df_this_table = pm2.pm2io.nir_add_unit_information(df_this_table, unit_row=unit_row,
-                                                       entity_row=entity_row,
-                                                       regexp_entity=".*",
-                                                       manual_repl_unit=units_inv,
-                                                       default_unit="")
-
-    # set index and convert to long format
-    df_this_table = df_this_table.set_index(index_cols)
-    df_this_table_long = pm2.pm2io.nir_convert_df_to_long(df_this_table, year_inventory,
-                                                          header_long)
-
-    # combine with tables for other sectors (merge not append)
-    if df_inventory is None:
-        df_inventory = df_this_table_long
-    else:
-        df_inventory = pd.concat([df_inventory, df_this_table_long], axis=0, join='outer')
-
-# replace cat names by codes in col "category"
-# first the manual replacements
-df_inventory["category"] = df_inventory["category"].replace(cat_codes_manual)
-# then the regex replacements
-repl = lambda m: m.group('code')
-df_inventory["category"] = df_inventory["category"].str.replace(cat_code_regexp, repl, regex=True)
-df_inventory = df_inventory.reset_index(drop=True)
-
-# ###
-# convert to PRIMAP2 interchange format
-# ###
-data_inv_if = pm2.pm2io.convert_long_dataframe_if(
-    df_inventory,
-    coords_cols=coords_cols,
-    #add_coords_cols=add_coords_cols,
-    coords_defaults=coords_defaults,
-    coords_terminologies=coords_terminologies,
-    coords_value_mapping=coords_value_mapping,
-    filter_remove=filter_remove,
-    meta_data=meta_data,
-    convert_str=True,
-    time_format='%Y',
-    )
-
-data_inv_pm2 = pm2.pm2io.from_interchange_format(data_inv_if)
-
-## trend tables
-data_trend_pm2 = None
-for table in tables_trends.keys():
-    print(table)
-    current_table = deepcopy(tables_trends[table])
-    tables = camelot.read_pdf(str(input_folder / inventory_file),
-                              pages=current_table["page"],
-                              table_areas=current_table["area"],
-                              columns=current_table["cols"],
-                              flavor='stream',
-                              split_text=True)
-    df_this_table = tables[0].df
-
-    # merge rows for entity and unit
-    rows_to_merge = df_this_table.iloc[current_table["label_rows"]]
-    indices_to_merge = rows_to_merge.index
-    # join the three rows
-    new_row = rows_to_merge.agg(' '.join)
-    df_this_table.loc[indices_to_merge[0]] = new_row
-    df_this_table = df_this_table.drop(indices_to_merge)
-    new_row = new_row.str.replace("   ", " ")
-    new_row = new_row.str.replace("  ", " ")
-    new_row = new_row.str.strip()
-
-    df_this_table.columns = new_row
-
-    # remove columns not needed
-    if 'remove_cols' in current_table.keys():
-        df_this_table = df_this_table.drop(columns=current_table["remove_cols"])
-
-    df_this_table = df_this_table.set_index("Year")
-
-    # transpose to wide format
-    df_this_table = df_this_table.transpose()
-
-    # remove "," (thousand sep) from data
-    for col in df_this_table.columns:
-        df_this_table.loc[:, col] = df_this_table.loc[:, col].str.strip()
-        repl = lambda m: m.group('part1') + m.group('part2')
-        df_this_table.loc[:, col] = df_this_table.loc[:, col].str.replace(
-            r'(?P<part1>[0-9]+),(?P<part2>[0-9\.]+)$', repl, regex=True)
-        df_this_table.loc[df_this_table[col].isnull(), col] = 'NaN'
-
-
-    # metadata in first col instead of index
-    df_this_table = df_this_table.reset_index()
-    df_this_table = df_this_table.rename(columns={"index": "Year"})
-
-    # make sure we have str not a number format for the dates
-    df_this_table.columns = df_this_table.columns.map(str)
-
-    # make copy of columns if a column is used twice for metadata
-    if 'copy_cols' in current_table.keys():
-        for col in current_table["copy_cols"]:
-            df_this_table[col] = df_this_table[current_table["copy_cols"][col]]
-
-    current_table["coords_defaults"].update(coords_defaults)
-    # convert to interchange format
-    data_current_if = pm2.pm2io.convert_wide_dataframe_if(
-        df_this_table,
-        coords_cols=current_table["coords_cols"],
-        coords_defaults=current_table["coords_defaults"],
-        coords_terminologies=coords_terminologies,
-        coords_value_mapping=current_table["coords_value_mapping"],
-        meta_data=meta_data,
-        convert_str=True,
-        time_format='%Y',
-    )
-
-    data_current_pm2 = pm2.pm2io.from_interchange_format(data_current_if)
-    if data_trend_pm2 is None:
-        data_trend_pm2 = data_current_pm2
-    else:
-        data_trend_pm2 = data_trend_pm2.pr.merge(data_current_pm2)
-
-data_pm2 = data_inv_pm2.pr.merge(data_trend_pm2, tolerance=0.02) # some rounding in
-# trends needs higher tolerance
-
-data_if = data_pm2.pr.to_interchange_format()
-
-# ###
-# save raw data to IF and native format
-# ###
-if not output_folder.exists():
-    output_folder.mkdir()
-pm2.pm2io.write_interchange_format(
-    output_folder / (output_filename + coords_terminologies["category"] + "_raw"),
-    data_if)
-
-encoding = {var: compression for var in data_pm2.data_vars}
-data_pm2.pr.to_netcdf(
-    output_folder / (output_filename + coords_terminologies["category"] +
-                     "_raw.nc"),
-    encoding=encoding)
-
-
-#### processing
-data_proc_pm2 = data_pm2
-terminology_proc = coords_terminologies["category"]
-
-# combine CO2 emissions and removals
-temp_CO2 = data_proc_pm2[["CO2 emissions", "CO2 removals"]].pr.sum\
-    (dim="entity", skipna=True, min_count=1)
-data_proc_pm2["CO2"] = data_proc_pm2["CO2"].fillna(temp_CO2)
-
-# create net KYOTOGHG for 0 and 3
-data_proc_pm2["KYOTOGHG removals (AR5GWP100)"] \
-    = xr.full_like(data_proc_pm2["CO2 removals"],
-                   np.nan).pr.quantify(units="Gg CO2 / year")
-
-data_proc_pm2["KYOTOGHG removals (AR5GWP100)"].attrs = {"entity": "KYOTOGHG",
-                                                        "gwp_context": "AR5GWP100"}
-data_proc_pm2["KYOTOGHG removals (AR5GWP100)"] \
-    = data_proc_pm2.pr.gas_basket_contents_sum(
-    basket="KYOTOGHG removals (AR5GWP100)", basket_contents=['CO2 removals'],
-    skipna=True, min_count=1)
-temp_KYOTOGHG = data_proc_pm2[["KYOTOGHG emissions (AR5GWP100)",
-                               "KYOTOGHG removals (AR5GWP100)"]].pr.sum\
-    (dim="entity", skipna=True, min_count=1)
-data_proc_pm2["KYOTOGHG (AR5GWP100)"] \
-    = data_proc_pm2["KYOTOGHG (AR5GWP100)"].fillna(temp_KYOTOGHG)
-
-
-# actual processing
-data_proc_pm2 = process_data_for_country(
-    data_proc_pm2,
-    entities_to_ignore=['CO2 emissions', 'CO2 removals',
-                        'KYOTOGHG emissions (AR5GWP100)',
-                        'KYOTOGHG removals (AR5GWP100)'],
-    gas_baskets={},
-    processing_info_country=processing_info_step1,
-)
-
-data_proc_pm2 = process_data_for_country(
-    data_proc_pm2,
-    entities_to_ignore=[],
-    gas_baskets=gas_baskets,
-    processing_info_country=processing_info_step2,
-    cat_terminology_out = terminology_proc,
-    #category_conversion = None,
-    #sectors_out = None,
-)
-
-# adapt source and metadata
-# TODO: processing info is present twice
-current_source = data_proc_pm2.coords["source"].values[0]
-data_temp = data_proc_pm2.pr.loc[{"source": current_source}]
-data_proc_pm2 = data_proc_pm2.pr.set("source", 'BUR_NIR', data_temp)
-
-# ###
-# save data to IF and native format
-# ###
-data_proc_if = data_proc_pm2.pr.to_interchange_format()
-if not output_folder.exists():
-    output_folder.mkdir()
-pm2.pm2io.write_interchange_format(
-    output_folder / (output_filename + terminology_proc), data_proc_if)
-
-encoding = {var: compression for var in data_proc_pm2.data_vars}
-data_proc_pm2.pr.to_netcdf(
-    output_folder / (output_filename + terminology_proc + ".nc"),
-    encoding=encoding)
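
The trend-table loop in this script strips "," thousands separators with a regular expression before the values are converted to numbers. The following standalone sketch shows that cleanup step using only the standard library. `strip_thousands_sep` is a hypothetical name; unlike the single replacement per value in the script, it removes every separator that sits between two digits.

```python
import re


def strip_thousands_sep(value: str) -> str:
    """Remove "," between digits, e.g. "1,234,567.8" -> "1234567.8"."""
    # lookbehind/lookahead keep the digits and drop only the comma
    return re.sub(r"(?<=[0-9]),(?=[0-9])", "", value.strip())


cleaned = [strip_thousands_sep(v) for v in [" 12,345.6 ", "1,234,567", "NE", "0.5"]]
```

Notation keys such as `NE` pass through unchanged, so they can still be handled by the later string-to-number conversion.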

+ 0 - 290
UNFCCC_GHG_data/UNFCCC_reader/Peru/read_PER_BUR3_from_pdf.py

@@ -1,290 +0,0 @@
-# read Peru's third BUR (BUR3) from pdf
-
-
-import camelot
-import primap2 as pm2
-import pandas as pd
-
-import locale
-
-from UNFCCC_GHG_data.helper import process_data_for_country, gas_baskets
-from UNFCCC_GHG_data.helper import downloaded_data_path, extracted_data_path
-from UNFCCC_GHG_data.helper import fix_rows
-from primap2.pm2io._conversion import convert_ipcc_code_primap_to_primap2
-from config_PER_BUR3 import table_def_templates, table_defs, index_cols
-from config_PER_BUR3 import values_replacement, header_long, cats_remove
-from config_PER_BUR3 import cat_codes_manual, cat_code_regexp, cat_names_fix
-from config_PER_BUR3 import coords_cols, coords_terminologies, coords_defaults
-from config_PER_BUR3 import coords_terminologies_2006
-from config_PER_BUR3 import coords_value_mapping, meta_data, filter_remove
-from config_PER_BUR3 import processing_info, cat_conversion
-
-### general configuration
-input_folder = downloaded_data_path / "UNFCCC" / "Peru" / "BUR3"
-output_folder = extracted_data_path / "UNFCCC" / "Peru"
-if not output_folder.exists():
-    output_folder.mkdir()
-
-output_filename = "PER_BUR3_2023_"
-inventory_file_pdf = "Tercer_BUR_Per%C3%BA_Jun2023.pdf"
-# years_to_read = range(1990, 2018 + 1)
-
-# define locale to use for str to float conversion
-locale_to_use = "es_PE.UTF-8"
-locale.setlocale(locale.LC_NUMERIC, locale_to_use)
-
-pagesToRead = table_defs.keys()
-
-compression = dict(zlib=True, complevel=9)
-
-## part 1: read the data from pdf
-### part 1.a: 2016 inventory
-
-data_pm2 = None
-for page in pagesToRead:
-    print(f"++++++++++++++++++++++++++++++++")
-    print(f"+++++ Working on page {page} ++++++")
-    print(f"++++++++++++++++++++++++++++++++")
-
-    df_this_page = None
-    for table_on_page in table_defs[page]["templates"]:
-        print(f"Reading table {table_on_page}")
-        area = table_def_templates[table_on_page]["area"]
-        cols = table_def_templates[table_on_page]["cols"]
-        tables = camelot.read_pdf(
-            str(input_folder / inventory_file_pdf),
-            pages=str(page),
-            flavor="stream",
-            table_areas=area,
-            columns=cols,
-        )
-
-        df_current = tables[0].df.copy(deep=True)
-        # drop the old header
-        if "drop_rows" in table_defs[page].keys():
-            df_current = df_current.drop(table_defs[page]["drop_rows"])
-        elif "drop_rows" in table_def_templates[table_on_page].keys():
-            df_current = df_current.drop(
-                table_def_templates[table_on_page]["drop_rows"]
-            )
-        # add new header
-        if "header" in table_defs[page].keys():
-            df_current.columns = pd.MultiIndex.from_tuples(
-                zip(
-                    table_defs[page]["header"]["entity"],
-                    table_defs[page]["header"]["unit"],
-                )
-            )
-        else:
-            df_current.columns = pd.MultiIndex.from_tuples(
-                zip(
-                    table_def_templates[table_on_page]["header"]["entity"],
-                    table_def_templates[table_on_page]["header"]["unit"],
-                )
-            )
-
-        # drop cols if necessary
-        if "drop_cols" in table_defs[page].keys():
-            # print(df_current.columns.values)
-            df_current = df_current.drop(columns=table_defs[page]["drop_cols"])
-        elif "drop_cols" in table_def_templates[table_on_page].keys():
-            df_current = df_current.drop(
-                columns=table_def_templates[table_on_page]["drop_cols"]
-            )
-
-        # rename category column
-        df_current.rename(
-            columns={table_defs[page]["category_col"]: index_cols[0]}, inplace=True
-        )
-
-        # replace line breaks
-        df_current[index_cols[0]] = df_current[index_cols[0]].str.replace("\n", " ")
-        # replace double and triple spaces
-        df_current[index_cols[0]] = df_current[index_cols[0]].str.replace("   ", " ")
-        df_current[index_cols[0]] = df_current[index_cols[0]].str.replace("  ", " ")
-
-        # fix the split rows
-        for n_rows in table_def_templates[table_on_page]["rows_to_fix"].keys():
-            df_current = fix_rows(
-                df_current,
-                table_def_templates[table_on_page]["rows_to_fix"][n_rows],
-                index_cols[0],
-                n_rows,
-            )
-
-        # replace category names with typos
-        df_current[index_cols[0]] = df_current[index_cols[0]].replace(cat_names_fix)
-
-        # replace empty strings
-        df_current = df_current.replace(values_replacement)
-
-        # set index
-        # df_current = df_current.set_index(index_cols)
-        # strip leading and trailing whitespace and remove "^"
-        for col in df_current.columns.values:
-            df_current[col] = df_current[col].str.strip()
-            df_current[col] = df_current[col].str.replace("^", "", regex=False)
-
-        # print(df_current)
-        # aggregate dfs for this page
-        if df_this_page is None:
-            df_this_page = df_current.copy(deep=True)
-        else:
-            # find intersecting cols
-            cols_this_page = df_this_page.columns.values
-            # print(f"cols this page: {cols_this_page}")
-            cols_current = df_current.columns.values
-            # print(f"cols current: {cols_current}")
-            cols_both = list(set(cols_this_page).intersection(set(cols_current)))
-            # print(f"cols both: {cols_both}")
-            if len(cols_both) > 0:
-                df_this_page = df_this_page.merge(
-                    df_current, how="outer", on=cols_both, suffixes=(None, None)
-                )
-            else:
-                df_this_page = df_this_page.merge(
-                    df_current,
-                    how="outer",
-                    left_index=True,
-                    right_index=True,
-                    suffixes=(None, None),
-                )
-
-            df_this_page = df_this_page.groupby(index_cols).first().reset_index()
-            # print(df_this_page)
-            # df_all = df_all.join(df_current, how='outer')
-
-    # set index and convert to long format
-    df_this_page = df_this_page.set_index(index_cols)
-    df_this_page_long = pm2.pm2io.nir_convert_df_to_long(
-        df_this_page, table_defs[page]["year"], header_long
-    )
-
-    # drop the rows with memo items etc
-    for cat in cats_remove:
-        df_this_page_long = df_this_page_long.drop(
-            df_this_page_long.loc[df_this_page_long.loc[:, index_cols[0]] == cat].index
-        )
-
-    # make a copy of the categories column
-    df_this_page_long.loc[:, "category"] = df_this_page_long.loc[:, index_cols[0]]
-
-    # replace cat names by codes in col "category"
-    # first the manual replacements
-    df_this_page_long.loc[:, "category"] = df_this_page_long.loc[:, "category"].replace(
-        cat_codes_manual
-    )
-    # then the regex replacements
-    repl = lambda m: convert_ipcc_code_primap_to_primap2("IPC" + m.group("code"))
-    df_this_page_long.loc[:, "category"] = df_this_page_long.loc[
-        :, "category"
-    ].str.replace(cat_code_regexp, repl, regex=True)
-    df_this_page_long.loc[:, "category"].unique()
-
-    # strip spaces in data col
-    df_this_page_long.loc[:, "data"] = df_this_page_long.loc[:, "data"].str.strip()
-
-    df_this_page_long = df_this_page_long.reset_index(drop=True)
-
-    # make sure all col headers are str
-    df_this_page_long.columns = df_this_page_long.columns.map(str)
-
-    # remove thousands separators as pd.to_numeric can't deal with that
-    df_this_page_long.loc[:, "data"] = df_this_page_long.loc[:, "data"].str.replace(
-        ".", "", regex=False
-    )
-    pat = r"^(?P<first>[0-9\.,]*),(?P<last>[0-9\.,]*)$"
-    repl = lambda m: f"{m.group('first')}.{m.group('last')}"
-    df_this_page_long.loc[:, "data"] = df_this_page_long.loc[:, "data"].str.replace(
-        pat, repl, regex=True
-    )
-
-    # df_this_page_long["data"] = df_this_page_long["data"].str.replace("^.$","",
-    #                                                                   regex=True)
-
-    # drop orig cat name as it's not unique over all tables (keep until here in case
-    # it's needed for debugging)
-    df_this_page_long = df_this_page_long.drop(columns="orig_cat_name")
-
-    data_page_if = pm2.pm2io.convert_long_dataframe_if(
-        df_this_page_long,
-        coords_cols=coords_cols,
-        # add_coords_cols=add_coords_cols,
-        coords_defaults=coords_defaults,
-        coords_terminologies=coords_terminologies,
-        coords_value_mapping=coords_value_mapping[
-            table_defs[page]["coords_value_mapping"]
-        ],
-        # coords_value_filling=coords_value_filling,
-        filter_remove=filter_remove,
-        # filter_keep=filter_keep,
-        meta_data=meta_data,
-        convert_str=True,
-        time_format="%Y",
-    )
-
-    # conversion to PRIMAP2 native format
-    data_page_pm2 = pm2.pm2io.from_interchange_format(data_page_if)
-
-    # combine with tables from other pages
-    if data_pm2 is None:
-        data_pm2 = data_page_pm2
-    else:
-        data_pm2 = data_pm2.pr.merge(data_page_pm2)
-
-# convert back to IF to have units in the fixed format
-data_if = data_pm2.pr.to_interchange_format()
-
-# ###
-# save data to IF and native format
-# ###
-if not output_folder.exists():
-    output_folder.mkdir()
-pm2.pm2io.write_interchange_format(
-    output_folder / (output_filename + coords_terminologies["category"] + "_raw"),
-    data_if,
-)
-
-encoding = {var: compression for var in data_pm2.data_vars}
-data_pm2.pr.to_netcdf(
-    output_folder / (output_filename + coords_terminologies["category"] + "_raw.nc"),
-    encoding=encoding,
-)
-
-#### continue here
-
-# ###
-# ## process the data
-# ###
-data_proc_pm2 = data_pm2
-
-# actual processing
-
-data_proc_pm2 = process_data_for_country(
-    data_proc_pm2,
-    entities_to_ignore=[],
-    gas_baskets=gas_baskets,
-    processing_info_country=processing_info,
-    cat_terminology_out=coords_terminologies_2006["category"],
-    category_conversion=cat_conversion,
-)
-
-# adapt source and metadata
-current_source = data_proc_pm2.coords["source"].values[0]
-data_temp = data_proc_pm2.pr.loc[{"source": current_source}]
-data_proc_pm2 = data_proc_pm2.pr.set("source", "BUR_NIR", data_temp)
-
-# ###
-# save data to IF and native format
-# ###
-data_proc_if = data_proc_pm2.pr.to_interchange_format()
-if not output_folder.exists():
-    output_folder.mkdir()
-pm2.pm2io.write_interchange_format(
-    output_folder / (output_filename + coords_terminologies_2006["category"]),
-    data_proc_if,
-)
-
-encoding = {var: compression for var in data_proc_pm2.data_vars}
-data_proc_pm2.pr.to_netcdf(
-    output_folder / (output_filename + coords_terminologies_2006["category"] + ".nc"),
-    encoding=encoding,
-)
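
The Peru script converts numbers written with "." as thousands separator and "," as decimal separator into a form that can be parsed as a float. The following sketch shows the same two-step conversion on plain strings. `es_number_to_decimal_point` is a hypothetical name, and the optional leading minus sign is an addition that the pattern in the script does not have.

```python
import re


def es_number_to_decimal_point(value: str) -> str:
    """Convert e.g. "1.234,56" to "1234.56"; leave other strings unchanged."""
    # step 1: drop thousands separators
    value = value.strip().replace(".", "")
    # step 2: turn the decimal comma into a decimal point
    return re.sub(r"^(-?[0-9]*),([0-9]*)$", r"\1.\2", value)


converted = [es_number_to_decimal_point(v) for v in ["1.234,56", " 12 ", "-3,5", "NE"]]
```

The order of the two steps matters: the thousands separators have to be removed before the decimal comma is replaced, otherwise the new decimal point would be deleted as well.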

+ 0 - 448
UNFCCC_GHG_data/UNFCCC_reader/Republic_of_Korea/config_KOR_BUR4.py

@@ -1,448 +0,0 @@
-original_names = [
-    '총배출량',
-    '순배출량',
-    '에너지',
-    'A. 연료연소',
-    '1. 에너지산업',
-    'a. 공공전기 및 열 생산',
-    'b. 석유정제',
-    'c. 고체연료 제조 및 기타 에너지 산업',
-    '2. 제조업 및 건설업',
-    'a. 철강',
-    'b. 비철금속',
-    'c. 화학',
-    'd. 펄프, 제지 및 인쇄',
-    'e. 식음료품 가공 및 담배 제조',
-    'f. 기타',
-    '  1. 비금속',
-    '  2. 조립금속',
-    '  3. 나무 및 목재',
-    '  4. 건설',
-    '  5. 섬유 및 가죽',
-    '  6. 기타제조',
-    '3. 수송',
-    'a. 민간항공',
-    'b. 도로수송',
-    'c. 철도',
-    'd. 해운',
-    'e. 기타수송',
-    '4. 기타',
-    'a. 상업/공공',
-    'b. 가정',
-    'c. 농업/임업/어업',
-    '5. 미분류',
-    'B. 탈루',
-    '1. 고체연료',
-    '2.  석유 및 천연가스',
-    'a.  석유',
-    'b. 천연가스',
-    '산업공정',
-    'A. 광물산업',
-    '1. 시멘트생산',
-    '2. 석회생산',
-    '3. 석회석 및 백운석 소비',
-    '4. 소다회 생산 및 소비',
-    '5. 아스팔트 루핑',
-    '6. 아스팔트 도로포장',
-    'B. 화학산업',
-    'C. 금속산업',
-    '1. 철강생산',
-    '2. 합금철 생산',
-    '3. 알루미늄 생산',
-    '4. 마그네슘 생산의 SF6 소비',
-    'D. 기타산업',
-    'E. 할로카본 및 육불화황 생산',
-    '1. 부산물 배출',
-    '2. 탈루 배출',
-    'F. 할로카본 및 육불화황 소비',
-    '1.  냉장 및 냉방',
-    '2.  발포제',
-    '3.  소화기',
-    '4.  에어로졸',
-    '5.  용매',
-    '6.  기타 용도의 ODS 대체물질 사용',
-    '7.  반도체 제조',
-    '8.  중전기기',
-    '9.  기타(잠재배출량)',
-    '농업',
-    'A.  장내발효',
-    '1. 소',
-    '2. 물소',
-    '3. 양(면양)',
-    '4. 양(산양)',
-    '5. 낙타 및 라마',
-    '6. 말',
-    '7. 노새 및 당나귀',
-    '8. 돼지',
-    '9. 가금류',
-    '10. 기타 가축(사슴)',
-    'B.  가축분뇨처리',
-    '1. 소',
-    '2. 물소',
-    '3. 양(면양)',
-    '4. 양(산양)',
-    '5. 낙타 및 라마',
-    '6. 말',
-    '7. 노새 및 당나귀',
-    '8. 돼지',
-    '9. 가금류',
-    '10. 기타 가축(사슴)',
-    'C.  벼재배',
-    '1. 관개',
-    '2. 천수답',
-    'D. 농경지토양',
-    '1. 직접배출',
-    '2. 목장, 방목구역, 분료(거름)',
-    '3. 간접배출',
-    'E. 사바나 소각',
-    'F. 작물잔사소각',
-    '1. 곡류',
-    '2. 두류(콩)',
-    '3. 근채류',
-    '4. 사탕수수',
-    '5. 기타',
-    'LULUCF',
-    'A. 산림지',
-    '1. 산림지로 유지된 산림지',
-    '2. 타토지에서 전용된 산림지',
-    '3. 산림지에서 질소 시비로 인한 N2O 배출',
-    '4. 산림지에서 배수로 인한 Non-CO2 배출',
-    '5. 산림지에서 바이오매스 연소에 의한 배출',
-    'B. 농경지',
-    '1. 농경지로 유지된 농경지',
-    '2. 타토지에서 전용된 농경지',
-    '3. 농경지로의 전용에 따른 N2O 배출',
-    '4. 농경지에서 농업용 석회시용으로 인한 CO2 배출',
-    '5. 농경지에서 바이오매스 연소에 의한 배출',
-    'C. 초지',
-    '1. 초지로 유지된 초지',
-    '2. 타토지에서 전용된 초지',
-    '3. 초지에서 농업용 석회시용으로 인한 CO2 배출',
-    '4. 초지에서 바이오매스 연소에 의한 배출',
-    'D. 습지',
-    '1. 습지로 유지된 습지',
-    '2. 타토지에서 전용된 습지',
-    '3. 습지에서 배수로 인한 Non-CO2 배출',
-    '4. 습지에서 바이오매스 연소에 의한 배출',
-    'E. 정주지',
-    'F. 기타토지',
-    '폐기물',
-    'A. 폐기물매립',
-    '1. 관리형 매립',
-    '2. 비관리형 매립',
-    'B. 하폐수처리',
-    '1. 폐수처리',
-    '2. 하수처리',
-    'C. 폐기물소각',
-    'D. 기타',
-    '별도항목(Memo Item)',
-    '분야·부문/연도',
-    'C. 국제벙커링 및 다국적 작전',
-    '1. 벙커링',
-    'a. 국제 항공',
-    'b. 국제 해운',
-    '2. 다국적 작전',
-    '* 참고 : NO = 배출활동 및 공정이 없는 경우, NE = 산정하지 아니하는 경우, NA = 자연적, 이론적으로 발생하지 않는 활동 및 공정의 경우, IE = 다른 항목에 포함하여 보고하는 경우, C = 기밀정보인 경우',
-    '3. 타토지로 전용된 농경지', # start of new codes in 2021 inventory
-    '4. 농경지로의 전용에 따른 N2O 배출',
-    '5. 농경지에서 농업용 석회시용으로 인한 CO2 배출',
-    '6. 농경지에서 바이오매스 연소에 의한 배출',
-    'G. 기타',
-]
-translations = [
-    ['Total emissions', 'M.0.EL'],
-    ['Net emissions', '0'],
-    ['energy', '1'],
-    ['A. Fuel combustion', '1.A'],
-    ['1. Energy industry', '1.A.1'],
-    ['a. Public electricity and heat production', '1.A.1.a'],
-    ['b. Oil refining', '1.A.1.b'],
-    ['c. Solid fuel manufacturing and other energy industries', '1.A.1.c'],
-    ['2. Manufacturing and construction', '1.A.2'],
-    ['a. steel', '1.A.2.a'],
-    ['b. Non-ferrous metal', '1.A.2.b'],
-    ['c. chemistry', '1.A.2.c'],
-    ['d. Pulp, paper and printing', '1.A.2.d'],
-    ['e. Food and beverage processing and tobacco manufacturing', '1.A.2.e'],
-    ['f. Etc', '1.A.2.f'],
-    ['  1. Non-metal', '1.A.2.f.1'],
-    ['  2. Assembly metal', '1.A.2.f.2'],
-    ['  3. Wood and timber', '1.A.2.f.3'],
-    ['  4. Construction', '1.A.2.f.4'],
-    ['  5. Textile and leather', '1.A.2.f.5'],
-    ['  6. Other manufacturing', '1.A.2.f.6'],
-    ['3. Transportation', '1.A.3'],
-    ['a. Civil aviation', '1.A.3.a.2'],
-    ['b. Road transport', '1.A.3.b'],
-    ['c. railroad', '1.A.3.c'],
-    ['d. shipping', '1.A.3.d.2'],
-    ['e. Other transport', '1.A.3.e'],
-    ['4. Other', '1.A.4'],
-    ['a. Commercial/Public', '1.A.4.a'],
-    ['b. home', '1.A.4.b'],
-    ['c. Agriculture/Forestry/Fishing', '1.A.4.c'],
-    ['5. Uncategorized', '1.A.5'],
-    ['B. Fugitive emissions', '1.B'],
-    ['1. Solid fuel', '1.B.1'],
-    ['2. Oil and natural gas', '1.B.2'],
-    ['a. oil', '1.B.2.a'],
-    ['b. Natural gas', '1.B.2.b'],
-    ['Industrial process', '2'],
-    ['A. Mineral industry', '2.A'],
-    ['1. Cement production', '2.A.1'],
-    ['2. Lime production', '2.A.2'],
-    ['3. Limestone and Dolomite Consumption', '2.A.3'],
-    ['4. Soda ash production and consumption', '2.A.4'],
-    ['5. Asphalt roofing', '2.A.5'],
-    ['6. Asphalt road pavement', '2.A.6'],
-    ['B. Chemical industry', '2.B'],
-    ['C. Metal Industry', '2.C'],
-    ['1. Steel production', '2.C.1'],
-    ['2. Ferroalloy production', '2.C.2'],
-    ['3. Aluminum production', '2.C.3'],
-    ['4. SF6 consumption in magnesium production', '2.C.4'],
-    ['D. Other industries', '2.D'],
-    ['E. Production of halocarbons and sulfur hexafluoride', '2.E'],
-    ['1. Emission of by-products', '2.E.1'],
-    ['2. Fugitive discharge', '2.E.2'],
-    ['F. Consumption of halocarbons and sulfur hexafluoride', '2.F'],
-    ['1. Refrigeration and cooling', '2.F.1'],
-    ['2. Foaming agent', '2.F.2'],
-    ['3. Fire extinguisher', '2.F.3'],
-    ['4. Aerosol', '2.F.4'],
-    ['5. Solvent', '2.F.5'],
-    ['6. Use of ODS substitutes for other purposes', '2.F.6'],
-    ['7. Semiconductor manufacturing', '2.F.7'],
-    ['8. Heavy electric machine', '2.F.8'],
-    ['9. Others (potential emissions)', '2.F.9'],
-    ['Agriculture', '4'],
-    ['A. Intestinal fermentation', '4.A'],
-    ['1. cow', '4.A.1'],
-    ['2. Water buffalo', '4.A.2'],
-    ['3. Sheep (Cotton Sheep)', '4.A.3'],
-    ['4. Sheep (Goat)', '4.A.4'],
-    ['5. Camel and Llama', '4.A.5'],
-    ['6. Horse', '4.A.6'],
-    ['7. Mules and Donkeys', '4.A.7'],
-    ['8. Pig', '4.A.8'],
-    ['9. Poultry', '4.A.9'],
-    ['10. Other livestock (deer)', '4.A.10'],
-    ['B. Livestock manure treatment', '4.B'],
-    ['1. cow', '4.B.1'],
-    ['2. Water buffalo', '4.B.2'],
-    ['3. Sheep (Cotton Sheep)', '4.B.3'],
-    ['4. Sheep (Goat)', '4.B.4'],
-    ['5. Camel and Llama', '4.B.5'],
-    ['6. Horse', '4.B.6'],
-    ['7. Mules and Donkeys', '4.B.7'],
-    ['8. Pig', '4.B.8'],
-    ['9. Poultry', '4.B.9'],
-    ['10. Other livestock (deer)', '4.B.10'],
-    ['C. Rice cultivation', '4.C'],
-    ['1. irrigation', '4.C.1'],
-    ['2. Rainfed', '4.C.4'],
-    ['D. Cropland soil', '4.D'],
-    ['1. Direct discharge', '4.D.1'],
-    ['2. Ranch, grazing area, manure (manure)', '4.D.2'],
-    ['3. Indirect emissions', '4.D.3'],
-    ['E. Savannah incineration', '4.E'],
-    ['F. Crop residue incineration', '4.F'],
-    ['1. Grains', '4.F.1'],
-    ['2. Beans (beans)', '4.F.2'],
-    ['3. Root vegetables', '4.F.3'],
-    ['4. Sugar cane', '4.F.4'],
-    ['5. Other', '4.F.5'],
-    ['LULUCF', '5'],
-    ['A. Forest land', '5.A'],
-    ['1. Forest land maintained as a forest land', '5.A.1'],  # categories differ from IPCC1996
-    ['2. Forest land converted from other lands', '5.A.2'],  # categories differ from IPCC1996
-    ['3. N2O emissions from nitrogen fertilization in forest areas', '5.A.3'],  # categories differ from IPCC1996
-    ['4. Non-CO2 emission due to drainage in forest areas', '5.A.4'],  # categories differ from IPCC1996
-    ['5. Emissions from biomass combustion in forest areas', '5.A.5'],  # categories differ from IPCC1996
-    ['B. Cropland', '5.B'],
-    ['1. Agricultural land maintained as agricultural land', '5.B.1'],  # categories differ from IPCC1996
-    ['2. Cropland converted from other lands', '5.B.2'],  # categories differ from IPCC1996
-    ['3. N2O emission due to conversion to agricultural land', '5.B.3'],  # categories differ from IPCC1996
-    ['4. CO2 emission from agricultural lime application in agricultural land', '5.B.4'],  # categories differ from IPCC1996
-    ['5. Emissions from biomass combustion in agricultural land', '5.B.5'],  # categories differ from IPCC1996
-    ['C. Grassland', '5.C'],
-    ['1. Grassland maintained as grassland', '5.C.1'],  # categories differ from IPCC1996
-    ['2. Grassland converted from other lands', '5.C.2'],  # categories differ from IPCC1996
-    ['3. CO2 emission from agricultural lime application in grassland', '5.C.3'],  # categories differ from IPCC1996
-    ['4. Emissions from biomass combustion in grassland', '5.C.4'],  # categories differ from IPCC1996
-    ['D. Wetlands', '5.D'],
-    ['1. Wetlands maintained as wetlands', '5.D.1'],  # categories differ from IPCC1996
-    ['2. Wetlands converted from other lands', '5.D.2'],  # categories differ from IPCC1996
-    ['3. Non-CO2 emission due to drainage in wetlands', '5.D.3'],  # categories differ from IPCC1996
-    ['4. Emissions from biomass combustion in wetlands', '5.D.4'],  # categories differ from IPCC1996
-    ['E. Settlements', '5.E'],
-    ['F. Other land', '5.F'],
-    ['waste', '6'],
-    ['A. Landfill of waste', '6.A'],
-    ['1. Managed landfill', '6.A.1'],
-    ['2. Unmanaged landfill', '6.A.2'],
-    ['B. Sewage water treatment', '6.B'],
-    ['1. Wastewater treatment', '6.B.1'],  # categories differ from IPCC1996
-    ['2. Sewage treatment', '6.B.2'],  # categories differ from IPCC1996
-    ['C. Waste incineration', '6.C'],
-    ['D. Other', '6.D'],
-    ['Memo Item', r'\IGNORE'],
-    ['Field·Sector/Year', r'\IGNORE'],
-    ['C. International bunkering and multinational operations', r'\IGNORE'],
-    ['1. Bunkering', 'M.1'],
-    ['a. International aviation', 'M.1.A'],
-    ['b. International shipping', 'M.1.B'],
-    ['2. Multinational operations', 'M.2'],
-    ['', r'\IGNORE'],
-    ['3. Cropland converted to other lands', '5.B.3'],  # new codes in 2021 inventory start here
-    ['4. N2O emission due to conversion to agricultural land', '5.B.4'],
-    ['5. CO2 emission from agricultural lime application in agricultural land', '5.B.5'],
-    ['6. Emissions from burning biomass on agricultural land', '5.B.6'],
-    ['G. Others', '5.G'],
-]
-cat_name_translations = dict(zip(original_names, [cat[0] for cat in translations]))
-cat_codes = dict(zip(original_names, [cat[1] for cat in translations]))
-
-remove_cats = [
-    '1.A.1.a', '1.A.1.b', '1.A.1.c', '1.A.2.f',
-    '2.A', '2.D',
-    '2.F', '2.G',
-    '4.C.1', '4.C.4',
-    '4.D',
-    '4.F.1', '4.F.2', '4.F.3', '4.F.4', '4.F.5',  # detail not in 2006 categories
-    '5.A', '5.A.1', '5.A.2', '5.A.3', '5.A.4', '5.A.5',  # do not match IPCC
-    # categories
-    '5.B', '5.B.1', '5.B.2', '5.B.3', '5.B.4', '5.B.5',
-    '5.C', '5.C.1', '5.C.2', '5.C.3', '5.C.4',
-    '5.D', '5.D.1', '5.D.2', '5.D.3', '5.D.4',
-    '5.E', '5.F',
-    '5.G', '5.B.6', # for 2021 NIR
-]
-
-aggregate_before_mapping = {
-    '2006.2.D.4': {'sources': ['2.A.5', '2.A.6'], 'name': 'Other'},
-    '2006.3.C.4': {'sources': ['4.D.1', '4.D.2'],
-                   'name': 'Direct N2O Emissions from Managed Soils'},
-    '2006.M.3C1AG': {'sources': ['4.E', '4.F'], 'name': 'Biomass burning Agriculture'},
-    '2006.1.A.2.m': {'sources': ['1.A.2.f.2', '1.A.2.f.6'], 'name': 'Other'},
-}
-
-cat_mapping = {
-    '1.A.2.f.1': '1.A.2.f',
-    '1.A.2.f.3': '1.A.2.j',
-    '1.A.2.f.4': '1.A.2.k',
-    '1.A.2.f.5': '1.A.2.l',
-    '2006.1.A.2.m': '1.A.2.m',
-    '2.A.4': '2.B.7',  # add to 2.B
-    '2.A.3': '2.A.4',
-    '2.D': '2.H',
-    '2006.2.D.4': '2.D.4',
-    '2.E': '2.B.9',  # add to 2.B
-    '2.E.1': '2.B.9.a',
-    '2.E.2': '2.B.9.b',
-    #    '2.F', # remove?
-    '2.F.1': '2.F.1',  # just added here to avoid confusion
-    #    '2.F.2', '2.F.3', '2.F.4', '2.F.5',
-    '2.F.6': '2.E_1',
-    '2.F.7': '2.E_2',
-    '2.F.8': '2.G.1',
-    '2.F.9': '2.G.2',
-    '4': 'M.AG',
-    '4.A': '3.A.1',
-    '4.A.1': '3.A.1.a',
-    '4.A.2': '3.A.1.b',
-    '4.A.3': '3.A.1.c',
-    '4.A.4': '3.A.1.d',
-    '4.A.5': '3.A.1.e',
-    '4.A.6': '3.A.1.f',
-    '4.A.7': '3.A.1.g',
-    '4.A.8': '3.A.1.h',
-    '4.A.9': '3.A.1.i',
-    '4.A.10': '3.A.1.j',
-    '4.B': '3.A.2',
-    '4.B.1': '3.A.2.a',
-    '4.B.2': '3.A.2.b',
-    '4.B.3': '3.A.2.c',
-    '4.B.4': '3.A.2.d',
-    '4.B.5': '3.A.2.e',
-    '4.B.6': '3.A.2.f',
-    '4.B.7': '3.A.2.g',
-    '4.B.8': '3.A.2.h',
-    '4.B.9': '3.A.2.i',
-    '4.B.10': '3.A.2.j',
-    '4.C': '3.C.7',
-    '2006.3.C.4': '3.C.4',
-    '4.D.3': '3.C.5',
-    '2006.M.3C1AG': 'M.3.C.1.AG',
-    '5': 'M.LULUCF',
-    '6': '4',
-    '6.A': '4.A',
-    '6.A.1': '4.A.1',
-    '6.A.2': '4.A.2',
-    '6.B': '4.D',
-    '6.B.1': '4.D.1',
-    '6.B.2': '4.D.2',
-    '6.C': '4.C.1',
-    '6.D': '4.E',
-    'M.1': 'M.BK',
-    'M.1.A': 'M.BK.A',
-    'M.1.B': 'M.BK.M',
-}
-
-aggregate_after_mapping = {
-    '1.A.3.a': {'sources': ['1.A.3.a.2'], 'name': 'Civil Aviation'},  # aviation
-    '1.A.3.d': {'sources': ['1.A.3.d.2'], 'name': 'Water-borne Navigation'},  # shipping
-    '2.A': {'sources': ['2.A.1', '2.A.2', '2.A.4', '2.A.5', '2.A.6'],
-            'name': 'Mineral Industry'},
-    '2.B': {'sources': ['2.B', '2.B.7', '2.B.9'], 'name': 'Chemical Industry'},
-    '2.D': {'sources': ['2.D.4'], 'name': 'Other'},
-    '2.E': {'sources': ['2.E_1', '2.E_2'], 'name': 'Electronics Industry'},
-    '2.F': {'sources': ['2.F.1', '2.F.2', '2.F.3', '2.F.4', '2.F.5'],
-            'name': 'Product uses as Substitutes for Ozone Depleting Substances'},
-    '2.G': {'sources': ['2.G.1', '2.G.2'], 'name': 'Other Product Manufacture and Use'},
-    '3.A': {'sources': ['3.A.1', '3.A.2'], 'name': 'Livestock'},
-    '3.C': {'sources': ['3.C.4', '3.C.5', '3.C.7'],
-                 'name': 'Aggregate sources and non-CO2 emissions sources on land'},
-    'M.3.C.AG': {'sources': ['3.C.4', '3.C.5', '3.C.7'],
-                 'name': 'Aggregate sources and non-CO2 emissions sources on land ('
-                         'Agriculture)'},
-    'M.AG.ELV': {'sources': ['M.3.C.AG'], 'name': 'Agriculture excluding livestock'},
-    '4.C': {'sources': ['4.C.1'], 'name': 'Incineration and Open Burning of Waste'},
-}
-
-coords_terminologies_2006 = {
-    "area": "ISO3",
-    "category": "IPCC2006_PRIMAP",
-    "scenario": "PRIMAP",
-}
-
-filter_remove_2006 = {
-    "f1": {
-        "category (IPCC2006_PRIMAP)": r"\IGNORE",
-    },
-    "livestock": {  # temp until double cat name problem is solved
-        "category (IPCC2006_PRIMAP)": [
-            '4.B.1', '4.B.10', '4.B.2', '4.B.3', '4.B.4',
-            '4.B.5', '4.B.6', '4.B.7', '4.B.8', '4.B.9',
-        ]
-    },
-    "fmap": {
-        "category (IPCC2006_PRIMAP)": remove_cats
-    },
-    "f_bef_map": {
-        "category (IPCC2006_PRIMAP)": [
-            '2.A.5', '2.A.6',  # combined to 2006.2.D.4
-            '4.D.1', '4.D.2',  # combined to 2006.3.C.4
-            '4.E', '4.F',  # 2006.M.3.C.1.AG
-            '1.A.2.f.2', '1.A.2.f.6',  # 2006.1.A.2.m
-        ]
-    }
-}
-
-filter_remove_after_agg = {
-    "tempCats": {
-        "category (IPCC2006_PRIMAP)": [
-            "2.E_1", "2.E_2"
-        ],
-    },
-}
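
Note on the removed config above: the dictionaries describe a three-stage conversion from the inventory's IPCC1996-style codes to IPCC2006_PRIMAP codes (aggregate before mapping, rename one-to-one, aggregate after mapping). The following is only a minimal sketch of that idea in plain pandas, assuming a wide table with one row per category. The `aggregate` helper, the `YEARS` list and all emission values are hypothetical and not taken from the repository.

```python
import pandas as pd

# Toy excerpts of the dictionaries above; the emission values are made up.
aggregate_before_mapping = {
    "2006.3.C.4": {"sources": ["4.D.1", "4.D.2"],
                   "name": "Direct N2O Emissions from Managed Soils"},
}
cat_mapping = {"2006.3.C.4": "3.C.4", "4.D.3": "3.C.5", "4.C": "3.C.7"}
aggregate_after_mapping = {
    "3.C": {"sources": ["3.C.4", "3.C.5", "3.C.7"],
            "name": "Aggregate sources and non-CO2 emissions sources on land"},
}
YEARS = ["2018", "2019"]  # hypothetical year columns


def aggregate(df, rules):
    """Append one summed row per rule (hypothetical helper, not from the repo)."""
    new_rows = []
    for target, rule in rules.items():
        selected = df[df["category"].isin(rule["sources"])]
        if selected.empty:
            continue  # nothing reported for these source categories
        sums = selected[YEARS].sum().to_dict()
        new_rows.append({"category": target, **sums})
    if not new_rows:
        return df
    return pd.concat([df, pd.DataFrame(new_rows)], ignore_index=True)


df = pd.DataFrame({
    "category": ["4.C", "4.D.1", "4.D.2", "4.D.3"],
    "2018": [1.0, 2.0, 3.0, 4.0],
    "2019": [1.5, 2.5, 3.5, 4.5],
})

# 1. combine source categories that have no one-to-one 2006 counterpart
df = aggregate(df, aggregate_before_mapping)
# 2. drop the sources that were just combined
df = df[~df["category"].isin(["4.D.1", "4.D.2"])].copy()
# 3. rename the remaining categories one-to-one
df["category"] = df["category"].replace(cat_mapping)
# 4. build parent totals in the 2006 terminology
df = aggregate(df, aggregate_after_mapping)
result = df.set_index("category")
```

The order matters in this sketch: the temporary `2006.*` codes exist only between steps 1 and 3, which is why they appear as keys in `cat_mapping`.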

+ 0 - 497
UNFCCC_GHG_data/UNFCCC_reader/Republic_of_Korea/config_KOR_INV2023.py

@@ -1,497 +0,0 @@
-original_names = [
-    '총배출량',
-    '순배출량',
-    '에너지',
-    'A. 연료연소',
-    '1. 에너지산업',
-    'a. 공공전기 및 열 생산',
-    'b. 석유정제',
-    'c. 고체연료 제조 및 기타 에너지 산업',
-    '2. 제조업 및 건설업',
-    'a. 철강',
-    'b. 비철금속',
-    'c. 화학',
-    'd. 펄프, 제지 및 인쇄',
-    'e. 식음료품 가공 및 담배 제조',
-    'f. 기타',
-    '  1. 비금속',
-    '  2. 조립금속',
-    '  3. 나무 및 목재',
-    '  4. 건설',
-    '  5. 섬유 및 가죽',
-    '  6. 기타제조',
-    '3. 수송',
-    'a. 민간항공',
-    'b. 도로수송',
-    'c. 철도',
-    'd. 해운',
-    'e. 기타수송',
-    '4. 기타',
-    'a. 상업/공공',
-    'b. 가정',
-    'c. 농업/임업/어업',
-    '5. 미분류',
-    'B. 탈루',
-    '1. 고체연료',
-    '2.  석유 및 천연가스',
-    'a.  석유',
-    'b. 천연가스',
-    '산업공정',
-    'A. 광물산업',
-    '1. 시멘트생산',
-    '2. 석회생산',
-    '3. 석회석 및 백운석 소비',
-    '4. 소다회 생산 및 소비',
-    '5. 아스팔트 루핑',
-    '6. 아스팔트 도로포장',
-    'B. 화학산업',
-    'C. 금속산업',
-    '1. 철강생산',
-    '2. 합금철 생산',
-    '3. 알루미늄 생산',
-    '4. 마그네슘 생산의 SF6 소비',
-    'D. 기타산업',
-    'E. 할로카본 및 육불화황 생산',
-    '1. 부산물 배출',
-    '2. 탈루 배출',
-    'F. 할로카본 및 육불화황 소비',
-    '1.  냉장 및 냉방',
-    '2.  발포제',
-    '3.  소화기',
-    '4.  에어로졸',
-    '5.  용매',
-    '6.  기타 용도의 ODS 대체물질 사용',
-    '7.  반도체 제조',
-    '8.  중전기기',
-    '9.  기타(잠재배출량)',
-    '농업',
-    'A.  장내발효',
-    'A.1. 소',
-    'A.2. 물소',
-    'A.3. 양(면양)',
-    'A.4. 양(산양)',
-    'A.5. 낙타 및 라마',
-    'A.6. 말',
-    'A.7. 노새 및 당나귀',
-    'A.8. 돼지',
-    'A.9. 가금류',
-    'A.10. 기타 가축(사슴)',
-    'B.  가축분뇨처리',
-    'B.1. 소',
-    'B.2. 물소',
-    'B.3. 양(면양)',
-    'B.4. 양(산양)',
-    'B.5. 낙타 및 라마',
-    'B.6. 말',
-    'B.7. 노새 및 당나귀',
-    'B.8. 돼지',
-    'B.9. 가금류',
-    'B.10. 기타 가축(사슴)',
-    'B1. 소',
-    'B2. 물소',
-    'B3. 양(면양)',
-    'B4. 양(산양)',
-    'B5. 낙타 및 라마',
-    'B6. 말',
-    'B7. 노새 및 당나귀',
-    'B8. 돼지',
-    'B9. 가금류',
-    'B10. 기타 가축(사슴)',
-    'C.  벼재배',
-    '1. 관개',
-    '2. 천수답',
-    'D. 농경지토양',
-    '1. 직접배출',
-    '2. 목장, 방목구역, 분료(거름)',
-    '3. 간접배출',
-    'E. 사바나 소각',
-    'F. 작물잔사소각',
-    '1. 곡류',
-    '2. 두류(콩)',
-    '3. 근채류',
-    '4. 사탕수수',
-    '5. 기타',
-    'LULUCF',
-    'A. 산림지',
-    '1. 산림지로 유지된 산림지',
-    '2. 타토지에서 전용된 산림지',
-    '3. 산림지에서 질소 시비로 인한 N2O 배출',
-    '4. 산림지에서 배수로 인한 Non-CO2 배출',
-    '5. 산림지에서 바이오매스 연소에 의한 배출',
-    'B. 농경지',
-    '1. 농경지로 유지된 농경지',
-    '2. 타토지에서 전용된 농경지',
-    '3. 농경지로의 전용에 따른 N2O 배출',
-    '4. 농경지에서 농업용 석회시용으로 인한 CO2 배출',
-    '5. 농경지에서 바이오매스 연소에 의한 배출',
-    'C. 초지',
-    '1. 초지로 유지된 초지',
-    '2. 타토지에서 전용된 초지',
-    '3. 초지에서 농업용 석회시용으로 인한 CO2 배출',
-    '4. 초지에서 바이오매스 연소에 의한 배출',
-    'D. 습지',
-    '1. 습지로 유지된 습지',
-    '2. 타토지에서 전용된 습지',
-    '3. 습지에서 배수로 인한 Non-CO2 배출',
-    '4. 습지에서 바이오매스 연소에 의한 배출',
-    'E. 정주지',
-    'F. 기타토지',
-    '폐기물',
-    'A. 폐기물매립',
-    '1. 관리형 매립',
-    '2. 비관리형 매립',
-    'B. 하폐수처리',
-    '1. 폐수처리',
-    '2. 하수처리',
-    'C. 폐기물소각',
-    'D. 기타',
-    '별도항목(Memo Item)',
-    '분야·부문/연도',
-    'C. 국제벙커링 및 다국적 작전',
-    '1. 벙커링',
-    'a. 국제 항공',
-    'b. 국제 해운',
-    '2. 다국적 작전',
-    '* 참고 : NO = 배출활동 및 공정이 없는 경우, NE = 산정하지 아니하는 경우, NA = 자연적, 이론적으로 발생하지 않는 활동 및 공정의 경우, IE = 다른 항목에 포함하여 보고하는 경우, C = 기밀정보인 경우',
-    '3. 타토지로 전용된 농경지', # start of new codes in 2021 inventory
-    '4. 농경지로의 전용에 따른 N2O 배출',
-    '5. 농경지에서 농업용 석회시용으로 인한 CO2 배출',
-    '6. 농경지에서 바이오매스 연소에 의한 배출',
-    'G. 기타',
-    '3. 습지에서 배수로 인한 Non-CH4 배출', # new codes in 2022 inventory start here
-    '3. 초지에서 농업용 석회시용으로 인한 CH4 배출',
-    '4. 산림지에서 배수로 인한 Non-CH4 배출',
-    '5. 농경지에서 농업용 석회시용으로 인한 CH4 배출',
-]
-translations = [
-    ['Total emissions', 'M.0.EL'],
-    ['Net emissions', '0'],
-    ['energy', '1'],
-    ['A. Fuel combustion', '1.A'],
-    ['1. Energy industry', '1.A.1'],
-    ['a. Public electricity and heat production', '1.A.1.a'],
-    ['b. Oil refining', '1.A.1.b'],
-    ['c. Solid fuel manufacturing and other energy industries', '1.A.1.c'],
-    ['2. Manufacturing and construction', '1.A.2'],
-    ['a. steel', '1.A.2.a'],
-    ['b. Non-ferrous metal', '1.A.2.b'],
-    ['c. chemistry', '1.A.2.c'],
-    ['d. Pulp, paper and printing', '1.A.2.d'],
-    ['e. Food and beverage processing and tobacco manufacturing', '1.A.2.e'],
-    ['f. Etc', '1.A.2.f'],
-    ['  1. Non-metal', '1.A.2.f.1'],
-    ['  2. Assembly metal', '1.A.2.f.2'],
-    ['  3. Wood and timber', '1.A.2.f.3'],
-    ['  4. Construction', '1.A.2.f.4'],
-    ['  5. Textile and leather', '1.A.2.f.5'],
-    ['  6. Other manufacturing', '1.A.2.f.6'],
-    ['3. Transportation', '1.A.3'],
-    ['a. Civil aviation', '1.A.3.a.2'],
-    ['b. Road transport', '1.A.3.b'],
-    ['c. railroad', '1.A.3.c'],
-    ['d. shipping', '1.A.3.d.2'],
-    ['e. Other transport', '1.A.3.e'],
-    ['4. Other', '1.A.4'],
-    ['a. Commercial/Public', '1.A.4.a'],
-    ['b. home', '1.A.4.b'],
-    ['c. Agriculture/Forestry/Fishing', '1.A.4.c'],
-    ['5. Uncategorized', '1.A.5'],
-    ['B. Fugitive emissions', '1.B'],
-    ['1. Solid fuel', '1.B.1'],
-    ['2. Oil and natural gas', '1.B.2'],
-    ['a. oil', '1.B.2.a'],
-    ['b. Natural gas', '1.B.2.b'],
-    ['Industrial process', '2'],
-    ['A. Mineral industry', '2.A'],
-    ['1. Cement production', '2.A.1'],
-    ['2. Lime production', '2.A.2'],
-    ['3. Limestone and Dolomite Consumption', '2.A.3'],
-    ['4. Soda ash production and consumption', '2.A.4'],
-    ['5. Asphalt roofing', '2.A.5'],
-    ['6. Asphalt road pavement', '2.A.6'],
-    ['B. Chemical industry', '2.B'],
-    ['C. Metal Industry', '2.C'],
-    ['1. Steel production', '2.C.1'],
-    ['2. Ferroalloy production', '2.C.2'],
-    ['3. Aluminum production', '2.C.3'],
-    ['4. SF6 consumption in magnesium production', '2.C.4'],
-    ['D. Other industries', '2.D'],
-    ['E. Production of halocarbons and sulfur hexafluoride', '2.E'],
-    ['1. Emission of by-products', '2.E.1'],
-    ['2. Fugitive discharge', '2.E.2'],
-    ['F. Consumption of halocarbons and sulfur hexafluoride', '2.F'],
-    ['1. Refrigeration and cooling', '2.F.1'],
-    ['2. Foaming agent', '2.F.2'],
-    ['3. Fire extinguisher', '2.F.3'],
-    ['4. Aerosol', '2.F.4'],
-    ['5. Solvent', '2.F.5'],
-    ['6. Use of ODS substitutes for other purposes', '2.F.6'],
-    ['7. Semiconductor manufacturing', '2.F.7'],
-    ['8. Heavy electric machine', '2.F.8'],
-    ['9. Others (potential emissions)', '2.F.9'],
-    ['Agriculture', '4'],
-    ['A. Enteric fermentation', '4.A'],
-    ['A.1. cow', '4.A.1'],
-    ['A.2. Water buffalo', '4.A.2'],
-    ['A.3. Sheep (Cotton Sheep)', '4.A.3'],
-    ['A.4. Sheep (Goat)', '4.A.4'],
-    ['A.5. Camel and Llama', '4.A.5'],
-    ['A.6. Horse', '4.A.6'],
-    ['A.7. Mules and Donkeys', '4.A.7'],
-    ['A.8. Pig', '4.A.8'],
-    ['A.9. Poultry', '4.A.9'],
-    ['A.10. Other livestock (deer)', '4.A.10'],
-    ['B. Livestock manure treatment', '4.B'],
-    ['B.1. cow', '4.B.1'],
-    ['B.2. Water buffalo', '4.B.2'],
-    ['B.3. Sheep (Cotton Sheep)', '4.B.3'],
-    ['B.4. Sheep (Goat)', '4.B.4'],
-    ['B.5. Camel and Llama', '4.B.5'],
-    ['B.6. Horse', '4.B.6'],
-    ['B.7. Mules and Donkeys', '4.B.7'],
-    ['B.8. Pig', '4.B.8'],
-    ['B.9. Poultry', '4.B.9'],
-    ['B.10. Other livestock (deer)', '4.B.10'],
-    ['B.1. cow', '4.B.1'],
-    ['B.2. Water buffalo', '4.B.2'],
-    ['B.3. Sheep (Cotton Sheep)', '4.B.3'],
-    ['B.4. Sheep (Goat)', '4.B.4'],
-    ['B.5. Camel and Llama', '4.B.5'],
-    ['B.6. Horse', '4.B.6'],
-    ['B.7. Mules and Donkeys', '4.B.7'],
-    ['B.8. Pig', '4.B.8'],
-    ['B.9. Poultry', '4.B.9'],
-    ['B.10. Other livestock (deer)', '4.B.10'],
-    ['C. Rice cultivation', '4.C'],
-    ['1. irrigation', '4.C.1'],
-    ['2. Rainfed', '4.C.4'],
-    ['D. Cropland soil', '4.D'],
-    ['1. Direct discharge', '4.D.1'],
-    ['2. Ranch, grazing area, manure (manure)', '4.D.2'],
-    ['3. Indirect emissions', '4.D.3'],
-    ['E. Savannah incineration', '4.E'],
-    ['F. Crop residue incineration', '4.F'],
-    ['1. Grains', '4.F.1'],
-    ['2. Beans (beans)', '4.F.2'],
-    ['3. Root vegetables', '4.F.3'],
-    ['4. Sugar cane', '4.F.4'],
-    ['5. Other', '4.F.5'],
-    ['LULUCF', '5'],
-    ['A. Forest land', '5.A'],
-    ['1. Forest land maintained as a forest land', '5.A.1'],  # categories differ from IPCC1996
-    ['2. Forest land converted from other lands', '5.A.2'],  # categories differ from IPCC1996
-    ['3. N2O emissions from nitrogen fertilization in forest areas', '5.A.3'],  # categories differ from IPCC1996
-    ['4. Non-CO2 emission due to drainage in forest areas', '5.A.4'],  # categories differ from IPCC1996
-    ['5. Emissions from biomass combustion in forest areas', '5.A.5'],  # categories differ from IPCC1996
-    ['B. Cropland', '5.B'],
-    ['1. Agricultural land maintained as agricultural land', '5.B.1'],  # categories differ from IPCC1996
-    ['2. Cropland converted from other lands', '5.B.2'],  # categories differ from IPCC1996
-    ['3. N2O emission due to conversion to agricultural land', '5.B.3'],  # categories differ from IPCC1996
-    ['4. CO2 emission from agricultural lime application in agricultural land', '5.B.4'],  # categories differ from IPCC1996
-    ['5. Emissions from biomass combustion in agricultural land', '5.B.5'],  # categories differ from IPCC1996
-    ['C. Grassland', '5.C'],
-    ['1. Grassland maintained as grassland', '5.C.1'],  # categories differ from IPCC1996
-    ['2. Grassland converted from other lands', '5.C.2'],  # categories differ from IPCC1996
-    ['3. CO2 emission from agricultural lime application in grassland', '5.C.3'],  # categories differ from IPCC1996
-    ['4. Emissions from biomass combustion in grassland', '5.C.4'],  # categories differ from IPCC1996
-    ['D. Wetlands', '5.D'],
-    ['1. Wetlands maintained as wetlands', '5.D.1'],  # categories differ from IPCC1996
-    ['2. Wetlands converted from other lands', '5.D.2'],  # categories differ from IPCC1996
-    ['3. Non-CO2 emission due to drainage in wetlands', '5.D.3'],  # categories differ from IPCC1996
-    ['4. Emissions from biomass combustion in wetlands', '5.D.4'],  # categories differ from IPCC1996
-    ['E. Settlements', '5.E'],
-    ['F. Other land', '5.F'],
-    ['waste', '6'],
-    ['A. Landfill of waste', '6.A'],
-    ['1. Managed landfill', '6.A.1'],
-    ['2. Unmanaged landfill', '6.A.2'],
-    ['B. Sewage water treatment', '6.B'],
-    ['1. Wastewater treatment', '6.B.1'],  # categories differ from IPCC1996
-    ['2. Sewage treatment', '6.B.2'],  # categories differ from IPCC1996
-    ['C. Waste incineration', '6.C'],
-    ['D. Other', '6.D'],
-    ['Memo Item', r'\IGNORE'],
-    ['Field·Sector/Year', r'\IGNORE'],
-    ['C. International bunkering and multinational operations', r'\IGNORE'],
-    ['1. Bunkering', 'M.1'],
-    ['a. International aviation', 'M.1.A'],
-    ['b. International shipping', 'M.1.B'],
-    ['2. Multinational operations', 'M.2'],
-    ['', r'\IGNORE'],
-    ['3. Cropland converted to other lands', '5.B.3'],  # new codes in 2021 inventory start here
-    ['4. N2O emission due to conversion to agricultural land', '5.B.4'],
-    ['5. CO2 emission from agricultural lime application in agricultural land', '5.B.5'],
-    ['6. Emissions from burning biomass on agricultural land', '5.B.6'],
-    ['G. Others', '5.G'],
-    ['3. Non-CH4 emissions due to drainage from wetlands', 'M.5.D.1.drain'], # new codes in 2022 inventory start here
-    ['3. CH4 emissions from agricultural lime application on grassland',
-     'M.4.D.1.lime.grass'],
-    ['4. Non-CH4 emissions from drainage from forest land', 'M.5.A.1.drain'],
-    ['5. CH4 emissions from agricultural lime application in agricultural fields',
-     'M.4.D.1.lime.field'],
-]
-cat_name_translations = dict(zip(original_names, [cat[0] for cat in translations]))
-cat_codes = dict(zip(original_names, [cat[1] for cat in translations]))
-
-fix_rows = [
-    '1. 소',
-    '2. 물소',
-    '3. 양(면양)',
-    '4. 양(산양)',
-    '5. 낙타 및 라마',
-    '6. 말',
-    '7. 노새 및 당나귀',
-    '8. 돼지',
-    '9. 가금류',
-    '10. 기타 가축(사슴)',
-]
-
-remove_cats = [
-    '1.A.1.a', '1.A.1.b', '1.A.1.c', '1.A.2.f',
-    '2.A', '2.D',
-    '2.F', '2.G',
-    '4.C.1', '4.C.4',
-    '4.D',
-    '4.F.1', '4.F.2', '4.F.3', '4.F.4', '4.F.5',  # detail not in 2006 categories
-    '5.A', '5.A.1', '5.A.2', '5.A.3', '5.A.4', '5.A.5',  # do not match IPCC
-    # categories
-    '5.B', '5.B.1', '5.B.2', '5.B.3', '5.B.4', '5.B.5',
-    '5.C', '5.C.1', '5.C.2', '5.C.3', '5.C.4',
-    '5.D', '5.D.1', '5.D.2', '5.D.3', '5.D.4',
-    '5.E', '5.F',
-    '5.G', '5.B.6', # for 2021 NIR
-    'M.5.A.1.drain', 'M.5.D.1.drain', # for 2022 / 2023 NIR
-    'M.4.D.1.lime.field', 'M.4.D.1.lime.grass',
-]
-
-aggregate_before_mapping = {
-    '2006.2.D.4': {'sources': ['2.A.5', '2.A.6'], 'name': 'Other'},
-    '2006.3.C.2': {'sources': ['M.4.D.1.lime.grass', 'M.4.D.1.lime.field'],
-                   'name': 'Liming'},
-    '2006.3.C.4': {'sources': ['4.D.1', '4.D.2'],
-                   'name': 'Direct N2O Emissions from Managed Soils'},
-    '2006.M.3C1AG': {'sources': ['4.E', '4.F'], 'name': 'Biomass burning Agriculture'},
-    '2006.1.A.2.m': {'sources': ['1.A.2.f.2', '1.A.2.f.6'], 'name': 'Other'},
-}
-
-cat_mapping = {
-    '1.A.2.f.1': '1.A.2.f',
-    '1.A.2.f.3': '1.A.2.j',
-    '1.A.2.f.4': '1.A.2.k',
-    '1.A.2.f.5': '1.A.2.l',
-    '2006.1.A.2.m': '1.A.2.m',
-    '2.A.4': '2.B.7',  # add to 2.B
-    '2.A.3': '2.A.4',
-    '2.D': '2.H',
-    '2006.2.D.4': '2.D.4',
-    '2.E': '2.B.9',  # add to 2.B
-    '2.E.1': '2.B.9.a',
-    '2.E.2': '2.B.9.b',
-    #    '2.F', # remove?
-    '2.F.1': '2.F.1',  # just added here to avoid confusion
-    #    '2.F.2', '2.F.3', '2.F.4', '2.F.5',
-    '2.F.6': '2.E_1',
-    '2.F.7': '2.E_2',
-    '2.F.8': '2.G.1',
-    '2.F.9': '2.G.2',
-    '4': 'M.AG',
-    '4.A': '3.A.1',
-    '4.A.1': '3.A.1.a',
-    '4.A.2': '3.A.1.b',
-    '4.A.3': '3.A.1.c',
-    '4.A.4': '3.A.1.d',
-    '4.A.5': '3.A.1.e',
-    '4.A.6': '3.A.1.f',
-    '4.A.7': '3.A.1.g',
-    '4.A.8': '3.A.1.h',
-    '4.A.9': '3.A.1.i',
-    '4.A.10': '3.A.1.j',
-    '4.B': '3.A.2',
-    '4.B.1': '3.A.2.a',
-    '4.B.2': '3.A.2.b',
-    '4.B.3': '3.A.2.c',
-    '4.B.4': '3.A.2.d',
-    '4.B.5': '3.A.2.e',
-    '4.B.6': '3.A.2.f',
-    '4.B.7': '3.A.2.g',
-    '4.B.8': '3.A.2.h',
-    '4.B.9': '3.A.2.i',
-    '4.B.10': '3.A.2.j',
-    '4.C': '3.C.7',
-    '2006.3.C.2': '3.C.2',
-    '2006.3.C.4': '3.C.4',
-    '4.D.3': '3.C.5',
-    '2006.M.3C1AG': 'M.3.C.1.AG',
-    '5': 'M.LULUCF',
-    '6': '4',
-    '6.A': '4.A',
-    '6.A.1': '4.A.1',
-    '6.A.2': '4.A.2',
-    '6.B': '4.D',
-    '6.B.1': '4.D.1',
-    '6.B.2': '4.D.2',
-    '6.C': '4.C.1',
-    '6.D': '4.E',
-    'M.1': 'M.BK',
-    'M.1.A': 'M.BK.A',
-    'M.1.B': 'M.BK.M',
-}
-
-aggregate_after_mapping = {
-    '1.A.3.a': {'sources': ['1.A.3.a.2'], 'name': 'Civil Aviation'},  # aviation
-    '1.A.3.d': {'sources': ['1.A.3.d.2'], 'name': 'Water-borne Navigation'},  # shipping
-    '2.A': {'sources': ['2.A.1', '2.A.2', '2.A.4', '2.A.5', '2.A.6'],
-            'name': 'Mineral Industry'},
-    '2.B': {'sources': ['2.B', '2.B.7', '2.B.9'], 'name': 'Chemical Industry'},
-    '2.D': {'sources': ['2.D.4'], 'name': 'Other'},
-    '2.E': {'sources': ['2.E_1', '2.E_2'], 'name': 'Electronics Industry'},
-    '2.F': {'sources': ['2.F.1', '2.F.2', '2.F.3', '2.F.4', '2.F.5'],
-            'name': 'Product uses as Substitutes for Ozone Depleting Substances'},
-    '2.G': {'sources': ['2.G.1', '2.G.2'], 'name': 'Other Product Manufacture and Use'},
-    '3.A': {'sources': ['3.A.1', '3.A.2'], 'name': 'Livestock'},
-    '3.C.1': {'sources': ['M.3.C.1.AG'], 'name': 'Emissions from Biomass Burning'},
-    '3.C': {'sources': ['3.C.1', '3.C.2', '3.C.4', '3.C.5', '3.C.7'],
-                 'name': 'Aggregate sources and non-CO2 emissions sources on land'},
-    'M.3.C.AG': {'sources': ['M.3.C.1.AG', '3.C.2', '3.C.4', '3.C.5', '3.C.7'],
-                 'name': 'Aggregate sources and non-CO2 emissions sources on land ('
-                         'Agriculture)'},
-    'M.AG.ELV': {'sources': ['M.3.C.AG'], 'name': 'Agriculture excluding livestock'},
-    '4.C': {'sources': ['4.C.1'], 'name': 'Incineration and Open Burning of Waste'},
-}
-
-coords_terminologies_2006 = {
-    "area": "ISO3",
-    "category": "IPCC2006_PRIMAP",
-    "scenario": "PRIMAP",
-}
-
-filter_remove_2006 = {
-    "f1": {
-        "category (IPCC2006_PRIMAP)": r"\IGNORE",
-    },
-    # "livestock": {  # temp until double cat name problem is solved
-    #     "category (IPCC2006_PRIMAP)": [
-    #         '4.B.1', '4.B.10', '4.B.2', '4.B.3', '4.B.4',
-    #         '4.B.5', '4.B.6', '4.B.7', '4.B.8', '4.B.9',
-    #     ]
-    # },
-    "fmap": {
-        "category (IPCC2006_PRIMAP)": remove_cats
-    },
-    "f_bef_map": {
-        "category (IPCC2006_PRIMAP)": [
-            '2.A.5', '2.A.6',  # combined to 2006.2.D.4
-            '4.D.1', '4.D.2',  # combined to 2006.3.C.4
-            '4.E', '4.F',  # 2006.M.3.C.1.AG
-            '1.A.2.f.2', '1.A.2.f.6',  # 2006.1.A.2.m
-        ]
-    }
-}
-
-filter_remove_after_agg = {
-    "tempCats": {
-        "category (IPCC2006_PRIMAP)": [
-            "2.E_1", "2.E_2"
-        ],
-    },
-}
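
Note on the removed config above: `cat_name_translations` and `cat_codes` are built with `dict(zip(...))` from two long parallel lists. `zip` stops silently at the shorter list and `dict` keeps only the last value for a repeated key, so a missing or duplicated entry would shift or drop codes without any error. A minimal sketch of a defensive variant follows; `build_lookups` is a hypothetical helper, not part of the repository, and the three entries are just the first rows of the lists above.

```python
from collections import Counter

original_names = ["총배출량", "순배출량", "에너지"]
translations = [
    ["Total emissions", "M.0.EL"],
    ["Net emissions", "0"],
    ["energy", "1"],
]


def build_lookups(names, pairs):
    """Return (name -> translation, name -> code) after basic sanity checks."""
    if len(names) != len(pairs):
        raise ValueError(f"{len(names)} names but {len(pairs)} translations")
    duplicates = [name for name, count in Counter(names).items() if count > 1]
    if duplicates:
        raise ValueError(f"duplicate original names: {duplicates}")
    translated = {name: pair[0] for name, pair in zip(names, pairs)}
    codes = {name: pair[1] for name, pair in zip(names, pairs)}
    return translated, codes


cat_name_translations, cat_codes = build_lookups(original_names, translations)
```

Raising on duplicates is a design choice for this sketch; a config that intentionally repeats a row label would need a different key, for example the label combined with its parent section.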

+ 0 - 313
UNFCCC_GHG_data/UNFCCC_reader/Republic_of_Korea/read_KOR_2021-Inventory_from_xlsx.py

@@ -1,313 +0,0 @@
-# This script reads data from Korea's 2021 national inventory, which underlies BUR4.
-# Data is read from the xlsx file.
-
-import os
-import sys
-import pandas as pd
-import primap2 as pm2
-
-from config_KOR_BUR4 import cat_name_translations, cat_codes
-from config_KOR_BUR4 import remove_cats, aggregate_before_mapping, cat_mapping, \
-    aggregate_after_mapping, coords_terminologies_2006, filter_remove_2006, \
-    filter_remove_after_agg
-from UNFCCC_GHG_data.helper import downloaded_data_path, extracted_data_path
-from primap2.pm2io._data_reading import filter_data, matches_time_format
-
-# ###
-# configuration
-# ###
-input_folder = downloaded_data_path / 'non-UNFCCC' / 'Republic_of_Korea' / \
-               '2021-Inventory'
-output_folder = extracted_data_path / 'non-UNFCCC' / 'Republic_of_Korea'
-if not output_folder.exists():
-    output_folder.mkdir()
-
-output_filename = 'KOR_2021-Inventory_2021_'
-
-inventory_file = 'Republic_of_Korea_National_GHG_Inventory_(1990_2019).xlsx'
-years_to_read = range(1990, 2019 + 1)
-
-sheets_to_read = ['온실가스', 'CO2', 'CH4', 'N2O', 'HFCs', 'PFCs', 'SF6']
-cols_to_read = range(1, 2019 - 1990 + 3)
-
-# columns for category code and original category name
-index_cols = ['분야·부문/연도']
-
-sheet_metadata = {
-    'entity': {
-        '온실가스': 'KYOTOGHG (SARGWP100)',
-        'CO2': 'CO2',
-        'CH4': 'CH4 (SARGWP100)',
-        'N2O': 'N2O (SARGWP100)',
-        'HFCs': 'HFCS (SARGWP100)',
-        'PFCs': 'PFCS (SARGWP100)',
-        'SF6': 'SF6 (SARGWP100)',
-    },
-    'unit': {
-        '온실가스': 'Gg CO2 / yr',
-        'CO2': 'Gg CO2 / yr',
-        'CH4': 'Gg CO2 / yr',
-        'N2O': 'Gg CO2 / yr',
-        'HFCs': 'Gg CO2 / yr',
-        'PFCs': 'Gg CO2 / yr',
-        'SF6': 'Gg CO2 / yr',
-    }
-}
-
-# definitions for conversion to interchange format
-time_format = "%Y"
-
-coords_cols = {
-    "category": "category",
-    "entity": "entity",
-    "unit": "unit",
-}
-
-add_coords_cols = {
-    "orig_cat_name": ["orig_cat_name", "category"],
-    "cat_name_translation": ["cat_name_translation", "category"]
-}
-
-coords_terminologies = {
-    "area": "ISO3",
-    "category": "IPCC1996_KOR_INV",
-    "scenario": "PRIMAP",
-}
-
-coords_defaults = {
-    "source": "KOR-GHG-Inventory",
-    "provenance": "measured",
-    "area": "KOR",
-    "scenario": "INV2021",
-}
-
-coords_value_mapping = {
-    "cat_name_translation": cat_name_translations,
-    "category": cat_codes,
-}
-
-# filtering after IF creation to be able to use the IPCC codes
-filter_remove = {
-    "f1": {
-        "category (IPCC1996_KOR_INV)": r"\IGNORE",
-    },
-    "livestock": { # temp until double cat name problem is solved
-        "category (IPCC1996_KOR_INV)": [
-            '4.B.1', '4.B.10', '4.B.2', '4.B.3', '4.B.4',
-            '4.B.5', '4.B.6', '4.B.7', '4.B.8', '4.B.9',
-        ]
-    }
-}
-
-filter_keep = {}
-
-meta_data = {
-    "references": "http://www.gir.go.kr/home/file/readDownloadFile.do?fileId=5240&fileSeq=1",
-    "rights": "",
-    "contact": "mail@johannes-guetschow.de",
-    "title": "Republic of Korea: National Greenhouse Gas Inventory Report 2021",
-    "comment": "Read from xlsx file by Johannes Gütschow",
-    "institution": "Republic of Korea, Ministry of Environment, Greenhouse Gas Inventory and Research Center",
-}
-
-cols_for_space_stripping = []
-
-compression = dict(zlib=True, complevel=9)
-
-# ###
-# start data reading
-# ###
-
-# change working directory to script directory for proper folder names
-script_path = os.path.abspath(sys.argv[0])
-script_dir_name = os.path.dirname(script_path)
-os.chdir(script_dir_name)
-
-df_all = None
-
-for sheet in sheets_to_read:
-    # read current sheet (one sheet per gas)
-    df_current = pd.read_excel(input_folder / inventory_file, sheet_name=sheet, skiprows=3, nrows=146, usecols=cols_to_read,
-                               engine="openpyxl")
-    # drop all rows where the index cols (category code and name) are both NaN
-    # as without one of them there is no category information
-    df_current.dropna(axis=0, how='all', subset=index_cols, inplace=True)
-    # set index. necessary for the stack operation in the conversion to long format
-    # df_current = df_current.set_index(index_cols)
-    # add columns
-    for col in sheet_metadata.keys():
-        df_current.insert(1, col, sheet_metadata[col][sheet])
-    # aggregate to one df
-    if df_all is None:
-        df_all = df_current
-    else:
-        df_all = pd.concat([df_all, df_current])
-
-df_all = df_all.reset_index(drop=True)
-# rename category col because filtering produces problems with korean col names
-df_all.rename(columns={"분야·부문/연도": "category"}, inplace=True)
-
-# create copies of category col for further processing
-df_all["orig_cat_name"] = df_all["category"]
-df_all["cat_name_translation"] = df_all["category"]
-
-# make sure all col headers are str
-df_all.columns = df_all.columns.map(str)
-
-# ###
-# convert to PRIMAP2 interchange format
-# ###
-data_if = pm2.pm2io.convert_wide_dataframe_if(
-    df_all,
-    coords_cols=coords_cols,
-    add_coords_cols=add_coords_cols,
-    coords_defaults=coords_defaults,
-    coords_terminologies=coords_terminologies,
-    coords_value_mapping=coords_value_mapping,
-    #coords_value_filling=coords_value_filling,
-    #filter_remove=filter_remove,
-    #filter_keep=filter_keep,
-    meta_data=meta_data,
-    convert_str=True,
-    copy_df=True,  # we need the unchanged DF for the conversion step
-)
-
-filter_data(data_if, filter_remove=filter_remove)
-
-#conversion to PRIMAP2 native format
-data_pm2 = pm2.pm2io.from_interchange_format(data_if)
-# convert back to IF to have units in the fixed format
-data_pm2 = data_pm2.reset_coords(["orig_cat_name", "cat_name_translation"], drop=True)
-data_if = data_pm2.pr.to_interchange_format()
-
-# ###
-# save data to IF and native format
-# ###
-if not output_folder.exists():
-    output_folder.mkdir()
-#pm2.pm2io.write_interchange_format(output_folder / (output_filename + coords_terminologies["category"]), data_if)
-
-data_pm2 = pm2.pm2io.from_interchange_format(data_if)
-encoding = {var: compression for var in data_pm2.data_vars}
-#data_pm2.pr.to_netcdf(output_folder / (output_filename + coords_terminologies["category"] + ".nc"), encoding=encoding)
-
-# ###
-# conversion to ipcc 2006 categories
-# ###
-
-
-data_if_2006 = pm2.pm2io.convert_wide_dataframe_if(
-    df_all,
-    coords_cols=coords_cols,
-    add_coords_cols=add_coords_cols,
-    coords_defaults=coords_defaults,
-    coords_terminologies=coords_terminologies_2006,
-    coords_value_mapping=coords_value_mapping,
-    meta_data=meta_data,
-    convert_str=True,
-    copy_df=True,  # don't mess up the dataframe when testing
-)
-
-cat_label = 'category (' + coords_terminologies_2006["category"] + ')'
-# agg before mapping
-
-for cat_to_agg in aggregate_before_mapping:
-    mask = data_if_2006[cat_label].isin(aggregate_before_mapping[cat_to_agg]["sources"])
-    df_test = data_if_2006[mask]
-
-    if len(df_test) > 0:
-        print(f"Aggregating category {cat_to_agg}")
-        df_combine = df_test.copy(deep=True)
-
-        time_format = '%Y'
-        time_columns = [
-            col
-            for col in df_combine.columns.values
-            if matches_time_format(col, time_format)
-        ]
-
-        for col in time_columns:
-            df_combine[col] = pd.to_numeric(df_combine[col], errors="coerce")
-
-        df_combine = df_combine.groupby(
-            by=['source', 'scenario (PRIMAP)', 'provenance', 'area (ISO3)', 'entity',
-                'unit']).sum()
-
-
-        df_combine.insert(0, cat_label, cat_to_agg)
-        df_combine.insert(1, "orig_cat_name",
-                          aggregate_before_mapping[cat_to_agg]["name"])
-
-        df_combine = df_combine.reset_index()
-
-        if cat_to_agg in aggregate_before_mapping[cat_to_agg]["sources"]:
-            filter_this_cat = {
-                "f": {cat_label: cat_to_agg}
-            }
-            filter_data(data_if_2006, filter_remove=filter_this_cat)
-
-        data_if_2006 = pd.concat([data_if_2006, df_combine])
-    else:
-        print(f"no data to aggregate category {cat_to_agg}")
-
-# filtering
-filter_data(data_if_2006, filter_remove=filter_remove_2006)
-
-# map 1 to 1 categories
-data_if_2006 = data_if_2006.replace({cat_label: cat_mapping})
-data_if_2006[cat_label].unique()
-
-# agg after mapping
-
-for cat_to_agg in aggregate_after_mapping:
-    mask = data_if_2006[cat_label].isin(aggregate_after_mapping[cat_to_agg]["sources"])
-    df_test = data_if_2006[mask]
-
-    if len(df_test) > 0:
-        print(f"Aggregating category {cat_to_agg}")
-        df_combine = df_test.copy(deep=True)
-
-        time_format = '%Y'
-        time_columns = [
-            col
-            for col in df_combine.columns.values
-            if matches_time_format(col, time_format)
-        ]
-
-        for col in time_columns:
-            df_combine[col] = pd.to_numeric(df_combine[col], errors="coerce")
-
-        df_combine = df_combine.groupby(
-            by=['source', 'scenario (PRIMAP)', 'provenance', 'area (ISO3)', 'entity',
-                'unit']).sum()
-
-        df_combine.insert(0, cat_label, cat_to_agg)
-        df_combine.insert(1, "orig_cat_name",
-                          aggregate_after_mapping[cat_to_agg]["name"])
-
-        df_combine = df_combine.reset_index()
-
-        if cat_to_agg in aggregate_after_mapping[cat_to_agg]["sources"]:
-            filter_this_cat = {
-                "f": {cat_label: cat_to_agg}
-            }
-            filter_data(data_if_2006, filter_remove=filter_this_cat)
-
-        data_if_2006 = pd.concat([data_if_2006, df_combine])
-    else:
-        print(f"no data to aggregate category {cat_to_agg}")
-
-
-#conversion to PRIMAP2 native format
-data_pm2_2006 = pm2.pm2io.from_interchange_format(data_if_2006)
-# convert back to IF to have units in the fixed format
-data_pm2_2006 = data_pm2_2006.reset_coords(["orig_cat_name", "cat_name_translation"],
-                                       drop=True)
-data_if_2006 = data_pm2_2006.pr.to_interchange_format()
-# save IPCC2006 data
-
-filter_data(data_if_2006, filter_remove=filter_remove_after_agg)
-pm2.pm2io.write_interchange_format(output_folder / (output_filename + coords_terminologies_2006["category"]), data_if_2006)
-
-encoding = {var: compression for var in data_pm2_2006.data_vars}
-data_pm2_2006.pr.to_netcdf(output_folder / (output_filename + coords_terminologies_2006["category"] + ".nc"), encoding=encoding)

+ 0 - 318
UNFCCC_GHG_data/UNFCCC_reader/Republic_of_Korea/read_KOR_2022-Inventory_from_xlsx.py

@@ -1,318 +0,0 @@
-# this script reads data from Korea's 2022 national inventory (data for 1990-2020)
-# Data is read from the xlsx file
-
-import os
-import sys
-import pandas as pd
-import primap2 as pm2
-
-from config_KOR_BUR4 import cat_name_translations, cat_codes
-from config_KOR_BUR4 import remove_cats, aggregate_before_mapping, cat_mapping, \
-    aggregate_after_mapping, coords_terminologies_2006, filter_remove_2006, \
-    filter_remove_after_agg
-from UNFCCC_GHG_data.helper import downloaded_data_path, extracted_data_path
-from primap2.pm2io._data_reading import filter_data, matches_time_format
-from UNFCCC_GHG_data.helper import process_data_for_country
-
-# ###
-# configuration
-# ###
-input_folder = downloaded_data_path / 'non-UNFCCC' / 'Republic_of_Korea' / \
-               '2022-Inventory'
-output_folder = extracted_data_path / 'non-UNFCCC' / 'Republic_of_Korea'
-if not output_folder.exists():
-    output_folder.mkdir()
-
-output_filename = 'KOR_2022-Inventory_2022_'
-
-inventory_file = 'Republic_of_Korea_National_GHG_Inventory_(1990_2020).xlsx'
-years_to_read = range(1990, 2020 + 1)
-
-sheets_to_read = ['온실가스', 'CO2', 'CH4', 'N2O', 'HFCs', 'PFCs', 'SF6']
-cols_to_read = range(1, 2020 - 1990 + 3)
-
-# columns for category code and original category name
-index_cols = ['분야·부문/연도']
-
-sheet_metadata = {
-    'entity': {
-        '온실가스': 'KYOTOGHG (SARGWP100)',
-        'CO2': 'CO2',
-        'CH4': 'CH4 (SARGWP100)',
-        'N2O': 'N2O (SARGWP100)',
-        'HFCs': 'HFCS (SARGWP100)',
-        'PFCs': 'PFCS (SARGWP100)',
-        'SF6': 'SF6 (SARGWP100)',
-    },
-    'unit': {
-        '온실가스': 'Gg CO2 / yr',
-        'CO2': 'Gg CO2 / yr',
-        'CH4': 'Gg CO2 / yr',
-        'N2O': 'Gg CO2 / yr',
-        'HFCs': 'Gg CO2 / yr',
-        'PFCs': 'Gg CO2 / yr',
-        'SF6': 'Gg CO2 / yr',
-    }
-}
-
-# definitions for conversion to interchange format
-time_format = "%Y"
-
-coords_cols = {
-    "category": "category",
-    "entity": "entity",
-    "unit": "unit",
-}
-
-add_coords_cols = {
-    "orig_cat_name": ["orig_cat_name", "category"],
-    "cat_name_translation": ["cat_name_translation", "category"]
-}
-
-coords_terminologies = {
-    "area": "ISO3",
-    "category": "IPCC1996_KOR_INV",
-    "scenario": "PRIMAP",
-}
-
-coords_defaults = {
-    "source": "KOR-GHG-Inventory",
-    "provenance": "measured",
-    "area": "KOR",
-    "scenario": "INV2022",
-}
-
-coords_value_mapping = {
-    "cat_name_translation": cat_name_translations,
-    "category": cat_codes,
-}
-
-# filtering after IF creation to be able to use the IPCC codes
-filter_remove = {
-    "f1": {
-        # raw string: "\I" is not a valid escape sequence in a normal string literal
-        "category (IPCC1996_KOR_INV)": r"\IGNORE",
-    },
-    "livestock": { # temp until double cat name problem is solved
-        "category (IPCC1996_KOR_INV)": [
-            '4.B.1', '4.B.10', '4.B.2', '4.B.3', '4.B.4',
-            '4.B.5', '4.B.6', '4.B.7', '4.B.8', '4.B.9',
-        ]
-    }
-}
-
-filter_keep = {}
-
-meta_data = {
-    "references": "http://www.gir.go.kr/home/file/readDownloadFile.do?fileId=5810&fileSeq=3",
-    "rights": "",
-    "contact": "mail@johannes-guetschow.de",
-    "title": "Republic of Korea: National Greenhouse Gas Inventory Report 2022",
-    "comment": "Read from xlsx file by Johannes Gütschow",
-    "institution": "Republic of Korea, Ministry of Environment, Greenhouse Gas Inventory and Research Center",
-}
-
-
-
-cols_for_space_stripping = []
-
-compression = dict(zlib=True, complevel=9)
-
-# ###
-# start data reading
-# ###
-
-# change working directory to script directory for proper folder names
-script_path = os.path.abspath(sys.argv[0])
-script_dir_name = os.path.dirname(script_path)
-os.chdir(script_dir_name)
-
-df_all = None
-
-for sheet in sheets_to_read:
-    # read current sheet (one sheet per gas)
-    df_current = pd.read_excel(input_folder / inventory_file, sheet_name=sheet, skiprows=3, nrows=146, usecols=cols_to_read,
-                               engine="openpyxl")
-    # drop all rows where all index cols (category code and name) are NaN,
-    # as without them there is no category information
-    df_current.dropna(axis=0, how='all', subset=index_cols, inplace=True)
-    # set index. necessary for the stack operation in the conversion to long format
-    # df_current = df_current.set_index(index_cols)
-    # make sure all col headers are str
-    df_current.columns = df_current.columns.map(str)
-    # add columns
-    for col in sheet_metadata.keys():
-        df_current.insert(1, col, sheet_metadata[col][sheet])
-    # aggregate to one df
-    if df_all is None:
-        df_all = df_current
-    else:
-        df_all = pd.concat([df_all, df_current])
-
-df_all = df_all.reset_index(drop=True)
-# rename category col because filtering produces problems with Korean col names
-df_all.rename(columns={"분야·부문/연도": "category"}, inplace=True)
-
-# create copies of category col for further processing
-df_all["orig_cat_name"] = df_all["category"]
-df_all["cat_name_translation"] = df_all["category"]
-
-
-
-# ###
-# convert to PRIMAP2 interchange format
-# ###
-data_if = pm2.pm2io.convert_wide_dataframe_if(
-    df_all,
-    coords_cols=coords_cols,
-    add_coords_cols=add_coords_cols,
-    coords_defaults=coords_defaults,
-    coords_terminologies=coords_terminologies,
-    coords_value_mapping=coords_value_mapping,
-    #coords_value_filling=coords_value_filling,
-    #filter_remove=filter_remove,
-    #filter_keep=filter_keep,
-    meta_data=meta_data,
-    convert_str=True,
-    copy_df=True, # we need the unchanged DF for the conversion step
-    )
-
-filter_data(data_if, filter_remove=filter_remove)
-
-#conversion to PRIMAP2 native format
-data_pm2 = pm2.pm2io.from_interchange_format(data_if)
-# convert back to IF to have units in the fixed format
-data_pm2 = data_pm2.reset_coords(["orig_cat_name", "cat_name_translation"], drop=True)
-data_if = data_pm2.pr.to_interchange_format()
-
-# ###
-# save data to IF and native format
-# ###
-if not output_folder.exists():
-    output_folder.mkdir()
-pm2.pm2io.write_interchange_format(output_folder / (output_filename + coords_terminologies["category"]), data_if)
-
-data_pm2 = pm2.pm2io.from_interchange_format(data_if)
-encoding = {var: compression for var in data_pm2.data_vars}
-data_pm2.pr.to_netcdf(output_folder / (output_filename + coords_terminologies["category"] + ".nc"), encoding=encoding)
-
-# ###
-# conversion to ipcc 2006 categories
-# ###
-
-
-data_if_2006 = pm2.pm2io.convert_wide_dataframe_if(
-    df_all,
-    coords_cols=coords_cols,
-    add_coords_cols=add_coords_cols,
-    coords_defaults=coords_defaults,
-    coords_terminologies=coords_terminologies_2006,
-    coords_value_mapping=coords_value_mapping,
-    meta_data=meta_data,
-    convert_str=True,
-    copy_df=True,  # don't mess up the dataframe when testing
-)
-
-cat_label = 'category (' + coords_terminologies_2006["category"] + ')'
-# agg before mapping
-
-for cat_to_agg in aggregate_before_mapping:
-    mask = data_if_2006[cat_label].isin(aggregate_before_mapping[cat_to_agg]["sources"])
-    df_test = data_if_2006[mask]
-
-    if len(df_test) > 0:
-        print(f"Aggregating category {cat_to_agg}")
-        df_combine = df_test.copy(deep=True)
-
-        time_format = '%Y'
-        time_columns = [
-            col
-            for col in df_combine.columns.values
-            if matches_time_format(col, time_format)
-        ]
-
-        for col in time_columns:
-            df_combine[col] = pd.to_numeric(df_combine[col], errors="coerce")
-
-        df_combine = df_combine.groupby(
-            by=['source', 'scenario (PRIMAP)', 'provenance', 'area (ISO3)', 'entity',
-                'unit']).sum()
-
-        df_combine = df_combine.drop(columns=["category (IPCC2006_PRIMAP)", "orig_cat_name", "cat_name_translation"])
-        df_combine.insert(0, cat_label, cat_to_agg)
-        df_combine.insert(1, "orig_cat_name",
-                          aggregate_before_mapping[cat_to_agg]["name"])
-
-        df_combine = df_combine.reset_index()
-
-        if cat_to_agg in aggregate_before_mapping[cat_to_agg]["sources"]:
-            filter_this_cat = {
-                "f": {cat_label: cat_to_agg}
-            }
-            filter_data(data_if_2006, filter_remove=filter_this_cat)
-
-        data_if_2006 = pd.concat([data_if_2006, df_combine])
-    else:
-        print(f"no data to aggregate category {cat_to_agg}")
-
-# filtering
-filter_data(data_if_2006, filter_remove=filter_remove_2006)
-
-# map 1 to 1 categories
-data_if_2006 = data_if_2006.replace({cat_label: cat_mapping})
-data_if_2006[cat_label].unique()
-
-# agg after mapping
-
-for cat_to_agg in aggregate_after_mapping:
-    mask = data_if_2006[cat_label].isin(aggregate_after_mapping[cat_to_agg]["sources"])
-    df_test = data_if_2006[mask]
-
-    if len(df_test) > 0:
-        print(f"Aggregating category {cat_to_agg}")
-        df_combine = df_test.copy(deep=True)
-
-        time_format = '%Y'
-        time_columns = [
-            col
-            for col in df_combine.columns.values
-            if matches_time_format(col, time_format)
-        ]
-
-        for col in time_columns:
-            df_combine[col] = pd.to_numeric(df_combine[col], errors="coerce")
-
-        df_combine = df_combine.groupby(
-            by=['source', 'scenario (PRIMAP)', 'provenance', 'area (ISO3)', 'entity',
-                'unit']).sum()
-
-        df_combine = df_combine.drop(columns=["category (IPCC2006_PRIMAP)", "orig_cat_name", "cat_name_translation"])
-        df_combine.insert(0, cat_label, cat_to_agg)
-        df_combine.insert(1, "orig_cat_name",
-                          aggregate_after_mapping[cat_to_agg]["name"])
-
-        df_combine = df_combine.reset_index()
-
-        if cat_to_agg in aggregate_after_mapping[cat_to_agg]["sources"]:
-            filter_this_cat = {
-                "f": {cat_label: cat_to_agg}
-            }
-            filter_data(data_if_2006, filter_remove=filter_this_cat)
-
-        data_if_2006 = pd.concat([data_if_2006, df_combine])
-    else:
-        print(f"no data to aggregate category {cat_to_agg}")
-
-
-#conversion to PRIMAP2 native format
-data_pm2_2006 = pm2.pm2io.from_interchange_format(data_if_2006)
-# convert back to IF to have units in the fixed format
-data_pm2_2006 = data_pm2_2006.reset_coords(["orig_cat_name", "cat_name_translation"],
-                                       drop=True)
-data_if_2006 = data_pm2_2006.pr.to_interchange_format()
-# save IPCC2006 data
-
-filter_data(data_if_2006, filter_remove=filter_remove_after_agg)
-pm2.pm2io.write_interchange_format(output_folder / (output_filename + coords_terminologies_2006["category"]), data_if_2006)
-
-encoding = {var: compression for var in data_pm2_2006.data_vars}
-data_pm2_2006.pr.to_netcdf(output_folder / (output_filename + coords_terminologies_2006["category"] + ".nc"), encoding=encoding)
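The two aggregation loops in the script above ("agg before mapping" and "agg after mapping") repeat the same steps: select the source categories, coerce the year columns to numbers, sum per group, and append the result under the target category. A minimal sketch of that pattern as one reusable function follows. The function name `aggregate_category`, its parameters, and the sample data are hypothetical and not part of the repository. Unlike the script, the sketch sums only the year columns explicitly, so no text columns need to be dropped afterwards.

```python
import pandas as pd


def aggregate_category(df, cat_label, target, sources, name, group_cols, time_cols):
    """Sum the rows of `sources` into one new row per group, labelled `target`.

    Hypothetical helper for illustration; not part of UNFCCC_GHG_data or primap2.
    """
    subset = df[df[cat_label].isin(sources)].copy()
    if subset.empty:
        print(f"no data to aggregate category {target}")
        return df
    # non-numeric entries (e.g. notation keys such as "NO") become NaN,
    # which sum() skips by default
    for col in time_cols:
        subset[col] = pd.to_numeric(subset[col], errors="coerce")
    summed = subset.groupby(group_cols)[time_cols].sum().reset_index()
    summed.insert(0, cat_label, target)
    summed.insert(1, "orig_cat_name", name)
    # if the target is one of its own sources, drop the old rows first
    if target in sources:
        df = df[df[cat_label] != target]
    return pd.concat([df, summed], ignore_index=True)


# made-up sample data in a wide layout with one column per year
df = pd.DataFrame(
    {
        "category": ["1.A", "1.B", "2"],
        "entity": ["CO2", "CO2", "CO2"],
        "unit": ["Gg CO2 / yr"] * 3,
        "orig_cat_name": ["a", "b", "c"],
        "1990": [1.0, "NO", 5.0],
        "1991": [2.0, 3.0, 6.0],
    }
)
result = aggregate_category(
    df, "category", "1", ["1.A", "1.B"], "Energy", ["entity", "unit"], ["1990", "1991"]
)
```

With a helper like this, each loop body in the script would reduce to a single call per entry of `aggregate_before_mapping` or `aggregate_after_mapping`.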

+ 0 - 333
UNFCCC_GHG_data/UNFCCC_reader/Republic_of_Korea/read_KOR_2023-Inventory_from_xlsx.py

@@ -1,333 +0,0 @@
-# this script reads data from Korea's 2023 national inventory
-# Data is read from the xlsx file
-
-import os
-import sys
-import pandas as pd
-import primap2 as pm2
-
-from config_KOR_INV2023 import cat_name_translations, cat_codes, fix_rows
-from config_KOR_INV2023 import remove_cats, aggregate_before_mapping, cat_mapping, \
-    aggregate_after_mapping, coords_terminologies_2006, filter_remove_2006, \
-    filter_remove_after_agg
-from UNFCCC_GHG_data.helper import downloaded_data_path, extracted_data_path
-from primap2.pm2io._data_reading import filter_data, matches_time_format
-from UNFCCC_GHG_data.helper import process_data_for_country
-
-# ###
-# configuration
-# ###
-input_folder = downloaded_data_path / 'non-UNFCCC' / 'Republic_of_Korea' / \
-               '2023-Inventory'
-output_folder = extracted_data_path / 'non-UNFCCC' / 'Republic_of_Korea'
-if not output_folder.exists():
-    output_folder.mkdir()
-
-output_filename = 'KOR_2023-Inventory_2023_'
-
-inventory_file = 'Republic_of_Korea_National_GHG_Inventory_(1990_2021).xlsx'
-years_to_read = range(1990, 2021 + 1)
-
-sheets_to_read = ['온실가스', 'CO2', 'CH4', 'N2O', 'HFCs', 'PFCs', 'SF6']
-cols_to_read = range(1, 2021 - 1990 + 3)
-
-# columns for category code and original category name
-index_cols = ['분야·부문/연도']
-
-sheet_metadata = {
-    'entity': {
-        '온실가스': 'KYOTOGHG (SARGWP100)',
-        'CO2': 'CO2',
-        'CH4': 'CH4 (SARGWP100)',
-        'N2O': 'N2O (SARGWP100)',
-        'HFCs': 'HFCS (SARGWP100)',
-        'PFCs': 'PFCS (SARGWP100)',
-        'SF6': 'SF6 (SARGWP100)',
-    },
-    'unit': {
-        '온실가스': 'Gg CO2 / yr',
-        'CO2': 'Gg CO2 / yr',
-        'CH4': 'Gg CO2 / yr',
-        'N2O': 'Gg CO2 / yr',
-        'HFCs': 'Gg CO2 / yr',
-        'PFCs': 'Gg CO2 / yr',
-        'SF6': 'Gg CO2 / yr',
-    }
-}
-
-# definitions for conversion to interchange format
-time_format = "%Y"
-
-coords_cols = {
-    "category": "category",
-    "entity": "entity",
-    "unit": "unit",
-}
-
-add_coords_cols = {
-    "orig_cat_name": ["orig_cat_name", "category"],
-    "cat_name_translation": ["cat_name_translation", "category"]
-}
-
-coords_terminologies = {
-    "area": "ISO3",
-    "category": "IPCC1996_KOR_INV",
-    "scenario": "PRIMAP",
-}
-
-coords_defaults = {
-    "source": "KOR-GHG-Inventory",
-    "provenance": "measured",
-    "area": "KOR",
-    "scenario": "INV2023",
-}
-
-coords_value_mapping = {
-    "cat_name_translation": cat_name_translations,
-    "category": cat_codes,
-}
-
-# filtering after IF creation to be able to use the IPCC codes
-filter_remove = {
-    "f1": {
-        # raw string: "\I" is not a valid escape sequence in a normal string literal
-        "category (IPCC1996_KOR_INV)": r"\IGNORE",
-    },
-    # "livestock": { # temp until double cat name problem is solved
-    #     "category (IPCC1996_KOR_INV)": [
-    #         '4.B.1', '4.B.10', '4.B.2', '4.B.3', '4.B.4',
-    #         '4.B.5', '4.B.6', '4.B.7', '4.B.8', '4.B.9',
-    #     ]
-    # }
-}
-
-filter_keep = {}
-
-meta_data = {
-    "references": "http://www.gir.go.kr/home/board/read.do?pagerOffset=0&maxPageItems=10&maxIndexPages="
-                  "10&searchKey=&searchValue=&menuId=36&boardId=62&boardMasterId=2&boardCategoryId=",
-    "rights": "",
-    "contact": "mail@johannes-guetschow.de",
-    "title": "Republic of Korea: National Greenhouse Gas Inventory Report 2023",
-    "comment": "Read from xlsx file by Johannes Gütschow",
-    "institution": "Republic of Korea, Ministry of Environment, Greenhouse Gas Inventory and Research Center",
-}
-
-
-
-cols_for_space_stripping = []
-
-compression = dict(zlib=True, complevel=9)
-
-# ###
-# start data reading
-# ###
-
-# change working directory to script directory for proper folder names
-script_path = os.path.abspath(sys.argv[0])
-script_dir_name = os.path.dirname(script_path)
-os.chdir(script_dir_name)
-
-df_all = None
-
-for sheet in sheets_to_read:
-    print(f"Reading sheet {sheet}.")
-    # read current sheet (one sheet per gas)
-    df_current = pd.read_excel(input_folder / inventory_file, sheet_name=sheet, skiprows=3, nrows=146, usecols=cols_to_read,
-                               engine="openpyxl")
-    # drop all rows where all index cols (category code and name) are NaN,
-    # as without them there is no category information
-    df_current.dropna(axis=0, how='all', subset=index_cols, inplace=True)
-    # set index. necessary for the stack operation in the conversion to long format
-    # df_current = df_current.set_index(index_cols)
-    # make sure all col headers are str
-    df_current.columns = df_current.columns.map(str)
-
-    # fix the double category issue in livestock
-    lastrow = None
-    # iterrows() yields index labels, so write back with .loc; the chained
-    # df_current.iloc[i][...] assignment may only modify a temporary copy
-    for i, row in df_current.iterrows():
-        if row["분야·부문/연도"] in fix_rows:
-            if lastrow == 'A.  장내발효':
-                df_current.loc[i, "분야·부문/연도"] = f'A.{row["분야·부문/연도"]}'
-            elif lastrow == 'B.  가축분뇨처리':
-                df_current.loc[i, "분야·부문/연도"] = f'B.{row["분야·부문/연도"]}'
-            else:
-                raise ValueError(f'Row to fix, but no fix defined {lastrow}, {row["분야·부문/연도"]}')
-        else:
-            lastrow = row["분야·부문/연도"]
-    # add columns
-    for col in sheet_metadata.keys():
-        df_current.insert(1, col, sheet_metadata[col][sheet])
-    # aggregate to one df
-    if df_all is None:
-        df_all = df_current
-    else:
-        df_all = pd.concat([df_all, df_current])
-
-df_all = df_all.reset_index(drop=True)
-# rename category col because filtering produces problems with Korean col names
-df_all.rename(columns={"분야·부문/연도": "category"}, inplace=True)
-
-# create copies of category col for further processing
-df_all["orig_cat_name"] = df_all["category"]
-df_all["cat_name_translation"] = df_all["category"]
-
-
-
-# ###
-# convert to PRIMAP2 interchange format
-# ###
-data_if = pm2.pm2io.convert_wide_dataframe_if(
-    df_all,
-    coords_cols=coords_cols,
-    add_coords_cols=add_coords_cols,
-    coords_defaults=coords_defaults,
-    coords_terminologies=coords_terminologies,
-    coords_value_mapping=coords_value_mapping,
-    #coords_value_filling=coords_value_filling,
-    #filter_remove=filter_remove,
-    #filter_keep=filter_keep,
-    meta_data=meta_data,
-    convert_str=True,
-    copy_df=True, # we need the unchanged DF for the conversion step
-    )
-
-filter_data(data_if, filter_remove=filter_remove)
-
-#conversion to PRIMAP2 native format
-data_pm2 = pm2.pm2io.from_interchange_format(data_if)
-# convert back to IF to have units in the fixed format
-data_pm2 = data_pm2.reset_coords(["orig_cat_name", "cat_name_translation"], drop=True)
-data_if = data_pm2.pr.to_interchange_format()
-
-# ###
-# save data to IF and native format
-# ###
-if not output_folder.exists():
-    output_folder.mkdir()
-pm2.pm2io.write_interchange_format(output_folder / (output_filename + coords_terminologies["category"]), data_if)
-
-data_pm2 = pm2.pm2io.from_interchange_format(data_if)
-encoding = {var: compression for var in data_pm2.data_vars}
-data_pm2.pr.to_netcdf(output_folder / (output_filename + coords_terminologies["category"] + ".nc"), encoding=encoding)
-
-# ###
-# conversion to ipcc 2006 categories
-# ###
-
-
-data_if_2006 = pm2.pm2io.convert_wide_dataframe_if(
-    df_all,
-    coords_cols=coords_cols,
-    add_coords_cols=add_coords_cols,
-    coords_defaults=coords_defaults,
-    coords_terminologies=coords_terminologies_2006,
-    coords_value_mapping=coords_value_mapping,
-    meta_data=meta_data,
-    convert_str=True,
-    copy_df=True,  # don't mess up the dataframe when testing
-)
-
-cat_label = 'category (' + coords_terminologies_2006["category"] + ')'
-# agg before mapping
-
-for cat_to_agg in aggregate_before_mapping:
-    mask = data_if_2006[cat_label].isin(aggregate_before_mapping[cat_to_agg]["sources"])
-    df_test = data_if_2006[mask]
-
-    if len(df_test) > 0:
-        print(f"Aggregating category {cat_to_agg}")
-        df_combine = df_test.copy(deep=True)
-
-        time_format = '%Y'
-        time_columns = [
-            col
-            for col in df_combine.columns.values
-            if matches_time_format(col, time_format)
-        ]
-
-        for col in time_columns:
-            df_combine[col] = pd.to_numeric(df_combine[col], errors="coerce")
-
-        df_combine = df_combine.groupby(
-            by=['source', 'scenario (PRIMAP)', 'provenance', 'area (ISO3)', 'entity',
-                'unit']).sum()
-
-        df_combine = df_combine.drop(columns=["category (IPCC2006_PRIMAP)", "orig_cat_name", "cat_name_translation"])
-        df_combine.insert(0, cat_label, cat_to_agg)
-        df_combine.insert(1, "orig_cat_name",
-                          aggregate_before_mapping[cat_to_agg]["name"])
-
-        df_combine = df_combine.reset_index()
-
-        if cat_to_agg in aggregate_before_mapping[cat_to_agg]["sources"]:
-            filter_this_cat = {
-                "f": {cat_label: cat_to_agg}
-            }
-            filter_data(data_if_2006, filter_remove=filter_this_cat)
-
-        data_if_2006 = pd.concat([data_if_2006, df_combine])
-    else:
-        print(f"no data to aggregate category {cat_to_agg}")
-
-# filtering
-filter_data(data_if_2006, filter_remove=filter_remove_2006)
-
-# map 1 to 1 categories
-data_if_2006 = data_if_2006.replace({cat_label: cat_mapping})
-data_if_2006[cat_label].unique()
-
-# agg after mapping
-
-for cat_to_agg in aggregate_after_mapping:
-    mask = data_if_2006[cat_label].isin(aggregate_after_mapping[cat_to_agg]["sources"])
-    df_test = data_if_2006[mask]
-
-    if len(df_test) > 0:
-        print(f"Aggregating category {cat_to_agg}")
-        df_combine = df_test.copy(deep=True)
-
-        time_format = '%Y'
-        time_columns = [
-            col
-            for col in df_combine.columns.values
-            if matches_time_format(col, time_format)
-        ]
-
-        for col in time_columns:
-            df_combine[col] = pd.to_numeric(df_combine[col], errors="coerce")
-
-        df_combine = df_combine.groupby(
-            by=['source', 'scenario (PRIMAP)', 'provenance', 'area (ISO3)', 'entity',
-                'unit']).sum()
-
-        df_combine = df_combine.drop(columns=["category (IPCC2006_PRIMAP)", "orig_cat_name", "cat_name_translation"])
-        df_combine.insert(0, cat_label, cat_to_agg)
-        df_combine.insert(1, "orig_cat_name",
-                          aggregate_after_mapping[cat_to_agg]["name"])
-
-        df_combine = df_combine.reset_index()
-
-        if cat_to_agg in aggregate_after_mapping[cat_to_agg]["sources"]:
-            filter_this_cat = {
-                "f": {cat_label: cat_to_agg}
-            }
-            filter_data(data_if_2006, filter_remove=filter_this_cat)
-
-        data_if_2006 = pd.concat([data_if_2006, df_combine])
-    else:
-        print(f"no data to aggregate category {cat_to_agg}")
-
-
-#conversion to PRIMAP2 native format
-data_pm2_2006 = pm2.pm2io.from_interchange_format(data_if_2006)
-# convert back to IF to have units in the fixed format
-data_pm2_2006 = data_pm2_2006.reset_coords(["orig_cat_name", "cat_name_translation"],
-                                       drop=True)
-data_if_2006 = data_pm2_2006.pr.to_interchange_format()
-# save IPCC2006 data
-
-filter_data(data_if_2006, filter_remove=filter_remove_after_agg)
-pm2.pm2io.write_interchange_format(output_folder / (output_filename + coords_terminologies_2006["category"]), data_if_2006)
-
-encoding = {var: compression for var in data_pm2_2006.data_vars}
-data_pm2_2006.pr.to_netcdf(output_folder / (output_filename + coords_terminologies_2006["category"] + ".nc"), encoding=encoding)
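The livestock fix in the script above resolves row labels that appear under two different parent headings by prefixing each one with the letter of the heading that precedes it. The sketch below shows the same idea on a plain list of labels, which avoids assigning into the DataFrame while iterating over it. The function name `disambiguate_labels` and the English sample labels are hypothetical stand-ins for the Korean headings and the `fix_rows` list used in the script.

```python
def disambiguate_labels(labels, ambiguous, parents):
    """Prefix ambiguous labels with the code of the most recent parent heading.

    `parents` maps a parent heading to the prefix for its child rows.
    Hypothetical helper for illustration; not part of the repository.
    """
    fixed = []
    last_heading = None
    for label in labels:
        if label in ambiguous:
            if last_heading not in parents:
                raise ValueError(
                    f"Row to fix, but no fix defined: {last_heading!r}, {label!r}"
                )
            fixed.append(f"{parents[last_heading]}.{label}")
        else:
            # only unambiguous rows update the heading, as in the script
            last_heading = label
            fixed.append(label)
    return fixed


# made-up English labels standing in for the Korean sheet entries
labels = [
    "A. Enteric fermentation", "Cattle", "Swine",
    "B. Manure management", "Cattle", "Swine",
]
fixed = disambiguate_labels(
    labels,
    ambiguous={"Cattle", "Swine"},
    parents={"A. Enteric fermentation": "A", "B. Manure management": "B"},
)
```

The returned list has the same length and order as the input, so it could be assigned back to the category column in one step.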

+ 0 - 185
UNFCCC_GHG_data/UNFCCC_reader/Republic_of_Korea/read_KOR_BUR4_from_xlsx.py

@@ -1,185 +0,0 @@
-# this script reads data from Korea's BUR4
-# Data is read from the xlsx file
-
-import os
-import sys
-import pandas as pd
-import primap2 as pm2
-
-from UNFCCC_GHG_data.helper import downloaded_data_path, extracted_data_path
-from config_KOR_BUR4 import cat_name_translations, cat_codes
-from primap2.pm2io._data_reading import filter_data
-
-# ###
-# configuration
-# ###
-input_folder = downloaded_data_path / 'non-UNFCCC' / 'Republic_of_Korea' / \
-               '2020-Inventory'
-output_folder = extracted_data_path / 'UNFCCC' / 'Republic_of_Korea'
-if not output_folder.exists():
-    output_folder.mkdir()
-
-output_filename = 'KOR_BUR4_2021_'
-
-inventory_file = 'Republic_of_Korea_National_GHG_Inventory_(1990_2018).xlsx'
-years_to_read = range(1990, 2018 + 1)
-
-sheets_to_read = ['온실가스', 'CO2', 'CH4', 'N2O', 'HFCs', 'PFCs', 'SF6']
-cols_to_read = range(1, 2018 - 1990 + 3)
-
-# columns for category code and original category name
-index_cols = ['분야·부문/연도']
-
-sheet_metadata = {
-    'entity': {
-        '온실가스': 'KYOTOGHG (SARGWP100)',
-        'CO2': 'CO2',
-        'CH4': 'CH4 (SARGWP100)',
-        'N2O': 'N2O (SARGWP100)',
-        'HFCs': 'HFCS (SARGWP100)',
-        'PFCs': 'PFCS (SARGWP100)',
-        'SF6': 'SF6 (SARGWP100)',
-    },
-    'unit': {
-        '온실가스': 'Gg CO2 / yr',
-        'CO2': 'Gg CO2 / yr',
-        'CH4': 'Gg CO2 / yr',
-        'N2O': 'Gg CO2 / yr',
-        'HFCs': 'Gg CO2 / yr',
-        'PFCs': 'Gg CO2 / yr',
-        'SF6': 'Gg CO2 / yr',
-    }
-}
-
-# definitions for conversion to interchange format
-time_format = "%Y"
-
-coords_cols = {
-    "category": "category",
-    "entity": "entity",
-    "unit": "unit",
-}
-
-add_coords_cols = {
-    "orig_cat_name": ["orig_cat_name", "category"],
-    "cat_name_translation": ["cat_name_translation", "category"]
-}
-
-coords_terminologies = {
-    "area": "ISO3",
-    "category": "IPCC1996_KOR_INV",
-    "scenario": "PRIMAP",
-}
-
-coords_defaults = {
-    "source": "KOR-GHG-Inventory",
-    "provenance": "measured",
-    "area": "KOR",
-    "scenario": "BUR4",
-}
-
-coords_value_mapping = {
-    "cat_name_translation": cat_name_translations,
-    "category": cat_codes,
-}
-
-# filtering after IF creation to be able to use the IPCC codes
-filter_remove = {
-    "f1": {
-        # raw string: "\I" is not a valid escape sequence in a normal string literal
-        "category (IPCC1996_KOR_INV)": r"\IGNORE",
-    },
-    "livestock": { # temp until double cat name problem is solved
-        "category (IPCC1996_KOR_INV)": {
-            '4.B.1', '4.B.10', '4.B.2', '4.B.3', '4.B.4',
-            '4.B.5', '4.B.6', '4.B.7', '4.B.8', '4.B.9',
-        }
-    }
-}
-
-filter_keep = {}
-
-meta_data = {
-    "references": "https://unfccc.int/documents/418616, http://www.gir.go.kr/home/file/readDownloadFile.do?fileId=4856&fileSeq=2",
-    "rights": "",
-    "contact": "mail@johannes-guetschow.de",
-    "title": "Republic of Korea: BUR4 / National Greenhouse Gas Inventory Report 2020",
-    "comment": "Read from xlsx file by Johannes Gütschow",
-    "institution": "United Nations Framework Convention on Climate Change (UNFCCC)",
-}
-
-cols_for_space_stripping = []
-
-compression = dict(zlib=True, complevel=9)
-
-# ###
-# start data reading
-# ###
-
-# change working directory to script directory for proper folder names
-script_path = os.path.abspath(sys.argv[0])
-script_dir_name = os.path.dirname(script_path)
-os.chdir(script_dir_name)
-
-df_all = None
-
-for sheet in sheets_to_read:
-    # read current sheet (one sheet per gas)
-    df_current = pd.read_excel(input_folder / inventory_file, sheet_name=sheet, skiprows=3, nrows=144, usecols=cols_to_read,
-                               engine="openpyxl")
-    # drop all rows where all index cols (category code and name) are NaN,
-    # as without them there is no category information
-    df_current.dropna(axis=0, how='all', subset=index_cols, inplace=True)
-    # set index. necessary for the stack operation in the conversion to long format
-    # df_current = df_current.set_index(index_cols)
-    # add columns
-    for col in sheet_metadata.keys():
-        df_current.insert(1, col, sheet_metadata[col][sheet])
-    # aggregate to one df
-    if df_all is None:
-        df_all = df_current
-    else:
-        df_all = pd.concat([df_all, df_current])
-
-df_all = df_all.reset_index(drop=True)
-# rename category col because filtering produces problems with Korean col names
-df_all.rename(columns={"분야·부문/연도": "category"}, inplace=True)
-
-# create copies of category col for further processing
-df_all["orig_cat_name"] = df_all["category"]
-df_all["cat_name_translation"] = df_all["category"]
-
-# make sure all col headers are str
-df_all.columns = df_all.columns.map(str)
-
-# ###
-# convert to PRIMAP2 interchange format
-# ###
-data_if = pm2.pm2io.convert_wide_dataframe_if(
-    df_all,
-    coords_cols=coords_cols,
-    add_coords_cols=add_coords_cols,
-    coords_defaults=coords_defaults,
-    coords_terminologies=coords_terminologies,
-    coords_value_mapping=coords_value_mapping,
-    #coords_value_filling=coords_value_filling,
-    #filter_remove=filter_remove,
-    #filter_keep=filter_keep,
-    meta_data=meta_data,
-    convert_str=True
-    )
-
-filter_data(data_if, filter_remove=filter_remove)
-
-data_pm2 = pm2.pm2io.from_interchange_format(data_if)
-# convert back to IF to have units in the fixed format
-data_if = data_pm2.pr.to_interchange_format()
-
-# ###
-# save data to IF and native format
-# ###
-if not output_folder.exists():
-    output_folder.mkdir()
-pm2.pm2io.write_interchange_format(output_folder / (output_filename + coords_terminologies["category"]), data_if)
-
-encoding = {var: compression for var in data_pm2.data_vars}
-data_pm2.pr.to_netcdf(output_folder / (output_filename + coords_terminologies["category"] + ".nc"), encoding=encoding)
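All three Korea scripts compute `cols_to_read` with an expression of the form `range(1, last - 1990 + 3)`, where the `+ 3` is easy to misread. A minimal sketch of the arithmetic, assuming the sheet layout implied by the scripts (an unused column 0, the category label in column 1, then one column per year): the helper name `usecols_for_years` and its `label_col` parameter are hypothetical.

```python
def usecols_for_years(first_year, last_year, label_col=1):
    """Return the column positions for the label column plus one column per year.

    Hypothetical helper for illustration; assumes the year columns directly
    follow the label column.
    """
    n_years = last_year - first_year + 1
    # label column itself, then n_years year columns; range() excludes the stop
    return range(label_col, label_col + 1 + n_years)


cols_bur4 = usecols_for_years(1990, 2018)      # same as range(1, 2018 - 1990 + 3)
cols_inv2023 = usecols_for_years(1990, 2021)   # same as range(1, 2021 - 1990 + 3)
```

Deriving both `years_to_read` and `cols_to_read` from the same pair of years would keep the two settings from drifting apart between inventory versions.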

+ 0 - 493
UNFCCC_GHG_data/UNFCCC_reader/Singapore/config_SGP_BUR5.py

@@ -1,493 +0,0 @@
-table_def_templates = {
-    '66_1': {  # 66
-        "area": ['68,743,522,157'],
-        "cols": ['224,280,319,359,399,445,481'],
-        "rows_to_fix": {
-            # 2: ['and Sink Categories',],
-            3: ['1A2 Manufacturing Industries',
-                '1B3 Other Emissions from', '1C - Carbon Dioxide Transport',
-                '2 — INDUSTRIAL PROCESSES AND', '2D - Non-Energy Products from',
-                '2F - Product Uses as Substitutes for',
-                '2G - Other Product Manufacture'],
-        },
-    },
-    '66_2': {  # 66
-        "area": ['671,744,1117,265'],
-        "cols": ['824,875,912,954,996,1040,1082'],
-        "rows_to_fix": {
-            3: ['3 — AGRICULTURE, FORESTRY AND', '3C - Aggregate Sources and Non-CO2',
-                '4C - Incineration and Open Burning',
-                '4D -  Wastewater Treatment',
-                '5A - Indirect N2O emissions from the', 'CO2 from Biomass Combustion',
-                ],
-        },
-    },
-    '67_1': {  # 67
-        "area": ['70,727,554,159'],
-        "cols": ['207,254,291,319,356,400,442,468,503'],
-        "rows_to_fix": {
-            2: ['2 — INDUSTRIAL PROCESSES', '2A4 Other Process Uses',
-                '2B4 Caprolactam, Glyoxal and', '2B8 Petrochemical and',
-                ],
-            3: ['Total National Emissions',
-                ],
-        },
-    },
-    '67_2': {  # 67
-        "area": ['666,725,1150,119'],
-        "cols": ['801,847,889,915,952,996,1036,1063,1098'],
-        "rows_to_fix": {
-            2: ['2D - Non-Energy Products from', '2G - Other Product',
-                '2G2 SF6 and PFCs from', '2H2 Food and Beverages',
-                ],
-            3: ['Total National Emissions', '2E1 Integrated Circuit',
-                '2F - Product Uses as Substitutes for', '2F1 Refrigeration and',
-                ],
-        },
-    },
-    '68_1': {  # 68
-        "area": ['66,787,524,217'],
-        "cols": ['205,261,315,366,415,473'],
-        "rows_to_fix": {
-            2: ['2 — INDUSTRIAL PROCESSES', '2A4 Other Process Uses',
-                '2B4 Caprolactam, Glyoxal and', '2B8 Petrochemical and',
-                ],
-            3: ['Total National Emissions',
-                ],
-        },
-    },
-    '68_2': {  # 68
-        "area": ['666,787,1119,180'],
-        "cols": ['808,854,910,961,1017,1066'],
-        "rows_to_fix": {
-            2: ['2D - Non-Energy Products from',
-                '2F - Product Uses as Substitutes for', '2F1 Refrigeration and Air',
-                '2G2 SF6 and PFCs from Other', '2H2 Food and Beverages',
-                ],
-            3: ['Total National Emissions', '2E1 Integrated Circuit or',
-                '2G - Other Product Manufacture',
-                ],
-        },
-    },
-    '84_1': {  # 84
-        "area": ['70,667,525,112'],
-        "cols": ['193,291,345,396,440,480'],
-        "rows_to_fix": {},
-    },
-    '84_2': {  # 84
-        "area": ['668,667,1115,83'],
-        "cols": ['854,908,954,1001,1038,1073'],
-        "rows_to_fix": {},
-    },
-    '85_1': {  # 85
-        "area": ['70,680,531,170'],
-        "cols": ['275,328,375,414,456,489'],
-        "rows_to_fix": {},
-    },
-    '85_2': {  # 85
-        "area": ['663,675,1117,175'],
-        "cols": ['849,908,954,1001,1045,1073'],
-        "rows_to_fix": {
-            3: ['3C — Aggregate Sources and Non-CO2',
-                '3C4 - Direct N2O Emissions from', '3C5 - Indirect N2O Emissions from',
-                '3C6 - Indirect N2O Emissions from']
-        },
-    },
-    '92': {  # 92
-        "area": ['72,672,514,333'],
-        "cols": ['228,275,319,361,398,438,489'],
-        "rows_to_fix": {
-            3: ['4A1 Managed Waste',
-                '4A2 Unmanaged Waste', '4A3 Uncategorised Waste',
-                '4C - Incineration and', '4D - Wastewater Treatment',
-                '4D1 Domestic Wastewater', '4D2 Industrial Wastewater']
-        },
-    },
-    '95_1': {  # 95
-        "area": ['70,731,507,149'],
-        "cols": ['233,307,375,452'],
-        "drop_rows": [0, 1, 2, 3],
-        "rows_to_fix": {
-            3: ['Total (Net)', '1A2 Manufacturing Industries',
-                '2 — INDUSTRIAL PROCESSES', '3 — AGRICULTURE, FORESTRY',
-                '3C - Aggregate Sources and Non-CO2', '4C - Incineration and Open',
-                'Clinical Waste', '4D - Wastewater Treatment',
-                'CO2 from Biomass Combustion for']
-        },
-        "header": {
-            'entity': ['Greenhouse Gas Source and Sink Categories',
-                       'Net CO2', 'CH4', 'N2O', 'HFCs'],
-            'unit': ['', 'Gg', 'GgCO2eq', 'GgCO2eq', 'GgCO2eq'],
-        },
-    },
-    '95_2': {  # 95
-        "area": ['666,731,1103,149'],
-        "cols": ['829,903,971,1048'],
-        "drop_rows": [0, 1, 2, 3, 4, 5],
-        "rows_to_fix": {
-            3: ['Total (Net)', '1A2 Manufacturing Industries',
-                '2 — INDUSTRIAL PROCESSES', '3 — AGRICULTURE, FORESTRY',
-                '3C - Aggregate Sources and Non-CO2', '4C - Incineration and Open',
-                'Clinical Waste', '4D - Wastewater Treatment',
-                'CO2 from Biomass Combustion for']
-        },
-        "header": {
-            'entity': ['Greenhouse Gas Source and Sink Categories',
-                       'PFCs', 'SF6', 'NF3', 'Total (Net) National Emissions'],
-            'unit': ['', 'GgCO2eq', 'GgCO2eq', 'GgCO2eq', 'GgCO2eq'],
-        },
-    },
-}
-
-table_defs = {
-    '66': {
-        "templates": ['66_1', '66_2'],
-        # "header_rows": [0, 1],
-        "header": {
-            'entity': ['Greenhouse Gas Source and Sink Categories', 'Net CO2',
-                       'CH4', 'N2O', 'HFCs', 'PFCs', 'SF6', 'NF3'],
-            'unit': ['', 'Gg', 'Gg', 'Gg', 'GgCO2eq', 'GgCO2eq', 'GgCO2eq', 'GgCO2eq'],
-        },
-        "drop_rows": [0, 1, 2, 3],
-        # "drop_cols": ['NF3', 'SF6'],
-        "category_col": "Greenhouse Gas Source and Sink Categories",
-        "year": 2018,
-        # "unit_info": unit_info_2018,
-        "coords_value_mapping": "2018",
-    },
-    '67': {
-        "templates": ['67_1', '67_2'],
-        "header": {
-            'entity': ['Greenhouse Gas Source and Sink Categories', 'HFC-23', 'HFC-32',
-                       'HFC-41', 'HFC-125', 'HFC-134a', 'HFC-143a', 'HFC-152a',
-                       'HFC-227ea', 'HFC-43-10mee'],
-            'unit': ['', 'kg', 'kg', 'kg', 'kg', 'kg', 'kg', 'kg', 'kg', 'kg'],
-        },
-        "drop_rows": [0, 1, 2, 3],
-        # "drop_cols": ['NF3', 'SF6'],
-        "category_col": "Greenhouse Gas Source and Sink Categories",
-        "year": 2018,
-        # "unit_info": unit_info_2018,
-        "coords_value_mapping": "2018_fgases",
-    },
-    '68': {
-        "templates": ['68_1', '68_2'],
-        "header": {
-            'entity': ['Greenhouse Gas Source and Sink Categories', 'PFC-14',
-                       'PFC-116', 'PFC-218', 'PFC-318', 'SF6', 'NF3'],
-            'unit': ['', 'kg', 'kg', 'kg', 'kg', 'kg', 'kg'],
-        },
-        "drop_rows": [0, 1, 2],
-        "category_col": "Greenhouse Gas Source and Sink Categories",
-        "year": 2018,
-        # "unit_info": unit_info_2018,
-        "coords_value_mapping": "2018_fgases",
-    },
-    '84': {
-        "templates": ['84_1', '84_2'],
-        "header": {
-            'entity': ['Categories', 'CO2', 'CH4', 'N2O', 'NOx', 'CO', 'NMVOC'],
-            'unit': ['', 'Gg', 'Gg', 'Gg', 'Gg', 'Gg', 'Gg'],
-        },
-        "drop_rows": [0, 1, 2, 3, 4, 5],
-        "category_col": "Categories",
-        "year": 2018,
-        # "unit_info": unit_info_2018,
-        "coords_value_mapping": "2018",
-    },
-    '85': {
-        "templates": ['85_1', '85_2'],
-        "header": {
-            'entity': ['Categories', 'CO2', 'CH4', 'N2O', 'NOx', 'CO', 'NMVOC'],
-            'unit': ['', 'Gg', 'Gg', 'Gg', 'Gg', 'Gg', 'Gg'],
-        },
-        "drop_rows": [0, 1, 2, 3, 4, 5],
-        "category_col": "Categories",
-        "year": 2018,
-        # "unit_info": unit_info_2018,
-        "coords_value_mapping": "2018",
-    },
-    '92': {
-        "templates": ['92'],
-        "header": {
-            'entity': ['Categories', 'CO2', 'CH4', 'N2O', 'NOx', 'CO', 'NMVOC', 'SO2'],
-            'unit': ['', 'Gg', 'Gg', 'Gg', 'Gg', 'Gg', 'Gg', 'Gg'],
-        },
-        "drop_rows": [0, 1, 2],
-        "category_col": "Categories",
-        "year": 2018,
-        # "unit_info": unit_info_2018,
-        "coords_value_mapping": "2018",
-    },
-    '95': {
-        "templates": ['95_1', '95_2'],
-        "category_col": "Greenhouse Gas Source and Sink Categories",
-        "year": 2016,
-        # "unit_info": unit_info_2018,
-        "coords_value_mapping": "other",
-    },
-    '96': {
-        "templates": ['95_1', '95_2'],
-        "category_col": "Greenhouse Gas Source and Sink Categories",
-        "year": 2014,
-        # "unit_info": unit_info_2018,
-        "coords_value_mapping": "other",
-    },
-    '97': {
-        "templates": ['95_1', '95_2'],
-        "category_col": "Greenhouse Gas Source and Sink Categories",
-        "year": 2012,
-        # "unit_info": unit_info_2018,
-        "coords_value_mapping": "other",
-    },
-    '98': {
-        "templates": ['95_1', '95_2'],
-        "category_col": "Greenhouse Gas Source and Sink Categories",
-        "year": 2010,
-        # "unit_info": unit_info_2018,
-        "coords_value_mapping": "other",
-    },
-    '99': {
-        "templates": ['95_1', '95_2'],
-        "category_col": "Greenhouse Gas Source and Sink Categories",
-        "year": 2000,
-        # "unit_info": unit_info_2018,
-        "coords_value_mapping": "other",
-    },
-    '100': {
-        "templates": ['95_1', '95_2'],
-        "category_col": "Greenhouse Gas Source and Sink Categories",
-        "year": 1994,
-        # "unit_info": unit_info_2018,
-        "coords_value_mapping": "other",
-    },
-}
-
-cat_names_fix = {
-    '14Ab Residential': '1A4b Residential',
-}
-
-values_replacement = {
-#    '': '-',
-    ' ': '',
-}
-
-gwp_to_use = "AR5GWP100"
-
-index_cols = ["orig_cat_name"]
-cols_for_space_stripping = index_cols
-
-unit_row = "header"
-
-## parameters part 2: conversion to PRIMAP2 interchange format
-
-cats_remove = ['Information items']
-
-cat_codes_manual = {
-    'CO2 from Biomass Combustion for Energy Production': 'M.BIO',
-    'Total National Emissions and Removals': '0',
-    'Total (Net) National Emissions': '0',
-    'Clinical Waste Incineration': 'M.4.C.1',
-    'Hazardous Waste Incineration': 'M.4.C.2',
-    #'3 AGRICULTURE': 'M.AG',
-    '3 AGRICULTURE, FORESTRY AND OTHER LAND USE': '3',
-    #'3 LAND USE, LAND-USE CHANGE AND FORESTRY': 'M.LULUCF',
-}
-
-
-cat_code_regexp = r'(?P<code>^[A-Za-z0-9]{1,7})\s.*'
-
-# special header as category code and name in one column
-header_long = ["orig_cat_name", "entity", "unit", "time", "data"]
-
-coords_terminologies = {
-    "area": "ISO3",
-    "category": "IPCC2006_PRIMAP", #two extra categories
-    "scenario": "PRIMAP",
-}
-
-coords_defaults = {
-    "source": "SGP-GHG-inventory ",
-    "provenance": "measured",
-    "area": "SGP",
-    "scenario": "BUR5"
-}
-
-coords_value_mapping = {
-    "2018": {
-        "unit": "PRIMAP1",
-        "entity": {
-            'HFCs': f'HFCS ({gwp_to_use})',
-            'PFCs': f'PFCS ({gwp_to_use})',
-            'CH4': 'CH4',
-            'N2O': 'N2O',
-            'NF3': f'NF3 ({gwp_to_use})',
-            'Net CO2': 'CO2',
-            'SF6': f'SF6 ({gwp_to_use})',
-            'Total (Net) National Emissions': f'KYOTOGHG ({gwp_to_use})',
-        },
-    },
-    "2018_fgases": {
-        "unit": "PRIMAP1",
-        "entity": {
-            'HFC-125': 'HFC125',
-            'HFC-134a': 'HFC134a',
-            'HFC-143a': 'HFC143a',
-            'HFC-152a': 'HFC152a',
-            'HFC-227ea': 'HFC227ea',
-            'HFC-23': 'HFC23',
-            'HFC-32': 'HFC32',
-            'HFC-41': 'HFC41',
-            'HFC-43-10mee': 'HFC4310mee',
-            'NF3': 'NF3',
-            'PFC-116': 'C2F6',
-            'PFC-14': 'CF4',
-            'PFC-218': 'C3F8',
-            'PFC-318': 'cC4F8',
-            'SF6': 'SF6',
-        },
-    },
-    "other": {
-        "unit": "PRIMAP1",
-        "entity": {
-            'HFCs': f'HFCS ({gwp_to_use})',
-            'CH4': f'CH4 ({gwp_to_use})',
-            'N2O': f'N2O ({gwp_to_use})',
-            'NF3': f'NF3 ({gwp_to_use})',
-            'Net CO2': 'CO2',
-            'PFCs': f'PFCS ({gwp_to_use})',
-            'SF6': f'SF6 ({gwp_to_use})',
-            'Total (Net) National Emissions': f'KYOTOGHG ({gwp_to_use})',
-        },
-    },
-}
-
-coords_cols = {
-    "category": "category",
-    "entity": "entity",
-    "unit": "unit"
-}
-
-add_coords_cols = {
-    "orig_cat_name": ["orig_cat_name", "category"],
-}
-
-filter_remove = {
-    # "f1" :{
-    #     "entity": ["HFC-125", "HFC-134a", "HFC-143a", "HFC-152a", "HFC-227ea",
-    #                "HFC-23", "HFC-32", "HFC-41", "HFC-43-10mee", "PFC-116",
-    #                "PFC-14", "PFC-218", "PFC-318", "NF3", "SF6"],
-    #     "category": "2"
-    # }
-}
-
-meta_data = {
-    "references": "https://unfccc.int/documents/621650",
-    "rights": "",
-    "contact": "mail@johannes-guetschow.de",
-    "title": "Singapore's Fifth National Communication and Fifth Biennial Update "
-             "Report",
-    "comment": "Read from pdf file by Johannes Gütschow",
-    "institution": "United Nations Framework Convention on Climate Change (UNFCCC)",
-}
-
-
-## processing
-aggregate_sectors = {
-    '2': {'sources': ['2.A', '2.B', '2.C', '2.D', '2.E', '2.F', '2.G', '2.H'],
-          'name': 'IPPU'},
-    'M.3.C.1.AG': {'sources': ['3.C.1.b', '3.C.1.c'], 'name': 'Emissions from Biomass Burning (Agriculture)'},
-    'M.3.C.1.LU': {'sources': ['3.C.1.a', '3.C.1.d'], 'name': 'Emissions from Biomass Burning (LULUCF)'},
-    'M.3.C.AG': {'sources': ['M.3.C.1.AG', '3.C.2', '3.C.3', '3.C.4', '3.C.5',
-                             '3.C.6', '3.C.7', '3.C.8'],
-                 'name': 'Aggregate sources and non-CO2 emissions sources on land (Agriculture)'},
-    'M.AG.ELV': {'sources': ['M.3.C.AG'], 'name': 'Agriculture excluding livestock emissions'},
-    'M.AG': {'sources': ['M.AG.ELV', '3.A'], 'name': 'Agriculture'},
-    'M.LULUCF': {'sources': ['M.3.C.1.LU', '3.B', '3.D'],
-                 'name': 'Land Use, Land Use Change, and Forestry'},
-    'M.0.EL': {'sources': ['1', '2', 'M.AG', '4', '5'], 'name': 'National Total Excluding LULUCF'},
-    '0': {'sources': ['1', '2', '3', '4', '5'], 'name': 'National Total'},
-}
-
-
-processing_info_step1 = {
-    # aggregate IPPU which is missing for individual fgases so it can be used in the
-    # next step (downscaling)
-    'aggregate_cats': {
-        '2': {'sources': ['2.A', '2.B', '2.C', '2.D', '2.E', '2.F', '2.G', '2.H'],
-              'name': 'IPPU'},
-    },
-    'tolerance': 1, # because ch4 is inconsistent
-}
-
-processing_info_step2 =  {
-    'aggregate_cats': aggregate_sectors,
-    'downscale': {
-        'sectors': {
-            'IPPU': {
-                'basket': '2',
-                'basket_contents': ['2.A', '2.B', '2.C', '2.D', '2.E',
-                                    '2.F', '2.G', '2.H'],
-                'entities': ['CO2', 'N2O', f'PFCS ({gwp_to_use})',
-                             f'HFCS ({gwp_to_use})', 'SF6', 'NF3'],
-                'dim': 'category (IPCC2006_PRIMAP)',
-            },
-            # AFOLU downscaling. Most is zero anyway
-            '3C': {
-                'basket': '3.C',
-                'basket_contents': ['3.C.1', '3.C.2', '3.C.3', '3.C.4', '3.C.5',
-                                    '3.C.6', '3.C.7', '3.C.8'],
-                'entities': ['CO2', 'CH4', 'N2O'],
-                'dim': 'category (IPCC2006_PRIMAP)',
-            },
-            '3C1': {
-                'basket': '3.C.1',
-                'basket_contents': ['3.C.1.a', '3.C.1.b', '3.C.1.c', '3.C.1.d'],
-                'entities': ['CO2', 'CH4', 'N2O'],
-                'dim': 'category (IPCC2006_PRIMAP)',
-            },
-            '3D': {
-                'basket': '3.D',
-                'basket_contents': ['3.D.1', '3.D.2'],
-                'entities': ['CO2', 'CH4', 'N2O'],
-                'dim': 'category (IPCC2006_PRIMAP)',
-            },
-        },
-        'entities': {
-            'HFCS': {
-                'basket': f'HFCS ({gwp_to_use})',
-                'basket_contents': ['HFC125', 'HFC134a', 'HFC143a', 'HFC23',
-                                    'HFC32', 'HFC4310mee', 'HFC227ea'],
-                'sel': {'category (IPCC2006_PRIMAP)':
-                            ['0', '2', '2.C', '2.E',
-                             '2.F', '2.G', '2.H']},
-            },
-            'PFCS': {
-                'basket': f'PFCS ({gwp_to_use})',
-                'basket_contents': ['C2F6', 'C3F8', 'CF4', 'cC4F8'],
-                'sel': {'category (IPCC2006_PRIMAP)':
-                            ['0', '2', '2.C', '2.E',
-                             '2.F', '2.G', '2.H']},
-            },
-        }
-    },
-    'remove_ts': {
-        'fgases': { # unnecessary and complicates aggregation for
-            # other gases
-            'category': ['5', '5.B'],
-            'entities': [f'HFCS ({gwp_to_use})', f'PFCS ({gwp_to_use})', 'SF6', 'NF3'],
-        },
-        'CH4': { # inconsistent with IPPU sector
-            'category': ['2.A', '2.B', '2.C', '2.D', '2.E', '2.F', '2.G', '2.H'],
-            'entities': ['CH4'],
-        },
-    },
-    # 'basket_copy': {
-    #     'GWPs_to_add': ["SARGWP100", "AR4GWP100", "AR6GWP100"],
-    #     'entities': ["HFCS", "PFCS"],
-    #     'source_GWP': gwp_to_use,
-    # },
-}
-
-
-
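
Note on `cat_code_regexp` in the config above: a minimal sketch, using only the standard library, of how the pattern separates a leading category code from the rest of the name. The helper `extract_code` is hypothetical; the removed script instead passes the match to primap2's `convert_ipcc_code_primap_to_primap2`, which is not reproduced here.

```python
import re

# Same pattern as in config_SGP_BUR5.py: a leading run of 1-7 alphanumerics
# followed by whitespace and the rest of the category name.
cat_code_regexp = r'(?P<code>^[A-Za-z0-9]{1,7})\s.*'


def extract_code(name: str) -> str:
    """Hypothetical helper: return the leading code, or the name unchanged."""
    return re.sub(cat_code_regexp, lambda m: m.group('code'), name)


examples = [
    '1A2 Manufacturing Industries',  # code followed by a name
    '4D1 Domestic Wastewater',       # code followed by a name
    'M.BIO',                         # already replaced manually, no whitespace
    '0',                             # already replaced manually, no whitespace
]
codes = [extract_code(name) for name in examples]
print(codes)
```

Values that were already mapped through `cat_codes_manual` contain no whitespace, so the pattern leaves them alone. A name such as 'Total National Emissions' would also match (yielding 'Total'), which is presumably why the script applies the manual replacements before the regular expression.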

+ 0 - 260
UNFCCC_GHG_data/UNFCCC_reader/Singapore/read_SGP_BUR5_from_pdf.py

@@ -1,260 +0,0 @@
-# read Singapore fifth BUR from pdf
-
-
-import camelot
-import primap2 as pm2
-import pandas as pd
-#import numpy as np
-from pathlib import Path
-import locale
-
-from UNFCCC_GHG_data.helper import process_data_for_country, gas_baskets
-from UNFCCC_GHG_data.helper import downloaded_data_path, extracted_data_path
-from UNFCCC_GHG_data.helper import fix_rows
-from primap2.pm2io._conversion import convert_ipcc_code_primap_to_primap2
-from config_SGP_BUR5 import table_def_templates, table_defs, index_cols
-from config_SGP_BUR5 import values_replacement, header_long, cats_remove, \
-    cat_codes_manual, cat_code_regexp, cat_names_fix
-from config_SGP_BUR5 import coords_cols, coords_terminologies, coords_defaults, \
-    coords_value_mapping, meta_data, add_coords_cols, filter_remove
-from config_SGP_BUR5 import processing_info_step1, processing_info_step2
-
-### general configuration
-input_folder = downloaded_data_path / 'UNFCCC' / 'Singapore' / 'BUR5'
-output_folder = extracted_data_path / 'UNFCCC' / 'Singapore'
-if not output_folder.exists():
-    output_folder.mkdir()
-
-output_filename = 'SGP_BUR5_2022_'
-inventory_file_pdf = 'Singapore_-_NC5BUR5.pdf'
-#years_to_read = range(1990, 2018 + 1)
-
-# define locale to use for str to float conversion
-locale_to_use = 'en_SG.UTF-8'
-locale.setlocale(locale.LC_NUMERIC, locale_to_use)
-
-pagesToRead = table_defs.keys()
-
-compression = dict(zlib=True, complevel=9)
-
-## part 1: read the data from pdf
-### part 1.a: 2016 inventory
-
-data_pm2 = None
-for page in pagesToRead:
-    print(f"++++++++++++++++++++++++++++++++")
-    print(f"+++++ Working on page {page} ++++++")
-    print(f"++++++++++++++++++++++++++++++++")
-
-    df_this_page = None
-    for table_on_page in table_defs[page]["templates"]:
-        print(f"Reading table {table_on_page}")
-        area = table_def_templates[table_on_page]["area"]
-        cols = table_def_templates[table_on_page]["cols"]
-        tables = camelot.read_pdf(str(input_folder / inventory_file_pdf),
-                                  pages=str(page), flavor='stream',
-                                  table_areas=area, columns=cols, split_text=True)
-
-        df_current = tables[0].df.copy(deep=True)
-        # drop the old header
-        if "drop_rows" in table_defs[page].keys():
-            df_current = df_current.drop(table_defs[page]["drop_rows"])
-        elif "drop_rows" in table_def_templates[table_on_page].keys():
-            df_current = df_current.drop(
-                table_def_templates[table_on_page]["drop_rows"])
-        # add new header
-        if 'header' in table_defs[page].keys():
-            df_current.columns = pd.MultiIndex.from_tuples(
-                zip(table_defs[page]['header']['entity'],
-                    table_defs[page]['header']['unit']))
-        else:
-            df_current.columns = pd.MultiIndex.from_tuples(
-                zip(table_def_templates[table_on_page]['header']['entity'],
-                    table_def_templates[table_on_page]['header']['unit']))
-
-        # drop cols if necessary
-        if "drop_cols" in table_defs[page].keys():
-            # print(df_current.columns.values)
-            df_current = df_current.drop(columns=table_defs[page]["drop_cols"])
-        elif "drop_cols" in table_def_templates[table_on_page].keys():
-            df_current = df_current.drop(
-                columns=table_def_templates[table_on_page]["drop_cols"])
-
-        # rename category column
-        df_current.rename(columns={table_defs[page]["category_col"]: index_cols[0]},
-                          inplace=True)
-
-        # replace line breaks with spaces
-        df_current[index_cols[0]] = df_current[index_cols[0]].str.replace("\n", " ")
-        # replace double and triple spaces
-        df_current[index_cols[0]] = df_current[index_cols[0]].str.replace("   ", " ")
-        df_current[index_cols[0]] = df_current[index_cols[0]].str.replace("  ", " ")
-
-        # fix the split rows
-        for n_rows in table_def_templates[table_on_page]["rows_to_fix"].keys():
-            df_current = fix_rows(df_current,
-                                  table_def_templates[table_on_page]["rows_to_fix"][
-                                      n_rows], index_cols[0], n_rows)
-
-        # replace category names with typos
-        df_current[index_cols[0]] = df_current[index_cols[0]].replace(cat_names_fix)
-
-        # replace empty strings
-        df_current = df_current.replace(values_replacement)
-
-        # set index
-        # df_current = df_current.set_index(index_cols)
-        # strip leading and trailing whitespace and remove "^"
-        for col in df_current.columns.values:
-            df_current[col] = df_current[col].str.strip()
-            df_current[col] = df_current[col].str.replace("^", "", regex=False)
-
-        # print(df_current)
-        # aggregate dfs for this page
-        if df_this_page is None:
-            df_this_page = df_current.copy(deep=True)
-        else:
-            # find intersecting cols
-            cols_this_page = df_this_page.columns.values
-            # print(f"cols this page: {cols_this_page}")
-            cols_current = df_current.columns.values
-            # print(f"cols current: {cols_current}")
-            cols_both = list(set(cols_this_page).intersection(set(cols_current)))
-            # print(f"cols both: {cols_both}")
-            if len(cols_both) > 0:
-                df_this_page = df_this_page.merge(df_current, how='outer', on=cols_both,
-                                                  suffixes=(None, None))
-            else:
-                df_this_page = df_this_page.merge(df_current, how='outer',
-                                                  left_index=True, right_index=True,
-                                                  suffixes=(None, None))
-
-            df_this_page = df_this_page.groupby(index_cols).first().reset_index()
-            # print(df_this_page)
-            # df_all = df_all.join(df_current, how='outer')
-
-    # set index and convert to long format
-    df_this_page = df_this_page.set_index(index_cols)
-    df_this_page_long = pm2.pm2io.nir_convert_df_to_long(df_this_page,
-                                                         table_defs[page]["year"],
-                                                         header_long)
-
-    # drop the rows with memo items etc
-    for cat in cats_remove:
-        df_this_page_long = df_this_page_long.drop(
-            df_this_page_long.loc[df_this_page_long.loc[:, index_cols[0]] == cat].index)
-
-    # make a copy of the categories column
-    df_this_page_long.loc[:, "category"] = df_this_page_long.loc[:, index_cols[0]]
-
-    # replace cat names by codes in col "Categories"
-    # first the manual replacements
-    df_this_page_long.loc[:, "category"] = df_this_page_long.loc[:, "category"].replace(
-        cat_codes_manual)
-    # then the regex replacements
-    repl = lambda m: convert_ipcc_code_primap_to_primap2('IPC' + m.group('code'))
-    df_this_page_long.loc[:, "category"] = df_this_page_long.loc[:,
-                                           "category"].str.replace(cat_code_regexp,
-                                                                   repl, regex=True)
-    df_this_page_long.loc[:, "category"].unique()
-
-    # strip spaces in data col
-    df_this_page_long.loc[:, "data"] = df_this_page_long.loc[:, "data"].str.strip()
-
-    df_this_page_long = df_this_page_long.reset_index(drop=True)
-
-    # make sure all col headers are str
-    df_this_page_long.columns = df_this_page_long.columns.map(str)
-
-    # remove thousands separators as pd.to_numeric can't deal with that
-    df_this_page_long.loc[:, "data"] = df_this_page_long.loc[:, "data"].str.replace(',',
-                                                                                    '')
-
-    # drop orig cat name as it's not unique over all tables (keep until here in case
-    # it's needed for debugging)
-    df_this_page_long = df_this_page_long.drop(columns='orig_cat_name')
-
-    data_page_if = pm2.pm2io.convert_long_dataframe_if(
-        df_this_page_long,
-        coords_cols=coords_cols,
-        #add_coords_cols=add_coords_cols,
-        coords_defaults=coords_defaults,
-        coords_terminologies=coords_terminologies,
-        coords_value_mapping=coords_value_mapping[
-            table_defs[page]["coords_value_mapping"]],
-        # coords_value_filling=coords_value_filling,
-        filter_remove=filter_remove,
-        # filter_keep=filter_keep,
-        meta_data=meta_data,
-        convert_str=True,
-        time_format='%Y',
-    )
-
-    # conversion to PRIMAP2 native format
-    data_page_pm2 = pm2.pm2io.from_interchange_format(data_page_if)
-
-    # combine with tables from other pages
-    if data_pm2 is None:
-        data_pm2 = data_page_pm2
-    else:
-        data_pm2 = data_pm2.pr.merge(data_page_pm2)
-
-# convert back to IF to have units in the fixed format
-data_if = data_pm2.pr.to_interchange_format()
-
-# ###
-# save data to IF and native format
-# ###
-if not output_folder.exists():
-    output_folder.mkdir()
-pm2.pm2io.write_interchange_format(
-    output_folder / (output_filename + coords_terminologies["category"] + "_raw"), data_if)
-
-encoding = {var: compression for var in data_pm2.data_vars}
-data_pm2.pr.to_netcdf(
-    output_folder / (output_filename + coords_terminologies["category"] + "_raw.nc"),
-    encoding=encoding)
-
-
-#### processing
-data_proc_pm2 = data_pm2
-terminology_proc = coords_terminologies["category"]
-
-# actual processing
-data_proc_pm2 = process_data_for_country(
-    data_proc_pm2,
-    entities_to_ignore=[],
-    gas_baskets={},
-    processing_info_country=processing_info_step1,
-)
-
-
-data_proc_pm2 = process_data_for_country(
-    data_proc_pm2,
-    entities_to_ignore=[],
-    gas_baskets=gas_baskets,
-    processing_info_country=processing_info_step2,
-    cat_terminology_out = terminology_proc,
-    #category_conversion = None,
-    #sectors_out = None,
-)
-
-# adapt source and metadata
-# TODO: processing info is present twice
-current_source = data_proc_pm2.coords["source"].values[0]
-data_temp = data_proc_pm2.pr.loc[{"source": current_source}]
-data_proc_pm2 = data_proc_pm2.pr.set("source", 'BUR_NIR', data_temp)
-
-# ###
-# save data to IF and native format
-# ###
-data_proc_if = data_proc_pm2.pr.to_interchange_format()
-if not output_folder.exists():
-    output_folder.mkdir()
-pm2.pm2io.write_interchange_format(
-    output_folder / (output_filename + terminology_proc), data_proc_if)
-
-encoding = {var: compression for var in data_proc_pm2.data_vars}
-data_proc_pm2.pr.to_netcdf(
-    output_folder / (output_filename + terminology_proc + ".nc"),
-    encoding=encoding)
-
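
The script above looks up `drop_rows`, `header`, and `drop_cols` first in the page definition and only then in the table template. A minimal sketch of that precedence rule follows; `resolve_setting` is a hypothetical helper that is not part of the repository, and the two dictionaries are trimmed copies of entries from `config_SGP_BUR5.py`.

```python
from typing import Any

# Trimmed copies of the config entries (assumption: only the keys needed here).
table_defs = {
    '66': {"templates": ['66_1', '66_2'], "drop_rows": [0, 1, 2, 3]},
    '95': {"templates": ['95_1', '95_2']},
}
table_def_templates = {
    '66_1': {"rows_to_fix": {}},
    '95_2': {"drop_rows": [0, 1, 2, 3, 4, 5]},
}


def resolve_setting(key: str, page: str, template: str, default: Any = None) -> Any:
    """Hypothetical helper: page-level settings win over template-level ones."""
    if key in table_defs[page]:
        return table_defs[page][key]
    return table_def_templates[template].get(key, default)


rows_66 = resolve_setting("drop_rows", '66', '66_1')              # from the page
rows_95 = resolve_setting("drop_rows", '95', '95_2')              # from the template
cols_95 = resolve_setting("drop_cols", '95', '95_2', default=[])  # defined nowhere
print(rows_66, rows_95, cols_95)
```

Routing every lookup through one function keeps the fallback consistent, so a template-level branch cannot accidentally read from the page definition.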

+ 0 - 329
UNFCCC_GHG_data/UNFCCC_reader/Taiwan/config_TWN_NIR2022.py

@@ -1,329 +0,0 @@
-# config and functions for Taiwan NIR 2022
-
-from typing import Union, List
-import pandas as pd
-
-gwp_to_use = "AR4GWP100"
-
-def fix_rows(data: pd.DataFrame, rows_to_fix: list, col_to_use: str, n_rows: int)->pd.DataFrame:
-    for row in rows_to_fix:
-        #print(row)
-        # find the row number and collect the row and the next two rows
-        index = data.loc[data[col_to_use] == row].index
-        if not list(index):
-            print(f"Can't merge split row {row}")
-            print(data[col_to_use])
-        else:
-            print(f"Merging split row {row}")
-        indices_to_drop = []
-        ####print(index)
-        for item in index:
-            loc = data.index.get_loc(item)
-            ####print(data[col_to_use].loc[loc + 1])
-            if n_rows == -2:
-                locs_to_merge = list(range(loc - 1, loc + 1))
-                loc_to_check = loc - 1
-            #if n_rows == -3:
-            #    locs_to_merge = list(range(loc - 1, loc + 2))
-            #elif n_rows == -5:
-            #    locs_to_merge = list(range(loc - 1, loc + 4))
-            else:
-                locs_to_merge = list(range(loc, loc + n_rows))
-                loc_to_check = loc + 1
-            
-            if data[col_to_use].loc[loc_to_check] == '':
-                rows_to_merge = data.iloc[locs_to_merge]
-                indices_to_merge = rows_to_merge.index
-                # replace numerical NaN values
-                ####print(rows_to_merge)
-                rows_to_merge = rows_to_merge.fillna('')
-                ####print("fillna")
-                ####print(rows_to_merge)
-                # join the three rows
-                new_row = rows_to_merge.agg(' '.join)
-                # replace the double spaces that are created 
-                # must be done here and not at the end as splits are not always 
-                # the same and join would produce different col values
-                new_row = new_row.str.replace("  ", " ")  
-                new_row = new_row.str.strip()
-                #new_row = new_row.str.replace("N O", "NO") 
-                #new_row = new_row.str.replace(", N", ",N")
-                #new_row = new_row.str.replace("- ", "-")
-                data.loc[indices_to_merge[0]] = new_row
-                indices_to_drop = indices_to_drop + list(indices_to_merge[1:])
-        
-        data = data.drop(indices_to_drop)
-        data = data.reset_index(drop=True)
-    return data
-
-# page defs to hold information on reading the table
-page_defs = {
-    '5': { 
-        "table_areas": ['36,523,563,68'],
-        "split_text": False,
-        "flavor": "stream",
-    },
-    '6': {
-        "table_areas": ['34,562,563,53'],
-        #"columns": ['195,228,263,295,328,363,395,428,462,495,529'], # works without
-        "split_text": True,
-        "flavor": "stream",
-    },
-    '7': {
-        "table_areas": ['36,740,499,482', '36,430,564,53'],
-        "split_text": True,
-        "flavor": "stream",
-    },
-    '8': {
-        "table_areas": ['35,748,503,567'],
-        "split_text": True,
-        "flavor": "stream",
-    },
-    '9': {
-        "table_areas": ['35,747,565,315', '36,273,565,50'],
-        "split_text": False,
-        "flavor": "stream",
-    },
-    '11': {
-        "table_areas": ['35,744,563,434'],
-        "split_text": True,
-        "flavor": "stream",
-    },
-    '12': {
-        "table_areas": ['33,747,562,86'],
-        "split_text": True,
-        "flavor": "stream",
-    },
-    '13': {
-        "table_areas": ['34,303,564,54'],
-        "split_text": True,
-        "flavor": "stream",
-    },
-    '14': {
-        "table_areas": ['34,754,564,256'],
-        "columns": ['220,251,283,314,344,371,406,438,470,500,530'],
-        "split_text": True,
-        "flavor": "stream",
-    },
-    '15': {
-        "table_areas": ['34,487,564,42'],
-        "split_text": True,
-        "flavor": "stream",
-    },
-    '16': {
-        "table_areas": ['34,418,564,125'],
-        #"columns": ['107,209,241,273,306,338,369,402,433,466,498,533'],
-        "split_text": True,
-        "flavor": "lattice",
-    }, # with stream the row index is messed up, with lattice the column index; read with lattice and fix the column header manually
-    '17': {
-        "table_areas": ['34,534,564,49'],
-        "columns": ['188,232,263,298,331,362,398,432,464,497,530'],
-        "split_text": True,
-        "flavor": "stream",
-    },
-}
-
-# table defs to hold information on how to process the tables
-table_defs = {
-    'ES2.2': { # 1990-2020 Carbon Dioxide Emissions and Sequestration in Taiwan
-        "tables": [1, 2],
-        "rows_to_fix": {
-            0: { 
-                3: ['1.A.4.c Agriculture, Forestry, Fishery, and',
-                    '2.D Non-Energy Products from Fuels and', 
-                    '4. Land Use, Land Use Change and Forestry'],
-            },
-        },
-        "index_cols": ['GHG Emission Source and Sinks'],
-        "wide_keyword": 'GHG Emission Source and Sinks',
-        "col_wide_kwd": 0, 
-        "entity": "CO2",
-        "unit": "kt",
-        "cat_codes_manual": {
-            'Net GHG Emission (including LULUCF)': '0',
-            'Total GHG Emission (excluding LULUCF)': 'M.0.EL',
-        },            
-    },
-    'ES2.3': { # 1990-2020 Methane Emissions in Taiwan
-        "tables": [3, 4],
-        "rows_to_fix": {},
-        "index_cols": ['GHG Emission Sources and Sinks'],
-        "wide_keyword": 'GHG Emission Sources and Sinks',
-        "col_wide_kwd": 0, 
-        "entity": f"CH4 ({gwp_to_use})",
-        "unit": "ktCO2eq",
-        "cat_codes_manual": {
-            'Total Methane Emissions': '0',
-        },
-    },
-    'ES2.4': { # 1990-2020 Nitrous Oxide Emissions in Taiwan
-        "tables": [5],
-        "fix_cats": {
-            0: {
-                "Total Nitrous Oxide Emissionsl": "Total Nitrous Oxide Emissions",
-            },
-        },            
-        "rows_to_fix": {},
-        "index_cols": ['GHG Emission Sources and Sinks'],
-        "wide_keyword": 'GHG Emission Sources and Sinks',
-        "col_wide_kwd": 0, 
-        "entity": f"N2O ({gwp_to_use})",
-        "unit": "ktCO2eq",
-        "cat_codes_manual": {
-            'Total Nitrous Oxide Emissions': '0',
-        },        
-    },
-    'ES3.1': { # 1990-2020 Greenhouse Gas Emission in Taiwan by Sector
-        "tables": [7],
-        "rows_to_fix": {},
-        "index_cols": ['GHG Emission Sources and Sinks'],
-        "wide_keyword": 'GHG Emission Sources and Sinks',
-        "col_wide_kwd": 0, 
-        "entity": f"KYOTOGHG ({gwp_to_use})",
-        "unit": "ktCO2eq",
-        "cat_codes_manual": {
-            'Net GHG Emission (including LULUCF)': '0',
-            'Total GHG Emission (excluding LULUCF)': 'M.0.EL',
-        },
-    },
-    'ES3.2': { # 1990-2020 Greenhouse Gas Emissions Produced by Energy Sector in Taiwan
-        "tables": [8],
-        "rows_to_fix": {},
-        "index_cols": ['GHG Emission Sources and Sinks'],
-        "wide_keyword": 'GHG Emission Sources and Sinks',
-        "col_wide_kwd": 0, 
-        "gas_splitting": {
-            "Total CO2 Emission": "CO2",
-            "Total CH4 Emission": f"CH4 ({gwp_to_use})",
-            "Total N2O Emission": f"N2O ({gwp_to_use})",
-            "Total Emission from Energy Sector": f"KYOTOGHG ({gwp_to_use})",
-            "GHG Emission Sources and Sinks": "entity",
-        },
-        "unit": "ktCO2eq",
-        "cat_codes_manual": {
-            'Total CO2 Emission': '1',
-            'Total CH4 Emission': '1',
-            'Total N2O Emission': '1',
-            'Total Emission from Energy Sector': '1',
-        },
-    },
-    'ES3.3': { # 1990-2020 Greenhouse Gas Emissions Produced by Industrial Process and Product Use Sector (IPPU) in Taiwan
-        "tables": [9,10],
-        "rows_to_fix": {},
-        "index_cols": ['GHG Emission Sources and Sinks'],
-        "wide_keyword": 'GHG Emission Sources and Sinks',
-        "col_wide_kwd": 0, 
-        "gas_splitting": {
-            "Total CO2 Emission": "CO2",
-            "Total CH4 Emission": f"CH4 ({gwp_to_use})",
-            "Total N2O Emission": f"N2O ({gwp_to_use})",
-            "Total HFCs Emission": f"HFCS ({gwp_to_use})",
-            "Total PFCs Emission (2.E Electronics Industry)": f"PFCS ({gwp_to_use})",
-            "Total SF6 Emission": f"SF6 ({gwp_to_use})",
-            "Total NF3 Emission (2.E Electronics Industry)": f"NF3 ({gwp_to_use})",
-            "Total Emission from IPPU Sector": f"KYOTOGHG ({gwp_to_use})",
-            "GHG Emission Sources and Sinks": "entity",
-        },
-        "unit": "ktCO2eq",
-        "cat_codes_manual": {
-            'Total CO2 Emission': '2',
-            'Total CH4 Emission': '2',
-            'Total N2O Emission': '2',
-            'Total HFCs Emission': '2',
-            'Total PFCs Emission (2.E Electronics Industry)': '2.E',
-            'Total SF6 Emission': '2',
-            'Total NF3 Emission (2.E Electronics Industry)': '2.E',
-            'Total Emission from IPPU Sector': '2',
-        },
-        "drop_rows": [
-            ("2.D Non-Energy Products from Fuels and Solvent Use", "CO2"), # has lower significant digits than in table ES2.2
-        ]
-    }, 
-    'ES3.4': { # 1990-2020 Greenhouse Gas Emissions Produced by Agriculture Sector in Taiwan
-        "tables": [11],
-        "rows_to_fix": {},
-        "index_cols": ['GHG Emission Sources and Sinks'],
-        "wide_keyword": 'GHG Emission Sources and Sinks',
-        "col_wide_kwd": 0, 
-        "gas_splitting": {
-            "Total CO2 Emission (3.H Urea applied)": "CO2",
-            "Total CH4 Emission": f"CH4 ({gwp_to_use})",
-            "Total N2O Emission": f"N2O ({gwp_to_use})",
-            "Total Emission From Agriculture Sector": f"KYOTOGHG ({gwp_to_use})",
-            "GHG Emission Sources and Sinks": "entity",
-        },
-        "unit": "ktCO2eq",
-        "cat_codes_manual": {
-            'Total CO2 Emission (3.H Urea applied)': '3.H',
-            'Total CH4 Emission': '3',
-            'Total N2O Emission': '3',
-            'Total Emission From Agriculture Sector': '3',
-        },
-    }, 
-    'ES3.6': { # 1990-2020 Greenhouse Gas Emissions in Taiwan by Waste Sector
-        "tables": [13],
-        "rows_to_fix": {
-            0: {
-                3: ["Total CO2 Emission"],
-            },
-        }, 
-        "index_cols": ['GHG Emission Sources and Sinks'], 
-        "wide_keyword": 'GHG Emission Sources and Sinks',
-        "col_wide_kwd": 0, # two column header
-        "gas_splitting": {
-            "Total CO2 Emission (5.C Incineration and Open Burning of Waste)": "CO2",
-            "Total CH4 Emission": f"CH4 ({gwp_to_use})",
-            "Total N2O Emission": f"N2O ({gwp_to_use})",
-            "Total Emission from Waste Sector": f"KYOTOGHG ({gwp_to_use})",
-            "GHG Emission Sources and Sinks": "entity",
-        },
-        "unit": "ktCO2eq",
-        "cat_codes_manual": {
-            'Total CO2 Emission (5.C Incineration and Open Burning of Waste)': '5.C',
-            'Total CH4 Emission': '5',
-            'Total N2O Emission': '5',
-            'Total Emission from Waste Sector': '5',
-        },
-    }, 
-}
-
-table_defs_skip = {
-    'ES2.1': { # 1990-2020 Greenhouse Gas Emissions and Sequestration in Taiwan by Type
-        "tables": [0],
-        "rows_to_fix": {
-            0: { 
-                3: ['CO2'],
-            },
-            1: {  # where col 0 is empty
-                3: ['Net GHG Emission', 'Total GHG Emission'],
-            },
-        },
-        "index_cols": ['GHG', 'GWP'],
-        "wide_keyword": 'GHG',
-        "col_wide_kwd": 0, 
-        "unit": "ktCO2eq",
-    },
-    'ES2.5': { # 1990-2020 Fluoride-Containing Gas Emissions in Taiwan
-        "tables": [6],
-        "rows_to_fix": {
-            0: {
-                -2: ['Total SF6 Emissions', 
-                     'Total NF3 Emissions'],
-            },
-        },
-        "index_cols": ['GHG Emission Sources and Sinks'],
-        "wide_keyword": 'GHG Emission Sources and Sinks',
-        "col_wide_kwd": 0, 
-        #"entity": "CO2",
-        "unit": "ktCO2eq",
-    },
-    'ES3.5': { # skip for now: 1990-2020 Changes in Carbon Sequestration by LULUCF Sector in Taiwan
-        "tables": [12],
-        "rows_to_fix": {}, 
-        "index_cols": ['GHG Emission Sources and Sinks'], #header is merged col :-(
-        "wide_keyword": 'GHG Emission Sources and Sinks',
-        "col_wide_kwd": 0, # two column header
-        "unit": "kt",
-        "entity": "CO2",
-    }, # need to consider the two columns specially (merge?)
-}
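
The `gas_splitting` entries in the removed `table_defs` map header rows of a multi-gas table to an entity. The sketch below shows one way such a mapping can be applied, by carrying the most recent entity forward over the rows that follow a header. It is a simplified, standard-library illustration, not the removed code itself; the function name `assign_entities` and the `1.A ...` row labels are hypothetical.

```python
# Minimal sketch: forward-fill an entity per row from header rows.
# `assign_entities` and the "1.A ..." labels are hypothetical illustrations.
gwp_to_use = "AR4GWP100"

gas_splitting = {
    "Total CO2 Emission": "CO2",
    "Total CH4 Emission": f"CH4 ({gwp_to_use})",
    "GHG Emission Sources and Sinks": "entity",
}

# first column of a table as it might come out of the PDF reader
first_column = [
    "GHG Emission Sources and Sinks",
    "Total CO2 Emission",
    "1.A Fuel Combustion (hypothetical row)",
    "Total CH4 Emission",
    "1.A Fuel Combustion (hypothetical row)",
]


def assign_entities(labels, splitting):
    """Return one entity per label, switching whenever a header row is seen."""
    entities = []
    last_entity = ""
    for label in labels:
        if label in splitting:
            last_entity = splitting[label]
        entities.append(last_entity)
    return entities


entities = assign_entities(first_column, gas_splitting)
```

Mapping the table header itself to the literal string `entity` means the new column gets a usable column name once the header row is promoted to column labels.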

+ 0 - 447
UNFCCC_GHG_data/UNFCCC_reader/Taiwan/config_TWN_NIR2023.py

@@ -1,447 +0,0 @@
-# config and functions for Taiwan NIR 2023
-
-from typing import Union, List
-import pandas as pd
-import xarray as xr
-from typing import Optional, Any
-
-gwp_to_use = "AR4GWP100"
-terminology_proc = 'IPCC2006_PRIMAP'
-
-##### Table definitions
-# page defs to hold information on reading the table
-page_defs = {
-    '5': { 
-        "table_areas": ['36,523,563,68'],
-        "split_text": False,
-        "flavor": "stream",
-    },
-    '6': {
-        "table_areas": ['34,562,563,53'],
-        #"columns": ['195,228,263,295,328,363,395,428,462,495,529'], # works without
-        "split_text": True,
-        "flavor": "stream",
-    },
-    '7': {
-        "table_areas": ['36,743,531,482', '36,425,564,54'],
-        "split_text": True,
-        "flavor": "stream",
-    },
-    '8': {
-        "table_areas": ['35,748,534,567'],
-        "split_text": True,
-        "flavor": "stream",
-    },
-    '9': {
-        "table_areas": ['34,753,565,286', '34,235,565,63'],
-        "split_text": False,
-        "flavor": "stream",
-    },
-    '10': {
-        "table_areas": ['34,753,565,449'],
-        "split_text": False,
-        "flavor": "stream",
-    },
-    '11': {
-        "table_areas": ['32,522,566,208'],
-        "split_text": True,
-        "flavor": "stream",
-    },
-    '12': {
-        "table_areas": ['33,549,562,64'],
-        "split_text": True,
-        "flavor": "stream",
-    },
-    '13': {
-        "table_areas": ['31,761,532,517'],
-        "split_text": True,
-        "flavor": "stream",
-    },
-    '14': {
-        "table_areas": ['32,751,563,70'],
-        "columns": ['217,250,282,313,344,374,406,437,468,501,531'],
-        "split_text": True,
-        "flavor": "stream",
-    },
-    '15': {
-        "table_areas": ['32,345,565,53'],
-        "split_text": True,
-        "flavor": "stream",
-    },
-    '16': {
-        "table_areas": ['32,745,532,597'],
-        "split_text": True,
-        "flavor": "stream",
-    },
-    '18': {
-        "table_areas": ['30,747,564,260'],
-        "columns": ['188,232,263,298,331,362,398,432,464,497,530'],
-        "split_text": True,
-        "flavor": "stream",
-    }, # correct mistakes later
-}
-
-# table defs to hold information on how to process the tables
-table_defs = {
-    'ES2.2': { # 1990-2021 Carbon Dioxide Emissions and Sequestration in Taiwan
-        "tables": [1, 2],
-        "rows_to_fix": {
-            0: { 
-                3: ['1.A.4.c Agriculture, Forestry, Fishery, and',
-                    '2.D Non-Energy Products from Fuels and', 
-                    '4. Land Use, Land Use Change and Forestry'],
-            },
-        },
-        "index_cols": ['GHG Emission Source and Sinks'],
-        "wide_keyword": 'GHG Emission Source and Sinks',
-        "col_wide_kwd": 0, 
-        "entity": "CO2",
-        "unit": "kt",
-        "cat_codes_manual": {
-            'Net GHG Emission (including LULUCF)': '0',
-            'Total GHG Emission (excluding LULUCF)': 'M.0.EL',
-        },            
-    },
-    'ES2.3': { # 1990-2021 Methane Emissions in Taiwan
-        "tables": [3, 4],
-        "rows_to_fix": {},
-        "index_cols": ['GHG Emission Sources and Sinks'],
-        "wide_keyword": 'GHG Emission Sources and Sinks',
-        "col_wide_kwd": 0, 
-        "entity": f"CH4 ({gwp_to_use})",
-        "unit": "ktCO2eq",
-        "cat_codes_manual": {
-            'Total Methane Emissions': '0',
-        },
-        "drop_rows": [
-            "5.B Garbage Biological Treatment", # has lower significant digits than in table ES3.6
-            "2. Industrial Process and Product Use Sector",  # inconsistent with subsector sum (rounding)
-        ],
-    },
-    'ES2.4': { # 1990-2021 Nitrous Oxide Emissions in Taiwan
-        "tables": [5],
-        "fix_cats": {
-            0: {
-                "Total Nitrous Oxide Emissionsl": "Total Nitrous Oxide Emissions",
-            },
-        },            
-        "rows_to_fix": {},
-        "index_cols": ['GHG Emission Sources and Sinks'],
-        "wide_keyword": 'GHG Emission Sources and Sinks',
-        "col_wide_kwd": 0, 
-        "entity": f"N2O ({gwp_to_use})",
-        "unit": "ktCO2eq",
-        "cat_codes_manual": {
-            'Total Nitrous Oxide Emissions': '0',
-        },
-        "drop_rows": [
-            "3.F Field Burning of Agricultural Residues", # has lower significant digits than in table ES3.4
-            "5. Waste Sector", # error in 1996 data
-        ],
-    },
-    'ES2.5': { # 1990-2021 Fluoride-Containing Gas Emissions in Taiwan
-        "tables": [6,7],
-        "fix_cats": {},
-        "rows_to_fix": {
-            0: {
-                -2: ['Total PFCs Emissions (2.E Electronics Industry)',
-                    'Total SF6 Emissions',
-                    'Total NF3 Emissions (2.E Electronics Industry)'],
-            },
-        },
-        "index_cols": ['GHG Emission Sources and Sinks'],
-        "wide_keyword": 'GHG Emission Sources and Sinks',
-        "col_wide_kwd": 0,
-        "gas_splitting": {
-            "Total HFCs Emissions": f"HFCS ({gwp_to_use})",
-            "Total PFCs Emissions (2.E Electronics Industry)": f"PFCS ({gwp_to_use})",
-            "Total SF6 Emissions": f"SF6 ({gwp_to_use})",
-            "Total NF3 Emissions (2.E Electronics Industry)": f"NF3 ({gwp_to_use})",
-            "Total Fluoride-Containing Gas Emissions": f"FGASES ({gwp_to_use})",
-            "GHG Emission Sources and Sinks": "entity",
-        },
-        "unit": "ktCO2eq",
-        "cat_codes_manual": {
-            "Total HFCs Emissions": "2",
-            "Total PFCs Emissions (2.E Electronics Industry)": "2.E",
-            "Total SF6 Emissions": "2",
-            "Total NF3 Emissions (2.E Electronics Industry)": "2.E",
-            "Total Fluoride-Containing Gas Emissions": "2",
-        },
-    },
-    'ES3.1': { # 1990-2021 Greenhouse Gas Emission in Taiwan by Sector
-        "tables": [8],
-        "rows_to_fix": {},
-        "index_cols": ['GHG Emission Sources and Sinks'],
-        "wide_keyword": 'GHG Emission Sources and Sinks',
-        "col_wide_kwd": 0, 
-        "entity": f"KYOTOGHG ({gwp_to_use})",
-        "unit": "ktCO2eq",
-        "cat_codes_manual": {
-            'Net GHG Emission (including LULUCF)': '0',
-            'Total GHG Emission (excluding LULUCF)': 'M.0.EL',
-        },
-    },
-    'ES3.2': { # 1990-2021 Greenhouse Gas Emissions Produced by Energy Sector in Taiwan
-        "tables": [9,10],
-        "rows_to_fix": {},
-        "index_cols": ['GHG Emission Sources and Sinks'],
-        "wide_keyword": 'GHG Emission Sources and Sinks',
-        "col_wide_kwd": 0, 
-        "gas_splitting": {
-            "Total CO2 Emission": "CO2",
-            "Total CH4 Emission": f"CH4 ({gwp_to_use})",
-            "Total N2O Emission": f"N2O ({gwp_to_use})",
-            "Total Emission from Energy Sector": f"KYOTOGHG ({gwp_to_use})",
-            "GHG Emission Sources and Sinks": "entity",
-        },
-        "unit": "ktCO2eq",
-        "cat_codes_manual": {
-            'Total CO2 Emission': '1',
-            'Total CH4 Emission': '1',
-            'Total N2O Emission': '1',
-            'Total Emission from Energy Sector': '1',
-        },
-    },
-    'ES3.3': { # 1990-2021 Greenhouse Gas Emissions Produced by Industrial Process and Product Use Sector (IPPU) in Taiwan
-        "tables": [11],
-        "rows_to_fix": {},
-        "index_cols": ['GHG Emission Sources and Sinks'],
-        "wide_keyword": 'GHG Emission Sources and Sinks',
-        "col_wide_kwd": 0, 
-        "gas_splitting": {
-            "Total CO2 Emission": "CO2",
-            "Total CH4 Emission": f"CH4 ({gwp_to_use})",
-            "Total N2O Emission": f"N2O ({gwp_to_use})",
-            "Total HFCs Emission": f"HFCS ({gwp_to_use})",
-            "Total PFCs Emission (2.E Electronics Industry)": f"PFCS ({gwp_to_use})",
-            "Total SF6 Emission": f"SF6 ({gwp_to_use})",
-            "Total NF3 Emission (2.E Electronics Industry)": f"NF3 ({gwp_to_use})",
-            "Total Emission from IPPU Sector": f"KYOTOGHG ({gwp_to_use})",
-            "GHG Emission Sources and Sinks": "entity",
-        },
-        "unit": "ktCO2eq",
-        "cat_codes_manual": {
-            'Total CO2 Emission': '2',
-            'Total CH4 Emission': '2',
-            'Total N2O Emission': '2',
-            'Total HFCs Emission': '2',
-            'Total PFCs Emission (2.E Electronics Industry)': '2.E',
-            'Total SF6 Emission': '2',
-            'Total NF3 Emission (2.E Electronics Industry)': '2.E',
-            'Total Emission from IPPU Sector': '2',
-        },
-        "drop_rows": [
-        #     ("2.D Non-Energy Products from Fuels and Solvent Use", "CO2"), # has lower significant digits than in table ES2.2
-            "Total CH4 Emission",  # inconsistent with subsectors (rounding)
-        ]
-    }, 
-    'ES3.4': { # 1990-2021 Greenhouse Gas Emissions Produced by Agriculture Sector in Taiwan
-        "tables": [12,13],
-        "rows_to_fix": {},
-        "index_cols": ['GHG Emission Sources and Sinks'],
-        "wide_keyword": 'GHG Emission Sources and Sinks',
-        "col_wide_kwd": 0, 
-        "gas_splitting": {
-            "Total CO2 Emission (3.H Urea applied)": "CO2",
-            "Total CH4 Emission": f"CH4 ({gwp_to_use})",
-            "Total N2O Emission": f"N2O ({gwp_to_use})",
-            "Total Emission From Agriculture Sector": f"KYOTOGHG ({gwp_to_use})",
-            "GHG Emission Sources and Sinks": "entity",
-        },
-        "unit": "ktCO2eq",
-        "cat_codes_manual": {
-            'Total CO2 Emission (3.H Urea applied)': '3.H',
-            'Total CH4 Emission': '3',
-            'Total N2O Emission': '3',
-            'Total Emission From Agriculture Sector': '3',
-        },
-    }, 
-    'ES3.6': { # 1990-2020 Greenhouse Gas Emissions in Taiwan by Waste Sector
-        "tables": [14],
-        "rows_to_fix": {
-            0: {
-                3: ["Total CO2 Emission"],
-            },
-        }, 
-        "index_cols": ['GHG Emission Sources and Sinks'], 
-        "wide_keyword": 'GHG Emission Sources and Sinks',
-        "col_wide_kwd": 0, # two column header
-        "gas_splitting": {
-            "Total CO2 Emission (5.C Incineration and Open Burning of Waste)": "CO2",
-            "Total CH4 Emission": f"CH4 ({gwp_to_use})",
-            "Total N2O Emission": f"N2O ({gwp_to_use})",
-            "Total Emission from Waste Sector": f"KYOTOGHG ({gwp_to_use})",
-            "GHG Emission Sources and Sinks": "entity",
-        },
-        "unit": "ktCO2eq",
-        "cat_codes_manual": {
-            'Total CO2 Emission (5.C Incineration and Open Burning of Waste)': '5.C',
-            'Total CH4 Emission': '5',
-            'Total N2O Emission': '5',
-            'Total Emission from Waste Sector': '5',
-        },
-    }, 
-}
-
-table_defs_skip = {
-    'ES2.1': { # 1990-2020 Greenhouse Gas Emissions and Sequestration in Taiwan by Type
-        "tables": [0],
-        "rows_to_fix": {
-            0: { 
-                3: ['CO2'],
-            },
-            1: {  # where col 0 is empty
-                3: ['Net GHG Emission', 'Total GHG Emission'],
-            },
-        },
-        "index_cols": ['GHG', 'GWP'],
-        "wide_keyword": 'GHG',
-        "col_wide_kwd": 0, 
-        "unit": "ktCO2eq",
-    },
-    'ES2.5': { # 1990-2020 Fluoride-Containing Gas Emissions in Taiwan
-        "tables": [6],
-        "rows_to_fix": {
-            0: {
-                -2: ['Total SF6 Emissions', 
-                     'Total NF3 Emissions'],
-            },
-        },
-        "index_cols": ['GHG Emission Sources and Sinks'],
-        "wide_keyword": 'GHG Emission Sources and Sinks',
-        "col_wide_kwd": 0, 
-        #"entity": "CO2",
-        "unit": "ktCO2eq",
-    },
-    'ES3.5': { # skip for now: 1990-2020 Changes in Carbon Sequestration by LULUCF Sector in Taiwan
-        "tables": [12],
-        "rows_to_fix": {}, 
-        "index_cols": ['GHG Emission Sources and Sinks'], #header is merged col :-(
-        "wide_keyword": 'GHG Emission Sources and Sinks',
-        "col_wide_kwd": 0, # two column header
-        "unit": "kt",
-        "entity": "CO2",
-    }, # need to consider the two columns specially (merge?)
-}
-
-
-##### primap2 metadata
-cat_code_regexp = r'(?P<UNFCCC_GHG_data>^[a-zA-Z0-9\.]{1,7})\s.*'
-
-time_format = "%Y"
-
-coords_cols = {
-    "category": "category",
-    "entity": "entity",
-    "unit": "unit",
-    # "area": "Geo_code",
-}
-
-add_coords_cols = {
-    #    "orig_cat_name": ["orig_cat_name", "category"],
-}
-
-coords_terminologies = {
-    "area": "ISO3",
-    "category": "IPCC2006_1996_Taiwan_Inv",
-    "scenario": "PRIMAP",
-}
-
-coords_defaults = {
-    "source": "TWN-GHG-Inventory",
-    "provenance": "measured",
-    "scenario": "2023NIR",
-    "area": "TWN",
-    # unit fill by table
-}
-
-coords_value_mapping = {
-    "unit": "PRIMAP1",
-    "category": "PRIMAP1",
-}
-
-coords_value_filling = {}
-
-#
-filter_remove = {}
-
-filter_keep = {}
-
-meta_data = {
-    "references": "https://www.cca.gov.tw/information-service/publications/national-ghg-inventory-report/1851.html",
-    "rights": "",
-    "contact": "mail@johannes-guetschow.de",
-    "title": "2023 Republic of China - National Greenhouse Gas Report",
-    "comment": "Read from pdf file and converted to PRIMAP2 format by Johannes Gütschow",
-    "institution": "Republic of China - Environmental Protection Administration",
-}
-
-##### processing information
-cat_conversion = {
-    'mapping': {
-        '0': '0',
-        'M.0.EL': 'M.0.EL',
-        '1': '1',
-        '1.A.1': '1.A.1',
-        '1.A.2': '1.A.2',
-        '1.A.3': '1.A.3',
-        '1.A.4': '1.A.4',
-        '1.A.4.a': '1.A.4.a',
-        '1.A.4.b': '1.A.4.b',
-        '1.A.4.c': '1.A.4.c',
-        '1.B.1': '1.B.1',
-        '1.B.2': '1.B.2',
-        '2': '2',
-        '2.A': '2.A',
-        '2.B': '2.B',
-        '2.C': '2.C',
-        '2.D': '2.D',
-        '2.E': '2.E',
-        '2.F': '2.F',
-        '2.G': '2.G',
-        '2.H': '2.H',
-        '3': 'M.AG',
-        '3.A': '3.A.1',
-        '3.B': '3.A.2',
-        '3.C': '3.C.7',
-        '3.D': 'M.3.AS',
-        '3.F': '3.C.1.b',
-        '3.H': '3.C.3',
-        '4': 'M.LULUCF',
-        '5': '4',
-        '5.A': '4.A',
-        '5.B': '4.B',
-        '5.C': '4.C',
-        '5.D': '4.D',
-        '5.D.1': '4.D.1',
-        '5.D.2': '4.D.2',
-    },
-    'aggregate': {
-        '1.A': {'sources': ['1.A.1', '1.A.2', '1.A.3', '1.A.4'],
-                'name': 'Fuel Combustion Activities'},
-        '1.B': {'sources': ['1.B.1', '1.B.2'], 'name': 'Fugitive Emissions from Fuels'},
-        '2': {'sources': ['2.A', '2.B', '2.C', '2.D', '2.E', '2.F', '2.G', '2.H'],
-              'name': 'Industrial Process and Product Use Sector'},
-        '3.A': {'sources': ['3.A.1', '3.A.2'], 'name': 'Livestock'},
-        '3.B': {'sources': ['M.LULUCF'], 'name': 'Land'},
-        '3.C.1': {'sources': ['3.C.1.b'], 'name': 'Emissions from Biomass Burning'},
-        '3.C.5': {'sources': ['3.C.5.a', '3.C.5.b'],
-                  'name': 'Indirect N2O Emissions from Managed Soils'},
-        '3.C': {'sources': ['3.C.1', '3.C.3', 'M.3.AS', '3.C.7'],
-                'name': 'Aggregate sources and non-CO2 emissions sources on land'},
-        'M.AG.ELV': {'sources': ['3.C'],
-                     'name': 'Agriculture excluding livestock emissions'},
-        'M.AG': {'sources': ['3.A', '3.C'], 'name': 'Agriculture'},
-        '3': {'sources': ['M.AG', 'M.LULUCF'], 'name': 'AFOLU'},  # consistency check
-        'M.0.EL': {'sources': ['1', '2', 'M.AG', '4']}, # consistency check
-        '0': {'sources': ['1', '2', '3', '4']},  # consistency check
-    },
-}
-
-basket_copy = {
-    'GWPs_to_add': ["SARGWP100", "AR5GWP100", "AR6GWP100"],
-    'entities': ["HFCS", "PFCS"],
-    'source_GWP': gwp_to_use,
-}
-
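
The removed config combines `cat_codes_manual` with `cat_code_regexp` to turn row labels into category codes. The sketch below uses only the standard library to show what the pattern does to a few labels taken from the tables above. The helper name `extract_cat_code` is hypothetical, and the manual replacements are assumed to have been applied before the regular expression, as in the removed reading script.

```python
import re

# Pattern copied from the removed config; the group name is kept as-is.
cat_code_regexp = r'(?P<UNFCCC_GHG_data>^[a-zA-Z0-9\.]{1,7})\s.*'


def extract_cat_code(label: str) -> str:
    """Replace a label by its leading code; labels that do not match stay unchanged."""
    return re.sub(cat_code_regexp, lambda m: m.group('UNFCCC_GHG_data'), label)


labels = [
    '1.A.4.c Agriculture, Forestry, Fishery, and',
    '2.D Non-Energy Products from Fuels and Solvent Use',
    '5. Waste Sector',
    '0',  # already a code (for example after a manual replacement)
]
codes = [extract_cat_code(label) for label in labels]
```

Note that the pattern keeps a trailing dot (`5.` for `5. Waste Sector`) and would reduce a label such as `Total Methane Emissions` to `Total`, which is presumably why totals are mapped manually first.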

+ 0 - 399
UNFCCC_GHG_data/UNFCCC_reader/Taiwan/read_TWN_2022-Inventory_from_pdf.py

@@ -1,399 +0,0 @@
-# this script reads data from Taiwan's 2022 national inventory
-# Data is read from the English summary PDF
-# TODO: add further GWPs and gas baskets
-
-import pandas as pd
-import primap2 as pm2
-import camelot
-import copy
-
-from UNFCCC_GHG_data.helper import downloaded_data_path, extracted_data_path
-from primap2.pm2io._data_reading import matches_time_format
-
-from config_TWN_NIR2022 import table_defs, page_defs
-from config_TWN_NIR2022 import fix_rows, make_wide_table
-from config_TWN_NIR2022 import gwp_to_use
-
-# ###
-# configuration
-# ###
-input_folder = downloaded_data_path / 'non-UNFCCC' / 'Taiwan'
-# TODO: move file to subfolder
-output_folder = extracted_data_path / 'non-UNFCCC' / 'Taiwan'
-if not output_folder.exists():
-    output_folder.mkdir()
-
-output_filename = 'TWN_inventory_2022_'
-inventory_file = '00_abstract_en.pdf'
-
-cat_code_regexp = r'(?P<UNFCCC_GHG_data>^[a-zA-Z0-9\.]{1,7})\s.*'
-
-time_format = "%Y"
-
-coords_cols = {
-    "category": "category",
-    "entity": "entity",
-    "unit": "unit",
-    # "area": "Geo_code",
-}
-
-add_coords_cols = {
-    #    "orig_cat_name": ["orig_cat_name", "category"],
-}
-
-coords_terminologies = {
-    "area": "ISO3",
-    "category": "IPCC2006_1996_Taiwan_Inv",
-    "scenario": "PRIMAP",
-}
-
-coords_defaults = {
-    "source": "TWN-GHG-Inventory",
-    "provenance": "measured",
-    "scenario": "2022NIR",
-    "area": "TWN",
-    # unit fill by table
-}
-
-coords_value_mapping = {
-    "unit": "PRIMAP1",
-    "category": "PRIMAP1",
-}
-
-coords_value_filling = {}
-
-#
-filter_remove = {}
-
-filter_keep = {}
-
-meta_data = {
-    "references": "https://unfccc.saveoursky.org.tw/nir/tw_nir_2022.php",
-    "rights": "",
-    "contact": "mail@johannes-guetschow.de",
-    "title": "2022 Republic of China - National Greenhouse Gas Report",
-    "comment": "Read from pdf file and converted to PRIMAP2 format by Johannes Gütschow",
-    "institution": "Republic of China - Environmental Protection Administration",
-}
-
-# config for part3: mapping to 2006 categories
-
-cat_mapping = {
-    '3': 'M.AG',
-    '3.A': '3.A.1',
-    '3.B': '3.A.2',
-    '3.C': '3.C.7',
-    '3.D': 'M.3.AS',
-    '3.F': '3.C.1.b',
-    '3.H': '3.C.3',
-    '4': 'M.LULUCF',
-    '5': '4',
-    '5.A': '4.A',
-    '5.B': '4.B',
-    '5.C': '4.C',
-    '5.D': '4.D',
-    '5.D.1': '4.D.1',
-    '5.D.2': '4.D.2',
-}
-
-aggregate_cats = {
-    '1.A': {'sources': ['1.A.1', '1.A.2', '1.A.3', '1.A.4'],
-            'name': 'Fuel Combustion Activities'},
-    '1.B': {'sources': ['1.B.1', '1.B.2'], 'name': 'Fugitive Emissions from Fuels'},
-    '3.A': {'sources': ['3.A.1', '3.A.2'], 'name': 'Livestock'},
-    '3.C.1': {'sources': ['3.C.1.b'], 'name': 'Emissions from Biomass Burning'},
-    '3.C.5': {'sources': ['3.C.5.a', '3.C.5.b'],
-              'name': 'Indirect N2O Emissions from Managed Soils'},
-    '3.C': {'sources': ['3.C.1', '3.C.3', 'M.3.AS', '3.C.7'],
-            'name': 'Aggregate sources and non-CO2 emissions sources on land'},
-    '3': {'sources': ['M.AG', 'M.LULUCF'], 'name': 'AFOLU'},
-    'M.AG.ELV': {'sources': ['3.C'],
-                 'name': 'Agriculture excluding livestock emissions'},
-}
-
-
-# 2 for NF3, PFCs (from 2.E)
-aggregate_cats_NF3_PFC = {
-    '2': {'sources': ['2.E'], 'name': 'Industrial Process and Product Use Sector'},
-}
-
-compression = dict(zlib=True, complevel=9)
-
-# ###
-# read the tables from pdf
-# ###
-
-all_tables = []
-for page in page_defs:
-    print(f"Reading from page {page}")
-    new_tables = camelot.read_pdf(
-        str(input_folder / inventory_file),
-        pages=page,
-        **page_defs[page],
-        )
-    for table in new_tables:
-        all_tables.append(table.df)
-
-
-# ###
-# convert tables to primap2 format
-# ###
-data_pm2 = None
-for table_name in table_defs.keys():
-    print(f"Working on table: {table_name}")
-
-    table_def = copy.deepcopy(table_defs[table_name])
-    # combine all raw tables
-    df_this_table = all_tables[table_def["tables"][0]].copy(deep=True)
-    if len(table_def["tables"]) > 1:
-        for table in table_def["tables"][1:]:
-            df_this_table = pd.concat(
-                [df_this_table, all_tables[table]],
-                axis=0,
-                join='outer')
-
-    # fix for table ES3.6
-    if table_name == 'ES3.6':
-        col_idx = df_this_table[0] == "Total CO Emission"
-        df_this_table.loc[col_idx, 1:] = ''
-        df_this_table.loc[col_idx, 0] = 'Total CO2 Emission'
-
-    df_this_table = df_this_table.reset_index(drop=True)
-
-    # fix categories if necessary
-    if "fix_cats" in table_def.keys():
-        for col in table_def["fix_cats"]:
-            df_this_table[col] = df_this_table[col].replace(table_def["fix_cats"][col])
-
-    # fix rows
-    for col in table_def["rows_to_fix"].keys():
-        for n_rows in table_def["rows_to_fix"][col].keys():
-            print(f"Fixing {col}, {n_rows}")
-            # replace line breaks, long hyphens, double, and triple spaces in category names
-            df_this_table.iloc[:, 0] = df_this_table.iloc[:, 0].str.replace("\n", " ")
-            df_this_table.iloc[:, 0] = df_this_table.iloc[:, 0].str.replace("   ", " ")
-            df_this_table.iloc[:, 0] = df_this_table.iloc[:, 0].str.replace("  ", " ")
-            df_this_table.iloc[:, 0] = df_this_table.iloc[:, 0].str.replace("–", "-")
-            df_this_table = fix_rows(df_this_table,
-                                     table_def["rows_to_fix"][col][n_rows], col, n_rows)
-
-    # split by entity
-    if "gas_splitting" in table_def.keys():
-        col_entity = [''] * len(df_this_table)
-        last_entity = ''
-        for i in range(0, len(df_this_table)):
-            current_header = df_this_table[table_def["col_wide_kwd"]].iloc[i]
-            if current_header in table_def["gas_splitting"].keys():
-                last_entity = table_def["gas_splitting"][current_header]
-            col_entity[i] = last_entity
-
-        df_this_table["entity"] = col_entity
-        table_def["index_cols"].append("entity")
-
-    # make a wide table
-    df_this_table = make_wide_table(df_this_table, table_def["wide_keyword"],
-                                    table_def["col_wide_kwd"], table_def["index_cols"])
-
-    if "drop_rows" in table_def.keys():
-        df_this_table = df_this_table.drop(table_def["drop_rows"], axis=0)
-
-    # reset row index
-    df_this_table = df_this_table.reset_index(drop=False)
-
-    # add entity
-    if "entity" in table_def.keys():
-        df_this_table["entity"] = table_def["entity"]
-
-    # add unit
-    df_this_table["unit"] = table_def["unit"]
-
-    df_this_table = df_this_table.rename({table_def["index_cols"][0]: "orig_cat_name"},
-                                         axis=1)
-
-    # print(table_def["index_cols"][0])
-    # print(df_this_table.columns.values)
-
-    # make a copy of the categories column
-    df_this_table["category"] = df_this_table["orig_cat_name"]
-
-    # replace cat names by codes in col "category"
-    # first the manual replacements
-    df_this_table["category"] = df_this_table["category"].replace(
-        table_def["cat_codes_manual"])
-    # then the regex replacements
-    repl = lambda m: m.group('code')
-    df_this_table["category"] = df_this_table["category"].str.replace(cat_code_regexp,
-                                                                      repl, regex=True)
-
-    ### convert to PRIMAP2 IF
-    # remove ','
-    time_format = '%Y'
-    time_columns = [
-        col
-        for col in df_this_table.columns.values
-        if matches_time_format(col, time_format)
-    ]
-
-    for col in time_columns:
-        df_this_table.loc[:, col] = df_this_table.loc[:, col].str.replace(',', '',
-                                                                          regex=False)
-
-    # drop orig_cat_name as it's not unique per category
-    df_this_table = df_this_table.drop(columns="orig_cat_name")
-
-    # coords_defaults_this_table = coords_defaults.copy()
-    # coords_defaults_this_table["unit"] = table_def["unit"]
-    df_this_table_if = pm2.pm2io.convert_wide_dataframe_if(
-        df_this_table,
-        coords_cols=coords_cols,
-        add_coords_cols=add_coords_cols,
-        coords_defaults=coords_defaults,
-        coords_terminologies=coords_terminologies,
-        coords_value_mapping=coords_value_mapping,
-        # coords_value_filling=coords_value_filling,
-        # filter_remove=filter_remove,
-        # filter_keep=filter_keep,
-        meta_data=meta_data
-    )
-
-    this_table_pm2 = pm2.pm2io.from_interchange_format(df_this_table_if)
-
-    if data_pm2 is None:
-        data_pm2 = this_table_pm2
-    else:
-        data_pm2 = data_pm2.pr.merge(this_table_pm2)
-
-# convert back to IF to have units in the fixed format
-data_if = data_pm2.pr.to_interchange_format()
-
-
-# ###
-# convert to IPCC2006 categories
-# ###
-data_if_2006 = data_if.copy(deep=True)
-data_if_2006
-# filter_data(data_if_2006, filter_remove=filter_remove_IPCC2006)
-data_if_2006 = data_if_2006.replace(
-    {'category (IPCC2006_1996_Taiwan_Inv)': cat_mapping})
-
-# rename the category col
-data_if_2006.rename(
-    columns={'category (IPCC2006_1996_Taiwan_Inv)': 'category (IPCC2006_PRIMAP)'},
-    inplace=True)
-data_if_2006.attrs['attrs']['cat'] = 'category (IPCC2006_PRIMAP)'
-data_if_2006.attrs['dimensions']['*'] = [
-    'category (IPCC2006_PRIMAP)' if item == 'category (IPCC2006_1996_Taiwan_Inv)'
-    else item for item in data_if_2006.attrs['dimensions']['*']]
-
-# aggregate categories
-for cat_to_agg in aggregate_cats:
-    mask = data_if_2006["category (IPCC2006_PRIMAP)"].isin(
-        aggregate_cats[cat_to_agg]["sources"])
-    df_test = data_if_2006[mask]
-
-    if len(df_test) > 0:
-        print(f"Aggregating category {cat_to_agg}")
-        df_combine = df_test.copy(deep=True)
-
-        time_format = '%Y'
-        time_columns = [
-            col
-            for col in df_combine.columns.values
-            if matches_time_format(col, time_format)
-        ]
-
-        for col in time_columns:
-            df_combine[col] = pd.to_numeric(df_combine[col], errors="coerce")
-
-        df_combine = df_combine.groupby(
-            by=['source', 'scenario (PRIMAP)', 'provenance', 'area (ISO3)', 'entity',
-                'unit']).sum(min_count=1)
-
-        df_combine.insert(0, "category (IPCC2006_PRIMAP)", cat_to_agg)
-        # df_combine.insert(1, "cat_name_translation", aggregate_cats[cat_to_agg]["name"])
-        # df_combine.insert(2, "orig_cat_name", "computed")
-
-        df_combine = df_combine.reset_index()
-
-        data_if_2006 = data_if_2006.append(df_combine)
-        data_if_2006 = data_if_2006.reset_index(drop=True)
-    else:
-        print(f"no data to aggregate category {cat_to_agg}")
-
-# aggregate categories
-for cat_to_agg in aggregate_cats_NF3_PFC:
-    mask = data_if_2006["category (IPCC2006_PRIMAP)"].isin(
-        aggregate_cats_NF3_PFC[cat_to_agg]["sources"])
-    mask_gas = data_if_2006["entity"].isin(
-        [f"NF3 ({gwp_to_use})", f"PFCS ({gwp_to_use})"])
-    df_test = data_if_2006[mask & mask_gas]
-
-    if len(df_test) > 0:
-        print(f"Aggregating category {cat_to_agg}")
-        df_combine = df_test.copy(deep=True)
-
-        time_format = '%Y'
-        time_columns = [
-            col
-            for col in df_combine.columns.values
-            if matches_time_format(col, time_format)
-        ]
-
-        for col in time_columns:
-            df_combine[col] = pd.to_numeric(df_combine[col], errors="coerce")
-
-        df_combine = df_combine.groupby(
-            by=['source', 'scenario (PRIMAP)', 'provenance', 'area (ISO3)', 'entity',
-                'unit']).sum(min_count=1)
-
-        df_combine.insert(0, "category (IPCC2006_PRIMAP)", cat_to_agg)
-        # df_combine.insert(1, "cat_name_translation", aggregate_cats[cat_to_agg]["name"])
-        # df_combine.insert(2, "orig_cat_name", "computed")
-
-        df_combine = df_combine.reset_index()
-
-        data_if_2006 = data_if_2006.append(df_combine)
-        data_if_2006 = data_if_2006.reset_index(drop=True)
-    else:
-        print(f"no data to aggregate category {cat_to_agg}")
-
-# conversion to PRIMAP2 native format
-data_pm2_2006 = pm2.pm2io.from_interchange_format(data_if_2006)
-
-# convert to mass units from CO2eq
-entities_to_convert = ['N2O', 'SF6', 'CH4', 'NF3']
-entities_to_convert = [f"{entity} ({gwp_to_use})" for entity in entities_to_convert]
-
-for entity in entities_to_convert:
-    converted = data_pm2_2006[entity].pr.convert_to_mass()
-    basic_entity = entity.split(" ")[0]
-    converted = converted.to_dataset(name=basic_entity)
-    data_pm2_2006 = data_pm2_2006.pr.merge(converted)
-    data_pm2_2006[basic_entity].attrs["entity"] = basic_entity
-
-# drop the GWP data
-data_pm2_2006 = data_pm2_2006.drop_vars(entities_to_convert)
-
-# convert to IF
-data_if_2006 = data_pm2_2006.pr.to_interchange_format()
-
-# ###
-# save data
-# ###
-# data in original categories
-pm2.pm2io.write_interchange_format(output_folder /
-                                   (output_filename + coords_terminologies["category"]),
-                                   data_if)
-encoding = {var: compression for var in data_pm2.data_vars}
-data_pm2.pr.to_netcdf((output_folder /
-                      (output_filename + coords_terminologies[
-                          "category"])).with_suffix(".nc"),
-                      encoding=encoding)
-
-# data in 2006 categories
-pm2.pm2io.write_interchange_format(output_folder /
-                                   (output_filename + "IPCC2006_PRIMAP"), data_if_2006)
-encoding = {var: compression for var in data_pm2_2006.data_vars}
-data_pm2_2006.pr.to_netcdf((output_folder /
-                            (output_filename + "IPCC2006_PRIMAP")).with_suffix(".nc"),
-                           encoding=encoding)

+ 0 - 228
UNFCCC_GHG_data/UNFCCC_reader/Taiwan/read_TWN_2023-Inventory_from_pdf.py

@@ -1,228 +0,0 @@
-# this script reads data from Taiwan's 2023 national inventory
-# Data is read from the English summary PDF
-# TODO: add further GWPs and gas baskets
-
-import pandas as pd
-import primap2 as pm2
-import camelot
-import copy
-
-from UNFCCC_GHG_data.helper import downloaded_data_path, extracted_data_path
-from UNFCCC_GHG_data.helper import compression, make_wide_table
-from UNFCCC_GHG_data.helper import process_data_for_country, gas_baskets
-from primap2.pm2io._data_reading import matches_time_format
-
-from config_TWN_NIR2022 import fix_rows
-from config_TWN_NIR2023 import table_defs, page_defs, cat_code_regexp
-from config_TWN_NIR2023 import terminology_proc
-from config_TWN_NIR2023 import gwp_to_use, basket_copy
-from config_TWN_NIR2023 import coords_cols, add_coords_cols, coords_defaults
-from config_TWN_NIR2023 import coords_terminologies, coords_value_mapping
-from config_TWN_NIR2023 import meta_data, cat_conversion
-
-
-# ###
-# configuration
-# ###
-input_folder = downloaded_data_path / 'non-UNFCCC' / 'Taiwan' / '2023_NIR'
-output_folder = extracted_data_path / 'non-UNFCCC' / 'Taiwan'
-if not output_folder.exists():
-    output_folder.mkdir()
-
-output_filename = 'TWN_inventory_2023_'
-inventory_file = '2023_NIR_executive_summary_english.pdf'
-
-# ###
-# read the tables from pdf
-# ###
-
-all_tables = []
-for page in page_defs:
-    print(f"Reading from page {page}")
-    new_tables = camelot.read_pdf(
-        str(input_folder / inventory_file),
-        pages=page,
-        **page_defs[page],
-        )
-    for table in new_tables:
-        all_tables.append(table.df)
-
-
-# ###
-# convert tables to primap2 format
-# ###
-data_pm2 = None
-for table_name in table_defs.keys():
-    print(f"Working on table: {table_name}")
-
-    table_def = copy.deepcopy(table_defs[table_name])
-    # combine all raw tables
-    df_this_table = all_tables[table_def["tables"][0]].copy(deep=True)
-    if len(table_def["tables"]) > 1:
-        for table in table_def["tables"][1:]:
-            df_this_table = pd.concat(
-                [df_this_table, all_tables[table]],
-                axis=0,
-                join='outer')
-
-    # fix for table ES3.6
-    if table_name == 'ES3.6':
-        col_idx = df_this_table[0] == "Total CO Emission"
-        df_this_table.loc[col_idx, 1:] = ''
-        df_this_table.loc[col_idx, 0] = 'Total CO2 Emission'
-
-    df_this_table = df_this_table.reset_index(drop=True)
-
-    # fix categories if necessary
-    if "fix_cats" in table_def.keys():
-        for col in table_def["fix_cats"]:
-            df_this_table[col] = df_this_table[col].replace(table_def["fix_cats"][col])
-
-    # fix rows
-    for col in table_def["rows_to_fix"].keys():
-        for n_rows in table_def["rows_to_fix"][col].keys():
-            print(f"Fixing {col}, {n_rows}")
-            # replace line breaks, long hyphens, double, and triple spaces in category names
-            df_this_table.iloc[:, 0] = df_this_table.iloc[:, 0].str.replace("\n", " ")
-            df_this_table.iloc[:, 0] = df_this_table.iloc[:, 0].str.replace("   ", " ")
-            df_this_table.iloc[:, 0] = df_this_table.iloc[:, 0].str.replace("  ", " ")
-            df_this_table.iloc[:, 0] = df_this_table.iloc[:, 0].str.replace("–", "-")
-            df_this_table = fix_rows(df_this_table,
-                                     table_def["rows_to_fix"][col][n_rows], col, n_rows)
-
-    # split by entity
-    if "gas_splitting" in table_def.keys():
-        col_entity = [''] * len(df_this_table)
-        last_entity = ''
-        for i in range(0, len(df_this_table)):
-            current_header = df_this_table[table_def["col_wide_kwd"]].iloc[i]
-            if current_header in table_def["gas_splitting"].keys():
-                last_entity = table_def["gas_splitting"][current_header]
-            col_entity[i] = last_entity
-
-        df_this_table["entity"] = col_entity
-        table_def["index_cols"].append("entity")
-
-    # make a wide table
-    df_this_table = make_wide_table(df_this_table, table_def["wide_keyword"],
-                                    table_def["col_wide_kwd"], table_def["index_cols"])
-
-    if "drop_rows" in table_def.keys():
-        df_this_table = df_this_table.drop(table_def["drop_rows"], axis=0)
-
-    # reset row index
-    df_this_table = df_this_table.reset_index(drop=False)
-
-    # add entity
-    if "entity" in table_def.keys():
-        df_this_table["entity"] = table_def["entity"]
-
-    # add unit
-    df_this_table["unit"] = table_def["unit"]
-
-    df_this_table = df_this_table.rename({table_def["index_cols"][0]: "orig_cat_name"},
-                                         axis=1)
-
-    # print(table_def["index_cols"][0])
-    # print(df_this_table.columns.values)
-
-    # make a copy of the categories column
-    df_this_table["category"] = df_this_table["orig_cat_name"]
-
-    # replace cat names by codes in col "category"
-    # first the manual replacements
-    df_this_table["category"] = df_this_table["category"].replace(
-        table_def["cat_codes_manual"])
-    # then the regex replacements
-    repl = lambda m: m.group('code')
-    df_this_table["category"] = df_this_table["category"].str.replace(cat_code_regexp,
-                                                                      repl, regex=True)
-
-    ### convert to PRIMAP2 IF
-    # remove ','
-    time_format = '%Y'
-    time_columns = [
-        col
-        for col in df_this_table.columns.values
-        if matches_time_format(col, time_format)
-    ]
-
-    for col in time_columns:
-        df_this_table.loc[:, col] = df_this_table.loc[:, col].str.replace(',', '',
-                                                                          regex=False)
-
-    # drop orig_cat_name as it's not unique per category
-    df_this_table = df_this_table.drop(columns="orig_cat_name")
-
-    # coords_defaults_this_table = coords_defaults.copy()
-    # coords_defaults_this_table["unit"] = table_def["unit"]
-    df_this_table_if = pm2.pm2io.convert_wide_dataframe_if(
-        df_this_table,
-        coords_cols=coords_cols,
-        add_coords_cols=add_coords_cols,
-        coords_defaults=coords_defaults,
-        coords_terminologies=coords_terminologies,
-        coords_value_mapping=coords_value_mapping,
-        # coords_value_filling=coords_value_filling,
-        # filter_remove=filter_remove,
-        # filter_keep=filter_keep,
-        meta_data=meta_data
-    )
-
-    this_table_pm2 = pm2.pm2io.from_interchange_format(df_this_table_if)
-
-    if data_pm2 is None:
-        data_pm2 = this_table_pm2
-    else:
-        data_pm2 = data_pm2.pr.merge(this_table_pm2)
-
-# convert back to IF to have units in the fixed format
-data_if = data_pm2.pr.to_interchange_format()
-
-# ###
-# save data
-# ###
-# data in original categories
-pm2.pm2io.write_interchange_format(output_folder /
-                                   (output_filename + coords_terminologies["category"]),
-                                   data_if)
-encoding = {var: compression for var in data_pm2.data_vars}
-data_pm2.pr.to_netcdf((output_folder /
-                       (output_filename + coords_terminologies[
-                           "category"])).with_suffix(".nc"),
-                      encoding=encoding)
-
-
-# ###
-# convert to IPCC2006 categories
-# ###
-data_proc_pm2 = data_pm2.copy(deep=True)
-
-
-country_processing = {
-    'basket_copy': basket_copy,
-}
-
-data_proc_pm2 = process_data_for_country(
-    data_proc_pm2,
-    entities_to_ignore=[],
-    gas_baskets=gas_baskets,
-    processing_info_country=country_processing,
-    cat_terminology_out=terminology_proc,
-    category_conversion=cat_conversion,
-)
-
-# convert to IF
-data_proc_if = data_proc_pm2.pr.to_interchange_format()
-
-# ###
-# save data
-# ###
-# data in 2006 categories
-pm2.pm2io.write_interchange_format(output_folder /
-                                   (output_filename + "IPCC2006_PRIMAP"),
-                                   data_proc_if)
-encoding = {var: compression for var in data_proc_pm2.data_vars}
-data_proc_pm2.pr.to_netcdf((output_folder /
-                            (output_filename + "IPCC2006_PRIMAP")).with_suffix(".nc"),
-                           encoding=encoding)
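
The "split by entity" step in the script above carries the most recent gas header down to the rows that follow it. A minimal standalone sketch of that forward-fill, assuming a plain list of row labels in place of the DataFrame column; the function name `assign_entities` and all sample labels except 'Total CO2 Emission' are hypothetical:

```python
def assign_entities(headers, gas_splitting):
    """Assign each row the entity of the most recent matching header row."""
    entities = []
    last_entity = ''  # rows before the first header keep an empty entity
    for header in headers:
        if header in gas_splitting:
            last_entity = gas_splitting[header]
        entities.append(last_entity)
    return entities


# hypothetical row labels; header rows switch the current gas
headers = [
    "Total CO2 Emission",
    "1. Energy",
    "2. IPPU",
    "Total CH4 Emission",
    "1. Energy",
]
gas_splitting = {"Total CO2 Emission": "CO2", "Total CH4 Emission": "CH4"}
entities = assign_entities(headers, gas_splitting)
```

The header rows themselves also receive the entity, which matches the loop in the script, where `col_entity[i]` is set for every row.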

+ 0 - 363
UNFCCC_GHG_data/UNFCCC_reader/Thailand/config_THA_BUR3.py

@@ -1,363 +0,0 @@
-# configuration for Thailand, BUR3
-# ###
-# for reading
-# ###
-
-# general
-gwp_to_use = "AR4GWP100"
-terminology_proc = 'IPCC2006_PRIMAP'
-
-header_inventory = ['Greenhouse gas source and sink categories',
-                   'CO2 emissions', 'CO2 removals',
-                   'CH4', 'N2O', 'NOx', 'CO', 'NMVOCs',
-                   'SO2', 'HFCs', 'PFCs', 'SF6']
-unit_inventory = ['Gg'] * len(header_inventory)
-unit_inventory[9] = "GgCO2eq"
-unit_inventory[10] = "GgCO2eq"
-
-# 2016 inventory
-inv_conf = {
-    'year': 2016,
-    'entity_row': 0,
-    'unit_row': 1,
-    'index_cols': "Greenhouse gas source and sink categories",
-    'header': header_inventory,
-    'unit': unit_inventory,
-    # special header as category code and name in one column
-    'header_long': ["orig_cat_name", "entity", "unit", "time", "data"],
-    # manual category codes (manual mapping to primap1, will be mapped to primap2
-    # automatically with the other codes)
-    'cat_codes_manual': {
-        '6. Other Memo Items (not accounted in Total Emissions)': 'MEMO',
-        'International Bunkers': 'MBK',
-        'CO2 from Biomass': 'MBIO',
-    },
-    'cat_code_regexp': r'^(?P<code>[a-zA-Z0-9]{1,4})[\s\.].*',
-}
-
-# primap2 format conversion
-coords_cols = {
-    "category": "category",
-    "entity": "entity",
-    "unit": "unit",
-}
-
-coords_terminologies = {
-    "area": "ISO3",
-    "category": "IPCC1996_2006_THA_Inv",
-    "scenario": "PRIMAP",
-}
-
-coords_defaults = {
-    "source": "THA-GHG-Inventory",
-    "provenance": "measured",
-    "area": "THA",
-    "scenario": "BUR3",
-}
-
-coords_value_mapping = {
-    "unit": "PRIMAP1",
-    "category": "PRIMAP1",
-    "entity": {
-        'HFCs': f"HFCS ({gwp_to_use})",
-        'PFCs': f"PFCS ({gwp_to_use})",
-        'NMVOCs': 'NMVOC',
-    },
-}
-
-filter_remove = {
-    'f_memo': {"category": "MEMO"},
-}
-filter_keep = {}
-
-meta_data = {
-    "references": "https://unfccc.int/documents/267629",
-    "rights": "",
-    "contact": "mail@johannes-guetschow.de",
-    "title": "Thailand. Biennial update report (BUR). BUR3",
-    "comment": "Read from pdf by Johannes Gütschow",
-    "institution": "UNFCCC",
-}
-
-# main sector time series
-header_main_sector_ts = [
-    'Year', 'Energy', 'IPPU',
-    'Agriculture', 'LULUCF', 'Waste',
-    'Net emissions (Including LULUCF)',
-    'Net emissions (Excluding LULUCF)']
-unit_main_sector_ts = ['GgCO2eq'] * len(header_main_sector_ts)
-unit_main_sector_ts[0] = ''
-
-trend_conf = {
-    'header': header_main_sector_ts,
-    'unit': unit_main_sector_ts,
-    # manual category codes (manual mapping to primap1, will be mapped to primap2
-    # automatically with the other codes)
-    'cat_codes_manual': {
-        'Energy': "1",
-        'IPPU': "2",
-        'Agriculture': "3",
-        'LULUCF': "4",
-        'Waste': "5",
-        'Net emissions (Including LULUCF)': "0",
-        'Net emissions (Excluding LULUCF)': "M0EL",
-    },
-}
-
-coords_cols_main_sector_ts = {
-    "category": "category",
-    "unit": "unit",
-}
-
-coords_defaults_main_sector_ts = {
-    "source": "THA-GHG-Inventory",
-    "provenance": "measured",
-    "area": "THA",
-    "scenario": "BUR3",
-    "entity": f"KYOTOGHG ({gwp_to_use})",
-}
-
-# indirect gases time series
-header_indirect = ['Year', 'NOx', 'CO',
-                    'NMVOCs', 'SO2']
-unit_indirect = ['Gg'] * len(header_indirect)
-unit_indirect[0] = ''
-ind_conf = {
-    'header': header_indirect,
-    'unit': unit_indirect,
-    'cols_to_remove': ['Average Annual Growth Rate'],
-}
-
-coords_cols_indirect = {
-    "entity": "entity",
-    "unit": "unit",
-}
-
-coords_defaults_indirect = {
-    "source": "THA-GHG-Inventory",
-    "provenance": "measured",
-    "area": "THA",
-    "scenario": "BUR3",
-    "category": "0",
-}
-
-# ###
-# for processing
-# ###
-# aggregate categories
-country_processing_step1 = {
-    'aggregate_cats': {
-        '2.A.4': {'sources': ['2.A.4.b', '2.A.4.d'],
-                  'name': 'Other Process uses of Carbonates'},
-    },
-    'aggregate_gases': {
-        'KYOTOGHG': {
-            'basket': 'KYOTOGHG (AR4GWP100)',
-            'basket_contents': ['CO2', 'CH4', 'N2O', 'SF6',
-                                'HFCS (AR4GWP100)', 'PFCS (AR4GWP100)'],
-            'skipna': True,
-            'min_count': 1,
-            'sel': {f'category ({coords_terminologies["category"]})':
-                [
-                    '0', '1', '1.A', '1.A.1', '1.A.2', '1.A.3',
-                    '1.A.4', '1.B', '1.B.1', '1.B.2',
-                    '1.C',
-                    '2', '2.A', '2.A.1', '2.A.2', '2.A.3', '2.A.4',
-                    '2.B', '2.C', '2.D', '2.H',
-                    '3', '3.A', '3.B', '3.C', '3.D', '3.E', '3.F', '3.G',
-                    '3.H', '3.I',
-                    '4', '4.A', '4.B', '4.C', '4.D', '4.E',
-                    '5', '5.A', '5.B', '5.C', '5.D'
-                ]
-            }, # not tested
-        },
-    },
-}
-
-country_processing_step2 = {
-    'downscale': {
-        # main sectors present as KYOTOGHG sum. subsectors need to be downscaled
-        # TODO: downscale CO, NOx, NMVOC, SO2 (national total present)
-        'sectors': {
-            '1': {
-                'basket': '1',
-                'basket_contents': ['1.A', '1.B', '1.C'],
-                'entities': ['KYOTOGHG (AR4GWP100)'],
-                'dim': f'category ({coords_terminologies["category"]})',
-            },
-            '1.A': {
-                'basket': '1.A',
-                'basket_contents': ['1.A.1', '1.A.2', '1.A.3', '1.A.4'],
-                'entities': ['KYOTOGHG (AR4GWP100)'],
-                'dim': f'category ({coords_terminologies["category"]})',
-            },
-            '1.B': {
-                'basket': '1.B',
-                'basket_contents': ['1.B.1', '1.B.2'],
-                'entities': ['KYOTOGHG (AR4GWP100)'],
-                'dim': f'category ({coords_terminologies["category"]})',
-            },
-            '2': {
-                'basket': '2',
-                'basket_contents': ['2.A', '2.B', '2.C', '2.D', '2.H'],
-                'entities': ['KYOTOGHG (AR4GWP100)'],
-                'dim': f'category ({coords_terminologies["category"]})',
-            },
-            '2.A': {
-                'basket': '2.A',
-                'basket_contents': ['2.A.1', '2.A.2', '2.A.3', '2.A.4'],
-                'entities': ['KYOTOGHG (AR4GWP100)'],
-                'dim': f'category ({coords_terminologies["category"]})',
-            },
-            '3': {
-                'basket': '3',
-                'basket_contents': ['3.A', '3.B', '3.C', '3.D', '3.E', '3.F', '3.G',
-                                    '3.H', '3.I'],
-                'entities': ['KYOTOGHG (AR4GWP100)'],
-                'dim': f'category ({coords_terminologies["category"]})',
-            },
-            '4': {
-                'basket': '4',
-                'basket_contents': ['4.A', '4.B', '4.C', '4.D', '4.E'],
-                'entities': ['KYOTOGHG (AR4GWP100)'],
-                'dim': f'category ({coords_terminologies["category"]})',
-            },
-            '5': {
-                'basket': '5',
-                'basket_contents': ['5.A', '5.B', '5.C', '5.D'],
-                'entities': ['KYOTOGHG (AR4GWP100)'],
-                'dim': f'category ({coords_terminologies["category"]})',
-            },
-        },
-        'entities': {
-            'KYOTO': {
-                'basket': 'KYOTOGHG (AR4GWP100)',
-                'basket_contents': ['CH4', 'CO2', 'N2O', 'HFCS (AR4GWP100)',
-                                    'PFCS (AR4GWP100)', 'SF6'],
-                'sel': {f'category ({coords_terminologies["category"]})':
-                    [
-                        '0', '1', '1.A', '1.A.1', '1.A.2', '1.A.3',
-                        '1.A.4', '1.B', '1.B.1', '1.B.2', '1.C',
-                        '2', '2.A', '2.A.1', '2.A.2', '2.A.3', '2.A.4',
-                        '2.B', '2.C', '2.D', '2.H',
-                        '3', '3.A', '3.B', '3.C', '3.D', '3.E', '3.F', '3.G',
-                        '3.H', '3.I',
-                        '4', '4.A', '4.B', '4.C', '4.D', '4.E',
-                        '5', '5.A', '5.B', '5.C', '5.D']},
-            },
-        },
-    },
-    'basket_copy': {
-        'GWPs_to_add': ["SARGWP100", "AR5GWP100", "AR6GWP100"],
-        'entities': ["HFCS", "PFCS"],
-        'source_GWP': gwp_to_use,
-    },
-}
-## not in BUR3: 1.A.1.a, 1.A.1.b, 1.A.3.a, 1.A.3.b, 1.A.3.c, 1.A.3.d, 1.A.5, 1.B.3,
-# 2.B.x, 2.F, 2.G
-# 4.E.x, 5.X.y M.BK.A, M.BK.M
-
-cat_conversion = {
-    'mapping': {
-        '0': '0',
-        'M.0.EL': 'M.0.EL',
-        '1': '1',
-        '1.A': '1.A',
-        '1.A.1': '1.A.1',
-        '1.A.2': '1.A.2',
-        '1.A.3': '1.A.3',
-        '1.A.4': '1.A.4',
-        '1.B': '1.B',
-        '1.B.1': '1.B.1',
-        '1.B.2': '1.B.2',
-        '1.C': '1.C',
-        '1.C.1': '1.C.1',
-        '1.C.2': '1.C.2',
-        '1.C.3': '1.C.3',
-        '2': '2',
-        '2.A': '2.A',
-        '2.A.1': '2.A.1',
-        '2.A.2': '2.A.2',
-        '2.A.3': '2.A.3',
-        '2.A.4': '2.A.4',
-        '2.A.4.b': '2.A.4.b',
-        '2.A.4.d': '2.A.4.d',
-        '2.B': '2.B',
-        '2.C': '2.C',
-        '2.C.1': '2.C.1',
-        '2.D': '2.D',
-        '2.D.1': '2.D.1',
-        '2.H': '2.H',
-        '2.H.1': '2.H.1',
-        '2.H.2': '2.H.2',
-        '3': 'M.AG',
-        '3.A': '3.A.1',
-        '3.B': '3.A.2',
-        '3.C': 'M.3.C.1.AG',  # field burning of agricultural residues
-        '3.D': '3.C.2',  # Liming
-        '3.E': '3.C.3',  # urea application
-        '3.F': '3.C.4',  # direct N2O from agri soils
-        '3.G': '3.C.5',  # indirect N2O from agri soils
-        '3.H': '3.C.6',  # indirect N2O from manure management
-        '3.I': '3.C.7',  # rice
-        '4': 'M.LULUCF',
-        '4.A': '3.B.1.a',  # forest remaining forest
-        '4.B': '3.B.2.a',  # cropland remaining cropland
-        '4.C': '3.B.2.b',  # land converted to cropland
-        '4.D': '3.B.6.b',  # land converted to other land
-        '4.E': 'M.3.C.1.LU',  # biomass burning (LULUCF)
-        '5': '4',
-        '5.A': '4.A',
-        '5.B': '4.B',
-        '5.C': '4.C',
-        '5.D': '4.D',
-        'M.BK': 'M.BK',
-        'M.BIO': 'M.BIO',
-    },
-    'aggregate': {
-        '3.A': {'sources': ['3.A.1', '3.A.2'], 'name': 'Livestock'},
-        '3.C.1': {'sources': ['M.3.C.1.AG', 'M.3.C.1.LU'],
-                  'name': 'Emissions from Biomass Burning'},
-        '3.C': {'sources': ['3.C.1', '3.C.2', '3.C.3', '3.C.4', '3.C.5', '3.C.6', '3.C.7'],
-                'name': 'Aggregate sources and non-CO2 emissions sources on land'},
-        'M.3.C.AG': {
-            'sources': ['M.3.C.1.AG', '3.C.2', '3.C.3', '3.C.4', '3.C.5', '3.C.6', '3.C.7'],
-            'name': 'Aggregate sources and non-CO2 emissions sources on land (Agriculture)'},
-        'M.AG.ELV': {'sources': ['M.3.C.AG'],
-                     'name': 'Agriculture excluding livestock emissions'},
-        'M.3.C.LU': {'sources': ['M.3.C.1.LU'],
-                     'name': 'Aggregate sources and non-CO2 emissions sources on land (Land use)'},
-        '3.B.1': {'sources': ['3.B.1.a'], 'name': 'Forest Land'},
-        '3.B.2': {'sources': ['3.B.2.a', '3.B.2.b'], 'name': 'Cropland'},
-        '3.B.6': {'sources': ['3.B.6.b'], 'name': 'Other Land'},
-        '3.B': {'sources': ['3.B.1', '3.B.2', '3.B.6'], 'name': 'Land'},
-        'M.LULUCF': {'sources': ['3.B', 'M.3.C.LU'], 'name': 'LULUCF'},
-        '3': {'sources': ['M.AG', 'M.LULUCF'], 'name': 'AFOLU'},
-    },
-}
-
-sectors_to_save = [
-    '1', '1.A', '1.A.1', '1.A.2', '1.A.3', '1.A.4',
-    '1.B', '1.B.1', '1.B.2', '1.C', '1.C.1', '1.C.2', '1.C.3',
-    '2', '2.A', '2.A.1', '2.A.2', '2.A.3', '2.A.4', '2.A.4.b', '2.A.4.d',
-    '2.B', '2.C', '2.C.1', '2.H', '2.H.1', '2.H.2',
-    '3', 'M.AG', '3.A', '3.A.1', '3.A.2',
-    '3.C', '3.C.1', '3.C.2', '3.C.3', '3.C.4',
-    '3.C.5', '3.C.6', '3.C.7', 'M.3.C.1.AG', 'M.3.C.AG', 'M.AG.ELV',
-    'M.LULUCF', 'M.3.C.1.LU', 'M.3.C.LU', '3.B', '3.B.1', '3.B.1.a', '3.B.2', '3.B.2.a',
-    '3.B.2.b', '3.B.6', '3.B.6.b',
-    '4', '4.A', '4.B', '4.C', '4.D',
-    '0', 'M.0.EL', 'M.BK', 'M.BIO']
-
-
-# gas baskets
-gas_baskets = {
-    'FGASES (SARGWP100)': ['HFCS (SARGWP100)', 'PFCS (SARGWP100)', 'SF6', 'NF3'],
-    'FGASES (AR4GWP100)': ['HFCS (AR4GWP100)', 'PFCS (AR4GWP100)', 'SF6', 'NF3'],
-    'FGASES (AR5GWP100)': ['HFCS (AR5GWP100)', 'PFCS (AR5GWP100)', 'SF6', 'NF3'],
-    'FGASES (AR6GWP100)': ['HFCS (AR6GWP100)', 'PFCS (AR6GWP100)', 'SF6', 'NF3'],
-    'KYOTOGHG (SARGWP100)': ['CO2', 'CH4', 'N2O', 'FGASES (SARGWP100)'],
-    'KYOTOGHG (AR4GWP100)': ['CO2', 'CH4', 'N2O', 'FGASES (AR4GWP100)'],
-    'KYOTOGHG (AR5GWP100)': ['CO2', 'CH4', 'N2O', 'FGASES (AR5GWP100)'],
-    'KYOTOGHG (AR6GWP100)': ['CO2', 'CH4', 'N2O', 'FGASES (AR6GWP100)'],
-}
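
The `cat_code_regexp` in `inv_conf` above captures the leading category code in a named group `code`. A minimal sketch of how such a pattern can reduce a category label to its code, using only the standard library; the helper `extract_code` and the sample labels are hypothetical, and the reading scripts may apply the pattern differently (for example via pandas):

```python
import re

# pattern copied from inv_conf['cat_code_regexp'] in the config above
CAT_CODE_REGEXP = r'^(?P<code>[a-zA-Z0-9]{1,4})[\s\.].*'


def extract_code(name: str) -> str:
    """Replace a label by its leading code; labels without a match stay unchanged."""
    return re.sub(CAT_CODE_REGEXP, lambda m: m.group('code'), name)


# hypothetical labels: code followed by a space, code followed by a dot,
# and an already-mapped manual code
examples = ["1A1 Energy industries", "2. Industrial processes", "MBK"]
codes = [extract_code(name) for name in examples]
```

A label such as 'CO2 from Biomass' would also match and be reduced to 'CO2', so manual replacements like those in `cat_codes_manual` presumably need to run before the regex step.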

+ 0 - 381
UNFCCC_GHG_data/UNFCCC_reader/Thailand/config_THA_BUR4.py

@@ -1,381 +0,0 @@
-# configuration for Thailand, BUR4
-# ###
-# for reading
-# ###
-
-# general
-gwp_to_use = "AR4GWP100"
-terminology_proc = 'IPCC2006_PRIMAP'
-
-# 2019 inventory
-inv_conf = {
-    'year': 2019,
-    'entity_row': 0,
-    'unit_row': 1,
-    'index_cols': "Greenhouse gas source and sink categories",
-    # special header as category code and name in one column
-    'header_long': ["orig_cat_name", "entity", "unit", "time", "data"],
-    # manual category codes (manual mapping to primap1, will be mapped to primap2
-    # automatically with the other codes)
-    'cat_codes_manual': {
-        'Total national emissions and removals': '0',
-        'Memo Items (not accounted in total Emissions)': 'MEMO',
-        'International Bunkers': 'MBK',
-        'Aviation International Bunkers': 'MBKA',
-        'Marine-International Bunkers': 'MBKM',
-        'CO2 from biomass': 'MBIO',
-    },
-    'cat_code_regexp': r'^(?P<code>[a-zA-Z0-9]{1,4})[\s\.].*',
-}
-
-# primap2 format conversion
-coords_cols = {
-    "category": "category",
-    "entity": "entity",
-    "unit": "unit",
-}
-
-coords_terminologies = {
-    "area": "ISO3",
-    "category": "IPCC1996_2006_THA_Inv",
-    "scenario": "PRIMAP",
-}
-
-coords_defaults = {
-    "source": "THA-GHG-Inventory",
-    "provenance": "measured",
-    "area": "THA",
-    "scenario": "BUR4",
-}
-
-coords_value_mapping = {
-    "unit": "PRIMAP1",
-    "category": "PRIMAP1",
-    "entity": {
-        'HFCs': f"HFCS ({gwp_to_use})",
-        'PFCs': f"PFCS ({gwp_to_use})",
-        'SF6': f'SF6 ({gwp_to_use})',
-        'NMVOCs': 'NMVOC',
-        'Nox': 'NOx',
-    },
-}
-
-filter_remove = {
-    'f_memo': {"category": "MEMO"},
-}
-filter_keep = {}
-
-meta_data = {
-    "references": "https://unfccc.int/documents/624750",
-    "rights": "",
-    "contact": "mail@johannes-guetschow.de",
-    "title": "Thailand. Biennial update report (BUR). BUR4",
-    "comment": "Read from pdf by Johannes Gütschow",
-    "institution": "UNFCCC",
-}
-
-# main sector time series
-# manual category codes (manual mapping to primap1, will be mapped to primap2
-# automatically with the other codes)
-cat_codes_manual_main_sector_ts = {
-    'Energy': "1",
-    'Industrial Processes and Product Use': "2",
-    'Agriculture': "3",
-    'LULUCF': "4",
-    'Waste': "5",
-    'Net emissions (Include LULUCF)': "0",
-    'Total emissions (Exclude LULUCF)': "M0EL",
-}
-
-coords_cols_main_sector_ts = {
-    "category": "category",
-}
-
-coords_defaults_main_sector_ts = {
-    "source": "THA-GHG-Inventory",
-    "provenance": "measured",
-    "area": "THA",
-    "scenario": "BUR4",
-    "entity": f"KYOTOGHG ({gwp_to_use})",
-    "unit": "GgCO2eq",
-}
-
-# indirect gases time series
-coords_cols_indirect = {
-    "entity": "entity",
-}
-
-coords_defaults_indirect = {
-    "source": "THA-GHG-Inventory",
-    "provenance": "measured",
-    "area": "THA",
-    "scenario": "BUR4",
-    "category": "0",
-    "unit": "Gg",
-}
-
-# ###
-# for processing
-# ###
-# aggregate categories
-country_processing_step1 = {
-    'aggregate_cats': {
-        '2.A.4': {'sources': ['2.A.4.b', '2.A.4.d'],
-                  'name': 'Other Process uses of Carbonates'},
-        '2.B.8': {'sources': ['2.B.8.b', '2.B.8.c', '2.B.8.e', '2.B.8.f'],
-                  'name': 'Petrochemical and Carbon Black production'},
-    },
-    'aggregate_gases': {
-        'KYOTOGHG': {
-            'basket': 'KYOTOGHG (AR4GWP100)',
-            'basket_contents': ['CO2', 'CH4', 'N2O', 'SF6',
-                                'HFCS (AR4GWP100)', 'PFCS (AR4GWP100)'],
-            'skipna': True,
-            'min_count': 1,
-            'sel': {f'category ({coords_terminologies["category"]})':
-                [
-                    '0', '1', '1.A', '1.A.1', '1.A.2', '1.A.3',
-                    '1.A.4', '1.A.5', '1.B', '1.B.1', '1.B.2', '1.B.3',
-                    '1.C',
-                    '2', '2.A', '2.A.1', '2.A.2', '2.A.3', '2.A.4',
-                    '2.B', '2.C', '2.D', '2.F', '2.G', '2.H',
-                    '3', '3.A', '3.B', '3.C', '3.D', '3.E', '3.F', '3.G',
-                    '3.H', '3.I',
-                    '4', '4.A', '4.B', '4.C', '4.D',
-                    '4.E', '4.E.1', '4.E.2', '4.E.3',
-                    '5', '5.A', '5.B', '5.C', '5.D'
-                ]
-            }, # not tested
-        },
-    },
-}
-
-country_processing_step2 = {
-    'downscale': {
-        # main sectors present as KYOTOGHG sum. subsectors need to be downscaled
-        # TODO: downscale CO, NOx, NMVOC, SO2 (national total present)
-        'sectors': {
-            '1': {
-                'basket': '1',
-                'basket_contents': ['1.A', '1.B', '1.C'],
-                'entities': ['KYOTOGHG (AR4GWP100)'],
-                'dim': f'category ({coords_terminologies["category"]})',
-            },
-            '1.A': {
-                'basket': '1.A',
-                'basket_contents': ['1.A.1', '1.A.2', '1.A.3', '1.A.4', '1.A.5'],
-                'entities': ['KYOTOGHG (AR4GWP100)'],
-                'dim': f'category ({coords_terminologies["category"]})',
-            },
-            '1.B': {
-                'basket': '1.B',
-                'basket_contents': ['1.B.1', '1.B.2', '1.B.3'],
-                'entities': ['KYOTOGHG (AR4GWP100)'],
-                'dim': f'category ({coords_terminologies["category"]})',
-            },
-            '2': {
-                'basket': '2',
-                'basket_contents': ['2.A', '2.B', '2.C', '2.D', '2.F', '2.G', '2.H'],
-                'entities': ['KYOTOGHG (AR4GWP100)'],
-                'dim': f'category ({coords_terminologies["category"]})',
-            },
-            '2.A': {
-                'basket': '2.A',
-                'basket_contents': ['2.A.1', '2.A.2', '2.A.3', '2.A.4'],
-                'entities': ['KYOTOGHG (AR4GWP100)'],
-                'dim': f'category ({coords_terminologies["category"]})',
-            },
-            '3': {
-                'basket': '3',
-                'basket_contents': ['3.A', '3.B', '3.C', '3.D', '3.E', '3.F', '3.G',
-                                    '3.H', '3.I'],
-                'entities': ['KYOTOGHG (AR4GWP100)'],
-                'dim': f'category ({coords_terminologies["category"]})',
-            },
-            '4': {
-                'basket': '4',
-                'basket_contents': ['4.A', '4.B', '4.C', '4.D', '4.E'],
-                'entities': ['KYOTOGHG (AR4GWP100)'],
-                'dim': f'category ({coords_terminologies["category"]})',
-            },
-            '4.E': {
-                'basket': '4.E',
-                'basket_contents': ['4.E.1', '4.E.2', '4.E.3'],
-                'entities': ['KYOTOGHG (AR4GWP100)'],
-                'dim': f'category ({coords_terminologies["category"]})',
-            },
-            '5': {
-                'basket': '5',
-                'basket_contents': ['5.A', '5.B', '5.C', '5.D'],
-                'entities': ['KYOTOGHG (AR4GWP100)'],
-                'dim': f'category ({coords_terminologies["category"]})',
-            },
-        },
-        'entities': {
-            'KYOTO': {
-                'basket': 'KYOTOGHG (AR4GWP100)',
-                'basket_contents': ['CH4', 'CO2', 'N2O', 'HFCS (AR4GWP100)',
-                                    'PFCS (AR4GWP100)', 'SF6'],
-                'sel': {f'category ({coords_terminologies["category"]})':
-                    [
-                        '1', '1.A', '1.A.1', '1.A.2', '1.A.3',
-                        '1.A.4', '1.A.5', '1.B', '1.B.1', '1.B.2', '1.B.3',
-                        '1.C',
-                        '2', '2.A', '2.A.1', '2.A.2', '2.A.3', '2.A.4',
-                        '2.B', '2.C', '2.D', '2.F', '2.G', '2.H',
-                        '3', '3.A', '3.B', '3.C', '3.D', '3.E', '3.F', '3.G',
-                        '3.H', '3.I',
-                        '4', '4.A', '4.B', '4.C', '4.D',
-                        '4.E', '4.E.1', '4.E.2', '4.E.3',
-                        '5', '5.A', '5.B', '5.C', '5.D']},
-            },
-        },
-    },
-    'basket_copy': {
-        'GWPs_to_add': ["SARGWP100", "AR5GWP100", "AR6GWP100"],
-        'entities': ["HFCS", "PFCS"],
-        'source_GWP': gwp_to_use,
-    },
-}
-
-cat_conversion = {
-    'mapping': {
-        '0': '0',
-        'M.0.EL': 'M.0.EL',
-        '1': '1',
-        '1.A': '1.A',
-        '1.A.1': '1.A.1',
-        '1.A.1.a': '1.A.1.a',
-        '1.A.1.b': '1.A.1.b',
-        '1.A.2': '1.A.2',
-        '1.A.3': '1.A.3',
-        '1.A.3.a': '1.A.3.a',
-        '1.A.3.b': '1.A.3.b',
-        '1.A.3.c': '1.A.3.c',
-        '1.A.3.d': '1.A.3.d',
-        '1.A.4': '1.A.4',
-        '1.A.5': '1.A.5',
-        '1.B': '1.B',
-        '1.B.1': '1.B.1',
-        '1.B.2': '1.B.2',
-        '1.B.3': '1.B.3',
-        '1.C': '1.C',
-        '1.C.1': '1.C.1',
-        '1.C.2': '1.C.2',
-        '1.C.3': '1.C.3',
-        '2': '2',
-        '2.A': '2.A',
-        '2.A.1': '2.A.1',
-        '2.A.2': '2.A.2',
-        '2.A.3': '2.A.3',
-        '2.A.4': '2.A.4',
-        '2.A.4.b': '2.A.4.b',
-        '2.A.4.d': '2.A.4.d',
-        '2.B': '2.B',
-        '2.B.2': '2.B.2',
-        '2.B.4': '2.B.4',
-        '2.B.8': '2.B.8',
-        '2.B.8.b': '2.B.8.b',
-        '2.B.8.c': '2.B.8.c',
-        '2.B.8.e': '2.B.8.e',
-        '2.B.8.f': '2.B.8.f',
-        '2.C': '2.C',
-        '2.C.1': '2.C.1',
-        '2.D': '2.D',
-        '2.D.1': '2.D.1',
-        '2.F': '2.F',
-        '2.F.1': '2.F.1',
-        '2.G': '2.G',
-        '2.G.1': '2.G.1',
-        '2.H': '2.H',
-        '2.H.1': '2.H.1',
-        '2.H.2': '2.H.2',
-        '3': 'M.AG',
-        '3.A': '3.A.1',
-        '3.B': '3.A.2',
-        '3.C': 'M.3.C.1.b.i',  # field burning of agricultural residues
-        '3.D': '3.C.2',  # Liming
-        '3.E': '3.C.3',  # urea application
-        '3.F': '3.C.4',  # direct N2O from agri soils
-        '3.G': '3.C.5',  # indirect N2O from agri soils
-        '3.H': '3.C.6',  # indirect N2O from manure management
-        '3.I': '3.C.7',  # rice
-        #'4': 'M.LULUCF',
-        '4.A': '3.B.1.a',  # forest remaining forest
-        '4.B': '3.B.2.a',  # cropland remaining cropland
-        '4.C': '3.B.2.b',  # land converted to cropland
-        '4.D': '3.B.6.b',  # land converted to other land
-        #'4.E': 'M.3.C.1.LU',  # biomass burning (LULUCF)
-        '4.E.1': '3.C.1.a', # biomass burning (Forest Land)
-        '4.E.2': 'M.3.C.1.b.ii', # biomass burning (Cropland)
-        '4.E.3': '3.C.1.d', # biomass burning (Other Land)
-        '5': '4',
-        '5.A': '4.A',
-        '5.A.1': '4.A.1',
-        '5.A.2': '4.A.2',
-        '5.B': '4.B',
-        '5.C': '4.C',
-        '5.C.1': '4.C.1',
-        '5.D': '4.D',
-        '5.D.1': '4.D.1',
-        '5.D.2': '4.D.2',
-        'M.BK': 'M.BK',
-        'M.BK.A': 'M.BK.A',
-        'M.BK.M': 'M.BK.M',
-        'M.BIO': 'M.BIO',
-    },
-    'aggregate': {
-        '3.A': {'sources': ['3.A.1', '3.A.2'], 'name': 'Livestock'},
-        '3.C.1.b': {'sources': ['M.3.C.1.b.i', 'M.3.C.1.b.ii'],
-                  'name': 'Biomass Burning In Cropland'},
-        'M.3.C.1.AG': {'sources': ['3.C.1.b', '3.C.1.c'],
-                  'name': 'Biomass Burning (Agriculture)'},
-        'M.3.C.1.LU': {'sources': ['3.C.1.a', '3.C.1.d'],
-                  'name': 'Biomass Burning (LULUCF)'},
-        '3.C.1': {'sources': ['M.3.C.1.AG', 'M.3.C.1.LU'],
-                  'name': 'Emissions from Biomass Burning'},
-        '3.C': {'sources': ['3.C.1', '3.C.2', '3.C.3', '3.C.4', '3.C.5', '3.C.6', '3.C.7'],
-                'name': 'Aggregate sources and non-CO2 emissions sources on land'},
-        'M.3.C.AG': {
-            'sources': ['M.3.C.1.AG', '3.C.2', '3.C.3', '3.C.4', '3.C.5', '3.C.6', '3.C.7'],
-            'name': 'Aggregate sources and non-CO2 emissions sources on land (Agriculture)'},
-        'M.AG.ELV': {'sources': ['M.3.C.AG'],
-                     'name': 'Agriculture excluding livestock emissions'},
-        'M.3.C.LU': {'sources': ['M.3.C.1.LU'],
-                     'name': 'Aggregate sources and non-CO2 emissions sources on land (Land use)'},
-        '3.B.1': {'sources': ['3.B.1.a'], 'name': 'Forest Land'},
-        '3.B.2': {'sources': ['3.B.2.a', '3.B.2.b'], 'name': 'Cropland'},
-        '3.B.6': {'sources': ['3.B.6.b'], 'name': 'Other Land'},
-        '3.B': {'sources': ['3.B.1', '3.B.2', '3.B.6'], 'name': 'Land'},
-        'M.LULUCF': {'sources': ['3.B', 'M.3.C.LU'], 'name': 'LULUCF'},
-        '3': {'sources': ['M.AG', 'M.LULUCF'], 'name': 'AFOLU'},
-    },
-}
-
-sectors_to_save = [
-    '1', '1.A', '1.A.1', '1.A.1.a', '1.A.1.b', '1.A.2', '1.A.3', '1.A.3.a', '1.A.3.b',
-    '1.A.3.c', '1.A.3.d', '1.A.4', '1.A.5',
-    '1.B', '1.B.1', '1.B.2', '1.B.3', '1.C', '1.C.1', '1.C.2', '1.C.3',
-    '2', '2.A', '2.A.1', '2.A.2', '2.A.3', '2.A.4', '2.A.4.b', '2.A.4.d',
-    '2.B', '2.B.2', '2.B.4', '2.B.8', '2.B.8.b', '2.B.8.c', '2.B.8.e', '2.B.8.f',
-    '2.C', '2.C.1', '2.F', '2.F.1', '2.G', '2.G.1', '2.H', '2.H.1', '2.H.2',
-    '3', 'M.AG', '3.A', '3.A.1', '3.A.2',
-    '3.C', '3.C.1', '3.C.1.a', '3.C.1.b', '3.C.1.d', '3.C.2', '3.C.3', '3.C.4',
-    '3.C.5', '3.C.6', '3.C.7', 'M.3.C.1.AG', 'M.3.C.AG', 'M.AG.ELV',
-    'M.LULUCF', 'M.3.C.1.LU', 'M.3.C.LU', '3.B', '3.B.1', '3.B.1.a', '3.B.2', '3.B.2.a',
-    '3.B.2.b', '3.B.6', '3.B.6.b',
-    '4', '4.A', '4.A.1', '4.A.2', '4.B', '4.C', '4.C.1', '4.D', '4.D.1', '4.D.2',
-    '0', 'M.0.EL', 'M.BK', 'M.BK.A', 'M.BK.M', 'M.BIO']
-
-
-# gas baskets
-gas_baskets = {
-    'FGASES (SARGWP100)': ['HFCS (SARGWP100)', 'PFCS (SARGWP100)', 'SF6', 'NF3'],
-    'FGASES (AR4GWP100)': ['HFCS (AR4GWP100)', 'PFCS (AR4GWP100)', 'SF6', 'NF3'],
-    'FGASES (AR5GWP100)':['HFCS (AR5GWP100)', 'PFCS (AR5GWP100)', 'SF6', 'NF3'],
-    'FGASES (AR6GWP100)':['HFCS (AR6GWP100)', 'PFCS (AR6GWP100)', 'SF6', 'NF3'],
-    'KYOTOGHG (SARGWP100)': ['CO2', 'CH4', 'N2O', 'FGASES (SARGWP100)'],
-    'KYOTOGHG (AR4GWP100)': ['CO2', 'CH4', 'N2O', 'FGASES (AR4GWP100)'],
-    'KYOTOGHG (AR5GWP100)': ['CO2', 'CH4', 'N2O', 'FGASES (AR5GWP100)'],
-    'KYOTOGHG (AR6GWP100)': ['CO2', 'CH4', 'N2O', 'FGASES (AR6GWP100)'],
-}
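
Review note: the `aggregate` block of `cat_conversion` builds each target category from sources that come either from the values of `mapping` or from an earlier `aggregate` entry. The following is a minimal sketch of a consistency check under that assumption. The helper `find_unknown_sources` and the excerpted dictionaries are hypothetical and not part of the repository; the excerpt contains a deliberately misspelled source code to show what the check reports. A flagged code is not necessarily an error, since a source may simply have no data in a given submission.

```python
def find_unknown_sources(mapping, aggregate):
    """Return {target: [sources]} for sources that are neither mapped nor
    produced by an earlier aggregation step (hypothetical helper)."""
    known = set(mapping.values())
    unknown = {}
    for target, spec in aggregate.items():
        missing = [src for src in spec["sources"] if src not in known]
        if missing:
            unknown[target] = missing
        # the target itself becomes available to later entries
        known.add(target)
    return unknown


# Illustrative excerpt only; 'N.3.C.LU' is a deliberate misspelling of 'M.3.C.LU'.
mapping_excerpt = {"4.A": "3.B.1.a", "4.E.1": "3.C.1.a", "4.E.3": "3.C.1.d"}
aggregate_excerpt = {
    "M.3.C.1.LU": {"sources": ["3.C.1.a", "3.C.1.d"]},
    "M.3.C.LU": {"sources": ["M.3.C.1.LU"]},
    "3.B.1": {"sources": ["3.B.1.a"]},
    "3.B": {"sources": ["3.B.1"]},
    "M.LULUCF": {"sources": ["3.B", "N.3.C.LU"]},
}

report = find_unknown_sources(mapping_excerpt, aggregate_excerpt)
print(report)
```

The check relies on dictionary insertion order, so it only works if aggregation entries are listed after the entries they depend on, as they are in the configuration above.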

+ 0 - 270
UNFCCC_GHG_data/UNFCCC_reader/Thailand/read_THA_BUR3_from_pdf.py

@@ -1,270 +0,0 @@
-# this script reads data from Thailand's BUR3
-# Data is read from the pdf file
-
-import pandas as pd
-import primap2 as pm2
-import camelot
-
-from UNFCCC_GHG_data.helper import process_data_for_country
-from UNFCCC_GHG_data.helper import downloaded_data_path, extracted_data_path
-from config_THA_BUR3 import inv_conf, trend_conf, ind_conf
-from config_THA_BUR3 import coords_cols, coords_defaults, coords_terminologies, \
-    coords_value_mapping, filter_remove, filter_keep, meta_data
-from config_THA_BUR3 import coords_cols_main_sector_ts, coords_defaults_main_sector_ts
-from config_THA_BUR3 import coords_defaults_indirect, coords_cols_indirect
-from config_THA_BUR3 import gas_baskets, cat_conversion, terminology_proc, \
-    sectors_to_save
-from config_THA_BUR3 import country_processing_step1, country_processing_step2
-
-# ###
-# configuration
-# ###
-input_folder = downloaded_data_path / 'UNFCCC' / 'Thailand' / 'BUR3'
-output_folder = extracted_data_path / 'UNFCCC' / 'Thailand'
-if not output_folder.exists():
-    output_folder.mkdir()
-
-inventory_file = 'BUR3_Thailand_251220_.pdf'
-output_filename = 'THA_BUR3_2020_'
-
-compression = dict(zlib=True, complevel=9)
-
-# inventory tables
-pages_inventory = '68,69'
-
-# main sector time series
-page_main_sector_ts = '70'
-
-# indirect gases time series
-page_indirect = '72'
-
-
-# ###
-# read the inventory data and convert to PM2 IF
-# ###
-tables_inventory = camelot.read_pdf(str(input_folder / inventory_file), pages=pages_inventory,
-                                    split_text=True, flavor="lattice")
-
-df_inventory = tables_inventory[0].df[1:]
-df_header = pd.DataFrame([inv_conf["header"], inv_conf["unit"]])
-
-df_inventory = pd.concat([df_header, df_inventory, tables_inventory[1].df.iloc[1:]],
-                         axis=0, join='outer')
-
-df_inventory = pm2.pm2io.nir_add_unit_information(df_inventory,
-                                                  unit_row=inv_conf["unit_row"],
-                                                  entity_row=inv_conf["entity_row"],
-                                                  regexp_entity=".*", regexp_unit=".*",
-                                                  default_unit="Gg")
-# set index and convert to long format
-df_inventory = df_inventory.set_index(inv_conf["index_cols"])
-df_inventory_long = pm2.pm2io.nir_convert_df_to_long(df_inventory, inv_conf["year"],
-                                                     inv_conf["header_long"])
-df_inventory_long["orig_cat_name"] = df_inventory_long["orig_cat_name"].str[0]
-
-# prep for conversion to PM2 IF and native format
-# make a copy of the categories column
-df_inventory_long["category"] = df_inventory_long["orig_cat_name"]
-
-# replace cat names by codes in col "category"
-# first the manual replacements
-df_inventory_long["category"] = \
-    df_inventory_long["category"].replace(inv_conf["cat_codes_manual"])
-# then the regex replacements
-repl = lambda m: m.group('code')
-df_inventory_long["category"] = \
-    df_inventory_long["category"].str.replace(inv_conf["cat_code_regexp"], repl,
-                                              regex=True)
-df_inventory_long = df_inventory_long.reset_index(drop=True)
-
-# replace "," with "" in data
-repl = lambda m: m.group('part1') + m.group('part2')
-df_inventory_long.loc[:, "data"] = \
-    df_inventory_long.loc[:, "data"].str.replace(
-        r'(?P<part1>[0-9]+),(?P<part2>[0-9\.]+)$', repl, regex=True)
-df_inventory_long.loc[:, "data"] = df_inventory_long.loc[:, "data"].str.\
-    replace(' ','', regex=False)
-
-# make sure all col headers are str
-df_inventory_long.columns = df_inventory_long.columns.map(str)
-
-df_inventory_long = df_inventory_long.drop(columns=["orig_cat_name"])
-
-data_inventory_IF = pm2.pm2io.convert_long_dataframe_if(
-    df_inventory_long,
-    coords_cols=coords_cols,
-    #add_coords_cols=add_coords_cols,
-    coords_defaults=coords_defaults,
-    coords_terminologies=coords_terminologies,
-    coords_value_mapping=coords_value_mapping,
-    #coords_value_filling=coords_value_filling,
-    filter_remove=filter_remove,
-    #filter_keep=filter_keep,
-    meta_data=meta_data,
-    convert_str=True,
-    time_format="%Y",
-    )
-
-# ###
-# read the main sector time series and convert to PM2 IF
-# ###
-tables_main_sector_ts = camelot.read_pdf(str(input_folder / inventory_file), pages=page_main_sector_ts,
-                                    split_text=True, flavor="lattice")
-
-df_main_sector_ts = tables_main_sector_ts[0].df.iloc[2:]
-#df_header = pd.DataFrame([header_main_sector_ts, unit_main_sector_ts])
-#df_main_sector_ts = pd.concat([df_header, df_main_sector_ts], axis=0, join='outer')
-df_main_sector_ts.columns = [trend_conf["header"], trend_conf["unit"]]
-
-df_main_sector_ts = df_main_sector_ts.transpose()
-df_main_sector_ts = df_main_sector_ts.reset_index(drop=False)
-cols = df_main_sector_ts.iloc[0].copy(deep=True)
-cols.iloc[0] = "category"
-cols.iloc[1] = "unit"
-df_main_sector_ts.columns = cols
-df_main_sector_ts = df_main_sector_ts.drop(0)
-
-# replace cat names by codes in col "category"
-df_main_sector_ts["category"] = df_main_sector_ts["category"].replace(
-    trend_conf["cat_codes_manual"])
-
-repl = lambda m: m.group('part1') + m.group('part2')
-year_cols = list(set(df_main_sector_ts.columns) - set(['category', 'unit']))
-for col in year_cols:
-    df_main_sector_ts.loc[:, col] = df_main_sector_ts.loc[:, col].str.\
-        replace(r'(?P<part1>[0-9]+),(?P<part2>[0-9\.]+)$', repl, regex=True)
-    df_main_sector_ts.loc[:, col] = df_main_sector_ts.loc[:, col].str.\
-        replace(' ','', regex=False)
-
-data_main_sector_ts_IF = pm2.pm2io.convert_wide_dataframe_if(
-    df_main_sector_ts,
-    coords_cols=coords_cols_main_sector_ts,
-    #add_coords_cols=add_coords_cols,
-    coords_defaults=coords_defaults_main_sector_ts,
-    coords_terminologies=coords_terminologies,
-    coords_value_mapping=coords_value_mapping,
-    #coords_value_filling=coords_value_filling,
-    filter_remove=filter_remove,
-    #filter_keep=filter_keep,
-    meta_data=meta_data,
-    convert_str=True,
-    time_format="%Y",
-    )
-
-
-# ###
-# read the indirect gases time series and convert to PM2 IF
-# ###
-tables_indirect = camelot.read_pdf(str(input_folder / inventory_file), pages=page_indirect,
-                                    split_text=True, flavor="lattice")
-
-df_indirect = tables_indirect[0].df.iloc[2:]
-#df_header = pd.DataFrame([header_main_sector_ts, unit_main_sector_ts])
-#df_main_sector_ts = pd.concat([df_header, df_main_sector_ts], axis=0, join='outer')
-df_indirect.columns = [ind_conf["header"], ind_conf["unit"]]
-
-df_indirect = df_indirect.transpose()
-df_indirect = df_indirect.reset_index(drop=False)
-cols = df_indirect.iloc[0].copy(deep=True)
-cols.iloc[0] = "entity"
-cols.iloc[1] = "unit"
-df_indirect.columns = cols
-df_indirect = df_indirect.drop(0)
-df_indirect = df_indirect.drop(columns=ind_conf["cols_to_remove"])
-
-repl = lambda m: m.group('part1') + m.group('part2')
-year_cols = list(set(df_indirect.columns) - set(['entity', 'unit']))
-for col in year_cols:
-    df_indirect.loc[:, col] = df_indirect.loc[:, col].str.\
-        replace(r'(?P<part1>[0-9]+),(?P<part2>[0-9\.]+)$', repl, regex=True)
-    df_indirect.loc[:, col] = df_indirect.loc[:, col].str.\
-        replace(' ','', regex=False)
-
-data_indirect_IF = pm2.pm2io.convert_wide_dataframe_if(
-    df_indirect,
-    coords_cols=coords_cols_indirect,
-    #add_coords_cols=add_coords_cols,
-    coords_defaults=coords_defaults_indirect,
-    coords_terminologies=coords_terminologies,
-    coords_value_mapping=coords_value_mapping,
-    #coords_value_filling=coords_value_filling,
-    #filter_remove=filter_remove,
-    #filter_keep=filter_keep,
-    meta_data=meta_data,
-    convert_str=True,
-    time_format="%Y",
-    )
-
-# ###
-# merge the three datasets
-# ###
-data_inventory_pm2 = pm2.pm2io.from_interchange_format(data_inventory_IF)
-data_main_sector_ts_pm2 = pm2.pm2io.from_interchange_format(data_main_sector_ts_IF)
-data_indirect_pm2 = pm2.pm2io.from_interchange_format(data_indirect_IF)
-
-data_all_pm2 = data_inventory_pm2.pr.merge(data_main_sector_ts_pm2)
-data_all_pm2 = data_all_pm2.pr.merge(data_indirect_pm2)
-
-data_all_if = data_all_pm2.pr.to_interchange_format()
-
-# ###
-# save raw data to IF and native format
-# ###
-if not output_folder.exists():
-    output_folder.mkdir()
-pm2.pm2io.write_interchange_format(
-    output_folder / (output_filename + coords_terminologies["category"] + "_raw"),
-    data_all_if)
-
-encoding = {var: compression for var in data_all_pm2.data_vars}
-data_all_pm2.pr.to_netcdf(
-    output_folder / (output_filename + coords_terminologies["category"] + "_raw.nc"),
-    encoding=encoding)
-
-# ###
-# ## process the data
-# ###
-data_proc_pm2 = data_all_pm2
-
-# combine CO2 emissions and removals
-data_proc_pm2["CO2"] = data_proc_pm2[["CO2 emissions", "CO2 removals"]].pr.sum\
-    (dim="entity", skipna=True, min_count=1)
-data_proc_pm2["CO2"].attrs['entity'] = 'CO2'
-
-# actual processing
-data_proc_pm2 = process_data_for_country(
-    data_proc_pm2,
-    entities_to_ignore=['CO2 emissions', 'CO2 removals'],
-    gas_baskets={},
-    processing_info_country=country_processing_step1,
-)
-
-data_proc_pm2 = process_data_for_country(
-    data_proc_pm2,
-    entities_to_ignore=[],
-    gas_baskets=gas_baskets,
-    processing_info_country=country_processing_step2,
-    cat_terminology_out = terminology_proc,
-    category_conversion = cat_conversion,
-    sectors_out = sectors_to_save,
-)
-
-# adapt source and metadata
-# TODO: processing info is present twice
-current_source = data_proc_pm2.coords["source"].values[0]
-data_temp = data_proc_pm2.pr.loc[{"source": current_source}]
-data_proc_pm2 = data_proc_pm2.pr.set("source", 'BUR_NIR', data_temp)
-
-# ###
-# save data to IF and native format
-# ###
-data_proc_if = data_proc_pm2.pr.to_interchange_format()
-if not output_folder.exists():
-    output_folder.mkdir()
-pm2.pm2io.write_interchange_format(
-    output_folder / (output_filename + terminology_proc), data_proc_if)
-
-encoding = {var: compression for var in data_proc_pm2.data_vars}
-data_proc_pm2.pr.to_netcdf(
-    output_folder / (output_filename + terminology_proc + ".nc"),
-    encoding=encoding)
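
Review note: the reader above strips thousands separators with a pattern anchored at the end of the string, so each call removes only the last comma of a value. The sketch below reproduces that behaviour with the standard `re` module instead of pandas and contrasts it with a variant that removes every comma between digits. Both function names are hypothetical, and whether the BUR3 tables ever contain values with more than one separator is not known from the diff.

```python
import re

# Pattern as used in the reader script, written as a raw string.
LAST_COMMA = re.compile(r"(?P<part1>[0-9]+),(?P<part2>[0-9\.]+)$")


def strip_last_comma(value: str) -> str:
    """Mimic the script: drop the final thousands separator, then spaces."""
    joined = LAST_COMMA.sub(lambda m: m.group("part1") + m.group("part2"), value)
    return joined.replace(" ", "")


def strip_all_commas(value: str) -> str:
    """Hypothetical alternative: drop every comma that sits between two digits."""
    return re.sub(r"(?<=[0-9]),(?=[0-9])", "", value).replace(" ", "")


print(strip_last_comma("12,345.6"))
print(strip_last_comma("1,234,567.8"))
print(strip_all_commas("1,234,567.8"))
```

Non-numeric cells such as notation keys pass through both functions unchanged, because neither pattern matches them.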

+ 0 - 225
UNFCCC_GHG_data/UNFCCC_reader/Thailand/read_THA_BUR4_from_pdf.py

@@ -1,225 +0,0 @@
-# this script reads data from Thailand's BUR4
-# Data is read from two csv files which have been created manually from ocr processed
-# pdf files
-# pdftk Thailand_BUR4_final_28122022.pdf cat 65-67east output inventory_2019.pdf
-# ocrmypdf --force-ocr inventory_2019.pdf inventory_2019_ocr.pdf
-# pdftk Thailand_BUR4_final_28122022.pdf cat 69 output trends.pdf
-# ocrmypdf --force-ocr trends.pdf trends_ocr.pdf
-
-# values for HFCs and SF6 have been taken from Table 2-9, where they are given in
-# CO2eq; thus the HFC data can be used and SF6 is not 0 as in the main inventory
-# tables
-
-import pandas as pd
-import primap2 as pm2
-
-from UNFCCC_GHG_data.helper import process_data_for_country
-from UNFCCC_GHG_data.helper import downloaded_data_path, extracted_data_path
-from config_THA_BUR4 import gwp_to_use, inv_conf
-from config_THA_BUR4 import coords_cols, coords_defaults, coords_terminologies, \
-    coords_value_mapping, filter_remove, filter_keep, meta_data
-from config_THA_BUR4 import coords_cols_main_sector_ts, \
-    cat_codes_manual_main_sector_ts, coords_defaults_main_sector_ts
-from config_THA_BUR4 import coords_defaults_indirect, coords_cols_indirect
-from config_THA_BUR4 import gas_baskets, cat_conversion, terminology_proc, \
-    sectors_to_save
-from config_THA_BUR4 import country_processing_step1, country_processing_step2
-
-# ###
-# configuration
-# ###
-input_folder = downloaded_data_path / 'UNFCCC' / 'Thailand' / 'BUR4'
-output_folder = extracted_data_path / 'UNFCCC' / 'Thailand'
-if not output_folder.exists():
-    output_folder.mkdir()
-
-inventory_file = 'THA_inventory_2019.csv'
-trends_file = 'THA_trends_2000-2019.csv'
-indirect_file = 'THA_indirect_2000-2019.csv'
-output_filename = 'THA_BUR4_2022_'
-
-compression = dict(zlib=True, complevel=9)
-
-
-# ###
-# read the inventory data and convert to PM2 IF
-# ###
-df_inventory = pd.read_csv(input_folder /inventory_file, header=None)
-df_inventory = pm2.pm2io.nir_add_unit_information(
-    df_inventory, unit_row=inv_conf["unit_row"], entity_row=inv_conf["entity_row"],
-    regexp_entity=".*", regexp_unit=".*", default_unit="Gg")
-# set index and convert to long format
-df_inventory = df_inventory.set_index(inv_conf["index_cols"])
-df_inventory_long = pm2.pm2io.nir_convert_df_to_long(df_inventory, inv_conf["year"],
-                                                     inv_conf["header_long"])
-df_inventory_long["orig_cat_name"] = df_inventory_long["orig_cat_name"].str[0]
-
-# prep for conversion to PM2 IF and native format
-# make a copy of the categories column
-df_inventory_long["category"] = df_inventory_long["orig_cat_name"]
-
-# replace cat names by codes in col "category"
-# first the manual replacements
-df_inventory_long["category"] = \
-    df_inventory_long["category"].replace(inv_conf["cat_codes_manual"])
-# then the regex replacements
-repl = lambda m: m.group('code')
-df_inventory_long["category"] = \
-    df_inventory_long["category"].str.replace(inv_conf["cat_code_regexp"], repl,
-                                              regex=True)
-df_inventory_long = df_inventory_long.reset_index(drop=True)
-
-# make sure all col headers are str
-df_inventory_long.columns = df_inventory_long.columns.map(str)
-
-df_inventory_long = df_inventory_long.drop(columns=["orig_cat_name"])
-
-data_inventory_IF = pm2.pm2io.convert_long_dataframe_if(
-    df_inventory_long,
-    coords_cols=coords_cols,
-    #add_coords_cols=add_coords_cols,
-    coords_defaults=coords_defaults,
-    coords_terminologies=coords_terminologies,
-    coords_value_mapping=coords_value_mapping,
-    #coords_value_filling=coords_value_filling,
-    filter_remove=filter_remove,
-    #filter_keep=filter_keep,
-    meta_data=meta_data,
-    convert_str=True,
-    time_format="%Y",
-    )
-
-# ###
-# read the main sector time series and convert to PM2 IF
-# ###
-df_main_sector_ts = pd.read_csv(input_folder / trends_file)
-
-df_main_sector_ts = df_main_sector_ts.transpose()
-df_main_sector_ts = df_main_sector_ts.reset_index(drop=False)
-cols = df_main_sector_ts.iloc[0].copy(deep=True)
-cols.iloc[0] = "category"
-cols.iloc[1:] = cols.iloc[1:].astype(int).astype(str)
-df_main_sector_ts.columns = cols
-df_main_sector_ts = df_main_sector_ts.drop(0)
-
-# replace cat names by codes in col "category"
-df_main_sector_ts["category"] = \
-    df_main_sector_ts["category"].replace(cat_codes_manual_main_sector_ts)
-
-data_main_sector_ts_IF = pm2.pm2io.convert_wide_dataframe_if(
-    df_main_sector_ts,
-    coords_cols=coords_cols_main_sector_ts,
-    #add_coords_cols=add_coords_cols,
-    coords_defaults=coords_defaults_main_sector_ts,
-    coords_terminologies=coords_terminologies,
-    coords_value_mapping=coords_value_mapping,
-    #coords_value_filling=coords_value_filling,
-    filter_remove=filter_remove,
-    #filter_keep=filter_keep,
-    meta_data=meta_data,
-    convert_str=True,
-    time_format='%Y',
-    )
-
-
-# ###
-# read the indirect gases time series and convert to PM2 IF
-# ###
-df_indirect = pd.read_csv(input_folder / indirect_file)
-
-df_indirect = df_indirect.transpose()
-df_indirect = df_indirect.reset_index(drop=False)
-cols = df_indirect.iloc[0].copy(deep=True)
-cols.iloc[0] = "entity"
-cols.iloc[1:] = cols.iloc[1:].astype(int).astype(str)
-df_indirect.columns = cols
-df_indirect = df_indirect.drop(0)
-
-data_indirect_IF = pm2.pm2io.convert_wide_dataframe_if(
-    df_indirect,
-    coords_cols=coords_cols_indirect,
-    #add_coords_cols=add_coords_cols,
-    coords_defaults=coords_defaults_indirect,
-    coords_terminologies=coords_terminologies,
-    coords_value_mapping=coords_value_mapping,
-    #coords_value_filling=coords_value_filling,
-    #filter_remove=filter_remove,
-    #filter_keep=filter_keep,
-    meta_data=meta_data,
-    convert_str=True,
-    time_format="%Y",
-    )
-
-# ###
-# merge the three datasets
-# ###
-data_inventory_pm2 = pm2.pm2io.from_interchange_format(data_inventory_IF)
-data_main_sector_ts_pm2 = pm2.pm2io.from_interchange_format(data_main_sector_ts_IF)
-data_indirect_pm2 = pm2.pm2io.from_interchange_format(data_indirect_IF)
-
-data_all_pm2 = data_inventory_pm2.pr.merge(data_main_sector_ts_pm2)
-data_all_pm2 = data_all_pm2.pr.merge(data_indirect_pm2)
-
-data_all_if = data_all_pm2.pr.to_interchange_format()
-
-# ###
-# save raw data to IF and native format
-# ###
-if not output_folder.exists():
-    output_folder.mkdir()
-pm2.pm2io.write_interchange_format(
-    output_folder / (output_filename + coords_terminologies["category"] + "_raw"),
-    data_all_if)
-
-encoding = {var: compression for var in data_all_pm2.data_vars}
-data_all_pm2.pr.to_netcdf(
-    output_folder / (output_filename + coords_terminologies["category"] + "_raw.nc"),
-    encoding=encoding)
-
-# ###
-# ## process the data
-# ###
-data_proc_pm2 = data_all_pm2
-
-# combine CO2 emissions and removals
-data_proc_pm2["CO2"] = data_proc_pm2[["CO2 emissions", "CO2 removals"]].pr.sum\
-    (dim="entity", skipna=True, min_count=1)
-data_proc_pm2["CO2"].attrs['entity'] = 'CO2'
-
-# actual processing
-data_proc_pm2 = process_data_for_country(
-    data_proc_pm2,
-    entities_to_ignore=['CO2 emissions', 'CO2 removals'],
-    gas_baskets={},
-    processing_info_country=country_processing_step1,
-)
-
-data_proc_pm2 = process_data_for_country(
-    data_proc_pm2,
-    entities_to_ignore=[],
-    gas_baskets=gas_baskets,
-    processing_info_country=country_processing_step2,
-    cat_terminology_out = terminology_proc,
-    category_conversion = cat_conversion,
-    sectors_out = sectors_to_save,
-)
-
-# adapt source and metadata
-# TODO: processing info is present twice
-current_source = data_proc_pm2.coords["source"].values[0]
-data_temp = data_proc_pm2.pr.loc[{"source": current_source}]
-data_proc_pm2 = data_proc_pm2.pr.set("source", 'BUR_NIR', data_temp)
-
-# ###
-# save data to IF and native format
-# ###
-data_proc_if = data_proc_pm2.pr.to_interchange_format()
-if not output_folder.exists():
-    output_folder.mkdir()
-pm2.pm2io.write_interchange_format(
-    output_folder / (output_filename + terminology_proc), data_proc_if)
-
-encoding = {var: compression for var in data_proc_pm2.data_vars}
-data_proc_pm2.pr.to_netcdf(
-    output_folder / (output_filename + terminology_proc + ".nc"),
-    encoding=encoding)
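
Review note: both Thailand readers combine "CO2 emissions" and "CO2 removals" with `skipna=True, min_count=1`. A minimal pure-Python sketch of what those two options mean for a single data point, assuming the usual semantics (missing values are ignored, but the result is missing when fewer than `min_count` values are present). The function `nansum_min_count` is hypothetical and only illustrates the idea; the scripts themselves use the primap2 accessor.

```python
import math


def nansum_min_count(values, min_count=1):
    """Sum the non-NaN values; return NaN if fewer than min_count are valid."""
    valid = [v for v in values if not math.isnan(v)]
    if len(valid) < min_count:
        return math.nan
    return sum(valid)


# emissions and removals both reported: net CO2
print(nansum_min_count([100.0, -30.0]))
# removals missing: emissions are kept as they are
print(nansum_min_count([100.0, math.nan]))
# nothing reported: result stays missing instead of becoming 0
print(nansum_min_count([math.nan, math.nan]))
```

Without a minimum count, a sum over two missing values would typically yield 0, which would read as a reported zero rather than a gap in the data.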

+ 0 - 1
UNFCCC_GHG_data/UNFCCC_reader/__init__.py

@@ -1 +0,0 @@
-#

+ 0 - 77
UNFCCC_GHG_data/UNFCCC_reader/read_UNFCCC_submission.py

@@ -1,77 +0,0 @@
-# this script takes submission and country as input (from doit) and
-# runs the appropriate script to extract the submission data
-
-import datalad.api
-import argparse
-from get_submissions_info import get_possible_inputs
-from get_submissions_info import get_possible_outputs
-from UNFCCC_GHG_data.helper import root_path, get_code_file
-
-# Find the right function and possible input and output files and
-# read the data using datalad run.
-parser = argparse.ArgumentParser()
-parser.add_argument('--country', help='Country name or code')
-parser.add_argument('--submission', help='Submission to read')
-
-args = parser.parse_args()
-
-country = args.country
-submission = args.submission
-
-
-print(f"Attempting to extract data for {submission} from {country}.")
-print("#"*80)
-print("")
-
-# get the correct script
-script_name = get_code_file(country, submission)
-
-if script_name is not None:
-    print(f"Found code file {script_name}")
-    print("")
-
-    # get possible input files
-    input_files = get_possible_inputs(country, submission)
-    if not input_files:
-        print(f"No possible input files found for {country}, {submission}. "
-              f"Something might be wrong here.")
-    else:
-        print(f"Found the following input_files:")
-        for file in input_files:
-            print(file)
-        print("")
-    # make input files absolute to avoid datalad confusions when
-    # root directory is via symlink
-    input_files = [root_path / file for file in input_files]
-    # convert file paths to str
-    input_files = [file.as_posix() for file in input_files]
-
-    # get possible output files
-    output_files = get_possible_outputs(country, submission)
-    if not output_files:
-        print(f"No possible output files found for {country}, {submission}. "
-              f"This is either the first run or something is wrong.")
-    else:
-        print(f"Found the following output_files:")
-        for file in output_files:
-            print(file)
-        print("")
-    # convert file paths to str
-    output_files = [file.as_posix() for file in output_files]
-
-    print(f"Run the script using datalad run via the python api")
-    datalad.api.run(
-        cmd=f"./venv/bin/python3 {script_name.as_posix()}",
-        dataset=root_path,
-        message=f"Read data for {country}, {submission}.",
-        inputs=input_files,
-        outputs=output_files,
-        dry_run=None,
-        explicit=True,
-    )
-else:
-    # no code found.
-    print(f"No code found to read {submission} from {country}")
-    print(f"Use 'doit country_info country={country}' to get "
-          f"a list of available submissions and datasets.")
-

+ 0 - 15
UNFCCC_GHG_data/__init__.py

@@ -1,15 +0,0 @@
-####
-
-from . import helper
-from . import UNFCCC_reader
-from . import UNFCCC_CRF_reader
-from . import UNFCCC_DI_reader
-from . import UNFCCC_downloader
-
-__all__ = [
-    "helper",
-    "UNFCCC_reader",
-    "UNFCCC_CRF_reader",
-    "UNFCCC_DI_reader",
-    "UNFCCC_downloader"
-]

+ 0 - 36
UNFCCC_GHG_data/helper/__init__.py

@@ -1,36 +0,0 @@
-from .definitions import root_path, code_path, log_path
-from .definitions import extracted_data_path, extracted_data_path_UNFCCC
-from .definitions import legacy_data_path
-from .definitions import downloaded_data_path, downloaded_data_path_UNFCCC
-from .definitions import dataset_path, dataset_path_UNFCCC
-from .definitions import custom_country_mapping, custom_folders
-from .definitions import GWP_factors, gas_baskets
-from .definitions import compression
-from .functions import get_country_code, get_country_name, convert_categories
-from .functions import create_folder_mapping, process_data_for_country, get_code_file
-from .functions import fix_rows, make_wide_table
-
-__all__ = [
-    "root_path",
-    "code_path",
-    "log_path",
-    "extracted_data_path",
-    "extracted_data_path_UNFCCC",
-    "legacy_data_path",
-    "downloaded_data_path",
-    "downloaded_data_path_UNFCCC",
-    "dataset_path",
-    "dataset_path_UNFCCC",
-    "custom_country_mapping",
-    "custom_folders",
-    "GWP_factors",
-    "gas_baskets",
-    "get_country_code",
-    "get_country_name",
-    "convert_categories",
-    "create_folder_mapping",
-    "process_data_for_country",
-    "fix_rows",
-    "make_wide_table",
-    "compression",
-]

+ 0 - 22
UNFCCC_GHG_data/helper/country_info.py

@@ -1,22 +0,0 @@
-# this script takes country as input (from doit) and
-# displays available submissions and datasets
-
-import argparse
-from UNFCCC_GHG_data.helper.functions import get_country_submissions
-from UNFCCC_GHG_data.helper.functions import get_country_datasets
-
-# Parse the command line arguments; the country is passed in
-# by doit as a name or a code.
-parser = argparse.ArgumentParser()
-parser.add_argument('--country', help='Country name or code')
-args = parser.parse_args()
-country = args.country
-
-# print available submissions
-print("="*15 + " Available submissions " + "="*15)
-get_country_submissions(country, True)
-print("")
-
-# print available datasets
-print("="*15 + " Available datasets " + "="*15)
-get_country_datasets(country, True)

+ 0 - 173
UNFCCC_GHG_data/helper/definitions.py

@@ -1,173 +0,0 @@
-import os
-from pathlib import Path
-
-
-def get_root_path() -> Path:
-    """ get the root_path from an environment variable """
-    root_path_env = os.getenv('UNFCCC_GHG_ROOT_PATH', None)
-    if root_path_env is None:
-        raise ValueError('UNFCCC_GHG_ROOT_PATH environment variable needs to be set')
-    else:
-        root_path = Path(root_path_env).resolve()
-    return root_path
-
-root_path = get_root_path()
-code_path = root_path / "UNFCCC_GHG_data"
-log_path = root_path / "log"
-extracted_data_path = root_path / "extracted_data"
-extracted_data_path_UNFCCC = extracted_data_path / "UNFCCC"
-downloaded_data_path = root_path / "downloaded_data"
-downloaded_data_path_UNFCCC = downloaded_data_path / "UNFCCC"
-legacy_data_path = root_path / "legacy_data"
-dataset_path = root_path / "datasets"
-dataset_path_UNFCCC = dataset_path / "UNFCCC"
-
-
-custom_country_mapping = {
-    "EUA": "European Union",
-    "EUC": "European Union",
-    "FRK": "France",
-    "DKE": "Denmark",
-    "DNM": "Denmark",
-    "GBK": "United Kingdom of Great Britain and Northern Ireland",
-}
-
-custom_folders = {
-    'Venezeula_(Bolivarian_Republic_of)': 'VEN',
-    'Venezuela_(Bolivarian_Republic_of)': 'VEN',
-    'Micronesia_(Federated_State_of)': 'FSM',
-    'Micronesia_(Federated_States_of)': 'FSM',
-    'The_Republic_of_North_Macedonia': 'MKD',
-    'Republic_of_Korea': 'KOR',
-    'Bolivia_(Plurinational_State_of)': 'BOL',
-    'Türkiye': 'TUR',
-    'Iran_(Islamic_Republic_of)': 'IRN',
-    'Côte_d’Ivoire': 'CIV',
-    'Democratic_Republic_of_the_Congo': "COD",
-    'European_Union': 'EUA',
-    'Taiwan': 'TWN',
-}
-
-GWP_factors = {
-    'SARGWP100_to_AR4GWP100': {
-        'HFCS': 1.1,
-        'PFCS': 1.1,
-        'UnspMixOfHFCs': 1.1,
-        'UnspMixOfPFCs': 1.1,
-        'FGASES': 1.1,
-        'other halogenated gases': 1.1,
-    },
-    'SARGWP100_to_AR5GWP100': {
-        'HFCS': 1.2,
-        'PFCS': 1.2,
-        'UnspMixOfHFCs': 1.2,
-        'UnspMixOfPFCs': 1.2,
-        'FGASES': 1.2,
-        'other halogenated gases': 1.2,
-    },
-    'SARGWP100_to_AR6GWP100': {
-        'HFCS': 1.4,
-        'PFCS': 1.3,
-        'UnspMixOfHFCs': 1.4,
-        'UnspMixOfPFCs': 1.3,
-        'FGASES': 1.35,
-        'other halogenated gases': 1.35,
-    },
-    'AR4GWP100_to_SARGWP100': {
-        'HFCS': 0.91,
-        'PFCS': 0.91,
-        'UnspMixOfHFCs': 0.91,
-        'UnspMixOfPFCs': 0.91,
-        'FGASES': 0.91,
-        'other halogenated gases': 0.91,
-    },
-    'AR4GWP100_to_AR5GWP100': {
-        'HFCS': 1.1,
-        'PFCS': 1.1,
-        'UnspMixOfHFCs': 1.1,
-        'UnspMixOfPFCs': 1.1,
-        'FGASES': 1.1,
-        'other halogenated gases': 1.1,
-    },
-    'AR4GWP100_to_AR6GWP100': {
-        'HFCS': 1.27,
-        'PFCS': 1.18,
-        'UnspMixOfHFCs': 1.27,
-        'UnspMixOfPFCs': 1.18,
-        'FGASES': 1.23,
-        'other halogenated gases': 1.23,
-    },
-    'AR5GWP100_to_SARGWP100': {
-        'HFCS': 0.83,
-        'PFCS': 0.83,
-        'UnspMixOfHFCs': 0.83,
-        'UnspMixOfPFCs': 0.83,
-        'FGASES': 0.83,
-        'other halogenated gases': 0.83,
-    },
-    'AR5GWP100_to_AR4GWP100': {
-        'HFCS': 0.91,
-        'PFCS': 0.91,
-        'UnspMixOfHFCs': 0.91,
-        'UnspMixOfPFCs': 0.91,
-        'FGASES': 0.91,
-        'other halogenated gases': 0.91,
-    },
-    'AR5GWP100_to_AR6GWP100': {
-        'HFCS': 1.17,
-        'PFCS': 1.08,
-        'UnspMixOfHFCs': 1.17,
-        'UnspMixOfPFCs': 1.08,
-        'FGASES': 1.125,
-        'other halogenated gases': 1.125,
-    },
-}
-
-gas_baskets = {
-    'HFCS (SARGWP100)': ['HFC23', 'HFC32', 'HFC41', 'HFC125', 'HFC134',
-                         'HFC134a', 'HFC143',  'HFC143a', 'HFC152', 'HFC152a',
-                         'HFC227ea', 'HFC161', 'HFC227EA', 'HFC236cb', 'HFC236ea',
-                         'HFC236fa', 'HFC245ca', 'HFC245fa', 'HFC365mfc', 'HFC404a',
-                         'HFC407c', 'HFC410a', 'HFC4310mee',
-                         'UnspMixOfHFCs (SARGWP100)'],
-    'HFCS (AR4GWP100)': ['HFC23', 'HFC32', 'HFC41', 'HFC125', 'HFC134',
-                         'HFC134a', 'HFC143',  'HFC143a', 'HFC152', 'HFC152a',
-                         'HFC227ea', 'HFC161', 'HFC227EA', 'HFC236cb', 'HFC236ea',
-                         'HFC236fa', 'HFC245ca', 'HFC245fa', 'HFC365mfc', 'HFC404a',
-                         'HFC407c', 'HFC410a', 'HFC4310mee',
-                         'UnspMixOfHFCs (AR4GWP100)'],
-    'HFCS (AR5GWP100)': ['HFC23', 'HFC32', 'HFC41', 'HFC125', 'HFC134',
-                         'HFC134a', 'HFC143',  'HFC143a', 'HFC152', 'HFC152a',
-                         'HFC227ea', 'HFC161', 'HFC227EA', 'HFC236cb', 'HFC236ea',
-                         'HFC236fa', 'HFC245ca', 'HFC245fa', 'HFC365mfc', 'HFC404a',
-                         'HFC407c', 'HFC410a', 'HFC4310mee',
-                         'UnspMixOfHFCs (AR5GWP100)'],
-    'HFCS (AR6GWP100)': ['HFC23', 'HFC32', 'HFC41', 'HFC125', 'HFC134',
-                         'HFC134a', 'HFC143',  'HFC143a', 'HFC152', 'HFC152a',
-                         'HFC227ea', 'HFC161', 'HFC227EA', 'HFC236cb', 'HFC236ea',
-                         'HFC236fa', 'HFC245ca', 'HFC245fa', 'HFC365mfc', 'HFC404a',
-                         'HFC407c', 'HFC410a', 'HFC4310mee',
-                         'UnspMixOfHFCs (AR6GWP100)'],
-    'PFCS (SARGWP100)': ['CF4', 'C2F6', 'C3F8', 'C4F10', 'C5F12', 'C6F14',
-                         'C10F18', 'cC3F6', 'cC4F8', 'UnspMixOfPFCs (SARGWP100)'],
-    'PFCS (AR4GWP100)': ['CF4', 'C2F6', 'C3F8', 'C4F10', 'C5F12', 'C6F14',
-                         'C10F18', 'cC3F6', 'cC4F8', 'UnspMixOfPFCs (AR4GWP100)'],
-    'PFCS (AR5GWP100)': ['CF4', 'C2F6', 'C3F8', 'C4F10', 'C5F12', 'C6F14',
-                         'C10F18', 'cC3F6', 'cC4F8', 'UnspMixOfPFCs (AR5GWP100)'],
-    'PFCS (AR6GWP100)': ['CF4', 'C2F6', 'C3F8', 'C4F10', 'C5F12', 'C6F14',
-                         'C10F18', 'cC3F6', 'cC4F8', 'UnspMixOfPFCs (AR6GWP100)'],
-    'FGASES (SARGWP100)': ['HFCS (SARGWP100)', 'PFCS (SARGWP100)', 'SF6', 'NF3'],
-    'FGASES (AR4GWP100)': ['HFCS (AR4GWP100)', 'PFCS (AR4GWP100)', 'SF6', 'NF3'],
-    'FGASES (AR5GWP100)':['HFCS (AR5GWP100)', 'PFCS (AR5GWP100)', 'SF6', 'NF3'],
-    'FGASES (AR6GWP100)':['HFCS (AR6GWP100)', 'PFCS (AR6GWP100)', 'SF6', 'NF3'],
-    'KYOTOGHG (SARGWP100)': ['CO2', 'CH4', 'N2O', 'SF6', 'NF3', 'HFCS (SARGWP100)',
-                          'PFCS (SARGWP100)'],
-    'KYOTOGHG (AR4GWP100)': ['CO2', 'CH4', 'N2O', 'SF6', 'NF3', 'HFCS (AR4GWP100)',
-                          'PFCS (AR4GWP100)'],
-    'KYOTOGHG (AR5GWP100)': ['CO2', 'CH4', 'N2O', 'SF6', 'NF3', 'HFCS (AR5GWP100)',
-                            'PFCS (AR5GWP100)'],
-    'KYOTOGHG (AR6GWP100)': ['CO2', 'CH4', 'N2O', 'SF6', 'NF3', 'HFCS (AR6GWP100)',
-                            'PFCS (AR6GWP100)'],
-}
-
-compression = dict(zlib=True, complevel=9)
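
The removed `GWP_factors` table lists a forward and a backward factor for each pair of GWP conventions. The diff does not show how these factors are applied, so the following is only a sketch under the assumption that they are plain multipliers and that paired entries are meant to be approximate inverses. The `hfcs_factors` dictionary is a hand-copied subset of the HFCS values above, and `round_trip_error` is a hypothetical helper, not part of the repository.

```python
# Hand-copied subset of the HFCS entries from GWP_factors (assumption:
# the factors are simple multipliers applied to basket totals).
hfcs_factors = {
    "SARGWP100_to_AR4GWP100": 1.1,
    "AR4GWP100_to_SARGWP100": 0.91,
    "SARGWP100_to_AR5GWP100": 1.2,
    "AR5GWP100_to_SARGWP100": 0.83,
    "AR4GWP100_to_AR5GWP100": 1.1,
    "AR5GWP100_to_AR4GWP100": 0.91,
}


def round_trip_error(factors: dict[str, float], source: str, target: str) -> float:
    """Return |forward * backward - 1| for one pair of GWP conventions."""
    forward = factors[f"{source}_to_{target}"]
    backward = factors[f"{target}_to_{source}"]
    return abs(forward * backward - 1.0)


pairs = [
    ("SARGWP100", "AR4GWP100"),
    ("SARGWP100", "AR5GWP100"),
    ("AR4GWP100", "AR5GWP100"),
]

# Deviation from a perfect round trip for each pair.
errors = {pair: round_trip_error(hfcs_factors, *pair) for pair in pairs}

for (source, target), error in errors.items():
    print(f"{source} <-> {target}: round-trip deviation {error:.3f}")
```

With the rounded factors in the table, a conversion followed by its inverse does not return exactly the starting value, which is worth keeping in mind if the factors are chained.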

+ 0 - 23
UNFCCC_GHG_data/helper/folder_mapping.py

@@ -1,23 +0,0 @@
-# this script takes a folder as input (from doit) and
-# creates the mapping of subfolders to country codes
-# for that folder
-
-import argparse
-from UNFCCC_GHG_data.helper import create_folder_mapping
-
-# Parse the command line arguments; the folder is passed in
-# by doit, relative to the repository root.
-parser = argparse.ArgumentParser()
-parser.add_argument('--folder', help='folder name, relative to '
-                                     'repository root folder')
-args = parser.parse_args()
-folder = args.folder
-
-if 'extracted_data' in folder:
-    extracted = True
-else:
-    extracted = False
-
-# create the folder mapping
-print("="*10 + f" Creating folder mapping for {folder} " + "="*10)
-create_folder_mapping(folder, extracted)

+ 0 - 160
UNFCCC_GHG_data/helper/functions_temp.py

@@ -1,160 +0,0 @@
-"""Temporary file for new functions to avoid merging issues due to different automatic formatting. Delete after merge."""
-
-import pandas as pd
-import warnings
-import numpy as np
-
-
-
-def find_and_replace_values(
-    df: pd.DataFrame,
-    replace_info: list[tuple[str | float]],
-    category_column: str,
-    entity_column: str = "entity",
-) -> pd.DataFrame:
-    """
-    Find values and replace single values in a dataframe.
-
-    Input
-    -----
-    df
-        Input data frame
-    replace_info
-        Category, entity, year, and new value. Don't put a new value if you would like to replace with nan.
-        For example [("3.C", "CO", "2019", 3.423)] or [("3.C", "CO", "2019")]
-    category_column
-        The name of the column that contains the categories.
-    entity_column
-        The name of the column that contains the entities.
-
-    Output
-    ------
-        Data frame with updated values.
-
-    """
-    for replace_info_value in replace_info:
-        category = replace_info_value[0]
-        entity = replace_info_value[1]
-        year = replace_info_value[2]
-
-        if len(replace_info_value) == 4:
-            new_value = replace_info_value[3]
-        elif len(replace_info_value) == 3:
-            new_value = np.nan
-        else:
-            raise AssertionError(
-                f"Expected tuple of length 3 or 4. Got {replace_info_value}"
-            )
-
-        index = df.loc[
-            (df[category_column] == category) & (df[entity_column] == entity),
-        ].index[0]
-
-        # pandas recommends using .at[] for changing single values
-        df.at[index, year] = new_value
-        print(f"Set value for {category}, {entity}, {year} to {new_value}.")
-
-    return df
-
-
-def assert_values(
-        df: pd.DataFrame,
-        test_case: tuple[str | float | int],
-        category_column: str = "category (IPCC1996_2006_GIN_Inv)",
-        entity_column: str = "entity",
-) -> None:
-    """
-    Check if a value in a dataframe matches the expected value.
-    Input
-    -----
-    df
-        The data frame to check.
-    test_case
-        The combination of parameters and the expected value.
-        Use the format (<category>, <entity>, <year>, <expected_value>).
-    category_column
-        The column in which to look for the category.
-    entity_column
-        The column where to look for the entity.
-    """
-    category = test_case[0]
-    entity = test_case[1]
-    year = test_case[2]
-    expected_value = test_case[3]
-
-    assert isinstance(expected_value, (float, int)), "This function only works for numbers. Use assert_nan_values to check for NaNs and empty values."
-
-    arr = df.loc[
-        (df[category_column] == category) & (df[entity_column] == entity), year
-    ].values
-
-    # Assert the category exists in the data frame
-    assert (
-            category in df[category_column].unique()
-    ), f"{category} is not a valid category. Choose from {df[category_column].unique()}"
-
-    # Assert the entity exists in the data frame
-    assert (
-            entity in df[entity_column].unique()
-    ), f"{entity} is not a valid entity. Choose from {df[entity_column].unique()}"
-
-    assert (
-            arr.size > 0
-    ), f"No value found for category {category}, entity {entity}, year {year}!"
-
-    assert (
-            arr.size <= 1
-    ), f"More than one value found for category {category}, entity {entity}, year {year}!"
-
-    assert (
-            arr[0] == test_case[3]
-    ), f"Expected value {expected_value}, actual value is {arr[0]}"
-
-    print(
-        f"Value for category {category}, entity {entity}, year {year} is as expected."
-    )
-
-def assert_nan_values(
-        df: pd.DataFrame,
-        test_case: tuple[str, ...],
-        category_column: str = "category (IPCC1996_2006_GIN_Inv)",
-        entity_column: str = "entity",
-) -> None:
-    """
-    Check if values that are empty or NE or NE1 in the PDF tables
-    are not present in the dataset.
-
-    Input
-    -----
-    df
-        The data frame to check.
-    test_case
-        The combination of input parameters.
-        Use the format (<category>, <entity>, <year>).
-    category_column
-        The column in which to look for the category.
-    entity_column
-        The column where to look for the entity.
-
-    """
-    category = test_case[0]
-    entity = test_case[1]
-    year = test_case[2]
-
-    if category not in df[category_column].unique():
-        warning_string = f"{category} is not in the data set. Either all values for this category are NaN or the category never existed in the data set."
-        warnings.warn(warning_string)
-        return
-
-    if entity not in df[entity_column].unique():
-        warning_string = f"{entity} is not in the data set. Either all values for this entity are NaN or the entity never existed in the data set."
-        warnings.warn(warning_string)
-        return
-
-    arr = df.loc[
-        (df[category_column] == category) & (df[entity_column] == entity), year
-    ].values
-
-    assert np.isnan(arr[0]), f"Value is {arr[0]} and not NaN."
-
-    print(f"Value for category {category}, entity {entity}, year {year} is NaN.")

+ 41 - 0
changelog/README.md

@@ -0,0 +1,41 @@
+# CHANGELOG
+
+This directory contains "news fragments", i.e. short files that contain a small markdown-formatted bit of text that will be
+added to the CHANGELOG when it is next compiled.
+
+The CHANGELOG will be read by users, so this description should be aimed at users of Country greenhouse gas data submitted to the UNFCCC instead of
+describing internal changes which are only relevant to developers. Merge requests in combination with our git history provide additional
+developer-centric information.
+
+Make sure to use phrases in the past tense and use punctuation. Examples:
+
+```
+Improved verbose diff output with sequences.
+
+Terminal summary statistics now use multiple colors.
+```
+
+Each file should have a name of the form `<MR>.<TYPE>.md`, where `<MR>` is the merge request number, and `<TYPE>` is one of:
+
+* `feature`: new user-facing features, like new command-line options and new behaviour.
+* `improvement`: improvement of existing functionality, usually without requiring user intervention.
+* `fix`: fixes a bug.
+* `docs`: documentation improvement, like rewording an entire section or adding missing docs.
+* `deprecation`: feature deprecation.
+* `breaking`: a change which may break existing uses, such as feature removal or behaviour change.
+* `trivial`: fixing a small typo or internal change that might be noteworthy.
+
+So for example: `123.feature.md`, `456.fix.md`.
+
+Since you need the merge request number for the filename, you must submit an MR first. From this MR, you can get the MR number and then create the news file. A single MR can also have multiple news items, for example a given MR may add a feature as well as
+deprecate some existing functionality.
+
+If you are not sure what issue type to use, don't hesitate to ask in your MR.
+
+`towncrier` preserves multiple paragraphs and formatting (code blocks, lists, and so on), but for entries other than
+features it is usually better to stick to a single paragraph to keep it concise. You may also use `MyST` [style
+cross-referencing](https://myst-parser.readthedocs.io/en/latest/syntax/cross-referencing.html) within your news items to link to other
+documentation.
+
+You can also run `towncrier build --draft` to see the draft changelog that will be appended to [docs/source/changelog.md]()
+on the next release.
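
The `<MR>.<TYPE>.md` naming rule described in the added changelog README can be checked mechanically. The following is a minimal sketch that covers only the form stated in the README; `is_valid_fragment_name` is a hypothetical helper, not part of this repository or of `towncrier`, and `towncrier` itself may accept further name forms.

```python
import re

# Fragment types listed in changelog/README.md.
FRAGMENT_TYPES = (
    "feature",
    "improvement",
    "fix",
    "docs",
    "deprecation",
    "breaking",
    "trivial",
)

# <MR>.<TYPE>.md, where <MR> is the merge request number.
_FRAGMENT_PATTERN = re.compile(r"(?P<mr>\d+)\.(?P<type>[a-z]+)\.md")


def is_valid_fragment_name(name: str) -> bool:
    """Return True if name has the form <MR>.<TYPE>.md with a known type."""
    match = _FRAGMENT_PATTERN.fullmatch(name)
    return match is not None and match.group("type") in FRAGMENT_TYPES


for candidate in ("123.feature.md", "456.fix.md", "123.bugfix.md", "feature.md"):
    status = "ok" if is_valid_fragment_name(candidate) else "rejected"
    print(f"{candidate}: {status}")
```

A check like this could run in a pre-commit hook or in CI, so that a misnamed fragment is caught before the changelog is compiled.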

Some files were not shown because too many files have changed