Merge remote-tracking branch 'origin/master'

Johannes 1 year ago
Parent
Commit
be007ea29b
100 changed files with 5675 additions and 423 deletions
  1. 352 106
      UNFCCC_GHG_data/UNFCCC_DI_reader/UNFCCC_DI_reader_config.py
  2. 91 7
      UNFCCC_GHG_data/UNFCCC_DI_reader/UNFCCC_DI_reader_core.py
  3. 9 2
      UNFCCC_GHG_data/UNFCCC_DI_reader/UNFCCC_DI_reader_proc.py
  4. 1 1
      UNFCCC_GHG_data/UNFCCC_DI_reader/process_UNFCCC_DI_for_country_group_datalad.py
  5. 0 3
      UNFCCC_GHG_data/UNFCCC_reader/Argentina/read_ARG_BUR4_from_pdf.py
  6. 12 8
      UNFCCC_GHG_data/UNFCCC_reader/Chile/read_CHL_BUR5_from_xlsx.py
  7. 2 2
      UNFCCC_GHG_data/UNFCCC_reader/Indonesia/read_IDN_BUR3_from_pdf.py
  8. 430 0
      UNFCCC_GHG_data/UNFCCC_reader/Israel/config_ISR_BUR2.py
  9. 299 0
      UNFCCC_GHG_data/UNFCCC_reader/Israel/read_ISR_BUR2_from_pdf.py
  10. 676 0
      UNFCCC_GHG_data/UNFCCC_reader/Malaysia/config_MYS_BUR3.py
  11. 402 0
      UNFCCC_GHG_data/UNFCCC_reader/Malaysia/config_MYS_BUR4.py
  12. 211 0
      UNFCCC_GHG_data/UNFCCC_reader/Malaysia/read_MYS_BUR3_from_pdf.py
  13. 214 0
      UNFCCC_GHG_data/UNFCCC_reader/Malaysia/read_MYS_BUR4_from_pdf.py
  14. 433 0
      UNFCCC_GHG_data/UNFCCC_reader/Nigeria/config_NGA_BUR2.py
  15. 228 0
      UNFCCC_GHG_data/UNFCCC_reader/Nigeria/read_NGA_BUR2_from_pdf.py
  16. 493 0
      UNFCCC_GHG_data/UNFCCC_reader/Singapore/config_SGP_BUR5.py
  17. 260 0
      UNFCCC_GHG_data/UNFCCC_reader/Singapore/read_SGP_BUR5_from_pdf.py
  18. 1 0
      UNFCCC_GHG_data/UNFCCC_reader/Taiwan/read_TWN_2022-Inventory_from_pdf.py
  19. 363 0
      UNFCCC_GHG_data/UNFCCC_reader/Thailand/config_THA_BUR3.py
  20. 381 0
      UNFCCC_GHG_data/UNFCCC_reader/Thailand/config_THA_BUR4.py
  21. 106 267
      UNFCCC_GHG_data/UNFCCC_reader/Thailand/read_THA_BUR3_from_pdf.py
  22. 225 0
      UNFCCC_GHG_data/UNFCCC_reader/Thailand/read_THA_BUR4_from_pdf.py
  23. 4 0
      UNFCCC_GHG_data/UNFCCC_reader/folder_mapping.json
  24. 3 3
      UNFCCC_GHG_data/UNFCCC_reader/read_UNFCCC_submission.py
  25. 5 0
      UNFCCC_GHG_data/helper/__init__.py
  26. 108 0
      UNFCCC_GHG_data/helper/definitions.py
  27. 74 13
      UNFCCC_GHG_data/helper/functions.py
  28. 1 0
      datasets/UNFCCC/DI_non_AnnexI/DI_non_AnnexI_18f8be651585232bec9d1bca76c1fa04_hash.csv
  29. 1 0
      datasets/UNFCCC/DI_non_AnnexI/DI_non_AnnexI_18f8be651585232bec9d1bca76c1fa04_hash.nc
  30. 41 0
      datasets/UNFCCC/DI_non_AnnexI/DI_non_AnnexI_18f8be651585232bec9d1bca76c1fa04_hash.yaml
  31. 1 0
      datasets/UNFCCC/DI_non_AnnexI/DI_non_AnnexI_2023-07-17.csv
  32. 1 0
      datasets/UNFCCC/DI_non_AnnexI/DI_non_AnnexI_2023-07-17.nc
  33. 1 0
      datasets/UNFCCC/DI_non_AnnexI/DI_non_AnnexI_2023-07-17.yaml
  34. 1 0
      datasets/UNFCCC/DI_non_AnnexI/DI_non_AnnexI_2023-07-18.csv
  35. 1 0
      datasets/UNFCCC/DI_non_AnnexI/DI_non_AnnexI_2023-07-18.nc
  36. 1 0
      datasets/UNFCCC/DI_non_AnnexI/DI_non_AnnexI_2023-07-18.yaml
  37. 1 0
      datasets/UNFCCC/DI_non_AnnexI/DI_non_AnnexI_2023-07-18_raw.csv
  38. 1 0
      datasets/UNFCCC/DI_non_AnnexI/DI_non_AnnexI_2023-07-18_raw.nc
  39. 1 0
      datasets/UNFCCC/DI_non_AnnexI/DI_non_AnnexI_2023-07-18_raw.yaml
  40. 1 0
      datasets/UNFCCC/DI_non_AnnexI/DI_non_AnnexI_9177e6b829bcfcd93505d1355cae9ee4_hash.csv
  41. 1 0
      datasets/UNFCCC/DI_non_AnnexI/DI_non_AnnexI_9177e6b829bcfcd93505d1355cae9ee4_hash.nc
  42. 41 0
      datasets/UNFCCC/DI_non_AnnexI/DI_non_AnnexI_9177e6b829bcfcd93505d1355cae9ee4_hash.yaml
  43. 1 0
      datasets/UNFCCC/DI_non_AnnexI/DI_non_AnnexI_d1e91da9f1581fbf3563fe4d276bfe1a_raw_hash.csv
  44. 1 0
      datasets/UNFCCC/DI_non_AnnexI/DI_non_AnnexI_d1e91da9f1581fbf3563fe4d276bfe1a_raw_hash.nc
  45. 40 0
      datasets/UNFCCC/DI_non_AnnexI/DI_non_AnnexI_d1e91da9f1581fbf3563fe4d276bfe1a_raw_hash.yaml
  46. 1 0
      datasets/UNFCCC/DI_non_AnnexI/DI_non_AnnexI_ef19c9a21441456740388c14aa7fe3e7_hash.csv
  47. 1 0
      datasets/UNFCCC/DI_non_AnnexI/DI_non_AnnexI_ef19c9a21441456740388c14aa7fe3e7_hash.nc
  48. 41 0
      datasets/UNFCCC/DI_non_AnnexI/DI_non_AnnexI_ef19c9a21441456740388c14aa7fe3e7_hash.yaml
  49. 1 0
      downloaded_data/UNFCCC/00_new_downloads_BUR-2023-07-17.csv
  50. 1 0
      downloaded_data/UNFCCC/00_new_downloads_CRF2023-2023-07-17.csv
  51. 1 0
      downloaded_data/UNFCCC/00_new_downloads_NC-2023-07-17.csv
  52. 1 0
      downloaded_data/UNFCCC/Austria/CRF2023/asr2023_AUT.pdf
  53. 1 0
      downloaded_data/UNFCCC/Belarus/CRF2023/asr2023_BLR.pdf
  54. 1 0
      downloaded_data/UNFCCC/Canada/CRF2023/asr2023_CAN_0.pdf
  55. 1 0
      downloaded_data/UNFCCC/Cyprus/CRF2023/asr2023_CYP.pdf
  56. 1 0
      downloaded_data/UNFCCC/Guatemala/BUR1/2023_1IBA_GT.pdf
  57. 1 0
      downloaded_data/UNFCCC/Ireland/CRF2023/asr2023_IRL.pdf
  58. 1 0
      downloaded_data/UNFCCC/Kazakhstan/CRF2023/asr2023_KAZ.pdf
  59. 1 0
      downloaded_data/UNFCCC/Peru/BUR3/Tercer_BUR_Per%C3%BA_Jun2023.pdf
  60. 1 0
      downloaded_data/UNFCCC/Republic_of_Korea/BUR4/1092386_Republic_of_Korea-BUR4-3-Fourth_Biennial_Update_Report_of_the_Republic_of_Korea_rev.pdf
  61. 1 0
      downloaded_data/UNFCCC/Russian_Federation/CRF2023/asr2023_RUS.pdf
  62. 1 0
      downloaded_data/UNFCCC/Sweden/CRF2023/asr2023_SWE.pdf
  63. 1 0
      downloaded_data/UNFCCC/Thailand/BUR4/THA_indirect_2000-2019.csv
  64. 1 0
      downloaded_data/UNFCCC/Thailand/BUR4/THA_inventory_2019.csv
  65. 1 0
      downloaded_data/UNFCCC/Thailand/BUR4/THA_trends_2000-2019.csv
  66. 1 0
      downloaded_data/UNFCCC/Thailand/BUR4/processed_pdf/indirect.pdf
  67. 1 0
      downloaded_data/UNFCCC/Thailand/BUR4/processed_pdf/indirect_ocr.pdf
  68. 1 0
      downloaded_data/UNFCCC/Thailand/BUR4/processed_pdf/inventory_2019.pdf
  69. 1 0
      downloaded_data/UNFCCC/Thailand/BUR4/processed_pdf/inventory_2019_ocr.pdf
  70. 1 0
      downloaded_data/UNFCCC/Thailand/BUR4/processed_pdf/trends.pdf
  71. 1 0
      downloaded_data/UNFCCC/Thailand/BUR4/processed_pdf/trends_ocr.pdf
  72. 1 0
      downloaded_data/UNFCCC/Türkiye/CRF2023/asr2023_TUR.pdf
  73. 1 0
      downloaded_data/UNFCCC/Ukraine/CRF2023/asr2023_UKR.pdf
  74. 1 1
      downloaded_data/UNFCCC/submissions-annexI_2023.csv
  75. 1 1
      downloaded_data/UNFCCC/submissions-bur.csv
  76. 1 1
      extracted_data/UNFCCC/Afghanistan/AFG_DI_2023-05-24.csv
  77. 1 1
      extracted_data/UNFCCC/Afghanistan/AFG_DI_2023-05-24.nc
  78. 1 1
      extracted_data/UNFCCC/Afghanistan/AFG_DI_2023-05-24.yaml
  79. 1 0
      extracted_data/UNFCCC/Afghanistan/AFG_DI_2023-07-18_raw.csv
  80. 1 0
      extracted_data/UNFCCC/Afghanistan/AFG_DI_2023-07-18_raw.nc
  81. 1 0
      extracted_data/UNFCCC/Afghanistan/AFG_DI_2023-07-18_raw.yaml
  82. 1 0
      extracted_data/UNFCCC/Afghanistan/AFG_DI_88365af8429188c90d963dc666e57b6e_hash.csv
  83. 1 0
      extracted_data/UNFCCC/Afghanistan/AFG_DI_88365af8429188c90d963dc666e57b6e_hash.nc
  84. 31 0
      extracted_data/UNFCCC/Afghanistan/AFG_DI_88365af8429188c90d963dc666e57b6e_hash.yaml
  85. 1 1
      extracted_data/UNFCCC/Albania/ALB_DI_2023-05-24.csv
  86. 1 1
      extracted_data/UNFCCC/Albania/ALB_DI_2023-05-24.nc
  87. 1 1
      extracted_data/UNFCCC/Albania/ALB_DI_2023-05-24.yaml
  88. 1 0
      extracted_data/UNFCCC/Albania/ALB_DI_2023-07-18_raw.csv
  89. 1 0
      extracted_data/UNFCCC/Albania/ALB_DI_2023-07-18_raw.nc
  90. 1 0
      extracted_data/UNFCCC/Albania/ALB_DI_2023-07-18_raw.yaml
  91. 1 0
      extracted_data/UNFCCC/Albania/ALB_DI_5cf8443b430d6e371b9f7343fa00a201_hash.csv
  92. 1 0
      extracted_data/UNFCCC/Albania/ALB_DI_5cf8443b430d6e371b9f7343fa00a201_hash.nc
  93. 31 0
      extracted_data/UNFCCC/Albania/ALB_DI_5cf8443b430d6e371b9f7343fa00a201_hash.yaml
  94. 1 1
      extracted_data/UNFCCC/Algeria/DZA_DI_2023-05-24.csv
  95. 1 1
      extracted_data/UNFCCC/Algeria/DZA_DI_2023-05-24.nc
  96. 1 1
      extracted_data/UNFCCC/Algeria/DZA_DI_2023-05-24.yaml
  97. 1 0
      extracted_data/UNFCCC/Algeria/DZA_DI_2023-07-18_raw.csv
  98. 1 0
      extracted_data/UNFCCC/Algeria/DZA_DI_2023-07-18_raw.nc
  99. 1 0
      extracted_data/UNFCCC/Algeria/DZA_DI_2023-07-18_raw.yaml
  100. 1 0
      extracted_data/UNFCCC/Algeria/DZA_DI_8f53edd26fd8bb6afdb774fb001c25de_hash.csv

+ 352 - 106
UNFCCC_GHG_data/UNFCCC_DI_reader/UNFCCC_DI_reader_config.py

@@ -1,4 +1,5 @@
-# TODO: move gas baskets to helper
+# TODO: check if downscaling respects gas basket resolution for GWP transformation
+# TODO: why is albania IPPU KYOTOGHG 0 in 2005
 
 di_query_filters = [
     'classifications', 'measures', 'gases',
@@ -27,6 +28,8 @@ filter_activity_factors = {
 cat_code_regexp = r'(?P<code>^(([0-9][A-Za-z0-9\.]{0,10}[0-9A-Za-z]))|([0-9]))[' \
                   r'\s\.].*'
 
+gwp_to_use = 'SARGWP100'
+
 # PRIMAP2 interchange format config
 di_to_pm2if_template_nai = {
     "coords_cols": {
@@ -57,18 +60,18 @@ di_to_pm2if_template_nai = {
     # mapping of values e.g. gases to the primap2 format
     "coords_value_mapping": {
         "entity": {
-            "Aggregate GHGs (SARGWP100)": "KYOTOGHG (SARGWP100)",
-            "Aggregate F-gases (SARGWP100)": "FGASES (SARGWP100)",
-            "HFCs (SARGWP100)": "HFCS (SARGWP100)",
-            "PFCs (SARGWP100)": "PFCS (SARGWP100)",
-            #"SF6 (SARGWP100)": "SF6 (SARGWP100)",
-            #"CH4 (SARGWP100)": "CH4 (SARGWP100)",
-            "CO2 (SARGWP100)": "CO2",
-            #"N2O (SARGWP100)": "N2O (SARGWP100)",
-            #"Unspecified mix of HFCs and PFCs (SARGWP100)":
-            #    "UnspMixOfHFCsPFCs (SARGWP100)",
-            "Unspecified mix of HFCs (SARGWP100)": "UnspMixOfHFCs (SARGWP100)",
-            "Unspecified mix of PFCs (SARGWP100)": "UnspMixOfPFCs (SARGWP100)",
+            f"Aggregate GHGs ({gwp_to_use})": f"KYOTOGHG ({gwp_to_use})",
+            f"Aggregate F-gases ({gwp_to_use})": f"FGASES ({gwp_to_use})",
+            f"HFCs ({gwp_to_use})": f"HFCS ({gwp_to_use})",
+            f"PFCs ({gwp_to_use})": f"PFCS ({gwp_to_use})",
+            #f"SF6 ({gwp_to_use})": f"SF6 ({gwp_to_use})",
+            #f"CH4 ({gwp_to_use})": f"CH4 ({gwp_to_use})",
+            f"CO2 ({gwp_to_use})": "CO2",
+            #f"N2O ({gwp_to_use})": f"N2O ({gwp_to_use})",
+            #f"Unspecified mix of HFCs and PFCs ({gwp_to_use})":
+            #    f"UnspMixOfHFCsPFCs ({gwp_to_use})",
+            f"Unspecified mix of HFCs ({gwp_to_use})": f"UnspMixOfHFCs ({gwp_to_use})",
+            f"Unspecified mix of PFCs ({gwp_to_use})": f"UnspMixOfPFCs ({gwp_to_use})",
             "HFC-23": "HFC23",
             "HFC-32": "HFC32",
             "HFC-41": "HFC41",
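
The hunk above replaces the hard-coded `SARGWP100` suffix with the `gwp_to_use` constant. As a minimal sketch of the same idea, the mapping can be generated from a single value, so switching the GWP only means changing that value. The helper name `build_entity_mapping` is hypothetical and only a subset of the entries from the diff is shown.

```python
from typing import Dict

gwp_to_use = "SARGWP100"


def build_entity_mapping(gwp: str) -> Dict[str, str]:
    """Return a DI-name -> PRIMAP2-name mapping for the given GWP label.

    Hypothetical helper: the commit builds the dict inline with f-strings.
    """
    return {
        f"Aggregate GHGs ({gwp})": f"KYOTOGHG ({gwp})",
        f"Aggregate F-gases ({gwp})": f"FGASES ({gwp})",
        f"HFCs ({gwp})": f"HFCS ({gwp})",
        f"PFCs ({gwp})": f"PFCS ({gwp})",
        # CO2 carries a GWP label in the source but not in the target.
        f"CO2 ({gwp})": "CO2",
    }


mapping = build_entity_mapping(gwp_to_use)
```

Because the keys are built from the same constant as the values, a lookup such as `mapping["HFCs (SARGWP100)"]` keeps working for whatever label `gwp_to_use` holds, as long as the source data uses that label too.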
@@ -277,6 +280,7 @@ cat_conversion = {
             '1.B': '1.B',
             '1.B.1': '1.B.1',
             '1.B.2': '1.B.2',
+            '2': '2',
             '2.A': '2.A',
             '2.B': 'M.2.B_2.B',
             '2.C': '2.C',
@@ -309,8 +313,8 @@ cat_conversion = {
         'aggregate': {
             '2.B': {'sources': ['M.2.B_2.B', 'M.2.B_2.E'], 'name': 'Chemical Industry'},
             '2.H': {'sources': ['M.2.H.1_2', '2.H.3'], 'name': 'Other'},
-            '2': {'sources': ['2.A', '2.B', '2.C', '2.F', '2.H'],
-                  'name': 'Industrial Processes and Product Use'},
+            #'2': {'sources': ['2.A', '2.B', '2.C', '2.F', '2.H'],
+            #      'name': 'Industrial Processes and Product Use'},
             '3.A': {'sources': ['3.A.1', '3.A.2'], 'name': 'Livestock'},
             '3.C.1': {'sources': ['3.C.1.b', '3.C.1.c'],
                          'name': 'Emissions from biomass burning'},
@@ -330,6 +334,45 @@ cat_conversion = {
 di_processing_templates = {
     # templates for the DI processing. Most processing rules will apply to several
     # versions. So we store them here and refer to them in the processing info dict
+    # general templates
+    'general': {
+        'copyUnspHFCUnspPFC': {
+            'basket_copy': {
+                'GWPs_to_add': ["AR4GWP100", "AR5GWP100", "AR6GWP100"],
+                'entities': ["UnspMixOfHFCs", "UnspMixOfPFCs"],
+                'source_GWP': gwp_to_use,
+            },
+        },
+        'copyUnspHFC': {
+            'basket_copy': {
+                'GWPs_to_add': ["AR4GWP100", "AR5GWP100", "AR6GWP100"],
+                'entities': ["UnspMixOfHFCs"],
+                'source_GWP': gwp_to_use,
+            },
+        },
+        'copyHFCPFC': {
+            'basket_copy': {
+                'GWPs_to_add': ["AR4GWP100", "AR5GWP100", "AR6GWP100"],
+                'entities': ["HFCS", "PFCS"],
+                'source_GWP': gwp_to_use,
+            },
+        },
+        'copyPFC': {
+            'basket_copy': {
+                'GWPs_to_add': ["AR4GWP100", "AR5GWP100", "AR6GWP100"],
+                'entities': ["PFCS"],
+                'source_GWP': gwp_to_use,
+            },
+        },
+        'copyFGASES': {
+            'basket_copy': {
+                'GWPs_to_add': ["AR4GWP100", "AR5GWP100", "AR6GWP100"],
+                'entities': ["FGASES"],
+                'source_GWP': gwp_to_use,
+            },
+        },
+    },
+    # country templates
     #AFG: not needed (newer data in BUR1), 2005, 2013 only
     #AGO: 2000, 2005 only (external key needed for some gases / sectors)
     'ALB': {
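
The `basket_copy` templates above name a source GWP, a list of entities, and the GWPs to add. The code that consumes these settings is not part of this excerpt, so the following is only a sketch under the assumption that each entity is copied from `"<entity> (<source_GWP>)"` to `"<entity> (<GWP>)"` for every listed GWP. The function name `expand_basket_copy` is hypothetical.

```python
from typing import Dict, List, Tuple

gwp_to_use = "SARGWP100"

# Same shape as the 'copyUnspHFCUnspPFC' template in the diff.
basket_copy = {
    "GWPs_to_add": ["AR4GWP100", "AR5GWP100", "AR6GWP100"],
    "entities": ["UnspMixOfHFCs", "UnspMixOfPFCs"],
    "source_GWP": gwp_to_use,
}


def expand_basket_copy(cfg: Dict) -> List[Tuple[str, str]]:
    """List (source_variable, target_variable) pairs implied by a config.

    Assumption: variables are named "<entity> (<GWP>)", as in the
    entity mapping of this config file.
    """
    pairs = []
    for entity in cfg["entities"]:
        source = f"{entity} ({cfg['source_GWP']})"
        for gwp in cfg["GWPs_to_add"]:
            pairs.append((source, f"{entity} ({gwp})"))
    return pairs


pairs = expand_basket_copy(basket_copy)
```

With two entities and three target GWPs this yields six pairs, the first being `("UnspMixOfHFCs (SARGWP100)", "UnspMixOfHFCs (AR4GWP100)")`.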
@@ -338,18 +381,18 @@ di_processing_templates = {
             'remove_ts': {
                 '2.A_H': { # looks wrong in 2005
                     'category': ['2.A', '2.B', '2.C', '2.D', '2.G'],
-                    'entities': ['CO2', 'KYOTOGHG (SARGWP100)'],
+                    'entities': ['CO2', f'KYOTOGHG ({gwp_to_use})'],
                         'time': ['2005'],
                 },
-                'Bunkers': { # Aviation and marine swappen in 2005
+                'Bunkers': { # Aviation and marine swapped in 2005
                     'category': ['14423', '14424'],
-                    'entities': ['KYOTOGHG (SARGWP100)'],
+                    'entities': [f'KYOTOGHG ({gwp_to_use})'],
                         'time': ['2005'],
                 },
                 'Bunkers_CH4': { # 2005 looks all wrong (swap in activity data not
                     # result?)
                     'category': ['14423', '14424', '14637'],
-                    'entities': ['CH4', 'KYOTOGHG (SARGWP100)', 'N2O'],
+                    'entities': ['CH4', f'KYOTOGHG ({gwp_to_use})', 'N2O'],
                         'time': ['2005'],
                 },
             },
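
Each `remove_ts` rule above lists categories, entities, and years whose values should be dropped. As a rough illustration only (the project's own processing code is not shown here), the rules can be read as filters over individual data points. The record layout, the function names, and the choice to treat a missing key as "match everything" are all assumptions.

```python
from typing import Dict, List

gwp_to_use = "SARGWP100"

# Rule copied from the ALB template in the diff.
rules = {
    "Bunkers": {
        "category": ["14423", "14424"],
        "entities": [f"KYOTOGHG ({gwp_to_use})"],
        "time": ["2005"],
    },
}

# Hypothetical flat records; the real data lives in a PRIMAP2 dataset.
records = [
    {"category": "14423", "entity": "KYOTOGHG (SARGWP100)", "time": "2005", "value": 1.0},
    {"category": "14423", "entity": "KYOTOGHG (SARGWP100)", "time": "2010", "value": 2.0},
    {"category": "1.A", "entity": "CO2", "time": "2005", "value": 3.0},
]

# Rule key -> record field it is compared against.
RULE_FIELDS = {"category": "category", "entities": "entity", "time": "time"}


def matches(record: Dict, rule: Dict) -> bool:
    """True if the record satisfies every filter present in the rule."""
    return all(
        record[field] in rule[key]
        for key, field in RULE_FIELDS.items()
        if key in rule  # assumption: an absent key places no restriction
    )


def remove_ts(data: List[Dict], all_rules: Dict[str, Dict]) -> List[Dict]:
    """Drop every record matched by at least one rule."""
    return [r for r in data if not any(matches(r, rule) for rule in all_rules.values())]


kept = remove_ts(records, rules)
```

Only the 2005 bunker value is removed here; the 2010 value for the same category and the unrelated CO2 record are kept.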
@@ -424,6 +467,11 @@ di_processing_templates = {
                     },
                 },
             },
+            'basket_copy': {
+                'GWPs_to_add': ["AR4GWP100", "AR5GWP100", "AR6GWP100"],
+                'entities': ["UnspMixOfHFCs", "UnspMixOfPFCs"],
+                'source_GWP': gwp_to_use,
+            },
         }
     },
     #AND: no data
@@ -453,9 +501,13 @@ di_processing_templates = {
             'remove_ts': {
                 '1.A.1': { #contains data for all subsectors
                     'category': ['1.A.1'],
-                    'entities': ['CH4', 'KYOTOGHG (SARGWP100)'],
-                        'time': ['1990', '2000', '2005', '2006', '2007', '2008', '2009',
-                                 '2010', '2011', '2012'],
+                    'entities': ['CH4', f'KYOTOGHG ({gwp_to_use})'],
+                    'time': ['1990', '2000', '2005', '2006', '2007', '2008', '2009',
+                             '2010', '2011', '2012'],
+                },
+                'pfcs': { # only HFCs in other years, likely wrong
+                    'entities': [f'PFCS ({gwp_to_use})'],
+                    'time': ['1991', '1992', '1993', '1994'],
                 },
             },
             'downscale': { # needed for 1990, 2000, 2005-2012
@@ -470,8 +522,29 @@ di_processing_templates = {
                         'skipna': True,
                     },
                 },
+                'entities': {
+                    'FGASES': {
+                        'basket': f'FGASES ({gwp_to_use})',
+                        'basket_contents': [f'HFCS ({gwp_to_use})'],
+                        'sel': {'time': ['1990', '1991', '1992', '1993', '1994',
+                                         '1995']},
+                    },
+                    'HFC': {
+                        'basket': f'HFCS ({gwp_to_use})',
+                        'basket_contents': [f'UnspMixOfHFCs ({gwp_to_use})'],
+                        'sel': {'time': ['1990', '1991', '1992', '1993', '1994',
+                                         '1995', '2000', '2001', '2002', '2003',
+                                         '2004', '2005', '2006', '2007', '2008',
+                                         '2009', '2010', '2012']},
+                    },
+                },
             },
-        }
+            'basket_copy': {
+                'GWPs_to_add': ["AR4GWP100", "AR5GWP100", "AR6GWP100"],
+                'entities': ["UnspMixOfHFCs"],
+                'source_GWP': gwp_to_use,
+            },
+        },
     },
     # BDI 1998, 2005, 2010, 2015 # data coverage is a bit inconsistent
     # BEN 1995, 2000 # data coverage a bit inconsistent
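
The `downscale` entries under `entities` above pair a basket such as `FGASES` with a single content such as `HFCS`. The sketch below is a simplified illustration of share-based downscaling and not the implementation used by this project or by PRIMAP2: each basket total is split across its contents in proportion to given shares. The function name `downscale_by_shares` and the fixed shares are assumptions.

```python
from typing import Dict

gwp_to_use = "SARGWP100"


def downscale_by_shares(
    basket_values: Dict[str, float], shares: Dict[str, float]
) -> Dict[str, Dict[str, float]]:
    """Split each yearly basket total across contents using fixed shares."""
    total = sum(shares.values())
    return {
        year: {name: value * share / total for name, share in shares.items()}
        for year, value in basket_values.items()
    }


# Hypothetical FGASES totals for two of the years selected in the diff.
fgases = {"1990": 10.0, "1991": 12.0}

# A single basket content receives the whole basket total.
hfcs = downscale_by_shares(fgases, {f"HFCS ({gwp_to_use})": 1.0})
```

With only one content in the basket, downscaling amounts to assigning the basket total to that content for the selected years.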
@@ -480,6 +553,11 @@ di_processing_templates = {
             # and missing sectors (e.g. 1,2 for CH4, N2O), Agri. burning (4.E,
             # 4.F) missing for 2008-2017
             'remove_years': ['2007'],
+            'basket_copy': {
+                'GWPs_to_add': ["AR4GWP100", "AR5GWP100", "AR6GWP100"],
+                'entities': ["UnspMixOfHFCs"],
+                'source_GWP': gwp_to_use,
+            },
         },
     },
     # BGD 1994, 2001, 2005; coverage mostly consistent but not fully
@@ -492,7 +570,7 @@ di_processing_templates = {
                     '4': { # 1994
                         'basket': '4',
                         'basket_contents': ['4.A', '4.B', '4.D', '4.G'],
-                        'entities': ['CH4', 'CO2', 'KYOTOGHG (SARGWP100)'], # no N2O but
+                        'entities': ['CH4', 'CO2', f'KYOTOGHG ({gwp_to_use})'], # no N2O but
                         # CO2 is unusual
                         'dim': 'category (BURDI)',
                         'skipna_evaluation_dims': None,
@@ -580,7 +658,7 @@ di_processing_templates = {
                 },
                 'entities': {  # 2002-2014
                     'KYOTO': {
-                        'basket': 'KYOTOGHG (SARGWP100)',
+                        'basket': f'KYOTOGHG ({gwp_to_use})',
                         'basket_contents': ['CH4', 'CO2', 'N2O'],
                         'sel': {'category (BURDI)':
                                     ['1', '1.A', '1.A.1', '1.A.2', '1.A.3', '1.A.4',
@@ -629,7 +707,7 @@ di_processing_templates = {
                     '5_2000': {
                         'basket': '5',
                         'basket_contents': ['5.A', '5.B', '5.C', '5.D'],
-                        'entities': ['KYOTOGHG (SARGWP100)'],
+                        'entities': [f'KYOTOGHG ({gwp_to_use})'],
                         'dim': 'category (BURDI)',
                         'sel': {'time': ['2000', '2001', '2002', '2003', '2004',
                                          '2005', '2006', '2007', '2009', '2010']},
@@ -639,7 +717,7 @@ di_processing_templates = {
                 },
                 'entities': {  # 2000-2010 (1997 as key)
                     'KYOTO': {
-                        'basket': 'KYOTOGHG (SARGWP100)',
+                        'basket': f'KYOTOGHG ({gwp_to_use})',
                         'basket_contents': ['CO2', 'CH4', 'N2O'],
                         'sel': {'category (BURDI)':
                                     ['1', '1.A', '1.A.1', '1.A.2', '1.A.3', '1.A.4',
@@ -733,8 +811,27 @@ di_processing_templates = {
                         'skipna': True,
                     },
                 },
+                'entities': {
+                    'HFC': {
+                        'basket': f'HFCS ({gwp_to_use})',
+                        'basket_contents': ['HFC125', 'HFC134a', 'HFC143a', 'HFC152a',
+                                            'HFC227ea', 'HFC23', 'HFC236fa', 'HFC32',
+                                            f'UnspMixOfHFCs ({gwp_to_use})'],
+                        'sel': {'time': ['2005', '2010']},
+                    },
+                    'PFC': {
+                        'basket': f'PFCS ({gwp_to_use})',
+                        'basket_contents': ['C2F6', 'CF4'],
+                        'sel': {'time': ['2005', '2010']},
+                    },
+                },
             },
-        }
+            'basket_copy': {
+                'GWPs_to_add': ["AR4GWP100", "AR5GWP100", "AR6GWP100"],
+                'entities': ["UnspMixOfHFCs"],
+                'source_GWP': gwp_to_use,
+            },
+        },
     },
     'CIV' :{
         'DI2023-05-24': { #1994 (needs some downscaling), 2000
@@ -743,13 +840,18 @@ di_processing_templates = {
                     '1.A': { # 2005
                         'basket': '1.A',
                         'basket_contents': ['1.A.1', '1.A.2', '1.A.3', '1.A.4'],
-                        'entities': ['CO2', 'CH4', 'N2O', 'KYOTOGHG (SARGWP100)'],
+                        'entities': ['CO2', 'CH4', 'N2O', f'KYOTOGHG ({gwp_to_use})'],
                         'dim': 'category (BURDI)',
                         'skipna_evaluation_dims': None,
                         'skipna': True,
                     },
                 },
             },
+            'basket_copy': {
+                'GWPs_to_add': ["AR4GWP100", "AR5GWP100", "AR6GWP100"],
+                'entities': ["FGASES"],
+                'source_GWP': gwp_to_use,
+            },
         },
     },
     # CMR: 1994, 2000, not fully consistent
@@ -862,7 +964,7 @@ di_processing_templates = {
                     '2': {
                         'basket': '2',
                         'basket_contents': ['2.A', '2.F'],
-                        'entities': ['CO2', 'HFCS (SARGWP100)'],
+                        'entities': ['CO2', f'HFCS ({gwp_to_use})'],
                         'dim': 'category (BURDI)',
                     },
                     'bunkers': {
@@ -873,6 +975,11 @@ di_processing_templates = {
                     },
                 },
             },
+            'basket_copy': {
+                'GWPs_to_add': ["AR4GWP100", "AR5GWP100", "AR6GWP100"],
+                'entities': ["UnspMixOfHFCs"],
+                'source_GWP': gwp_to_use,
+            },
         },
     },
     # DOM: # 1990, 1994, 1998, 2000, 2010
@@ -885,51 +992,51 @@ di_processing_templates = {
                     '1': {
                         'basket': '1',
                         'basket_contents': ['1.A', '1.B'],
-                        'entities': ['KYOTOGHG (SARGWP100)'],
+                        'entities': [f'KYOTOGHG ({gwp_to_use})'],
                         'dim': 'category (BURDI)',
                     },
                     '1.A': {
                         'basket': '1.A',
                         'basket_contents': ['1.A.1', '1.A.2', '1.A.3', '1.A.4',
                                             '1.A.5'],
-                        'entities': ['KYOTOGHG (SARGWP100)'],
+                        'entities': [f'KYOTOGHG ({gwp_to_use})'],
                         'dim': 'category (BURDI)',
                     },
                     '1.B': {
                         'basket': '1.B',
                         'basket_contents': ['1.B.1', '1.B.2'],
-                        'entities': ['KYOTOGHG (SARGWP100)'],
+                        'entities': [f'KYOTOGHG ({gwp_to_use})'],
                         'dim': 'category (BURDI)',
                     },
                     '2': {
                         'basket': '2',
                         'basket_contents': ['2.A', '2.B', '2.C', '2.D', '2.G'],
-                        'entities': ['KYOTOGHG (SARGWP100)'],
+                        'entities': [f'KYOTOGHG ({gwp_to_use})'],
                         'dim': 'category (BURDI)',
                     },
                     '4': {
                         'basket': '4',
                         'basket_contents': ['4.A', '4.B', '4.C', '4.D', '4.E',
                                             '4.F', '4.G'],
-                        'entities': ['KYOTOGHG (SARGWP100)'],
+                        'entities': [f'KYOTOGHG ({gwp_to_use})'],
                         'dim': 'category (BURDI)',
                     },
                     '5': {
                         'basket': '5',
                         'basket_contents': ['5.A', '5.B', '5.C', '5.D'],
-                        'entities': ['KYOTOGHG (SARGWP100)'],
+                        'entities': [f'KYOTOGHG ({gwp_to_use})'],
                         'dim': 'category (BURDI)',
                     },
                     '6': {
                         'basket': '6',
                         'basket_contents': ['6.A', '6.B', '6.D'],
-                        'entities': ['KYOTOGHG (SARGWP100)'],
+                        'entities': [f'KYOTOGHG ({gwp_to_use})'],
                         'dim': 'category (BURDI)',
                     },
                 },
                 'entities': {
                     'KYOTO': {
-                        'basket': 'KYOTOGHG (SARGWP100)',
+                        'basket': f'KYOTOGHG ({gwp_to_use})',
                         'basket_contents': ['CH4', 'CO2', 'N2O'],
                         'sel': {'category (BURDI)':
                                     ['15163', '24540',
@@ -949,13 +1056,9 @@ di_processing_templates = {
         'DI2023-05-24': { # 1990, 2000, 2005
             #omit aerosols / GHG precursors in downscaling
             'remove_ts': {
-                '2.H': { # all in 2.H in 1990
-                        'category': ['2.H'],
-                        'entities': ['KYOTOGHG (AR4GWP100)', 'CH4', 'CO2', 'N2O'],
-                    },
-                '2': { # all in 2.H in 1990
-                        'category': ['2.H'],
-                        'entities': ['KYOTOGHG (AR4GWP100)', 'CH4'],
+                '2.G': { # all in 2.G in 1990
+                        'category': ['2.G'],
+                        'entities': [f'KYOTOGHG ({gwp_to_use})', 'CO2', 'N2O'],
                     },
             },
             'downscale': {
@@ -968,9 +1071,13 @@ di_processing_templates = {
                     },
                 },
             },
+            'basket_copy': {
+                'GWPs_to_add': ["AR4GWP100", "AR5GWP100", "AR6GWP100"],
+                'entities': ["UnspMixOfHFCs"],
+                'source_GWP': gwp_to_use,
+            },
         },
     },
-    # EGY:  TODO: downscale 2 in 1990, remove
     # 'ERI' #1994 1995-1999 (partial coverage, KYOTOGHG and total are incomplete), 2000
     'ETH': {
         'DI2023-05-24': { # 1990-1993 (downscaling needed), 1994-2013
@@ -1004,13 +1111,13 @@ di_processing_templates = {
                     'bunkers': {
                         'basket': '14637',
                         'basket_contents': ['14424'],
-                        'entities': ['CO2', 'KYOTOGHG (SARGWP100)'],
+                        'entities': ['CO2', f'KYOTOGHG ({gwp_to_use})'],
                         'dim': 'category (BURDI)',
                     },
                 },
                 'entities': {
                     'bunkers': {
-                        'basket': 'KYOTOGHG (SARGWP100)',
+                        'basket': f'KYOTOGHG ({gwp_to_use})',
                         'basket_contents': ['CH4', 'CO2', 'N2O'],
                         'sel': {'category (BURDI)': ['14637', '14424']}
                     },
@@ -1164,10 +1271,15 @@ di_processing_templates = {
             'remove_ts': {
                 'waste': { # very high in 1994
                     'category': ['6', '6.A', '6.B', '6.D'],
-                    'entities': ['CH4', 'N2O', 'KYOTOGHG (SARGWP100)'],
+                    'entities': ['CH4', 'N2O', f'KYOTOGHG ({gwp_to_use})'],
                         'time': ['1994'],
                 },
             },
+            'basket_copy': {
+                'GWPs_to_add': ["AR4GWP100", "AR5GWP100", "AR6GWP100"],
+                'entities': ["UnspMixOfHFCs"],
+                'source_GWP': gwp_to_use,
+            },
         },
     },
     # LKA: 1994, 2000. a bit inconsistent in subsectors (all emissions in "other in
@@ -1217,20 +1329,20 @@ di_processing_templates = {
                     '1.B': {
                         'basket': '1.B',
                         'basket_contents': ['1.B.1', '1.B.2'],
-                        'entities': ['KYOTOGHG (SARGWP100)'],
+                        'entities': [f'KYOTOGHG ({gwp_to_use})'],
                         'dim': 'category (BURDI)',
                     },
                     '5': {
                         'basket': '5',
                         'basket_contents': ['5.A', '5.B'],
-                        'entities': ['KYOTOGHG (SARGWP100)'],
+                        'entities': [f'KYOTOGHG ({gwp_to_use})'],
                         'dim': 'category (BURDI)',
                         'tolerance' : 0.018, # LULUCF data inconsistent in 2012
                     },
                 },
                 'entities': {
                     'all': {
-                        'basket': 'KYOTOGHG (SARGWP100)',
+                        'basket': f'KYOTOGHG ({gwp_to_use})',
                         'basket_contents': ['CH4', 'CO2', 'N2O'],
                         'sel': {'category (BURDI)': [
                             '1', '2', '4', '5', '6', '15163', '24540',
@@ -1272,7 +1384,7 @@ di_processing_templates = {
                     'kyotoghg_4': { # in general similar problem to 1.A, but most sectors have
                         # only one gas and we need the data for PRIMAP-hist,
                         # so we have to do it anyway
-                        'basket': 'KYOTOGHG (SARGWP100)',
+                        'basket': f'KYOTOGHG ({gwp_to_use})',
                         'basket_contents': ['CH4', 'N2O'],
                         'sel': {
                             'category (BURDI)': [
@@ -1288,7 +1400,28 @@ di_processing_templates = {
     # MDV: 1994 (only few sectors), 2011-2015
     # MEX: more data in BURs 2 and 3
     # MHL: 2000, 2005, 2010
-    # MKD: 1990-2009
+    # MKD:
+    'MKD': {
+        'DI2023-05-24': {  # 1990-2009
+            'downscale': {
+                'entities': {
+                    'FGASES': {
+                        'basket': f'FGASES ({gwp_to_use})',
+                        'basket_contents': [f'HFCS ({gwp_to_use})'],
+                    },
+                    'HFC': {
+                        'basket': f'HFCS ({gwp_to_use})',
+                        'basket_contents': [f'UnspMixOfHFCs ({gwp_to_use})'],
+                    },
+                },
+            },
+            'basket_copy': {
+                'GWPs_to_add': ["AR4GWP100", "AR5GWP100", "AR6GWP100"],
+                'entities': ["UnspMixOfHFCs"],
+                'source_GWP': gwp_to_use,
+            },
+        },
+    },
     'MLI': {
         'DI2023-05-24': {  # 1995,2000, 2005
             'downscale': {
@@ -1301,6 +1434,21 @@ di_processing_templates = {
                         'sel': {'time': ['1995', '2000']},
                     },
                 },
+                'entities': {
+                    'FGASES': {
+                        'basket': f'FGASES ({gwp_to_use})',
+                        'basket_contents': [f'HFCS ({gwp_to_use})'],
+                    },
+                    'HFC': {
+                        'basket': f'HFCS ({gwp_to_use})',
+                        'basket_contents': [f'UnspMixOfHFCs ({gwp_to_use})'],
+                    },
+                },
+            },
+            'basket_copy': {
+                'GWPs_to_add': ["AR4GWP100", "AR5GWP100", "AR6GWP100"],
+                'entities': ["UnspMixOfHFCs"],
+                'source_GWP': gwp_to_use,
             },
         },
     },
@@ -1317,7 +1465,7 @@ di_processing_templates = {
                 },
                 'entities': {
                     'kyotoghg_5': {
-                        'basket': 'KYOTOGHG (SARGWP100)',
+                        'basket': f'KYOTOGHG ({gwp_to_use})',
                         'basket_contents': ['CO2', 'CH4', 'N2O'],
                         'sel': {
                             'category (BURDI)': [
@@ -1326,6 +1474,11 @@ di_processing_templates = {
                     },
                 },
             },
+            'basket_copy': {
+                'GWPs_to_add': ["AR4GWP100", "AR5GWP100", "AR6GWP100"],
+                'entities': ["UnspMixOfHFCs"],
+                'source_GWP': gwp_to_use,
+            },
         },
     },
     # MNE: more data in BUR3
@@ -1336,12 +1489,17 @@ di_processing_templates = {
     'MUS': {
         'DI2023-05-24': { #1995, 2000-2006, 2013
             'remove_ts': {
-                'waste': { # 1994 inconsistent
+                'waste': { # 1995 inconsistent
                     'category': ['6', '6.A', '6.B', '6.C', '6.D'],
-                    'entities': ['CO2', 'CH4', 'N2O', 'KYOTOGHG (SARGWP100)'],
-                        'time': ['1994'],
+                    'entities': ['CO2', 'CH4', 'N2O', f'KYOTOGHG ({gwp_to_use})'],
+                        'time': ['1995'],
                 },
             },
+            'basket_copy': {
+                'GWPs_to_add': ["AR4GWP100", "AR5GWP100", "AR6GWP100"],
+                'entities': ["UnspMixOfHFCs", "UnspMixOfPFCs"],
+                'source_GWP': gwp_to_use,
+            },
         },
     },
     # MWI: 1990, 1994. inconsistency in 1.B.1: 1994: CO2, 1990: CH4
@@ -1365,13 +1523,13 @@ di_processing_templates = {
                     '6': {
                         'basket': '6',
                         'basket_contents': ['6.A', '6.B'],
-                        'entities': ['KYOTOGHG (SARGWP100)'],
+                        'entities': [f'KYOTOGHG ({gwp_to_use})'],
                         'dim': 'category (BURDI)',
                     },
                 },
                 'entities': {
                     'kyotoghg_56': {
-                        'basket': 'KYOTOGHG (SARGWP100)',
+                        'basket': f'KYOTOGHG ({gwp_to_use})',
                         'basket_contents': ['CH4', 'N2O'],
                         'sel': {
                             'category (BURDI)': ['6', '6.A', '6.B'],
@@ -1410,7 +1568,7 @@ di_processing_templates = {
             'downscale': {
                 'entities': {
                     'kyotoghg': {
-                        'basket': 'KYOTOGHG (SARGWP100)',
+                        'basket': f'KYOTOGHG ({gwp_to_use})',
                         'basket_contents': ['CO2', 'CH4', 'N2O'],
                         'sel': {
                             'category (BURDI)': [
@@ -1436,7 +1594,7 @@ di_processing_templates = {
             'downscale': {
                 'entities': {
                     'kyotoghg': {
-                        'basket': 'KYOTOGHG (SARGWP100)',
+                        'basket': f'KYOTOGHG ({gwp_to_use})',
                         'basket_contents': ['CO2', 'CH4', 'N2O'],
                         'sel': {
                             'category (BURDI)': [
@@ -1489,7 +1647,7 @@ di_processing_templates = {
                     '1': {
                         'basket': '1',
                         'basket_contents': ['1.A', '1.B'],
-                        'entities': ['KYOTOGHG (SARGWP100)'],
+                        'entities': [f'KYOTOGHG ({gwp_to_use})'],
                         'dim': 'category (BURDI)',
                         'sel': {'time': ['2000', '2001', '2002', '2003', '2004',
                                          '2005', '2006', '2007', '2008', '2009',
@@ -1498,7 +1656,7 @@ di_processing_templates = {
                     '1.A': {
                         'basket': '1.A',
                         'basket_contents': ['1.A.1', '1.A.2', '1.A.3', '1.A.4'],
-                        'entities': ['KYOTOGHG (SARGWP100)'],
+                        'entities': [f'KYOTOGHG ({gwp_to_use})'],
                         'dim': 'category (BURDI)',
                         'sel': {'time': ['2000', '2001', '2002', '2003', '2004',
                                          '2005', '2006', '2007', '2008', '2009',
@@ -1507,7 +1665,7 @@ di_processing_templates = {
                     '1.B': {
                         'basket': '1.B',
                         'basket_contents': ['1.B.1', '1.B.2'],
-                        'entities': ['KYOTOGHG (SARGWP100)'],
+                        'entities': [f'KYOTOGHG ({gwp_to_use})'],
                         'dim': 'category (BURDI)',
                         'sel': {'time': ['2000', '2001', '2002', '2003', '2004',
                                          '2005', '2006', '2007', '2008', '2009',
@@ -1516,7 +1674,7 @@ di_processing_templates = {
                     '2': {
                         'basket': '2',
                         'basket_contents': ['2.A', '2.B', '2.C', '2.D'],
-                        'entities': ['KYOTOGHG (SARGWP100)'],
+                        'entities': [f'KYOTOGHG ({gwp_to_use})'],
                         'dim': 'category (BURDI)',
                         'sel': {'time': ['2000', '2001', '2002', '2003', '2004',
                                          '2005', '2006', '2007', '2008', '2009',
@@ -1526,7 +1684,7 @@ di_processing_templates = {
                         'basket': '4',
                         'basket_contents': ['4.A', '4.B', '4.C', '4.D', '4.E',
                                             '4.F'],
-                        'entities': ['KYOTOGHG (SARGWP100)'],
+                        'entities': [f'KYOTOGHG ({gwp_to_use})'],
                         'dim': 'category (BURDI)',
                         'sel': {'time': ['2000', '2001', '2002', '2003', '2004',
                                          '2005', '2006', '2007', '2008', '2009',
@@ -1535,7 +1693,7 @@ di_processing_templates = {
                     '5': {
                         'basket': '5',
                         'basket_contents': ['5.A', '5.B', '5.C'],
-                        'entities': ['KYOTOGHG (SARGWP100)'],
+                        'entities': [f'KYOTOGHG ({gwp_to_use})'],
                         'dim': 'category (BURDI)',
                         'sel': {'time': ['2000', '2001', '2002', '2003', '2004',
                                          '2005', '2006', '2007', '2008', '2009',
@@ -1544,7 +1702,7 @@ di_processing_templates = {
                     '6': {
                         'basket': '6',
                         'basket_contents': ['6.A', '6.B', '6.C'],
-                        'entities': ['KYOTOGHG (SARGWP100)'],
+                        'entities': [f'KYOTOGHG ({gwp_to_use})'],
                         'dim': 'category (BURDI)',
                         'sel': {'time': ['2000', '2001', '2002', '2003', '2004',
                                          '2005', '2006', '2007', '2008', '2009',
@@ -1553,7 +1711,7 @@ di_processing_templates = {
                 },
                 'entities': {
                     'KYOTO': {
-                        'basket': 'KYOTOGHG (SARGWP100)',
+                        'basket': f'KYOTOGHG ({gwp_to_use})',
                         'basket_contents': ['CH4', 'CO2', 'N2O'],
                         'sel': {
                             'category (BURDI)': [
@@ -1584,6 +1742,7 @@ di_processing_templates = {
     # TZA: 1990, 1994
     # UGA: 1994, 2000, subcategories a bit inconsistent
     'URY': {
+        # remove data: CH4, 1998, 2002, 1
         'DI2023-05-24': {
             'downscale': {
                 'sectors': {
@@ -1640,6 +1799,11 @@ di_processing_templates = {
                     },
                 },
             },
+            'basket_copy': {
+                'GWPs_to_add': ["AR4GWP100", "AR5GWP100", "AR6GWP100"],
+                'entities': ["UnspMixOfPFCs"],
+                'source_GWP': gwp_to_use,
+            },
         },
     },
     # UZB: 1990-2012
@@ -1670,6 +1834,11 @@ di_processing_templates = {
                     },
                 },
             },
+            'basket_copy': {
+                'GWPs_to_add': ["AR4GWP100", "AR5GWP100", "AR6GWP100"],
+                'entities': ["UnspMixOfHFCs"],
+                'source_GWP': gwp_to_use,
+            },
         },
     },
     # ZWE: 1994, 2000, 2006 consistency of sectors and coverage does not look good,
@@ -1690,6 +1859,10 @@ di_processing_info = {
         'default': di_processing_templates['ARE']['DI2023-05-24'],
         'DI2023-05-24': di_processing_templates['ARE']['DI2023-05-24'],
     },
+    'ARG': {
+        'default': di_processing_templates['general']['copyUnspHFCUnspPFC'],
+        'DI2023-05-24': di_processing_templates['general']['copyUnspHFCUnspPFC'],
+    },
     'AZE': {
         'default': di_processing_templates['AZE']['DI2023-05-24'],
         'DI2023-05-24': di_processing_templates['AZE']['DI2023-05-24'],
@@ -1706,10 +1879,22 @@ di_processing_info = {
         'default': di_processing_templates['BIH']['DI2023-05-24'],
         'DI2023-05-24': di_processing_templates['BIH']['DI2023-05-24'],
     },
+    'BOL': {
+        'default': di_processing_templates['general']['copyUnspHFCUnspPFC'],
+        'DI2023-05-24': di_processing_templates['general']['copyUnspHFCUnspPFC'],
+    },
     'BRB': {
         'default': di_processing_templates['BRB']['DI2023-05-24'],
         'DI2023-05-24': di_processing_templates['BRB']['DI2023-05-24'],
     },
+    'BRN': {
+        'default': di_processing_templates['general']['copyUnspHFC'],
+        'DI2023-05-24': di_processing_templates['general']['copyUnspHFC'],
+    },
+    'CHL': {
+        'default': di_processing_templates['general']['copyUnspHFC'],
+        'DI2023-05-24': di_processing_templates['general']['copyUnspHFC'],
+    },
     'CHN': {
         'default': di_processing_templates['CHN']['DI2023-05-24'],
         'DI2023-05-24': di_processing_templates['CHN']['DI2023-05-24'],
@@ -1742,6 +1927,10 @@ di_processing_info = {
         'default': di_processing_templates['GEO']['DI2023-05-24'],
         'DI2023-05-24': di_processing_templates['GEO']['DI2023-05-24'],
     },
+    'GMB': {
+        'default': di_processing_templates['general']['copyUnspHFC'],
+        'DI2023-05-24': di_processing_templates['general']['copyUnspHFC'],
+    },
     'GNB': {
         'default': di_processing_templates['GNB']['DI2023-05-24'],
         'DI2023-05-24': di_processing_templates['GNB']['DI2023-05-24'],
@@ -1754,14 +1943,38 @@ di_processing_info = {
         'default': di_processing_templates['IND']['DI2023-05-24'],
         'DI2023-05-24': di_processing_templates['IND']['DI2023-05-24'],
     },
+    'ISR': {
+        'default': di_processing_templates['general']['copyHFCPFC'],
+        'DI2023-05-24': di_processing_templates['general']['copyHFCPFC'],
+    },
+    'JAM': {
+        'default': di_processing_templates['general']['copyUnspHFCUnspPFC'],
+        'DI2023-05-24': di_processing_templates['general']['copyUnspHFCUnspPFC'],
+    },
+    'JOR': {
+        'default': di_processing_templates['general']['copyUnspHFC'],
+        'DI2023-05-24': di_processing_templates['general']['copyUnspHFC'],
+    },
     'KEN': {
         'default': di_processing_templates['KEN']['DI2023-05-24'],
         'DI2023-05-24': di_processing_templates['KEN']['DI2023-05-24'],
     },
+    'KGZ': {
+        'default': di_processing_templates['general']['copyUnspHFC'],
+        'DI2023-05-24': di_processing_templates['general']['copyUnspHFC'],
+    },
+    'KOR': {
+        'default': di_processing_templates['general']['copyUnspHFCUnspPFC'],
+        'DI2023-05-24': di_processing_templates['general']['copyUnspHFCUnspPFC'],
+    },
     'LCA': {
         'default': di_processing_templates['LCA']['DI2023-05-24'],
         'DI2023-05-24': di_processing_templates['LCA']['DI2023-05-24'],
     },
+    'LKA': {
+        'default': di_processing_templates['general']['copyFGASES'],
+        'DI2023-05-24': di_processing_templates['general']['copyFGASES'],
+    },
     'LSO': {
         'default': di_processing_templates['LSO']['DI2023-05-24'],
         'DI2023-05-24': di_processing_templates['LSO']['DI2023-05-24'],
@@ -1770,10 +1983,26 @@ di_processing_info = {
         'default': di_processing_templates['MAR']['DI2023-05-24'],
         'DI2023-05-24': di_processing_templates['MAR']['DI2023-05-24'],
     },
+    'MDA': {
+        'default': di_processing_templates['general']['copyUnspHFCUnspPFC'],
+        'DI2023-05-24': di_processing_templates['general']['copyUnspHFCUnspPFC'],
+    },
     'MDG': {
         'default': di_processing_templates['MDG']['DI2023-05-24'],
         'DI2023-05-24': di_processing_templates['MDG']['DI2023-05-24'],
     },
+    'MDV': {
+        'default': di_processing_templates['general']['copyUnspHFCUnspPFC'],
+        'DI2023-05-24': di_processing_templates['general']['copyUnspHFCUnspPFC'],
+    },
+    'MEX': {
+        'default': di_processing_templates['general']['copyHFCPFC'],
+        'DI2023-05-24': di_processing_templates['general']['copyHFCPFC'],
+    },
+    'MHL': {
+        'default': di_processing_templates['general']['copyUnspHFCUnspPFC'],
+        'DI2023-05-24': di_processing_templates['general']['copyUnspHFCUnspPFC'],
+    },
     'MLI': {
         'default': di_processing_templates['MLI']['DI2023-05-24'],
         'DI2023-05-24': di_processing_templates['MLI']['DI2023-05-24'],
@@ -1782,6 +2011,18 @@ di_processing_info = {
         'default': di_processing_templates['MMR']['DI2023-05-24'],
         'DI2023-05-24': di_processing_templates['MMR']['DI2023-05-24'],
     },
+    'MNE': {
+        'default': di_processing_templates['general']['copyUnspHFC'],
+        'DI2023-05-24': di_processing_templates['general']['copyUnspHFC'],
+    },
+    'MNG': {
+        'default': di_processing_templates['general']['copyUnspHFC'],
+        'DI2023-05-24': di_processing_templates['general']['copyUnspHFC'],
+    },
+    'MOZ': {
+        'default': di_processing_templates['general']['copyPFC'],
+        'DI2023-05-24': di_processing_templates['general']['copyPFC'],
+    },
     'MUS': {
         'default': di_processing_templates['MUS']['DI2023-05-24'],
         'DI2023-05-24': di_processing_templates['MUS']['DI2023-05-24'],
@@ -1790,18 +2031,42 @@ di_processing_info = {
         'default': di_processing_templates['PHL']['DI2023-05-24'],
         'DI2023-05-24': di_processing_templates['PHL']['DI2023-05-24'],
     },
+    'PRY': {
+        'default': di_processing_templates['general']['copyUnspHFCUnspPFC'],
+        'DI2023-05-24': di_processing_templates['general']['copyUnspHFCUnspPFC'],
+    },
+    'PSE': {
+        'default': di_processing_templates['general']['copyUnspHFCUnspPFC'],
+        'DI2023-05-24': di_processing_templates['general']['copyUnspHFCUnspPFC'],
+    },
     'RWA': {
         'default': di_processing_templates['RWA']['DI2023-05-24'],
         'DI2023-05-24': di_processing_templates['RWA']['DI2023-05-24'],
     },
+    'SEN': {
+        'default': di_processing_templates['general']['copyHFCPFC'],
+        'DI2023-05-24': di_processing_templates['general']['copyHFCPFC'],
+    },
+    'SGP': {
+        'default': di_processing_templates['general']['copyUnspHFCUnspPFC'],
+        'DI2023-05-24': di_processing_templates['general']['copyUnspHFCUnspPFC'],
+    },
     'SLB': {
         'default': di_processing_templates['SLB']['DI2023-05-24'],
         'DI2023-05-24': di_processing_templates['SLB']['DI2023-05-24'],
     },
+    'SMR': {
+        'default': di_processing_templates['general']['copyUnspHFCUnspPFC'],
+        'DI2023-05-24': di_processing_templates['general']['copyUnspHFCUnspPFC'],
+    },
     'STP': {
         'default': di_processing_templates['STP']['DI2023-05-24'],
         'DI2023-05-24': di_processing_templates['STP']['DI2023-05-24'],
     },
+    'SWZ': {
+        'default': di_processing_templates['general']['copyUnspHFC'],
+        'DI2023-05-24': di_processing_templates['general']['copyUnspHFC'],
+    },
     'TCD': {
         'default': di_processing_templates['TCD']['DI2023-05-24'],
         'DI2023-05-24': di_processing_templates['TCD']['DI2023-05-24'],
@@ -1814,45 +2079,26 @@ di_processing_info = {
         'default': di_processing_templates['URY']['DI2023-05-24'],
         'DI2023-05-24': di_processing_templates['URY']['DI2023-05-24'],
     },
+    'UZB': {
+        'default': di_processing_templates['general']['copyUnspHFCUnspPFC'],
+        'DI2023-05-24': di_processing_templates['general']['copyUnspHFCUnspPFC'],
+    },
     'ZMB': {
         'default': di_processing_templates['ZMB']['DI2023-05-24'],
         'DI2023-05-24': di_processing_templates['ZMB']['DI2023-05-24'],
     },
 }
 
-gas_baskets = {
-    'HFCS (SARGWP100)': ['HFC23', 'HFC32', 'HFC41', 'HFC125', 'HFC134',
-                     'HFC134a', 'HFC143',  'HFC143a', 'HFC152a', 'HFC227ea',
-                     'HFC236fa', 'HFC245ca', 'HFC245fa', 'HFC365mfc',  'HFC404a',
-                     'HFC407c', 'HFC410a', 'HFC4310mee', #'OTHERHFCS (SARGWP100)',
-                         'Unspecified mix of HFCs (SARGWP100)'],
-    'HFCS (AR4GWP100)': ['HFC23', 'HFC32', 'HFC41', 'HFC125', 'HFC134',
-                     'HFC134a', 'HFC143',  'HFC143a', 'HFC152a', 'HFC227ea',
-                     'HFC236fa', 'HFC245ca', 'HFC245fa', 'HFC365mfc',  'HFC404a',
-                     'HFC407c', 'HFC410a', 'HFC4310mee', 'Unspecified mix of HFCs (AR4GWP100)'],
-    'HFCS (AR5GWP100)': ['HFC23', 'HFC32', 'HFC41', 'HFC125', 'HFC134',
-                      'HFC134a', 'HFC143',  'HFC143a', 'HFC152a', 'HFC227ea',
-                      'HFC236fa', 'HFC245ca', 'HFC245fa', 'HFC365mfc',  'HFC404a',
-                      'HFC407c', 'HFC410a', 'HFC4310mee',
-                         'Unspecified mix of HFCs (AR5GWP100)'],
-    'PFCS (SARGWP100)': ['C3F8', 'C4F10', 'CF4', 'C2F6', 'C6F14', 'C5F12', 'cC4F8',
-                      'Unspecified mix of PFCs (SARGWP100)'],
-    'PFCS (AR4GWP100)': ['C3F8', 'C4F10', 'CF4', 'C2F6', 'C6F14', 'C5F12', 'cC4F8',
-                      'Unspecified mix of PFCs (AR4GWP100)'],
-    'PFCS (AR5GWP100)': ['C3F8', 'C4F10', 'CF4', 'C2F6', 'C6F14', 'C5F12', 'cC4F8',
-                      'Unspecified mix of PFCs (AR5GWP100)'],
-    'FGASES (SARGWP100)': ['HFCS (SARGWP100)', 'PFCS (SARGWP100)', 'SF6', 'NF3'],
-    'FGASES (AR4GWP100)': ['HFCS (AR4GWP100)', 'PFCS (AR4GWP100)', 'SF6', 'NF3'],
-    'FGASES (AR5GWP100)':['HFCS (AR5GWP100)', 'PFCS (AR5GWP100)', 'SF6', 'NF3'],
-    'KYOTOGHG (SARGWP100)': ['CO2', 'CH4', 'N2O', 'SF6', 'NF3', 'HFCS (SARGWP100)',
-                          'PFCS (SARGWP100)',
-                          'Unspecified mix of HFCs (SARGWP100)',
-                          'Unspecified mix of PFCs (SARGWP100)'],
-    'KYOTOGHG (AR4GWP100)': ['CO2', 'CH4', 'N2O', 'SF6', 'NF3', 'HFCS (AR4GWP100)',
-                          'PFCS (AR4GWP100)',
-                             'Unspecified mix of HFCs (AR4GWP100)', 'Unspecified mix of PFCs (AR4GWP100)'],
-    'KYOTOGHG (AR5GWP100)': ['CO2', 'CH4', 'N2O', 'SF6', 'NF3', 'HFCS (AR5GWP100)',
-                            'PFCS (AR5GWP100)',
-                             'Unspecified mix of HFCs (AR5GWP100)',
-                             'Unspecified mix of PFCs (AR5GWP100)'],
-}
+basket_copy_HFCPFC = {
+    'GWPs_to_add': ["AR4GWP100", "AR5GWP100", "AR6GWP100"],
+    'entities': ["HFCS", "PFCS"],
+    'source_GWP': gwp_to_use,
+}
+basket_copy_unspHFCPFC = {
+    'GWPs_to_add': ["AR4GWP100", "AR5GWP100", "AR6GWP100"],
+    'entities': ["UnspMixOfHFCs", "UnspMixOfPFCs"],
+    'source_GWP': gwp_to_use,
+}
+
+
+
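
Most of the new `di_processing_info` entries point both the `default` key and the dated key at one shared template. The pattern can be sketched as below. This is only an illustration: the helpers `make_basket_copy` and `register_template` are hypothetical, the contents of the `general` templates are assumed (they are not shown in this diff), and `gwp_to_use` is assumed to be `'SARGWP100'`.

```python
# Sketch only: helper names and the 'general' template contents are assumptions.
gwp_to_use = "SARGWP100"  # assumed value, not confirmed by this diff


def make_basket_copy(entities):
    """Build a 'basket_copy' block in the shape used by the config above."""
    return {
        "basket_copy": {
            "GWPs_to_add": ["AR4GWP100", "AR5GWP100", "AR6GWP100"],
            "entities": list(entities),
            "source_GWP": gwp_to_use,
        },
    }


di_processing_templates = {
    "general": {
        "copyUnspHFC": make_basket_copy(["UnspMixOfHFCs"]),
        "copyUnspHFCUnspPFC": make_basket_copy(
            ["UnspMixOfHFCs", "UnspMixOfPFCs"]
        ),
    },
}


def register_template(info, countries, template_name, date_key="DI2023-05-24"):
    """Point 'default' and the dated key of each country at one template."""
    template = di_processing_templates["general"][template_name]
    for country in countries:
        info[country] = {"default": template, date_key: template}
    return info


di_processing_info = {}
register_template(di_processing_info, ["BRN", "CHL", "GMB"], "copyUnspHFC")
register_template(di_processing_info, ["ARG", "BOL"], "copyUnspHFCUnspPFC")
```

Both keys of a country reference the same dict object, so changing the template in place affects every country that uses it. That matches the explicit entries in the diff, which also reuse the template objects rather than copying them.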

+ 91 - 7
UNFCCC_GHG_data/UNFCCC_DI_reader/UNFCCC_DI_reader_core.py

@@ -30,6 +30,7 @@ def read_UNFCCC_DI_for_country(
         pm2if_specifications: Optional[dict]=None,
         default_gwp: Optional[str]=None,
         debug: Optional[bool]=False,
+        use_zenodo: Optional[bool]=True,
 ):
     """
     reads data for a country from the UNFCCC DI interface and saves to native and
@@ -37,12 +38,20 @@ def read_UNFCCC_DI_for_country(
     """
 
     # read the data
-    data_df = read_UNFCCC_DI_for_country_df(
-        country_code=country_code,
-        category_groups=category_groups,
-        read_subsectors=read_subsectors,
-        debug=debug,
-    )
+    if use_zenodo:
+        data_df = read_UNFCCC_DI_for_country_df_zenodo(
+            country_code=country_code,
+            category_groups=category_groups,
+            read_subsectors=read_subsectors,
+            debug=debug,
+        )
+    else:
+        data_df = read_UNFCCC_DI_for_country_df(
+            country_code=country_code,
+            category_groups=category_groups,
+            read_subsectors=read_subsectors,
+            debug=debug,
+        )
 
     # set date_str if not given
     if date_str is None:
@@ -243,6 +252,79 @@ def read_UNFCCC_DI_for_country_df(
     return di_data
 
 
+def read_UNFCCC_DI_for_country_df_zenodo(
+        country_code: str,
+        category_groups: Optional[Dict]=None,
+        read_subsectors: bool=False,
+        debug: Optional[bool]=False,
+)->pd.DataFrame:
+    """
+    Read UNFCCC DI data for a given country. All data will be read,
+    including all categories, gases, measures, and classifications.
+    Filtering is done later on conversion to PRIMAP2 format.
+
+    Parameters
+    ----------
+    country_code: str
+        ISO3 code of the country (country names don't work, use the wrapper function)
+
+    category_groups: dict (optional)
+        define which categories to read including filters on classification, measure,
+        gases
+
+        cat_groups = {
+            "4.A  Enteric Fermentation": { #4.A  Enteric Fermentation[14577]
+                "measure": [
+                    'Net emissions/removals',
+                    'Total population',
+                ],
+                "gases": ["CH4"],
+            },
+        }
+
+    Returns
+    -------
+    pd.DataFrame with read data
+
+    """
+    if read_subsectors:
+        raise ValueError("Subsector reading is not possible with the Zenodo reader "
+                         "yet")
+
+    reader = unfccc_di_api.ZenodoReader()
+
+    di_data = reader.query(party_code=country_code)
+
+    # remove the "no_gas" data
+    di_data = di_data[di_data["gas"] != "No gas"]
+
+    if category_groups is not None:
+        di_data = di_data[di_data["category"].isin(category_groups)]
+
+    # if data has been collected print some information and save the data
+    if di_data is None or len(di_data) == 0:
+        raise ValueError(f"No data collected for country {country_code} and category "
+                         f"groups "
+                         f"{category_groups}")
+    elif debug:
+        # print some information on collected data
+        print(f"Collected data for country {country_code}")
+        print("### Categories ###")
+        categories = di_data["category"].unique()
+        categories.sort()
+        print(categories)
+        print("### Classifications ###")
+        classifications = di_data["classification"].unique()
+        classifications.sort()
+        print(classifications)
+        print("### Measures ###")
+        measures = di_data["measure"].unique()
+        measures.sort()
+        print(measures)
+
+    return di_data
+
+
 def convert_DI_data_to_pm2_if(
         data: pd.DataFrame,
         pm2if_specifications: Optional[dict]=None,
@@ -266,7 +348,6 @@ def convert_DI_data_to_pm2_if(
     data_temp = data.copy(deep=True)
 
     # check which country group we have
-    reader = unfccc_di_api.UNFCCCApiReader()
     parties_present_ai = [party for party in data_temp["party"].unique() if party
                           in AI_countries]
     parties_present_nai = [party for party in data_temp["party"].unique() if party
@@ -439,6 +520,9 @@ def read_UNFCCC_DI_for_country_group(
         except unfccc_di_api.NoDataError as err:
             print(f"No data for {country}.")
             print(err)
+        except ValueError as err:
+            print(f"ValueError for {country}.")
+            print(err)
 
     if annexI:
         data_all = pm2.pm2io.from_interchange_format(data_all_if, attrs=attrs,
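
The filtering steps of `read_UNFCCC_DI_for_country_df_zenodo` can be tried out without network access. The sketch below mirrors them on a plain list of dicts instead of a `pandas.DataFrame`; the function `filter_di_rows` and the sample rows are made up for illustration and are not part of the package.

```python
from typing import Dict, List, Optional


def filter_di_rows(
        rows: List[dict],
        category_groups: Optional[Dict] = None,
) -> List[dict]:
    """Mimic the filter steps of the Zenodo reader on plain dict rows."""
    # remove the "No gas" data
    kept = [row for row in rows if row["gas"] != "No gas"]

    # keep only the requested categories (membership test on the dict keys)
    if category_groups is not None:
        kept = [row for row in kept if row["category"] in category_groups]

    # an empty result is reported as ValueError, as in the function above
    if not kept:
        raise ValueError(
            f"No data collected for category groups {category_groups}"
        )
    return kept


# hypothetical sample rows
sample_rows = [
    {"category": "4.A  Enteric Fermentation", "gas": "CH4", "value": 1.0},
    {"category": "4.A  Enteric Fermentation", "gas": "No gas", "value": 5.0},
    {"category": "1.A  Fuel Combustion", "gas": "CO2", "value": 2.0},
]

selected = filter_di_rows(
    sample_rows, {"4.A  Enteric Fermentation": {"gases": ["CH4"]}}
)
```

Because an empty result raises `ValueError`, a caller that loops over many countries needs to catch that exception to continue with the next country, which is what the added `except ValueError` branch in `read_UNFCCC_DI_for_country_group` does.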

+ 9 - 2
UNFCCC_GHG_data/UNFCCC_DI_reader/UNFCCC_DI_reader_proc.py

@@ -8,11 +8,10 @@ from typing import Optional, Dict, List, Union
 
 from .UNFCCC_DI_reader_config import di_processing_info
 from .UNFCCC_DI_reader_config import cat_conversion
-from .UNFCCC_DI_reader_config import gas_baskets
 from .util import NoDIDataError, nAI_countries
 from .util import DI_date_format
 
-from UNFCCC_GHG_data.helper import process_data_for_country
+from UNFCCC_GHG_data.helper import process_data_for_country, gas_baskets
 from .UNFCCC_DI_reader_helper import find_latest_DI_data
 from .UNFCCC_DI_reader_helper import determine_filename
 
@@ -125,6 +124,14 @@ def process_UNFCCC_DI_for_country(
     else:
         processing_info_country_scen = None
 
+    # fill net emissions from actual emissions where necessary (e.g. 24540 for
+    # individual fgases)
+    if "Actual emissions" in data_country.coords["measure"].values:
+        data_country = data_country.pr.set("measure", "Net emissions/removals",
+                                           data_country.pr.loc[
+                                               {"measure": "Actual emissions"}],
+                                           existing='fillna')
+
     # 3: map categories
     if country_code in nAI_countries:
         # conversion from BURDI to IPCC2006_PRIMAP needed
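
The intent of the `existing='fillna'` step, as described by the comment in the hunk above, is to keep existing "Net emissions/removals" values and take "Actual emissions" only where net values are missing. The sketch below illustrates that semantics on plain dicts. It is not the primap2 API; `fill_missing` and the numbers are hypothetical.

```python
def fill_missing(target, source):
    """Return a copy of target where missing (None) entries come from source."""
    filled = dict(target)
    for year, value in source.items():
        # existing values win; only gaps are filled
        if filled.get(year) is None:
            filled[year] = value
    return filled


# hypothetical time series keyed by year
net_emissions = {"1990": 10.0, "1995": None}
actual_emissions = {"1990": 99.0, "1995": 12.5, "2000": 13.0}

result = fill_missing(net_emissions, actual_emissions)
```

In this sketch the 1990 value stays at 10.0 even though the source has a different number, which is the behaviour one would want when the net time series is considered authoritative.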

+ 1 - 1
UNFCCC_GHG_data/UNFCCC_DI_reader/process_UNFCCC_DI_for_country_group_datalad.py

@@ -11,7 +11,7 @@ import argparse
 parser = argparse.ArgumentParser()
 parser.add_argument('--annexI', help='read for AnnexI countries (default is for '
                                      'non-AnnexI)', action='store_true')
-parser.add_argument('--date', help='date of inout data to use (default is None '
+parser.add_argument('--date', help='date of input data to use (default is None '
                                        'to read latest data)', default=None)
 args = parser.parse_args()
 annexI = args.annexI

+ 0 - 3
UNFCCC_GHG_data/UNFCCC_reader/Argentina/read_ARG_BUR4_from_pdf.py

@@ -1,9 +1,6 @@
 # this script reads data from Argentina's BUR4
 # Data is read from the pdf file
 
-import os
-os.environ["UNFCCC_GHG_ROOT_PATH"] = \
-    "/storage/data/data/PRIMAP/primap_2.0/datasets/UNFCCC_non-AnnexI_data/"
 import sys
 import camelot
 import primap2 as pm2

+ 12 - 8
UNFCCC_GHG_data/UNFCCC_reader/Chile/read_CHL_BUR5_from_xlsx.py

@@ -23,8 +23,9 @@ if not output_folder.exists():
 
 output_filename = 'CHL_BUR5_2022_'
 
-inventory_file = 'Inventario_Nacional_de_GEI-1990-2018.xlsx'
-years_to_read = range(1990, 2018 + 1)
+inventory_file = '2022_GEI_CL.xlsx'
+years_to_read = range(1990, 2020 + 1)
+time_format='%Y'
 
 # configuration for conversion to PRIMAP2 data format
 unit_row = "header"
@@ -81,7 +82,7 @@ coords_defaults = {
     "source": "CHL-GHG-Inventory",
     "provenance": "measured",
     "area": "CHL",
-    "scenario": "BUR4"
+    "scenario": "BUR5"
 }
 
 coords_value_mapping = {
@@ -130,17 +131,17 @@ filter_remove = {
         "entity": ["Absorciones CO₂", "Emisiones CO₂"],
     },
     "f2": {
-        "orig_cat_name": ["Partidas informativas"],
+        "orig_cat_name": ["Partidas informativas", "Todas las emisiones nacionales"],
     },
 }
 
 filter_keep = {}
 
 meta_data = {
-    "references": "https://unfccc.int/documents/267936, https://snichile.mma.gob.cl/wp-content/uploads/2021/03/Inventario_Nacional_de_GEI-1990-2018.xlsx",
+    "references": "https://unfccc.int/documents/624735, https://snichile.mma.gob.cl/wp-content/uploads/2023/04/2022_GEI_CL.xlsx",
     "rights": "",
     "contact": "mail@johannes-guetschow.de.de",
-    "title": "Chile: BUR4",
+    "title": "Chile: BUR5",
     "comment": "Read from xlsx file by Johannes Gütschow",
     "institution": "United Nations Framework Convention on Climate Change (UNFCCC)",
 }
@@ -200,7 +201,8 @@ data_if = pm2.pm2io.convert_long_dataframe_if(
     coords_value_filling=coords_value_filling,
     filter_remove=filter_remove,
     filter_keep=filter_keep,
-    meta_data=meta_data
+    meta_data=meta_data,
+    time_format=time_format,
 )
 
 
@@ -232,7 +234,8 @@ data_if_2006 = pm2.pm2io.convert_long_dataframe_if(
     coords_value_filling=coords_value_filling,
     filter_remove=filter_remove,
     filter_keep=filter_keep,
-    meta_data=meta_data
+    meta_data=meta_data,
+    time_format=time_format
 )
 
 cat_label = 'category (' + coords_terminologies_2006["category"] + ')'
@@ -260,6 +263,7 @@ for cat_to_agg in aggregate_cats:
 
         df_combine = df_combine.groupby(
             by=['source', 'scenario (PRIMAP)', 'provenance', 'area (ISO3)', 'entity', 'unit']).sum()
+        df_combine = df_combine.drop(columns=["category (IPCC2006_PRIMAP)", "orig_cat_name"])
 
         df_combine.insert(0, cat_label, cat_to_agg)
         df_combine.insert(1, "orig_cat_name", aggregate_cats[cat_to_agg]["name"])

+ 2 - 2
UNFCCC_GHG_data/UNFCCC_reader/Indonesia/read_IDN_BUR3_from_pdf.py

@@ -44,7 +44,7 @@ cat_codes_manual = {
     #'3A2b Direct N2O Emissions from Manure Management': '3.A.2',
 }
 
-cat_code_regexp = r'(?P<UNFCCC_GHG_data>^[a-zA-Z0-9]{1,4})\s.*'
+cat_code_regexp = r'(?P<code>^[a-zA-Z0-9]{1,4})\s.*'
 
 coords_cols = {
     "category": "category",
@@ -195,7 +195,7 @@ df_all["category"] = df_all["orig_cat_name"]
 # first the manual replacements
 df_all["category"] = df_all["category"].replace(cat_codes_manual)
 # then the regex replacements
-repl = lambda m: m.group('UNFCCC_GHG_data')
+repl = lambda m: m.group('code')
 df_all["category"] = df_all["category"].str.replace(cat_code_regexp, repl, regex=True)
 df_all = df_all.reset_index(drop=True)
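
The renamed group `code` can be checked in isolation with the standard `re` module. This sketch applies the same pattern and replacement function to single strings instead of a pandas column; `extract_code` and the second and third sample names are made up for illustration.

```python
import re

cat_code_regexp = r'(?P<code>^[a-zA-Z0-9]{1,4})\s.*'


def extract_code(name: str) -> str:
    """Replace a category name by its leading code if the pattern matches."""
    return re.sub(cat_code_regexp, lambda m: m.group('code'), name)


names = [
    '3A2b Direct N2O Emissions from Manure Management',
    '1A Fuel Combustion',   # hypothetical name
    'Total emissions',      # first token longer than 4 characters: no match
]
codes = [extract_code(name) for name in names]
```

Names whose first token is longer than four characters do not match and are returned unchanged, so such rows have to be covered by the manual mapping in `cat_codes_manual`.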
 

+ 430 - 0
UNFCCC_GHG_data/UNFCCC_reader/Israel/config_ISR_BUR2.py

@@ -0,0 +1,430 @@
+#### configuration for trend tables
+import locale
+gwp_to_use = 'SARGWP100'
+terminology_proc = 'IPCC2006_PRIMAP'
+# bunkers [0,1] need different specs
+trend_table_def = {
+    # only GHG read, rest dropped
+    'GHG': {
+        'tables': [2],
+        'cols_add': {
+            'unit': 'ktCO2eq',
+            'category': '0',
+        },
+        'given_col': 'entity',
+        'take_only': ['Total GHG'],
+    },
+    'CO2': {
+        'tables': [3],
+        'cols_add': {
+            'unit': 'kt',
+            'entity': 'CO2',
+        },
+        'given_col': 'category',
+    },
+    'CH4': {
+        'tables': [5],
+        'cols_add': {
+            'unit': 'kt',
+            'entity': 'CH4',
+        },
+        'given_col': 'category',
+        'take_only': [
+            'Total emissions', 'From fuel combustion',
+            'From Industrial processes', 'From Agriculture'
+        ], # ignore the waste time series as they don't cover the full sector
+        # and lead to problems because of the methodology change in the inventory
+    },
+    'N2O': {
+        'tables': [6],
+        'cols_add': {
+            'unit': 'kt',
+            'entity': 'N2O',
+        },
+        'given_col': 'category',
+    },
+    'FGases': {
+        'tables': [7],
+        'cols_add': {
+            'unit': 'ktCO2eq',
+            'category': '0',
+        },
+        'given_col': 'entity',
+    },
+}
+
+#### configuration for inventory tables
+inv_tab_conf = {
+    'unit_row': 0,
+    'entity_row': 0,
+    'regex_unit': r"\((.*)\)",
+    'regex_entity': r"^(.*)\s\(",
+    'index_cols': 'category',
+    'cat_pos': (0, 0),
+    'header_long': ["category", "entity", "unit", "time", "data"],
+    'header_2010': ["2010", "CO2 emissions (Gg)", "CO2 removals (Gg)",
+                  "CH4 (Gg)", "N2O (Gg)", "CO (Gg)", "NOx (Gg)",
+                  "NMVOCs (Gg)", "SOx (Gg)", "SF6 (CO2eq Gg)",
+                  "HFCs (CO2eq Gg)", "PFCs (CO2eq Gg)"],
+    'unit_repl': {
+        "SF6 (CO2e Gg)": "GgCO2eq",
+        "HFCs (CO2eGg)": "GgCO2eq",
+        "PFCs (CO2e Gg)": "GgCO2eq",
+        "SF6 (CO2eq Gg)": "GgCO2eq",
+        "HFCs (CO2eq Gg)": "GgCO2eq",
+        "PFCs (CO2eq Gg)": "GgCO2eq",
+    },
+}
+
+inv_table_def = {
+    '1996': {'tables': [1, 2]},
+    '2000': {'tables': [3, 4]},
+    '2005': {'tables': [5, 6]},
+    '2010': {'tables': [7, 8]},
+    '2015': {'tables': [9, 10, 11]},
+    '2019': {'tables': [12, 13, 14]},
+    '2020': {'tables': [15, 16]},
+}
+
+#### configuration for PM2 format
+coords_cols = {
+    "category": "category",
+    "entity": "entity",
+    "unit": "unit",
+}
+
+coords_terminologies = {
+    "area": "ISO3",
+    "category": "BURDI_ISRBUR2",
+    "scenario": "PRIMAP",
+}
+
+coords_defaults = {
+    "source": "ISR-GHG-Inventory",
+    "provenance": "measured",
+    "area": "ISR",
+    "scenario": "BUR2",
+}
+
+coords_value_mapping = {
+    "unit": "PRIMAP1",
+    "category": {
+        'Total national emissions and removals': '24540',
+        '0': '24540', # no mapping, just for completeness
+        'Total emissions and removals': '24540',
+        'Total emissions': '24540',
+        '1. Energy': '1',
+        'A. Fuel combustion (sectoral approach)': '1.A',
+        'A. From fuel combustion': '1.A',
+        'From fuel combustion': '1.A',
+        '1. Energy industries': '1.A.1',
+        '2. Manufacturing industries and construction': '1.A.2',
+        '2. Manufacturing, industries and construction': '1.A.2',
+        '3. Transport': '1.A.3',
+        '4. Other sectors': '1.A.4',
+        '4. Other': '1.A.4',
+        'Commercial, institutional residential sectors': '1.A.4.ab', # not BURDI
+        'Commercial, institutional': '1.A.4.a', #not BURDI
+        'residential sectors': '1.A.4.b', #not BURDI
+        'Agriculture, forestry and fishing': '1.A.4.c', # not BURDI
+        '5. Other (please specify)': '1.A.5',
+        'B. Fugitive emissions from fuels': '1.B',
+        '1. Solid fuels': '1.B.1',
+        '2. Oil and natural gas': '1.B.2',
+        '2. Industrial processes': '2',
+        'B. industrial processes': '2',
+        'From Industrial processes': '2',
+        'A. Mineral products': '2.A',
+        'CEMENT PRODUCTION': '2.A.1',
+        'PRODUCTION OF LIME': '2.A.2',
+        'SODA ASH USE': '2.A.4.b',
+        'ROAD PAVING WITH ASPHALT': '2.A.6',
+        'Container Glass': '2.A.7.a',
+        'B. Chemical industry': '2.B',
+        'NITRIC ACID PRODUCTION': '2.B.2',
+        'Ethylene': '2.B.5.b',
+        'PRODUCTION OF OTHER CHEMICALS': '2.B.5.g', #not BURDI
+        'Sulphuric Acid': '2.B.5.f', #not BURDI
+        'C. Metal production': '2.C',
+        'D. Other production': '2.D',
+        'E. Production of halocarbons and sulphur hexafluoride': '2.E',
+        'F. Consumption of halocarbons and sulphur hexafluoride': '2.F',
+        'G. Other (IPPU)': '2.G',
+        '3. Solvent and other product use': '3',
+        '4. Agriculture': '4',
+        'From Agriculture': '4',
+        'From agriculture': '4',
+        'A. Enteric fermentation': '4.A',
+        'B. Manure management': '4.B',
+        'C. Rice cultivation': '4.C',
+        'D. Agricultural soils': '4.D',
+        'E. Prescribed burning of savannahs': '4.E',
+        'F. Field burning of agricultural residues': '4.F',
+        'G. Other (Agri)': '4.G',
+        '5. Land-use change and forestry': '5',
+        'C. Land-use change and forestry': '5',
+        'A. Changes in forest and other woody biomass stocks': '5.A',
+        '2. Changes in forest and other woody biomass stocks': '5.A',
+        'B. Forest and grassland conversion': '5.B',
+        'C. Abandonment of managed lands': '5.C',
+        'D. CO2 emissions and removals from soil': '5.D',
+        '1. CO2 emissions and removals from soil': '5.D',
+        'E. Other (LULUCF)': '5.E',
+        # waste in 2006 categories, not BURDI as we will lose info if we map to BURDI and back
+        '6. Waste': '6',
+        'A. Solid waste disposal on land': '6.A',
+        'From solid waste disposal on land': '6.A',
+        'B. Waste-water handling': '6X.B', # combine with 6.D
+        'From waste-water treatment': '6X.B', # not BURDI
+        'C. Waste incineration': '6.C',
+        'D. Other (please specify)': '6X.D', # combine with 6.E
+        'B. Biological Treatment of Solid Waste': '6.B', # not BURDI
+        'D.Waste-water handling': '6.D', # not BURDI
+        'D. Waste-water handling': '6.D', # not BURDI
+        'E. Other (Waste)': '6.E', # not BURDI
+        '7. Other (please specify)': '7',
+        'International bunkers': '14637',
+        'Aviation': '14424',
+        'Marine': '14423',
+        'CO2 emissions from biomass': '14638',
+    },
+    "entity": {
+        'Total GHG': f'KYOTOGHG ({gwp_to_use})',
+        'Carbon Dioxide (CO2)': 'CO2',
+        'CO2': 'CO2', # no mapping, just added for completeness here
+        'CO2 emissions': 'CO2 emissions', # no mapping, just added for completeness here
+        'CO2 removals': 'CO2 removals', # no mapping, just added for completeness here
+        'CO2 Emissions': 'CO2 emissions',
+        'CO2 Removals': 'CO2 removals',
+        'Methane (CH4)': 'CH4',
+        'CH4': 'CH4', # no mapping, just added for completeness here
+        'Nitrous Oxides (N2O)': 'N2O',
+        'NO2': 'NO2', # no mapping, just added for completeness here
+        'Sulfur hexafluoride (SF6)': f'SF6 ({gwp_to_use})',
+        'SF6': f'SF6 ({gwp_to_use})',
+        "Hydrofluorocarbons (HFC'S)": f'HFCS ({gwp_to_use})',
+        "HFCs": f'HFCS ({gwp_to_use})',
+        "Perfluorocarbons (PFC'S)": f'PFCS ({gwp_to_use})',
+        "PFCs": f'PFCS ({gwp_to_use})',
+        'NOx': 'NOX',
+        'Nox': 'NOX',
+        'Co': 'CO',
+        'CO': 'CO', # no mapping, just added for completeness here
+        'NMVOCs': 'NMVOC',
+        'SOx': 'SOX',
+    },
+}
+
+filter_remove = {
+    'rem_cat': {'category': ['Memo items', 'G. Other (please specify)']},
+    #'rem_ent': {'entity': ['GHG per capita', 'GHG per GDP (2015 prices)']},
+}
+
+filter_keep = {}
+
+meta_data = {
+    "references": "https://unfccc.int/documents/627150",
+    "rights": "",
+    "contact": "mail@johannes-guetschow.de",
+    "title": "Israel. Biennial update report (BUR). BUR2",
+    "comment": "Read from pdf by Johannes Gütschow",
+    "institution": "UNFCCC",
+}
+
+#### for processing
+# aggregate categories
+cats_to_agg = {
+    '1': {'sources': ['1.A'], 'name': 'Energy'}, # for trends
+    '1.A.4': {'sources': ['1.A.4.a', '1.A.4.b', '1.A.4.c', '1.A.4.ab'],
+              'name': 'Other sectors'},
+    '2.A.4': {'sources': ['2.A.4.b'], 'name': 'Soda Ash'},
+    '2.A.7': {'sources': ['2.A.7.a'], 'name': 'Other'},
+    '2.A': {'sources': ['2.A.1', '2.A.2', '2.A.4', '2.A.6', '2.A.7'], 'name': 'Mineral Products'},
+    '2.B.5': {'sources': ['2.B.5.f', '2.B.5.g'], 'name': 'Other'},
+    '2.B': {'sources': ['2.B.2', '2.B.5'], 'name': 'Chemical Industry'},
+    '6.D': {'sources': ['6.D', '6X.B'], 'name': 'Wastewater Treatment and Discharge'},
+    #'6.E': {'sources': ['6.E', '6X.D'], 'name': 'Other'}, # currently empty
+}
+
+# downscale
+# 1.A.4.ab
+downscaling = {
+    'sectors': {
+        '24540': {
+            'basket': '24540',
+            'basket_contents': ['2'],
+            'entities': ['SF6', 'HFCS (SARGWP100)', 'PFCS (SARGWP100)'],
+            'dim': f"category ({coords_terminologies['category']})",
+        },
+        '1.A': {
+            'basket': '1.A',
+            'basket_contents': ['1.A.1', '1.A.2', '1.A.3', '1.A.4'],
+            'entities': ['CO2', 'CH4', 'N2O'],
+            'dim': f"category ({coords_terminologies['category']})",
+            'tolerance': 0.05, # some inconsistencies (rounding?)
+        },
+        '1.A.4.ab': {
+            'basket': '1.A.4.ab',
+            'basket_contents': ['1.A.4.a', '1.A.4.b'],
+            'entities': ['CO2', 'CH4', 'N2O', 'SOX', 'NOX', 'CO'],
+            'dim': f"category ({coords_terminologies['category']})",
+        },
+        '1.A.4': {
+            'basket': '1.A.4',
+            'basket_contents': ['1.A.4.a', '1.A.4.b', '1.A.4.c'],
+            'entities': ['CO2', 'CH4', 'N2O'],
+            'dim': f"category ({coords_terminologies['category']})",
+        },
+        '2': {
+            'basket': '2',
+            'basket_contents': ['2.A', '2.B', '2.F'],
+            'entities': ['CO2', 'CH4', 'N2O', 'SF6', 'PFCS (SARGWP100)', 'HFCS (SARGWP100)'],
+            'dim': f"category ({coords_terminologies['category']})",
+        },
+        '2.A': {
+            'basket': '2.A',
+            'basket_contents': ['2.A.1', '2.A.2', '2.A.4', '2.A.7'],
+            'entities': ['CO2', 'CH4', 'N2O'],
+            'dim': f"category ({coords_terminologies['category']})",
+        },
+        '2.B': {
+            'basket': '2.B',
+            'basket_contents': ['2.B.2', '2.B.5'],
+            'entities': ['CO2', 'CH4', 'N2O'],
+            'dim': f"category ({coords_terminologies['category']})",
+        },
+        '4': {
+            'basket': '4',
+            'basket_contents': ['4.A', '4.B', '4.C', '4.D', '4.E', '4.F', '4.G'],
+            'entities': ['CH4', 'N2O'],
+            'dim': f"category ({coords_terminologies['category']})",
+        },
+        '5': {
+            'basket': '5',
+            'basket_contents': ['5.A', '5.D'], # the other sectors are 0
+            'entities': ['CO2'],
+            'dim': f"category ({coords_terminologies['category']})",
+        },
+    },
+}
+
+# map to IPCC2006
+cat_conversion = {
+    # ANNEXI to come (low priority as we read from CRF files)
+    'mapping': {
+        '1': '1',
+        '1.A': '1.A',
+        '1.A.1': '1.A.1',
+        '1.A.2': '1.A.2',
+        '1.A.3': '1.A.3',
+        '1.A.4': '1.A.4',
+        '1.A.4.a': '1.A.4.a',
+        '1.A.4.b': '1.A.4.b',
+        '1.A.4.c': '1.A.4.c',
+        '1.A.5': '1.A.5', # currently not needed
+        '1.B': '1.B', # currently not needed
+        '1.B.1': '1.B.1', # currently not needed
+        '1.B.2': '1.B.2', # currently not needed
+        '2': '2',
+        '2.A': '2.A',
+        '2.A.1': '2.A.1', # cement
+        '2.A.2': '2.A.2', # lime
+        '2.A.4': '2.A.4.b', # soda ash
+        '2.A.6': '2.A.5', # road paving with asphalt -> other
+        '2.A.7.a': '2.A.3', # glass
+        '2.B': 'M.2.B_2.B',
+        '2.B.2': '2.B.2', # nitric acid
+        '2.B.5.b': '2.B.8.b', # Ethylene
+        '2.B.5.f': 'M.2.B.10.a', # sulphuric acid
+        '2.B.5.g': 'M.2.B.10.b', # other chemicals
+        '2.C': '2.C',
+        '2.D': 'M.2.H.1_2',
+        '2.E': '2.B.9',
+        '2.F': '2.F',
+        '2.G': '2.H.3',
+        '4': 'M.AG',
+        '4.A': '3.A.1',
+        '4.B': '3.A.2',
+        '4.C': '3.C.7',
+        '4.D': 'M.3.C.45.AG',
+        '4.E': '3.C.1.c',
+        '4.F': '3.C.1.b',
+        '4.G': '3.C.8',
+        '5': 'M.LULUCF',
+        '6': '4',
+        '6.A': '4.A',
+        '6.B': '4.B',
+        '6.C': '4.C',
+        '6.D': '4.D',
+        '24540': '0',
+        '15163': 'M.0.EL',
+        '14637': 'M.BK',
+        '14424': 'M.BK.A',
+        '14423': 'M.BK.M',
+        '14638': 'M.BIO',
+        '7': '5',
+    }, #5.A-D ignored as not fitting 2006 cats
+
+    'aggregate': {
+        '2.A.4': {'sources': ['2.A.4.b'], 'name': 'Other uses of soda ashes'},
+        '2.B.8': {'sources': ['2.B.8.b'], 'name': 'Petrochemical and Carbon Black production'},
+        '2.B.10': {'sources': ['M.2.B.10.a', 'M.2.B.10.b'], 'name': 'Other'},
+        '2.B': {'sources': ['2.B.2', '2.B.8', '2.B.9', '2.B.10'], 'name': 'Chemical Industry'},
+        '2.H': {'sources': ['M.2.H.1_2', '2.H.3'], 'name': 'Other'},
+        # '2': {'sources': ['2.A', '2.B', '2.C', '2.F', '2.H'],
+        #       'name': 'Industrial Processes and Product Use'},
+        '3.A': {'sources': ['3.A.1', '3.A.2'], 'name': 'Livestock'},
+        '3.C.1': {'sources': ['3.C.1.b', '3.C.1.c'],
+                     'name': 'Emissions from biomass burning'},
+        'M.3.C.1.AG': {'sources': ['3.C.1.b', '3.C.1.c'],
+                     'name': 'Emissions from biomass burning (Agriculture)'},
+        '3.C': {'sources': ['3.C.1', 'M.3.C.45.AG', '3.C.7', '3.C.8'],
+                     'name': 'Aggregate sources and non-CO2 emissions sources on land'},
+        'M.3.C.AG': {'sources': ['M.3.C.1.AG', 'M.3.C.45.AG', '3.C.7', '3.C.8'],
+                     'name': 'Aggregate sources and non-CO2 emissions sources on land ('
+                             'Agriculture)'},
+        'M.AG.ELV': {'sources': ['M.3.C.AG'], 'name': 'Agriculture excluding livestock'},
+        '3': {'sources': ['M.AG', 'M.LULUCF'], 'name': 'AFOLU'},
+        'M.0.EL': {'sources': ['1', '2', 'M.AG', '4', '5'], 'name': 'National total '
+                                                                    'excluding LULUCF'},
+    },
+    'basket_copy': {
+        'GWPs_to_add': ["AR4GWP100", "AR5GWP100", "AR6GWP100"],
+        'entities': ["HFCS", "PFCS"],
+        'source_GWP': 'SARGWP100',
+    },
+}
+
+sectors_to_save = [
+    '1', '1.A', '1.A.1', '1.A.2', '1.A.3', '1.A.4', '1.A.4.a', '1.A.4.b', '1.A.4.c',
+    '1.A.5',
+    '1.B', '1.B.1', '1.B.2',
+    '2', '2.A', '2.A.1', '2.A.2', '2.A.3', '2.A.4', '2.A.5',
+    '2.B', '2.B.2', '2.B.8', '2.B.9', '2.B.10', '2.C', '2.F', '2.H',
+    '3', 'M.AG', '3.A', '3.A.1', '3.A.2',
+    '3.C', '3.C.1', 'M.3.C.1.AG', '3.C.7', 'M.3.C.45.AG', '3.C.8', 'M.3.C.AG',
+    'M.LULUCF', 'M.AG.ELV',
+    '4', '4.A', '4.B', '4.C', '4.D',
+    '0', 'M.0.EL', 'M.BK', 'M.BK.A', 'M.BK.M', 'M.BIO', '5']
+
+
+# gas baskets
+gas_baskets = {
+    'FGASES (SARGWP100)': ['HFCS (SARGWP100)', 'PFCS (SARGWP100)', 'SF6', 'NF3'],
+    'FGASES (AR4GWP100)': ['HFCS (AR4GWP100)', 'PFCS (AR4GWP100)', 'SF6', 'NF3'],
+    'FGASES (AR5GWP100)': ['HFCS (AR5GWP100)', 'PFCS (AR5GWP100)', 'SF6', 'NF3'],
+    'FGASES (AR6GWP100)': ['HFCS (AR6GWP100)', 'PFCS (AR6GWP100)', 'SF6', 'NF3'],
+    'KYOTOGHG (SARGWP100)': ['CO2', 'CH4', 'N2O', 'FGASES (SARGWP100)'],
+    'KYOTOGHG (AR4GWP100)': ['CO2', 'CH4', 'N2O', 'FGASES (AR4GWP100)'],
+    'KYOTOGHG (AR5GWP100)': ['CO2', 'CH4', 'N2O', 'FGASES (AR5GWP100)'],
+    'KYOTOGHG (AR6GWP100)': ['CO2', 'CH4', 'N2O', 'FGASES (AR6GWP100)'],
+}
+
+
+#### functions
+def is_int(input: str) -> bool:
+    try:
+        locale.atoi(input)
+        return True
+    except Exception:
+        return False
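
The `regex_unit` and `regex_entity` entries of `inv_tab_conf` in this config are meant to split table headers such as `CH4 (Gg)` into an entity and a unit. The following is a minimal standalone sketch of that split using only the standard library. The helper `split_header` is hypothetical and is not part of the commit or of primap2; how `nir_add_unit_information` applies the patterns internally is an assumption here.

```python
import re

# Patterns copied from inv_tab_conf in the config above.
REGEX_UNIT = r"\((.*)\)"
REGEX_ENTITY = r"^(.*)\s\("


def split_header(header: str, default_unit: str = "") -> tuple[str, str]:
    """Hypothetical helper: split 'CH4 (Gg)' into ('CH4', 'Gg')."""
    entity_match = re.search(REGEX_ENTITY, header)
    unit_match = re.search(REGEX_UNIT, header)
    # Headers without a parenthesised unit (e.g. 'category') are kept as-is.
    entity = entity_match.group(1) if entity_match else header
    unit = unit_match.group(1) if unit_match else default_unit
    return entity, unit


if __name__ == "__main__":
    for header in ["CO2 emissions (Gg)", "SF6 (CO2eq Gg)", "category"]:
        print(header, "->", split_header(header))
```

For a header like `SF6 (CO2eq Gg)` the extracted unit is `CO2eq Gg`, which is presumably why the config also carries the `unit_repl` table to map such headers to `GgCO2eq`.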

+ 299 - 0
UNFCCC_GHG_data/UNFCCC_reader/Israel/read_ISR_BUR2_from_pdf.py

@@ -0,0 +1,299 @@
+# read Israel's BUR2 from pdf
+
+# TODO: bunkers trend tables not read because of special format
+
+from UNFCCC_GHG_data.helper import process_data_for_country, GWP_factors
+from UNFCCC_GHG_data.helper import downloaded_data_path, extracted_data_path
+import camelot
+import primap2 as pm2
+import pandas as pd
+import locale
+
+# configuration import
+from config_ISR_BUR2 import trend_table_def, gwp_to_use
+from config_ISR_BUR2 import inv_tab_conf, inv_table_def
+from config_ISR_BUR2 import coords_cols, coords_terminologies, coords_defaults, \
+    coords_value_mapping, filter_remove, filter_keep, meta_data
+from config_ISR_BUR2 import cat_conversion, sectors_to_save, downscaling, \
+    cats_to_agg, gas_baskets, terminology_proc
+from config_ISR_BUR2 import is_int
+
+### general configuration
+input_folder = downloaded_data_path / 'UNFCCC' / 'Israel' / 'BUR2'
+output_folder = extracted_data_path / 'UNFCCC' / 'Israel'
+if not output_folder.exists():
+    output_folder.mkdir()
+
+output_filename = 'ISR_BUR2_2021_'
+inventory_file_pdf = '2nd_Biennial_Update_Report_2021_final.pdf'
+#years_to_read = range(1990, 2018 + 1)
+pages_to_read_trends = range(48, 54)
+pages_to_read_inventory = range(54, 66)
+
+# define locale to use for str to float conversion
+locale_to_use = 'en_IL.UTF-8'
+locale.setlocale(locale.LC_NUMERIC, locale_to_use)
+
+compression = dict(zlib=True, complevel=9)
+
+#### trend tables
+
+# read
+tables_trends = camelot.read_pdf(str(input_folder / inventory_file_pdf), pages=','.join(
+    [str(page) for page in pages_to_read_trends]), flavor='lattice')
+
+# convert to pm2
+table_trends = None
+for table in trend_table_def.keys():
+    current_def = trend_table_def[table]
+    new_table = None
+    for subtable in current_def['tables']:
+        if new_table is None:
+            new_table = tables_trends[subtable].df
+        else:
+            new_table = pd.concat([new_table, tables_trends[subtable].df])
+
+    for col in new_table.columns.values:
+        new_table[col] = new_table[col].str.replace("\n", "")
+
+    new_table.iloc[0, 0] = current_def['given_col']
+    new_table.columns = new_table.iloc[0]
+    new_table = new_table.drop(labels=[0])
+    new_table = new_table.reset_index(drop=True)
+
+    if 'take_only' in current_def.keys():
+        new_table = new_table[
+            new_table[current_def['given_col']].isin(current_def['take_only'])]
+
+    time_cols = [col for col in new_table.columns.values if is_int(col)]
+    for col in time_cols:
+        # no NE,NA etc, just numbers, so we can just remove the ','
+        new_table[col] = new_table[col].str.replace(',', '')
+        new_table[col] = new_table[col].str.replace(' ', '')
+
+    for col in current_def['cols_add']:
+        new_table[col] = current_def['cols_add'][col]
+
+    if table_trends is None:
+        table_trends = new_table
+    else:
+        table_trends = pd.concat([table_trends, new_table])
+
+# ###
+# convert to PRIMAP2 interchange format
+# ###
+data_if_trends = pm2.pm2io.convert_wide_dataframe_if(
+    table_trends,
+    coords_cols=coords_cols,
+    # add_coords_cols=add_coords_cols,
+    coords_defaults=coords_defaults,
+    coords_terminologies=coords_terminologies,
+    coords_value_mapping=coords_value_mapping,
+    # coords_value_filling=coords_value_filling,
+    filter_remove=filter_remove,
+    # filter_keep=filter_keep,
+    meta_data=meta_data,
+    convert_str=True,
+    time_format='%Y'
+)
+
+
+data_pm2_trends = pm2.pm2io.from_interchange_format(data_if_trends)
+
+#### inventory tables
+# read inventory tables
+tables_inv = camelot.read_pdf(
+    str(input_folder / inventory_file_pdf),
+    pages=','.join([str(page) for page in pages_to_read_inventory]),
+    flavor='lattice')
+
+# process
+table_inv = None
+for table in inv_table_def.keys():
+    new_table = None
+    print(f"working on year {table}")
+    for subtable in inv_table_def[table]['tables']:
+        print(f"adding table {subtable}")
+        if new_table is None:
+            new_table = tables_inv[subtable].df
+        else:
+            new_table = pd.concat([new_table, tables_inv[subtable].df], axis=0,
+                                  join='outer')
+            new_table = new_table.reset_index(drop=True)
+
+        # replace line breaks, double, and triple spaces in category names
+        new_table.iloc[:, 0] = new_table.iloc[:, 0].str.replace("\n", " ")
+        new_table.iloc[:, 0] = new_table.iloc[:, 0].str.replace("   ", " ")
+        new_table.iloc[:, 0] = new_table.iloc[:, 0].str.replace("  ", " ")
+
+    if table == "2010":
+        # table has a broken header; use the manually defined header_2010 instead
+        new_table.iloc[inv_tab_conf["entity_row"]] = inv_tab_conf["header_2010"]
+    else:
+        # replace line breaks in units and entities
+        new_table.iloc[inv_tab_conf["entity_row"]] = new_table.iloc[
+            inv_tab_conf["entity_row"]].str.replace('\n', '')
+
+    # get_year
+    year = new_table.iloc[inv_tab_conf["cat_pos"][0], inv_tab_conf["cat_pos"][1]]
+
+    # set category col label
+    new_table.iloc[inv_tab_conf["cat_pos"][0], inv_tab_conf["cat_pos"][1]] = 'category'
+
+    new_table = pm2.pm2io.nir_add_unit_information(
+        new_table,
+        unit_row=inv_tab_conf["unit_row"], entity_row=inv_tab_conf["entity_row"],
+        regexp_entity=inv_tab_conf["regex_entity"], regexp_unit=inv_tab_conf[
+            "regex_unit"],
+        default_unit="", manual_repl_unit=inv_tab_conf["unit_repl"])
+
+    # fix individual values
+    if table == '1996':
+        loc = new_table[new_table["category"] == "NITRIC ACID PRODUCTION"].index
+        value = new_table.loc[loc, "CH4"].values
+        new_table.loc[loc, "N2O"] = value[0, 0]
+        new_table.loc[loc, "CH4"] = ''
+    if table == '2015':
+        loc_total = new_table[
+            new_table["category"] == "Total national emissions and removals"].index
+        loc_IPPU = new_table[new_table["category"] == "2. Industrial processes"].index
+        value = new_table.loc[loc_IPPU, "PFCs"].values
+        new_table.loc[loc_total, "PFCs"] = value[0, 0]
+
+    # remove lines with empty category
+    new_table = new_table.drop(new_table[new_table["category"] == ""].index)
+
+    # rename E. Other (please specify) according to row above
+    e_locs = list(new_table[new_table["category"] == "E. Other (please specify)"].index)
+    for loc in e_locs:
+        iloc = new_table.index.get_loc(loc)
+        if new_table.iloc[iloc - 1]["category"][
+            0] == "D. CO2 emissions and removals from soil":
+            new_table.loc[loc, "category"] = "E. Other (LULUCF)"
+        elif new_table.iloc[iloc - 1]["category"][0] in ["D.Waste-water handling",
+                                                         'D. Waste-water handling']:
+            new_table.loc[loc, "category"] = "E. Other (Waste)"
+
+    # rename G. Other (please specify) according to row above
+    g_locs = list(new_table[new_table["category"] == "G. Other (please specify)"].index)
+    for loc in g_locs:
+        iloc = new_table.index.get_loc(loc)
+        if new_table.iloc[iloc - 1]["category"][
+            0] == "F. Field burning of agricultural residues":
+            new_table.loc[loc, "category"] = "G. Other (Agri)"
+        elif new_table.iloc[iloc - 1]["category"][
+            0] == "F. Consumption of halocarbons and sulphur hexafluoride":
+            new_table.loc[loc, "category"] = "G. Other (IPPU)"
+
+    # set index and convert to long format
+    new_table = new_table.set_index(inv_tab_conf["index_cols"])
+    new_table_long = pm2.pm2io.nir_convert_df_to_long(new_table, year,
+                                                      inv_tab_conf["header_long"])
+    # remove line breaks in values
+    new_table_long["data"] = new_table_long["data"].str.replace("\n", "")
+
+    if table_inv is None:
+        table_inv = new_table_long
+    else:
+        table_inv = pd.concat([table_inv, new_table_long], axis=0, join='outer')
+        table_inv = table_inv.reset_index(drop=True)
+
+# no NE,NA etc, just numbers, so we can just remove the ','
+table_inv["data"] = table_inv["data"].str.replace(',', '')
+table_inv["data"] = table_inv["data"].str.replace(' ', '')
+
+# ###
+# convert to PRIMAP2 interchange format
+# ###
+data_if_inv = pm2.pm2io.convert_long_dataframe_if(
+    table_inv,
+    coords_cols=coords_cols,
+    # add_coords_cols=add_coords_cols,
+    coords_defaults=coords_defaults,
+    coords_terminologies=coords_terminologies,
+    coords_value_mapping=coords_value_mapping,
+    # coords_value_filling=coords_value_filling,
+    filter_remove=filter_remove,
+    # filter_keep=filter_keep,
+    meta_data=meta_data,
+    convert_str=True,
+    time_format='%Y',
+)
+
+data_pm2_inv = pm2.pm2io.from_interchange_format(data_if_inv)
+
+#### combine
+# tolerance needs to be high as rounding in trend tables leads to inconsistent data
+data_pm2 = data_pm2_inv.pr.merge(data_pm2_trends, tolerance=0.11)
+# convert back to IF to have units in the fixed format
+data_if = data_pm2.pr.to_interchange_format()
+
+# ###
+# save data to IF and native format
+# ###
+if not output_folder.exists():
+    output_folder.mkdir()
+pm2.pm2io.write_interchange_format(
+    output_folder / (output_filename + coords_terminologies["category"] + "_raw"), data_if)
+
+encoding = {var: compression for var in data_pm2.data_vars}
+data_pm2.pr.to_netcdf(
+    output_folder / (output_filename + coords_terminologies["category"] + "_raw.nc"),
+    encoding=encoding)
+
+
+#### processing
+data_proc_pm2 = data_pm2
+
+# combine CO2 emissions and removals
+temp_CO2 = data_proc_pm2["CO2"].copy()
+#data_proc_pm2["CO2"] = data_proc_pm2[["CO2 emissions", "CO2 removals"]].to_array()
+# .pr.sum(dim="variable", skipna=True, min_count=1)
+data_proc_pm2["CO2"] = data_proc_pm2[["CO2 emissions", "CO2 removals"]].pr.sum(
+    dim="entity", skipna=True, min_count=1)
+data_proc_pm2["CO2"].attrs = temp_CO2.attrs
+data_proc_pm2["CO2"] = data_proc_pm2["CO2"].fillna(temp_CO2)
+
+# actual processing
+country_processing_step1 = {
+    'aggregate_cats': cats_to_agg,
+}
+data_proc_pm2 = process_data_for_country(
+    data_proc_pm2,
+    entities_to_ignore=['CO2 emissions', 'CO2 removals'],
+    gas_baskets={},
+    processing_info_country=country_processing_step1,
+)
+
+country_processing_step2 = {
+    'downscale': downscaling,
+}
+data_proc_pm2 = process_data_for_country(
+    data_proc_pm2,
+    entities_to_ignore=[],
+    gas_baskets=gas_baskets,
+    processing_info_country=country_processing_step2,
+    cat_terminology_out=terminology_proc,
+    category_conversion=cat_conversion,
+    sectors_out=sectors_to_save,
+)
+
+# adapt source and metadata
+# TODO: processing info is present twice
+current_source = data_proc_pm2.coords["source"].values[0]
+data_temp = data_proc_pm2.pr.loc[{"source": current_source}]
+data_proc_pm2 = data_proc_pm2.pr.set("source", 'BUR_NIR', data_temp)
+
+# ###
+# save data to IF and native format
+# ###
+data_proc_if = data_proc_pm2.pr.to_interchange_format()
+if not output_folder.exists():
+    output_folder.mkdir()
+pm2.pm2io.write_interchange_format(
+    output_folder / (output_filename + terminology_proc), data_proc_if)
+
+encoding = {var: compression for var in data_proc_pm2.data_vars}
+data_proc_pm2.pr.to_netcdf(
+    output_folder / (output_filename + terminology_proc + ".nc"),
+    encoding=encoding)
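
The reader script above renames the ambiguous rows `E. Other (please specify)` and `G. Other (please specify)` depending on the category in the row directly above them. The idea can be sketched without pandas as follows. The helper `rename_by_previous_row` is hypothetical and only illustrates the lookup; it works on a plain list of labels rather than on the DataFrame the script uses.

```python
def rename_by_previous_row(categories, ambiguous, rename_map):
    """Hypothetical helper: replace `ambiguous` labels using the row above."""
    renamed = []
    for position, label in enumerate(categories):
        if label == ambiguous and position > 0:
            previous = categories[position - 1]
            # Unknown predecessors leave the label unchanged.
            label = rename_map.get(previous, label)
        renamed.append(label)
    return renamed


if __name__ == "__main__":
    rows = [
        "F. Field burning of agricultural residues",
        "G. Other (please specify)",
        "F. Consumption of halocarbons and sulphur hexafluoride",
        "G. Other (please specify)",
    ]
    mapping = {
        "F. Field burning of agricultural residues": "G. Other (Agri)",
        "F. Consumption of halocarbons and sulphur hexafluoride": "G. Other (IPPU)",
    }
    print(rename_by_previous_row(rows, "G. Other (please specify)", mapping))
```

Keeping the predecessor-to-label pairs in a dictionary would let the same routine serve both the `E.` and `G.` cases; whether that fits the actual table layout is untested here.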

+ 676 - 0
UNFCCC_GHG_data/UNFCCC_reader/Malaysia/config_MYS_BUR3.py

@@ -0,0 +1,676 @@
+import pandas as pd
+gwp_to_use = "AR4GWP100"
+
+
+cat_names_fix = {
+    '2A3 Glass Prod.': '2A3 Glass Production',
+    '2F6 Other Applications': '2F6 Other Applications (please specify)',
+    '3A2 Manure Mngmt': '3A2 Manure Mngmt.',
+    '3C7 Rice Cultivations': '3C7 Rice Cultivation',
+}
+
+values_replacement = {
+    '': '-',
+    ' ': '-',
+}
+
+cols_for_space_stripping = ["Categories"]
+
+index_cols = ["Categories", "entity", "unit"]
+
+# parameters part 2: conversion to interchange format
+cats_remove = ['Memo items', 'Information items']
+
+cat_codes_manual = {
+    'Annual change in long-term storage of carbon in HWP waste': 'M.LTS.AC.HWP',
+    'Annual change in total long-term storage of carbon stored': 'M.LTS.AC.TOT',
+    'CO2 captured': 'M.CCS',
+    'CO2 from Biomass Burning for Energy Production': 'M.BIO',
+    'For domestic storage': 'M.CCS.DOM',
+    'For storage in other countries': 'M.CCS.OCT',
+    'International Aviation (International Bunkers)': 'M.BK.A',
+    'International Bunkers': 'M.BK',
+    'International Water-borne Transport (International Bunkers)': 'M.BK.M',
+    'Long-term storage of carbon in waste disposal sites': 'M.LTS.WASTE',
+    'Multilateral Operations': 'M.MULTIOP',
+    'Other (please specify)': 'M.OTHER',
+    'Total National Emissions and Removals': '0',
+}
+
+cat_code_regexp = r'(?P<code>^[A-Z0-9]{1,4})\s.*'
+
+coords_terminologies = {
+    "area": "ISO3",
+    "category": "IPCC2006_PRIMAP",
+    "scenario": "PRIMAP",
+}
+
+coords_defaults = {
+    "source": "MYS-GHG-inventory",
+    "provenance": "measured",
+    "area": "MYS",
+    "scenario": "BUR3"
+}
+
+coords_value_mapping = {
+}
+
+coords_cols = {
+    "category": "Categories",
+    "entity": "entity",
+    "unit": "unit"
+}
+
+add_coords_cols = {
+    "orig_cat_name": ["orig_cat_name", "category"],
+}
+
+meta_data = {
+    "references": "https://unfccc.int/documents/267685",
+    "rights": "",
+    "contact": "mail@johannes-guetschow.de",
+    "title": "Malaysia - Third Biennial Update Report to the UNFCCC",
+    "comment": "Read from pdf file by Johannes Gütschow",
+    "institution": "United Nations Framework Convention on Climate Change (UNFCCC)",
+}
+
+terminology_proc = coords_terminologies["category"]
+
+table_def_templates = {
+    '184': { #184
+        "area": ['54,498,793,100'],
+        "cols": ['150,197,250,296,346,394,444,493,540,587,637,685,738'],
+        "rows_to_fix": {
+            3: ['Total National', '1A Fuel Combustion', '1A1 Energy', '1A2 Manufacturing',
+                '1B Fugitive', '1B2 Oil and Natural', '1B3 Other emissions',
+                '1C Carbon Dioxide', '2 INDUSTRIAL', '2A1 Cement',
+               ],
+        },
+    },
+    '185': { #184
+        "area": ['34,504,813,99'],
+        "cols": ['128,177,224,273,321,373,425,473,519,564,611,661,713,765'],
+        "rows_to_fix": {
+            3: ['Total National', '1A Fuel', '1A1 Energy', '1A2 Manufacturing',
+                '1B Fugitive', '1B2 Oil and Natural', '1B3 Other',
+                '1C Carbon Dioxide', '2 INDUSTRIAL', '2A Mineral',
+                '2A1 Cement', '2A2 Lime',
+               ],
+        },
+    },
+    '186': { #also 200
+        "area": ['53,498,786,104'],
+        "cols": ['150,197,238,296,347,396,444,489,540,587,634,686,739'],
+        "rows_to_fix": {
+            3: ['2A3 Glass', '2A4 Other Process', '2A5 Other (please',
+                '2B Chemical', '2B1 Ammonia', '2B2 Nitric Acid',
+                '2B3 Adipic Acid', '2B4 Caprolactam,', '2B5 Carbide',
+                '2B6 Titanium', '2B7 Soda Ash', '2B8 Petrochemical',
+                '2B10 Other (Please', '2C1 Iron and Steel', '2C2 Ferroalloys'
+               ],
+            2: ['2B9 Fluorochemical'],
+        },
+    },
+    '187': { # also 201
+        "area": ['39,499,807,91'],
+        "cols": ['132,185,232,280,327,375,425,470,522,568,613,664,713,763'],
+        "rows_to_fix": {
+            3: ['2A3 Glass', '2A4 Other Process', '2A5 Other (please',
+                '2B Chemical', '2B1 Ammonia', '2B2 Nitric Acid',
+                '2B3 Adipic Acid', '2B5 Carbide',
+                '2B6 Titanium', '2B7 Soda Ash', '2B8 Petrochemical',
+                '2B10 Other (Please', '2C1 Iron and Steel', '2C2 Ferroalloys',
+               ],
+            2: ['2B9 Fluorochemical'],
+            5: ['2B4 Caprolactam,'],
+        },
+    },
+    '188': {
+        "area": ['48,503,802,92'],
+        "cols": ['146,194,245,295,346,400,452,500,549,596,642,695,746'],
+        "rows_to_fix": {
+            3: ['2C3 Aluminium', '2C4 Magnesium', '2C7 Other (please',
+                '2D Non-Energy', '2D2 Paraffin Wax', '2D4 Other (please',
+                '2E Electronics', '2E1 Integrated', '2E5 Other (please',
+                '2F1 Refrigeration',
+               ],
+            2: ['2E2 TFT Flat Panel', '2E4 Heat Transfer'],
+            5: ['2F Product Uses as'],
+        },
+    },
+    '189': {
+        "area": ['41,499,806,95'],
+        "cols": ['141,184,233,282,331,376,427,472,520,567,618,665,717,760'],
+        "rows_to_fix": {
+            3: ['2C3 Aluminium', '2C4 Magnesium', '2C7 Other (please',
+                '2D Non-Energy', '2D2 Paraffin Wax', '2D4 Other (please',
+                '2E Electronics', '2E1 Integrated', '2E5 Other (please',
+                '2F1 Refrigeration',
+               ],
+            2: ['2E2 TFT Flat Panel', '2E4 Heat Transfer'],
+            5: ['2F Product Uses as'],
+        },
+    },
+    '190': {
+        "area": ['45,500,802,125'],
+        "cols": ['146,193,243,295,349,400,453,501,549,595,644,696,748'],
+        "rows_to_fix": {
+            3: ['2F2 Foam Blowing', '2F6 Other', '2G Other Product',
+                '2G2 SF6 and PFCs', '2G4 Other (Please', '2H1 Pulp and Paper',
+                '2H2 Food and', '2H3 Other (please', '3 AGRICULTURE,',
+               ],
+            2: ['2G1 Electrical', '2G3 N2O from', '3A1 Enteric'],
+        },
+    },
+    '191': {
+        "area": ['38,498,814,120'],
+        "cols": ['130,180,229,277,326,381,429,477,526,570,620,669,717,765'],
+        "rows_to_fix": {
+            3: ['2F2 Foam Blowing', '2F6 Other', '2G Other Product',
+                '2G2 SF6 and PFCs', '2G4 Other (Please', '2H1 Pulp and Paper',
+                '2H2 Food and', '2H3 Other (please', '3 AGRICULTURE,',
+               ],
+            2: ['2G1 Electrical', '2G3 N2O from', '3A1 Enteric'],
+        },
+    },
+    '192': {
+        "area": ['39,502,807,106'],
+        "cols": ['134,193,245,296,346,400,455,507,556,602,650,701,755'],
+        "rows_to_fix": {
+            3: ['3C1 Emissions from', '3C4 Direct N2O', '3C5 Indirect N2O',
+                '3C6 Indirect N2O', '3C8 Other (please', '3D1 Harvested Wood',
+                '3D2 Other (please',
+               ],
+            5: ['3C Aggregate',],
+        },
+    },
+    '193': {
+        "area": ['36,508,815,119'],
+        "cols": ['128,179,228,278,327,379,428,476,525,571,622,670,717,766'],
+        "rows_to_fix": {
+            3: ['3C1 Emissions from', '3C4 Direct N2O', '3C5 Indirect N2O',
+                '3C6 Indirect N2O', '3C8 Other (please', '3D1 Harvested',
+                '3D2 Other (please',
+               ],
+            5: ['3C Aggregate',],
+        },
+    },
+    '194': {
+        "area": ['80,502,762,151'],
+        "cols": ['201,243,285,329,376,419,462,502,551,591,635,679,724'],
+        "rows_to_fix": {
+            3: ['4C Incineration and', '4C2 Open Burning of', '4E Other',],
+            2: ['4A1 Managed Waste', '4A2 Unmanaged Waste', '4A3 Uncategorised Waste',
+                '4B Biological Treatment', '4D Wastewater', '4D1 Domestic Wastewater',
+                '4D2 Industrial Wastewater',
+               ],
+            5: ['5A Indirect N2O'],
+        },
+    },
+    '195': {
+        "area": ['78,508,765,103'],
+        "cols": ['191,230,271,314,352,400,438,475,519,566,600,645,686,730'],
+        "rows_to_fix": {
+            3: ['4C Incineration and', '4C2 Open Burning of', '4E Other',
+                '4B Biological', '4D Wastewater', '4D1 Domestic',
+                '4D2 Industrial', '5B Other (please'
+               ],
+            2: ['4A1 Managed Waste', '4A2 Unmanaged Waste', '4A3 Uncategorised',
+                '4A Solid Waste',
+               ],
+            5: ['5A Indirect N2O'],
+        },
+    },
+    '196': {
+        "area": ['80,502,762,151'],
+        "cols": ['201,243,285,329,376,419,462,502,551,591,635,679,724'],
+        "rows_to_fix": {
+            3: ['International Aviation', 'International Water-borne',
+                'CO2 from Biomass Burning', 'For storage in other',
+                'Long-term storage of', 'Annual change in total',
+                'Annual change in long-',
+               ],
+        },
+    },
+    '197': {
+        "area": ['74,507,779,201'],
+        "cols": ['182,226,268,311,354,398,444,482,524,565,610,654,693,733'],
+        "rows_to_fix": {
+            3: ['International Aviation', 'International Water-',
+                'CO2 from Biomass', 'For storage in other',
+                'Long-term storage of', 'Annual change in total',
+                'Annual change in long-',
+               ],
+        },
+    },
+    '198': { # first CH4 table
+        "area": ['54,498,793,100'],
+        "cols": ['140,197,250,296,346,394,444,493,540,587,637,685,738'],
+        "rows_to_fix": {
+            3: ['Total National', '1A Fuel Combustion', '1A1 Energy', '1A2 Manufacturing',
+                '1B Fugitive', '1B2 Oil and Natural', '1B3 Other emissions',
+                '1C Carbon Dioxide', '2 INDUSTRIAL', '2A1 Cement',
+               ],
+            -3: ['2A Mineral Industry'],
+        },
+    },
+    '199': {
+        "area": ['34,506,818,97'],
+        "cols": ['132,177,228,276,329,377,432,479,528,574,618,667,722,774'],
+        "rows_to_fix": {
+            3: ['Total National', '1A Fuel', '1A1 Energy', '1A2 Manufacturing',
+                '1B Fugitive', '1B2 Oil and Natural', '1B3 Other',
+                '1C Carbon Dioxide', '2 INDUSTRIAL', '2A1 Cement',
+                '2A Mineral', '2A2 Lime',
+               ],
+        },
+    },
+    '202': {
+        "area": ['48,503,802,92'],
+        "cols": ['146,194,245,295,346,400,452,500,549,596,642,695,746'],
+        "rows_to_fix": {
+            3: ['2C3 Aluminium', '2C7 Other (please',
+                '2D Non-Energy', '2D2 Paraffin Wax', '2D4 Other (please',
+                '2E Electronics', '2E1 Integrated', '2E5 Other (please',
+               ],
+            2: ['2C4 Magnesium', '2E2 TFT Flat Panel', '2E4 Heat Transfer',
+                '2F1 Refrigeration',
+               ],
+            5: ['2F Product Uses as'],
+        },
+    },
+    '203': {
+        "area": ['41,499,806,95'],
+        "cols": ['141,184,233,282,331,376,427,472,520,567,618,665,717,760'],
+        "rows_to_fix": {
+            3: ['2C3 Aluminium', '2C7 Other (please',
+                '2D Non-Energy', '2D2 Paraffin Wax', '2D4 Other (please',
+                '2E Electronics', '2E1 Integrated', '2E5 Other (please',
+               ],
+            2: ['2C4 Magnesium', '2E2 TFT Flat Panel', '2E4 Heat Transfer',
+                '2F1 Refrigeration'
+               ],
+            5: ['2F Product Uses as'],
+        },
+    },
+    '204': {
+        "area": ['45,500,802,125'],
+        "cols": ['146,193,243,295,349,400,455,501,549,595,644,696,748'],
+        "rows_to_fix": {
+            3: ['2F6 Other', '2G Other Product',
+                '2G2 SF6 and PFCs', '2G4 Other (Please', '2H1 Pulp and Paper',
+                '2H2 Food and', '2H3 Other (please', '3 AGRICULTURE,',
+                '3A1 Enteric',
+               ],
+            2: ['2F2 Foam Blowing', '2G1 Electrical', '2G3 N2O from'],
+        },
+    },
+    '205': {
+        "area": ['38,498,814,120'],
+        "cols": ['130,180,229,277,326,381,429,477,526,570,620,669,717,765'],
+        "rows_to_fix": {
+            3: ['2F6 Other', '2G Other Product',
+                '2G2 SF6 and PFCs', '2G4 Other (Please', '2H1 Pulp and Paper',
+                '2H2 Food and', '2H3 Other (please', '3 AGRICULTURE,',
+                '3A1 Enteric',
+               ],
+            2: ['2F2 Foam Blowing', '2G1 Electrical', '2G3 N2O from'],
+        },
+    },
+    '206': { # also 220
+        "area": ['39,502,807,106'],
+        "cols": ['134,193,245,296,346,400,455,507,556,602,650,701,755'],
+        "rows_to_fix": {
+            3: ['3C1 Emissions from', '3C4 Direct N2O', '3C5 Indirect N2O',
+                '3C6 Indirect N2O', '3C8 Other (please',
+                '3D2 Other (please',
+               ],
+            2: ['3D1 Harvested Wood',],
+            5: ['3C Aggregate',],
+        },
+    },
+    '207': { # also 221
+        "area": ['36,508,815,110'],
+        "cols": ['128,179,228,278,327,379,428,476,527,571,622,670,717,766'],
+        "rows_to_fix": {
+            3: ['3C1 Emissions from', '3C4 Direct N2O', '3C5 Indirect N2O',
+                '3C6 Indirect N2O', '3C8 Other (please',
+                '3D2 Other (please',
+               ],
+            2: ['3D1 Harvested',],
+            5: ['3C Aggregate',],
+        },
+    },
+    '208': { # also 222
+        "area": ['80,502,762,151'],
+        "cols": ['201,243,285,329,376,419,462,502,551,591,635,679,724'],
+        "rows_to_fix": {
+            3: ['4C Incineration and', '4C2 Open Burning of', '4E Other',
+                '4A1 Managed Waste', '4A2 Unmanaged Waste', '4A3 Uncategorised Waste',
+                '4B Biological Treatment', '4D Wastewater', '4D1 Domestic Wastewater',
+                '4D2 Industrial Wastewater'
+               ],
+            5: ['5A Indirect N2O'],
+        },
+    },
+    '209': { # also 223
+        "area": ['78,508,765,103'],
+        "cols": ['191,230,271,314,352,400,438,475,519,560,600,645,686,730'],
+        "rows_to_fix": {
+            3: ['4C Incineration and', '4C2 Open Burning of', '4E Other',
+                '4B Biological', '4D Wastewater', '4D1 Domestic',
+                '4D2 Industrial', '5B Other (please',
+                '4A1 Managed Waste', '4A2 Unmanaged Waste', '4A3 Uncategorised',
+                '4A Solid Waste'
+               ],
+            5: ['5A Indirect N2O'],
+        },
+    },
+    '210': { # also 224
+        "area": ['80,502,762,151'],
+        "cols": ['201,243,285,329,376,419,462,502,551,591,635,679,724'],
+        "rows_to_fix": {
+            3: ['International Aviation', 'International Water-borne',
+                'Long-term storage of', 'Annual change in total',
+                'Annual change in long-',
+               ],
+            2: ['CO2 from Biomass Burning', 'For storage in other',],
+        },
+    },
+    '211': { # also 225
+        "area": ['74,507,779,201'],
+        "cols": ['182,226,268,311,354,398,444,482,524,565,610,654,693,733'],
+        "rows_to_fix": {
+            3: ['International Aviation', 'International Water-',
+                'Long-term storage of', 'Annual change in total',
+                'Annual change in long-', 'CO2 from Biomass',
+               ],
+            2: ['For storage in other',],
+        },
+    },
+    '212': {
+        "area": ['54,498,793,100'],
+        "cols": ['150,197,250,296,346,394,444,493,540,587,637,685,738'],
+        "rows_to_fix": {
+            3: ['Total National', '1A Fuel Combustion', '1A1 Energy', '1A2 Manufacturing',
+                '1B Fugitive', '1B2 Oil and Natural', '1B3 Other emissions',
+                '1C Carbon Dioxide', '2 INDUSTRIAL',
+               ],
+            2: ['2A1 Cement',],
+        },
+    },
+    '213': {
+        "area": ['34,504,813,99'],
+        "cols": ['128,177,224,273,321,373,425,473,519,564,611,661,713,765'],
+        "rows_to_fix": {
+            3: ['Total National', '1A Fuel', '1A1 Energy', '1A2 Manufacturing',
+                '1B Fugitive', '1B2 Oil and Natural', '1B3 Other',
+                '1C Carbon Dioxide', '2 INDUSTRIAL', '2A Mineral',
+               ],
+            2: ['2A1 Cement', '2A2 Lime',],
+        },
+    },
+    '214': {
+        "area": ['47,499,801,93'],
+        "cols": ['141,197,246,297,350,396,453,502,550,595,642,692,748'],
+        "rows_to_fix": {
+            3: ['2A5 Other (please',
+                '2B Chemical', '2B1 Ammonia', '2B2 Nitric Acid',
+                '2B3 Adipic Acid', '2B4 Caprolactam,', '2B5 Carbide',
+                '2B6 Titanium', '2B7 Soda Ash', '2B8 Petrochemical',
+                '2B10 Other (Please', '2C1 Iron and Steel', '2C2 Ferroalloys'
+               ],
+            2: ['2A3 Glass', '2A4 Other Process', '2B9 Fluorochemical'],
+            -3: ['2C Metal Industry'],
+        },
+    },
+    '215': {
+        "area": ['39,499,807,91'],
+        "cols": ['132,180,232,280,327,375,425,470,522,568,613,664,713,763'],
+        "rows_to_fix": {
+            3: ['2A5 Other (please',
+                '2B Chemical', '2B1 Ammonia', '2B2 Nitric Acid',
+                '2B3 Adipic Acid', '2B4 Caprolactam,', '2B5 Carbide',
+                '2B6 Titanium Dioxide', '2B7 Soda Ash', '2B8 Petrochemical',
+                '2B10 Other (Please', '2C1 Iron and Steel', '2C2 Ferroalloys'
+               ],
+            2: ['2A4 Other Process', '2B9 Fluorochemical'],
+            -3: ['2C Metal Industry'],
+        },
+    },
+    '216': {
+        "area": ['48,503,802,92'],
+        "cols": ['146,194,245,295,346,400,452,500,549,596,642,695,746'],
+        "rows_to_fix": {
+            3: ['2C7 Other (please', '2D Non-Energy', '2D2 Paraffin Wax',
+                '2D4 Other (please', '2E Electronics', '2E1 Integrated',
+                '2E5 Other (please',
+               ],
+            2: ['2C3 Aluminium', '2C4 Magnesium', '2E2 TFT Flat Panel',
+                '2E4 Heat Transfer', '2F1 Refrigeration',
+               ],
+            5: ['2F Product Uses as'],
+        },
+    },
+    '217': {
+        "area": ['41,499,806,95'],
+        "cols": ['141,184,233,282,331,376,427,472,520,567,618,665,717,760'],
+        "rows_to_fix": {
+            3: ['2C7 Other (please', '2D Non-Energy', '2D2 Paraffin Wax',
+                '2D4 Other (please', '2E Electronics', '2E1 Integrated',
+                '2E5 Other (please',
+               ],
+            2: ['2C3 Aluminium', '2C4 Magnesium', '2E2 TFT Flat Panel',
+                '2E4 Heat Transfer', '2F1 Refrigeration',
+               ],
+            5: ['2F Product Uses as'],
+        },
+    },
+    '218': {
+        "area": ['45,500,802,125'],
+        "cols": ['146,193,243,295,349,400,455,501,549,595,644,696,748'],
+        "rows_to_fix": {
+            3: ['2F6 Other', '2G Other Product', '2G2 SF6 and PFCs',
+                '2G3 N2O from', '2H3 Other (please', '3 AGRICULTURE,',
+               ],
+            2: ['2F2 Foam Blowing', '2G1 Electrical', '2G4 Other (Please',
+                '2H1 Pulp and Paper', '2H2 Food and', '3A1 Enteric',],
+        },
+    },
+    '219': {
+        "area": ['38,498,814,120'],
+        "cols": ['130,180,229,277,326,381,429,477,526,570,620,669,717,765'],
+        "rows_to_fix": {
+            3: ['2F6 Other', '2G Other Product', '2G2 SF6 and PFCs',
+                '2G3 N2O from', '2H3 Other (please', '3 AGRICULTURE,',
+               ],
+            2: ['2F2 Foam Blowing', '2G1 Electrical', '2G4 Other (Please',
+                '2H1 Pulp and Paper', '2H2 Food and', '3A1 Enteric',],
+        },
+    },
+    '226': { # also 234, 238
+        "area": ['48,510,797,99'],
+        "cols": ['271,310,350,393,435,475,514,557,594,640,678,719,760'],
+        "rows_to_fix": {
+            2: ['2B4 Caprolactam, Glyoxal and Glyoxylic Acid'],
+        }
+    },
+    '227': { # also 231, 235, 239
+        "area": ['27,510,818,99'],
+        "cols": ['250,290,333,372,413,452,494,536,576,616,656,699,739,781'],
+        "rows_to_fix": {
+            2: ['2B4 Caprolactam, Glyoxal and Glyoxylic Acid'],
+        }
+    },
+    '228': {
+        "area": ['48,510,797,99'],
+        "cols": ['271,310,350,393,435,475,514,557,594,640,678,719,760'],
+        "rows_to_fix": {
+            3: ['2F Product Uses as Substitutes for Ozone'],
+            2: ['2D Non-Energy Products from Fuels and Solvent'],
+        },
+    },
+    '229': {
+        "area": ['25,512,819,86'],
+        "cols": ['246,291,331,370,412,454,495,536,577,619,656,699,740,777'],
+        "rows_to_fix": {
+            3: ['2F Product Uses as Substitutes for Ozone'],
+            2: ['2D Non-Energy Products from Fuels and Solvent'],
+        },
+    },
+    '230': {
+        "area": ['48,510,797,99'],
+        "cols": ['271,310,350,393,435,475,514,557,594,640,678,719,760'],
+        "rows_to_fix": {
+            -3: ['Total National Emissions and Removals', '2 INDUSTRIAL PROCESSES AND PRODUCT USE'],
+            2: ['2B4 Caprolactam, Glyoxal and Glyoxylic Acid'],
+        }
+    },
+    '232': { # also 236
+        "area": ['48,510,797,99'],
+        "cols": ['271,310,350,393,435,475,514,557,594,640,678,719,760'],
+        "rows_to_fix": {
+            -3: ['2G2 SF6 and PFCs from Other Product Uses',],
+            2: ['2D Non-Energy Products from Fuels and Solvent',
+                '2F Product Uses as Substitutes for Ozone',]
+        },
+    },
+    '233': {
+        "area": ['25,512,819,86'],
+        "cols": ['246,291,331,370,412,454,495,536,577,619,656,699,740,777'],
+        "rows_to_fix": {
+            -5: ['2F Product Uses as Substitutes for Ozone'],
+            2: ['2D Non-Energy Products from Fuels and Solvent'],
+            -3: ['2G Other Product Manufacture and Use',
+                 '2G2 SF6 and PFCs from Other Product Uses',]
+        },
+    },
+    '237': {
+        "area": ['25,512,819,86'],
+        "cols": ['246,291,331,370,412,454,495,536,577,619,656,699,740,777'],
+        "rows_to_fix": {
+            2: ['2D Non-Energy Products from Fuels and Solvent',
+                '2F Product Uses as Substitutes for Ozone'],
+        },
+    },
+    '240': {
+        "area": ['48,510,797,99'],
+        "cols": ['271,310,350,393,435,475,514,557,594,640,678,719,760'],
+        "rows_to_fix": {
+            2: ['2D Non-Energy Products from Fuels and Solvent',
+                '2F Product Uses as Substitutes for Ozone'],
+            -3: ['2E Electronics Industry',
+                 '2F1 Refrigeration and Air Conditioning',
+                 '2G2 SF6 and PFCs from Other Product Uses',],
+        },
+    },
+    '241': {
+        "area": ['25,512,819,86'],
+        "cols": ['246,291,331,370,412,454,495,536,577,619,656,699,740,777'],
+        "rows_to_fix": {
+            2: ['2D Non-Energy Products from Fuels and Solvent',
+                '2F Product Uses as Substitutes for Ozone',
+                '2E1 Integrated Circuit or Semiconductor',],
+            -3: ['2F1 Refrigeration and Air Conditioning',
+                 '2G2 SF6 and PFCs from Other Product Uses',],
+        },
+    },
+}
+
+table_defs = {
+    '184': {"template": '184', "entity": "CO2", "unit": "Gg CO2 / yr"}, #CO2
+    '185': {"template": '185', "entity": "CO2", "unit": "Gg CO2 / yr"},
+    '186': {"template": '186', "entity": "CO2", "unit": "Gg CO2 / yr"},
+    '187': {"template": '187', "entity": "CO2", "unit": "Gg CO2 / yr"},
+    '188': {"template": '188', "entity": "CO2", "unit": "Gg CO2 / yr"},
+    '189': {"template": '189', "entity": "CO2", "unit": "Gg CO2 / yr"},
+    '190': {"template": '190', "entity": "CO2", "unit": "Gg CO2 / yr"},
+    '191': {"template": '191', "entity": "CO2", "unit": "Gg CO2 / yr"},
+    '192': {"template": '192', "entity": "CO2", "unit": "Gg CO2 / yr"},
+    '193': {"template": '193', "entity": "CO2", "unit": "Gg CO2 / yr"},
+    '194': {"template": '194', "entity": "CO2", "unit": "Gg CO2 / yr"},
+    '195': {"template": '195', "entity": "CO2", "unit": "Gg CO2 / yr"},
+    '196': {"template": '196', "entity": "CO2", "unit": "Gg CO2 / yr"},
+    '197': {"template": '197', "entity": "CO2", "unit": "Gg CO2 / yr"},
+    '198': {"template": '198', "entity": "CH4", "unit": "Gg CH4 / yr"}, #CH4
+    '199': {"template": '199', "entity": "CH4", "unit": "Gg CH4 / yr"},
+    '200': {"template": '186', "entity": "CH4", "unit": "Gg CH4 / yr"},
+    '201': {"template": '187', "entity": "CH4", "unit": "Gg CH4 / yr"},
+    '202': {"template": '202', "entity": "CH4", "unit": "Gg CH4 / yr"},
+    '203': {"template": '203', "entity": "CH4", "unit": "Gg CH4 / yr"},
+    '204': {"template": '204', "entity": "CH4", "unit": "Gg CH4 / yr"},
+    '205': {"template": '205', "entity": "CH4", "unit": "Gg CH4 / yr"},
+    '206': {"template": '206', "entity": "CH4", "unit": "Gg CH4 / yr"},
+    '207': {"template": '207', "entity": "CH4", "unit": "Gg CH4 / yr"},
+    '208': {"template": '208', "entity": "CH4", "unit": "Gg CH4 / yr"},
+    '209': {"template": '209', "entity": "CH4", "unit": "Gg CH4 / yr"},
+    '210': {"template": '210', "entity": "CH4", "unit": "Gg CH4 / yr"},
+    '211': {"template": '211', "entity": "CH4", "unit": "Gg CH4 / yr"},
+    '212': {"template": '212', "entity": "N2O", "unit": "Gg N2O / yr"}, #N2O
+    '213': {"template": '213', "entity": "N2O", "unit": "Gg N2O / yr"},
+    '214': {"template": '214', "entity": "N2O", "unit": "Gg N2O / yr"},
+    '215': {"template": '215', "entity": "N2O", "unit": "Gg N2O / yr"},
+    '216': {"template": '216', "entity": "N2O", "unit": "Gg N2O / yr"},
+    '217': {"template": '217', "entity": "N2O", "unit": "Gg N2O / yr"},
+    '218': {"template": '218', "entity": "N2O", "unit": "Gg N2O / yr"},
+    '219': {"template": '219', "entity": "N2O", "unit": "Gg N2O / yr"},
+    '220': {"template": '206', "entity": "N2O", "unit": "Gg N2O / yr"},
+    '221': {"template": '207', "entity": "N2O", "unit": "Gg N2O / yr"},
+    '222': {"template": '208', "entity": "N2O", "unit": "Gg N2O / yr"},
+    '223': {"template": '209', "entity": "N2O", "unit": "Gg N2O / yr"},
+    '224': {"template": '210', "entity": "N2O", "unit": "Gg N2O / yr"},
+    '225': {"template": '211', "entity": "N2O", "unit": "Gg N2O / yr"},
+    '226': {"template": '226', "entity": "HFCS (AR4GWP100)", "unit": "Gg CO2 / yr"}, #HFCs
+    '227': {"template": '227', "entity": "HFCS (AR4GWP100)", "unit": "Gg CO2 / yr"},
+    '228': {"template": '228', "entity": "HFCS (AR4GWP100)", "unit": "Gg CO2 / yr"},
+    '229': {"template": '229', "entity": "HFCS (AR4GWP100)", "unit": "Gg CO2 / yr"},
+    '230': {"template": '230', "entity": "PFCS (AR4GWP100)", "unit": "Gg CO2 / yr"}, #PFCs
+    '231': {"template": '227', "entity": "PFCS (AR4GWP100)", "unit": "Gg CO2 / yr"},
+    '232': {"template": '232', "entity": "PFCS (AR4GWP100)", "unit": "Gg CO2 / yr"},
+    '233': {"template": '233', "entity": "PFCS (AR4GWP100)", "unit": "Gg CO2 / yr"},
+    '234': {"template": '226', "entity": "SF6 (AR4GWP100)", "unit": "Gg CO2 / yr"}, #SF6
+    '235': {"template": '227', "entity": "SF6 (AR4GWP100)", "unit": "Gg CO2 / yr"},
+    '236': {"template": '232', "entity": "SF6 (AR4GWP100)", "unit": "Gg CO2 / yr"},
+    '237': {"template": '237', "entity": "SF6 (AR4GWP100)", "unit": "Gg CO2 / yr"},
+    '238': {"template": '226', "entity": "NF3 (AR4GWP100)", "unit": "Gg CO2 / yr"}, #NF3
+    '239': {"template": '227', "entity": "NF3 (AR4GWP100)", "unit": "Gg CO2 / yr"},
+    '240': {"template": '240', "entity": "NF3 (AR4GWP100)", "unit": "Gg CO2 / yr"},
+    '241': {"template": '241', "entity": "NF3 (AR4GWP100)", "unit": "Gg CO2 / yr"},
+}
+
+country_processing_step1 = {
+    'aggregate_cats': {
+        'M.3.C.AG': {'sources': ['3.C.1', '3.C.2', '3.C.3', '3.C.4', '3.C.5',
+                                 '3.C.6', '3.C.7', '3.C.8'],
+                     'name': 'Aggregate sources and non-CO2 emissions sources on land '
+                             '(Agriculture)'},
+        'M.3.D.AG': {'sources': ['3.D.2'],
+                     'name': 'Other (Agriculture)'},
+        'M.AG.ELV': {'sources': ['M.3.C.AG', 'M.3.D.AG'],
+                     'name': 'Agriculture excluding livestock'},
+        'M.AG': {'sources': ['3.A', 'M.AG.ELV'],
+                     'name': 'Agriculture'},
+        'M.3.D.LU': {'sources': ['3.D.1'],
+                     'name': 'Other (LULUCF)'},
+        'M.LULUCF': {'sources': ['3.B', 'M.3.D.LU'],
+                     'name': 'LULUCF'},
+        'M.0.EL': {'sources': ['1', '2', 'M.AG', '4', '5'],
+                     'name': 'National total emissions excluding LULUCF'},
+    },
+    'basket_copy': {
+        'GWPs_to_add': ["SARGWP100", "AR5GWP100", "AR6GWP100"],
+        'entities': ["HFCS", "PFCS"],
+        'source_GWP': gwp_to_use,
+    },
+}
+
+gas_baskets = {
+    'FGASES (SARGWP100)': ['HFCS (SARGWP100)', 'PFCS (SARGWP100)', 'SF6', 'NF3'],
+    'FGASES (AR4GWP100)': ['HFCS (AR4GWP100)', 'PFCS (AR4GWP100)', 'SF6', 'NF3'],
+    'FGASES (AR5GWP100)': ['HFCS (AR5GWP100)', 'PFCS (AR5GWP100)', 'SF6', 'NF3'],
+    'FGASES (AR6GWP100)': ['HFCS (AR6GWP100)', 'PFCS (AR6GWP100)', 'SF6', 'NF3'],
+    'KYOTOGHG (SARGWP100)': ['CO2', 'CH4', 'N2O', 'FGASES (SARGWP100)'],
+    'KYOTOGHG (AR4GWP100)': ['CO2', 'CH4', 'N2O', 'FGASES (AR4GWP100)'],
+    'KYOTOGHG (AR5GWP100)': ['CO2', 'CH4', 'N2O', 'FGASES (AR5GWP100)'],
+    'KYOTOGHG (AR6GWP100)': ['CO2', 'CH4', 'N2O', 'FGASES (AR6GWP100)'],
+}
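
Each `table_defs` entry above names a `template` key in `table_def_templates`, so several pages can share one set of `area`, `cols`, and `rows_to_fix` settings. The following is a minimal sketch of how such a lookup could be resolved and checked for dangling template keys. `resolve_table_defs` is a hypothetical helper, not part of the repository, and the reader scripts may handle this differently. The excerpt deliberately leaves out template '227' so that the check has something to report.

```python
# Hypothetical helper (an assumption for illustration, not repository code):
# merge each table's template with its entity and unit, and collect the
# pages whose template key does not exist.
def resolve_table_defs(table_defs, table_def_templates):
    resolved, missing = {}, []
    for page, table_def in table_defs.items():
        template = table_def_templates.get(table_def["template"])
        if template is None:
            missing.append(page)
            continue
        resolved[page] = {
            **template,
            "entity": table_def["entity"],
            "unit": table_def["unit"],
        }
    return resolved, missing


# Excerpt of the configuration shown above; template '227' is omitted on purpose.
table_def_templates = {
    '226': {
        "area": ['48,510,797,99'],
        "cols": ['271,310,350,393,435,475,514,557,594,640,678,719,760'],
        "rows_to_fix": {2: ['2B4 Caprolactam, Glyoxal and Glyoxylic Acid']},
    },
}
table_defs = {
    '226': {"template": '226', "entity": "HFCS (AR4GWP100)", "unit": "Gg CO2 / yr"},
    '234': {"template": '226', "entity": "SF6 (AR4GWP100)", "unit": "Gg CO2 / yr"},
    '231': {"template": '227', "entity": "PFCS (AR4GWP100)", "unit": "Gg CO2 / yr"},
}

resolved, missing = resolve_table_defs(table_defs, table_def_templates)
print(sorted(resolved))  # pages that resolved to a full definition
print(missing)           # pages pointing at a template that is not defined
```

Running a check like this over the full dictionaries would flag any page whose `template` value has no matching entry before the PDF is read.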

+ 402 - 0
UNFCCC_GHG_data/UNFCCC_reader/Malaysia/config_MYS_BUR4.py

@@ -0,0 +1,402 @@
+import pandas as pd
+gwp_to_use = "AR4GWP100"
+
+
+cat_names_fix = {
+    #'2A3 Glass Prod.': '2A3 Glass Production',
+    #'2F6 Other Applications': '2F6 Other Applications (please specify)',
+    #'3A2 Manure Mngmt': '3A2 Manure Mngmt.',
+    #'3C7 Rice Cultivations': '3C7 Rice Cultivation',
+}
+
+values_replacement = {
+    '': '-',
+    ' ': '-',
+}
+
+cols_for_space_stripping = ["Categories"]
+
+index_cols = ["Categories", "entity", "unit"]
+
+# parameters part 2: conversion to interchange format
+cats_remove = ['Memo items', 'Information items', 'Information items (1)']
+
+cat_codes_manual = {
+    'Annual change in long-term storage of carbon in HWP waste': 'M.LTS.AC.HWP',
+    'Annual change in total long-term storage of carbon stored': 'M.LTS.AC.TOT',
+    'CO2 captured': 'M.CCS',
+    'CO2 from Biomass Burning for Energy Production': 'M.BIO',
+    'For domestic storage': 'M.CCS.DOM',
+    'For storage in other countries': 'M.CCS.OCT',
+    'International Aviation (International Bunkers)': 'M.BK.A',
+    'International Bunkers': 'M.BK',
+    'International Water-borne Transport (International Bunkers)': 'M.BK.M',
+    'Long-term storage of carbon in waste disposal sites': 'M.LTS.WASTE',
+    'Multilateral Operations': 'M.MULTIOP',
+    'Other (please specify)': 'M.OTHER',
+    'Total National Emissions and Removals': '0',
+}
+
+cat_code_regexp = r'(?P<code>^[A-Z0-9]{1,4})\s.*'
+
+
+coords_terminologies = {
+    "area": "ISO3",
+    "category": "IPCC2006_PRIMAP",
+    "scenario": "PRIMAP",
+}
+
+coords_defaults = {
+    "source": "MYS-GHG-inventory",
+    "provenance": "measured",
+    "area": "MYS",
+    "scenario": "BUR4"
+}
+
+coords_value_mapping = {
+}
+
+coords_cols = {
+    "category": "Categories",
+    "entity": "entity",
+    "unit": "unit"
+}
+
+add_coords_cols = {
+    "orig_cat_name": ["orig_cat_name", "category"],
+}
+
+#filter_remove = {
+#    "f1": {
+#        "entity": ["CO2(grossemissions)", "CO2(removals)"],
+#    },
+#}
+
+meta_data = {
+    "references": "https://unfccc.int/documents/624776",
+    "rights": "",
+    "contact": "mail@johannes-guetschow.de",
+    "title": "Malaysia - Fourth Biennial Update Report under the UNFCCC",
+    "comment": "Read from pdf file by Johannes Gütschow",
+    "institution": "United Nations Framework Convention on Climate Change (UNFCCC)",
+}
+
+terminology_proc = coords_terminologies["category"]
+
+table_def_templates = {
+    # CO2
+    '203': {  # 203, 249
+        "area": ['70,480,768,169'],
+    },
+    '204': {  # 204
+        "area": ['70,500,763,141'],
+    },
+    '205': {  # 205, 209, 214, 218
+        "area": ['70,495,763,95'],
+        "rows_to_fix": {
+            2: ['5A Indirect N2O emissions from the atmospheric deposition of'],
+        },
+    },
+    '206': {  # 206, 210
+        "area": ['70,495,763,353'],
+    },
+    '207': {  # 207, 208, 211, 212, 213, 215, 217, 223, 227, 231,
+        # 251, 257, 259, 263, 265
+        "area": ['70,495,763,95'],
+    },
+    '216': {  #  216
+        "area": ['70,500,763,95'],
+    },
+    # CH4
+    '219': {  # 219, 255
+        "area": ['70,480,768,100'],
+    },
+    '220': {  # 220, 224, 228
+        "area": ['70,495,763,95'],
+        "rows_to_fix": {
+            3: ['2F Product Uses as Substitutes for Ozone Depleting'],
+        },
+    },
+    '221': {  # 221
+        "area": ['92,508,748,92'],
+        "cols": ['298,340,380,422,462,502,542,582,622,662,702'],
+        "rows_to_fix": {
+            3: ['3C Aggregate sources and Non-CO2 emissions'],
+            2: ['5A Indirect N2O emissions from the atmospheric'],
+        },
+    },
+    '222': {  # 222
+        "area": ['70,495,763,323'],
+        "rows_to_fix": {
+            2: ['Annual change in long-term storage of carbon in HWP'],
+        },
+    },
+    '225': {  # 225
+        "area": ['92,508,748,92'],
+        "cols": ['311,357,400,443,486,529,572,615,658,701'],
+        "rows_to_fix": {
+            3: ['3C Aggregate sources and Non-CO2 emissions'],
+        },
+    },
+    '226': {  # 226, 230
+        "area": ['70,495,763,95'],
+        "rows_to_fix": {
+            2: ['5A Indirect N2O emissions from the atmospheric',
+                'Annual change in long-term storage of carbon in HWP'],
+        },
+    },
+    '229': {  # 229
+        "area": ['114,508,725,92'],
+        "cols": ['333,379,421,464,506,548,590,632,674'],
+        "rows_to_fix": {
+            3: ['3C Aggregate sources and Non-CO2 emissions'],
+        },
+    },
+    # N2O
+    '232': {  # 232
+        "area": ['70,495,763,95'],
+        "cols": ['315,366,416,466,516,566,616,666,716'],
+        "rows_to_fix": {
+            3: ['2F Product Uses as Substitutes for Ozone Depleting'],
+        },
+    },
+    '233': {  # 233
+        "area": ['70,495,763,95'],
+        "rows_to_fix": {
+            3: ['3C Aggregate sources and Non-CO2 emissions'],
+        },
+    },
+    '234': {  # 234
+        "area": ['70,495,763,95'],
+        "rows_to_fix": {
+            3: ['International Water-borne Transport (International'],
+        },
+    },
+    '236': {  # 236
+        "area": ['70,495,763,95'],
+        "cols": ['298,344,392,439,487,534,580,629,675,721'],
+        "rows_to_fix": {
+            3: ['2F Product Uses as Substitutes for Ozone Depleting'],
+        },
+    },
+    '240': {  # 240
+        "area": ['70,495,763,95'],
+        "cols": ['283,329,372,416,459,504,550,594,639,682,726'],
+        "rows_to_fix": {
+            3: ['2F Product Uses as Substitutes for Ozone Depleting'],
+        },
+    },
+    # HFCs
+    '243': {  # 243
+        "area": ['70,480,763,95'],
+        "cols": ['408,449,489,527,567,604,644,681,721'],
+    },
+    '244': {  # 244
+        "area": ['70,495,763,95'],
+        "cols": ['408,449,489,527,567,604,644,681,721'],
+    },
+    '245': {  # 245, 246
+        "area": ['70,495,763,95'],
+        "cols": ['405,442,478,515,550,587,621,657,693,729'],
+    },
+    '247': {  # 247, 248
+        "area": ['70,495,763,95'],
+        "cols": ['384,426,459,493,531,564,597,633,666,700,735'],
+    },
+    # PFCs
+    '250': {  # 250
+        "area": ['70,495,763,95'],
+        "cols": ['341,389,436,485,531,579,626,674,723'],
+    },
+    '252': {  # 252
+        "area": ['70,495,763,95'],
+        "cols": ['323,370,415,459,504,547,590,636,680,726'],
+    },
+    '253': {  # 253
+        "area": ['70,495,763,95'],
+        "cols": ['334,378,419,464,511,554,597,636,668,702,735'],
+    },
+    '254': {  # 254
+        "area": ['70,495,763,95'],
+        "cols": ['330,378,419,464,511,554,597,636,668,702,735'],
+        "rows_to_fix": {
+            -3: ['2F Product Uses as Substitutes for Ozone Depleting Substances'],
+        },
+    },
+    # SF6
+    '256': {  # 256
+        "area": ['70,495,763,95'],
+        "cols": ['382,420,462,504,546,588,630,672,714'],
+        "rows_to_fix": {
+            3: ['2F Product Uses as Substitutes for Ozone Depleting'],
+        },
+    },
+    '258': {  # 258
+        "area": ['70,495,763,95'],
+        "cols": ['363,399,441,481,522,564,606,646,688,728'],
+        "rows_to_fix": {
+            3: ['2F Product Uses as Substitutes for Ozone Depleting'],
+        },
+    },
+    '260': {  # 260
+        "area": ['70,495,763,95'],
+        "cols": ['346,381,419,458,498,536,576,614,652,692,732'],
+        "rows_to_fix": {
+            3: ['2F Product Uses as Substitutes for Ozone Depleting'],
+        },
+    },
+    # NF3
+    '261': {  # 261
+        "area": ['70,490,768,100'],
+        "cols": ['364,412,454,496,538,581,623,667,710'],
+    },
+    '262': {  # 262
+        "area": ['70,495,763,95'],
+        "cols": ['376,420,462,504,545,591,633,676,718'],
+        "rows_to_fix": {
+            3: ['2F Product Uses as Substitutes for Ozone Depleting'],
+        },
+    },
+    '264': {  # 264
+        "area": ['70,495,763,95'],
+        "cols": ['370,415,451,491,530,569,609,651,689,729'],
+        "rows_to_fix": {
+            3: ['2F Product Uses as Substitutes for Ozone Depleting'],
+        },
+    },
+    '266': {  # 266
+        "area": ['70,495,763,95'],
+        "cols": ['355,392,430,467,505,544,580,619,656,695,732'],
+        "rows_to_fix": {
+            3: ['2F Product Uses as Substitutes for Ozone Depleting'],
+        },
+    },
+}
+
+table_defs = {
+    '203': {"template": '203', "entity": "CO2", "unit": "Gg CO2 / yr"},  # CO2
+    '204': {"template": '204', "entity": "CO2", "unit": "Gg CO2 / yr"},
+    '205': {"template": '205', "entity": "CO2", "unit": "Gg CO2 / yr"},
+    '206': {"template": '206', "entity": "CO2", "unit": "Gg CO2 / yr"},
+    '207': {"template": '207', "entity": "CO2", "unit": "Gg CO2 / yr"},
+    '208': {"template": '207', "entity": "CO2", "unit": "Gg CO2 / yr"},
+    '209': {"template": '205', "entity": "CO2", "unit": "Gg CO2 / yr"},
+    '210': {"template": '206', "entity": "CO2", "unit": "Gg CO2 / yr"},
+    '211': {"template": '207', "entity": "CO2", "unit": "Gg CO2 / yr"},
+    '212': {"template": '207', "entity": "CO2", "unit": "Gg CO2 / yr"},
+    '213': {"template": '207', "entity": "CO2", "unit": "Gg CO2 / yr"},
+    '214': {"template": '205', "entity": "CO2", "unit": "Gg CO2 / yr"},
+    '215': {"template": '207', "entity": "CO2", "unit": "Gg CO2 / yr"},
+    '216': {"template": '216', "entity": "CO2", "unit": "Gg CO2 / yr"},
+    '217': {"template": '207', "entity": "CO2", "unit": "Gg CO2 / yr"},
+    '218': {"template": '205', "entity": "CO2", "unit": "Gg CO2 / yr"},
+    '219': {"template": '219', "entity": "CH4", "unit": "Gg CH4 / yr"},  # CH4
+    '220': {"template": '220', "entity": "CH4", "unit": "Gg CH4 / yr"},
+    '221': {"template": '221', "entity": "CH4", "unit": "Gg CH4 / yr"},
+    '222': {"template": '222', "entity": "CH4", "unit": "Gg CH4 / yr"},
+    '223': {"template": '207', "entity": "CH4", "unit": "Gg CH4 / yr"},
+    '224': {"template": '220', "entity": "CH4", "unit": "Gg CH4 / yr"},
+    '225': {"template": '225', "entity": "CH4", "unit": "Gg CH4 / yr"},
+    '226': {"template": '226', "entity": "CH4", "unit": "Gg CH4 / yr"},
+    '227': {"template": '207', "entity": "CH4", "unit": "Gg CH4 / yr"},
+    '228': {"template": '220', "entity": "CH4", "unit": "Gg CH4 / yr"},
+    '229': {"template": '229', "entity": "CH4", "unit": "Gg CH4 / yr"},
+    '230': {"template": '226', "entity": "CH4", "unit": "Gg CH4 / yr"},
+    '231': {"template": '207', "entity": "N2O", "unit": "Gg N2O / yr"},  # N2O
+    '232': {"template": '232', "entity": "N2O", "unit": "Gg N2O / yr"},
+    '233': {"template": '233', "entity": "N2O", "unit": "Gg N2O / yr"},
+    '234': {"template": '234', "entity": "N2O", "unit": "Gg N2O / yr"},
+    '235': {"template": '207', "entity": "N2O", "unit": "Gg N2O / yr"},
+    '236': {"template": '236', "entity": "N2O", "unit": "Gg N2O / yr"},
+    '237': {"template": '233', "entity": "N2O", "unit": "Gg N2O / yr"},
+    '238': {"template": '234', "entity": "N2O", "unit": "Gg N2O / yr"},
+    '239': {"template": '207', "entity": "N2O", "unit": "Gg N2O / yr"},
+    '240': {"template": '240', "entity": "N2O", "unit": "Gg N2O / yr"},
+    '241': {"template": '233', "entity": "N2O", "unit": "Gg N2O / yr"},
+    '242': {"template": '234', "entity": "N2O", "unit": "Gg N2O / yr"},
+    '243': {"template": '243', "entity": f"HFCS ({gwp_to_use})",
+            "unit": "Gg CO2 / yr"},  # HFCs
+    '244': {"template": '244', "entity": f"HFCS ({gwp_to_use})",
+            "unit": "Gg CO2 / yr"},
+    '245': {"template": '245', "entity": f"HFCS ({gwp_to_use})",
+            "unit": "Gg CO2 / yr"},
+    '246': {"template": '245', "entity": f"HFCS ({gwp_to_use})",
+            "unit": "Gg CO2 / yr"},
+    '247': {"template": '247', "entity": f"HFCS ({gwp_to_use})",
+            "unit": "Gg CO2 / yr"},
+    '248': {"template": '247', "entity": f"HFCS ({gwp_to_use})",
+            "unit": "Gg CO2 / yr"},
+    '249': {"template": '203', "entity": f"PFCS ({gwp_to_use})",
+            "unit": "Gg CO2 / yr"},  # PFCs
+    '250': {"template": '250', "entity": f"PFCS ({gwp_to_use})",
+            "unit": "Gg CO2 / yr"},
+    '251': {"template": '207', "entity": f"PFCS ({gwp_to_use})",
+            "unit": "Gg CO2 / yr"},
+    '252': {"template": '252', "entity": f"PFCS ({gwp_to_use})",
+            "unit": "Gg CO2 / yr"},
+    '253': {"template": '253', "entity": f"PFCS ({gwp_to_use})",
+            "unit": "Gg CO2 / yr"},
+    '254': {"template": '254', "entity": f"PFCS ({gwp_to_use})",
+            "unit": "Gg CO2 / yr"},
+    '255': {"template": '219', "entity": f"SF6 ({gwp_to_use})",
+            "unit": "Gg CO2 / yr"},  # SF6
+    '256': {"template": '256', "entity": f"SF6 ({gwp_to_use})",
+            "unit": "Gg CO2 / yr"},
+    '257': {"template": '207', "entity": f"SF6 ({gwp_to_use})",
+            "unit": "Gg CO2 / yr"},
+    '258': {"template": '258', "entity": f"SF6 ({gwp_to_use})",
+            "unit": "Gg CO2 / yr"},
+    '259': {"template": '207', "entity": f"SF6 ({gwp_to_use})",
+            "unit": "Gg CO2 / yr"},
+    '260': {"template": '260', "entity": f"SF6 ({gwp_to_use})",
+            "unit": "Gg CO2 / yr"},
+    '261': {"template": '261', "entity": f"NF3 ({gwp_to_use})",
+            "unit": "Gg CO2 / yr"},  # NF3
+    '262': {"template": '262', "entity": f"NF3 ({gwp_to_use})",
+            "unit": "Gg CO2 / yr"},
+    '263': {"template": '207', "entity": f"NF3 ({gwp_to_use})",
+            "unit": "Gg CO2 / yr"},
+    '264': {"template": '264', "entity": f"NF3 ({gwp_to_use})",
+            "unit": "Gg CO2 / yr"},
+    '265': {"template": '207', "entity": f"NF3 ({gwp_to_use})",
+            "unit": "Gg CO2 / yr"},
+    '266': {"template": '266', "entity": f"NF3 ({gwp_to_use})",
+            "unit": "Gg CO2 / yr"},
+}
+
+country_processing_step1 = {
+    'aggregate_cats': {
+        'M.3.C.AG': {'sources': ['3.C.1', '3.C.2', '3.C.3', '3.C.4', '3.C.5',
+                                 '3.C.6', '3.C.7', '3.C.8'],
+                     'name': 'Aggregate sources and non-CO2 emissions sources on land '
+                             '(Agriculture)'},
+        'M.3.D.AG': {'sources': ['3.D.2'],
+                     'name': 'Other (Agriculture)'},
+        'M.AG.ELV': {'sources': ['M.3.C.AG', 'M.3.D.AG'],
+                     'name': 'Agriculture excluding livestock'},
+        'M.AG': {'sources': ['3.A', 'M.AG.ELV'],
+                 'name': 'Agriculture'},
+        'M.3.D.LU': {'sources': ['3.D.1'],
+                     'name': 'Other (LULUCF)'},
+        'M.LULUCF': {'sources': ['3.B', 'M.3.D.LU'],
+                     'name': 'LULUCF'},
+        'M.0.EL': {'sources': ['1', '2', 'M.AG', '4', '5'],
+                   'name': 'National total emissions excluding LULUCF'},
+    },
+    'basket_copy': {
+        'GWPs_to_add': ["SARGWP100", "AR5GWP100", "AR6GWP100"],
+        'entities': ["HFCS", "PFCS"],
+        'source_GWP': gwp_to_use,
+    },
+}
+
+gas_baskets = {
+    'FGASES (SARGWP100)': ['HFCS (SARGWP100)', 'PFCS (SARGWP100)', 'SF6', 'NF3'],
+    'FGASES (AR4GWP100)': ['HFCS (AR4GWP100)', 'PFCS (AR4GWP100)', 'SF6', 'NF3'],
+    'FGASES (AR5GWP100)': ['HFCS (AR5GWP100)', 'PFCS (AR5GWP100)', 'SF6', 'NF3'],
+    'FGASES (AR6GWP100)': ['HFCS (AR6GWP100)', 'PFCS (AR6GWP100)', 'SF6', 'NF3'],
+    'KYOTOGHG (SARGWP100)': ['CO2', 'CH4', 'N2O', 'FGASES (SARGWP100)'],
+    'KYOTOGHG (AR4GWP100)': ['CO2', 'CH4', 'N2O', 'FGASES (AR4GWP100)'],
+    'KYOTOGHG (AR5GWP100)': ['CO2', 'CH4', 'N2O', 'FGASES (AR5GWP100)'],
+    'KYOTOGHG (AR6GWP100)': ['CO2', 'CH4', 'N2O', 'FGASES (AR6GWP100)'],
+}
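The page-to-template indirection used by `table_defs` and `table_def_templates` can be illustrated with a small standalone sketch. The helper name `camelot_kwargs` and the `'207'` template entry are assumptions made for illustration (the real `'207'` template is defined outside this excerpt); the keyword names are the ones the reading scripts pass to `camelot.read_pdf`.

```python
# Toy excerpt in the shape of the config above. The '261' values are copied
# from it; the '207' entry is a hypothetical stand-in for a template that
# defines no explicit column separators.
table_def_templates = {
    '261': {"area": ['70,490,768,100'],
            "cols": ['364,412,454,496,538,581,623,667,710']},
    '207': {"area": ['70,495,763,95']},
}
# Only the "template" key matters for the lookup shown here.
table_defs = {
    '261': {"template": '261'},
    '263': {"template": '207'},
}


def camelot_kwargs(page):
    """Hypothetical helper: build the camelot.read_pdf keyword arguments for a page."""
    template = table_def_templates[table_defs[str(page)]["template"]]
    kwargs = {
        "pages": str(page),
        "flavor": "stream",
        "table_areas": template["area"],
    }
    if "cols" in template:
        # Column separators are only passed when the template defines them.
        kwargs["columns"] = template["cols"]
        kwargs["split_text"] = True
    return kwargs
```

Several pages can share one template this way, so a table layout only has to be measured once and is then referenced by page number.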

+ 211 - 0
UNFCCC_GHG_data/UNFCCC_reader/Malaysia/read_MYS_BUR3_from_pdf.py

@@ -0,0 +1,211 @@
+# this script reads data from Malaysia's BUR3
+
+import camelot
+import primap2 as pm2
+from primap2.pm2io._conversion import convert_ipcc_code_primap_to_primap2
+
+from UNFCCC_GHG_data.helper import process_data_for_country, fix_rows
+from UNFCCC_GHG_data.helper import downloaded_data_path, extracted_data_path
+from config_MYS_BUR3 import coords_cols, coords_defaults, coords_terminologies, \
+    meta_data, add_coords_cols
+from config_MYS_BUR3 import gas_baskets, terminology_proc, country_processing_step1
+from config_MYS_BUR3 import table_def_templates, table_defs, index_cols
+from config_MYS_BUR3 import values_replacement, cat_names_fix, cols_for_space_stripping
+from config_MYS_BUR3 import cat_codes_manual, cats_remove, cat_code_regexp
+
+# ###
+# configuration
+# ###
+input_folder = downloaded_data_path / 'UNFCCC' / 'Malaysia' / 'BUR3'
+output_folder = extracted_data_path / 'UNFCCC' / 'Malaysia'
+if not output_folder.exists():
+    output_folder.mkdir()
+
+pdf_file = "MALAYSIA_BUR3-UNFCCC_Submission.pdf"
+pdf_pages = range(184, 242)
+# CH4: 198 - 211
+# N2O: 212 - 225
+# HFCS: 226 - 228
+# PFCs: 229 - 233
+# SF6: 234 - 237
+# NF3: 238 - 241
+
+output_filename = 'MYS_BUR3_2020_'
+compression = dict(zlib=True, complevel=9)
+
+# ###
+# reading data and aggregation into one dataframe
+# ###
+df_all = None
+for page in pdf_pages:
+    print("++++++++++++++++++++++++++++++++")
+    print(f"+++++ Working on page {page} ++++++")
+    print("++++++++++++++++++++++++++++++++")
+    page_template_nr = table_defs[str(page)]["template"]
+    area = table_def_templates[page_template_nr]["area"]
+    if "cols" in table_def_templates[page_template_nr].keys():
+        cols = table_def_templates[page_template_nr]["cols"]
+        tables = camelot.read_pdf(str(input_folder / pdf_file), pages=str(page),
+                                  flavor='stream', table_areas=area, columns=cols,
+                                  split_text=True)
+    else:
+        tables = camelot.read_pdf(str(input_folder / pdf_file), pages=str(page),
+                                  flavor='stream', table_areas=area)
+
+    df_current = tables[0].df.copy()
+    df_current.iloc[0,0] = 'Categories'
+    df_current.columns = df_current.iloc[0]
+    df_current = df_current.drop(0)
+    # replace line breaks with spaces
+    df_current[index_cols[0]] = \
+        df_current[index_cols[0]].str.replace("\n", " ")
+    # replace double and triple spaces
+    df_current[index_cols[0]] = \
+        df_current[index_cols[0]].str.replace("   ", " ")
+    df_current[index_cols[0]] = \
+        df_current[index_cols[0]].str.replace("  ", " ")
+
+    # fix the split rows
+    if "rows_to_fix" in table_def_templates[page_template_nr].keys():
+        for n_rows in table_def_templates[page_template_nr]["rows_to_fix"].keys():
+            df_current = fix_rows(df_current,
+                                  table_def_templates[page_template_nr]["rows_to_fix"][
+                                      n_rows], index_cols[0], n_rows)
+
+    # replace category names with typos
+    df_current[index_cols[0]] = \
+        df_current[index_cols[0]].replace(cat_names_fix)
+
+    # replace empty strings
+    df_current = df_current.replace(values_replacement)
+
+    # add entity and unit information
+    df_current.insert(1, "unit", table_defs[str(page)]["unit"])
+    df_current.insert(1, "entity", table_defs[str(page)]["entity"])
+
+    # set index
+    # df_current = df_current.set_index(index_cols)
+    # strip trailing and leading spaces
+    for col in cols_for_space_stripping:
+        df_current[col] = df_current[col].str.strip()
+
+    # print(df_current.columns.values)
+
+    # aggregate dfs
+    if df_all is None:
+        df_all = df_current
+    else:
+        # find intersecting cols
+        cols_all = df_all.columns.values
+        cols_current = df_current.columns.values
+        cols_both = list(set(cols_all).intersection(set(cols_current)))
+        # print(cols_both)
+        if len(cols_both) > 0:
+            df_all = df_all.merge(df_current, how='outer', on=cols_both,
+                                  suffixes=(None, None))
+        else:
+            df_all = df_all.merge(df_current, how='outer', suffixes=(None, None))
+        df_all = df_all.groupby(index_cols).first().reset_index()
+        # df_all = df_all.join(df_current, how='outer')
+
+# ###
+# conversion to primap2 interchange format
+# ###
+# drop the rows with memo items etc
+for cat in cats_remove:
+    df_all = df_all.drop(df_all[df_all["Categories"] == cat].index)
+# make a copy of the categories row
+df_all["orig_cat_name"] = df_all["Categories"]
+
+# replace cat names by codes in col "Categories"
+# first the manual replacements
+df_all["Categories"] = df_all["Categories"].replace(cat_codes_manual)
+# then the regex replacements
+repl = lambda m: convert_ipcc_code_primap_to_primap2('IPC' + m.group('code'))
+df_all["Categories"] = df_all["Categories"].str.replace(cat_code_regexp, repl, regex=True)
+
+# make sure all col headers are str
+df_all.columns = df_all.columns.map(str)
+
+# remove thousands separators as pd.to_numeric can't deal with that
+# also replace None with NaN
+year_cols = list(set(df_all.columns) - {'Categories', 'entity', 'unit', 'orig_cat_name'})
+for col in year_cols:
+    df_all.loc[:, col] = df_all.loc[:, col].str.strip()
+    repl = lambda m: m.group('part1') + m.group('part2')
+    df_all.loc[:, col] = df_all.loc[:, col].str.replace(r'(?P<part1>[0-9]+),(?P<part2>[0-9\.]+)$', repl, regex=True)
+    df_all.loc[df_all[col].isnull(), col] = 'NaN'
+    # manually map code NENO to nan
+    df_all.loc[:, col] = df_all.loc[:, col].str.replace('NENO','NaN')
+    df_all.loc[:, col] = df_all.loc[:, col].str.replace('O NANaN','NaN')
+    df_all.loc[:, col] = df_all.loc[:, col].str.replace('IE NO','0')
+    df_all.loc[:, col] = df_all.loc[:, col].str.replace('IE NA NO I','0')
+    # TODO: add code to PRIMAP2
+
+# drop orig_cat_name as it's non-unique per category
+df_all = df_all.drop(columns=["orig_cat_name"])
+
+data_if = pm2.pm2io.convert_wide_dataframe_if(
+    df_all,
+    coords_cols=coords_cols,
+    #add_coords_cols=add_coords_cols,
+    coords_defaults=coords_defaults,
+    coords_terminologies=coords_terminologies,
+    #coords_value_mapping=coords_value_mapping,
+    #coords_value_filling=coords_value_filling,
+    #filter_remove=filter_remove,
+    #filter_keep=filter_keep,
+    meta_data=meta_data,
+    convert_str=True,
+    time_format="%Y",
+    )
+
+data_pm2 = pm2.pm2io.from_interchange_format(data_if)
+
+data_if = data_pm2.pr.to_interchange_format()
+
+# ###
+# save raw data to IF and native format
+# ###
+if not output_folder.exists():
+    output_folder.mkdir()
+pm2.pm2io.write_interchange_format(
+    output_folder / (output_filename + coords_terminologies["category"] + "_raw"),
+    data_if)
+
+encoding = {var: compression for var in data_pm2.data_vars}
+data_pm2.pr.to_netcdf(
+    output_folder / (output_filename + coords_terminologies["category"] + "_raw.nc"),
+    encoding=encoding)
+
+# ###
+# ## process the data
+# ###
+data_proc_pm2 = data_pm2
+
+# actual processing
+data_proc_pm2 = process_data_for_country(
+    data_proc_pm2,
+    gas_baskets=gas_baskets,
+    entities_to_ignore=[],
+    processing_info_country=country_processing_step1,
+)
+
+# adapt source and metadata
+current_source = data_proc_pm2.coords["source"].values[0]
+data_temp = data_proc_pm2.pr.loc[{"source": current_source}]
+data_proc_pm2 = data_proc_pm2.pr.set("source", 'BUR_NIR', data_temp)
+
+# ###
+# save data to IF and native format
+# ###
+data_proc_if = data_proc_pm2.pr.to_interchange_format()
+if not output_folder.exists():
+    output_folder.mkdir()
+pm2.pm2io.write_interchange_format(
+    output_folder / (output_filename + terminology_proc), data_proc_if)
+
+encoding = {var: compression for var in data_proc_pm2.data_vars}
+data_proc_pm2.pr.to_netcdf(
+    output_folder / (output_filename + terminology_proc + ".nc"),
+    encoding=encoding)
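The thousands-separator cleanup in the script above relies on a single end-anchored pattern. The following is a minimal sketch of how that pattern behaves, using only the standard `re` module instead of pandas; the helper names `strip_last_separator` and `strip_all_separators` are hypothetical and not part of the repository. As far as the pattern itself goes, one pass removes one comma, so values with two separators would need repeated application. Whether such values occur in the BUR tables is not stated in the source.

```python
import re

# Same pattern as in the reading script: digits, one comma, then digits or
# dots up to the end of the string.
THOUSANDS_RE = re.compile(r'(?P<part1>[0-9]+),(?P<part2>[0-9\.]+)$')


def strip_last_separator(value):
    """Apply the script's substitution once (removes at most one comma)."""
    return THOUSANDS_RE.sub(lambda m: m.group('part1') + m.group('part2'), value)


def strip_all_separators(value):
    """Hypothetical helper: repeat the substitution until nothing changes."""
    previous = None
    while previous != value:
        previous = value
        value = strip_last_separator(value)
    return value
```

Notation keys such as `NO` or `NE` contain no digits and pass through unchanged, which is why the script maps them in separate `str.replace` calls afterwards.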

+ 214 - 0
UNFCCC_GHG_data/UNFCCC_reader/Malaysia/read_MYS_BUR4_from_pdf.py

@@ -0,0 +1,214 @@
+# this script reads data from Malaysia's BUR4
+# code is mostly identical to BUR3
+
+
+import camelot
+import primap2 as pm2
+from primap2.pm2io._conversion import convert_ipcc_code_primap_to_primap2
+
+from UNFCCC_GHG_data.helper import process_data_for_country, fix_rows
+from UNFCCC_GHG_data.helper import downloaded_data_path, extracted_data_path
+from config_MYS_BUR4 import coords_cols, coords_defaults, coords_terminologies, \
+    meta_data, add_coords_cols
+from config_MYS_BUR4 import gas_baskets, terminology_proc, country_processing_step1
+from config_MYS_BUR4 import table_def_templates, table_defs, index_cols
+from config_MYS_BUR4 import values_replacement, cat_names_fix, cols_for_space_stripping
+from config_MYS_BUR4 import cat_codes_manual, cats_remove, cat_code_regexp
+
+# ###
+# configuration
+# ###
+input_folder = downloaded_data_path / 'UNFCCC' / 'Malaysia' / 'BUR4'
+output_folder = extracted_data_path / 'UNFCCC' / 'Malaysia'
+if not output_folder.exists():
+    output_folder.mkdir()
+
+pdf_file = "MY_BUR4_2022.pdf"
+pdf_pages = range(203, 267)
+# CO2: 203 - 218
+# CH4: 219 - 230
+# N2O: 231 - 242
+# HFCS: 243 - 248
+# PFCs: 249 - 254
+# SF6: 255 - 260
+# NF3: 261 - 266
+
+output_filename = 'MYS_BUR4_2022_'
+compression = dict(zlib=True, complevel=9)
+
+# ###
+# reading data and aggregation into one dataframe
+# ###
+df_all = None
+for page in pdf_pages:
+    print("++++++++++++++++++++++++++++++++")
+    print(f"+++++ Working on page {page} ++++++")
+    print("++++++++++++++++++++++++++++++++")
+    page_template_nr = table_defs[str(page)]["template"]
+    area = table_def_templates[page_template_nr]["area"]
+    if "cols" in table_def_templates[page_template_nr].keys():
+        cols = table_def_templates[page_template_nr]["cols"]
+        tables = camelot.read_pdf(str(input_folder / pdf_file), pages=str(page),
+                                  flavor='stream', table_areas=area, columns=cols,
+                                  split_text=True)
+    else:
+        tables = camelot.read_pdf(str(input_folder / pdf_file), pages=str(page),
+                                  flavor='stream', table_areas=area)
+
+    df_current = tables[0].df.copy()
+    df_current.iloc[0,0] = 'Categories'
+    df_current.columns = df_current.iloc[0]
+    df_current = df_current.drop(0)
+    # replace line breaks with spaces
+    df_current[index_cols[0]] = \
+        df_current[index_cols[0]].str.replace("\n", " ")
+    # replace double and triple spaces
+    df_current[index_cols[0]] = \
+        df_current[index_cols[0]].str.replace("   ", " ")
+    df_current[index_cols[0]] = \
+        df_current[index_cols[0]].str.replace("  ", " ")
+
+    # fix the split rows
+    if "rows_to_fix" in table_def_templates[page_template_nr].keys():
+        for n_rows in table_def_templates[page_template_nr]["rows_to_fix"].keys():
+            df_current = fix_rows(df_current,
+                                  table_def_templates[page_template_nr]["rows_to_fix"][
+                                      n_rows], index_cols[0], n_rows)
+
+    # replace category names with typos
+    df_current[index_cols[0]] = \
+        df_current[index_cols[0]].replace(cat_names_fix)
+
+    # replace empty strings
+    df_current = df_current.replace(values_replacement)
+
+    # add entity and unit information
+    df_current.insert(1, "unit", table_defs[str(page)]["unit"])
+    df_current.insert(1, "entity", table_defs[str(page)]["entity"])
+
+    # set index
+    # df_current = df_current.set_index(index_cols)
+    # strip trailing and leading spaces
+    for col in cols_for_space_stripping:
+        df_current[col] = df_current[col].str.strip()
+
+    # print(df_current.columns.values)
+
+    # aggregate dfs
+    if df_all is None:
+        df_all = df_current
+    else:
+        # find intersecting cols
+        cols_all = df_all.columns.values
+        cols_current = df_current.columns.values
+        cols_both = list(set(cols_all).intersection(set(cols_current)))
+        # print(cols_both)
+        if len(cols_both) > 0:
+            df_all = df_all.merge(df_current, how='outer', on=cols_both,
+                                  suffixes=(None, None))
+        else:
+            df_all = df_all.merge(df_current, how='outer', suffixes=(None, None))
+        df_all = df_all.groupby(index_cols).first().reset_index()
+        # df_all = df_all.join(df_current, how='outer')
+
+# ###
+# conversion to primap2 interchange format
+# ###
+# drop the rows with memo items etc
+for cat in cats_remove:
+    df_all = df_all.drop(df_all[df_all["Categories"] == cat].index)
+# make a copy of the categories row
+df_all["orig_cat_name"] = df_all["Categories"]
+
+# replace cat names by codes in col "Categories"
+# first the manual replacements
+df_all["Categories"] = df_all["Categories"].replace(cat_codes_manual)
+# then the regex replacements
+repl = lambda m: convert_ipcc_code_primap_to_primap2('IPC' + m.group('code'))
+df_all["Categories"] = df_all["Categories"].str.replace(cat_code_regexp, repl, regex=True)
+
+# make sure all col headers are str
+df_all.columns = df_all.columns.map(str)
+
+# remove thousands separators as pd.to_numeric can't deal with that
+# also replace None with NaN
+year_cols = list(set(df_all.columns) - {'Categories', 'entity', 'unit', 'orig_cat_name'})
+for col in year_cols:
+    df_all.loc[:, col] = df_all.loc[:, col].str.strip()
+    repl = lambda m: m.group('part1') + m.group('part2')
+    df_all.loc[:, col] = df_all.loc[:, col].str.replace(r'(?P<part1>[0-9]+),(?P<part2>[0-9\.]+)$', repl, regex=True)
+    df_all.loc[df_all[col].isnull(), col] = 'NaN'
+    # manually map code NENO to nan
+    df_all.loc[:, col] = df_all.loc[:, col].str.replace('NENO','NaN')
+    df_all.loc[:, col] = df_all.loc[:, col].str.replace('O NANaN','NaN')
+    df_all.loc[:, col] = df_all.loc[:, col].str.replace('IE NO','0')
+    df_all.loc[:, col] = df_all.loc[:, col].str.replace('IE NA NO I','0')
+    # TODO: add code to PRIMAP2
+
+# drop orig_cat_name as it's non-unique per category
+df_all = df_all.drop(columns=["orig_cat_name"])
+
+data_if = pm2.pm2io.convert_wide_dataframe_if(
+    df_all,
+    coords_cols=coords_cols,
+    #add_coords_cols=add_coords_cols,
+    coords_defaults=coords_defaults,
+    coords_terminologies=coords_terminologies,
+    #coords_value_mapping=coords_value_mapping,
+    #coords_value_filling=coords_value_filling,
+    #filter_remove=filter_remove,
+    #filter_keep=filter_keep,
+    meta_data=meta_data,
+    convert_str=True,
+    time_format="%Y",
+    )
+
+data_pm2 = pm2.pm2io.from_interchange_format(data_if)
+
+data_if = data_pm2.pr.to_interchange_format()
+
+# ###
+# save raw data to IF and native format
+# ###
+if not output_folder.exists():
+    output_folder.mkdir()
+pm2.pm2io.write_interchange_format(
+    output_folder / (output_filename + coords_terminologies["category"] + "_raw"),
+    data_if)
+
+encoding = {var: compression for var in data_pm2.data_vars}
+data_pm2.pr.to_netcdf(
+    output_folder / (output_filename + coords_terminologies["category"] + "_raw.nc"),
+    encoding=encoding)
+
+# ###
+# ## process the data
+# ###
+data_proc_pm2 = data_pm2
+
+# actual processing
+data_proc_pm2 = process_data_for_country(
+    data_proc_pm2,
+    gas_baskets=gas_baskets,
+    entities_to_ignore=[],
+    processing_info_country=country_processing_step1,
+)
+
+# adapt source and metadata
+current_source = data_proc_pm2.coords["source"].values[0]
+data_temp = data_proc_pm2.pr.loc[{"source": current_source}]
+data_proc_pm2 = data_proc_pm2.pr.set("source", 'BUR_NIR', data_temp)
+
+# ###
+# save data to IF and native format
+# ###
+data_proc_if = data_proc_pm2.pr.to_interchange_format()
+if not output_folder.exists():
+    output_folder.mkdir()
+pm2.pm2io.write_interchange_format(
+    output_folder / (output_filename + terminology_proc), data_proc_if)
+
+encoding = {var: compression for var in data_proc_pm2.data_vars}
+data_proc_pm2.pr.to_netcdf(
+    output_folder / (output_filename + terminology_proc + ".nc"),
+    encoding=encoding)

+ 433 - 0
UNFCCC_GHG_data/UNFCCC_reader/Nigeria/config_NGA_BUR2.py

@@ -0,0 +1,433 @@
+gwp_to_use = 'AR5GWP100'
+
+tables_trends = {
+    '70': { # GHG by main sector
+        'page': '70',
+        'area': ['177,430,450,142'],
+        'cols': ['208,260,311,355,406'],
+        'coords_defaults': {
+            'entity': f'KYOTOGHG ({gwp_to_use})',
+            'unit': 'GgCO2eq',
+        },
+        'coords_cols': {
+            "category": "Year",
+        },
+        #'remove_cols': ['Per capita emissions (t)',
+        #                'GDP emissions index (Year 2000 = 100)'],
+        'coords_value_mapping': {
+            "unit": "PRIMAP1",
+            'category': {
+                'Total emissions': '0',
+                'Energy': '1',
+                'IPPU': '2',
+                'AFOLU': '3',
+                'Waste': '4',
+            },
+        },
+        'label_rows': [0, 1, 2],
+    },
+    '71': { # main gases by sector
+        'page': '71',
+        'area': ['82,760,509,454'],
+        'cols': ['124,186,249,326,388,454'],
+        'coords_defaults': {
+            'category': '0',
+            'unit': 'GgCO2eq',
+        },
+        'coords_cols': {
+            "entity": "Year",
+        },
+        'remove_cols': ['Total GHG emissions (CO₂-eq)',
+                        'Removals (CO₂) (CO₂-eq)',
+                        'CO₂ (Gg)'],
+        'coords_value_mapping': {
+            "unit": "PRIMAP1",
+            'entity': {
+                'Net emissions (CO₂-eq)': f'KYOTOGHG ({gwp_to_use})',
+                'CO₂ (Gg)': 'CO2 emissions',
+                'CH₄ (CO₂-eq)': f'CH4 ({gwp_to_use})',
+                'N₂O (CO₂-eq)': f'N2O ({gwp_to_use})',
+            },
+        },
+        'label_rows':  [0, 1, 2, 3, 4],
+    },
+    '72_1': { # CO2 by main sector
+        'page': '72',
+        'area': ['122,760,496,472'],
+        'cols': ['159,212,265,311,355,406,456'],
+        'coords_defaults': {
+            'entity': 'CO2',
+            'unit': 'Gg',
+        },
+        'coords_cols': {
+            "category": "Year",
+        },
+        'remove_cols': ['Total emissions'],
+        'coords_value_mapping': {
+            "unit": "PRIMAP1",
+            'category': {
+                'Total net emissions': '0',
+                'Energy': '1',
+                'IPPU': '2',
+                'AFOLU - emissions': 'M.3.EMI',
+                'AFOLU - removals': 'M.3.REM',
+                'Waste': '4',
+            },
+        },
+        'label_rows':  [0, 1, 2],
+    },
+    '72_2': { # CH4 by sector
+        'page': '72',
+        'area': ['133,333,483,41'],
+        'cols': ['172,230,280,333,384,439'],
+        'coords_defaults': {
+            'entity': 'CH4',
+            'unit': 'Gg',
+        },
+        'coords_cols': {
+            "category": "Year",
+        },
+        'remove_cols': ['Total (Gg CO₂-eq)'],
+        'coords_value_mapping': {
+            "unit": "PRIMAP1",
+            'category': {
+                'Total': '0',
+                'Energy': '1',
+                'IPPU': '2',
+                'AFOLU - emissions': '3',
+                'Waste': '4',
+            },
+        },
+        'label_rows':  [0, 1, 2],
+    },
+    '73': { # N2O by sector
+        'page': '73',
+        'area': ['155,666,643,364'],
+        'cols': ['194,265,309,366,419'],
+        'coords_defaults': {
+            'entity': 'N2O',
+            'unit': 'Gg',
+        },
+        'coords_cols': {
+            "category": "Year",
+        },
+        'remove_cols': ['Total emissions (Gg CO₂-eq)'],
+        'coords_value_mapping': {
+            "unit": "PRIMAP1",
+            'category': {
+                'Total': '0',
+                'Energy': '1',
+                'AFOLU': '3',
+                'Waste': '4',
+            },
+        },
+        'label_rows':  [0, 1, 2],
+    },
+    '74': { # NOx by sector
+        'page': '74',
+        'area': ['148,457,467,166'],
+        'cols': ['190,254,304,359,421'],
+        'coords_defaults': {
+            'entity': 'NOX',
+            'unit': 'Gg',
+        },
+        'coords_cols': {
+            "category": "Year",
+        },
+        #'remove_cols': [],
+        'coords_value_mapping': {
+            "unit": "PRIMAP1",
+            'category': {
+                'Total emissions': '0',
+                'Energy': '1',
+                'IPPU': '2',
+                'AFOLU': '3',
+                'Waste': '4',
+            },
+        },
+        'label_rows':  [0, 1, 2],
+    },
+    '75': { # CO by sector
+        'page': '75',
+        'area': ['161,763,456,472'],
+        'cols': ['199,256,307,359,410'],
+        'coords_defaults': {
+            'entity': 'CO',
+            'unit': 'Gg',
+        },
+        'coords_cols': {
+            "category": "Year",
+        },
+        #'remove_cols': ['Total emissions (Gg CO2-eq)'],
+        'coords_value_mapping': {
+            "unit": "PRIMAP1",
+            'category': {
+                'Total emissions': '0',
+                'Energy': '1',
+                'IPPU': '2',
+                'AFOLU': '3',
+                'Waste': '4',
+            },
+        },
+        'label_rows':  [0, 1, 2],
+    },
+    '75_2': { # NMVOC by sector
+        'page': '75',
+        'area': ['177,325,441,50'],
+        'cols': ['219,287,340,395'],
+        'coords_defaults': {
+            'entity': 'NMVOC',
+            'unit': 'Gg',
+        },
+        'coords_cols': {
+            "category": "Year",
+        },
+        #'remove_cols': ['Total emissions (Gg CO2-eq)'],
+        'coords_value_mapping': {
+            "unit": "PRIMAP1",
+            'category': {
+                'Total emissions': '0',
+                'Energy': '1',
+                'IPPU': '2',
+                'Waste': '4',
+            },
+        },
+        'label_rows':  [0, 1, 2],
+    },
+    '76_1': { # NMVOC by sector
+        'page': '76',
+        'area': ['175,782,448,675'],
+        'cols': ['216,282,340,390'],
+        'coords_defaults': {
+            'entity': 'NMVOC',
+            'unit': 'Gg',
+        },
+        'coords_cols': {
+            "category": "Year",
+        },
+        #'remove_cols': ['Total emissions (Gg CO2-eq)'],
+        'coords_value_mapping': {
+            "unit": "PRIMAP1",
+            'category': {
+                'Total emissions': '0',
+                'Energy': '1',
+                'IPPU': '2',
+                'Waste': '4',
+            },
+        },
+        'label_rows':  [0, 1, 2],
+    },
+    '76_2': { # SO2 by sector
+        'page': '76',
+        'area': ['197,562,421,226'],
+        'cols': ['243,331,381'],
+        'coords_defaults': {
+            'entity': 'SO2',
+            'unit': 'Gg',
+        },
+        'coords_cols': {
+            "category": "Year",
+        },
+        #'remove_cols': ['Total emissions (Gg CO2-eq)'],
+        'coords_value_mapping': {
+            "unit": "PRIMAP1",
+            'category': {
+                'Total emissions': '0',
+                'Energy': '1',
+                'Waste': '4',
+            },
+        },
+        'label_rows':  [0],
+    },
+}
+
+pages_inventory = {
+    '78': 1,
+    '79': 0,
+    '80': 0,
+    '81': 0,
+    '82': 0,
+}
+
+year_inventory = 2017
+entity_row = 1
+unit_row = 0
+
+
+###
+index_cols = "Categories"
+units_inv = {
+    'Emissions (Gg)': 'Gg',
+    'Emissions CO2 Equivalents (Gg)': 'GgCO2eq',
+}
+# special header as category code and name in one column
+header_long = ["category", "entity", "unit", "time", "data"]
+
+
+# manual category codes
+cat_codes_manual = {
+    'Total National Emissions and Removals': '0',
+    'International Bunkers': 'M.BK',
+}
+
+cat_code_regexp = r'(?P<code>^[a-zA-Z0-9\.]{1,9})\s.*'
+
+coords_cols = {
+    "category": "category",
+    "entity": "entity",
+    "unit": "unit",
+}
+
+# add_coords_cols = {
+#     "orig_cat_name": ["orig_cat_name", "category"],
+# }
+
+coords_terminologies = {
+    "area": "ISO3",
+    "category": "IPCC2006_PRIMAP",
+    "scenario": "PRIMAP",
+}
+
+coords_defaults = {
+    "source": "NGA-GHG-Inventory",
+    "provenance": "measured",
+    "area": "NGA",
+    "scenario": "BUR2",
+}
+
+coords_value_mapping = {
+    "unit": "PRIMAP1",
+    "category": "PRIMAP1",
+    "entity": {
+        'Net CO2 (1)(2)': 'CO2',
+        'CH4': "CH4",
+        'N2O': "N2O",
+        'HFCs': f"HFCS ({gwp_to_use})",
+        'PFCs': f"PFCS ({gwp_to_use})",
+        'SF6': f"SF6 ({gwp_to_use})",
+        #'NOx': 'NOX',
+        'CO': 'CO', # no mapping, just added for completeness here
+        'NMVOCs': 'NMVOC',
+        'SO2': 'SO2', # no mapping, just added for completeness here
+        'Other halogenated gases with CO2 eq conversion factors (3)':
+            f"UnspMixOfHFCs ({gwp_to_use})",
+    },
+}
+
+
+filter_remove = {
+    'f1': {
+        'entity': ['Other halogenated gases without CO2 eq conversion factors (4)']
+    },
+    'f2': {
+        'category': 'Memo'
+    },
+}
+
+filter_keep = {}
+
+meta_data = {
+    "references": "https://unfccc.int/documents/307085",
+    "rights": "",
+    "contact": "mail@johannes-guetschow.de",
+    "title": "Nigeria. Second Biennial Update Report (BUR2) to the United Nations "
+             "Framework Convention on Climate Change",
+    "comment": "Read from pdf by Johannes Gütschow",
+    "institution": "UNFCCC",
+}
+
+# convert to mass units where possible
+entities_to_convert_to_mass = [
+    'CH4', 'N2O', 'SF6'
+]
+
+# CO2 equivalents don't make sense for these substances, so unit has to be Gg instead of Gg CO2 equivalents as indicated in the table
+entities_to_fix_unit = [
+    'NOx', 'CO', 'NMVOCs', 'SO2'
+]
+
+### processing
+
+processing_info_step1 = {
+    'aggregate_cats': {
+        '2.F': {'sources': ['2.F.2', '2.F.6'], # all 0, but for completeness
+              'name': 'Product uses as Substitutes for Ozone Depleting Substances'},
+        '2': {'sources': ['2.A', '2.B', '2.C', '2.D', '2.E', '2.F', '2.G'],
+              'name': 'IPPU'}, # for HFCs, PFCs, SO2, SF6, N2O (all 0)
+        '3': {'sources': ['M.3.EMI', 'M.3.REM'],
+              'name': 'AFOLU'}, # for CO2
+    },
+}
+
+processing_info_step2 =  {
+    'aggregate_cats': {
+        'M.AG.ELV': {'sources': ['3.C'], 'name': 'Agriculture excluding livestock emissions'},
+        'M.AG': {'sources': ['M.AG.ELV', '3.A'], 'name': 'Agriculture'},
+        'M.LULUCF': {'sources': ['3.B', '3.D'],
+                     'name': 'Land Use, Land Use Change, and Forestry'},
+        'M.0.EL': {'sources': ['1', '2', 'M.AG', '4', '5'], 'name': 'National Total Excluding LULUCF'},
+        '0': {'sources': ['1', '2', '3', '4', '5'], 'name': 'National Total'},
+    },
+    'downscale': {
+        'sectors': {
+            '1': {
+                'basket': '1',
+                'basket_contents': ['1.A', '1.B', '1.C'],
+                'entities': ['CO2', 'N2O', 'CH4'],
+                'dim': 'category (IPCC2006_PRIMAP)',
+            },
+            '1.A': {
+                'basket': '1.A',
+                'basket_contents': ['1.A.1', '1.A.2', '1.A.3', '1.A.4'],
+                'entities': ['CO2', 'N2O', 'CH4'],
+                'dim': 'category (IPCC2006_PRIMAP)',
+            },
+            '1.B': {
+                'basket': '1.B',
+                'basket_contents': ['1.B.1', '1.B.2', '1.B.3'],
+                'entities': ['CO2', 'N2O', 'CH4'],
+                'dim': 'category (IPCC2006_PRIMAP)',
+            },
+            'IPPU': {
+                'basket': '2',
+                'basket_contents': ['2.A', '2.B', '2.C', '2.D', '2.E',
+                                    '2.F', '2.G', '2.H'],
+                'entities': ['CO2', 'N2O', 'CH4'],
+                'dim': 'category (IPCC2006_PRIMAP)',
+            },
+            '3': {
+                'basket': '3',
+                'basket_contents': ['3.A', '3.B', '3.C', '3.D'],
+                'entities': ['CO2', 'CH4', 'N2O'],
+                'dim': 'category (IPCC2006_PRIMAP)',
+            },
+            # '3A': {
+            #     'basket': '3.A',
+            #     'basket_contents': ['3.A.1', '3.A.2'],
+            #     'entities': ['CH4', 'N2O'],
+            #     'dim': 'category (IPCC2006_PRIMAP)',
+            # },
+            # '3C': {
+            #     'basket': '3.C',
+            #     'basket_contents': ['3.C.1', '3.C.2', '3.C.3', '3.C.4', '3.C.5',
+            #                         '3.C.6', '3.C.7', '3.C.8'],
+            #     'entities': ['CO2', 'CH4', 'N2O'],
+            #     'dim': 'category (IPCC2006_PRIMAP)',
+            # },
+            # '3D': {
+            #     'basket': '3.D',
+            #     'basket_contents': ['3.D.1', '3.D.2'],
+            #     'entities': ['CO2', 'CH4', 'N2O'],
+            #     'dim': 'category (IPCC2006_PRIMAP)',
+            # },
+        },
+    },
+    'remove_ts': {
+        'fgases': { # unnecessary and complicates aggregation for
+            # other gases
+            'category': ['5'],
+            'entities': [f'HFCS ({gwp_to_use})', f'PFCS ({gwp_to_use})', 'SF6',
+                         f'UnspMixOfHFCs ({gwp_to_use})'],
+        },
+    },
+}
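
As a rough illustration of how `cat_codes_manual` and `cat_code_regexp` from this config work together (exact-match manual replacements first, then the regular expression keeps only the leading code), here is a minimal sketch using only the standard library. The helper name `category_to_code` and the sample label `1.A.1 Energy Industries` are assumptions made for illustration; the reader script applies the same two steps through pandas.

```python
import re

# values copied from the config above
cat_codes_manual = {
    'Total National Emissions and Removals': '0',
    'International Bunkers': 'M.BK',
}
cat_code_regexp = r'(?P<code>^[a-zA-Z0-9\.]{1,9})\s.*'


def category_to_code(name):
    """Hypothetical helper: map a table row label to a category code."""
    # step 1: exact-match manual replacements
    name = cat_codes_manual.get(name, name)
    # step 2: keep only the leading code; labels without a match stay unchanged
    return re.sub(cat_code_regexp, lambda m: m.group('code'), name)


# sample label, not taken from the report
print(category_to_code('1.A.1 Energy Industries'))
print(category_to_code('Total National Emissions and Removals'))
print(category_to_code('International Bunkers'))
```

Labels that are already codes after the manual step (such as `0` or `M.BK`) contain no whitespace, so the regular expression leaves them untouched.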

+ 228 - 0
UNFCCC_GHG_data/UNFCCC_reader/Nigeria/read_NGA_BUR2_from_pdf.py

@@ -0,0 +1,228 @@
+# this script reads data from Nigeria's BUR2
+# Data is read from the pdf file
+
+import pandas as pd
+import primap2 as pm2
+import numpy as np
+import camelot
+import locale
+from copy import deepcopy
+from UNFCCC_GHG_data.helper import downloaded_data_path, extracted_data_path
+from UNFCCC_GHG_data.helper import process_data_for_country, gas_baskets
+from config_NGA_BUR2 import tables_trends
+from config_NGA_BUR2 import pages_inventory, year_inventory, entity_row, unit_row, \
+    index_cols, header_long, units_inv
+from config_NGA_BUR2 import cat_code_regexp, cat_codes_manual
+from config_NGA_BUR2 import coords_cols, coords_defaults, coords_terminologies, \
+    coords_value_mapping, meta_data, filter_remove #, add_coords_cols
+from config_NGA_BUR2 import processing_info_step1, processing_info_step2
+
+# ###
+# configuration
+# ###
+# define locale to use for str to float conversion
+locale_to_use = 'en_NG.UTF-8'
+locale.setlocale(locale.LC_NUMERIC, locale_to_use)
+
+input_folder = downloaded_data_path / 'UNFCCC' / 'Nigeria' / 'BUR2'
+output_folder = extracted_data_path / 'UNFCCC' / 'Nigeria'
+if not output_folder.exists():
+    output_folder.mkdir()
+
+output_filename = 'NGA_BUR2_2021_'
+compression = dict(zlib=True, complevel=9)
+inventory_file = 'NIGERIA_BUR_2_-_Second_Biennial_Update_Report_%28BUR2%29.pdf'
+
+## read 2017 inventory
+df_inventory = None
+for page in pages_inventory.keys():
+    tables = camelot.read_pdf(str(input_folder / inventory_file), pages=str(page),
+                              flavor='lattice')
+    df_this_table = tables[pages_inventory[page]].df
+    # replace line breaks, double, and triple spaces in category names
+    df_this_table.iloc[:, 0] = df_this_table.iloc[:, 0].str.replace("\n", " ")
+    df_this_table.iloc[:, 0] = df_this_table.iloc[:, 0].str.replace("   ", " ")
+    df_this_table.iloc[:, 0] = df_this_table.iloc[:, 0].str.replace("  ", " ")
+    # replace line breaks in units and entities
+    df_this_table.iloc[entity_row] = df_this_table.iloc[entity_row].str.replace('\n',
+                                                                                '')
+    df_this_table.iloc[unit_row] = df_this_table.iloc[unit_row].str.replace('\n', '')
+
+    # fillna in unit row (avoid chained assignment, which may act on a copy)
+    df_this_table.iloc[unit_row] = df_this_table.iloc[unit_row].replace("", np.nan)
+    df_this_table.iloc[unit_row] = df_this_table.iloc[unit_row].ffill()
+    df_this_table = pm2.pm2io.nir_add_unit_information(df_this_table, unit_row=unit_row,
+                                                       entity_row=entity_row,
+                                                       regexp_entity=".*",
+                                                       manual_repl_unit=units_inv,
+                                                       default_unit="")
+
+    # set index and convert to long format
+    df_this_table = df_this_table.set_index(index_cols)
+    df_this_table_long = pm2.pm2io.nir_convert_df_to_long(df_this_table, year_inventory,
+                                                          header_long)
+
+    # combine with tables for other sectors (merge not append)
+    if df_inventory is None:
+        df_inventory = df_this_table_long
+    else:
+        df_inventory = pd.concat([df_inventory, df_this_table_long], axis=0, join='outer')
+
+# replace cat names by codes in col "category"
+# first the manual replacements
+df_inventory["category"] = df_inventory["category"].replace(cat_codes_manual)
+# then the regex replacements
+repl = lambda m: m.group('code')
+df_inventory["category"] = df_inventory["category"].str.replace(cat_code_regexp, repl, regex=True)
+df_inventory = df_inventory.reset_index(drop=True)
+
+# ###
+# convert to PRIMAP2 interchange format
+# ###
+data_inv_if = pm2.pm2io.convert_long_dataframe_if(
+    df_inventory,
+    coords_cols=coords_cols,
+    #add_coords_cols=add_coords_cols,
+    coords_defaults=coords_defaults,
+    coords_terminologies=coords_terminologies,
+    coords_value_mapping=coords_value_mapping,
+    filter_remove=filter_remove,
+    meta_data=meta_data,
+    convert_str=True,
+    time_format='%Y',
+    )
+
+data_inv_pm2 = pm2.pm2io.from_interchange_format(data_inv_if)
+
+## trend tables
+data_trend_pm2 = None
+for table in tables_trends.keys():
+    print(table)
+    current_table = deepcopy(tables_trends[table])
+    tables = camelot.read_pdf(str(input_folder / inventory_file),
+                              pages=current_table["page"],
+                              table_areas=current_table["area"],
+                              columns=current_table["cols"],
+                              flavor='stream',
+                              split_text=True)
+    df_this_table = tables[0].df
+
+    # merge rows for entity and unit
+    rows_to_merge = df_this_table.iloc[current_table["label_rows"]]
+    indices_to_merge = rows_to_merge.index
+    # join the label rows
+    new_row = rows_to_merge.agg(' '.join)
+    df_this_table.loc[indices_to_merge[0]] = new_row
+    df_this_table = df_this_table.drop(indices_to_merge)
+    new_row = new_row.str.replace("   ", " ")
+    new_row = new_row.str.replace("  ", " ")
+    new_row = new_row.str.strip()
+
+    df_this_table.columns = new_row
+
+    # remove columns not needed
+    if 'remove_cols' in current_table.keys():
+        df_this_table = df_this_table.drop(columns=current_table["remove_cols"])
+
+    df_this_table = df_this_table.set_index("Year")
+
+    # transpose to wide format
+    df_this_table = df_this_table.transpose()
+
+    # remove "," (thousand sep) from data
+    for col in df_this_table.columns:
+        df_this_table.loc[:, col] = df_this_table.loc[:, col].str.strip()
+        repl = lambda m: m.group('part1') + m.group('part2')
+        df_this_table.loc[:, col] = df_this_table.loc[:, col].str.replace(
+            r'(?P<part1>[0-9]+),(?P<part2>[0-9\.]+)$', repl, regex=True)
+        df_this_table.loc[df_this_table[col].isnull(), col] = 'NaN'
+
+
+    # metadata in first col instead of index
+    df_this_table = df_this_table.reset_index()
+    df_this_table = df_this_table.rename(columns={"index": "Year"})
+
+    # make sure we have str not a number format for the dates
+    df_this_table.columns = df_this_table.columns.map(str)
+
+    current_table["coords_defaults"].update(coords_defaults)
+    # convert to interchange format
+    data_current_if = pm2.pm2io.convert_wide_dataframe_if(
+        df_this_table,
+        coords_cols=current_table["coords_cols"],
+        coords_defaults=current_table["coords_defaults"],
+        coords_terminologies=coords_terminologies,
+        coords_value_mapping=current_table["coords_value_mapping"],
+        meta_data=meta_data,
+        convert_str=True,
+        time_format='%Y',
+    )
+    # todo: convert to native format before merge
+    data_current_pm2 = pm2.pm2io.from_interchange_format(data_current_if)
+    if data_trend_pm2 is None:
+        data_trend_pm2 = data_current_pm2
+    else:
+        data_trend_pm2 = data_trend_pm2.pr.merge(data_current_pm2)
+
+data_pm2 = data_inv_pm2.pr.merge(data_trend_pm2, tolerance=0.05) # some rounding in
+# trends needs higher tolerance
+
+data_if = data_pm2.pr.to_interchange_format()
+
+# ###
+# save raw data to IF and native format
+# ###
+if not output_folder.exists():
+    output_folder.mkdir()
+pm2.pm2io.write_interchange_format(
+    output_folder / (output_filename + coords_terminologies["category"] + "_raw"),
+    data_if)
+
+encoding = {var: compression for var in data_pm2.data_vars}
+data_pm2.pr.to_netcdf(
+    output_folder / (output_filename + coords_terminologies["category"] + "_raw.nc"),
+    encoding=encoding)
+
+
+#### processing
+data_proc_pm2 = data_pm2
+terminology_proc = coords_terminologies["category"]
+
+# actual processing
+data_proc_pm2 = process_data_for_country(
+    data_proc_pm2,
+    entities_to_ignore=[],
+    gas_baskets={},
+    processing_info_country=processing_info_step1,
+)
+
+data_proc_pm2 = process_data_for_country(
+    data_proc_pm2,
+    entities_to_ignore=[],
+    gas_baskets=gas_baskets,
+    processing_info_country=processing_info_step2,
+    cat_terminology_out = terminology_proc,
+    #category_conversion = None,
+    #sectors_out = None,
+)
+
+# adapt source and metadata
+# TODO: processing info is present twice
+current_source = data_proc_pm2.coords["source"].values[0]
+data_temp = data_proc_pm2.pr.loc[{"source": current_source}]
+data_proc_pm2 = data_proc_pm2.pr.set("source", 'BUR_NIR', data_temp)
+
+# ###
+# save data to IF and native format
+# ###
+data_proc_if = data_proc_pm2.pr.to_interchange_format()
+if not output_folder.exists():
+    output_folder.mkdir()
+pm2.pm2io.write_interchange_format(
+    output_folder / (output_filename + terminology_proc), data_proc_if)
+
+encoding = {var: compression for var in data_proc_pm2.data_vars}
+data_proc_pm2.pr.to_netcdf(
+    output_folder / (output_filename + terminology_proc + ".nc"),
+    encoding=encoding)
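
The thousands-separator cleanup in the trend-table loop above can be tried out in isolation. The sketch below reuses the pattern from the reader with the standard library `re` module; the helper name `strip_thousands_sep` and the sample values are assumptions for illustration, not data from the report.

```python
import re

# pattern as used in the reader for the trend tables
pattern = r'(?P<part1>[0-9]+),(?P<part2>[0-9\.]+)$'


def strip_thousands_sep(value):
    """Hypothetical helper: drop the separator matched by the pattern."""
    return re.sub(pattern,
                  lambda m: m.group('part1') + m.group('part2'),
                  value.strip())


# sample values, not taken from the report
print(strip_thousands_sep(' 12,345.6 '))
print(strip_thousands_sep('987.6'))
print(strip_thousands_sep('1,234,567.8'))
```

Because the pattern is anchored at the end of the string, a single substitution removes only the last separator, so a value with two separators keeps the first one. Whether that matters depends on the magnitude of the values in the tables being read.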

+ 493 - 0
UNFCCC_GHG_data/UNFCCC_reader/Singapore/config_SGP_BUR5.py

@@ -0,0 +1,493 @@
+table_def_templates = {
+    '66_1': {  # 66
+        "area": ['68,743,522,157'],
+        "cols": ['224,280,319,359,399,445,481'],
+        "rows_to_fix": {
+            # 2: ['and Sink Categories',],
+            3: ['1A2 Manufacturing Industries',
+                '1B3 Other Emissions from', '1C - Carbon Dioxide Transport',
+                '2 — INDUSTRIAL PROCESSES AND', '2D - Non-Energy Products from',
+                '2F - Product Uses as Substitutes for',
+                '2G - Other Product Manufacture'],
+        },
+    },
+    '66_2': {  # 66
+        "area": ['671,744,1117,265'],
+        "cols": ['824,875,912,954,996,1040,1082'],
+        "rows_to_fix": {
+            3: ['3 — AGRICULTURE, FORESTRY AND', '3C - Aggregate Sources and Non-CO2',
+                '4C - Incineration and Open Burning',
+                '4D -  Wastewater Treatment',
+                '5A - Indirect N2O emissions from the', 'CO2 from Biomass Combustion',
+                ],
+        },
+    },
+    '67_1': {  # 67
+        "area": ['70,727,554,159'],
+        "cols": ['207,254,291,319,356,400,442,468,503'],
+        "rows_to_fix": {
+            2: ['2 — INDUSTRIAL PROCESSES', '2A4 Other Process Uses',
+                '2B4 Caprolactam, Glyoxal and', '2B8 Petrochemical and',
+                ],
+            3: ['Total National Emissions',
+                ],
+        },
+    },
+    '67_2': {  # 67
+        "area": ['666,725,1150,119'],
+        "cols": ['801,847,889,915,952,996,1036,1063,1098'],
+        "rows_to_fix": {
+            2: ['2D - Non-Energy Products from', '2G - Other Product',
+                '2G2 SF6 and PFCs from', '2H2 Food and Beverages',
+                ],
+            3: ['Total National Emissions', '2E1 Integrated Circuit',
+                '2F - Product Uses as Substitutes for', '2F1 Refrigeration and',
+                ],
+        },
+    },
+    '68_1': {  # 68
+        "area": ['66,787,524,217'],
+        "cols": ['205,261,315,366,415,473'],
+        "rows_to_fix": {
+            2: ['2 — INDUSTRIAL PROCESSES', '2A4 Other Process Uses',
+                '2B4 Caprolactam, Glyoxal and', '2B8 Petrochemical and',
+                ],
+            3: ['Total National Emissions',
+                ],
+        },
+    },
+    '68_2': {  # 68
+        "area": ['666,787,1119,180'],
+        "cols": ['808,854,910,961,1017,1066'],
+        "rows_to_fix": {
+            2: ['2D - Non-Energy Products from',
+                '2F - Product Uses as Substitutes for', '2F1 Refrigeration and Air',
+                '2G2 SF6 and PFCs from Other', '2H2 Food and Beverages',
+                ],
+            3: ['Total National Emissions', '2E1 Integrated Circuit or',
+                '2G - Other Product Manufacture',
+                ],
+        },
+    },
+    '84_1': {  # 84
+        "area": ['70,667,525,112'],
+        "cols": ['193,291,345,396,440,480'],
+        "rows_to_fix": {},
+    },
+    '84_2': {  # 84
+        "area": ['668,667,1115,83'],
+        "cols": ['854,908,954,1001,1038,1073'],
+        "rows_to_fix": { },
+    },
+    '85_1': {  # 85
+        "area": ['70,680,531,170'],
+        "cols": ['275,328,375,414,456,489'],
+        "rows_to_fix": {},
+    },
+    '85_2': {  # 85
+        "area": ['663,675,1117,175'],
+        "cols": ['849,908,954,1001,1045,1073'],
+        "rows_to_fix": {
+            3: ['3C — Aggregate Sources and Non-CO2',
+                '3C4 - Direct N2O Emissions from', '3C5 - Indirect N2O Emissions from',
+                '3C6 - Indirect N2O Emissions from']
+        },
+    },
+    '92': {  # 92
+        "area": ['72,672,514,333'],
+        "cols": ['228,275,319,361,398,438,489'],
+        "rows_to_fix": {
+            3: ['4A1 Managed Waste',
+                '4A2 Unmanaged Waste', '4A3 Uncategorised Waste',
+                '4C - Incineration and', '4D - Wastewater Treatment',
+                '4D1 Domestic Wastewater', '4D2 Industrial Wastewater']
+        },
+    },
+    '95_1': {  # 95
+        "area": ['70,731,507,149'],
+        "cols": ['233,307,375,452'],
+        "drop_rows": [0, 1, 2, 3],
+        "rows_to_fix": {
+            3: ['Total (Net)', '1A2 Manufacturing Industries',
+                '2 — INDUSTRIAL PROCESSES', '3 — AGRICULTURE, FORESTRY',
+                '3C - Aggregate Sources and Non-CO2', '4C - Incineration and Open',
+                'Clinical Waste', '4D - Wastewater Treatment',
+                'CO2 from Biomass Combustion for']
+        },
+        "header": {
+            'entity': ['Greenhouse Gas Source and Sink Categories',
+                       'Net CO2', 'CH4', 'N2O', 'HFCs'],
+            'unit': ['', 'Gg', 'GgCO2eq', 'GgCO2eq', 'GgCO2eq'],
+        },
+    },
+    '95_2': {  # 95
+        "area": ['666,731,1103,149'],
+        "cols": ['829,903,971,1048'],
+        "drop_rows": [0, 1, 2, 3, 4, 5],
+        "rows_to_fix": {
+            3: ['Total (Net)', '1A2 Manufacturing Industries',
+                '2 — INDUSTRIAL PROCESSES', '3 — AGRICULTURE, FORESTRY',
+                '3C - Aggregate Sources and Non-CO2', '4C - Incineration and Open',
+                'Clinical Waste', '4D - Wastewater Treatment',
+                'CO2 from Biomass Combustion for']
+        },
+        "header": {
+            'entity': ['Greenhouse Gas Source and Sink Categories',
+                       'PFCs', 'SF6', 'NF3', 'Total (Net) National Emissions'],
+            'unit': ['', 'GgCO2eq', 'GgCO2eq', 'GgCO2eq', 'GgCO2eq'],
+        },
+    },
+}
+
+table_defs = {
+    '66': {
+        "templates": ['66_1', '66_2'],
+        # "header_rows": [0, 1],
+        "header": {
+            'entity': ['Greenhouse Gas Source and Sink Categories', 'Net CO2',
+                       'CH4', 'N2O', 'HFCs', 'PFCs', 'SF6', 'NF3'],
+            'unit': ['', 'Gg', 'Gg', 'Gg', 'GgCO2eq', 'GgCO2eq', 'GgCO2eq', 'GgCO2eq'],
+        },
+        "drop_rows": [0, 1, 2, 3],
+        # "drop_cols": ['NF3', 'SF6'],
+        "category_col": "Greenhouse Gas Source and Sink Categories",
+        "year": 2018,
+        # "unit_info": unit_info_2018,
+        "coords_value_mapping": "2018",
+    },
+    '67': {
+        "templates": ['67_1', '67_2'],
+        "header": {
+            'entity': ['Greenhouse Gas Source and Sink Categories', 'HFC-23', 'HFC-32',
+                       'HFC-41', 'HFC-125', 'HFC-134a', 'HFC-143a', 'HFC-152a',
+                       'HFC-227ea', 'HFC-43-10mee'],
+            'unit': ['', 'kg', 'kg', 'kg', 'kg', 'kg', 'kg', 'kg', 'kg', 'kg'],
+        },
+        "drop_rows": [0, 1, 2, 3],
+        # "drop_cols": ['NF3', 'SF6'],
+        "category_col": "Greenhouse Gas Source and Sink Categories",
+        "year": 2018,
+        # "unit_info": unit_info_2018,
+        "coords_value_mapping": "2018_fgases",
+    },
+    '68': {
+        "templates": ['68_1', '68_2'],
+        "header": {
+            'entity': ['Greenhouse Gas Source and Sink Categories', 'PFC-14',
+                       'PFC-116', 'PFC-218', 'PFC-318', 'SF6', 'NF3'],
+            'unit': ['', 'kg', 'kg', 'kg', 'kg', 'kg', 'kg'],
+        },
+        "drop_rows": [0, 1, 2],
+        "category_col": "Greenhouse Gas Source and Sink Categories",
+        "year": 2018,
+        # "unit_info": unit_info_2018,
+        "coords_value_mapping": "2018_fgases",
+    },
+    '84': {
+        "templates": ['84_1', '84_2'],
+        "header": {
+            'entity': ['Categories', 'CO2', 'CH4', 'N2O', 'NOx', 'CO', 'NMVOC'],
+            'unit': ['', 'Gg', 'Gg', 'Gg', 'Gg', 'Gg', 'Gg'],
+        },
+        "drop_rows": [0, 1, 2, 3, 4, 5],
+        "category_col": "Categories",
+        "year": 2018,
+        # "unit_info": unit_info_2018,
+        "coords_value_mapping": "2018",
+    },
+    '85': {
+        "templates": ['85_1', '85_2'],
+        "header": {
+            'entity': ['Categories', 'CO2', 'CH4', 'N2O', 'NOx', 'CO', 'NMVOC'],
+            'unit': ['', 'Gg', 'Gg', 'Gg', 'Gg', 'Gg', 'Gg'],
+        },
+        "drop_rows": [0, 1, 2, 3, 4, 5],
+        "category_col": "Categories",
+        "year": 2018,
+        # "unit_info": unit_info_2018,
+        "coords_value_mapping": "2018",
+    },
+    '92': {
+        "templates": ['92'],
+        "header": {
+            'entity': ['Categories', 'CO2', 'CH4', 'N2O', 'NOx', 'CO', 'NMVOC', 'SO2'],
+            'unit': ['', 'Gg', 'Gg', 'Gg', 'Gg', 'Gg', 'Gg', 'Gg'],
+        },
+        "drop_rows": [0, 1, 2],
+        "category_col": "Categories",
+        "year": 2018,
+        # "unit_info": unit_info_2018,
+        "coords_value_mapping": "2018",
+    },
+    '95': {
+        "templates": ['95_1', '95_2'],
+        "category_col": "Greenhouse Gas Source and Sink Categories",
+        "year": 2016,
+        # "unit_info": unit_info_2018,
+        "coords_value_mapping": "other",
+    },
+    '96': {
+        "templates": ['95_1', '95_2'],
+        "category_col": "Greenhouse Gas Source and Sink Categories",
+        "year": 2014,
+        # "unit_info": unit_info_2018,
+        "coords_value_mapping": "other",
+    },
+    '97': {
+        "templates": ['95_1', '95_2'],
+        "category_col": "Greenhouse Gas Source and Sink Categories",
+        "year": 2012,
+        # "unit_info": unit_info_2018,
+        "coords_value_mapping": "other",
+    },
+    '98': {
+        "templates": ['95_1', '95_2'],
+        "category_col": "Greenhouse Gas Source and Sink Categories",
+        "year": 2010,
+        # "unit_info": unit_info_2018,
+        "coords_value_mapping": "other",
+    },
+    '99': {
+        "templates": ['95_1', '95_2'],
+        "category_col": "Greenhouse Gas Source and Sink Categories",
+        "year": 2000,
+        # "unit_info": unit_info_2018,
+        "coords_value_mapping": "other",
+    },
+    '100': {
+        "templates": ['95_1', '95_2'],
+        "category_col": "Greenhouse Gas Source and Sink Categories",
+        "year": 1994,
+        # "unit_info": unit_info_2018,
+        "coords_value_mapping": "other",
+    },
+}
+
+cat_names_fix = {
+    '14Ab Residential': '1A4b Residential',
+}
+
+values_replacement = {
+#    '': '-',
+    ' ': '',
+}
+
+gwp_to_use = "AR5GWP100"
+
+index_cols = ["orig_cat_name"]
+cols_for_space_stripping = index_cols
+
+unit_row = "header"
+
+## parameters part 2: conversion to PRIMAP2 interchange format
+
+cats_remove = ['Information items']
+
+cat_codes_manual = {
+    'CO2 from Biomass Combustion for Energy Production': 'M.BIO',
+    'Total National Emissions and Removals': '0',
+    'Total (Net) National Emissions': '0',
+    'Clinical Waste Incineration': 'M.4.C.1',
+    'Hazardous Waste Incineration': 'M.4.C.2',
+    #'3 AGRICULTURE': 'M.AG',
+    '3 AGRICULTURE, FORESTRY AND OTHER LAND USE': '3',
+    #'3 LAND USE, LAND-USE CHANGE AND FORESTRY': 'M.LULUCF',
+}
+
+
+cat_code_regexp = r'(?P<code>^[A-Za-z0-9]{1,7})\s.*'
+
+# special header as category code and name in one column
+header_long = ["orig_cat_name", "entity", "unit", "time", "data"]
+
+coords_terminologies = {
+    "area": "ISO3",
+    "category": "IPCC2006_PRIMAP", #two extra categories
+    "scenario": "PRIMAP",
+}
+
+coords_defaults = {
+    "source": "SGP-GHG-inventory",
+    "provenance": "measured",
+    "area": "SGP",
+    "scenario": "BUR5"
+}
+
+coords_value_mapping = {
+    "2018": {
+        "unit": "PRIMAP1",
+        "entity": {
+            'HFCs': f'HFCS ({gwp_to_use})',
+            'PFCs': f'PFCS ({gwp_to_use})',
+            'CH4': 'CH4',
+            'N2O': 'N2O',
+            'NF3': f'NF3 ({gwp_to_use})',
+            'Net CO2': 'CO2',
+            'SF6': f'SF6 ({gwp_to_use})',
+            'Total (Net) National Emissions': f'KYOTOGHG ({gwp_to_use})',
+        },
+    },
+    "2018_fgases": {
+        "unit": "PRIMAP1",
+        "entity": {
+            'HFC-125': 'HFC125',
+            'HFC-134a': 'HFC134a',
+            'HFC-143a': 'HFC143a',
+            'HFC-152a': 'HFC152a',
+            'HFC-227ea': 'HFC227ea',
+            'HFC-23': 'HFC23',
+            'HFC-32': 'HFC32',
+            'HFC-41': 'HFC41',
+            'HFC-43-10mee': 'HFC4310mee',
+            'NF3': 'NF3',
+            'PFC-116': 'C2F6',
+            'PFC-14': 'CF4',
+            'PFC-218': 'C3F8',
+            'PFC-318': 'cC4F8',
+            'SF6': 'SF6',
+        },
+    },
+    "other": {
+        "unit": "PRIMAP1",
+        "entity": {
+            'HFCs': f'HFCS ({gwp_to_use})',
+            'CH4': f'CH4 ({gwp_to_use})',
+            'N2O': f'N2O ({gwp_to_use})',
+            'NF3': f'NF3 ({gwp_to_use})',
+            'Net CO2': 'CO2',
+            'PFCs': f'PFCS ({gwp_to_use})',
+            'SF6': f'SF6 ({gwp_to_use})',
+            'Total (Net) National Emissions': f'KYOTOGHG ({gwp_to_use})',
+        },
+    },
+}
+
+coords_cols = {
+    "category": "category",
+    "entity": "entity",
+    "unit": "unit"
+}
+
+add_coords_cols = {
+    "orig_cat_name": ["orig_cat_name", "category"],
+}
+
+filter_remove = {
+    # "f1" :{
+    #     "entity": ["HFC-125", "HFC-134a", "HFC-143a", "HFC-152a", "HFC-227ea",
+    #                "HFC-23", "HFC-32", "HFC-41", "HFC-43-10mee", "PFC-116",
+    #                "PFC-14", "PFC-218", "PFC-318", "NF3", "SF6"],
+    #     "category": "2"
+    # }
+}
+
+meta_data = {
+    "references": "https://unfccc.int/documents/621650",
+    "rights": "",
+    "contact": "mail@johannes-guetschow.de",
+    "title": "Singapore's Fifth National Communication and Fifth Biennial Update "
+             "Report",
+    "comment": "Read from pdf file by Johannes Gütschow",
+    "institution": "United Nations Framework Convention on Climate Change (UNFCCC)",
+}
+
+
+## processing
+aggregate_sectors = {
+    '2': {'sources': ['2.A', '2.B', '2.C', '2.D', '2.E', '2.F', '2.G', '2.H'],
+          'name': 'IPPU'},
+    'M.3.C.1.AG': {'sources': ['3.C.1.b', '3.C.1.c'], 'name': 'Emissions from Biomass Burning (Agriculture)'},
+    'M.3.C.1.LU': {'sources': ['3.C.1.a', '3.C.1.d'], 'name': 'Emissions from Biomass Burning (LULUCF)'},
+    'M.3.C.AG': {'sources': ['M.3.C.1.AG', '3.C.2', '3.C.3', '3.C.4', '3.C.5',
+                             '3.C.6', '3.C.7', '3.C.8'],
+                 'name': 'Aggregate sources and non-CO2 emissions sources on land (Agriculture)'},
+    'M.AG.ELV': {'sources': ['M.3.C.AG'], 'name': 'Agriculture excluding livestock emissions'},
+    'M.AG': {'sources': ['M.AG.ELV', '3.A'], 'name': 'Agriculture'},
+    'M.LULUCF': {'sources': ['M.3.C.1.LU', '3.B', '3.D'],
+                 'name': 'Land Use, Land Use Change, and Forestry'},
+    'M.0.EL': {'sources': ['1', '2', 'M.AG', '4', '5'], 'name': 'National Total Excluding LULUCF'},
+    '0': {'sources': ['1', '2', '3', '4', '5'], 'name': 'National Total'},
+}
+
+
+processing_info_step1 = {
+    # aggregate IPPU which is missing for individual fgases so it can be used in the
+    # next step (downscaling)
+    'aggregate_cats': {
+        '2': {'sources': ['2.A', '2.B', '2.C', '2.D', '2.E', '2.F', '2.G', '2.H'],
+              'name': 'IPPU'},
+    },
+    'tolerance': 1, # because ch4 is inconsistent
+}
+
+processing_info_step2 =  {
+    'aggregate_cats': aggregate_sectors,
+    'downscale': {
+        'sectors': {
+            'IPPU': {
+                'basket': '2',
+                'basket_contents': ['2.A', '2.B', '2.C', '2.D', '2.E',
+                                    '2.F', '2.G', '2.H'],
+                'entities': ['CO2', 'N2O', f'PFCS ({gwp_to_use})',
+                             f'HFCS ({gwp_to_use})', 'SF6', 'NF3'],
+                'dim': 'category (IPCC2006_PRIMAP)',
+            },
+            # AFOLU downscaling. Most is zero anyway
+            '3C': {
+                'basket': '3.C',
+                'basket_contents': ['3.C.1', '3.C.2', '3.C.3', '3.C.4', '3.C.5',
+                                    '3.C.6', '3.C.7', '3.C.8'],
+                'entities': ['CO2', 'CH4', 'N2O'],
+                'dim': 'category (IPCC2006_PRIMAP)',
+            },
+            '3C1': {
+                'basket': '3.C.1',
+                'basket_contents': ['3.C.1.a', '3.C.1.b', '3.C.1.c', '3.C.1.d'],
+                'entities': ['CO2', 'CH4', 'N2O'],
+                'dim': 'category (IPCC2006_PRIMAP)',
+            },
+            '3D': {
+                'basket': '3.D',
+                'basket_contents': ['3.D.1', '3.D.2'],
+                'entities': ['CO2', 'CH4', 'N2O'],
+                'dim': 'category (IPCC2006_PRIMAP)',
+            },
+        },
+        'entities': {
+            'HFCS': {
+                'basket': f'HFCS ({gwp_to_use})',
+                'basket_contents': ['HFC125', 'HFC134a', 'HFC143a', 'HFC23',
+                                    'HFC32', 'HFC4310mee', 'HFC227ea'],
+                'sel': {'category (IPCC2006_PRIMAP)':
+                            ['0', '2', '2.C', '2.E',
+                             '2.F', '2.G', '2.H']},
+            },
+            'PFCS': {
+                'basket': f'PFCS ({gwp_to_use})',
+                'basket_contents': ['C2F6', 'C3F8', 'CF4', 'cC4F8'],
+                'sel': {'category (IPCC2006_PRIMAP)':
+                            ['0', '2', '2.C', '2.E',
+                             '2.F', '2.G', '2.H']},
+            },
+        }
+    },
+    'remove_ts': {
+        'fgases': { # unnecessary and complicates aggregation for
+            # other gases
+            'category': ['5', '5.B'],
+            'entities': [f'HFCS ({gwp_to_use})', f'PFCS ({gwp_to_use})', 'SF6', 'NF3'],
+        },
+        'CH4': { # inconsistent with IPPU sector
+            'category': ['2.A', '2.B', '2.C', '2.D', '2.E', '2.F', '2.G', '2.H'],
+            'entities': ['CH4'],
+        },
+    },
+    # 'basket_copy': {
+    #     'GWPs_to_add': ["SARGWP100", "AR4GWP100", "AR6GWP100"],
+    #     'entities': ["HFCS", "PFCS"],
+    #     'source_GWP': gwp_to_use,
+    # },
+}
+
+
+
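
Several pages in `table_defs` reuse the same entries of `table_def_templates` (pages 95 to 100 all point at `95_1` and `95_2`), and `drop_rows` can be given at either level. The sketch below shows the lookup order used by the reader script that follows: the page-level value wins, otherwise the template-level value is used. The dictionaries are trimmed-down copies of the config, and both the helper name `resolve_drop_rows` and the empty-list fallback are assumptions for illustration.

```python
# trimmed-down copies of the config structures; only the keys needed here
table_def_templates = {
    '66_1': {},
    '95_1': {"drop_rows": [0, 1, 2, 3]},
    '95_2': {"drop_rows": [0, 1, 2, 3, 4, 5]},
}
table_defs = {
    '66': {"templates": ['66_1'], "drop_rows": [0, 1, 2, 3]},
    '95': {"templates": ['95_1', '95_2']},
    '96': {"templates": ['95_1', '95_2']},
}


def resolve_drop_rows(page, template):
    """Hypothetical helper mirroring the lookup order in the reader."""
    if "drop_rows" in table_defs[page]:
        return table_defs[page]["drop_rows"]
    if "drop_rows" in table_def_templates[template]:
        return table_def_templates[template]["drop_rows"]
    return []  # assumption: nothing is dropped if neither level defines it


for page, page_def in table_defs.items():
    for template in page_def["templates"]:
        print(page, template, resolve_drop_rows(page, template))
```

Keeping the geometry (`area`, `cols`) in the templates and the year-specific settings in `table_defs` means that pages with an identical layout only need one set of coordinates.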

+ 260 - 0
UNFCCC_GHG_data/UNFCCC_reader/Singapore/read_SGP_BUR5_from_pdf.py

@@ -0,0 +1,260 @@
+# read Singapore fifth BUR from pdf
+
+
+import camelot
+import primap2 as pm2
+import pandas as pd
+#import numpy as np
+from pathlib import Path
+import locale
+
+from UNFCCC_GHG_data.helper import process_data_for_country, gas_baskets
+from UNFCCC_GHG_data.helper import downloaded_data_path, extracted_data_path
+from UNFCCC_GHG_data.helper import fix_rows
+from primap2.pm2io._conversion import convert_ipcc_code_primap_to_primap2
+from config_SGP_BUR5 import table_def_templates, table_defs, index_cols
+from config_SGP_BUR5 import values_replacement, header_long, cats_remove, \
+    cat_codes_manual, cat_code_regexp, cat_names_fix
+from config_SGP_BUR5 import coords_cols, coords_terminologies, coords_defaults, \
+    coords_value_mapping, meta_data, add_coords_cols, filter_remove
+from config_SGP_BUR5 import processing_info_step1, processing_info_step2
+
+### general configuration
+input_folder = downloaded_data_path / 'UNFCCC' / 'Singapore' / 'BUR5'
+output_folder = extracted_data_path / 'UNFCCC' / 'Singapore'
+if not output_folder.exists():
+    output_folder.mkdir()
+
+output_filename = 'SGP_BUR5_2022_'
+inventory_file_pdf = 'Singapore_-_NC5BUR5.pdf'
+#years_to_read = range(1990, 2018 + 1)
+
+# define locale to use for str to float conversion
+locale_to_use = 'en_SG.UTF-8'
+locale.setlocale(locale.LC_NUMERIC, locale_to_use)
+
+pagesToRead = table_defs.keys()
+
+compression = dict(zlib=True, complevel=9)
+
+## part 1: read the data from pdf
+### part 1.a: 2016 inventory
+
+data_pm2 = None
+for page in pagesToRead:
+    print("++++++++++++++++++++++++++++++++")
+    print(f"+++++ Working on page {page} ++++++")
+    print("++++++++++++++++++++++++++++++++")
+
+    df_this_page = None
+    for table_on_page in table_defs[page]["templates"]:
+        print(f"Reading table {table_on_page}")
+        area = table_def_templates[table_on_page]["area"]
+        cols = table_def_templates[table_on_page]["cols"]
+        tables = camelot.read_pdf(str(input_folder / inventory_file_pdf),
+                                  pages=str(page), flavor='stream',
+                                  table_areas=area, columns=cols, split_text=True)
+
+        df_current = tables[0].df.copy(deep=True)
+        # drop the old header
+        if "drop_rows" in table_defs[page].keys():
+            df_current = df_current.drop(table_defs[page]["drop_rows"])
+        elif "drop_rows" in table_def_templates[table_on_page].keys():
+            df_current = df_current.drop(
+                table_def_templates[table_on_page]["drop_rows"])
+        # add new header
+        if 'header' in table_defs[page].keys():
+            df_current.columns = pd.MultiIndex.from_tuples(
+                zip(table_defs[page]['header']['entity'],
+                    table_defs[page]['header']['unit']))
+        else:
+            df_current.columns = pd.MultiIndex.from_tuples(
+                zip(table_def_templates[table_on_page]['header']['entity'],
+                    table_def_templates[table_on_page]['header']['unit']))
+
+        # drop cols if necessary
+        if "drop_cols" in table_defs[page].keys():
+            # print(df_current.columns.values)
+            df_current = df_current.drop(columns=table_defs[page]["drop_cols"])
+        elif "drop_cols" in table_def_templates[table_on_page].keys():
+            df_current = df_current.drop(columns=table_def_templates[table_on_page]["drop_cols"])
+
+        # rename category column
+        df_current.rename(columns={table_defs[page]["category_col"]: index_cols[0]},
+                          inplace=True)
+
+        # replace line breaks by spaces
+        df_current[index_cols[0]] = df_current[index_cols[0]].str.replace("\n", " ")
+        # replace double and triple spaces
+        df_current[index_cols[0]] = df_current[index_cols[0]].str.replace("   ", " ")
+        df_current[index_cols[0]] = df_current[index_cols[0]].str.replace("  ", " ")
+
+        # fix the split rows
+        for n_rows in table_def_templates[table_on_page]["rows_to_fix"].keys():
+            df_current = fix_rows(df_current,
+                                  table_def_templates[table_on_page]["rows_to_fix"][
+                                      n_rows], index_cols[0], n_rows)
+
+        # replace category names with typos
+        df_current[index_cols[0]] = df_current[index_cols[0]].replace(cat_names_fix)
+
+        # replace empty strings
+        df_current = df_current.replace(values_replacement)
+
+        # set index
+        # df_current = df_current.set_index(index_cols)
+        # strip leading and trailing whitespace and remove "^"
+        for col in df_current.columns.values:
+            df_current[col] = df_current[col].str.strip()
+            df_current[col] = df_current[col].str.replace("^", "", regex=False)
+
+        # print(df_current)
+        # aggregate dfs for this page
+        if df_this_page is None:
+            df_this_page = df_current.copy(deep=True)
+        else:
+            # find intersecting cols
+            cols_this_page = df_this_page.columns.values
+            # print(f"cols this page: {cols_this_page}")
+            cols_current = df_current.columns.values
+            # print(f"cols current: {cols_current}")
+            cols_both = list(set(cols_this_page).intersection(set(cols_current)))
+            # print(f"cols both: {cols_both}")
+            if len(cols_both) > 0:
+                df_this_page = df_this_page.merge(df_current, how='outer', on=cols_both,
+                                                  suffixes=(None, None))
+            else:
+                df_this_page = df_this_page.merge(df_current, how='outer',
+                                                  left_index=True, right_index=True,
+                                                  suffixes=(None, None))
+
+            df_this_page = df_this_page.groupby(index_cols).first().reset_index()
+            # print(df_this_page)
+            # df_all = df_all.join(df_current, how='outer')
+
+    # set index and convert to long format
+    df_this_page = df_this_page.set_index(index_cols)
+    df_this_page_long = pm2.pm2io.nir_convert_df_to_long(df_this_page,
+                                                         table_defs[page]["year"],
+                                                         header_long)
+
+    # drop the rows with memo items etc
+    for cat in cats_remove:
+        df_this_page_long = df_this_page_long.drop(
+            df_this_page_long.loc[df_this_page_long.loc[:, index_cols[0]] == cat].index)
+
+    # make a copy of the categories column
+    df_this_page_long.loc[:, "category"] = df_this_page_long.loc[:, index_cols[0]]
+
+    # replace cat names by codes in col "Categories"
+    # first the manual replacements
+    df_this_page_long.loc[:, "category"] = df_this_page_long.loc[:, "category"].replace(
+        cat_codes_manual)
+    # then the regex replacements
+    repl = lambda m: convert_ipcc_code_primap_to_primap2('IPC' + m.group('code'))
+    df_this_page_long.loc[:, "category"] = df_this_page_long.loc[:,
+                                           "category"].str.replace(cat_code_regexp,
+                                                                   repl, regex=True)
+    df_this_page_long.loc[:, "category"].unique()
+
+    # strip spaces in data col
+    df_this_page_long.loc[:, "data"] = df_this_page_long.loc[:, "data"].str.strip()
+
+    df_this_page_long = df_this_page_long.reset_index(drop=True)
+
+    # make sure all col headers are str
+    df_this_page_long.columns = df_this_page_long.columns.map(str)
+
+    # remove thousands separators as pd.to_numeric can't deal with that
+    df_this_page_long.loc[:, "data"] = df_this_page_long.loc[:, "data"].str.replace(',',
+                                                                                    '')
+
+    # drop orig cat name as it's not unique over all tables (keep until here in case
+    # it's needed for debugging)
+    df_this_page_long = df_this_page_long.drop(columns='orig_cat_name')
+
+    data_page_if = pm2.pm2io.convert_long_dataframe_if(
+        df_this_page_long,
+        coords_cols=coords_cols,
+        #add_coords_cols=add_coords_cols,
+        coords_defaults=coords_defaults,
+        coords_terminologies=coords_terminologies,
+        coords_value_mapping=coords_value_mapping[
+            table_defs[page]["coords_value_mapping"]],
+        # coords_value_filling=coords_value_filling,
+        filter_remove=filter_remove,
+        # filter_keep=filter_keep,
+        meta_data=meta_data,
+        convert_str=True,
+        time_format='%Y',
+    )
+
+    # conversion to PRIMAP2 native format
+    data_page_pm2 = pm2.pm2io.from_interchange_format(data_page_if)
+
+    # combine with tables from other pages
+    if data_pm2 is None:
+        data_pm2 = data_page_pm2
+    else:
+        data_pm2 = data_pm2.pr.merge(data_page_pm2)
+
+# convert back to IF to have units in the fixed format
+data_if = data_pm2.pr.to_interchange_format()
+
+# ###
+# save data to IF and native format
+# ###
+if not output_folder.exists():
+    output_folder.mkdir()
+pm2.pm2io.write_interchange_format(
+    output_folder / (output_filename + coords_terminologies["category"] + "_raw"), data_if)
+
+encoding = {var: compression for var in data_pm2.data_vars}
+data_pm2.pr.to_netcdf(
+    output_folder / (output_filename + coords_terminologies["category"] + "_raw.nc"),
+    encoding=encoding)
+
+
+#### processing
+data_proc_pm2 = data_pm2
+terminology_proc = coords_terminologies["category"]
+
+# actual processing
+data_proc_pm2 = process_data_for_country(
+    data_proc_pm2,
+    entities_to_ignore=[],
+    gas_baskets={},
+    processing_info_country=processing_info_step1,
+)
+
+
+data_proc_pm2 = process_data_for_country(
+    data_proc_pm2,
+    entities_to_ignore=[],
+    gas_baskets=gas_baskets,
+    processing_info_country=processing_info_step2,
+    cat_terminology_out = terminology_proc,
+    #category_conversion = None,
+    #sectors_out = None,
+)
+
+# adapt source and metadata
+# TODO: processing info is present twice
+current_source = data_proc_pm2.coords["source"].values[0]
+data_temp = data_proc_pm2.pr.loc[{"source": current_source}]
+data_proc_pm2 = data_proc_pm2.pr.set("source", 'BUR_NIR', data_temp)
+
+# ###
+# save data to IF and native format
+# ###
+data_proc_if = data_proc_pm2.pr.to_interchange_format()
+if not output_folder.exists():
+    output_folder.mkdir()
+pm2.pm2io.write_interchange_format(
+    output_folder / (output_filename + terminology_proc), data_proc_if)
+
+encoding = {var: compression for var in data_proc_pm2.data_vars}
+data_proc_pm2.pr.to_netcdf(
+    output_folder / (output_filename + terminology_proc + ".nc"),
+    encoding=encoding)
+
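
The reader above looks up optional table settings (`drop_rows`, `drop_cols`, `header`) in the page-level entry of `table_defs` first and falls back to the template in `table_def_templates`. A minimal sketch of that lookup as a single helper is given below. `get_table_setting` and the sample dictionaries are hypothetical and not part of the repository; they only illustrate the page-over-template precedence.

```python
def get_table_setting(page_def, template_def, key, default=None):
    """Return the page-level setting if present, else the template one."""
    if key in page_def:
        return page_def[key]
    return template_def.get(key, default)


# hypothetical sample definitions, shaped like table_defs / table_def_templates
template_def = {"drop_rows": [0, 1], "drop_cols": ["notes"]}
page_def = {"drop_rows": [0, 1, 2]}

rows = get_table_setting(page_def, template_def, "drop_rows")  # page wins
cols = get_table_setting(page_def, template_def, "drop_cols")  # template fallback
header = get_table_setting(page_def, template_def, "header")   # defined nowhere
print(rows, cols, header)
```

Routing every lookup through one helper would avoid reading a fallback value from the wrong dictionary, at the cost of one more function in the reader script.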

+ 1 - 0
UNFCCC_GHG_data/UNFCCC_reader/Taiwan/read_TWN_2022-Inventory_from_pdf.py

@@ -1,5 +1,6 @@
 # this script reads data from Taiwan's 2022 national inventory
 # Data is read from the english summary pdf
+# TODO: add further GWPs and gas baskets
 
 import pandas as pd
 import primap2 as pm2

+ 363 - 0
UNFCCC_GHG_data/UNFCCC_reader/Thailand/config_THA_BUR3.py

@@ -0,0 +1,363 @@
+# configuration for Thailand, BUR3
+# ###
+# for reading
+# ###
+
+# general
+gwp_to_use = "AR4GWP100"
+terminology_proc = 'IPCC2006_PRIMAP'
+
+header_inventory = ['Greenhouse gas source and sink categories',
+                   'CO2 emissions', 'CO2 removals',
+                   'CH4', 'N2O', 'NOx', 'CO', 'NMVOCs',
+                   'SO2', 'HFCs', 'PFCs', 'SF6']
+unit_inventory = ['Gg'] * len(header_inventory)
+unit_inventory[9] = "GgCO2eq"
+unit_inventory[10] = "GgCO2eq"
+
+# 2016 inventory
+inv_conf = {
+    'year': 2016,
+    'entity_row': 0,
+    'unit_row': 1,
+    'index_cols': "Greenhouse gas source and sink categories",
+    'header': header_inventory,
+    'unit': unit_inventory,
+    # special header as category code and name in one column
+    'header_long': ["orig_cat_name", "entity", "unit", "time", "data"],
+    # manual category codes (manual mapping to primap1, will be mapped to primap2
+    # automatically with the other codes)
+    'cat_codes_manual': {
+        '6. Other Memo Items (not accounted in Total Emissions)': 'MEMO',
+        'International Bunkers': 'MBK',
+        'CO2 from Biomass': 'MBIO',
+    },
+    'cat_code_regexp': r'^(?P<code>[a-zA-Z0-9]{1,4})[\s\.].*',
+}
+
+# primap2 format conversion
+coords_cols = {
+    "category": "category",
+    "entity": "entity",
+    "unit": "unit",
+}
+
+coords_terminologies = {
+    "area": "ISO3",
+    "category": "IPCC1996_2006_THA_Inv",
+    "scenario": "PRIMAP",
+}
+
+coords_defaults = {
+    "source": "THA-GHG-Inventory",
+    "provenance": "measured",
+    "area": "THA",
+    "scenario": "BUR3",
+}
+
+coords_value_mapping = {
+    "unit": "PRIMAP1",
+    "category": "PRIMAP1",
+    "entity": {
+        'HFCs': f"HFCS ({gwp_to_use})",
+        'PFCs': f"PFCS ({gwp_to_use})",
+        'NMVOCs': 'NMVOC',
+    },
+}
+
+filter_remove = {
+    'f_memo': {"category": "MEMO"},
+}
+filter_keep = {}
+
+meta_data = {
+    "references": "https://unfccc.int/documents/267629",
+    "rights": "",
+    "contact": "mail@johannes-guetschow.de",
+    "title": "Thailand. Biennial update report (BUR). BUR3",
+    "comment": "Read from pdf by Johannes Gütschow",
+    "institution": "UNFCCC",
+}
+
+# main sector time series
+header_main_sector_ts = [
+    'Year', 'Energy', 'IPPU',
+    'Agriculture', 'LULUCF', 'Waste',
+    'Net emissions (Including LULUCF)',
+    'Net emissions (Excluding LULUCF)']
+unit_main_sector_ts = ['GgCO2eq'] * len(header_main_sector_ts)
+unit_main_sector_ts[0] = ''
+
+trend_conf = {
+    'header': header_main_sector_ts,
+    'unit': unit_main_sector_ts,
+    # manual category codes (manual mapping to primap1, will be mapped to primap2
+    # automatically with the other codes)
+    'cat_codes_manual': {
+        'Energy': "1",
+        'IPPU': "2",
+        'Agriculture': "3",
+        'LULUCF': "4",
+        'Waste': "5",
+        'Net emissions (Including LULUCF)': "0",
+        'Net emissions (Excluding LULUCF)': "M0EL",
+    },
+}
+
+coords_cols_main_sector_ts = {
+    "category": "category",
+    "unit": "unit",
+}
+
+coords_defaults_main_sector_ts = {
+    "source": "THA-GHG-Inventory",
+    "provenance": "measured",
+    "area": "THA",
+    "scenario": "BUR3",
+    "entity": f"KYOTOGHG ({gwp_to_use})",
+}
+
+# indirect gases time series
+header_indirect = ['Year', 'NOx', 'CO',
+                    'NMVOCs', 'SO2']
+unit_indirect = ['Gg'] * len(header_indirect)
+unit_indirect[0] = ''
+ind_conf = {
+    'header': header_indirect,
+    'unit': unit_indirect,
+    'cols_to_remove': ['Average Annual Growth Rate'],
+}
+
+coords_cols_indirect = {
+    "entity": "entity",
+    "unit": "unit",
+}
+
+coords_defaults_indirect = {
+    "source": "THA-GHG-Inventory",
+    "provenance": "measured",
+    "area": "THA",
+    "scenario": "BUR3",
+    "category": "0",
+}
+
+# ###
+# for processing
+# ###
+# aggregate categories
+country_processing_step1 = {
+    'aggregate_cats': {
+        '2.A.4': {'sources': ['2.A.4.b', '2.A.4.d'],
+                  'name': 'Other Process uses of Carbonates'},
+    },
+    'aggregate_gases': {
+        'KYOTOGHG': {
+            'basket': 'KYOTOGHG (AR4GWP100)',
+            'basket_contents': ['CO2', 'CH4', 'N2O', 'SF6',
+                                'HFCS (AR4GWP100)', 'PFCS (AR4GWP100)'],
+            'skipna': True,
+            'min_count': 1,
+            'sel': {f'category ({coords_terminologies["category"]})':
+                [
+                    '0', '1', '1.A', '1.A.1', '1.A.2', '1.A.3',
+                    '1.A.4', '1.B', '1.B.1', '1.B.2',
+                    '1.C',
+                    '2', '2.A', '2.A.1', '2.A.2', '2.A.3', '2.A.4',
+                    '2.B', '2.C', '2.D', '2.H',
+                    '3', '3.A', '3.B', '3.C', '3.D', '3.E', '3.F', '3.G',
+                    '3.H', '3.I',
+                    '4', '4.A', '4.B', '4.C', '4.D', '4.E',
+                    '5', '5.A', '5.B', '5.C', '5.D'
+                ]
+            }, # not tested
+        },
+    },
+}
+
+country_processing_step2 = {
+    'downscale': {
+        # main sectors present as KYOTOGHG sum. subsectors need to be downscaled
+        # TODO: downscale CO, NOx, NMVOC, SO2 (national total present)
+        'sectors': {
+            '1': {
+                'basket': '1',
+                'basket_contents': ['1.A', '1.B', '1.C'],
+                'entities': ['KYOTOGHG (AR4GWP100)'],
+                'dim': f'category ({coords_terminologies["category"]})',
+            },
+            '1.A': {
+                'basket': '1.A',
+                'basket_contents': ['1.A.1', '1.A.2', '1.A.3', '1.A.4'],
+                'entities': ['KYOTOGHG (AR4GWP100)'],
+                'dim': f'category ({coords_terminologies["category"]})',
+            },
+            '1.B': {
+                'basket': '1.B',
+                'basket_contents': ['1.B.1', '1.B.2'],
+                'entities': ['KYOTOGHG (AR4GWP100)'],
+                'dim': f'category ({coords_terminologies["category"]})',
+            },
+            '2': {
+                'basket': '2',
+                'basket_contents': ['2.A', '2.B', '2.C', '2.D', '2.H'],
+                'entities': ['KYOTOGHG (AR4GWP100)'],
+                'dim': f'category ({coords_terminologies["category"]})',
+            },
+            '2.A': {
+                'basket': '2.A',
+                'basket_contents': ['2.A.1', '2.A.2', '2.A.3', '2.A.4'],
+                'entities': ['KYOTOGHG (AR4GWP100)'],
+                'dim': f'category ({coords_terminologies["category"]})',
+            },
+            '3': {
+                'basket': '3',
+                'basket_contents': ['3.A', '3.B', '3.C', '3.D', '3.E', '3.F', '3.G',
+                                    '3.H', '3.I'],
+                'entities': ['KYOTOGHG (AR4GWP100)'],
+                'dim': f'category ({coords_terminologies["category"]})',
+            },
+            '4': {
+                'basket': '4',
+                'basket_contents': ['4.A', '4.B', '4.C', '4.D', '4.E'],
+                'entities': ['KYOTOGHG (AR4GWP100)'],
+                'dim': f'category ({coords_terminologies["category"]})',
+            },
+            '5': {
+                'basket': '5',
+                'basket_contents': ['5.A', '5.B', '5.C', '5.D'],
+                'entities': ['KYOTOGHG (AR4GWP100)'],
+                'dim': f'category ({coords_terminologies["category"]})',
+            },
+        },
+        'entities': {
+            'KYOTO': {
+                'basket': 'KYOTOGHG (AR4GWP100)',
+                'basket_contents': ['CH4', 'CO2', 'N2O', 'HFCS (AR4GWP100)',
+                                    'PFCS (AR4GWP100)', 'SF6'],
+                'sel': {f'category ({coords_terminologies["category"]})':
+                    [
+                        '0', '1', '1.A', '1.A.1', '1.A.2', '1.A.3',
+                        '1.A.4', '1.B', '1.B.1', '1.B.2', '1.C',
+                        '2', '2.A', '2.A.1', '2.A.2', '2.A.3', '2.A.4',
+                        '2.B', '2.C', '2.D', '2.H',
+                        '3', '3.A', '3.B', '3.C', '3.D', '3.E', '3.F', '3.G',
+                        '3.H', '3.I',
+                        '4', '4.A', '4.B', '4.C', '4.D', '4.E',
+                        '5', '5.A', '5.B', '5.C', '5.D']},
+            },
+        },
+    },
+    'basket_copy': {
+        'GWPs_to_add': ["SARGWP100", "AR5GWP100", "AR6GWP100"],
+        'entities': ["HFCS", "PFCS"],
+        'source_GWP': gwp_to_use,
+    },
+}
+## not in BUR3: 1.A.1.a, 1.A.1.b, 1.A.3.a, 1.A.3.b, 1.A.3.c, 1.A.3.d, 1.A.5, 1.B.3,
+# 2.B.x, 2.F, 2.G
+# 4.E.x, 5.X.y M.BK.A, M.BK.M
+
+cat_conversion = {
+    'mapping': {
+        '0': '0',
+        'M.0.EL': 'M.0.EL',
+        '1': '1',
+        '1.A': '1.A',
+        '1.A.1': '1.A.1',
+        '1.A.2': '1.A.2',
+        '1.A.3': '1.A.3',
+        '1.A.4': '1.A.4',
+        '1.B': '1.B',
+        '1.B.1': '1.B.1',
+        '1.B.2': '1.B.2',
+        '1.C': '1.C',
+        '1.C.1': '1.C.1',
+        '1.C.2': '1.C.2',
+        '1.C.3': '1.C.3',
+        '2': '2',
+        '2.A': '2.A',
+        '2.A.1': '2.A.1',
+        '2.A.2': '2.A.2',
+        '2.A.3': '2.A.3',
+        '2.A.4': '2.A.4',
+        '2.A.4.b': '2.A.4.b',
+        '2.A.4.d': '2.A.4.d',
+        '2.B': '2.B',
+        '2.C': '2.C',
+        '2.C.1': '2.C.1',
+        '2.D': '2.D',
+        '2.D.1': '2.D.1',
+        '2.H': '2.H',
+        '2.H.1': '2.H.1',
+        '2.H.2': '2.H.2',
+        '3': 'M.AG',
+        '3.A': '3.A.1',
+        '3.B': '3.A.2',
+        '3.C': 'M.3.C.1.AG',  # field burning of agricultural residues
+        '3.D': '3.C.2',  # Liming
+        '3.E': '3.C.3',  # urea application
+        '3.F': '3.C.4',  # direct N2O from agri soils
+        '3.G': '3.C.5',  # indirect N2O from agri soils
+        '3.H': '3.C.6',  # indirect N2O from manure management
+        '3.I': '3.C.7',  # rice
+        '4': 'M.LULUCF',
+        '4.A': '3.B.1.a',  # forest remaining forest
+        '4.B': '3.B.2.a',  # cropland remaining cropland
+        '4.C': '3.B.2.b',  # land converted to cropland
+        '4.D': '3.B.6.b',  # land converted to other land
+        '4.E': 'M.3.C.1.LU',  # biomass burning (LULUCF)
+        '5': '4',
+        '5.A': '4.A',
+        '5.B': '4.B',
+        '5.C': '4.C',
+        '5.D': '4.D',
+        'M.BK': 'M.BK',
+        'M.BIO': 'M.BIO',
+    },
+    'aggregate': {
+        '3.A': {'sources': ['3.A.1', '3.A.2'], 'name': 'Livestock'},
+        '3.C.1': {'sources': ['M.3.C.1.AG', 'M.3.C.1.LU'],
+                  'name': 'Emissions from Biomass Burning'},
+        '3.C': {'sources': ['3.C.1', '3.C.2', '3.C.3', '3.C.4', '3.C.5', '3.C.6', '3.C.7'],
+                'name': 'Aggregate sources and non-CO2 emissions sources on land'},
+        'M.3.C.AG': {
+            'sources': ['M.3.C.1.AG', '3.C.2', '3.C.3', '3.C.4', '3.C.5', '3.C.6', '3.C.7'],
+            'name': 'Aggregate sources and non-CO2 emissions sources on land (Agriculture)'},
+        'M.AG.ELV': {'sources': ['M.3.C.AG'],
+                     'name': 'Agriculture excluding livestock emissions'},
+        'M.3.C.LU': {'sources': ['M.3.C.1.LU'],
+                     'name': 'Aggregate sources and non-CO2 emissions sources on land (Land use)'},
+        '3.B.1': {'sources': ['3.B.1.a'], 'name': 'Forest Land'},
+        '3.B.2': {'sources': ['3.B.2.a', '3.B.2.b'], 'name': 'Cropland'},
+        '3.B.6': {'sources': ['3.B.6.b'], 'name': 'Other Land'},
+        '3.B': {'sources': ['3.B.1', '3.B.2', '3.B.6'], 'name': 'Land'},
+        'M.LULUCF': {'sources': ['3.B', 'M.3.C.LU'], 'name': 'LULUCF'},
+        '3': {'sources': ['M.AG', 'M.LULUCF'], 'name': 'AFOLU'},
+    },
+}
+
+sectors_to_save = [
+    '1', '1.A', '1.A.1', '1.A.2', '1.A.3', '1.A.4',
+    '1.B', '1.B.1', '1.B.2', '1.C', '1.C.1', '1.C.2', '1.C.3',
+    '2', '2.A', '2.A.1', '2.A.2', '2.A.3', '2.A.4', '2.A.4.b', '2.A.4.d',
+    '2.B', '2.C', '2.C.1', '2.H', '2.H.1', '2.H.2',
+    '3', 'M.AG', '3.A', '3.A.1', '3.A.2',
+    '3.C', '3.C.1', '3.C.2', '3.C.3', '3.C.4',
+    '3.C.5', '3.C.6', '3.C.7', 'M.3.C.1.AG', 'M.3.C.AG', 'M.AG.ELV',
+    'M.LULUCF', 'M.3.C.1.LU', 'M.3.C.LU', '3.B', '3.B.1', '3.B.1.a', '3.B.2', '3.B.2.a',
+    '3.B.2.b', '3.B.6', '3.B.6.b',
+    '4', '4.A', '4.B', '4.C', '4.D',
+    '0', 'M.0.EL', 'M.BK', 'M.BIO']
+
+
+# gas baskets
+gas_baskets = {
+    'FGASES (SARGWP100)': ['HFCS (SARGWP100)', 'PFCS (SARGWP100)', 'SF6', 'NF3'],
+    'FGASES (AR4GWP100)': ['HFCS (AR4GWP100)', 'PFCS (AR4GWP100)', 'SF6', 'NF3'],
+    'FGASES (AR5GWP100)':['HFCS (AR5GWP100)', 'PFCS (AR5GWP100)', 'SF6', 'NF3'],
+    'FGASES (AR6GWP100)':['HFCS (AR6GWP100)', 'PFCS (AR6GWP100)', 'SF6', 'NF3'],
+    'KYOTOGHG (SARGWP100)': ['CO2', 'CH4', 'N2O', 'FGASES (SARGWP100)'],
+    'KYOTOGHG (AR4GWP100)': ['CO2', 'CH4', 'N2O', 'FGASES (AR4GWP100)'],
+    'KYOTOGHG (AR5GWP100)': ['CO2', 'CH4', 'N2O', 'FGASES (AR5GWP100)'],
+    'KYOTOGHG (AR6GWP100)': ['CO2', 'CH4', 'N2O', 'FGASES (AR6GWP100)'],
+}
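
The `cat_code_regexp` in `inv_conf` captures the leading category code of a row label, and the reader scripts pass a callable to the replacement so that the whole label is swapped for the converted code. The sketch below shows that mechanism with the standard `re` module only. `to_dotted_code` is a hypothetical, simplified stand-in for primap2's `convert_ipcc_code_primap_to_primap2` (which is not reproduced here), and the sample labels are invented for illustration.

```python
import re

# same pattern as in inv_conf: 1-4 alphanumeric characters, then space or dot
cat_code_regexp = r'^(?P<code>[a-zA-Z0-9]{1,4})[\s\.].*'


def to_dotted_code(match):
    # hypothetical stand-in: put a dot between the characters of the code
    return ".".join(match.group("code"))


labels = [
    "1A1 Energy Industries",     # code followed by a space
    "2. Industrial Processes",   # code followed by a dot
    "Total national emissions",  # no code: left for the manual mapping
]
codes = [re.sub(cat_code_regexp, to_dotted_code, label) for label in labels]
print(codes)
```

Labels that do not start with a short code are returned unchanged, which is why the manual mapping in `cat_codes_manual` has to run before the regular expression.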

+ 381 - 0
UNFCCC_GHG_data/UNFCCC_reader/Thailand/config_THA_BUR4.py

@@ -0,0 +1,381 @@
+# configuration for Thailand, BUR4
+# ###
+# for reading
+# ###
+
+# general
+gwp_to_use = "AR4GWP100"
+terminology_proc = 'IPCC2006_PRIMAP'
+
+# 2019 inventory
+inv_conf = {
+    'year': 2019,
+    'entity_row': 0,
+    'unit_row': 1,
+    'index_cols': "Greenhouse gas source and sink categories",
+    # special header as category code and name in one column
+    'header_long': ["orig_cat_name", "entity", "unit", "time", "data"],
+    # manual category codes (manual mapping to primap1, will be mapped to primap2
+    # automatically with the other codes)
+    'cat_codes_manual': {
+        'Total national emissions and removals': '0',
+        'Memo Items (not accounted in total Emissions)': 'MEMO',
+        'International Bunkers': 'MBK',
+        'Aviation International Bunkers': 'MBKA',
+        'Marine-International Bunkers': 'MBKM',
+        'CO2 from biomass': 'MBIO',
+    },
+    'cat_code_regexp': r'^(?P<code>[a-zA-Z0-9]{1,4})[\s\.].*',
+}
+
+# primap2 format conversion
+coords_cols = {
+    "category": "category",
+    "entity": "entity",
+    "unit": "unit",
+}
+
+coords_terminologies = {
+    "area": "ISO3",
+    "category": "IPCC1996_2006_THA_Inv",
+    "scenario": "PRIMAP",
+}
+
+coords_defaults = {
+    "source": "THA-GHG-Inventory",
+    "provenance": "measured",
+    "area": "THA",
+    "scenario": "BUR4",
+}
+
+coords_value_mapping = {
+    "unit": "PRIMAP1",
+    "category": "PRIMAP1",
+    "entity": {
+        'HFCs': f"HFCS ({gwp_to_use})",
+        'PFCs': f"PFCS ({gwp_to_use})",
+        'SF6': f'SF6 ({gwp_to_use})',
+        'NMVOCs': 'NMVOC',
+        'Nox': 'NOx',
+    },
+}
+
+filter_remove = {
+    'f_memo': {"category": "MEMO"},
+}
+filter_keep = {}
+
+meta_data = {
+    "references": "https://unfccc.int/documents/624750",
+    "rights": "",
+    "contact": "mail@johannes-guetschow.de",
+    "title": "Thailand. Biennial update report (BUR). BUR4",
+    "comment": "Read from pdf by Johannes Gütschow",
+    "institution": "UNFCCC",
+}
+
+# main sector time series
+# manual category codes (manual mapping to primap1, will be mapped to primap2
+# automatically with the other codes)
+cat_codes_manual_main_sector_ts = {
+    'Energy': "1",
+    'Industrial Processes and Product Use': "2",
+    'Agriculture': "3",
+    'LULUCF': "4",
+    'Waste': "5",
+    'Net emissions (Include LULUCF)': "0",
+    'Total emissions (Exclude LULUCF)': "M0EL",
+}
+
+coords_cols_main_sector_ts = {
+    "category": "category",
+}
+
+coords_defaults_main_sector_ts = {
+    "source": "THA-GHG-Inventory",
+    "provenance": "measured",
+    "area": "THA",
+    "scenario": "BUR4",
+    "entity": f"KYOTOGHG ({gwp_to_use})",
+    "unit": "GgCO2eq",
+}
+
+# indirect gases time series
+coords_cols_indirect = {
+    "entity": "entity",
+}
+
+coords_defaults_indirect = {
+    "source": "THA-GHG-Inventory",
+    "provenance": "measured",
+    "area": "THA",
+    "scenario": "BUR4",
+    "category": "0",
+    "unit": "Gg",
+}
+
+# ###
+# for processing
+# ###
+# aggregate categories
+country_processing_step1 = {
+    'aggregate_cats': {
+        '2.A.4': {'sources': ['2.A.4.b', '2.A.4.d'],
+                  'name': 'Other Process uses of Carbonates'},
+        '2.B.8': {'sources': ['2.B.8.b', '2.B.8.c', '2.B.8.e', '2.B.8.f'],
+                  'name': 'Petrochemical and Carbon Black production'},
+    },
+    'aggregate_gases': {
+        'KYOTOGHG': {
+            'basket': 'KYOTOGHG (AR4GWP100)',
+            'basket_contents': ['CO2', 'CH4', 'N2O', 'SF6',
+                                'HFCS (AR4GWP100)', 'PFCS (AR4GWP100)'],
+            'skipna': True,
+            'min_count': 1,
+            'sel': {f'category ({coords_terminologies["category"]})':
+                [
+                    '0', '1', '1.A', '1.A.1', '1.A.2', '1.A.3',
+                    '1.A.4', '1.A.5', '1.B', '1.B.1', '1.B.2', '1.B.3',
+                    '1.C',
+                    '2', '2.A', '2.A.1', '2.A.2', '2.A.3', '2.A.4',
+                    '2.B', '2.C', '2.D', '2.F', '2.G', '2.H',
+                    '3', '3.A', '3.B', '3.C', '3.D', '3.E', '3.F', '3.G',
+                    '3.H', '3.I',
+                    '4', '4.A', '4.B', '4.C', '4.D',
+                    '4.E', '4.E.1', '4.E.2', '4.E.3',
+                    '5', '5.A', '5.B', '5.C', '5.D'
+                ]
+            }, # not tested
+        },
+    },
+}
+
+country_processing_step2 = {
+    'downscale': {
+        # main sectors present as KYOTOGHG sum. subsectors need to be downscaled
+        # TODO: downscale CO, NOx, NMVOC, SO2 (national total present)
+        'sectors': {
+            '1': {
+                'basket': '1',
+                'basket_contents': ['1.A', '1.B', '1.C'],
+                'entities': ['KYOTOGHG (AR4GWP100)'],
+                'dim': f'category ({coords_terminologies["category"]})',
+            },
+            '1.A': {
+                'basket': '1.A',
+                'basket_contents': ['1.A.1', '1.A.2', '1.A.3', '1.A.4', '1.A.5'],
+                'entities': ['KYOTOGHG (AR4GWP100)'],
+                'dim': f'category ({coords_terminologies["category"]})',
+            },
+            '1.B': {
+                'basket': '1.B',
+                'basket_contents': ['1.B.1', '1.B.2', '1.B.3'],
+                'entities': ['KYOTOGHG (AR4GWP100)'],
+                'dim': f'category ({coords_terminologies["category"]})',
+            },
+            '2': {
+                'basket': '2',
+                'basket_contents': ['2.A', '2.B', '2.C', '2.D', '2.F', '2.G', '2.H'],
+                'entities': ['KYOTOGHG (AR4GWP100)'],
+                'dim': f'category ({coords_terminologies["category"]})',
+            },
+            '2.A': {
+                'basket': '2.A',
+                'basket_contents': ['2.A.1', '2.A.2', '2.A.3', '2.A.4'],
+                'entities': ['KYOTOGHG (AR4GWP100)'],
+                'dim': f'category ({coords_terminologies["category"]})',
+            },
+            '3': {
+                'basket': '3',
+                'basket_contents': ['3.A', '3.B', '3.C', '3.D', '3.E', '3.F', '3.G',
+                                    '3.H', '3.I'],
+                'entities': ['KYOTOGHG (AR4GWP100)'],
+                'dim': f'category ({coords_terminologies["category"]})',
+            },
+            '4': {
+                'basket': '4',
+                'basket_contents': ['4.A', '4.B', '4.C', '4.D', '4.E'],
+                'entities': ['KYOTOGHG (AR4GWP100)'],
+                'dim': f'category ({coords_terminologies["category"]})',
+            },
+            '4.E': {
+                'basket': '4.E',
+                'basket_contents': ['4.E.1', '4.E.2', '4.E.3'],
+                'entities': ['KYOTOGHG (AR4GWP100)'],
+                'dim': f'category ({coords_terminologies["category"]})',
+            },
+            '5': {
+                'basket': '5',
+                'basket_contents': ['5.A', '5.B', '5.C', '5.D'],
+                'entities': ['KYOTOGHG (AR4GWP100)'],
+                'dim': f'category ({coords_terminologies["category"]})',
+            },
+        },
+        'entities': {
+            'KYOTO': {
+                'basket': 'KYOTOGHG (AR4GWP100)',
+                'basket_contents': ['CH4', 'CO2', 'N2O', 'HFCS (AR4GWP100)',
+                                    'PFCS (AR4GWP100)', 'SF6'],
+                'sel': {f'category ({coords_terminologies["category"]})':
+                    [
+                        '0', '1', '1.A', '1.A.1', '1.A.2', '1.A.3',
+                        '1.A.4', '1.A.5', '1.B', '1.B.1', '1.B.2', '1.B.3',
+                        '1.C',
+                        '2', '2.A', '2.A.1', '2.A.2', '2.A.3', '2.A.4',
+                        '2.B', '2.C', '2.D', '2.F', '2.G', '2.H',
+                        '3', '3.A', '3.B', '3.C', '3.D', '3.E', '3.F', '3.G',
+                        '3.H', '3.I',
+                        '4', '4.A', '4.B', '4.C', '4.D',
+                        '4.E', '4.E.1', '4.E.2', '4.E.3',
+                        '5', '5.A', '5.B', '5.C', '5.D']},
+            },
+        },
+    },
+    'basket_copy': {
+        'GWPs_to_add': ["SARGWP100", "AR5GWP100", "AR6GWP100"],
+        'entities': ["HFCS", "PFCS"],
+        'source_GWP': gwp_to_use,
+    },
+}
+
+cat_conversion = {
+    'mapping': {
+        '0': '0',
+        'M.0.EL': 'M.0.EL',
+        '1': '1',
+        '1.A': '1.A',
+        '1.A.1': '1.A.1',
+        '1.A.1.a': '1.A.1.a',
+        '1.A.1.b': '1.A.1.b',
+        '1.A.2': '1.A.2',
+        '1.A.3': '1.A.3',
+        '1.A.3.a': '1.A.3.a',
+        '1.A.3.b': '1.A.3.b',
+        '1.A.3.c': '1.A.3.c',
+        '1.A.3.d': '1.A.3.d',
+        '1.A.4': '1.A.4',
+        '1.A.5': '1.A.5',
+        '1.B': '1.B',
+        '1.B.1': '1.B.1',
+        '1.B.2': '1.B.2',
+        '1.B.3': '1.B.3',
+        '1.C': '1.C',
+        '1.C.1': '1.C.1',
+        '1.C.2': '1.C.2',
+        '1.C.3': '1.C.3',
+        '2': '2',
+        '2.A': '2.A',
+        '2.A.1': '2.A.1',
+        '2.A.2': '2.A.2',
+        '2.A.3': '2.A.3',
+        '2.A.4': '2.A.4',
+        '2.A.4.b': '2.A.4.b',
+        '2.A.4.d': '2.A.4.d',
+        '2.B': '2.B',
+        '2.B.2': '2.B.2',
+        '2.B.4': '2.B.4',
+        '2.B.8': '2.B.8',
+        '2.B.8.b': '2.B.8.b',
+        '2.B.8.c': '2.B.8.c',
+        '2.B.8.e': '2.B.8.e',
+        '2.B.8.f': '2.B.8.f',
+        '2.C': '2.C',
+        '2.C.1': '2.C.1',
+        '2.D': '2.D',
+        '2.D.1': '2.D.1',
+        '2.F': '2.F',
+        '2.F.1': '2.F.1',
+        '2.G': '2.G',
+        '2.G.1': '2.G.1',
+        '2.H': '2.H',
+        '2.H.1': '2.H.1',
+        '2.H.2': '2.H.2',
+        '3': 'M.AG',
+        '3.A': '3.A.1',
+        '3.B': '3.A.2',
+        '3.C': 'M.3.C.1.b.i',  # field burning of agricultural residues
+        '3.D': '3.C.2',  # Liming
+        '3.E': '3.C.3',  # urea application
+        '3.F': '3.C.4',  # direct N2O from agri soils
+        '3.G': '3.C.5',  # indirect N2O from agri soils
+        '3.H': '3.C.6',  # indirect N2O from manure management
+        '3.I': '3.C.7',  # rice
+        #'4': 'M.LULUCF',
+        '4.A': '3.B.1.a',  # forest remaining forest
+        '4.B': '3.B.2.a',  # cropland remaining cropland
+        '4.C': '3.B.2.b',  # land converted to cropland
+        '4.D': '3.B.6.b',  # land converted to other land
+        #'4.E': 'M.3.C.1.LU',  # biomass burning (LULUCF)
+        '4.E.1': '3.C.1.a', # biomass burning (Forest Land)
+        '4.E.2': 'M.3.C.1.b.ii', # biomass burning (Cropland)
+        '4.E.3': '3.C.1.d', # biomass burning (Other Land)
+        '5': '4',
+        '5.A': '4.A',
+        '5.A.1': '4.A.1',
+        '5.A.2': '4.A.2',
+        '5.B': '4.B',
+        '5.C': '4.C',
+        '5.C.1': '4.C.1',
+        '5.D': '4.D',
+        '5.D.1': '4.D.1',
+        '5.D.2': '4.D.2',
+        'M.BK': 'M.BK',
+        'M.BK.A': 'M.BK.A',
+        'M.BK.M': 'M.BK.M',
+        'M.BIO': 'M.BIO',
+    },
+    'aggregate': {
+        '3.A': {'sources': ['3.A.1', '3.A.2'], 'name': 'Livestock'},
+        '3.C.1.b': {'sources': ['M.3.C.1.b.i', 'M.3.C.1.b.ii'],
+                  'name': 'Biomass Burning In Cropland'},
+        'M.3.C.1.AG': {'sources': ['3.C.1.b', '3.C.1.c'],
+                  'name': 'Biomass Burning (Agriculture)'},
+        'M.3.C.1.LU': {'sources': ['3.C.1.a', '3.C.1.d'],
+                  'name': 'Biomass Burning (LULUCF)'},
+        '3.C.1': {'sources': ['M.3.C.1.AG', 'M.3.C.1.LU'],
+                  'name': 'Emissions from Biomass Burning'},
+        '3.C': {'sources': ['3.C.1', '3.C.2', '3.C.3', '3.C.4', '3.C.5', '3.C.6', '3.C.7'],
+                'name': 'Aggregate sources and non-CO2 emissions sources on land'},
+        'M.3.C.AG': {
+            'sources': ['M.3.C.1.AG', '3.C.2', '3.C.3', '3.C.4', '3.C.5', '3.C.6', '3.C.7'],
+            'name': 'Aggregate sources and non-CO2 emissions sources on land (Agriculture)'},
+        'M.AG.ELV': {'sources': ['M.3.C.AG'],
+                     'name': 'Agriculture excluding livestock emissions'},
+        'M.3.C.LU': {'sources': ['M.3.C.1.LU'],
+                     'name': 'Aggregate sources and non-CO2 emissions sources on land (Land use)'},
+        '3.B.1': {'sources': ['3.B.1.a'], 'name': 'Forest Land'},
+        '3.B.2': {'sources': ['3.B.2.a', '3.B.2.b'], 'name': 'Cropland'},
+        '3.B.6': {'sources': ['3.B.6.b'], 'name': 'Other Land'},
+        '3.B': {'sources': ['3.B.1', '3.B.2', '3.B.6'], 'name': 'Land'},
+        'M.LULUCF': {'sources': ['3.B', 'M.3.C.LU'], 'name': 'LULUCF'},
+        '3': {'sources': ['M.AG', 'M.LULUCF'], 'name': 'AFOLU'},
+    },
+}
+
+sectors_to_save = [
+    '1', '1.A', '1.A.1', '1.A.1.a', '1.A.1.b', '1.A.2', '1.A.3', '1.A.3.a', '1.A.3.b',
+    '1.A.3.c', '1.A.3.d', '1.A.4', '1.A.5',
+    '1.B', '1.B.1', '1.B.2', '1.B.3', '1.C', '1.C.1', '1.C.2', '1.C.3',
+    '2', '2.A', '2.A.1', '2.A.2', '2.A.3', '2.A.4', '2.A.4.b', '2.A.4.d',
+    '2.B', '2.B.2', '2.B.4', '2.B.8', '2.B.8.b', '2.B.8.c', '2.B.8.e', '2.B.8.f',
+    '2.C', '2.C.1', '2.F', '2.F.1', '2.G', '2.G.1', '2.H', '2.H.1', '2.H.2',
+    '3', 'M.AG', '3.A', '3.A.1', '3.A.2',
+    '3.C', '3.C.1', '3.C.1.a', '3.C.1.b', '3.C.1.d', '3.C.2', '3.C.3', '3.C.4',
+    '3.C.5', '3.C.6', '3.C.7', 'M.3.C.1.AG', 'M.3.C.AG', 'M.AG.ELV',
+    'M.LULUCF', 'M.3.C.1.LU', 'M.3.C.LU', '3.B', '3.B.1', '3.B.1.a', '3.B.2', '3.B.2.a',
+    '3.B.2.b', '3.B.6', '3.B.6.b',
+    '4', '4.A', '4.A.1', '4.A.2', '4.B', '4.C', '4.C.1', '4.D', '4.D.1', '4.D.2',
+    '0', 'M.0.EL', 'M.BK', 'M.BK.A', 'M.BK.M', 'M.BIO']
+
+
+# gas baskets
+gas_baskets = {
+    'FGASES (SARGWP100)': ['HFCS (SARGWP100)', 'PFCS (SARGWP100)', 'SF6', 'NF3'],
+    'FGASES (AR4GWP100)': ['HFCS (AR4GWP100)', 'PFCS (AR4GWP100)', 'SF6', 'NF3'],
+    'FGASES (AR5GWP100)':['HFCS (AR5GWP100)', 'PFCS (AR5GWP100)', 'SF6', 'NF3'],
+    'FGASES (AR6GWP100)':['HFCS (AR6GWP100)', 'PFCS (AR6GWP100)', 'SF6', 'NF3'],
+    'KYOTOGHG (SARGWP100)': ['CO2', 'CH4', 'N2O', 'FGASES (SARGWP100)'],
+    'KYOTOGHG (AR4GWP100)': ['CO2', 'CH4', 'N2O', 'FGASES (AR4GWP100)'],
+    'KYOTOGHG (AR5GWP100)': ['CO2', 'CH4', 'N2O', 'FGASES (AR5GWP100)'],
+    'KYOTOGHG (AR6GWP100)': ['CO2', 'CH4', 'N2O', 'FGASES (AR6GWP100)'],
+}
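
The `gas_baskets` dictionary above is nested: each `KYOTOGHG` basket lists an `FGASES` basket as one of its members. As a minimal sketch of how such a nested definition resolves to individual gases, the snippet below expands a basket recursively. The helper `expand_basket` is hypothetical and only for illustration; it is not part of `UNFCCC_GHG_data` or `primap2`, and the actual basket summation in `process_data_for_country` may work differently.

```python
# Illustrative only: plain-Python expansion of nested gas baskets.
# `expand_basket` is a hypothetical helper, not part of the repository.
gas_baskets = {
    'FGASES (AR4GWP100)': ['HFCS (AR4GWP100)', 'PFCS (AR4GWP100)', 'SF6', 'NF3'],
    'KYOTOGHG (AR4GWP100)': ['CO2', 'CH4', 'N2O', 'FGASES (AR4GWP100)'],
}


def expand_basket(name, baskets):
    """Return the leaf entities of a basket, following nested baskets."""
    leaves = []
    for member in baskets.get(name, [name]):
        if member in baskets and member != name:
            # the member is itself a basket: expand it recursively
            leaves.extend(expand_basket(member, baskets))
        else:
            leaves.append(member)
    return leaves


kyoto_leaves = expand_basket('KYOTOGHG (AR4GWP100)', gas_baskets)
print(kyoto_leaves)
```

Because the `FGASES` baskets are defined before the `KYOTOGHG` baskets that use them, a processing step that builds baskets in dictionary order would find the F-gas sum already available.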

+ 106 - 267
UNFCCC_GHG_data/UNFCCC_reader/Thailand/read_THA_BUR3_from_pdf.py

@@ -1,12 +1,20 @@
 # this script reads data from Thailand's BUR3
 # Data is read from the pdf file
+
 import pandas as pd
 import primap2 as pm2
 import camelot
-import copy
 
+from UNFCCC_GHG_data.helper import process_data_for_country
 from UNFCCC_GHG_data.helper import downloaded_data_path, extracted_data_path
-from primap2.pm2io._data_reading import matches_time_format
+from config_THA_BUR3 import inv_conf, trend_conf, ind_conf
+from config_THA_BUR3 import coords_cols, coords_defaults, coords_terminologies, \
+    coords_value_mapping, filter_remove, filter_keep, meta_data
+from config_THA_BUR3 import coords_cols_main_sector_ts, coords_defaults_main_sector_ts
+from config_THA_BUR3 import coords_defaults_indirect, coords_cols_indirect
+from config_THA_BUR3 import gas_baskets, cat_conversion, terminology_proc, \
+    sectors_to_save
+from config_THA_BUR3 import country_processing_step1, country_processing_step2
 
 # ###
 # configuration
@@ -23,150 +31,35 @@ compression = dict(zlib=True, complevel=9)
 
 # inventory tables
 pages_inventory = '68,69'
-header_inventory = ['Greenhouse gas source and sink categories',
-                   'CO2 emissions', 'CO2 removals',
-                   'CH4', 'N2O', 'NOx', 'CO', 'NMVOCs',
-                   'SO2', 'HFCs', 'PFCs', 'SF6']
-unit_inventory = ['Gg'] * len(header_inventory)
-unit_inventory[9] = "GgCO2eq"
-unit_inventory[10] = "GgCO2eq"
-
-year = 2016
-entity_row = 0
-unit_row = 1
-gwp_to_use = "AR4GWP100"
-
-index_cols = "Greenhouse gas source and sink categories"
-# special header as category UNFCCC_GHG_data and name in one column
-header_long = ["orig_cat_name", "entity", "unit", "time", "data"]
-
-# manual category codes
-cat_codes_manual = {
-    '6. Other Memo Items (not accounted in Total Emissions)': 'MEMO',
-    'International Bunkers': 'MBK',
-    'CO2 from Biomass': 'MBIO',
-}
-
-cat_code_regexp = r'^(?P<UNFCCC_GHG_data>[a-zA-Z0-9]{1,4})[\s\.].*'
-
-coords_cols = {
-    "category": "category",
-    "entity": "entity",
-    "unit": "unit",
-}
-
-
-coords_terminologies = {
-    "area": "ISO3",
-    "category": "IPCC1996_2006_THA_Inv",
-    "scenario": "PRIMAP",
-}
-
-coords_defaults = {
-    "source": "THA-GHG-Inventory",
-    "provenance": "measured",
-    "area": "THA",
-    "scenario": "BUR3",
-}
-
-coords_value_mapping = {
-    "unit": "PRIMAP1",
-    "category": "PRIMAP1",
-    "entity": {
-        'HFCs': f"HFCS ({gwp_to_use})",
-        'PFCs': f"PFCS ({gwp_to_use})",
-        'NMVOCs': 'NMVOC',
-    },
-}
-
-
-filter_remove = {
-    'f_memo': {"category": "MEMO"},
-}
-filter_keep = {}
-
-meta_data = {
-    "references": "https://unfccc.int/documents/267629",
-    "rights": "",
-    "contact": "mail@johannes-guetschow.de",
-    "title": "Thailand. Biennial update report (BUR). BUR3",
-    "comment": "Read fom pdf by Johannes Gütschow",
-    "institution": "UNFCCC",
-}
 
 # main sector time series
 page_main_sector_ts = '70'
-header_main_sector_ts = ['Year', 'Energy', 'IPPU',
-                    'Agriculture', 'LULUCF', 'Waste',
-                    'Net emissions (Including LULUCF)',
-                    'Net emissions (Excluding LULUCF)']
-unit_main_sector_ts = ['GgCO2eq'] * len(header_main_sector_ts)
-unit_main_sector_ts[0] = ''
-
-# manual category codes
-cat_codes_manual_main_sector_ts = {
-    'Energy': "1",
-    'IPPU': "2",
-    'Agriculture': "3",
-    'LULUCF': "4",
-    'Waste': "5",
-    'Net emissions (Including LULUCF)': "0",
-    'Net emissions (Excluding LULUCF)': "M0EL",
-}
-
-coords_cols_main_sector_ts = {
-    "category": "category",
-    "unit": "unit",
-}
-
-coords_defaults_main_sector_ts = {
-    "source": "THA-GHG-Inventory",
-    "provenance": "measured",
-    "area": "THA",
-    "scenario": "BUR3",
-    "entity": f"KYOTOGHG ({gwp_to_use})"
-}
 
 # indirect gases time series
 page_indirect = '72'
-header_indirect = ['Year', 'NOx', 'CO',
-                    'NMVOCs', 'SO2']
-unit_indirect = ['Gg'] * len(header_indirect)
-unit_indirect[0] = ''
-
-cols_to_remove = ['Average Annual Growth Rate']
-
-coords_cols_indirect = {
-    "entity": "entity",
-    "unit": "unit",
-}
-
-coords_defaults_indirect = {
-    "source": "THA-GHG-Inventory",
-    "provenance": "measured",
-    "area": "THA",
-    "scenario": "BUR3",
-    "category": "0"
-}
 
 
 # ###
 # read the inventory data and convert to PM2 IF
 # ###
-
 tables_inventory = camelot.read_pdf(str(input_folder / inventory_file), pages=pages_inventory,
                                     split_text=True, flavor="lattice")
 
 df_inventory = tables_inventory[0].df[1:]
-df_header = pd.DataFrame([header_inventory, unit_inventory])
+df_header = pd.DataFrame([inv_conf["header"], inv_conf["unit"]])
 
-df_inventory = pd.concat([df_header, df_inventory, tables_inventory[1].df.iloc[1:]], axis=0, join='outer')
+df_inventory = pd.concat([df_header, df_inventory, tables_inventory[1].df.iloc[1:]],
+                         axis=0, join='outer')
 
-df_inventory = pm2.pm2io.nir_add_unit_information(df_inventory, unit_row=unit_row, entity_row=entity_row,
-                                                  regexp_entity=".*", regexp_unit=".*", default_unit="Gg")
+df_inventory = pm2.pm2io.nir_add_unit_information(df_inventory,
+                                                  unit_row=inv_conf["unit_row"],
+                                                  entity_row=inv_conf["entity_row"],
+                                                  regexp_entity=".*", regexp_unit=".*",
+                                                  default_unit="Gg")
 # set index and convert to long format
-df_inventory = df_inventory.set_index(index_cols)
-df_inventory_long = pm2.pm2io.nir_convert_df_to_long(df_inventory, year, header_long)
+df_inventory = df_inventory.set_index(inv_conf["index_cols"])
+df_inventory_long = pm2.pm2io.nir_convert_df_to_long(df_inventory, inv_conf["year"],
+                                                     inv_conf["header_long"])
 df_inventory_long["orig_cat_name"] = df_inventory_long["orig_cat_name"].str[0]
 
 # prep for conversion to PM2 IF and native format
@@ -175,16 +68,22 @@ df_inventory_long["category"] = df_inventory_long["orig_cat_name"]
 
 # replace cat names by codes in col "category"
 # first the manual replacements
-df_inventory_long["category"] = df_inventory_long["category"].replace(cat_codes_manual)
+df_inventory_long["category"] = \
+    df_inventory_long["category"].replace(inv_conf["cat_codes_manual"])
 # then the regex replacements
-repl = lambda m: m.group('UNFCCC_GHG_data')
-df_inventory_long["category"] = df_inventory_long["category"].str.replace(cat_code_regexp, repl, regex=True)
+repl = lambda m: m.group('code')
+df_inventory_long["category"] = \
+    df_inventory_long["category"].str.replace(inv_conf["cat_code_regexp"], repl,
+                                              regex=True)
 df_inventory_long = df_inventory_long.reset_index(drop=True)
 
 # replace "," with "" in data
 repl = lambda m: m.group('part1') + m.group('part2')
-df_inventory_long.loc[:, "data"] = df_inventory_long.loc[:, "data"].str.replace('(?P<part1>[0-9]+),(?P<part2>[0-9\.]+)$', repl, regex=True)
-df_inventory_long.loc[:, "data"] = df_inventory_long.loc[:, "data"].str.replace(' ','', regex=False)
+df_inventory_long.loc[:, "data"] = \
+    df_inventory_long.loc[:, "data"].str.replace(
+        r'(?P<part1>[0-9]+),(?P<part2>[0-9\.]+)$', repl, regex=True)
+df_inventory_long.loc[:, "data"] = df_inventory_long.loc[:, "data"].str.\
+    replace(' ','', regex=False)
 
 # make sure all col headers are str
 df_inventory_long.columns = df_inventory_long.columns.map(str)
@@ -202,7 +101,8 @@ data_inventory_IF = pm2.pm2io.convert_long_dataframe_if(
     filter_remove=filter_remove,
     #filter_keep=filter_keep,
     meta_data=meta_data,
-    convert_str=True
+    convert_str=True,
+    time_format="%Y",
     )
 
 # ###
@@ -214,7 +114,7 @@ tables_main_sector_ts = camelot.read_pdf(str(input_folder / inventory_file), pag
 df_main_sector_ts = tables_main_sector_ts[0].df.iloc[2:]
 #df_header = pd.DataFrame([header_main_sector_ts, unit_main_sector_ts])
 #df_main_sector_ts = pd.concat([df_header, df_main_sector_ts], axis=0, join='outer')
-df_main_sector_ts.columns = [header_main_sector_ts, unit_main_sector_ts]
+df_main_sector_ts.columns = [trend_conf["header"], trend_conf["unit"]]
 
 df_main_sector_ts = df_main_sector_ts.transpose()
 df_main_sector_ts = df_main_sector_ts.reset_index(drop=False)
@@ -225,13 +125,16 @@ df_main_sector_ts.columns = cols
 df_main_sector_ts = df_main_sector_ts.drop(0)
 
 # replace cat names by codes in col "category"
-df_main_sector_ts["category"] = df_main_sector_ts["category"].replace(cat_codes_manual_main_sector_ts)
+df_main_sector_ts["category"] = df_main_sector_ts["category"].replace(
+    trend_conf["cat_codes_manual"])
 
 repl = lambda m: m.group('part1') + m.group('part2')
 year_cols = list(set(df_main_sector_ts.columns) - set(['category', 'unit']))
 for col in year_cols:
-    df_main_sector_ts.loc[:, col] = df_main_sector_ts.loc[:, col].str.replace('(?P<part1>[0-9]+),(?P<part2>[0-9\.]+)$', repl, regex=True)
-    df_main_sector_ts.loc[:, col] = df_main_sector_ts.loc[:, col].str.replace(' ','', regex=False)
+    df_main_sector_ts.loc[:, col] = df_main_sector_ts.loc[:, col].str.\
+        replace(r'(?P<part1>[0-9]+),(?P<part2>[0-9\.]+)$', repl, regex=True)
+    df_main_sector_ts.loc[:, col] = df_main_sector_ts.loc[:, col].str.\
+        replace(' ','', regex=False)
 
 data_main_sector_ts_IF = pm2.pm2io.convert_wide_dataframe_if(
     df_main_sector_ts,
@@ -244,7 +147,8 @@ data_main_sector_ts_IF = pm2.pm2io.convert_wide_dataframe_if(
     filter_remove=filter_remove,
     #filter_keep=filter_keep,
     meta_data=meta_data,
-    convert_str=True
+    convert_str=True,
+    time_format="%Y",
     )
 
 
@@ -257,7 +161,7 @@ tables_indirect = camelot.read_pdf(str(input_folder / inventory_file), pages=pag
 df_indirect = tables_indirect[0].df.iloc[2:]
 #df_header = pd.DataFrame([header_main_sector_ts, unit_main_sector_ts])
 #df_main_sector_ts = pd.concat([df_header, df_main_sector_ts], axis=0, join='outer')
-df_indirect.columns = [header_indirect, unit_indirect]
+df_indirect.columns = [ind_conf["header"], ind_conf["unit"]]
 
 df_indirect = df_indirect.transpose()
 df_indirect = df_indirect.reset_index(drop=False)
@@ -266,13 +170,15 @@ cols.iloc[0] = "entity"
 cols.iloc[1] = "unit"
 df_indirect.columns = cols
 df_indirect = df_indirect.drop(0)
-df_indirect = df_indirect.drop(columns=cols_to_remove)
+df_indirect = df_indirect.drop(columns=ind_conf["cols_to_remove"])
 
 repl = lambda m: m.group('part1') + m.group('part2')
 year_cols = list(set(df_indirect.columns) - set(['entity', 'unit']))
 for col in year_cols:
-    df_indirect.loc[:, col] = df_indirect.loc[:, col].str.replace('(?P<part1>[0-9]+),(?P<part2>[0-9\.]+)$', repl, regex=True)
-    df_indirect.loc[:, col] = df_indirect.loc[:, col].str.replace(' ','', regex=False)
+    df_indirect.loc[:, col] = df_indirect.loc[:, col].str.\
+        replace(r'(?P<part1>[0-9]+),(?P<part2>[0-9\.]+)$', repl, regex=True)
+    df_indirect.loc[:, col] = df_indirect.loc[:, col].str.\
+        replace(' ','', regex=False)
 
 data_indirect_IF = pm2.pm2io.convert_wide_dataframe_if(
     df_indirect,
@@ -285,7 +191,8 @@ data_indirect_IF = pm2.pm2io.convert_wide_dataframe_if(
     #filter_remove=filter_remove,
     #filter_keep=filter_keep,
     meta_data=meta_data,
-    convert_str=True
+    convert_str=True,
+    time_format="%Y",
     )
 
 # ###
@@ -295,137 +202,69 @@ data_inventory_pm2 = pm2.pm2io.from_interchange_format(data_inventory_IF)
 data_main_sector_ts_pm2 = pm2.pm2io.from_interchange_format(data_main_sector_ts_IF)
 data_indirect_pm2 = pm2.pm2io.from_interchange_format(data_indirect_IF)
 
-data_all = data_inventory_pm2.pr.merge(data_main_sector_ts_pm2)
-data_all = data_all.pr.merge(data_indirect_pm2)
-
-# combine CO2 emissions and absorptions
-data_CO2 = data_all[['CO2 emissions', 'CO2 removals']].\
-    to_array().pr.sum("variable", skipna=True, min_count=1)
-data_all["CO2"] = data_CO2
-
-data_all_if = data_all.pr.to_interchange_format()
-
+data_all_pm2 = data_inventory_pm2.pr.merge(data_main_sector_ts_pm2)
+data_all_pm2 = data_all_pm2.pr.merge(data_indirect_pm2)
 
+data_all_if = data_all_pm2.pr.to_interchange_format()
 
 # ###
-# convert to IPCC2006 categories
+# save raw data to IF and native format
 # ###
+if not output_folder.exists():
+    output_folder.mkdir()
+pm2.pm2io.write_interchange_format(
+    output_folder / (output_filename + coords_terminologies["category"] + "_raw"),
+    data_all_if)
 
-cat_mapping = {
-    '3': 'M.AG',
-    '3.A': '3.A.1',
-    '3.B': '3.A.2',
-    '3.C': 'M.3.C.1.AG',  # field burning of agricultural residues
-    '3.D': '3.C.2',  # Liming
-    '3.E': '3.C.3',  # urea application
-    '3.F': '3.C.4',  # direct N2O from agri soils
-    '3.G': '3.C.5',  # indirect N2O from agri soils
-    '3.H': '3.C.6',  # indirect N2O from manure management
-    '3.I': '3.C.7',  # rice
-    '4': 'M.LULUCF',
-    '4.A': '3.B.1.a',  # forest remaining forest
-    '4.B': '3.B.2.a',  # cropland remaining cropland
-    '4.C': '3.B.2.b',  # land converted to cropland
-    '4.D': '3.B.6.b',  # land converted to other land
-    '4.E': 'M.3.C.1.LU',  # biomass burning (LULUCF)
-    '5': '4',
-    '5.A': '4.A',
-    '5.B': '4.B',
-    '5.C': '4.C',
-    '5.D': '4.D',
-}
-
-aggregate_cats = {
-    '2.A.4': {'sources': ['2.A.4.b', '2.A.4.d'],
-              'name': 'Other Process uses of Carbonates'},
-    '3.A': {'sources': ['3.A.1', '3.A.2'], 'name': 'Livestock'},
-    '3.C.1': {'sources': ['M.3.C.1.AG', 'M.3.C.1.LU'],
-              'name': 'Emissions from Biomass Burning'},
-    '3.C': {'sources': ['3.C.1', '3.C.2', '3.C.3', '3.C.4', '3.C.5', '3.C.6', '3.C.7'],
-            'name': 'Aggregate sources and non-CO2 emissions sources on land'},
-    'M.3.C.AG': {
-        'sources': ['M.3.C.1.AG', '3.C.2', '3.C.3', '3.C.4', '3.C.5', '3.C.6', '3.C.7'],
-        'name': 'Aggregate sources and non-CO2 emissions sources on land (Agriculture)'},
-    'M.3.C.LU': {'sources': ['M.3.C.1.LU'],
-                 'name': 'Aggregate sources and non-CO2 emissions sources on land (Land use)'},
-    '3': {'sources': ['M.AG', 'M.LULUCF'], 'name': 'AFOLU'},
-    '3.B.1': {'sources': ['3.B.1.a'], 'name': 'Forest Land'},
-    '3.B.2': {'sources': ['3.B.2.a', '3.B.2.b'], 'name': 'Cropland'},
-    '3.B.6': {'sources': ['3.B.6.b'], 'name': 'Other Land'},
-    '3.B': {'sources': ['3.B.1', '3.B.2', '3.B.6'], 'name': 'Land'},
-    'M.AG.ELV': {'sources': ['M.3.C.AG'],
-                 'name': 'Agriculture excluding livestock emissions'},
-}
-
-data_if_2006 = copy.deepcopy(data_all_if)
-data_if_2006.attrs = copy.deepcopy(data_all_if.attrs)
-
-# map categories
-data_if_2006 = data_if_2006.replace({'category (IPCC1996_2006_THA_Inv)': cat_mapping})
-data_if_2006["category (IPCC1996_2006_THA_Inv)"].unique()
-
-# rename the category col
-data_if_2006.rename(
-    columns={'category (IPCC1996_2006_THA_Inv)': 'category (IPCC2006_PRIMAP)'},
-    inplace=True)
-data_if_2006.attrs['attrs']['cat'] = 'category (IPCC2006_PRIMAP)'
-data_if_2006.attrs['dimensions']['*'] = [
-    'category (IPCC2006_PRIMAP)' if item == 'category (IPCC1996_2006_THA_Inv)'
-    else item for item in data_if_2006.attrs['dimensions']['*']]
-# aggregate categories
-for cat_to_agg in aggregate_cats:
-    mask = data_if_2006["category (IPCC2006_PRIMAP)"].isin(
-        aggregate_cats[cat_to_agg]["sources"])
-    df_test = data_if_2006[mask]
-    # print(df_test)
-
-    if len(df_test) > 0:
-        print(f"Aggregating category {cat_to_agg}")
-        df_combine = df_test.copy(deep=True)
-
-        time_format = '%Y'
-        time_columns = [
-            col
-            for col in df_combine.columns.values
-            if matches_time_format(col, time_format)
-        ]
-
-        for col in time_columns:
-            df_combine[col] = pd.to_numeric(df_combine[col], errors="coerce")
-
-        df_combine = df_combine.groupby(
-            by=['source', 'scenario (PRIMAP)', 'provenance', 'area (ISO3)', 'entity',
-                'unit']).sum(min_count=1)
-
-        df_combine.insert(0, "category (IPCC2006_PRIMAP)", cat_to_agg)
-        # df_combine.insert(1, "cat_name_translation", aggregate_cats[cat_to_agg]["name"])
-        # df_combine.insert(2, "orig_cat_name", "computed")
-
-        df_combine = df_combine.reset_index()
-
-        data_if_2006 = pd.concat([data_if_2006, df_combine], axis=0, join='outer')
-        data_if_2006 = data_if_2006.reset_index(drop=True)
-    else:
-        print(f"no data to aggregate category {cat_to_agg}")
-
-# conversion to PRIMAP2 native format
-data_pm2_2006 = pm2.pm2io.from_interchange_format(data_if_2006)
-
-# convert back to IF to have units in the fixed format
-data_if_2006 = data_pm2_2006.pr.to_interchange_format()
+encoding = {var: compression for var in data_all_pm2.data_vars}
+data_all_pm2.pr.to_netcdf(
+    output_folder / (output_filename + coords_terminologies["category"] + "_raw.nc"),
+    encoding=encoding)
 
+# ###
+# ## process the data
+# ###
+data_proc_pm2 = data_all_pm2
+
+# combine CO2 emissions and removals
+data_proc_pm2["CO2"] = data_proc_pm2[["CO2 emissions", "CO2 removals"]].pr.sum(
+    dim="entity", skipna=True, min_count=1)
+data_proc_pm2["CO2"].attrs['entity'] = 'CO2'
+
+# actual processing
+data_proc_pm2 = process_data_for_country(
+    data_proc_pm2,
+    entities_to_ignore=['CO2 emissions', 'CO2 removals'],
+    gas_baskets={},
+    processing_info_country=country_processing_step1,
+)
+
+data_proc_pm2 = process_data_for_country(
+    data_proc_pm2,
+    entities_to_ignore=[],
+    gas_baskets=gas_baskets,
+    processing_info_country=country_processing_step2,
+    cat_terminology_out = terminology_proc,
+    category_conversion = cat_conversion,
+    sectors_out = sectors_to_save,
+)
+
+# adapt source and metadata
+# TODO: processing info is present twice
+current_source = data_proc_pm2.coords["source"].values[0]
+data_temp = data_proc_pm2.pr.loc[{"source": current_source}]
+data_proc_pm2 = data_proc_pm2.pr.set("source", 'BUR_NIR', data_temp)
 
 # ###
 # save data to IF and native format
 # ###
-# data in original categories
-pm2.pm2io.write_interchange_format(output_folder / (output_filename + coords_terminologies["category"]), data_all_if)
-
-encoding = {var: compression for var in data_all.data_vars}
-data_all.pr.to_netcdf(output_folder / (output_filename + coords_terminologies["category"] + ".nc"), encoding=encoding)
-
-# data in 2006 categories
-pm2.pm2io.write_interchange_format(output_folder / (output_filename + "IPCC2006_PRIMAP"), data_if_2006)
+data_proc_if = data_proc_pm2.pr.to_interchange_format()
+if not output_folder.exists():
+    output_folder.mkdir()
+pm2.pm2io.write_interchange_format(
+    output_folder / (output_filename + terminology_proc), data_proc_if)
 
-encoding = {var: compression for var in data_pm2_2006.data_vars}
-data_pm2_2006.pr.to_netcdf(output_folder / (output_filename + "IPCC2006_PRIMAP" + ".nc"), encoding=encoding)
+encoding = {var: compression for var in data_proc_pm2.data_vars}
+data_proc_pm2.pr.to_netcdf(
+    output_folder / (output_filename + terminology_proc + ".nc"),
+    encoding=encoding)
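
The reader above cleans numeric strings with the pattern `(?P<part1>[0-9]+),(?P<part2>[0-9\.]+)$` followed by a removal of spaces. As a small sketch of what that pattern does to individual values, the snippet below applies it with the standard `re` module. It assumes that pandas' `str.replace(..., regex=True)` substitutes per element in the same way as `re.sub`. The names `THOUSANDS` and `clean_number` are made up for this illustration.

```python
import re

# Same pattern as in the reader script, written as a raw string.
THOUSANDS = re.compile(r'(?P<part1>[0-9]+),(?P<part2>[0-9\.]+)$')


def clean_number(text):
    """Drop the last thousands comma and any spaces from a numeric string."""
    joined = THOUSANDS.sub(lambda m: m.group('part1') + m.group('part2'), text)
    return joined.replace(' ', '')


cleaned = [clean_number(s) for s in ['1,234.5', '12 345', '1,234,567.8', 'NE']]
print(cleaned)
```

Because the pattern is anchored at the end of the string, only the last comma is removed. A value with two separators such as `1,234,567.8` keeps its first comma, so the approach relies on the tables not containing values of a million or more.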

+ 225 - 0
UNFCCC_GHG_data/UNFCCC_reader/Thailand/read_THA_BUR4_from_pdf.py

@@ -0,0 +1,225 @@
+# this script reads data from Thailand's BUR4
+# Data is read from two csv files which have been created manually from ocr processed
+# pdf files
+# pdftk Thailand_BUR4_final_28122022.pdf cat 65-67east output inventory_2019.pdf
+# ocrmypdf --force-ocr inventory_2019.pdf inventory_2019_ocr.pdf
+# pdftk Thailand_BUR4_final_28122022.pdf cat 69 output trends.pdf
+# ocrmypdf --force-ocr trends.pdf trends_ocr.pdf
+
+# values for HFCs and SF6 have been taken from Table2-9 where they are present in
+# CO2eq and thus HFC data can be used and SF6 data is not 0 as in the main inventory
+# tables
+
+import pandas as pd
+import primap2 as pm2
+
+from UNFCCC_GHG_data.helper import process_data_for_country
+from UNFCCC_GHG_data.helper import downloaded_data_path, extracted_data_path
+from config_THA_BUR4 import gwp_to_use, inv_conf
+from config_THA_BUR4 import coords_cols, coords_defaults, coords_terminologies, \
+    coords_value_mapping, filter_remove, filter_keep, meta_data
+from config_THA_BUR4 import coords_cols_main_sector_ts, \
+    cat_codes_manual_main_sector_ts, coords_defaults_main_sector_ts
+from config_THA_BUR4 import coords_defaults_indirect, coords_cols_indirect
+from config_THA_BUR4 import gas_baskets, cat_conversion, terminology_proc, \
+    sectors_to_save
+from config_THA_BUR4 import country_processing_step1, country_processing_step2
+
+# ###
+# configuration
+# ###
+input_folder = downloaded_data_path / 'UNFCCC' / 'Thailand' / 'BUR4'
+output_folder = extracted_data_path / 'UNFCCC' / 'Thailand'
+if not output_folder.exists():
+    output_folder.mkdir()
+
+inventory_file = 'THA_inventory_2019.csv'
+trends_file = 'THA_trends_2000-2019.csv'
+indirect_file = 'THA_indirect_2000-2019.csv'
+output_filename = 'THA_BUR4_2022_'
+
+compression = dict(zlib=True, complevel=9)
+
+
+# ###
+# read the inventory data and convert to PM2 IF
+# ###
+df_inventory = pd.read_csv(input_folder / inventory_file, header=None)
+df_inventory = pm2.pm2io.nir_add_unit_information(
+    df_inventory, unit_row=inv_conf["unit_row"], entity_row=inv_conf["entity_row"],
+    regexp_entity=".*", regexp_unit=".*", default_unit="Gg")
+# set index and convert to long format
+df_inventory = df_inventory.set_index(inv_conf["index_cols"])
+df_inventory_long = pm2.pm2io.nir_convert_df_to_long(df_inventory, inv_conf["year"],
+                                                     inv_conf["header_long"])
+df_inventory_long["orig_cat_name"] = df_inventory_long["orig_cat_name"].str[0]
+
+# prep for conversion to PM2 IF and native format
+# make a copy of the categories row
+df_inventory_long["category"] = df_inventory_long["orig_cat_name"]
+
+# replace cat names by codes in col "category"
+# first the manual replacements
+df_inventory_long["category"] = \
+    df_inventory_long["category"].replace(inv_conf["cat_codes_manual"])
+# then the regex replacements
+repl = lambda m: m.group('code')
+df_inventory_long["category"] = \
+    df_inventory_long["category"].str.replace(inv_conf["cat_code_regexp"], repl,
+                                              regex=True)
+df_inventory_long = df_inventory_long.reset_index(drop=True)
+
+# make sure all col headers are str
+df_inventory_long.columns = df_inventory_long.columns.map(str)
+
+df_inventory_long = df_inventory_long.drop(columns=["orig_cat_name"])
+
+data_inventory_IF = pm2.pm2io.convert_long_dataframe_if(
+    df_inventory_long,
+    coords_cols=coords_cols,
+    #add_coords_cols=add_coords_cols,
+    coords_defaults=coords_defaults,
+    coords_terminologies=coords_terminologies,
+    coords_value_mapping=coords_value_mapping,
+    #coords_value_filling=coords_value_filling,
+    filter_remove=filter_remove,
+    #filter_keep=filter_keep,
+    meta_data=meta_data,
+    convert_str=True,
+    time_format="%Y",
+    )
+
+# ###
+# read the main sector time series and convert to PM2 IF
+# ###
+df_main_sector_ts = pd.read_csv(input_folder / trends_file)
+
+df_main_sector_ts = df_main_sector_ts.transpose()
+df_main_sector_ts = df_main_sector_ts.reset_index(drop=False)
+cols = df_main_sector_ts.iloc[0].copy(deep=True)
+cols.iloc[0] = "category"
+cols.iloc[1:] = cols.iloc[1:].astype(int).astype(str)
+df_main_sector_ts.columns = cols
+df_main_sector_ts = df_main_sector_ts.drop(0)
+
+# replace cat names by codes in col "category"
+df_main_sector_ts["category"] = \
+    df_main_sector_ts["category"].replace(cat_codes_manual_main_sector_ts)
+
+data_main_sector_ts_IF = pm2.pm2io.convert_wide_dataframe_if(
+    df_main_sector_ts,
+    coords_cols=coords_cols_main_sector_ts,
+    #add_coords_cols=add_coords_cols,
+    coords_defaults=coords_defaults_main_sector_ts,
+    coords_terminologies=coords_terminologies,
+    coords_value_mapping=coords_value_mapping,
+    #coords_value_filling=coords_value_filling,
+    filter_remove=filter_remove,
+    #filter_keep=filter_keep,
+    meta_data=meta_data,
+    convert_str=True,
+    time_format='%Y',
+    )
+
+
+# ###
+# read the indirect gases time series and convert to PM2 IF
+# ###
+df_indirect = pd.read_csv(input_folder / indirect_file)
+
+df_indirect = df_indirect.transpose()
+df_indirect = df_indirect.reset_index(drop=False)
+cols = df_indirect.iloc[0].copy(deep=True)
+cols.iloc[0] = "entity"
+cols.iloc[1:] = cols.iloc[1:].astype(int).astype(str)
+df_indirect.columns = cols
+df_indirect = df_indirect.drop(0)
+
+data_indirect_IF = pm2.pm2io.convert_wide_dataframe_if(
+    df_indirect,
+    coords_cols=coords_cols_indirect,
+    #add_coords_cols=add_coords_cols,
+    coords_defaults=coords_defaults_indirect,
+    coords_terminologies=coords_terminologies,
+    coords_value_mapping=coords_value_mapping,
+    #coords_value_filling=coords_value_filling,
+    #filter_remove=filter_remove,
+    #filter_keep=filter_keep,
+    meta_data=meta_data,
+    convert_str=True,
+    time_format="%Y",
+    )
+
+# ###
+# merge the three datasets
+# ###
+data_inventory_pm2 = pm2.pm2io.from_interchange_format(data_inventory_IF)
+data_main_sector_ts_pm2 = pm2.pm2io.from_interchange_format(data_main_sector_ts_IF)
+data_indirect_pm2 = pm2.pm2io.from_interchange_format(data_indirect_IF)
+
+data_all_pm2 = data_inventory_pm2.pr.merge(data_main_sector_ts_pm2)
+data_all_pm2 = data_all_pm2.pr.merge(data_indirect_pm2)
+
+data_all_if = data_all_pm2.pr.to_interchange_format()
+
+# ###
+# save raw data to IF and native format
+# ###
+if not output_folder.exists():
+    output_folder.mkdir()
+pm2.pm2io.write_interchange_format(
+    output_folder / (output_filename + coords_terminologies["category"] + "_raw"),
+    data_all_if)
+
+encoding = {var: compression for var in data_all_pm2.data_vars}
+data_all_pm2.pr.to_netcdf(
+    output_folder / (output_filename + coords_terminologies["category"] + "_raw.nc"),
+    encoding=encoding)
+
+# ###
+# ## process the data
+# ###
+data_proc_pm2 = data_all_pm2
+
+# combine CO2 emissions and removals
+data_proc_pm2["CO2"] = data_proc_pm2[["CO2 emissions", "CO2 removals"]].pr.sum(
+    dim="entity", skipna=True, min_count=1)
+data_proc_pm2["CO2"].attrs['entity'] = 'CO2'
+
+# actual processing
+data_proc_pm2 = process_data_for_country(
+    data_proc_pm2,
+    entities_to_ignore=['CO2 emissions', 'CO2 removals'],
+    gas_baskets={},
+    processing_info_country=country_processing_step1,
+)
+
+data_proc_pm2 = process_data_for_country(
+    data_proc_pm2,
+    entities_to_ignore=[],
+    gas_baskets=gas_baskets,
+    processing_info_country=country_processing_step2,
+    cat_terminology_out=terminology_proc,
+    category_conversion=cat_conversion,
+    sectors_out=sectors_to_save,
+)
+
+# adapt source and metadata
+# TODO: processing info is present twice
+current_source = data_proc_pm2.coords["source"].values[0]
+data_temp = data_proc_pm2.pr.loc[{"source": current_source}]
+data_proc_pm2 = data_proc_pm2.pr.set("source", 'BUR_NIR', data_temp)
+
+# ###
+# save data to IF and native format
+# ###
+data_proc_if = data_proc_pm2.pr.to_interchange_format()
+if not output_folder.exists():
+    output_folder.mkdir()
+pm2.pm2io.write_interchange_format(
+    output_folder / (output_filename + terminology_proc), data_proc_if)
+
+encoding = {var: compression for var in data_proc_pm2.data_vars}
+data_proc_pm2.pr.to_netcdf(
+    output_folder / (output_filename + terminology_proc + ".nc"),
+    encoding=encoding)
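
The processing step above combines `CO2 emissions` and `CO2 removals` with `skipna=True, min_count=1`. In the usual pandas/xarray reduction semantics this means that missing values are skipped, but the result stays missing unless at least `min_count` valid values are present. The following is a minimal, library-free sketch of that behaviour, not the primap2 implementation; the helper `nan_sum` and the sample numbers are hypothetical and only serve as illustration.

```python
import math


def nan_sum(values, min_count=1):
    """Sum values while skipping NaN.

    Returns NaN if fewer than `min_count` valid values are present,
    which mimics skipna=True together with min_count.
    """
    valid = [v for v in values if not math.isnan(v)]
    if len(valid) < min_count:
        return math.nan
    return sum(valid)


# hypothetical yearly values (not taken from the Thailand BUR4 data)
emissions = [100.0, math.nan, math.nan]
removals = [-30.0, -20.0, math.nan]

# net CO2 per year: emissions plus (negative) removals
net = [nan_sum(pair) for pair in zip(emissions, removals)]
```

With `min_count=1`, a year in which only one of the two series has data keeps that value, while a year without any data stays NaN instead of becoming 0.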

+ 4 - 0
UNFCCC_GHG_data/UNFCCC_reader/folder_mapping.json

@@ -3,10 +3,14 @@
     "TWN": "Taiwan",
     "MEX": "Mexico",
     "THA": "Thailand",
+    "SGP": "Singapore",
     "ARG": "Argentina",
+    "NGA": "Nigeria",
     "MAR": "Morocco",
     "COL": "Colombia",
     "CHL": "Chile",
+    "MYS": "Malaysia",
     "MNE": "Montenegro",
+    "ISR": "Israel",
     "IDN": "Indonesia"
 }

+ 3 - 3
UNFCCC_GHG_data/UNFCCC_reader/read_UNFCCC_submission.py

@@ -27,7 +27,7 @@ print("")
 script_name = get_code_file(country, submission)
 
 if script_name is not None:
-    print(f"Found UNFCCC_GHG_data file {script_name}")
+    print(f"Found code file {script_name}")
     print("")
 
     # get possible input files
@@ -71,7 +71,7 @@ if script_name is not None:
     )
 else:
     # no UNFCCC_GHG_data found.
-    print(f"No UNFCCC_GHG_data found to read {submission} from {country}")
-    print(f"Use 'doit country_info --country={country} to get "
+    print(f"No code found to read {submission} from {country}")
+    print(f"Use 'doit country_info country={country}' to get "
           f"a list of available submissions and datasets.")
 

+ 5 - 0
UNFCCC_GHG_data/helper/__init__.py

@@ -4,8 +4,10 @@ from .definitions import legacy_data_path
 from .definitions import downloaded_data_path, downloaded_data_path_UNFCCC
 from .definitions import dataset_path, dataset_path_UNFCCC
 from .definitions import custom_country_mapping, custom_folders
+from .definitions import GWP_factors, gas_baskets
 from .functions import get_country_code, get_country_name, convert_categories
 from .functions import create_folder_mapping, process_data_for_country, get_code_file
+from .functions import fix_rows
 
 __all__ = [
     "root_path",
@@ -20,9 +22,12 @@ __all__ = [
     "dataset_path_UNFCCC",
     "custom_country_mapping",
     "custom_folders",
+    "GWP_factors",
+    "gas_baskets",
     "get_country_code",
     "get_country_name",
     "convert_categories",
     "create_folder_mapping",
     "process_data_for_country",
+    "fix_rows",
 ]

+ 108 - 0
UNFCCC_GHG_data/helper/definitions.py

@@ -46,4 +46,112 @@ custom_folders = {
     'Democratic_Republic_of_the_Congo': "COD",
     'European_Union': 'EUA',
     'Taiwan': 'TWN',
+}
+
+GWP_factors = {
+    'SARGWP100_to_AR4GWP100': {
+        'HFCS': 1.1,
+        'PFCS': 1.1,
+        'UnspMixOfHFCs': 1.1,
+        'UnspMixOfPFCs': 1.1,
+        'FGASES': 1.1,
+    },
+    'SARGWP100_to_AR5GWP100': {
+        'HFCS': 1.2,
+        'PFCS': 1.2,
+        'UnspMixOfHFCs': 1.2,
+        'UnspMixOfPFCs': 1.2,
+        'FGASES': 1.2,
+    },
+    'SARGWP100_to_AR6GWP100': {
+        'HFCS': 1.4,
+        'PFCS': 1.3,
+        'UnspMixOfHFCs': 1.4,
+        'UnspMixOfPFCs': 1.3,
+        'FGASES': 1.35,
+    },
+    'AR4GWP100_to_SARGWP100': {
+        'HFCS': 0.91,
+        'PFCS': 0.91,
+        'UnspMixOfHFCs': 0.91,
+        'UnspMixOfPFCs': 0.91,
+        'FGASES': 0.91,
+    },
+    'AR4GWP100_to_AR5GWP100': {
+        'HFCS': 1.1,
+        'PFCS': 1.1,
+        'UnspMixOfHFCs': 1.1,
+        'UnspMixOfPFCs': 1.1,
+        'FGASES': 1.1,
+    },
+    'AR4GWP100_to_AR6GWP100': {
+        'HFCS': 1.27,
+        'PFCS': 1.18,
+        'UnspMixOfHFCs': 1.27,
+        'UnspMixOfPFCs': 1.18,
+        'FGASES': 1.23,
+    },
+    'AR5GWP100_to_SARGWP100': {
+        'HFCS': 0.83,
+        'PFCS': 0.83,
+        'UnspMixOfHFCs': 0.83,
+        'UnspMixOfPFCs': 0.83,
+        'FGASES': 0.83,
+    },
+    'AR5GWP100_to_AR4GWP100': {
+        'HFCS': 0.91,
+        'PFCS': 0.91,
+        'UnspMixOfHFCs': 0.91,
+        'UnspMixOfPFCs': 0.91,
+        'FGASES': 0.91,
+    },
+    'AR5GWP100_to_AR6GWP100': {
+        'HFCS': 1.17,
+        'PFCS': 1.08,
+        'UnspMixOfHFCs': 1.17,
+        'UnspMixOfPFCs': 1.08,
+        'FGASES': 1.125,
+    },
+}
+
+gas_baskets = {
+    'HFCS (SARGWP100)': ['HFC23', 'HFC32', 'HFC41', 'HFC125', 'HFC134',
+                     'HFC134a', 'HFC143',  'HFC143a', 'HFC152a', 'HFC227ea',
+                     'HFC236fa', 'HFC245ca', 'HFC245fa', 'HFC365mfc',  'HFC404a',
+                     'HFC407c', 'HFC410a', 'HFC4310mee', #'OTHERHFCS (SARGWP100)',
+                         'UnspMixOfHFCs (SARGWP100)'],
+    'HFCS (AR4GWP100)': ['HFC23', 'HFC32', 'HFC41', 'HFC125', 'HFC134',
+                     'HFC134a', 'HFC143',  'HFC143a', 'HFC152a', 'HFC227ea',
+                     'HFC236fa', 'HFC245ca', 'HFC245fa', 'HFC365mfc',  'HFC404a',
+                     'HFC407c', 'HFC410a', 'HFC4310mee', 'UnspMixOfHFCs (AR4GWP100)'],
+    'HFCS (AR5GWP100)': ['HFC23', 'HFC32', 'HFC41', 'HFC125', 'HFC134',
+                      'HFC134a', 'HFC143',  'HFC143a', 'HFC152a', 'HFC227ea',
+                      'HFC236fa', 'HFC245ca', 'HFC245fa', 'HFC365mfc',  'HFC404a',
+                      'HFC407c', 'HFC410a', 'HFC4310mee',
+                         'UnspMixOfHFCs (AR5GWP100)'],
+    'HFCS (AR6GWP100)': ['HFC23', 'HFC32', 'HFC41', 'HFC125', 'HFC134',
+                      'HFC134a', 'HFC143',  'HFC143a', 'HFC152a', 'HFC227ea',
+                      'HFC236fa', 'HFC245ca', 'HFC245fa', 'HFC365mfc',  'HFC404a',
+                      'HFC407c', 'HFC410a', 'HFC4310mee',
+                         'UnspMixOfHFCs (AR6GWP100)'],
+    'PFCS (SARGWP100)': ['C3F8', 'C4F10', 'CF4', 'C2F6', 'C6F14', 'C5F12', 'cC4F8',
+                      'UnspMixOfPFCs (SARGWP100)'],
+    'PFCS (AR4GWP100)': ['C3F8', 'C4F10', 'CF4', 'C2F6', 'C6F14', 'C5F12', 'cC4F8',
+                      'UnspMixOfPFCs (AR4GWP100)'],
+    'PFCS (AR5GWP100)': ['C3F8', 'C4F10', 'CF4', 'C2F6', 'C6F14', 'C5F12', 'cC4F8',
+                      'UnspMixOfPFCs (AR5GWP100)'],
+    'PFCS (AR6GWP100)': ['C3F8', 'C4F10', 'CF4', 'C2F6', 'C6F14', 'C5F12', 'cC4F8',
+                      'UnspMixOfPFCs (AR6GWP100)'],
+    'FGASES (SARGWP100)': ['HFCS (SARGWP100)', 'PFCS (SARGWP100)', 'SF6', 'NF3'],
+    'FGASES (AR4GWP100)': ['HFCS (AR4GWP100)', 'PFCS (AR4GWP100)', 'SF6', 'NF3'],
+    'FGASES (AR5GWP100)': ['HFCS (AR5GWP100)', 'PFCS (AR5GWP100)', 'SF6', 'NF3'],
+    'FGASES (AR6GWP100)': ['HFCS (AR6GWP100)', 'PFCS (AR6GWP100)', 'SF6', 'NF3'],
+    'KYOTOGHG (SARGWP100)': ['CO2', 'CH4', 'N2O', 'SF6', 'NF3', 'HFCS (SARGWP100)',
+                          'PFCS (SARGWP100)'],
+    'KYOTOGHG (AR4GWP100)': ['CO2', 'CH4', 'N2O', 'SF6', 'NF3', 'HFCS (AR4GWP100)',
+                          'PFCS (AR4GWP100)'],
+    'KYOTOGHG (AR5GWP100)': ['CO2', 'CH4', 'N2O', 'SF6', 'NF3', 'HFCS (AR5GWP100)',
+                            'PFCS (AR5GWP100)'],
+    'KYOTOGHG (AR6GWP100)': ['CO2', 'CH4', 'N2O', 'SF6', 'NF3', 'HFCS (AR6GWP100)',
+                            'PFCS (AR6GWP100)'],
 }
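
The `GWP_factors` dictionary is keyed by `"<source GWP>_to_<target GWP>"` and then by basket name, which is how the `basket_copy` step in `functions.py` of this commit looks up the factor. A minimal sketch of that lookup on plain numbers is shown below. The excerpt copies a few factors from the definition above; the function name `convert_basket` and the input values are hypothetical.

```python
import math

# excerpt of the default factors defined in GWP_factors above
GWP_factors_excerpt = {
    "SARGWP100_to_AR4GWP100": {"HFCS": 1.1, "PFCS": 1.1},
    "SARGWP100_to_AR6GWP100": {"HFCS": 1.4, "PFCS": 1.3},
}


def convert_basket(value, entity, source_gwp, target_gwp, factors):
    """Scale a basket value from one GWP specification to another."""
    key = f"{source_gwp}_to_{target_gwp}"
    return value * factors[key][entity]


# hypothetical basket totals given in SAR GWPs
hfcs_sar = 200.0
pfcs_sar = 50.0

hfcs_ar6 = convert_basket(
    hfcs_sar, "HFCS", "SARGWP100", "AR6GWP100", GWP_factors_excerpt)
pfcs_ar6 = convert_basket(
    pfcs_sar, "PFCS", "SARGWP100", "AR6GWP100", GWP_factors_excerpt)
```

These factors are approximate defaults for whole baskets. Where data for the individual gases is available, converting gas by gas is the more precise route.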

+ 74 - 13
UNFCCC_GHG_data/helper/functions.py

@@ -1,3 +1,5 @@
+import copy
+
 import pycountry
 import json
 import re
@@ -11,7 +13,7 @@ from pathlib import Path
 from .definitions import custom_country_mapping, custom_folders
 from .definitions import root_path, downloaded_data_path, extracted_data_path
 from .definitions import legacy_data_path, code_path
-
+from .definitions import GWP_factors
 
 def process_data_for_country(
         data_country: xr.Dataset,
@@ -76,7 +78,7 @@ def process_data_for_country(
     data_country = data_country.dropna(f'time', how='all')
     # remove variables only containing nan
     nan_vars_country = [var for var in data_country.data_vars if
-                        data_country[var].isnull().all().data is True]
+                        bool(data_country[var].isnull().all().data)]
     print(f"removing all-nan variables: {nan_vars_country}")
     data_country = data_country.drop_vars(nan_vars_country)
 
@@ -114,7 +116,7 @@ def process_data_for_country(
         # remove timeseries if desired
         if 'remove_ts' in processing_info_country:
             for case in processing_info_country['remove_ts']:
-                remove_info = processing_info_country['remove_ts'][case]
+                remove_info = copy.deepcopy(processing_info_country['remove_ts'][case])
                 entities = remove_info.pop("entities")
                 for entity in entities:
                     data_country[entity].pr.loc[remove_info] = \
@@ -128,19 +130,20 @@ def process_data_for_country(
         # subtract categories
         if 'subtract_cats' in processing_info_country:
             subtract_cats_current = processing_info_country['subtract_cats']
-            if 'entities' in subtract_cats_current.keys():
-                entities_current = subtract_cats_current['entities']
-            else:
-                entities_current = list(data_country.data_vars)
-            print(f"Subtracting categories for country {country_code}, entities "
-                  f"{entities_current}")
+            print(f"Subtracting categories for country {country_code}")
             for cat_to_generate in subtract_cats_current:
+                if 'entities' in subtract_cats_current[cat_to_generate].keys():
+                    entities_current = subtract_cats_current[cat_to_generate]['entities']
+                else:
+                    entities_current = list(data_country.data_vars)
+
                 cats_to_subtract = \
                     subtract_cats_current[cat_to_generate]['subtract']
                 data_sub = \
-                    data_country.pr.loc[{'category': cats_to_subtract}].pr.sum(
+                    data_country[entities_current].pr.loc[
+                        {'category': cats_to_subtract}].pr.sum(
                         dim='category', skipna=True, min_count=1)
-                data_parent = data_country.pr.loc[
+                data_parent = data_country[entities_current].pr.loc[
                     {'category': subtract_cats_current[cat_to_generate]['parent']}]
                 data_agg = data_parent - data_sub
                 nan_vars = [var for var in data_agg.data_vars if
@@ -228,6 +231,20 @@ def process_data_for_country(
                 else:
                     print(f"no data to aggregate category {cat_to_agg}")
 
+        # copy HFCs and PFCs with default factors
+        if 'basket_copy' in processing_info_country:
+            GWPs_to_add = processing_info_country["basket_copy"]["GWPs_to_add"]
+            entities = processing_info_country["basket_copy"]["entities"]
+            source_GWP = processing_info_country["basket_copy"]["source_GWP"]
+            for entity in entities:
+                data_source = data_country[f'{entity} ({source_GWP})']
+                for GWP in GWPs_to_add:
+                    data_GWP = data_source * \
+                               GWP_factors[f"{source_GWP}_to_{GWP}"][entity]
+                    data_GWP.attrs["entity"] = entity
+                    data_GWP.attrs["gwp_context"] = GWP
+                    data_country[f"{entity} ({GWP})"] = data_GWP
+
         # aggregate gases if desired
         if 'aggregate_gases' in processing_info_country:
             # TODO: why use different code here than below. Can this fill non-existen
@@ -338,7 +355,8 @@ def convert_categories(
 
     # redo the list of present cats after mapping, as we have new categories in the
     # target terminology now
-    cats_present_mapped = list(ds_converted.coords[f'category ({terminology_to})'])
+    cats_present_mapped = list(ds_converted.coords[f'category ('
+                                                   f'{terminology_to})'].values)
     # aggregate categories
     if 'aggregate' in conversion:
         aggregate_cats = conversion['aggregate']
@@ -807,4 +825,47 @@ def get_code_file(
     if code_file_path is not None:
         return code_file_path.relative_to(root_path)
     else:
-        return None
+        return None
+
+
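+def fix_rows(
+        data: pd.DataFrame, rows_to_fix: list, col_to_use: str, n_rows: int
+) -> pd.DataFrame:
+    '''
+    Fix rows that have been split across several lines when reading from pdf.
+    This is the version used for Malaysia BUR3,4. Adapt for other BURs if needed.
+
+    :param data: DataFrame containing the split rows
+    :param rows_to_fix: values in col_to_use that identify the rows to fix
+    :param col_to_use: name of the column in which to look for rows_to_fix
+    :param n_rows: number of rows to merge, starting at the matched row; the
+        special values -3 and -5 merge 3 or 5 rows starting one row above it
+    :return: DataFrame with the split rows merged
+    '''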
+    for row in rows_to_fix:
+        #print(row)
+        # find the row number and collect the rows to merge
+        index = data.loc[data[col_to_use] == row].index
+        #print(list(index))
+        if not list(index):
+            print(f"Can't merge split row {row}")
+            print(data[col_to_use])
+            # nothing to merge for this row, continue with the next one
+            continue
+        #print(f"Merging split row {row} for table {page}")
+        loc = data.index.get_loc(index[0])
+        if n_rows == -3:
+            locs_to_merge = list(range(loc - 1, loc + 2))
+        elif n_rows == -5:
+            locs_to_merge = list(range(loc - 1, loc + 4))
+        else:
+            locs_to_merge = list(range(loc, loc + n_rows))
+        rows_to_merge = data.iloc[locs_to_merge]
+        indices_to_merge = rows_to_merge.index
+        # join the selected rows
+        new_row = rows_to_merge.agg(' '.join)
+        # replace the double spaces that are created
+        # must be done here and not at the end as splits are not always
+        # the same and join would produce different col values
+        new_row = new_row.str.replace("  ", " ")
+        new_row = new_row.str.replace("N O", "NO")
+        new_row = new_row.str.replace(", N", ",N")
+        new_row = new_row.str.replace("- ", "-")
+        data.loc[indices_to_merge[0]] = new_row
+        data = data.drop(indices_to_merge[1:])
+    return data
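
To illustrate how the `n_rows` argument of `fix_rows` selects the rows to merge, here is a simplified, pandas-free sketch of the same idea on a list of rows. It is an analogue for illustration only, not the function from this commit: `merge_split_rows` and the sample table are hypothetical, the first column is assumed to hold the label, and only the double-space replacement is reproduced.

```python
def merge_split_rows(rows, label, n_rows):
    """Merge the rows that belong to one split table row.

    rows:   list of rows, each a list of strings (first cell is the label)
    label:  label cell identifying the row to fix
    n_rows: number of rows to merge starting at the matched row;
            -3 / -5 merge 3 / 5 rows starting one row above the match
    """
    labels = [r[0] for r in rows]
    if label not in labels:
        # nothing to merge, return the table unchanged
        return rows
    loc = labels.index(label)
    if n_rows == -3:
        start, stop = loc - 1, loc + 2
    elif n_rows == -5:
        start, stop = loc - 1, loc + 4
    else:
        start, stop = loc, loc + n_rows
    # join the cells column by column and collapse double spaces
    merged = [
        " ".join(cells).replace("  ", " ")
        for cells in zip(*rows[start:stop])
    ]
    return rows[:start] + [merged] + rows[stop:]


# hypothetical table in which one label was split over three lines
table = [
    ["Total", "10"],
    ["Fuel", ""],
    ["combustion", "5"],
    ["activities", ""],
    ["Waste", "2"],
]
fixed = merge_split_rows(table, "combustion", -3)
```

In this sketch the value cell of the merged row keeps the spaces introduced by joining empty cells, so it would still need stripping before conversion to numbers.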

+ 1 - 0
datasets/UNFCCC/DI_non_AnnexI/DI_non_AnnexI_18f8be651585232bec9d1bca76c1fa04_hash.csv

@@ -0,0 +1 @@
+../../../.git/annex/objects/4P/k9/MD5E-s14960507--b4abd51cedb987b20be7db439157005c.csv/MD5E-s14960507--b4abd51cedb987b20be7db439157005c.csv

+ 1 - 0
datasets/UNFCCC/DI_non_AnnexI/DI_non_AnnexI_18f8be651585232bec9d1bca76c1fa04_hash.nc

@@ -0,0 +1 @@
+../../../.git/annex/objects/zj/mm/MD5E-s5762729--900475d5342ea889fdfafc88aa559a60.nc/MD5E-s5762729--900475d5342ea889fdfafc88aa559a60.nc

+ 41 - 0
datasets/UNFCCC/DI_non_AnnexI/DI_non_AnnexI_18f8be651585232bec9d1bca76c1fa04_hash.yaml

@@ -0,0 +1,41 @@
+attrs:
+  references: https://di.unfccc.int
+  comment: Data read from the UNFCCC DI flexible query interface using the API. Data
+    read on 2023-05-24. Processed on 2023-07-17
+  rights: ''
+  contact: mail@johannes-guetschow.de
+  institution: United Nations Framework Convention on Climate Change (www.unfccc.int)
+  cat: category (IPCC2006_PRIMAP)
+  area: area (ISO3)
+  scen: scenario (Process_Date)
+  sec_cats:
+  - class
+  - measure
+  title: 'Data submitted by the following non-AnnexI countries and available in the
+    DI interface, converted to IPCC2006 categories and downscaled where applicable.
+    For download date see scenario. Countries: AFG, AGO, ALB, ARE, ARG, ARM, ATG,
+    AZE, BDI, BEN, BFA, BGD, BHR, BHS, BIH, BLZ, BOL, BRA, BRB, BRN, BTN, BWA, CAF,
+    CHL, CHN, CIV, CMR, COD, COG, COK, COL, COM, CPV, CRI, CUB, DJI, DMA, DOM, DZA,
+    ECU, EGY, ERI, ETH, FJI, FSM, GAB, GEO, GHA, GIN, GMB, GNB, GRD, GTM, GUY, HND,
+    HTI, IDN, IND, IRN, IRQ, ISR, JAM, JOR, KEN, KGZ, KHM, KIR, KNA, KOR, KWT, LAO,
+    LBN, LBR, LCA, LKA, LSO, MAR, MDA, MDG, MDV, MEX, MHL, MKD, MLI, MMR, MNE, MNG,
+    MOZ, MRT, MUS, MWI, MYS, NAM, NER, NGA, NIC, NIU, NPL, NRU, OMN, PAK, PAN, PER,
+    PHL, PLW, PNG, PRK, PRY, PSE, QAT, RWA, SAU, SDN, SEN, SGP, SLB, SLV, SMR, SRB,
+    SSD, STP, SUR, SWZ, SYC, SYR, TCD, TGO, THA, TJK, TKM, TLS, TON, TTO, TUN, TUV,
+    TZA, UGA, URY, UZB, VCT, VEN, VNM, VUT, WSM, YEM, ZAF, ZMB, ZWE'
+time_format: '%Y'
+dimensions:
+  '*':
+  - time
+  - source
+  - measure
+  - class
+  - scenario (Process_Date)
+  - area (ISO3)
+  - provenance
+  - category (IPCC2006_PRIMAP)
+  - entity
+  - unit
+additional_coordinates:
+  orig_cat_name: category (IPCC2006_PRIMAP)
+data_file: DI_non_AnnexI_18f8be651585232bec9d1bca76c1fa04_hash.csv

+ 1 - 0
datasets/UNFCCC/DI_non_AnnexI/DI_non_AnnexI_2023-07-17.csv

@@ -0,0 +1 @@
+DI_non_AnnexI_18f8be651585232bec9d1bca76c1fa04_hash.csv

+ 1 - 0
datasets/UNFCCC/DI_non_AnnexI/DI_non_AnnexI_2023-07-17.nc

@@ -0,0 +1 @@
+DI_non_AnnexI_18f8be651585232bec9d1bca76c1fa04_hash.nc

+ 1 - 0
datasets/UNFCCC/DI_non_AnnexI/DI_non_AnnexI_2023-07-17.yaml

@@ -0,0 +1 @@
+DI_non_AnnexI_18f8be651585232bec9d1bca76c1fa04_hash.yaml

+ 1 - 0
datasets/UNFCCC/DI_non_AnnexI/DI_non_AnnexI_2023-07-18.csv

@@ -0,0 +1 @@
+DI_non_AnnexI_9177e6b829bcfcd93505d1355cae9ee4_hash.csv

+ 1 - 0
datasets/UNFCCC/DI_non_AnnexI/DI_non_AnnexI_2023-07-18.nc

@@ -0,0 +1 @@
+DI_non_AnnexI_9177e6b829bcfcd93505d1355cae9ee4_hash.nc

+ 1 - 0
datasets/UNFCCC/DI_non_AnnexI/DI_non_AnnexI_2023-07-18.yaml

@@ -0,0 +1 @@
+DI_non_AnnexI_9177e6b829bcfcd93505d1355cae9ee4_hash.yaml

+ 1 - 0
datasets/UNFCCC/DI_non_AnnexI/DI_non_AnnexI_2023-07-18_raw.csv

@@ -0,0 +1 @@
+DI_non_AnnexI_d1e91da9f1581fbf3563fe4d276bfe1a_raw_hash.csv

+ 1 - 0
datasets/UNFCCC/DI_non_AnnexI/DI_non_AnnexI_2023-07-18_raw.nc

@@ -0,0 +1 @@
+DI_non_AnnexI_d1e91da9f1581fbf3563fe4d276bfe1a_raw_hash.nc

+ 1 - 0
datasets/UNFCCC/DI_non_AnnexI/DI_non_AnnexI_2023-07-18_raw.yaml

@@ -0,0 +1 @@
+DI_non_AnnexI_d1e91da9f1581fbf3563fe4d276bfe1a_raw_hash.yaml

+ 1 - 0
datasets/UNFCCC/DI_non_AnnexI/DI_non_AnnexI_9177e6b829bcfcd93505d1355cae9ee4_hash.csv

@@ -0,0 +1 @@
+../../../.git/annex/objects/j4/q5/MD5E-s15068741--c3b05d7248ef0abdec923319d1780e08.csv/MD5E-s15068741--c3b05d7248ef0abdec923319d1780e08.csv

+ 1 - 0
datasets/UNFCCC/DI_non_AnnexI/DI_non_AnnexI_9177e6b829bcfcd93505d1355cae9ee4_hash.nc

@@ -0,0 +1 @@
+../../../.git/annex/objects/80/vg/MD5E-s5743047--58b8cb2cd448f29c5bc8669cf5e5f239.nc/MD5E-s5743047--58b8cb2cd448f29c5bc8669cf5e5f239.nc

+ 41 - 0
datasets/UNFCCC/DI_non_AnnexI/DI_non_AnnexI_9177e6b829bcfcd93505d1355cae9ee4_hash.yaml

@@ -0,0 +1,41 @@
+attrs:
+  references: https://di.unfccc.int
+  rights: ''
+  contact: mail@johannes-guetschow.de
+  institution: United Nations Framework Convention on Climate Change (www.unfccc.int)
+  cat: category (IPCC2006_PRIMAP)
+  area: area (ISO3)
+  scen: scenario (Process_Date)
+  sec_cats:
+  - class
+  - measure
+  comment: Data read from the UNFCCC DI flexible query interface using the API. Data
+    read on 2023-05-24. Processed on 2023-07-18
+  title: 'Data submitted by the following non-AnnexI countries and available in the
+    DI interface, converted to IPCC2006 categories and downscaled where applicable.
+    For download date see scenario. Countries: AFG, AGO, ALB, ARE, ARG, ARM, ATG,
+    AZE, BDI, BEN, BFA, BGD, BHR, BHS, BIH, BLZ, BOL, BRA, BRB, BRN, BTN, BWA, CAF,
+    CHL, CHN, CIV, CMR, COD, COG, COK, COL, COM, CPV, CRI, CUB, DJI, DMA, DOM, DZA,
+    ECU, EGY, ERI, ETH, FJI, FSM, GAB, GEO, GHA, GIN, GMB, GNB, GRD, GTM, GUY, HND,
+    HTI, IDN, IND, IRN, IRQ, ISR, JAM, JOR, KEN, KGZ, KHM, KIR, KNA, KOR, KWT, LAO,
+    LBN, LBR, LCA, LKA, LSO, MAR, MDA, MDG, MDV, MEX, MHL, MKD, MLI, MMR, MNE, MNG,
+    MOZ, MRT, MUS, MWI, MYS, NAM, NER, NGA, NIC, NIU, NPL, NRU, OMN, PAK, PAN, PER,
+    PHL, PLW, PNG, PRK, PRY, PSE, QAT, RWA, SAU, SDN, SEN, SGP, SLB, SLV, SMR, SRB,
+    SSD, STP, SUR, SWZ, SYC, SYR, TCD, TGO, THA, TJK, TKM, TLS, TON, TTO, TUN, TUV,
+    TZA, UGA, URY, UZB, VCT, VEN, VNM, VUT, WSM, YEM, ZAF, ZMB, ZWE'
+time_format: '%Y'
+dimensions:
+  '*':
+  - time
+  - source
+  - measure
+  - class
+  - scenario (Process_Date)
+  - area (ISO3)
+  - provenance
+  - category (IPCC2006_PRIMAP)
+  - entity
+  - unit
+additional_coordinates:
+  orig_cat_name: category (IPCC2006_PRIMAP)
+data_file: DI_non_AnnexI_9177e6b829bcfcd93505d1355cae9ee4_hash.csv

+ 1 - 0
datasets/UNFCCC/DI_non_AnnexI/DI_non_AnnexI_d1e91da9f1581fbf3563fe4d276bfe1a_raw_hash.csv

@@ -0,0 +1 @@
+../../../.git/annex/objects/p3/JX/MD5E-s6413510--f2d957685308b69ea7d40b0066286f14.csv/MD5E-s6413510--f2d957685308b69ea7d40b0066286f14.csv

+ 1 - 0
datasets/UNFCCC/DI_non_AnnexI/DI_non_AnnexI_d1e91da9f1581fbf3563fe4d276bfe1a_raw_hash.nc

@@ -0,0 +1 @@
+../../../.git/annex/objects/W3/w5/MD5E-s3896037--789bd8caf59a6d20d94732f90607c0a6.nc/MD5E-s3896037--789bd8caf59a6d20d94732f90607c0a6.nc

+ 40 - 0
datasets/UNFCCC/DI_non_AnnexI/DI_non_AnnexI_d1e91da9f1581fbf3563fe4d276bfe1a_raw_hash.yaml

@@ -0,0 +1,40 @@
+attrs:
+  references: https://di.unfccc.int
+  comment: Data read from the UNFCCC DI flexible query interface using the API. Data
+    read on 2023-07-18.
+  rights: ''
+  contact: mail@johannes-guetschow.de
+  institution: United Nations Framework Convention on Climate Change (www.unfccc.int)
+  cat: category (BURDI)
+  area: area (ISO3)
+  scen: scenario (Access_Date)
+  sec_cats:
+  - class
+  - measure
+  title: 'Data submitted by the following non-AnnexI countries and available in the
+    DI interface on 2023-07-18: AFG, AGO, ALB, ARE, ARG, ARM, ATG, AZE, BDI, BEN,
+    BFA, BGD, BHR, BHS, BIH, BLZ, BOL, BRA, BRB, BRN, BTN, BWA, CAF, CHL, CHN, CIV,
+    CMR, COD, COG, COK, COL, COM, CPV, CRI, CUB, DJI, DMA, DOM, DZA, ECU, EGY, ERI,
+    ETH, FJI, FSM, GAB, GEO, GHA, GIN, GMB, GNB, GRD, GTM, GUY, HND, HTI, IDN, IND,
+    IRN, IRQ, ISR, JAM, JOR, KEN, KGZ, KHM, KIR, KNA, KOR, KWT, LAO, LBN, LBR, LCA,
+    LKA, LSO, MAR, MDA, MDG, MDV, MEX, MHL, MKD, MLI, MMR, MNE, MNG, MOZ, MRT, MUS,
+    MWI, MYS, NAM, NER, NGA, NIC, NIU, NPL, NRU, OMN, PAK, PAN, PER, PHL, PLW, PNG,
+    PRK, PRY, PSE, QAT, RWA, SAU, SDN, SEN, SGP, SLB, SLV, SMR, SRB, SSD, STP, SUR,
+    SWZ, SYC, SYR, TCD, TGO, THA, TJK, TKM, TLS, TON, TTO, TUN, TUV, TZA, UGA, URY,
+    UZB, VCT, VEN, VNM, VUT, WSM, YEM, ZAF, ZMB, ZWE'
+time_format: '%Y'
+dimensions:
+  '*':
+  - time
+  - measure
+  - scenario (Access_Date)
+  - class
+  - source
+  - provenance
+  - category (BURDI)
+  - area (ISO3)
+  - entity
+  - unit
+additional_coordinates:
+  orig_cat_name: category (BURDI)
+data_file: DI_non_AnnexI_d1e91da9f1581fbf3563fe4d276bfe1a_raw_hash.csv

+ 1 - 0
datasets/UNFCCC/DI_non_AnnexI/DI_non_AnnexI_ef19c9a21441456740388c14aa7fe3e7_hash.csv

@@ -0,0 +1 @@
+../../../.git/annex/objects/f2/9W/MD5E-s14823877--e6c6e2db54e39d1799a3000aba66b093.csv/MD5E-s14823877--e6c6e2db54e39d1799a3000aba66b093.csv

+ 1 - 0
datasets/UNFCCC/DI_non_AnnexI/DI_non_AnnexI_ef19c9a21441456740388c14aa7fe3e7_hash.nc

@@ -0,0 +1 @@
+../../../.git/annex/objects/jF/96/MD5E-s6456507--06919cb84addbb8c90c181391b6c42f4.nc/MD5E-s6456507--06919cb84addbb8c90c181391b6c42f4.nc

+ 41 - 0
datasets/UNFCCC/DI_non_AnnexI/DI_non_AnnexI_ef19c9a21441456740388c14aa7fe3e7_hash.yaml

@@ -0,0 +1,41 @@
+attrs:
+  references: https://di.unfccc.int
+  comment: Data read from the UNFCCC DI flexible query interface using the API. Data
+    read on 2023-05-24. Processed on 2023-07-17
+  rights: ''
+  contact: mail@johannes-guetschow.de
+  institution: United Nations Framework Convention on Climate Change (www.unfccc.int)
+  cat: category (IPCC2006_PRIMAP)
+  area: area (ISO3)
+  scen: scenario (Process_Date)
+  sec_cats:
+  - class
+  - measure
+  title: 'Data submitted by the following non-AnnexI countries and available in the
+    DI interface, converted to IPCC2006 categories and downscaled where applicable.
+    For download date see scenario. Countries: AFG, AGO, ALB, ARE, ARG, ARM, ATG,
+    AZE, BDI, BEN, BFA, BGD, BHR, BHS, BIH, BLZ, BOL, BRA, BRB, BRN, BTN, BWA, CAF,
+    CHL, CHN, CIV, CMR, COD, COG, COK, COL, COM, CPV, CRI, CUB, DJI, DMA, DOM, DZA,
+    ECU, EGY, ERI, ETH, FJI, FSM, GAB, GEO, GHA, GIN, GMB, GNB, GRD, GTM, GUY, HND,
+    HTI, IDN, IND, IRN, IRQ, ISR, JAM, JOR, KEN, KGZ, KHM, KIR, KNA, KOR, KWT, LAO,
+    LBN, LBR, LCA, LKA, LSO, MAR, MDA, MDG, MDV, MEX, MHL, MKD, MLI, MMR, MNE, MNG,
+    MOZ, MRT, MWI, MYS, NAM, NER, NGA, NIC, NIU, NPL, NRU, OMN, PAK, PAN, PER, PHL,
+    PLW, PNG, PRK, PRY, PSE, QAT, RWA, SAU, SDN, SEN, SGP, SLB, SLV, SMR, SRB, SSD,
+    STP, SUR, SWZ, SYC, SYR, TCD, TGO, THA, TJK, TKM, TLS, TON, TTO, TUN, TUV, TZA,
+    UGA, URY, UZB, VCT, VEN, VNM, VUT, WSM, YEM, ZAF, ZMB, ZWE'
+time_format: '%Y'
+dimensions:
+  '*':
+  - time
+  - source
+  - measure
+  - class
+  - scenario (Process_Date)
+  - area (ISO3)
+  - provenance
+  - category (IPCC2006_PRIMAP)
+  - entity
+  - unit
+additional_coordinates:
+  orig_cat_name: category (IPCC2006_PRIMAP)
+data_file: DI_non_AnnexI_ef19c9a21441456740388c14aa7fe3e7_hash.csv

+ 1 - 0
downloaded_data/UNFCCC/00_new_downloads_BUR-2023-07-17.csv

@@ -0,0 +1 @@
+../../.git/annex/objects/xP/wf/MD5E-s265--9710b23ebd088878ed74eac61027e24f.csv/MD5E-s265--9710b23ebd088878ed74eac61027e24f.csv

+ 1 - 0
downloaded_data/UNFCCC/00_new_downloads_CRF2023-2023-07-17.csv

@@ -0,0 +1 @@
+../../.git/annex/objects/xK/qj/MD5E-s1303--8bb86eb8b313b63a461522facfbd273a.csv/MD5E-s1303--8bb86eb8b313b63a461522facfbd273a.csv

+ 1 - 0
downloaded_data/UNFCCC/00_new_downloads_NC-2023-07-17.csv

@@ -0,0 +1 @@
+../../.git/annex/objects/wm/95/MD5E-s1--68b329da9893e34099c7d8ad5cb9c940.csv/MD5E-s1--68b329da9893e34099c7d8ad5cb9c940.csv

+ 1 - 0
downloaded_data/UNFCCC/Austria/CRF2023/asr2023_AUT.pdf

@@ -0,0 +1 @@
+../../../../.git/annex/objects/X9/vp/MD5E-s276584--7f3fd5f85ec5cb04497b3189ac6d8628.pdf/MD5E-s276584--7f3fd5f85ec5cb04497b3189ac6d8628.pdf

+ 1 - 0
downloaded_data/UNFCCC/Belarus/CRF2023/asr2023_BLR.pdf

@@ -0,0 +1 @@
+../../../../.git/annex/objects/kG/zW/MD5E-s258301--aeb45340a657a23a1e44dfd679fded5d.pdf/MD5E-s258301--aeb45340a657a23a1e44dfd679fded5d.pdf

+ 1 - 0
downloaded_data/UNFCCC/Canada/CRF2023/asr2023_CAN_0.pdf

@@ -0,0 +1 @@
+../../../../.git/annex/objects/9p/8G/MD5E-s258785--e711f4a130b74f561bb015a558c44767.pdf/MD5E-s258785--e711f4a130b74f561bb015a558c44767.pdf

+ 1 - 0
downloaded_data/UNFCCC/Cyprus/CRF2023/asr2023_CYP.pdf

@@ -0,0 +1 @@
+../../../../.git/annex/objects/6g/wW/MD5E-s254874--5eb09935ac690f0c6e81bf703bc91cf8.pdf/MD5E-s254874--5eb09935ac690f0c6e81bf703bc91cf8.pdf

+ 1 - 0
downloaded_data/UNFCCC/Guatemala/BUR1/2023_1IBA_GT.pdf

@@ -0,0 +1 @@
+../../../../.git/annex/objects/Qv/K4/MD5E-s5533705--f90230325af95de53d1d466076aa5e9d.pdf/MD5E-s5533705--f90230325af95de53d1d466076aa5e9d.pdf

+ 1 - 0
downloaded_data/UNFCCC/Ireland/CRF2023/asr2023_IRL.pdf

@@ -0,0 +1 @@
+../../../../.git/annex/objects/Qm/J5/MD5E-s272644--6da64c5f351dd440efb532c421131038.pdf/MD5E-s272644--6da64c5f351dd440efb532c421131038.pdf

+ 1 - 0
downloaded_data/UNFCCC/Kazakhstan/CRF2023/asr2023_KAZ.pdf

@@ -0,0 +1 @@
+../../../../.git/annex/objects/X0/7M/MD5E-s276885--708b8eabf47c537537336d9140c27d4d.pdf/MD5E-s276885--708b8eabf47c537537336d9140c27d4d.pdf

+ 1 - 0
downloaded_data/UNFCCC/Peru/BUR3/Tercer_BUR_Per%C3%BA_Jun2023.pdf

@@ -0,0 +1 @@
+../../../../.git/annex/objects/5P/Zj/MD5E-s15704118--2aa10bb469faa45f4801cd23876ac01a.pdf/MD5E-s15704118--2aa10bb469faa45f4801cd23876ac01a.pdf

+ 1 - 0
downloaded_data/UNFCCC/Republic_of_Korea/BUR4/1092386_Republic_of_Korea-BUR4-3-Fourth_Biennial_Update_Report_of_the_Republic_of_Korea_rev.pdf

@@ -0,0 +1 @@
+../../../../.git/annex/objects/xQ/pZ/MD5E-s4802196--ffaac3334785eaad5875e9fb58d101ff.pdf/MD5E-s4802196--ffaac3334785eaad5875e9fb58d101ff.pdf

+ 1 - 0
downloaded_data/UNFCCC/Russian_Federation/CRF2023/asr2023_RUS.pdf

@@ -0,0 +1 @@
+../../../../.git/annex/objects/ZJ/qP/MD5E-s285990--0e5f0f39159abfb913096ddefcb2cfa7.pdf/MD5E-s285990--0e5f0f39159abfb913096ddefcb2cfa7.pdf

+ 1 - 0
downloaded_data/UNFCCC/Sweden/CRF2023/asr2023_SWE.pdf

@@ -0,0 +1 @@
+../../../../.git/annex/objects/xv/5k/MD5E-s283722--d5810e12d54fe3392bfd70de0c72a97a.pdf/MD5E-s283722--d5810e12d54fe3392bfd70de0c72a97a.pdf

+ 1 - 0
downloaded_data/UNFCCC/Thailand/BUR4/THA_indirect_2000-2019.csv

@@ -0,0 +1 @@
+../../../../.git/annex/objects/4V/gz/MD5E-s718--a71e1c2f5e60158552b03cdf207d9bf3.csv/MD5E-s718--a71e1c2f5e60158552b03cdf207d9bf3.csv

+ 1 - 0
downloaded_data/UNFCCC/Thailand/BUR4/THA_inventory_2019.csv

@@ -0,0 +1 @@
+../../../../.git/annex/objects/FF/44/MD5E-s5482--63491cef34ffca8e26a86b3eaf3469dc.csv/MD5E-s5482--63491cef34ffca8e26a86b3eaf3469dc.csv

+ 1 - 0
downloaded_data/UNFCCC/Thailand/BUR4/THA_trends_2000-2019.csv

@@ -0,0 +1 @@
+../../../../.git/annex/objects/9G/KF/MD5E-s1559--84ebb9c01164164595f6242cbd9527f4.csv/MD5E-s1559--84ebb9c01164164595f6242cbd9527f4.csv

+ 1 - 0
downloaded_data/UNFCCC/Thailand/BUR4/processed_pdf/indirect.pdf

@@ -0,0 +1 @@
+../../../../../.git/annex/objects/20/ff/MD5E-s417057--7b3eb95f9e2d6967b3010f40f6fe5bba.pdf/MD5E-s417057--7b3eb95f9e2d6967b3010f40f6fe5bba.pdf

+ 1 - 0
downloaded_data/UNFCCC/Thailand/BUR4/processed_pdf/indirect_ocr.pdf

@@ -0,0 +1 @@
+../../../../../.git/annex/objects/j9/VW/MD5E-s206616--5caadfe7d49c995a685fc02e99039429.pdf/MD5E-s206616--5caadfe7d49c995a685fc02e99039429.pdf

+ 1 - 0
downloaded_data/UNFCCC/Thailand/BUR4/processed_pdf/inventory_2019.pdf

@@ -0,0 +1 @@
+../../../../../.git/annex/objects/6w/Mx/MD5E-s1970438--660e0f9d59ccd629dac4368e9101a6e5.pdf/MD5E-s1970438--660e0f9d59ccd629dac4368e9101a6e5.pdf

+ 1 - 0
downloaded_data/UNFCCC/Thailand/BUR4/processed_pdf/inventory_2019_ocr.pdf

@@ -0,0 +1 @@
+../../../../../.git/annex/objects/gj/75/MD5E-s880391--ac488b16406850cac2659ca0df31c5aa.pdf/MD5E-s880391--ac488b16406850cac2659ca0df31c5aa.pdf

+ 1 - 0
downloaded_data/UNFCCC/Thailand/BUR4/processed_pdf/trends.pdf

@@ -0,0 +1 @@
+../../../../../.git/annex/objects/5X/GP/MD5E-s812736--aa9ead9eb6f6a900854fda733a559c28.pdf/MD5E-s812736--aa9ead9eb6f6a900854fda733a559c28.pdf

+ 1 - 0
downloaded_data/UNFCCC/Thailand/BUR4/processed_pdf/trends_ocr.pdf

@@ -0,0 +1 @@
+../../../../../.git/annex/objects/xg/1q/MD5E-s256479--3ea01da08f5f625b66ec849027b6fa7c.pdf/MD5E-s256479--3ea01da08f5f625b66ec849027b6fa7c.pdf

+ 1 - 0
downloaded_data/UNFCCC/Türkiye/CRF2023/asr2023_TUR.pdf

@@ -0,0 +1 @@
+../../../../.git/annex/objects/P1/V1/MD5E-s261195--93c632c98ab1a92b3099d5ac922b7d07.pdf/MD5E-s261195--93c632c98ab1a92b3099d5ac922b7d07.pdf

+ 1 - 0
downloaded_data/UNFCCC/Ukraine/CRF2023/asr2023_UKR.pdf

@@ -0,0 +1 @@
+../../../../.git/annex/objects/5j/zp/MD5E-s182285--cc830f8ce977a9f0e9390aada38e5f26.pdf/MD5E-s182285--cc830f8ce977a9f0e9390aada38e5f26.pdf

+ 1 - 1
downloaded_data/UNFCCC/submissions-annexI_2023.csv

@@ -1 +1 @@
-../../.git/annex/objects/82/gJ/MD5E-s21538--900b06e4a8925595abedaa6ed14b2b10.csv/MD5E-s21538--900b06e4a8925595abedaa6ed14b2b10.csv
+../../.git/annex/objects/JM/1W/MD5E-s23582--5de989b24f8aa5ec6a4ba17da2c4503d.csv/MD5E-s23582--5de989b24f8aa5ec6a4ba17da2c4503d.csv

+ 1 - 1
downloaded_data/UNFCCC/submissions-bur.csv

@@ -1 +1 @@
-../../.git/annex/objects/fQ/GP/MD5E-s48289--d044f64cda09b4fe0ff3842f29bad8a0.csv/MD5E-s48289--d044f64cda09b4fe0ff3842f29bad8a0.csv
+../../.git/annex/objects/WQ/Qk/MD5E-s48477--b0cd720e42e0a9120eb5a926849e6f22.csv/MD5E-s48477--b0cd720e42e0a9120eb5a926849e6f22.csv

+ 1 - 1
extracted_data/UNFCCC/Afghanistan/AFG_DI_2023-05-24.csv

@@ -1 +1 @@
-AFG_DI_59a0212c38103f8d3462ec9bf72357b4_hash.csv
+AFG_DI_88365af8429188c90d963dc666e57b6e_hash.csv

+ 1 - 1
extracted_data/UNFCCC/Afghanistan/AFG_DI_2023-05-24.nc

@@ -1 +1 @@
-AFG_DI_59a0212c38103f8d3462ec9bf72357b4_hash.nc
+AFG_DI_88365af8429188c90d963dc666e57b6e_hash.nc

+ 1 - 1
extracted_data/UNFCCC/Afghanistan/AFG_DI_2023-05-24.yaml

@@ -1 +1 @@
-AFG_DI_59a0212c38103f8d3462ec9bf72357b4_hash.yaml
+AFG_DI_88365af8429188c90d963dc666e57b6e_hash.yaml

+ 1 - 0
extracted_data/UNFCCC/Afghanistan/AFG_DI_2023-07-18_raw.csv

@@ -0,0 +1 @@
+AFG_DI_4f8be4fe6093240f111a4861566443fb_raw_hash.csv

+ 1 - 0
extracted_data/UNFCCC/Afghanistan/AFG_DI_2023-07-18_raw.nc

@@ -0,0 +1 @@
+AFG_DI_4f8be4fe6093240f111a4861566443fb_raw_hash.nc

+ 1 - 0
extracted_data/UNFCCC/Afghanistan/AFG_DI_2023-07-18_raw.yaml

@@ -0,0 +1 @@
+AFG_DI_4f8be4fe6093240f111a4861566443fb_raw_hash.yaml

+ 1 - 0
extracted_data/UNFCCC/Afghanistan/AFG_DI_88365af8429188c90d963dc666e57b6e_hash.csv

@@ -0,0 +1 @@
+../../../.git/annex/objects/13/jZ/MD5E-s43984--948b6d3a326361725a9eb975e6240719.csv/MD5E-s43984--948b6d3a326361725a9eb975e6240719.csv

+ 1 - 0
extracted_data/UNFCCC/Afghanistan/AFG_DI_88365af8429188c90d963dc666e57b6e_hash.nc

@@ -0,0 +1 @@
+../../../.git/annex/objects/Wv/x1/MD5E-s141510--a5171f7eead40e554af45b716c3851f4.nc/MD5E-s141510--a5171f7eead40e554af45b716c3851f4.nc

+ 31 - 0
extracted_data/UNFCCC/Afghanistan/AFG_DI_88365af8429188c90d963dc666e57b6e_hash.yaml

@@ -0,0 +1,31 @@
+attrs:
+  references: https://di.unfccc.int
+  title: Data submitted to the UNFCCC by country Afghanistan as available in the DI
+    interface on 2023-05-24. Processed on 2023-07-17
+  comment: Data read from the UNFCCC DI flexible query interface using the API. Data
+    read on 2023-05-24. Processed on 2023-07-17
+  rights: ''
+  contact: mail@johannes-guetschow.de
+  institution: United Nations Framework Convention on Climate Change (www.unfccc.int)
+  cat: category (IPCC2006_PRIMAP)
+  area: area (ISO3)
+  scen: scenario (Access_Date)
+  sec_cats:
+  - class
+  - measure
+time_format: '%Y'
+dimensions:
+  '*':
+  - time
+  - source
+  - measure
+  - class
+  - scenario (Access_Date)
+  - area (ISO3)
+  - provenance
+  - category (IPCC2006_PRIMAP)
+  - entity
+  - unit
+additional_coordinates:
+  orig_cat_name: category (IPCC2006_PRIMAP)
+data_file: AFG_DI_88365af8429188c90d963dc666e57b6e_hash.csv

+ 1 - 1
extracted_data/UNFCCC/Albania/ALB_DI_2023-05-24.csv

@@ -1 +1 @@
-ALB_DI_95104ebaaf8be5c42606a4db2abf76d5_hash.csv
+ALB_DI_5cf8443b430d6e371b9f7343fa00a201_hash.csv

+ 1 - 1
extracted_data/UNFCCC/Albania/ALB_DI_2023-05-24.nc

@@ -1 +1 @@
-ALB_DI_95104ebaaf8be5c42606a4db2abf76d5_hash.nc
+ALB_DI_5cf8443b430d6e371b9f7343fa00a201_hash.nc

+ 1 - 1
extracted_data/UNFCCC/Albania/ALB_DI_2023-05-24.yaml

@@ -1 +1 @@
-ALB_DI_95104ebaaf8be5c42606a4db2abf76d5_hash.yaml
+ALB_DI_5cf8443b430d6e371b9f7343fa00a201_hash.yaml

+ 1 - 0
extracted_data/UNFCCC/Albania/ALB_DI_2023-07-18_raw.csv

@@ -0,0 +1 @@
+ALB_DI_81886afef7c571b60699a44198be0042_raw_hash.csv

+ 1 - 0
extracted_data/UNFCCC/Albania/ALB_DI_2023-07-18_raw.nc

@@ -0,0 +1 @@
+ALB_DI_81886afef7c571b60699a44198be0042_raw_hash.nc

+ 1 - 0
extracted_data/UNFCCC/Albania/ALB_DI_2023-07-18_raw.yaml

@@ -0,0 +1 @@
+ALB_DI_81886afef7c571b60699a44198be0042_raw_hash.yaml

+ 1 - 0
extracted_data/UNFCCC/Albania/ALB_DI_5cf8443b430d6e371b9f7343fa00a201_hash.csv

@@ -0,0 +1 @@
+../../../.git/annex/objects/Mk/9k/MD5E-s229956--f62999796cfa5678786639b31280c03a.csv/MD5E-s229956--f62999796cfa5678786639b31280c03a.csv

+ 1 - 0
extracted_data/UNFCCC/Albania/ALB_DI_5cf8443b430d6e371b9f7343fa00a201_hash.nc

@@ -0,0 +1 @@
+../../../.git/annex/objects/1v/jQ/MD5E-s465883--cf9b74ac46a5882162122bafab9f9f14.nc/MD5E-s465883--cf9b74ac46a5882162122bafab9f9f14.nc

+ 31 - 0
extracted_data/UNFCCC/Albania/ALB_DI_5cf8443b430d6e371b9f7343fa00a201_hash.yaml

@@ -0,0 +1,31 @@
+attrs:
+  references: https://di.unfccc.int
+  title: Data submitted to the UNFCCC by country Albania as available in the DI interface
+    on 2023-05-24. Processed on 2023-07-17
+  comment: Data read from the UNFCCC DI flexible query interface using the API. Data
+    read on 2023-05-24. Processed on 2023-07-17
+  rights: ''
+  contact: mail@johannes-guetschow.de
+  institution: United Nations Framework Convention on Climate Change (www.unfccc.int)
+  cat: category (IPCC2006_PRIMAP)
+  area: area (ISO3)
+  scen: scenario (Access_Date)
+  sec_cats:
+  - class
+  - measure
+time_format: '%Y'
+dimensions:
+  '*':
+  - time
+  - source
+  - measure
+  - class
+  - scenario (Access_Date)
+  - area (ISO3)
+  - provenance
+  - category (IPCC2006_PRIMAP)
+  - entity
+  - unit
+additional_coordinates:
+  orig_cat_name: category (IPCC2006_PRIMAP)
+data_file: ALB_DI_5cf8443b430d6e371b9f7343fa00a201_hash.csv

+ 1 - 1
extracted_data/UNFCCC/Algeria/DZA_DI_2023-05-24.csv

@@ -1 +1 @@
-DZA_DI_f39854aad0a8fc83518ae8ac43da9cbf_hash.csv
+DZA_DI_8f53edd26fd8bb6afdb774fb001c25de_hash.csv

+ 1 - 1
extracted_data/UNFCCC/Algeria/DZA_DI_2023-05-24.nc

@@ -1 +1 @@
-DZA_DI_f39854aad0a8fc83518ae8ac43da9cbf_hash.nc
+DZA_DI_8f53edd26fd8bb6afdb774fb001c25de_hash.nc

+ 1 - 1
extracted_data/UNFCCC/Algeria/DZA_DI_2023-05-24.yaml

@@ -1 +1 @@
-DZA_DI_f39854aad0a8fc83518ae8ac43da9cbf_hash.yaml
+DZA_DI_8f53edd26fd8bb6afdb774fb001c25de_hash.yaml

+ 1 - 0
extracted_data/UNFCCC/Algeria/DZA_DI_2023-07-18_raw.csv

@@ -0,0 +1 @@
+DZA_DI_1379ca063b21fcfd4914106a4a4b3f3e_raw_hash.csv

+ 1 - 0
extracted_data/UNFCCC/Algeria/DZA_DI_2023-07-18_raw.nc

@@ -0,0 +1 @@
+DZA_DI_1379ca063b21fcfd4914106a4a4b3f3e_raw_hash.nc

+ 1 - 0
extracted_data/UNFCCC/Algeria/DZA_DI_2023-07-18_raw.yaml

@@ -0,0 +1 @@
+DZA_DI_1379ca063b21fcfd4914106a4a4b3f3e_raw_hash.yaml

+ 1 - 0
extracted_data/UNFCCC/Algeria/DZA_DI_8f53edd26fd8bb6afdb774fb001c25de_hash.csv

@@ -0,0 +1 @@
+../../../.git/annex/objects/P8/9V/MD5E-s74486--c1531c3f1d729945386b23cc0bd0da51.csv/MD5E-s74486--c1531c3f1d729945386b23cc0bd0da51.csv

Some files were not shown because too many files changed in this diff