Daniel Busch 7 hónapja
szülő
commit
42589454b5
1 módosított fájl, 136 hozzáadás és 0 törlés
  1. 136 0
      docs/source/contribution.md

+ 136 - 0
docs/source/contribution.md

@@ -0,0 +1,136 @@
+# Data reading FAQ
+
+## How to choose which tables to read from a PDF?
+
+## What's behind `gwp_to_use`?
+
+## How do I know what's the correct GWP (`gwp_to_use`) for a report?
+
+The report should mention which GWP conversion was used. Search for `gwp` in the report.
+
+## How to choose the tolerance when merging datasets?
+
+Should be at 0.1. The idea is to separate rounding error from actual inconsistencies in the data.
+
+## How to deal with inconsistencies in a data source?
+
+It is possible that a data source contains conflicting pieces of information. Sometimes it is obvious that one of the
+values was wrongly added to the data. In other cases, it is impossible which value is more trustworthy. In that case it
+is better to leave it out completely.
+
+Sometimes the reports hold incorrect values, which become obvious when trying to merge tables, for example the main
+table's value for 1.A.2 for CO2 is 0.003 and the sector table's value for 1.A.2 and CO2 is 0.006.
+Johannes: As A rule I would say: If it's rounding errors use `pr.merge` and tolerance. If the data is actually wrong
+either correct manually if we know the correct value or set to `nan` manually if we don't know the correct value.
+
+## How to aggregate categories
+
+The aggregation of categories means combining existing categories from the source, for example the PDF document, into a
+new category and adding up their values. Some categories help to interpret the data, but are rarely included in the
+reports. For example, it can be helpful to show all emissions without the mostly negative emissions from the LULUCF
+sector in one category. For this we use `M.0.EL` - National total emissions excluding LULUCF. There is a set of
+categories that must be present in primap-hist (where can I look it up?)
+
+It is not immediately obvious which categories need to be aggregated. In principle, there should be a parent category
+for each category. For example, if category 3.A.1 has been imported, category 3.A should also be present in the final
+data set. The parent categories are often already contained in the tables in the PDFs. In addition, the following extra
+categories should also be created, if not already in the report:
+
+* `0` - National total emissions
+* `M.0.EL` - National total emissions excluding LULUCF
+* `M.LULUCF` - LULUCF
+* `M.3.D.LU` - Other (LULUCF)
+* `M.AG` - Agriculture
+* `M.AG.ELV` - Agriculture excluding livestock
+* `M.3.C.AG` - Aggregate sources and non-CO2 emissions sources on land
+* `M.3.D.AG` - Other (Agriculture)
+
+To aggregate the categories, we use the `process_data_for_country` function. The parameter `processing_info_country`
+defines the categories and their subcategories to be aggregated. The following example shows all additional categories
+and their subcategories. Another place to look up the additional categories (the ones that start with M) and their
+children-categories is ???. The parameter can be passed into the function in this format.
+
+```python
+country_processing_step1 = {
+    "aggregate_cats" : {
+        "M.3.C.AG" : {
+            "sources" : [
+                "3.C.1",
+                "3.C.2",
+                "3.C.3",
+                "3.C.4",
+                "3.C.5",
+                "3.C.6",
+                "3.C.7",
+                "3.C.8",
+            ],
+        }
+        "M.3.D.AG" : {
+            "sources" : [
+                "3.D.2"
+            ],
+            "M.AG.ELV" : {
+                "sources" : [
+                    "M.3.C.AG",
+                    "M.3.D.AG"
+                ],
+            },
+            "M.AG" : {
+                "sources" : [
+                    "3.A",
+                    "M.AG.ELV"
+                ]
+            },
+            "M.3.D.LU" : {
+                "sources" : [
+                    "3.D.1"
+                ]
+            },
+            "M.LULUCF" : {
+                "sources" : [
+                    "3.B",
+                    "M.3.D.LU"
+                ],
+            },
+            "M.0.EL" : {
+                "sources" : [
+                    "1",
+                    "2",
+                    "M.AG",
+                    "4"
+                ],
+            },
+        },
+    }
+```
+
+We can use the concept of aggregation not only to create new categories, but also to perform **consistency checks**. We
+know that the sum of categories `1`, `2`, `3` and `4` must equal the value of category `0`. If the category is already
+contained in the data and we nevertheless aggregate a new category `0` from the subcategories, the original category `0`
+and the aggregated category `0` are merged. However, this only works if the difference of the values does not exceed a
+certain tolerance. We take 1% as the default value for the rounding error. This step can be incorporated into the
+function as follows:
+
+```python
+country_processing_step1 = {
+    "aggregate_cats" : {
+        "0" : {
+            "sources" : [
+                "1",
+                "2",
+                "3",
+                "4"
+            ]
+        },
+        "3" : {
+            "sources" : [
+                "M.AG",
+                "M.LULUCF"
+            ],
+        },
+    },
+}
+```
+
+Note that all the specified sources must be either already present in the dataset or aggregated in the same function
+call.