Intro

In this article, I want to point out some problems that I see with the way that research is presented by the IPCC.

The Intergovernmental Panel on Climate Change (IPCC) was created by the United Nations in 1988. Its purpose is to assess the current state of climate science and inform governments about what we know about climate change so that they can develop and enact policies that mitigate potential negative effects. It does not conduct its own research but rather aims to examine all literature on the topic and present an objective and neutral assessment to policymakers. Its latest assessment report, including a Summary for Policymakers (SPM), was published in 2021. The SPM is intended to be a policy-neutral summary of the full report (this is stated as one of the IPCC’s principles). The full report is about 1,900 pages long (without annexes) and is written in technical language that is hard for non-scientists to digest. That’s why the SPM makes sense, and I doubt that many policymakers read beyond it (if they read or understand even that much).

However, the SPM is not exactly policy-neutral. While it doesn’t explicitly recommend a specific policy, the summary makes it very clear which policies are preferable given the consequences they entail. It’s like saying “If you do this rather than that, something absolutely horrible will happen. It’s up to you though!”

I think not being policy-neutral isn’t a problem per se; I don’t even think it should be a goal of the IPCC. If there is reliable research showing that one policy is clearly worse than another, why would you want to stay neutral and not recommend the better policy? The actual issue, however, is that the scientific literature behind the claims in the SPM is much less certain and straightforward than what is presented. This becomes apparent once you dig into the full report and the underlying source material. It seems like the authors try to create the impression that the science is more settled than it actually is.

I will go through a number of examples to illustrate this point. In summary, what the SPM says is: “Emitting more CO2 will have bad consequences. How bad? Extremely bad! How certain are we about this? Extremely certain!” What it should be saying is: “Emitting more CO2 will probably have bad consequences. How bad? Could be slightly bad or very bad. How certain are we about this? We don’t know.”

Problem 1: The Fallacy of Model Democracy

While the IPCC covers a vast range of different areas, one aspect of climate change stands out most prominently: global temperature. It is the cause of many other changes, or at least correlates strongly with them. The projected temperature for different future scenarios is presented in the SPM in the following graph:

Figure SPM.8 (a)

Let’s break down how this graph was created.

The data is taken from a CMIP6 (Coupled Model Intercomparison Project Phase 6) paper, specifically from the sub-project ScenarioMIP (Scenario Model Intercomparison Project). As the name implies, ScenarioMIP compares models across different future scenarios. It comprises climate models developed by research groups around the world, for instance the Max Planck Institute, the Met Office Hadley Centre (UK), NASA GISS, the Institute of Atmospheric Physics in China, and many more. In total, 27 institutions contribute over 35 models to the project.

In assessing the outputs of these different models, the authors of ScenarioMIP look at mean values across the entire ensemble, taking one output per model into account. Each model gets the same weight in this calculation, prompting the term “model democracy”: one model, one vote, so to speak. The authors provide a 90% confidence interval, i.e. they are 90% confident that the true values lie within the given range. They calculate it by assuming a normal distribution across model outputs and multiplying its standard deviation by 1.64 (in a normal distribution, 90% of values fall within 1.64 standard deviations of the mean).
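
To make this concrete, here is a minimal sketch of the calculation in Python. The warming values are invented for illustration; they are not the actual ScenarioMIP outputs.

```python
import numpy as np

# Hypothetical end-of-century warming values (°C), one per model.
# Invented numbers for illustration only -- not ScenarioMIP data.
model_warming = np.array([3.8, 4.1, 4.3, 4.5, 4.6, 4.8, 5.0, 5.2])

mean = model_warming.mean()
std = model_warming.std(ddof=1)  # sample standard deviation

# 90% of a normal distribution lies within ±1.64 standard deviations of the mean.
lower, upper = mean - 1.64 * std, mean + 1.64 * std

print(f"ensemble mean: {mean:.2f} °C")
print(f"90% range:     {lower:.2f} to {upper:.2f} °C")
```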

Among the results are the global average temperature (near-surface air temperature) and the global average precipitation change (in percent), as well as their spatial patterns across the globe. In the graph above you can see the projected temperature change for 5 different scenarios. The lines are cleanly separated and it’s clear that the more emissions you have, the higher the temperature.

At first glance the approach seems reasonable: the authors consider a multitude of different models and calculate mean values to represent the expected outcomes, providing confidence intervals to signify the uncertainty in the projections. This approach is common practice in experimental research, where you measure variables and there is uncertainty inherent in the measurement. First, you aim to eliminate any systematic errors that would bias your measurements consistently. If you do a good job at that, you can treat the remaining error as a random variable that obeys a particular probability distribution (often a normal distribution). If you take enough measurements, you can make robust statements about the most likely range of actual values of the variable you are measuring as well as about the remaining uncertainty.
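
For contrast, here is a quick sketch (again with invented numbers) of the setting where this logic does work: repeated measurements of a single fixed quantity whose errors are independent and unbiased. The resulting confidence interval brackets the true value and narrows as the number of measurements grows.

```python
import numpy as np

rng = np.random.default_rng(0)

true_value = 20.0  # the quantity being measured (unknown in a real experiment)

# Independent, unbiased measurement errors -- the assumption that makes
# averaging and confidence intervals meaningful in experimental research.
measurements = true_value + rng.normal(loc=0.0, scale=0.5, size=100)

mean = measurements.mean()
sem = measurements.std(ddof=1) / np.sqrt(len(measurements))  # standard error of the mean

# 90% confidence interval for the true value.
print(f"estimate: {mean:.2f}, 90% CI: {mean - 1.64*sem:.2f} to {mean + 1.64*sem:.2f}")
```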

However, applying this method to the evaluation of climate models fails on both counts: you can neither reliably eliminate systematic bias nor treat model outputs as instantiations of a random variable. The models are interdependent because the scientific community shares knowledge, results and, in this specific case, often even the code of the model implementations. Any bias that exists in one model will likely exist in others as well, and it won’t become apparent by comparing model outputs alone. Moreover, to assess and eliminate bias you would have to evaluate the models by comparing their predictions to actual outcomes. This is not possible, since by the time we have the data (the year 2100) it will already be too late. Of course we could assess earlier, say in 2050, but that is also too late, since we need to put plans into action now to sufficiently influence outcomes by 2100. Evaluating much sooner than 2050 is no solution either, since we wouldn’t see changes significant enough to test our predictions against. Climate simply evolves too slowly and too chaotically.
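
A small simulation (with invented numbers) shows why a shared bias stays invisible: if every model inherits the same systematic error, for example through shared code or shared assumptions, the ensemble mean inherits it too, and the spread between models says nothing about it.

```python
import numpy as np

rng = np.random.default_rng(0)

true_warming = 3.0   # hypothetical real outcome (unknowable until 2100)
shared_bias = 1.2    # systematic error common to all models
n_models = 35

# Each model's output = truth + the common bias + its own idiosyncratic deviation.
outputs = true_warming + shared_bias + rng.normal(0.0, 0.3, size=n_models)

mean = outputs.mean()
std = outputs.std(ddof=1)

print(f"ensemble mean: {mean:.2f} °C  (true value: {true_warming} °C)")
print(f"90% range:     {mean - 1.64*std:.2f} to {mean + 1.64*std:.2f} °C")
# The range sits tightly around ~4.2 °C and never approaches the true 3.0 °C:
# the inter-model spread reflects only the idiosyncratic part, not the shared bias.
```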

Looking beyond the problem of bias in model ensembles, we also cannot assume that model outputs follow a normal distribution (or any particular probability distribution, for that matter), for similar reasons: the models are not independent of each other, and even less so are the specific ensemble members chosen for the aggregate results. The authors chose only one member per model, which discounts any variability there may be across each model’s possible outputs.

It follows that neither the mean values nor the uncertainty ranges carry the meaning that the authors imply with their terminology and visual presentation.

A simple thought experiment illustrates the severity of this fallacy: let’s assume one of the research groups came up with a radically different model which, for the sake of argument, was much better at predicting climate than all the others. Let’s say it predicted a global temperature increase of 2 °C by 2100 while all the other models predicted something in the range of 4–5 °C. Since the one correct model is opposed by some 30 incorrect ones, its results would be all but discounted: it would barely move the mean and would fall outside the 90% “confidence interval”. Scientific progress is thus masked by this way of looking at model ensembles.
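
The arithmetic of this thought experiment, spelled out with made-up numbers (30 models clustered between 4 and 5 °C plus the one hypothetical 2 °C model):

```python
import numpy as np

# 30 models predicting 4-5 °C of warming, plus one (hypothetically correct) model at 2 °C.
consensus = np.linspace(4.0, 5.0, 30)
outputs = np.append(consensus, 2.0)

mean = outputs.mean()
std = outputs.std(ddof=1)

print(f"mean without the outlier: {consensus.mean():.2f} °C")
print(f"mean with the outlier:    {mean:.2f} °C")               # ~4.42, barely moved
print(f"lower end of 90% range:   {mean - 1.64 * std:.2f} °C")  # ~3.5, far above 2 °C
```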

Yet the approach of looking for “scientific consensus” in a democratic way is present throughout the assessment report, which I find deeply concerning. The most important scientific discoveries are deviations from, and often in complete opposition to, the prevailing consensus. Hopefully, the consensus eventually shifts to the superior theory, but the IPCC’s approach hinders this: simply averaging across studies washes out the results of outliers. If an outlier turned out to point to the correct explanation of a phenomenon, we should discard all other theories in favor of it, no matter how large their majority. This seems too obvious to point out, but it is only marginally addressed in the report (the authors do mention that “quantifying and accounting for model interdependence (shared building blocks) still remains challenging”, see Cross-Chapter Box Atlas.1 in the full assessment report).