The devil is in the detail: Unstable response functions in species distribution models challenge bulk ensemble modelling
Hannemann H., Willis KJ., Macias-Fauria M.
© 2016 John Wiley & Sons Ltd.
Aim: Species distribution models (SDMs) are commonly used to determine threats to biodiversity and opportunities under climate change. Despite SDMs being based on the assumption of complete knowledge of the climate space of the modelled species, truncated occurrence datasets (and hence truncated climate spaces) such as national inventories are often employed. This may lead to prediction errors, which have been proposed to stem from: (1) the degree of climate space truncation and/or (2) instability of the modelling algorithms. Our aim was to explore the potential causes of prediction errors in SDMs using truncated training datasets.
Location: Europe 11°W-32°E, 34°-72°N.
Methods: SDMs employing commonly used bioclimatic variables were applied to seven forest tree species. We created two model training datasets covering: (1) Germany only (significantly truncated climate space) and (2) Europe (minimally truncated climate space). Differences between the climate space represented by Germany-only and European data were measured on two-dimensional climate spaces obtained through principal component analysis of the bioclimatic variables. Seven SDM algorithms were run, and the stability of the response function and variable selection for each species and model type were analysed.
Results: The degree of climate space truncation was less important for model performance than the instability of model algorithms and indiscriminate variable selection. The latter led to irrelevant relationships of species occurrence with bioclimatic variables. These instabilities caused pronounced prediction errors.
Main conclusions: Our results strongly suggest that erroneous model predictions stem from instability and ecological irrelevance of the statistical functions relating the probability of a species' occurrence to bioclimatic variables, compounded by a lack of consistency in variable selection. Models displaying these characteristics showed lower overall performance when trained with truncated datasets. Further, commonly used ensemble approaches do not compensate for the shortfalls of individual models. Detailed model-by-model and species-by-species analysis of response functions and variable importance is recommended.
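The Methods step of comparing climate spaces on a two-dimensional principal component plane can be illustrated with a minimal sketch. The abstract does not state how the difference between the Germany-only and European climate spaces was quantified, so the grid-cell occupancy ratio used here is an assumption chosen for simplicity, not the authors' metric. The data are synthetic, and the names `pca_scores`, `climate_space_coverage` and `subset` are hypothetical.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Project standardized variables onto their first principal components."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # SVD of the standardized matrix; rows of Vt are the principal axes.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:n_components].T

def climate_space_coverage(scores, subset_mask, bins=20):
    """Fraction of occupied PC1-PC2 grid cells that the subset also occupies.

    Assumed metric for illustration only; 1.0 means no truncation.
    """
    edges = [np.linspace(scores[:, k].min(), scores[:, k].max(), bins + 1)
             for k in range(2)]
    full, _, _ = np.histogram2d(scores[:, 0], scores[:, 1], bins=edges)
    sub, _, _ = np.histogram2d(scores[subset_mask, 0], scores[subset_mask, 1],
                               bins=edges)
    return np.count_nonzero(sub) / np.count_nonzero(full)

rng = np.random.default_rng(0)
# Synthetic stand-ins for bioclimatic variables (rows = grid cells, columns = variables).
X = rng.normal(size=(2000, 6))
X[:, 1] += 0.8 * X[:, 0]  # make two variables correlated, as climate variables often are

# Hypothetical "national" subset: cells with mid-range values of the first variable.
subset = np.abs(X[:, 0]) < 0.5

scores = pca_scores(X)
coverage = climate_space_coverage(scores, subset)
print(f"subset covers {coverage:.0%} of occupied climate-space cells")
```

The principal components are fitted on the full dataset and the subset is projected into the same plane, so both climate spaces share one coordinate system. A coverage value well below 1 would indicate a strongly truncated climate space in the sense used above.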