Modelling population reference curves, also known as normative modelling, is increasingly used with the advent of large neuroimaging studies. In this paper we assess the performance of fitting methods from the perspective of clinical applications and investigate the influence of sample size. Further, we evaluate linear and non-linear models for percentile curve estimation and highlight how the bias-variance trade-off manifests in typical neuroimaging data. As an example application, we created plausible ground truth distributions of hippocampal volumes in the age range of 45 to 80 years. Based on these distributions we repeatedly simulated samples of between 50 and 50,000 data points, and fitted a range of normative models to each simulated sample. We compared the fitted models and their variability across repetitions to the ground truth, focusing on the outer percentiles (1st, 5th, 10th) as these are the most clinically relevant. Our results quantify the expected decrease in the variance of the volume estimates with increasing sample size. However, bias in the volume estimates decreases only modestly, with little improvement at large sample sizes. The uncertainty of model performance is substantial for what would often be considered large samples in a neuroimaging context, and rises dramatically at the ends of the age range, where fewer data points exist. Flexible models perform better across sample sizes, especially for non-linear ground truth. Surprisingly large samples of several thousand data points are needed to accurately capture outlying percentiles across the age range for applications in research and clinical settings. Performance evaluation methods should assess both bias and variance. Furthermore, caution is needed near the ends of the age range captured by the source data set and, as is a well-known general principle, extrapolation beyond the age range should always be avoided.
To support such evaluations, we have made our code available to guide researchers developing or utilising normative models.
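
The simulate-fit-compare loop described above can be illustrated with a minimal sketch. This is not the authors' released code: the ground truth parameters (INTERCEPT, SLOPE, SD), the uniform age sampling, and all function names are hypothetical, and a plain least-squares line with Gaussian residuals stands in for the normative models evaluated in the paper. Because the fitted model here matches the simulated ground truth, bias should stay close to zero; reproducing the persistent bias reported above would require a misspecified model or a non-Gaussian ground truth.

```python
import random
import statistics

# Hypothetical ground truth (illustrative values, not taken from the paper):
# volume declines linearly with age, with constant Gaussian spread.
INTERCEPT, SLOPE, SD = 4000.0, -20.0, 300.0   # mm^3, mm^3 per year, mm^3
AGE_MIN, AGE_MAX = 45.0, 80.0
Z05 = statistics.NormalDist().inv_cdf(0.05)   # standard normal 5th percentile


def true_percentile(age):
    """Ground-truth 5th percentile of volume at a given age."""
    return INTERCEPT + SLOPE * (age - AGE_MIN) + Z05 * SD


def simulate(n, rng):
    """Draw n (age, volume) pairs from the hypothetical ground truth."""
    ages = [rng.uniform(AGE_MIN, AGE_MAX) for _ in range(n)]
    vols = [INTERCEPT + SLOPE * (a - AGE_MIN) + rng.gauss(0.0, SD) for a in ages]
    return ages, vols


def fit_percentile(ages, vols, age):
    """Fit a least-squares line, assume Gaussian residuals, return the 5th percentile at `age`."""
    n = len(ages)
    mean_age, mean_vol = statistics.fmean(ages), statistics.fmean(vols)
    sxx = sum((a - mean_age) ** 2 for a in ages)
    sxy = sum((a - mean_age) * (v - mean_vol) for a, v in zip(ages, vols))
    slope = sxy / sxx
    icept = mean_vol - slope * mean_age
    rss = sum((v - (icept + slope * a)) ** 2 for a, v in zip(ages, vols))
    sd_hat = (rss / (n - 2)) ** 0.5           # residual standard deviation
    return icept + slope * age + Z05 * sd_hat


def bias_and_sd(n, age, reps=200, seed=0):
    """Bias and spread of the percentile estimate at `age` over repeated samples of size n."""
    rng = random.Random(seed)
    ests = [fit_percentile(*simulate(n, rng), age) for _ in range(reps)]
    return statistics.fmean(ests) - true_percentile(age), statistics.stdev(ests)


if __name__ == "__main__":
    for n in (50, 500, 5000):
        for age in (62.5, 80.0):              # middle and upper end of the age range
            bias, sd = bias_and_sd(n, age)
            print(f"n={n:5d} age={age:4.1f} bias={bias:8.2f} sd={sd:7.2f}")
```

Reporting bias and spread separately, and at more than one age, mirrors the recommendation that performance evaluation should assess both bias and variance and pay particular attention to the ends of the age range.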

Journal article

Keywords: Big data, Brain ageing, GAMLSS, MRI, Normative modelling