Modelling population reference curves, or normative modelling, is increasingly used with the advent of large neuroimaging studies. In this paper we assess the performance of fitting methods from the perspective of clinical applications and investigate the influence of sample size. Further, we evaluate linear and non-linear models for percentile curve estimation and highlight how the bias-variance trade-off manifests in typical neuroimaging data. As an example application, we created plausible ground truth distributions of hippocampal volumes in the age range of 45 to 80 years. Based on these distributions we repeatedly simulated samples of sizes between 50 and 50,000 data points, and for each simulated sample we fitted a range of normative models. We compared the fitted models and their variability across repetitions to the ground truth, with a specific focus on the outer percentiles (1st, 5th, 10th), as these are the most clinically relevant. Our results quantify the expected decreasing trend in the variance of the volume estimates with increasing sample size. However, bias in the volume estimates decreases only modestly, with little improvement at large sample sizes. The uncertainty of model performance is substantial for what would often be considered large samples in a neuroimaging context, and it rises dramatically at the ends of the age range, where fewer data points exist. Flexible models perform better across sample sizes, especially for non-linear ground truth. Surprisingly large samples of several thousand data points are needed to accurately capture outlying percentiles across the age range for applications in research and clinical settings. Performance evaluation methods should assess both bias and variance. Furthermore, caution is needed when making estimates near the ends of the age range captured by the source data set and, as is a well-known general principle, extrapolation beyond the age range should always be avoided.
To support such evaluations, we have made our code available to guide researchers developing or utilising normative models.
Keywords: Big data, Brain ageing, GAMLSS, MRI, Normative modelling
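The simulation design summarised in the abstract (draw repeated samples from a known ground truth, fit a model to each, then compare the fitted outer percentiles with the truth) can be illustrated with a minimal sketch. This is not the authors' released code and does not use GAMLSS: the ground truth parameters (`true_mean`, `TRUE_SD`), the uniform age sampling, and the simple linear-Gaussian fit are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import random
from statistics import NormalDist, mean, pstdev

# Hypothetical ground truth (illustrative numbers, not taken from the paper):
# volume ~ Normal(true_mean(age), TRUE_SD) for ages between 45 and 80 years.
TRUE_SD = 400.0
Z = {p: NormalDist().inv_cdf(p) for p in (0.01, 0.05, 0.10)}


def true_mean(age):
    """Assumed linear decline of mean volume with age (arbitrary units)."""
    return 4200.0 - 20.0 * (age - 45.0)


def true_percentile(age, p):
    """Ground-truth percentile curve evaluated at one age."""
    return true_mean(age) + Z[p] * TRUE_SD


def simulate_sample(n, rng):
    """Draw n (age, volume) pairs; ages are assumed uniform on [45, 80]."""
    ages = [rng.uniform(45.0, 80.0) for _ in range(n)]
    vols = [rng.gauss(true_mean(a), TRUE_SD) for a in ages]
    return ages, vols


def fit_linear_gaussian(ages, vols):
    """Least-squares line for the mean plus a constant residual SD."""
    n = len(ages)
    a_bar, v_bar = mean(ages), mean(vols)
    sxx = sum((a - a_bar) ** 2 for a in ages)
    sxy = sum((a - a_bar) * (v - v_bar) for a, v in zip(ages, vols))
    slope = sxy / sxx
    intercept = v_bar - slope * a_bar
    rss = sum((v - (intercept + slope * a)) ** 2 for a, v in zip(ages, vols))
    sd = (rss / (n - 2)) ** 0.5
    return intercept, slope, sd


def evaluate(n, age, p, repetitions=100, seed=0):
    """Bias and spread (SD across repetitions) of the fitted percentile."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(repetitions):
        intercept, slope, sd = fit_linear_gaussian(*simulate_sample(n, rng))
        estimates.append(intercept + slope * age + Z[p] * sd)
    bias = mean(estimates) - true_percentile(age, p)
    return bias, pstdev(estimates)


if __name__ == "__main__":
    for n in (50, 500, 2000):
        for age in (62.5, 80.0):  # middle and end of the age range
            bias, spread = evaluate(n, age, 0.01)
            print(f"n={n:5d} age={age:4.1f} bias={bias:8.2f} spread={spread:7.2f}")
```

In this toy setting the fitted model family matches the ground truth, so the bias should stay close to zero and mainly the spread changes with sample size and age. A persistent bias of the kind reported above would be expected only when the fitted model cannot represent the true curve, for example a linear fit to a non-linear ground truth.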