One Size Does Not Fit All: The Limits of Structure-Based Models in Drug Discovery.
Ross GA, Morris GM, Biggin PC.
A major goal in computational chemistry has been to discover the set of rules that can accurately predict the binding affinity of any protein-drug complex, using only a single snapshot of its three-dimensional structure. Despite the continual development of structure-based models, predictive accuracy remains low, and the fundamental factors that inhibit the inference of all-encompassing rules have yet to be fully explored. Using statistical learning theory and information theory, here we prove that even the very best generalized structure-based model is inherently limited in its accuracy, and that protein-specific models are always likely to be better. Our results refute the prevailing assumption that large data sets and advanced machine learning techniques will yield accurate, universally applicable models. We anticipate that the results will aid the development of more robust virtual screening strategies and of error estimates for scoring functions.