SESCA: Structure-Based Empirical Spectrum Calculation Algorithm
Collaborations: Nykola Jones and Søren V. Hoffmann, ISA, Department of Physics & Astronomy, Aarhus University, DK
Funding: Alexander von Humboldt Foundation
Electronic circular dichroism (CD) spectroscopy is highly sensitive to changes in the backbone structure of proteins and does not require special labelling, crystallization, or high protein concentrations. It is therefore one of the first methods applied to characterize proteins, and it can be combined with other methods, such as stopped-flow techniques, to study the kinetics and reaction mechanisms of proteins. SESCA is a computational method that allows the rapid and accurate prediction of the CD spectra of protein models from their three-dimensional structure. SESCA predictions are based on two inputs: the secondary structure composition of the proposed protein model, and a set of pre-calculated basis spectra (a basis set). The basis spectra encode the average CD contribution of secondary structure elements and are derived from the known structures and CD spectra of an experimental reference protein set.
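The prediction step described above can be illustrated as a weighted sum of basis spectra, with the secondary structure fractions as weights. The following is a minimal sketch under that assumption, not the SESCA implementation: the function name `predict_cd_spectrum`, the three class names, the wavelength grid, and all numerical values are hypothetical and chosen only for illustration.

```python
def predict_cd_spectrum(fractions, basis_spectra):
    """Return the fraction-weighted sum of the basis spectra.

    fractions:     dict mapping a secondary structure class to its fraction
    basis_spectra: dict mapping the same classes to lists of CD intensities,
                   all sampled on the same wavelength grid
    """
    n_points = len(next(iter(basis_spectra.values())))
    spectrum = [0.0] * n_points
    for name, weight in fractions.items():
        for i, value in enumerate(basis_spectra[name]):
            spectrum[i] += weight * value
    return spectrum


# Hypothetical wavelength grid (nm) and made-up basis spectra.
wavelengths = [190, 200, 210, 220]
basis_spectra = {
    "helix": [40.0, 10.0, -20.0, -22.0],
    "strand": [10.0, 5.0, -8.0, -10.0],
    "coil": [-20.0, -15.0, -3.0, 1.0],
}

# Hypothetical secondary structure composition of a model (sums to 1).
fractions = {"helix": 0.5, "strand": 0.25, "coil": 0.25}

predicted = predict_cd_spectrum(fractions, basis_spectra)
for wl, value in zip(wavelengths, predicted):
    print(wl, value)
```

In this toy setting, changing the composition of the model changes the predicted spectrum linearly, which is what makes the calculation fast enough to apply to many models or ensemble members.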
SESCA calculations allow a direct comparison between the measured CD spectrum of a target protein and the predicted CD spectra of model structures or structural ensembles, which can be used to assess model quality. Although the CD spectrum of a typical globular protein can often be determined accurately from the secondary structure composition of its crystal structure, we applied several modifications to the original scheme to improve the prediction accuracy for short peptides and intrinsically disordered proteins. These modifications include scaling the spectrum intensity to account for normalization errors, basis spectra that address the contribution of amino acid side chains, and the use of structural ensembles to account for protein flexibility and conformational heterogeneity.
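As an illustration of such a comparison, the sketch below computes the root-mean-square deviation (RMSD) between a measured and a predicted spectrum, before and after rescaling the measured intensities. It assumes a simple least-squares scaling factor, which is one possible way to compensate for a normalization error and is not necessarily the procedure used in SESCA. The function names and all spectrum values are hypothetical.

```python
import math


def rmsd(spectrum_a, spectrum_b):
    """Root-mean-square deviation between two spectra on the same grid."""
    squared = sum((a - b) ** 2 for a, b in zip(spectrum_a, spectrum_b))
    return math.sqrt(squared / len(spectrum_a))


def best_scale(measured, predicted):
    """Least-squares factor s that minimizes sum((s * measured - predicted)**2)."""
    numerator = sum(m * p for m, p in zip(measured, predicted))
    denominator = sum(m * m for m in measured)
    return numerator / denominator


# Hypothetical predicted spectrum of a model structure.
predicted = [17.5, 2.5, -12.75, -13.25]
# Hypothetical measured spectrum with the same shape but a lower intensity,
# mimicking a normalization error (for example, in the concentration).
measured = [14.0, 2.0, -10.2, -10.6]

scale = best_scale(measured, predicted)
rescaled = [scale * m for m in measured]

rmsd_unscaled = rmsd(measured, predicted)
rmsd_scaled = rmsd(rescaled, predicted)
print(round(scale, 3), round(rmsd_unscaled, 3), round(rmsd_scaled, 3))
```

Here the two spectra differ only in intensity, so the deviation disappears almost entirely after rescaling. A deviation that remains after rescaling would point to a difference in spectrum shape, and therefore to the model itself or to contributions not captured by the basis spectra.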
SESCA update on model validation
In our second publication (Nagy & Grubmüller, 2020a), we focus on the accuracy and precision of circular-dichroism-based model validation methods with respect to the experimental noise of the measured CD spectrum. This study allows us to determine typical deviations between the secondary structure signal and the measured CD spectrum. The updated version of SESCA uses these typical CD deviations to estimate the expected secondary structure error of proposed protein models with higher precision.
Bayesian secondary structure estimation
The precision of SESCA is enhanced further by using a Bayesian statistics approach to determine the likelihood of possible secondary structures of a target protein based on its measured CD spectrum. This likelihood is determined from the joint probability distribution of the two major CD deviations, namely scaling errors and non-secondary-structure contributions. The Bayesian algorithm described in our third publication (Nagy & Grubmüller, 2020b) aids structural model validation by providing a more precise estimate of the protein secondary structure composition, and by allowing an easy comparison between the likelihoods of proposed structural models, based on the estimated CD deviations of their predicted CD spectra.
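The general idea of ranking candidate models by likelihood can be sketched with Bayes' rule. The example below is a toy illustration only. The joint density is a hypothetical stand-in (a product of two independent Gaussian densities with made-up parameters), whereas the actual joint distribution of scaling errors and non-secondary-structure contributions is the one described in the publication. The model names, deviation values, and function names are likewise assumptions.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Probability density of a normal distribution."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def joint_density(scale_factor, non_ss_deviation):
    """Hypothetical stand-in for the joint distribution of the two CD deviations.

    Assumes the deviations are independent, with scaling factors centred on 1
    and non-secondary-structure deviations centred on 0 (made-up parameters).
    """
    return gaussian_pdf(scale_factor, 1.0, 0.2) * gaussian_pdf(non_ss_deviation, 0.0, 2.0)


def posterior(candidates, prior=None):
    """Normalized posterior probability of each candidate model (Bayes' rule)."""
    names = list(candidates)
    if prior is None:
        # Uniform prior: no candidate is favoured before seeing the spectrum.
        prior = {name: 1.0 / len(names) for name in names}
    weights = {name: prior[name] * joint_density(*candidates[name]) for name in names}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Hypothetical (scaling factor, non-secondary-structure deviation) pairs that
# would be needed to reconcile each model's predicted spectrum with the
# measured one.
candidates = {
    "model_A": (1.05, 1.0),
    "model_B": (1.40, 3.5),
}

post = posterior(candidates)
for name, probability in post.items():
    print(name, round(probability, 3))
```

In this sketch, the model that needs the smaller and more typical deviations to match the measured spectrum receives the higher posterior probability, which mirrors the comparison between proposed structural models described above.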