Robertson&Murphy ChemRev 1997
Contents
I. Introduction 1251
II. Determining the Thermodynamics of Unfolding for Globular Proteins 1252
   A. Differential Scanning Calorimetry 1253
   B. Optical Spectroscopy 1253
   C. Precision and Accuracy of Thermodynamic Data 1254
III. Correlation of Unfolding Thermodynamics with Protein Structure 1256
   A. Database of Unfolding Thermodynamics for Proteins of Known Structure 1256
   B. Relationships between Unfolding Thermodynamics and Features of Protein Structure 1258
IV. Summary 1263
V. Acknowledgments 1266
VI. References 1266

Andrew D. Robertson was born in Manhattan Beach, CA, in 1959. He received his B.A. in Biology from the University of California at San Diego in 1981 and his Ph.D. in Biochemistry from the University of Wisconsin, Madison, in 1988. After postdoctoral training at Stanford University, he joined the faculty in the Department of Biochemistry at the University of Iowa in 1991, where he is now an Associate Professor. His major research interest is the relationship between protein conformation and the energetics of protein stability and function. Current research is focused on the thermodynamics and kinetics of conformational interconversions in proteins at the level of individual amino acid residues.

Kenneth P. Murphy was born in Lafayette, IN, in 1963. He received his B.A. in Chemistry in 1986 from Metropolitan State College in Denver, CO, and his Ph.D. in Chemistry from the University of Colorado, Boulder, in 1990. Following three years of postdoctoral studies at the Johns Hopkins University, he was appointed Assistant Professor of Biochemistry at the University of Iowa College of Medicine in 1993. His research has focused on understanding the relationship between structure and energetics in protein stability and binding using calorimetry as a primary experimental technique. He was awarded the Stig Sunner Memorial Award by the 50th Calorimetry Conference for his contributions to this field.

I. Introduction

The tendency of proteins to spontaneously adopt a well-defined conformation in solution has intrigued investigators for many decades.1 The key questions in the study of this intramolecular recognition reaction are the same as those driving research into intermolecular recognition: what are the molecular determinants of specificity and stability? The distinction between specificity and stability has a long history in studies of intermolecular recognition (e.g., ref 2). In the area of protein folding, this distinction has only recently been articulated in print.3 In the context of the protein folding reaction, specificity for a given polypeptide chain is reflected in the number of distinct and well-populated conformations adopted by the chain.4 The majority of native proteins studied to date adopt a specific well-defined conformation. The focus of this review is the relationship between the conformations of such proteins and the energetics of their stability.

The identities of the noncovalent interactions contributing to the stability of the native protein conformation have been established for some time,5 but considerable debate persists concerning whether and to what extent a given type of interaction favors the native conformation.6-12 Configurational entropy is widely accepted as the major phenomenon opposing protein stability, but the proposed values of this entropy range from about 17 J K⁻¹ mol⁻¹ per amino acid residue to about 50 J K⁻¹ mol⁻¹ per residue.6,13 In contrast, Honig and Yang propose that the major phenomenon opposing protein stability is desolvation of polar groups upon protein folding.8 Most researchers agree that the hydrophobic effect plays a key role in stabilizing proteins, but a clear consensus definition of the hydrophobic effect has not been reached.14-16 Nevertheless, many researchers agree that the hydrophobic effect contributes approximately 8 kJ mol⁻¹ per residue, on average, to the free energy
S0009-2665(96)00383-4 CCC: $28.00 © 1997 American Chemical Society
1252 Chemical Reviews, 1997, Vol. 97, No. 5 Robertson and Murphy
of unfolding of proteins at 25 °C.6,8,17 Hydrogen bonding in proteins has been proposed to be somewhat destabilizing,8 an indifferent or minor stabilizing force,11 and a principal contributor to the stability of the native state.6,9,12,18

Much of the disagreement derives from the necessity of using models to interpret the thermodynamic data for proteins in terms of specific features of protein structure.7,9 This follows from the fact that the number of experimental thermodynamic observables in proteins is vanishingly small relative to the thousands of interactions in a typical protein: in the best cases, the thermodynamic data consist of the enthalpy of unfolding (∆Hu), the entropy of unfolding (∆Su), and the heat capacity change upon unfolding (∆Cp). One can thus deconvolute the energetics of protein stability with respect to atomic-level structure in a number of fundamentally different ways, all of which will be compatible with the primary thermodynamic data.

One approach to increasing and simplifying the information content relative to the thermodynamic data has been to take advantage of the well-documented regularities in native protein structures.17,19-23 Data for many proteins of known structure have been used to derive empirical relationships between the energetics of protein stability and features of protein structure.24-27 Similar relationships have been established using thermodynamic data for model compounds, which have served as a basis for interpretation of and comparison with the protein data.12,28-37

All approaches to understanding the molecular basis of protein stability ultimately depend on reliable experimental determinations of the thermodynamics of protein unfolding for proteins of known structure. The number of proteins fulfilling this criterion as of late 1996 is more than three times that tabulated by either Privalov and Gill in 1988 (ref 38) or Spolar and co-workers in 1992.27 In seeking relationships between stability and structure, this expanded database presents an opportunity to test the generality of previous observations and the validity of conclusions derived from these observations and, perhaps, to identify trends that were not evident in the smaller collection of proteins.

The focus of this review is on relationships between protein stability and protein structure that can be established with the primary observables, the thermodynamic parameters derived from calorimetric and spectroscopic studies and the structural models derived from X-ray crystallography and NMR spectroscopy. This purely empirical approach will rely on coarse but regular features of structure such as solvent-exposed surface areas, secondary structure content, and numbers of disulfide bonds. The questions at hand are (1) how much information regarding the molecular origins of protein stability can be gleaned from the protein data alone and (2) can these data be used to resolve some of the controversies now in the literature?

II. Determining the Thermodynamics of Unfolding for Globular Proteins

The stability of a globular protein is quantified by the difference in Gibbs energy, ∆Gu, between the denatured state, D, and the native state, N. As the experimental data in this review deal with thermal denaturation, the denatured state is operationally defined as the state of the protein that exists after thermal denaturation. The characteristics of that state, in terms of residual structure, extent of hydration, etc., remain a source of significant speculation and inquiry (see, e.g., refs 39-42).

The equilibrium between the native and denatured states is defined as

$$K = [\mathrm{D}]/[\mathrm{N}] \qquad (1)$$

and is related to the ∆Gu as

$$\Delta G_u = -RT \ln K \qquad (2)$$

where R is the universal gas constant and T is the absolute temperature. Note that eqs 1 and 2 apply to the equilibrium between the native and denatured states of a protein regardless of the possible presence of intermediate states.

The difference in Gibbs energy is dependent on temperature according to

$$\Delta G_u(T) = \Delta H_u(T) - T\,\Delta S_u(T) \qquad (3)$$

where ∆Hu and ∆Su are the differences in enthalpy and entropy at the same temperature at which ∆Gu is being evaluated.

The temperature dependence of ∆Hu and ∆Su is defined by the heat capacity change, ∆Cp, between the native and denatured states. The change in heat capacity reflects the fact that the amount of heat required to raise the temperature of a solution of unfolded protein is greater than that required for a solution of folded protein of the same concentration. This increase in heat capacity upon unfolding results primarily from restructuring of solvent.43,44 While ∆Cp is itself slightly temperature dependent,45 the assumption of a constant ∆Cp does not lead to significant errors in any other parameter.38 The ∆Gu can thus be described as

$$\Delta G_u(T) = [\Delta H_u(T_R) + \Delta C_p (T - T_R)] - T[\Delta S_u(T_R) + \Delta C_p \ln(T/T_R)]$$
$$\phantom{\Delta G_u(T)} = \Delta H_u(T_R) - T\,\Delta S_u(T_R) + \Delta C_p[(T - T_R) - T \ln(T/T_R)] \qquad (4)$$

where TR is any convenient reference temperature. If TR is equal to Tm, the midpoint for thermal denaturation, then ∆Gu is equal to zero and ∆Su is just ∆Hu/Tm. Thus eq 4 can be rewritten as

$$\Delta G_u(T) = \Delta H_m \left(1 - \frac{T}{T_m}\right) + \Delta C_p[(T - T_m) - T \ln(T/T_m)] \qquad (5)$$

where ∆Hm is the value of ∆Hu at Tm. Equation 5 is generally referred to as the modified Gibbs-Helmholtz equation.

Experimental data are often fit to a modified form of eq 5 in which both sides are divided by -RT. Experimental values of ln K as a function of temperature can thus be fitted to yield values for Tm, ∆Hm, and ∆Cp. It must be noted however that such a fit assumes that the experimental values are a true
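As a rough numerical illustration of eqs 2 and 5, the sketch below evaluates ∆Gu(T) and ln K for a constant ∆Cp. The function names and the parameter values (Tm, ∆Hm, ∆Cp) are hypothetical and chosen only for illustration; they are not taken from any protein in the tables.

```python
import math

R = 8.314e-3  # universal gas constant in kJ K^-1 mol^-1


def delta_g_unfolding(T, Tm, dHm, dCp):
    """Eq 5 (modified Gibbs-Helmholtz): dG_u in kJ/mol at absolute temperature T.

    Tm in K, dHm in kJ/mol, dCp in kJ K^-1 mol^-1 (assumed constant).
    """
    return dHm * (1.0 - T / Tm) + dCp * ((T - Tm) - T * math.log(T / Tm))


def ln_K(T, Tm, dHm, dCp):
    """Eq 2 rearranged: ln K = -dG_u / (R T), the form used for fitting."""
    return -delta_g_unfolding(T, Tm, dHm, dCp) / (R * T)


# Hypothetical parameters, for illustration only.
Tm, dHm, dCp = 333.15, 400.0, 6.0

dG_25 = delta_g_unfolding(298.15, Tm, dHm, dCp)
print(f"dG_u at 25 C: {dG_25:.1f} kJ/mol")
print(f"ln K at 25 C: {ln_K(298.15, Tm, dHm, dCp):.1f}")
```

By construction ∆Gu vanishes at T = Tm, which is a quick check that an implementation of eq 5 is consistent with eq 4.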
Protein Structure and the Energetics of Protein Stability Chemical Reviews, 1997, Vol. 97, No. 5 1253
in the relative populations of native and denatured protein.54 The temperature dependence of ∆Gu is then fit to eq 5, the modified Gibbs-Helmholtz equation.

The approach of Chen and Schellman involves thermal denaturation over a sufficient range of temperature to detect heat- and cold-induced denaturation in a single thermal denaturation experiment.53,55 The data are also fit to the Gibbs-Helmholtz equation (eq 5). In the cases where this approach has been used, chemical denaturants were added in order to observe low- and high-temperature transitions in the same experiment. In principle, the fitted parameters thus reflect the thermodynamics of unfolding only in the presence of denaturant. For HPr, however, the ∆Cp obtained with this approach was identical to that obtained using other procedures.53 The ∆Cp for the mutant T4 lysozyme studied by Chen and Schellman was 9.1 kJ K⁻¹ mol⁻¹, similar to that obtained in the calorimetric study of wild-type protein (Table 1).

One major advantage in the use of spectroscopy over DSC to determine the thermodynamics of protein unfolding is that much less protein is needed in the spectroscopic experiments. Sample concentrations can be as low as 0.01 mg/mL and a wider range of concentrations can be examined, which can serve as a check for self-association reactions. Two significant disadvantages with spectroscopy are the lack of direct measures for intermediates in the unfolding process and the critical role of pre- and posttransition baselines in fitting to obtain the thermodynamic parameters.

The concern about baselines follows from the way in which progress through the unfolding transition is determined: pre- and posttransition baselines are extrapolated into the observable transition zone and the relative concentrations of native and denatured protein are determined from the distances between the observed and extrapolated spectral values.54 For proper evaluation of fitting errors, terms for baselines should be included in any equation used to fit the spectroscopic data.56

Nearly all spectroscopic studies rely on the assumption of a two-state unfolding reaction. Spectroscopic tests for intermediates involve using multiple probes to follow the unfolding reaction,57 but a negative result is only consistent with, and not proof of, the absence of stable intermediates. It should be noted that issues of repeatability and scan rate dependence discussed above in the context of DSC apply equally to spectroscopic techniques.

C. Precision and Accuracy of Thermodynamic Data

In DSC experiments with modern calorimeters, the least precise variable is probably protein concentration. The sources of uncertainty in determining protein concentration are the precision of a given method, the reproducibility of the method, and systematic deviations between different methods. The results of a recent investigation into various techniques for determining concentrations and extinction coefficients for proteins suggest that, in the best cases, the reproducibility in determining extinction coefficients is about 2%.58 Thus, the overall experimental precision of the calorimetric data can be no greater than 1 part in 50. In practice, the reproducibility in protein concentration is probably closer to 5%. Previous estimates for the minimum error in determining ∆Cp range from 4% to 10%.54,59 Reported errors in determining ∆Hm range from 2% to 10%.60,61

In principle, the spectroscopic studies of denaturation and van't Hoff analysis of calorimetric data do not depend on knowledge of protein concentration. What is lost in this type of analysis is valuable information concerning the possible presence of stable intermediates. The least precise variable in the spectroscopic studies is likely to be the spectroscopic observable. Although no systematic survey of precision in such measurements has been published, practical experience suggests that, at best, the precision for a given determination may be 1 part in 100; a more accurate value may be 1 part in 20. For both calorimetric and spectroscopic experiments, the overall precision for any determination is probably best assessed by evaluation of the fitting errors.62

The question of accuracy in the thermodynamic parameters of unfolding is perhaps best addressed by comparing multiple determinations for the same protein (Tables 1 and 2). To some extent, this will control for some of the systematic errors within laboratories that might be associated with, for example, determining protein concentrations. For nine of the 11 proteins for which there are multiple determinations, experiments have been performed in different laboratories, but usually under similar solution conditions. For the present discussion, relative differences in thermodynamic parameters have been evaluated by dividing the difference between reported values by the smaller of the reported values. Three determinations are available for hen lysozyme and RNase A, so relative differences have been calculated by dividing the standard deviation of the mean by the mean value.

The relative differences in ∆Cp values range from zero to about 80% for whale myoglobin, and the mean relative difference is 14 ± 22%. The relative difference for whale myoglobin is about four times the next largest difference, 19% for RNase A, and the mean relative difference excluding whale myoglobin is 7 ± 6%. This value is very similar to previous estimates for uncertainties in ∆Cp.54,59 Interestingly, whale myoglobin is the only protein for which the independent determinations have been made under very different solution conditions: one set of experiments was performed at acid pH while the second set was done at alkaline pH.

To facilitate comparison of ∆Hm values obtained at different temperatures, the reported values have been extrapolated to 60 °C and reported as ∆H(60) in Table 2. While this procedure propagates some of the deviations in ∆Cp values into ∆H(60), the contributions are generally small because the extrapolations are over a short range of temperature. For the 11 proteins for which multiple determinations have been made, the relative differences in ∆H(60) values range from 1% for OMTKY3 to 35% for α-lactalbumin. The mean relative difference for multiple determinations is 12 ± 10%, which is in the range of estimated experimental error.35
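The relative-difference measure described in this section can be sketched in a few lines. The ∆Cp values below are the whale myoglobin and RNase A entries of Table 2. The function name is hypothetical, and the use of the sample standard deviation for three or more determinations is an assumption about what the text means by "standard deviation of the mean"; with that choice the sketch gives roughly 77% for whale myoglobin and 19% for RNase A, in line with the figures quoted above.

```python
from statistics import mean, stdev


def relative_difference(values):
    """Relative difference between independent determinations.

    Two values: (larger - smaller) / smaller, as described in the text.
    Three or more: sample standard deviation / mean (an assumption here).
    """
    values = list(values)
    if len(values) < 2:
        raise ValueError("need at least two determinations")
    if len(values) == 2:
        lo, hi = sorted(values)
        return (hi - lo) / lo
    return stdev(values) / mean(values)


# dCp values from Table 2.
dcp_whale_myoglobin = [15.6, 8.8]
dcp_rnase_a = [6.6, 4.8, 4.8]

rd_myoglobin = relative_difference(dcp_whale_myoglobin)
rd_rnase_a = relative_difference(dcp_rnase_a)
print(f"whale myoglobin: {100 * rd_myoglobin:.0f}%")
print(f"RNase A:         {100 * rd_rnase_a:.0f}%")
```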
Atanasov, B. P. Biopolymers 1971, 10, 1865. c Griko, Y. V.; Freire, E.; Privalov, P. L. Biochemistry 1994, 33, 1889. d The
thermodynamics were obtained from a global fit of data and are reported at 25 °C. e Xie, D.; Bhakuni, V.; Freire, E. Biochemistry
1991, 30, 10673. f Horvath, L. A.; Sturtevant, J. M.; Prestegard, J. H. Protein Sci. 1994, 3, 103. g Fukada, H.; Sturtevant, J. M.;
Quiocho, F. A. J. Biol. Chem. 1983, 258, 13193. h Reference 50. i Determined from optically monitored thermal melts. j Alexander,
P.; Fahnestock, S.; Lee, T.; Orban, J.; Bryan, P. Biochemistry 1992, 31, 3597. k Griko, Y. V.; Makhatadze, G. I.; Privalov, P. L.;
Hartley, R. W. Protein Sci. 1994, 3, 669. l Martinez, J. C.; El Harrous, M.; Filimonov, V. V.; Mateo, P. L.; Fersht, A. R. Biochemistry
1994, 33, 3919. m Agashe, V. R.; Udgaonkar, J. B. Biochemistry 1995, 34, 3286. n Makhatadze, G. I.; Kim, K.-S.; Woodward, C.;
Privalov, P. L. Protein Sci. 1993, 2, 2028. o Tatunashvili, L. V.; Privalov, P. L. Biofizika (USSR) 1986, 31, 578. p Jackson, S. E.;
Moracci, M.; elMasry, N.; Johnson, C. M.; Fersht, A. R. Biochemistry 1993, 32, 11259. q Pfeil, W.; Bendzko, P. Biochim. Biophys.
Acta 1980, 626, 73. r Potekhin, S.; Pfeil, W. Biophys. Chem. 1989, 34, 55. s Hagihara, Y.; Tan, Y.; Goto, Y. J. Mol. Biol. 1994, 237,
336. t Reference 52. u Liggins, J. R.; Sherman, F.; Mathews, A. J.; Nall, B. T. Biochemistry 1994, 33, 9209. v Thompson, K. S.;
Vinson, C. R.; Shuman, J. D.; Freire, E. Biochemistry 1993, 32, 5491. w Reference 53. x Makhatadze, G. I.; Clore, G. M.; Gronenborn,
A. M.; Privalov, P. L. Biochemistry 1994, 33, 9327. y Hinz, H.-J.; Cossman, M.; Beyreuther, K. FEBS Letts. 1981, 129, 246. z Kuroki,
K.; Taniyama, Y.; Seko, C.; Nakamura, H.; Kikuchi, M.; Ikehara, M. Proc. Natl. Acad. Sci. U.S.A. 1989, 86, 6903. aa Herning, T.;
Yutani, K.; Inaka, K.; Kuroki, R.; Matsushima, M.; Kikuchi, M. Biochemistry 1992, 31, 7077. bb Griko, Y. V.; Freire, E.; Privalov,
G.; Van Dael, H.; Privalov, P. L. J. Mol. Biol. 1995, 252, 447. cc Cooper, A.; Eyles, S. J.; Radford, S. E.; Dobson, C. M. J. Mol. Biol.
1992, 225, 939. dd Schwarz, F. P. Thermochim. Acta 1989, 147, 71. ee Pfeil, W.; Privalov, P. L. Biophys. Chem. 1976, 4, 23. ff Connelly,
P. R.; Ghosaini, L.; Hu, C.-Q.; Kitamura, S.; Tanaka, A.; Sturtevant, J. M. Biochemistry 1991, 30, 1887. gg Johnson, C. M.; Cooper,
A.; Stockley, P. G. Biochemistry 1992, 31, 9717. hh Kelly, L.; Holladay, L. A. Biochemistry 1990, 29, 5062. ii Privalov, P. L.; Griko,
Y. V.; Venyaminov, S. Y.; Kutyshenko, V. P. J. Mol. Biol. 1986, 190, 487. jj Swint, L.; Robertson, A. D. Protein Sci. 1993, 2, 2037.
kk Swint-Kruse, L.; Robertson, A. D. Biochemistry 1995, 34, 4724. ll Tiktopulo, E. I.; Privalov, P. L. FEBS Lett. 1978, 91, 57.
mm Filimonov, V. V.; Pfeil, W.; Tsalkova, T. N.; Privalov, P. L. Biophys. Chem. 1978, 8, 117. nn Privalov, P. L.; Mateo, P. L.;
Khechinashvili, N. N.; Stepanov, V. M.; Revina, L. P. J. Mol. Biol. 1981, 152, 445. oo Novokhatny, V. V.; Kudinov, S. A.; Privalov,
P. L. J. Mol. Biol. 1984, 179, 215. pp Plaza del Pino, I. M.; Pace, C. N.; Freire, E. Biochemistry 1992, 31, 11196. qq Yu, Y.; Makhatadze,
G. I.; Pace, C. N.; Privalov, P. L. Biochemistry 1994, 33, 3312. rr Straume, M.; Freire, E. Anal. Biochem. 1992, 203, 259. ss Privalov,
P. L.; Tiktopulo, E. I.; Khechinashvili, N. N. Int. J. Pept. Protein Res. 1973, 5, 229. tt Steif, C.; Hinz, H.-J.; Cesareni, G. Proteins:
Struct., Funct., Genet. 1995, 23, 83. uu McCrary, B. S.; Edmondson, S. P.; Shriver, J. W. J. Mol. Biol. 1996, 264, 784. vv Viguera,
A. R.; Martinez, J. C.; Filimonov, V. V.; Mateo, P. L.; Serrano, L. Biochemistry 1994, 33, 2142. ww Tanaka, A.; Flanagan, J.;
Sturtevant, J. M. Protein Sci. 1993, 2, 567. xx Zerovnik, E.; Lohner, K.; Jerala, R.; Laggner, P.; Turk, V. Eur. J. Biochem. 1992,
210, 217. yy Tamura, A.; Kimura, K.; Takahara, H.; Akasaka, K. Biochemistry 1991, 30, 11307. zz Pantoliano, M. W.; Whitlow, M.;
Wood, J. F.; Dodd, S. W.; Hardman, K. D.; Rollence, M. L.; Bryan, P. N. Biochemistry 1989, 28, 7205. aaa Renner, M.; Hinz, H.-J.;
Scharf, M.; Engels, J. W. J. Mol. Biol. 1992, 223, 769. bbb Santoro, M. M.; Bolen, D. W. Biochemistry 1992, 31, 4901. ccc Ladbury,
J. E.; Wynn, R.; Hellinga, H. W.; Sturtevant, J. M. Biochemistry 1993, 32, 7526. ddd Bae, S. J.; Chou, W. Y.; Matthews, K.; Sturtevant,
J. M. Proc. Natl. Acad. Sci. U.S.A. 1988, 85, 6731. eee Wintrode, P. L.; Makhatadze, G. I.; Privalov, P. L. Proteins: Struct., Funct.,
Genet. 1994, 18, 246.
The ∆Sm values are dependent upon pH, as described above, and this should introduce an additional source of error when comparing ∆Sm or ∆S(60) values obtained from independent studies. In fact, the range of relative differences in ∆S(60) values, 4-36%, is similar to that for ∆H(60). The mean relative difference is 15 (±9)%, which is again quite similar to the mean and standard deviations seen for ∆H(60). The lack of significant additional uncertainty in ∆S(60) may result from the fact that most sets of independent determinations were made at similar pH values (Table 1).

III. Correlation of Unfolding Thermodynamics with Protein Structure

A. Database of Unfolding Thermodynamics for Proteins of Known Structure

For this review, the minimal criteria for selection of a protein for consideration are (1) ∆Hm, ∆Cp, and Tm values have been published, (2) the unfolding reaction is reversible, and (3) a structural model for the protein, or a closely related protein, has been deposited in the Protein Data Bank (PDB).63,64 Thermodynamic parameters for the unfolding of 49 different proteins are assembled in Table 1. For 11 different proteins, at least two independent determinations either from different laboratories or made using alternative methods are included. The ∆Hm and Tm values generally correspond to values obtained under conditions of maximal stability and ∆Sm values have been calculated by dividing ∆Hm by Tm. This database is a work in progress and the authors invite corrections and additions to Table 1.

To put the thermodynamic parameters on a similar footing for correlation with features of protein structure, ∆Hm and ∆Sm at 60 °C (∆Hu(60) and ∆Su(60)) have been calculated using the experimental values and ∆Cp (Table 2). This temperature was chosen because it has been used in previous studies and because it is close to the mean and median Tm values, 65.5 (±2.0) °C and 62.5 °C, respectively, reported in Table 1. Adjustment of ∆Hm and ∆Sm from experimental Tm values to 60 °C means extrapolating over as much as 44 °C, but most experimental Tm values are much closer to 60 °C: the mean deviation of the experimental Tm values from 60 °C is 5.5°.

When seeking patterns in diverse collections of protein structures, two of the most widely used regular features of protein structure are solvent-accessible surface areas20,25,36,37,65-67 and secondary structure.19,21,68 Tables 3 and 4 summarize these structural features for the proteins whose thermodynamic parameters are reported in Tables 1 and 2. All of the thermodynamic values reported in Table 2 are used in the regression analyses discussed throughout the remainder of the review. In cases where there are multiple thermodynamic entries in Table 2, but a single structural entry in Table 3, each of the experimental entries was regressed against the same structural values. In those cases where multiple structure and thermodynamic entries are given, the thermodynamic entries were regressed against structural entries in the same order in which they are given in Tables 2 and 3.

For the proteins in Table 3, the reported surface area is the sum of the differences (∆A) between the surface of each residue in the native protein and the solvent accessible surface area of the same type of amino acid residue in an Ala-Xaa-Ala extended tripeptide, corrected for the effects of termini. All carbon atoms are classified as apolar, while all non-carbon atoms are classified as polar. Thus the total change in accessible surface area, ∆Atot, is divided into the change in apolar surface area, ∆Aap, and the change in polar surface area, ∆Apol. For the native

Table 2. Thermodynamic Parameters Used for Regression Analysis^a

name of protein               ∆Cp  ∆H(60)  ∆S(60)    ∆H*    ∆S*
α-chymotrypsin               12.8     709    2570   1230   4420
α-chymotrypsinogen           14.5     590    1760   1180   3860
α-lactalbumin                 7.5     260     824    564   1910
α-lactalbumin                 7.6     400    1292    708   2400
acyl carrier protein (apo)    3.3     185     566    320   1050
acyl carrier protein (holo)   6.4     238     705    499   1640
arabinose binding protein    13.2     853    2568   1390   4480
arc repressor                 6.7     337    1029    608   2000
B1 of protein G               2.6     187     509    292    886
B2 of protein G               2.9     182     511    299    932
barnase                       5.8     528    1609    762   2450
barnase                       6.8     589    1800    864   2790
barstar                       6.2     230     669    483   1570
BPTI                          2.0     229     592    310    882
carbonic anhydrase B         16.0     725    2218   1370   4530
CI2                           2.5     246     706    347   1070
cyt b5 (tryp frag)            6.0     272     790    515   1660
cytochrome c (horse)          5.0     393    1180    596   1910
cytochrome c (horse)          5.3     307     922    523   1700
cytochrome c (yeast iso 1)    5.7     386    1180    617   2000
cytochrome c (yeast iso 1)    5.2     312     948    523   1700
cytochrome c (yeast iso 2)    5.2     311     947    521   1700
GCN4                          3.0     230     668    350   1100
HPr                           4.9     183     524    379   1230
IL-1β                         8.0     407    1250    731   2410
lac repressor headpiece       1.3     112     330    164    518
lysozyme (human)              7.2     434    1220    724   2250
lysozyme (human)              6.6     444    1300    712   2250
lysozyme (apo equine)^b       7.6     402    1610    709   2710
lysozyme (holo equine)^b      7.4     361    1450    661   2530
lysozyme (hen)                6.3     427    1280    682   2190
lysozyme (hen)                6.4     409    1210    668   2140
lysozyme (hen)                6.7     462    1410    733   2380
lysozyme T4                  10.1     595    1830   1000   3300
met repressor                 8.9     566    1730    928   3030
myoglobin (horse)             7.6     394    1180    703   2280
myoglobin (whale)            15.6     447    1210   1080   3470
myoglobin (whale)             8.8     399    1120    754   2380
OMTKY3                        2.7     173     500    283    891
OMTKY3                        2.6     175     481    280    857
papain                       13.7     578    1590   1130   3570
parvalbumin                   5.6     332     894    559   1706
pepsin                       18.8    1069    3180   1830   5910
pepsinogen                   24.1     989    2910   1970   6410
plasminogen K4 domain         5.2     305     909    516   1670
RNase T1                      4.9     419    1380    616   2080
RNase T1                      4.9     502    1500    699   2210
RNaseA                        6.6     379    1140    645   2090
RNaseA                        4.8     462    1300    656   2000
RNaseA                        4.8     448    1340    643   2040
ROP                          10.3     467    1350    884   2840
Sac7d                         3.6     120     316    265    837
SH3 spectrin                  3.3     178     523    309    994
Staphylococcus nuclease       9.3     392    1200    767   2540
stefin A                      7.4     245     645    545   1720
stefin B                      6.7     359    1110    630   2080
subtilisin inhibitor          8.5     395    1220    738   2440
subtilisin BPN′              20.1     400    1210   1214   4120
tendamistat                   2.9     212     565    329    985
thioredoxin                   7.0     222     596    504   1600
thioredoxin                   7.4     249     673    548   1740
trp repressor                 6.1     263     701    510   1590
ubiquitin                     3.3     208     561    343   1040

^a For cases in which the proteins are derived from different species, the order here is the same as in Table 1. ∆H(60) and ∆S(60) are the ∆H and ∆S of unfolding at 60 °C. ∆H* is the ∆H of unfolding at 100 °C and ∆S* is the ∆S of unfolding at 112 °C. All units are as in Table 1. ^b Combined data for transitions 1 and 2.
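The constant-∆Cp extrapolation used to bring ∆H and ∆S to a common temperature follows from the bracketed terms of eq 4. The sketch below applies it to the BPTI entry of Table 2, moving from 60 °C to the temperatures at which ∆H* and ∆S* are reported. The function names are hypothetical, and the units (∆Cp in kJ K⁻¹ mol⁻¹, ∆H in kJ mol⁻¹, ∆S in J K⁻¹ mol⁻¹) are an assumption, since Table 2 defers to Table 1 for units; with these units the sketch gives about 309 kJ mol⁻¹ and 882 J K⁻¹ mol⁻¹, close to the tabulated 310 and 882.

```python
import math


def celsius_to_kelvin(t_celsius):
    return t_celsius + 273.15


def extrapolate(dH_ref, dS_ref, dCp, T_ref, T):
    """Shift dH and dS from T_ref to T (both in K), assuming constant dCp.

    Assumed units: dH in kJ/mol, dS in J/(K mol), dCp in kJ/(K mol).
    """
    dH = dH_ref + dCp * (T - T_ref)
    dS = dS_ref + 1000.0 * dCp * math.log(T / T_ref)  # kJ -> J
    return dH, dS


# BPTI entry of Table 2: dCp, dH(60), dS(60).
dCp, dH60, dS60 = 2.0, 229.0, 592.0
T60 = celsius_to_kelvin(60.0)

dH_100, _ = extrapolate(dH60, dS60, dCp, T60, celsius_to_kelvin(100.0))
_, dS_112 = extrapolate(dH60, dS60, dCp, T60, celsius_to_kelvin(112.0))
print(f"dH at 100 C: {dH_100:.0f} kJ/mol")
print(f"dS at 112 C: {dS_112:.0f} J/(K mol)")
```

The same function covers the adjustment from an experimental Tm to 60 °C by passing Tm as T_ref and ∆Hm/Tm as the reference entropy.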
Table 3. Surface Area Changes for the Set of Proteins Used for the Regression Analysis†

name of protein               PDB file Nres  ∆Aap, Å2  ∆Apol, Å2  ∆Atot, Å2
α-chymotrypsin^a              5CHA      237     13808       8648      22456
α-chymotrypsinogen^b          2CGA      245     14012       9127      23139
α-lactalbumin^c               1HML^d    123      7027       4719      11746
α-lactalbumin^e               1ALC^f    122      6773       4814      11586
acyl carrier protein^g        1ACP       77      3346       2755       6101
arabinose binding protein^h   1ABE      305     19374      12160      31534
arc repressor^i               1ARR      106      5503       4633      10136
B1 of protein G^j             1PGB       56      2712       1944       4655
B2 of protein G^k             1PGX       56      2981       2117       5098
barnase^l                     1BNI      108      6190       4325      10515
barnase^l                     1BNJ      109      6137       4281      10417
barstar^m                     1BTA       89      5506       2835       8341
BPTI^n                        5PTI       58      2715       1956       4671
carbonic anhydrase B^o        2CAB      256     15949      10591      26540
CI2^p                         1COA       64      3368       2198       5566
cyt b5 (tryp frag)^q          1CYO       88      4341       3109       7449
cytochrome c (horse)^r        1HRC      104      5716       3788       9504
cytochrome c (yeast iso 1)^s  1YCC      108      5669       4074       9743
cytochrome c (yeast iso 2)^t  1YEA      112      5630       4320       9950
GCN4^u                        2ZTA       62      2939       2364       5303
HPr^v                         2HPR       87      4555       3035       7590
IL-1β^w                       6I1B      153      8817       5165      13982
lac repressor headpiece^x     1LCD       51      2291       1622       3913
lysozyme (human)^y            1LZ1      130      7330       5548      12877
lysozyme (hen)                1LYS      129      7024       5315      12339
lysozyme (equine)^z           2EQL      129      7147       5564      12711
lysozyme T4^aa                2LZM      164      9709       6709      16418
met repressor^bb              1CMB      208     12030       8503      20533
myoglobin (horse)^cc          1YMB      153      8884       5523      14407
myoglobin (whale)^dd          4MBN      153      8873       5927      14800
myoglobin (whale)             1MBO      153      9143       5679      14822
OMTKY3^ee                     2OVO       56      2162       1874       4036
papain^ff                     9PAP      212     13071       8692      21762
parvalbumin^gg                5CPV      108      5750       4006       9756
pepsin^hh                     5PEP      326     19584      11717      31301
pepsinogen^ii                 3PSG      365     22811      14298      37108
plasminogen K4 domain^jj      1PMK       78      3801       3408       7209
RNase T1^kk                   9RNT      104      5049       3828       8878
RNase T1^ll                   8RNT      104      5126       3812       8938
RNaseA^mm                     3RN3      124      5802       5468      11271
ROP^nn                        1RPR      126      6195       6737      12932
Sac7d^oo                      1SAP       66      3357       2509       5866
SH3 spectrin^pp               1SHG       57      3284       1994       5278
Staphylococcus nuclease^qq    1STN      136      8049       5173      13222
stefin A^rr                   1CYV       98      5120       3635       8755
stefin B^ss                   1STF^tt    95      5217       3508       8725
subtilisin inhibitor^uu       3SIC^vv   107      4975       3568       8543
subtilisin BPN′^ww            2ST1      275     15672      10308      25980
tendamistat^xx                3AIT       74      3338       2784       6122
thioredoxin^yy                2TRX      108      6317       3464       9781
trp repressor^zz              2WRP      105      6146       4122      10268
trp repressor^zz              3WRP      101      5956       3953       9909
ubiquitin^aaa                 1UBQ       76      4112       2606       6717
† The PDB file identifiers are taken from the Brookhaven Protein Data Bank.58,59 Number of residues, Nres, and ∆A values were determined as described in the text. a Blevins, R. A.; Tulinsky, A. J. Biol. Chem. 1985, 20, 4264. b Wang, D.; Bode, W.;
Huber, R. J. Mol. Biol. 1985, 185, 595. c Ren, J.; Acharya, K. R.; Stuart, D. I. J. Biol. Chem. 1993, 268, 19292. d X-ray structure
is for the human protein. Sequence of the human protein differs from the bovine protein at 31 out of 123 residues. e Acharya, K.
R.; Ren, J.; Stuart, D. I.; C., P. D.; Fenna, R. E. J. Mol. Biol. 1991, 221, 571. f X-ray structure is for the baboon protein. Sequence
of the baboon protein differs from the bovine protein at 37 out of 123 residues. g Kim, Y.; Prestegard, J. H. Proteins: Struct.,
Func., Genet. 1990, 8, 377. h Vyas, N. K.; Quiocho, F. A. Nature 1984, 310, 381. i Bonvin, A. M. J. J.; Vis, H.; Burgering, M. J. M.;
Breg, J. N.; Boelens, R.; Kaptein, R. J. Mol. Biol. 1994, 236, 328. j Gallagher, T.; Alexander, P.; Bryan, P.; Gilliland, G. L.
Biochemistry 1994, 33, 4721. k Achari, A.; Hale, S. P.; Howard, A. J.; Clore, G. M.; Gronenborn, A. M.; Hardman, K. D.; Whitlow,
M. Biochemistry 1992, 31, 10449. l Buckle, A. M.; Henrick, K.; Fersht, A. R. J. Mol. Biol. 1993, 234, 847. m Lubienski, M. J.;
Bycroft, M.; Freund, S. M. V.; Fersht, A. R. Biochemistry 1994, 33, 8866. n Wlodawer, A.; Walter, J.; Huber, R.; Sjolin, L. J. Mol.
Biol. 1984, 180, 301. Wlodawer, A.; Nachman, J.; Gilliland, G. L.; Gallagher, W.; Woodward, C. J. Mol. Biol. 1987, 198, 469.
o Kannan, K. K.; Ramanadham, M.; Jones, T. A. Ann. N. Y. Acad. Sci. 1984, 429, 49. p Jackson, S. E.; Moracci, M.; elMasry, N.;
Johnson, C. M.; Fersht, A. R. Biochemistry 1993, 32, 11259. q Mathews, F. S.; Argos, P.; Levine, M. Cold Spring Harbor Symp.
Quant. Biol. 1972, 36, 387. r Bushnell, G. W.; Louie, G. V.; Brayer, G. D. J. Mol. Biol. 1990, 214, 585. s Louie, G. V.; Brayer, G.
D. J. Mol. Biol. 1990, 214, 527. t Murphy, M. E. P.; Nall, B. T.; Brayer, G. D. J. Mol. Biol. 1992, 227, 160. u O’Shea, E. K.; Klemm,
J. D.; Kim, P. S.; Alber, T. Science 1991, 254, 539. v Liao, D.-I.; Herzberg, O. Structure 1994, 2, 1203. w Clore, G. M.; Wingfield,
P. T.; Gronenborn, A. M. Biochemistry 1991, 30, 2315. x Chuprina, V. P.; Rullman, J. A. C.; Lamerichs, R. M. J. N.; Van Boom,
J. H.; Boelens, R.; Kaptein, R. J. Mol. Biol. 1993, 234, 446. y Artymiuk, P. J.; Blake, C. C. F. J. Mol. Biol. 1981, 152, 737. z Tsuge,
H.; Ago, H.; Noma, M.; Nitta, K.; Sugai, S.; Miyano, M. J. Biochem. 1992, 141, 111. aa Weaver, L. H.; Matthews, B. W. J. Mol.
Biol. 1987, 193, 189. bb Rafferty, J. B.; Somers, W. S.; Saint-Girons, I.; Phillips, S. E. V. Nature 1989, 341, 705. cc Evans, S. V.;
Brayer, G. D. J. Mol. Biol. 1990, 213, 885. dd Takano, T. In Methods and Applications in Crystallographic Computing; Oxford
University Press: Oxford, 1984. ee Bode, W.; Epp, O.; Huber, R.; Laskowski, M., Jr.; Ardelt, W. Eur. J. Biochem. 1985, 147, 387.
X-ray structure is for silver pheasant which differs from the turkey sequence at one residue. ff Kamphuis, I. G.; Kalk, K. H.;
Swarte, M. B. A.; Drenth, J. J. Mol. Biol. 1984, 179, 233. gg Swain, A. L.; Kretsinger, R. H.; Amma, E. L. J. Biol. Chem. 1989, 264,
16620. hh Cooper, J. B.; Khan, G.; Taylor, G.; Tickle, I. J.; Blundell, T. L. J. Mol. Biol. 1990, 214, 199. ii Hartsuck, J. A.; Koelsch,
G.; Remington, S. J. Proteins 1992, in press. jj Padmanabhan, K.; Wu, T.-P.; Ravichandran, K. G.; Tulinsky, A. Protein Sci. 1994,
3, 898. kk Martinez-Oyanedel, J.; Choe, H.-W.; Heinemann, U.; Saenger, W. J. Mol. Biol. 1991, 222, 335. ll Ding, J.; Choe, H.-W.;
Granzin, J.; Saenger, W. Acta Crystallogr., Sect. B 1992, 48, 185. mm Howlin, B.; Moss, D. S.; Harris, G. W. Acta Crystallogr.,
Sect. A 1989, 45, 851. nn Eberle, W.; Pastore, A.; Sander, C.; Roesch, P. J. Biomol. NMR 1991, 1, 71. oo Edmondson, S. P.; Qiu, L.;
Shriver, J. W. Biochemistry 1995, 34, 13289. pp Musacchio, A.; Noble, M.; Pauptit, R.; Wierenga, R.; Saraste, M. Nature 1992,
359, 851. qq Hynes, T. R.; Fox, R. O. Proteins: Struct., Funct., Genet. 1991, 10, 92. rr Tate, S.; Ushioda, T.; Utsunomiya-Tate, N.;
Shibuya, Y.; Ohyama, Y.; Nakano, Y.; Kaji, H.; Inagaki, F.; Samejima, T.; Kainosho, M. Biochemistry 1995, 34, 14637. ss Stubbs,
M. T.; Laber, B.; Bode, W.; Huber, R.; Jerala, R.; Lenarcic, B.; Turk, V. EMBO J. 1990, 9, 1939. tt Taken from the complex with
papain. uu Takeuchi, Y.; Noguchi, S.; Satow, Y.; Kojima, S.; Kumagai, I.; Miura, K.-I.; Nakamura, K. T.; Mitsui, Y. Protein Eng.
1991, 4, 501. vv Taken from the complex with subtilisin. ww Bott, R.; Ultsch, M.; Kossiakoff, A.; Graycar, T.; Katz, B.; Power, S. J.
Biol. Chem. 1988, 263, 7895. xx Billeter, M.; Schaumann, T.; Braun, W.; Wüthrich, K. Biopolymers 1990, 29, 695. yy Katti, S. K.;
LeMaster, D. M.; Eklund, H. J. Mol. Biol. 1990, 212, 167. zz Lawson, C. L.; Zhang, R.-G.; Schevitz, R. W.; Otwinowski, Z.; Joachimiak,
A.; Sigler, P. B. Proteins: Struct., Funct., Genet. 1988, 3, 18. aaa Vijay-Kumar, S.; Bugg, C. E.; Cook, W. J. J. Mol. Biol. 1987, 194,
531.
structure, the algorithm of Lee and Richards,65 as implemented in the program ACCESS (Scott R. Presnell, University of California at San Francisco), has been used to determine the solvent-accessible surface area using a probe radius of 1.4 Å and a slice width of 0.25 Å. The calculations use whole-atom atomic radii, i.e., hydrogen atoms are not considered explicitly but instead are included by using slightly increased atomic radii for atoms covalently bonded to hydrogens.65 Consequently, hydrogens from NMR-derived structures are ignored in the calculation.

The appropriate solvent-accessible surface area for the denatured protein is a subject of continuing discussion.69 The use of a single standard model for
calculated values are compared to each of the experimental entries. In those cases where multiple structure and thermodynamic entries are given, the comparison is made between structures and thermodynamics in the same order in which they are given in Tables 2 and 3. The percentage error is calculated as 100 × (calculated - experimental)/experimental. The average error in calculating ∆Cp is 4 ± 22%. The average is expected to be small as overpredictions and underpredictions cancel each other, but the standard deviation indicates that the error in the prediction is larger than the estimated experimental error.

The ∆Hu values at 60 °C are calculated using the parameters for both ∆Aap and ∆Apol given in Table 5 and are summarized in Table 7. The average error is again small, -2.8 ± 22%, but the standard deviation is large. Table 7 also lists the error in calculating ∆H* (at TH* = 100 °C) which has an average error of 2 ± 16%. Thus, as evident in the regression coefficients, ∆H* is better predicted than ∆Hu at 60 °C.

Finally, the ∆Su values at 60 °C are calculated using the parameters for Nres, ∆Aap, and ∆Apol given in Table 5 and are summarized in Table 8. The average error is 5 ± 26%. The calculated values of ∆S* (at TS* = 112 °C) are also given in the table and have an average error of 2 ± 17%. Again, the values of ∆S* are better predicted than the values of ∆Su at 60 °C.

One possible explanation for error in the predictions is deviations from the mean structural characteristics of the proteins. For example, greater numbers of disulfide bonds are expected to lead to decreases in ∆Su, so that one might expect ∆Su to be overpredicted for proteins with a greater than average number of disulfides. In fact, no such correlation is seen between the number of disulfides and either ∆Su at 60 °C or ∆S* (Figure 7). The correlation coefficients, R2, are less than 0.05 for both cases. In fact, no correlation of the error in either ∆Su at 60 °C or ∆S* is observed with any of the structural features considered here, including the fraction of the buried surface area which is polar or apolar, the percentage of the residues in any secondary structure type (i.e., α-helix, β-sheet, β-turn, or the sum of all three), and the number of residues. There is also no correlation with the experimental parameters such as pH or Tm.

The same lack of correlation of the error in prediction with any structural or experimental features is observed for ∆Hu at 60 °C, ∆H*, and ∆Cp. It is somewhat surprising that no correlation of error in predicting ∆Hu is found with the percentage of residues in any secondary structural type as such a correlation has previously been noted for a smaller data set.90 In fact, the only significant correlation we have observed is between the error in ∆Hu and the error in ∆Su. This is illustrated in Figure 8a. The line is the linear least-squares fit which has a slope of 1 and an intercept of 7 with R2 = 0.756. This correlation is even more evident between the error in ∆H* and the error in ∆S*, as seen in Figure 8b in which the slope is 1, the intercept is 0.4 and R2 = 0.926.

IV. Summary

What conclusions can be drawn from the regression analyses? The first conclusion is that, from a purely empirical standpoint, the primary determinant of

Table 7. Comparison of Calculated and Experimental Values of ∆Hu a

name of protein               ∆Hu (60 °C)  error, %     ∆H*  error, %
α-chymotrypsin                        640      -9.7    1252       2.2
α-chymotrypsinogen                    680      15.3    1294       9.9
α-lactalbumin                         353      35.9     650      15.3
α-lactalbumin                         363      -9.1     645      -9.0
acyl carrier protein (apo)            212      14.9     407      27.2
acyl carrier protein (holo)           212     -10.9     407     -18.5
arabinose binding protein             901       5.6    1611      16.1
arc repressor                         358       6.1     560      -7.9
B1 of protein G                       147     -21.2     296       1.4
B2 of protein G                       160     -12.1     296      -1.1
barnase                               326     -38.3     571     -25.1
barnase                               322     -45.3     576     -33.4
barstar                               202     -12.1     470      -2.6
BPTI                                  148     -35.4     306      -1.2
carbonic anhydrase B                  792       9.2    1352      -1.4
CI2                                   164     -33.3     338      -2.6
cyt b5 (tryp frag)                    235     -13.6     465      -9.7
cytochrome c (horse)                  283     -28.0     549      -7.8
cytochrome c (horse)                  283      -7.8     549       5.0
cytochrome c (yeast iso 1)            308     -20.3     571      -7.6
cytochrome c (yeast iso 1)            308      -1.3     571       9.1
cytochrome c (yeast iso 2)            330       6.1     592      13.5
GCN4                                  181     -21.0     328      -6.3
HPr                                   227      24.2     460      21.2
IL-1β                                 378      -7.1     808      10.6
lac repressor headpiece               122       9.8     269      64.1
lysozyme (human)                      423      -2.6     687      -5.1
lysozyme (human)                      423      -4.9     687      -3.5
lysozyme (apo equine)                 425       5.9     682      -3.9
lysozyme (holo equine)                425      17.7     682       3.1
lysozyme (hen)                        405      -5.1     682       0.0
lysozyme (hen)                        405      -1.0     682       2.1
lysozyme (hen)                        405     -12.5     682      -7.1
lysozyme T4                           505     -15.3     866     -13.7
met repressor                         642      13.4    1099      18.4
myoglobin (horse)                     408       3.7     808      15.0
myoglobin (whale)                     443      -0.7     808     -25.1
myoglobin (whale)                     420       5.3     808       7.2
OMTKY3                                145     -17.0     296       5.7
papain                                650      12.5    1120      -1.1
parvalbumin                           302      -9.2     571       2.1
pepsin                                861     -19.5    1722      -6.0
pepsinogen                           1059       7.0    1928      -1.9
plasminogen K4 domain                 265     -13.0     412     -20.1
RNase T1                              292     -30.4     549     -10.8
RNase T1                              290     -42.3     549     -21.4
RNaseA                                427      12.8     655       1.6
RNaseA                                427      -7.5     655      -0.2
RNaseA                                427      -4.6     655       1.9
ROP                                   534      14.4     666     -24.7
Sac7d                                 191      58.9     349      31.3
SH3 spectrin                          147     -17.2     301      -2.6
Staphylococcus nuclease               385      -1.9     718      -6.3
stefin A                              274      12.0     518      -5.0
stefin B                              263     -26.7     502     -20.3
subtilisin inhibitor                  270     -31.8     565     -23.4
subtilisin BPN′                       769      92.5    1453      19.7
tendamistat                           215       1.4     391      18.9
thioredoxin                           250      12.8     571      13.3
thioredoxin                           250       0.4     571       4.2
trp repressor                         309      17.4     555       8.8
ubiquitin                             193      -7.1     402      17.1

a ∆Hu (kJ mol-1) at 60 °C was calculated as a function of ∆Aap and ∆Apol, and ∆H* (i.e., ∆H at 100 °C) was calculated as a function of Nres, using the regression coefficients in Table 5. Errors are calculated in comparison to the experimental values in Table 2.
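The error statistics used in this comparison (the percentage error defined as 100 × (calculated - experimental)/experimental, its mean and standard deviation, and the linear least-squares fit of one set of errors against another) can be sketched as follows. This is a minimal illustration and not the authors' code: the function names and the sample numbers are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the text does not state which estimator was used.

```python
import statistics


def percent_error(calculated, experimental):
    """Percentage error as defined in the text: 100 x (calc - exp) / exp."""
    return 100.0 * (calculated - experimental) / experimental


def mean_and_sd(errors):
    """Mean and sample (n - 1) standard deviation of a list of errors.

    Assumption: the choice of the n - 1 estimator is ours, not the source's.
    """
    return statistics.mean(errors), statistics.stdev(errors)


def least_squares_line(x, y):
    """Ordinary least-squares fit of y on x; returns (slope, intercept, R^2)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


if __name__ == "__main__":
    # Hypothetical (calculated, experimental) pairs; NOT values from Tables 2-8.
    dh_pairs = [(326.0, 500.0), (212.0, 190.0), (423.0, 440.0), (640.0, 700.0)]
    ds_pairs = [(0.95, 1.40), (0.66, 0.60), (1.20, 1.26), (1.85, 2.00)]

    dh_err = [percent_error(c, e) for c, e in dh_pairs]
    ds_err = [percent_error(c, e) for c, e in ds_pairs]

    print("mean, SD of enthalpy errors (%):", mean_and_sd(dh_err))
    print("mean, SD of entropy errors (%):", mean_and_sd(ds_err))
    # Fit of entropy error against enthalpy error, as in a plot like Figure 8.
    print("slope, intercept, R^2:", least_squares_line(dh_err, ds_err))
```

A mean error near zero together with a large standard deviation, as reported in the text, indicates that over- and underpredictions cancel in the average while individual predictions remain scattered.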
captures most of the important features which determine protein energetics. Further evidence in favor of analyses based on surface areas is the recent work of Hilser and Freire:91 calculations of protein energetics based on surface areas were successfully used to predict amide hydrogen exchange behavior in a number of proteins.

It is also interesting to note that the thermodynamics at the convergence temperatures observed previously in a much smaller set of proteins are better predicted than the thermodynamics at 60 °C, even though the current set of proteins do not convincingly show convergence behavior. This is perhaps not surprising for the entropy, as the value of TS* seems to be fairly universal.33 It is not clear why it should also occur for TH*.

Calculated values of ∆Cp range from 57% to 182% of the experimental values (Table 6). The range for ∆Hu at 60 °C is 55% to 159% of the experimental values while that for ∆H* is 67% to 164% (Table 7). Similar distributions are observed in the differences between calculated and experimental values of ∆Sm (Table 8). The extent to which calculated values of ∆Hm and ∆Sm are under- or overestimated relative to experimental values is highly correlated, which probably reflects the fact that experimental ∆Hu values are used to calculate ∆Su values: experimental errors in ∆Hu are thus manifested in the relative errors in ∆Su. Interestingly, distributions of differences between calculated and experimental values are broader, ±16-27% at one standard deviation, than might be anticipated on the basis of experimental error alone, which is about 10% on average (7% for ∆Cp, 12% for ∆H(60), and 15% for ∆S(60) as noted above in section II.C).

A likely explanation for the broad distribution in the differences between the calculated and observed parameters is inaccuracies in the model used in the regression analysis, which is based primarily on surface area differences. Moreover, the calculations rely on convergence temperatures and the protein data show considerable scatter in this regard (Figure 6). Overall, empirical correlations of energetics with “regular” features of protein structure give rise to errors that appear to exceed experimental error. The model is thus either inappropriate or incomplete. Because the model based on surface areas appears to capture much, but not all, of the relationship between protein structure and the energetics of protein stability, the simplest explanation is that the model is incomplete. Inclusion of information about secondary structure and disulfide bonds provides no insight into the origin of discrepancies between calculated and observed energetics.

What is missed when the energetics of protein stability are decomposed in terms of changes in solvent-exposed surface areas? Some possible answers to this question are (1) nonadditivity of energetic contributions from the various groups that make up polar and nonpolar surfaces, (2) long-range interactions in proteins, and (3) heterogeneity in the extent to which the denatured states for different proteins are exposed to solvent. The possibility of nonadditivity in protein energetics is the subject of considerable discussion.7,92,93 The principle of additivity is that the observed thermodynamics of protein stability result from a simple sum of independent contributions from individual interactions. Deconvolution of protein stability in terms of polar and nonpolar surface areas is predicated on the assumption that the contributions from such surfaces are linear functions of surface area. Nonadditivity may well contribute to the scatter in the calculated vs observed energetics, but no straightforward approach is available yet for evaluating its role.

Electrostatic interactions are the principal long-range interactions in proteins. With a database consisting of many different proteins, differences in the extent to which electrostatic interactions contribute to stability in different proteins are going to contribute to the error in parameters derived from regression analysis of the database.

No direct experimental data are available for assessing the amount of new surface area that is exposed when a protein unfolds. Analyses of protein stability with respect to solvent-exposed surface areas typically rely on the assumption that solvent exposure in the denatured state is modeled accurately by an extended polypeptide chain or by summing calculated surface areas for tripeptides.26,94 The use of different algorithms for these calculations leads to significant differences in surface areas, but use of a single algorithm in deconvoluting energetics in terms of structure is only expected to lead to systematic deviations with respect to results obtained using other algorithms.69 If, however, proteins differ in the extent to which their denatured states are exposed to solvent, then considerable error will be introduced into the analysis regardless of the algorithm used to calculate surface area.

A number of investigators have argued that the denatured state is not accurately modeled by an extended or random-coil polypeptide chain.41,42,95-99 Moreover, the extent of solvent exposure is proposed to be sensitive to solution conditions, so no one value for solvent-exposed surface area in the denatured state is applicable to any protein. Is the proposed heterogeneity in the extent of unfolding a reasonable explanation for the disagreement between calculated and experimental values for the energetics of protein stability?

One intriguing observation in this regard is the underestimated ∆Hu and ∆Su values for barnase and RNase T1 (Table 7); these two proteins fall at the extreme low end for both parameters. Confidence in the experimental determinations is high because at least two independent determinations have been made for each protein. Interestingly, barnase and RNase T1 have very similar three-dimensional structures in spite of the fact that their amino acid sequences are only 14% identical. Finally, the extent of unfolding in the denatured state of barnase appears to be high relative to other proteins.100

If the extent of solvent exposure for the denatured states of barnase and RNase T1 is indeed greater than the average for all proteins in the database then one would expect that, for barnase and RNase T1, the thermodynamic parameters calculated from the mean behavior for all proteins would be lower than the true values, as is observed (Table 7). However, this behavior is not observed for ∆Cp (Table 6). In addition, the extent of unfolding for RNase T1 appears to be close to that for other proteins.95,100 Nevertheless, at least some of the experimental data tabulated here and presented elsewhere are consistent with variability in the extent of unfolding for different proteins.

The possibility that the denatured state of barnase is more unfolded than the average protein suggests that the average extent of unfolding for all proteins is overestimated with the current algorithms. A low estimate for the average extent of unfolding can be obtained by using barnase as a reference for denatured protein that is completely exposed to solvent. In conjunction with the observation that calculated ∆Hu and ∆Su values are ≤75% of the predicted values (Tables 7 and 8), this suggests that the average extent of unfolding is ≤75% of the values calculated with the model-based algorithms. This value is similar to those suggested by Lee82 and Brandts101 and is consistent with the conclusions of a recent computational study,69 where alternative models for the denatured state yielded surface areas that averaged about 80% of the values obtained with tripeptides.

An important conclusion from this analysis is that additional refinement of the calculations and a molecular interpretation of the regression coefficients are unlikely to come from the protein data themselves. The inability to obtain unique coefficients which relate structural features to unfolding energetics may reflect variability in the quality of the data or variability in the validity of the assumptions across the data set; the latter appears to be likely. Rather than simply compile additional protein unfolding thermodynamics for a wide variety of proteins, it may be more promising to pursue systematic structural and calorimetric studies of single-site mutations or structurally homologous proteins. The idea here is that the differences between the proteins in such studies would more closely conform to those of a homologous series.

More data concerning the denatured state are essential for progress in understanding the energetics of protein stability. In this regard, calorimetric experiments appear to offer some promise.39,69 Additionally, data on protein-protein interactions,102 in which the structures of both the initial and final states are well determined, will probably provide less ambiguous regression values. Finally, model compound studies will continue to be the principal means by which precise thermodynamic values for specific interactions can be determined. These studies provide a rich framework to guide design and interpretation of the protein studies.

V. Acknowledgments

The authors thank the reviewers and Professor Ken A. Dill for critical reading and helpful comments. We also thank Dr. Wesley Stites for providing a copy of his contribution to this volume prior to publication. The authors are grateful to the National Institutes of Health, National Science Foundation, American Chemical Society-Petroleum Research Fund, and the University of Iowa for support of this work.

VI. References

(1) Anfinsen, C. B. Science 1973, 181, 223.
(2) The Biology of Nonspecific DNA-Protein Interactions; Revzin, A., Ed.; CRC Press: Boca Raton, 1990.
(3) Lattman, E. E.; Rose, G. D. Proc. Natl. Acad. Sci. U.S.A. 1993, 90, 439.
(4) Yue, K.; Dill, K. A. Proc. Natl. Acad. Sci. U.S.A. 1995, 92, 146.
(5) Kauzmann, W. Adv. Protein Chem. 1959, 14, 1.
(6) Makhatadze, G. I.; Privalov, P. L. Adv. Protein Chem. 1995, 47, 307.
(7) Lazaridis, T.; Archontis, G.; Karplus, M. Adv. Protein Chem. 1995, 47, 231.
(8) Honig, B.; Yang, A.-S. Adv. Protein Chem. 1995, 46, 27.
(9) Rose, G. D.; Wolfenden, R. Annu. Rev. Biophys. Biomol. Struct. 1993, 22, 381.
(10) Creighton, T. E. Curr. Opin. Struct. Biol. 1991, 1, 5.
(11) Dill, K. A. Biochemistry 1990, 29, 7133.
(12) Habermann, S. M.; Murphy, K. P. Protein Sci. 1996, 5, 1229.
(13) Baldwin, R. L. Proc. Natl. Acad. Sci. U.S.A. 1986, 83, 8069.
(14) Dill, K. A. Science 1990, 250, 297.
(15) Herzfeld, J. Science 1991, 253, 88.
(16) Privalov, P. L.; Gill, S. J.; Murphy, K. P. Science 1990, 250, 297.
(17) Chothia, C. Nature 1975, 254, 304.
(18) Myers, J. K.; Pace, C. N. Biophys. J. 1996, 71, 2033.
(19) Levitt, M.; Chothia, C. Nature 1976, 261, 552.
(20) Chothia, C. J. Mol. Biol. 1976, 105, 1.
(21) Richardson, J. Adv. Protein Chem. 1981, 34, 167.
(22) Chothia, C. Annu. Rev. Biochem. 1984, 53, 537.
(23) Thornton, J. M. In Protein Folding; Creighton, T. E., Ed.; W. H. Freeman and Co.: New York, 1992; pp 59.
(24) Privalov, P. L. Adv. Protein Chem. 1979, 33, 167.
(25) Spolar, R. S.; Ha, J.-H.; Record, M. T., Jr. Proc. Natl. Acad. Sci. U.S.A. 1989, 86, 8382.
(26) Livingstone, J. R.; Spolar, R. S.; Record, M. T., Jr. Biochemistry 1991, 30, 4237.
(27) Spolar, R. S.; Livingstone, J. R.; Record, M. T., Jr. Biochemistry 1992, 31, 3947.
(28) Nozaki, Y.; Tanford, C. J. Biol. Chem. 1971, 246, 2211.
(29) Privalov, P. L.; Makhatadze, G. I. J. Mol. Biol. 1992, 224, 715.
(30) Makhatadze, G. I.; Privalov, P. L. J. Mol. Biol. 1993, 232, 639.
(31) Privalov, P. L.; Makhatadze, G. I. J. Mol. Biol. 1993, 232, 660.
(32) Murphy, K. P.; Gill, S. J. J. Chem. Thermodyn. 1989, 21, 903.
(33) Murphy, K. P.; Privalov, P. L.; Gill, S. J. Science 1990, 247, 559.
(34) Murphy, K. P.; Gill, S. J. J. Mol. Biol. 1991, 222, 699.
(35) Murphy, K. P.; Freire, E. Adv. Protein Chem. 1992, 43, 313.
(36) Ooi, T.; Oobatake, M.; Némethy, G.; Scheraga, H. A. Proc. Natl. Acad. Sci. U.S.A. 1987, 84, 3086.
(37) Eisenberg, D.; McLachlan, A. D. Nature 1986, 319, 199.
(38) Privalov, P. L.; Gill, S. J. Adv. Protein Chem. 1988, 39, 191.
(39) Privalov, P. L.; Tiktopulo, E. I.; Yenyaminov, S. Y.; Griko, Y. V.; Makhatadze, G. I.; Khechinashvili, N. N. J. Mol. Biol. 1989, 205, 727.
(40) Robertson, A. D.; Baldwin, R. L. Biochemistry 1991, 30, 9907.
(41) Dill, K. A.; Shortle, D. Annu. Rev. Biochem. 1991, 60, 795.
(42) Shortle, D. FASEB J. 1996, 10, 27.
(43) Edsall, J. T. J. Am. Chem. Soc. 1935, 57, 1506.
(44) Madan, B.; Sharp, K. J. Phys. Chem. 1996, 100, 7713.
(45) Gill, S. J.; Dec, S. F.; Olofsson, G.; Wadsö, I. J. Phys. Chem. 1985, 89, 3758.
(46) Privalov, P. L.; Potekhin, S. A. Methods Enzymol. 1986, 131, 4.
(47) Freire, E. In Protein Stability and Folding; Shirley, B. A., Ed.; Humana Press: Totowa, NJ, 1995; Vol. 40, pp 191.
(48) Christensen, J. J.; Hansen, L. D.; Izatt, R. M. Handbook of Proton Ionization Heats and Related Thermodynamic Quantities; John Wiley and Sons: New York, 1976.
(49) Freire, E.; Biltonen, R. L. Biopolymers 1978, 17, 463.
(50) Bowie, J. U.; Sauer, R. T. Biochemistry 1989, 28, 7139.
(51) Swint, L.; Robertson, A. D. Protein Sci. 1993, 2, 2037.
(52) Cohen, D. S.; Pielak, G. J. Protein Sci. 1994, 3, 1253.
(53) Scholtz, J. M. Protein Sci. 1995, 4, 35.
(54) Pace, C. N.; Laurents, D. V. Biochemistry 1989, 28, 2520.
(55) Chen, B.-l.; Schellman, J. A. Biochemistry 1989, 28, 685.
(56) Santoro, M. M.; Bolen, D. W. Biochemistry 1988, 27, 8063.
(57) Kim, P. S.; Baldwin, R. L. Annu. Rev. Biochem. 1982, 51, 459.
(58) Pace, C. N.; Vajdos, F.; Fee, L.; Grimsley, G.; Gray, T. Protein Sci. 1995, 4, 2411.
(59) Becktel, W. J.; Schellman, J. A. Biopolymers 1987, 26, 1859.
(60) Carra, J. H.; Anderson, E. A.; Privalov, P. L. Protein Sci. 1994, 3, 944.
(61) DeKoster, G. T.; Robertson, A. D. Biochemistry 1997, 36, 2323.
(62) Bevington, P. R.; Robinson, D. K. Data Reduction and Error Analysis for the Physical Sciences, 2nd ed.; McGraw-Hill: New York, 1992; pp 328.
(63) Bernstein, F. C.; Koetzle, T. F.; Williams, G. J. B.; Meyer, E. F., Jr.; Brice, M. D.; Rodgers, J. R.; Kennard, O.; Shimanouchi, T.; Tasumi, M. J. Mol. Biol. 1977, 112, 535.
(64) Abola, E. E.; Bernstein, F. C.; Bryant, S. H.; Koetzle, T. F.; Weng, J. In Crystallographic Databases - Information Content, Software Systems, Scientific Applications; Allen, F. H., Bergerhoff, G., Sievers, R., Eds.; Data Commission of the International Union of Crystallography: Bonn/Cambridge/Chester, 1987; p 107.
(65) Lee, B.; Richards, F. M. J. Mol. Biol. 1971, 55, 379.
(66) Russell, R. B.; Barton, G. J. J. Mol. Biol. 1994, 244, 332.
(67) Flores, T. P.; Orengo, C. A.; Moss, D. S.; Thornton, J. M. Protein Sci. 1993, 2, 1811.
(68) Chou, P. Y.; Fasman, G. D. Annu. Rev. Biochem. 1978, 47, 251.
(69) Creamer, T. P.; Srinivasan, R.; Rose, G. D. Biochemistry 1995, 34, 16245.
(70) Colloc’h, N.; Etchebest, C.; Thoreau, E.; Henrissat, B.; Mornon, J.-P. Protein Eng. 1993, 6, 377.
(71) Frishman, D.; Argos, P. Proteins: Struct., Funct., Genet. 1995, 23, 566.
(72) Kabsch, W.; Sander, C. Biopolymers 1983, 22, 2577.
(73) Murphy, K. P.; Bhakuni, V.; Xie, D.; Freire, E. J. Mol. Biol. 1992, 227, 293.
(74) Gill, S. J.; Wadsö, I. Proc. Natl. Acad. Sci. U.S.A. 1976, 73, 2955.
(75) Makhatadze, G. I.; Privalov, P. L. J. Mol. Biol. 1990, 213, 375.
(76) Gómez, J.; Hilser, V. J.; Xie, D.; Freire, E. Proteins 1995, 22, 404.
(77) Myers, J. K.; Pace, C. N.; Scholtz, J. M. Protein Sci. 1995, 4, 2138.
(78) Graziano, G.; Barone, G. J. Am. Chem. Soc. 1996, 118, 1831.
(79) Murphy, K. P.; Gill, S. J. Thermochim. Acta 1990, 172, 11.
(80) Privalov, P. L.; Khechinashvili, N. N. J. Mol. Biol. 1974, 86, 665.
(81) Doig, A. J.; Williams, D. H. Biochemistry 1992, 31, 9371.
(82) Lee, B. K. Proc. Natl. Acad. Sci. U.S.A. 1991, 88, 5154.
(83) Baldwin, R. L.; Muller, N. Proc. Natl. Acad. Sci. U.S.A. 1992, 89, 7110.
(84) Yang, A.-S.; Sharp, K. A.; Honig, B. J. Mol. Biol. 1992, 227, 889.
(85) Murphy, K. P. Biophys. Chem. 1994, 51, 311.
(86) Barone, G.; Del Vecchio, P.; Giancola, C.; Graziano, G. Int. J. Biol. Macromol. 1995, 17, 251.
(87) Hilser, V. J.; Gómez, J.; Freire, E. Proteins 1996, 26, 123.
(88) Xie, D.; Freire, E. Proteins 1994, 19, 291.
(89) D’Aquino, J. A.; Gómez, J.; Hilser, V. J.; Lee, K. H.; Amzel, L. M.; Freire, E. Proteins 1996, 25, 143.
(90) Makhatadze, G. I.; Clore, G. M.; Gronenborn, A. M.; Privalov, P. L. Biochemistry 1994, 33, 9327.
(91) Hilser, V. J.; Freire, E. J. Mol. Biol. 1996, 262, 756.
(92) Dill, K. A. J. Biol. Chem. 1997, 272, 701.
(93) Mark, A. E.; van Gunsteren, W. F. J. Mol. Biol. 1994, 240, 167.
(94) Shrake, A.; Rupley, J. A. J. Mol. Biol. 1973, 79, 351.
(95) Pace, C. N.; Laurents, D. V.; Thomson, J. A. Biochemistry 1990, 29, 2564.
(96) Evans, P. A.; Topping, K. D.; Woolfson, D. N.; Dobson, C. M. Proteins: Struct., Funct., Genet. 1991, 9, 248.
(97) Sosnick, T. R.; Trewhella, J. Biochemistry 1992, 31, 8329.
(98) Neri, D.; Billeter, M.; Wider, G.; Wüthrich, K. Science 1992, 257, 1559.
(99) Fink, A. L.; Calciano, L. J.; Goto, Y.; Kurotsu, T.; Palleros, D. R. Biochemistry 1994, 33, 12504.
(100) Pace, C. N.; Laurents, D. V.; Erickson, R. E. Biochemistry 1992, 31, 2728.
(101) Brandts, J. F. J. Am. Chem. Soc. 1964, 86, 4302.
(102) Stites, W. E. Chem. Rev. 1997, 97, 1233 (accompanying article in this issue).

CR960383C