6 MARCH 2015 • VOL 347 ISSUE 6226 1067 SCIENCE sciencemag.org
sis to avoid a confounding dependency
on common errors that would result if
a single estimate were used throughout.
In addition, filtering of unreliable data and estimates
of stochastic mRNA-Seq errors allowed Jovanovic et al. to calculate
that at steady state, mRNA levels explain
68% of the variance in protein expression,
translation rates 26%, and protein degradation rates 8%. Upon stimulation of cells
with LPS, mRNA levels appear to explain
90% of the changes in protein expression,
with translation and protein degradation
explaining only 4% and 6%, respectively.
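These percentages partition the across-gene variance in protein levels among steps of gene expression. Below is a minimal sketch of how such a partition can be set up. It assumes a simple first-order steady-state model, and the symbols $M$ (mRNA level), $P$ (protein level), $k_{\mathrm{tl}}$ (per-mRNA translation rate) and $k_{\mathrm{deg}}$ (protein degradation rate constant) are introduced here for illustration only. The cited studies' exact models may differ.

$$
\frac{dP}{dt} = k_{\mathrm{tl}} M - k_{\mathrm{deg}} P = 0
\quad\Longrightarrow\quad
\log P = \log M + \log k_{\mathrm{tl}} - \log k_{\mathrm{deg}}
$$

$$
\begin{aligned}
\operatorname{Var}(\log P) ={}& \operatorname{Var}(\log M) + \operatorname{Var}(\log k_{\mathrm{tl}}) + \operatorname{Var}(\log k_{\mathrm{deg}}) \\
&+ 2\operatorname{Cov}(\log M, \log k_{\mathrm{tl}}) - 2\operatorname{Cov}(\log M, \log k_{\mathrm{deg}}) - 2\operatorname{Cov}(\log k_{\mathrm{tl}}, \log k_{\mathrm{deg}})
\end{aligned}
$$

Dividing a step's variance term by $\operatorname{Var}(\log P)$ gives that step's share, and how the covariance terms are apportioned is a modeling choice. Independent measurement noise in an estimated quantity inflates its variance term. For this reason, correcting for errors can shift the shares.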
Jovanovic et al. did find, though, that upon
LPS treatment, translation and protein
degradation rates changed more for ribosomal, mitochondrial, and other highly
expressed housekeeping proteins than for
other genes, indicating an important role
for these two steps in the control of some
Battle et al. took a different tack, examining human protein variation among
62 individuals from the Yoruba population of Ibadan, Nigeria (7). Genomic DNA
sequences for each individual were compared to mRNA-Seq, ribosome footprinting
(ribosome density per mRNA), and mass
spectrometry data for lymphoblastoid cells
derived from each person. Consistent with
previous results (3), the variation in measured protein levels between individuals
correlates poorly with the variation in measured mRNA abundances (mean R² < 0.2)
(7). However, when only those differences
in expression that are associated with variation in the DNA sequence of a nearby gene
were considered, most gene loci showing
changes in protein levels between individuals also showed correlated differences in
mRNA expression, consistent with a dominant role for transcription. In addition,
there was “a scarcity” of DNA sequence
changes that affected only ribosome footprint density and protein abundance, not
mRNA levels. In effect, by constraining
their analysis to only those differences in
expression associated with DNA sequence
variation, Battle et al. excluded much of the
variation due to measurement errors to obtain a more accurate answer.
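A toy simulation can show why excluding measurement error matters. Independent noise added to both the mRNA and protein measurements lowers the observed correlation, even when true protein levels track true mRNA levels closely. This is the classical attenuation effect. The sketch below is hypothetical: the function names `pearson_r` and `simulate_attenuation` and all parameter values (sample count, noise levels, seed) are illustrative assumptions, not values from (7).

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_attenuation(n=5000, bio_sd=0.3, noise_sd=1.0, seed=0):
    """Return (true R^2, observed R^2) for simulated log mRNA vs. log protein.

    Hypothetical model: true log protein = true log mRNA + a small
    biological deviation (sd = bio_sd); each measurement then adds
    independent Gaussian noise (sd = noise_sd).
    """
    rng = random.Random(seed)
    true_mrna = [rng.gauss(0.0, 1.0) for _ in range(n)]
    true_prot = [m + rng.gauss(0.0, bio_sd) for m in true_mrna]
    # Independent measurement noise on each assay.
    obs_mrna = [m + rng.gauss(0.0, noise_sd) for m in true_mrna]
    obs_prot = [p + rng.gauss(0.0, noise_sd) for p in true_prot]
    r_true = pearson_r(true_mrna, true_prot)
    r_obs = pearson_r(obs_mrna, obs_prot)
    return r_true ** 2, r_obs ** 2


if __name__ == "__main__":
    true_r2, obs_r2 = simulate_attenuation()
    print(f"true R^2 = {true_r2:.2f}, observed R^2 = {obs_r2:.2f}")
```

With these defaults the expected true R² is 1/(1 + 0.3²) ≈ 0.92, and the expected observed R² is only about 1/(2 × 2.09) ≈ 0.24. By construction, most of the apparently "unexplained" protein variance in this toy model is measurement noise.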
Li et al. (1) (our own study) reanalyzed the
data in (5) with two approaches to account
for measurement errors. In the first, a nonlinear scaling error in protein abundance
estimates (from mass spectrometry data)
was corrected using classic data from the
literature, and a subset of the other errors
in the mRNA-Seq and protein abundance
data was estimated from replicate and other
control data. In the second approach, variance in translation rates measured directly
by ribosome footprinting was substituted
for a larger variance that had been inferred
indirectly with a model in (5). The first approach suggests that the variance in true
mRNA levels explains a minimum of 56%
of the variance in true protein levels. The
second implies that true mRNA levels explain 84% of the variance in true protein
expression (transcription 73% and RNA degradation 11%), with translation and protein
degradation each explaining only 8%.
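Spearman's classical correction for attenuation illustrates the idea behind estimating errors from replicate data. This is a textbook technique shown for intuition, not the procedure of (1), which also corrects a nonlinear scaling error and models the errors in more detail. If two replicate measurements of the same quantity carry independent noise, their correlation estimates the assay's reliability, meaning the fraction of observed variance that is true signal. Dividing the observed mRNA-protein correlation by the square root of the two reliabilities then estimates the error-free correlation. The function name `spearman_corrected_r2` and all parameter values are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_corrected_r2(n_genes=10000, bio_sd=0.3, noise_sd=1.0, seed=1):
    """Return (true R^2, observed R^2, corrected R^2) for simulated data.

    Hypothetical setup: each gene's mRNA and protein levels are measured
    twice (two replicates), each time with independent Gaussian noise.
    """
    rng = random.Random(seed)
    true_mrna = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
    true_prot = [m + rng.gauss(0.0, bio_sd) for m in true_mrna]

    def measure(values):
        # One replicate: add independent noise to every true value.
        return [v + rng.gauss(0.0, noise_sd) for v in values]

    mrna_a, mrna_b = measure(true_mrna), measure(true_mrna)
    prot_a, prot_b = measure(true_prot), measure(true_prot)

    # Observed mRNA-protein correlation from one replicate of each assay.
    r_obs = pearson_r(mrna_a, prot_a)
    # Replicate-to-replicate correlation estimates each assay's reliability.
    rel_mrna = pearson_r(mrna_a, mrna_b)
    rel_prot = pearson_r(prot_a, prot_b)
    # Spearman's correction for attenuation.
    r_corr = r_obs / math.sqrt(rel_mrna * rel_prot)
    r_true = pearson_r(true_mrna, true_prot)
    return r_true ** 2, r_obs ** 2, r_corr ** 2
```

The reliability estimates are themselves noisy, so the corrected correlation can occasionally exceed 1. The correction also assumes that replicate errors are independent of each other and of the true values. Systematic errors shared by replicates, such as a nonlinear scaling error, violate that assumption and must be handled separately.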
Most controllers of gene expression
identified by classic genetic or biochemical
methods are either transcription factors
or proteins (such as kinases and signaling
receptors) that directly regulate the activities of proteins, not their abundances. In
addition, translation and mRNA degradation rates change only modestly upon cellular differentiation or when microRNA
expression is perturbed (10–12). Moreover, improved statistical analyses show
that, in contrast to the conclusions of earlier studies, mRNA
levels explain most of the variance in protein abundances in yeast (13, 14). Finally,
~40% of genes in a single mammalian cell
express no mRNA (1, 15); thus, for these
~8800 genes, transcriptional repression by
chromatin is likely the sole determinant of
the absence of protein expression.
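The two figures in this sentence are linked by simple arithmetic. Back-calculating gives the total gene count they imply, which is approximate because both inputs are rounded:

$$
N_{\text{total}} \approx \frac{8800}{0.40} = 22{,}000 \ \text{genes}
$$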
Understanding the contributions of
transcriptional versus posttranscriptional
control is not simply a matter of academic
interest. For example, variation in protein
expression among 95 colorectal tumor
samples is only poorly explained by measured mRNA abundances (2), which might
imply that differences in patients' responses
to anticancer treatments reflect posttranscriptional effects. If, however, most of the
variation in protein levels is controlled by
transcription but this fact is obscured by
measurement errors, then differences in
drug action could be mainly explained by
variation at the transcriptional level.
Accurate quantitation of the control
of gene expression is in its infancy. Experimental protocols with fewer inherent
biases are needed, along with further improvements in statistical methods that can
estimate and take error into account. Before gene expression can be correctly modeled, an accurate accounting of molecular
abundances and expression rate constants
is vital. ■
REFERENCES AND NOTES
1. J. J. Li, P. J. Bickel, M. D. Biggin, PeerJ 2, e270 (2014).
2. B. Zhang et al., Nature 513, 382 (2014).
3. A. Ghazalpour et al., PLOS Genet. 7, e1001393 (2011).
4. A. R. Kristensen, J. Gsponer, L. J. Foster, Mol. Syst. Biol. 9,
5. B. Schwanhäusser et al., Nature 473, 337 (2011).
6. M. Jovanovic et al., Science 347, 1259038 (2015).
7. A. Battle et al., Science 347, 664 (2015).
8. E. Ahrné, L. Molzahn, T. Glatter, A. Schmidt, Proteomics 13,
9. M. S. Cheung, T. A. Down, I. Latorre, J. Ahringer, Nucleic Acids Res. 39, e103 (2011).
10. D. Baek et al., Nature 455, 64 (2008).
11. M. Selbach et al., Nature 455, 58 (2008).
12. N. T. Ingolia, L. F. Lareau, J. S. Weissman, Cell 147, 789
13. G. Csardi et al., http://biorxiv.org/content/
14. F. W. Albert, D. Muzzey, J. S. Weissman, L. Kruglyak, PLOS Genet. 10, e1004692 (2014).
15. D. Hebenstreit, A. Deonarine, M. M. Babu, S. A. Teichmann, Curr. Opin. Cell Biol. 24, 350 (2012).
ACKNOWLEDGMENTS: J. J. L. was supported in part
by the Department of Statistics at UCLA. Work at Lawrence
Berkeley National Laboratory was conducted under
U.S. Department of Energy contract DE-AC02-05CH11231.
Control of protein expression. The charts show the percent contributions of the variance in the rates of each step
in gene expression to the variance in protein abundance for 4212 genes (from a mouse cell line). The left chart shows
estimates from (5); the right chart shows estimates from (1) that take into account stochastic and systematic errors
in the abundance data of (5).
[Figure: Contribution to protein levels. Left: Original data estimates. Right: Error-corrected estimates.]