form multivalent interactions and hence protein
networks and gels. Hundreds of proteins house
three or more LARKS (Fig. 3B). The 400 human
LCDs most enriched in LARKS average 14 LARKS,
with a median of 10 LARKS.
We assigned cellular function to these 400
proteins based on their UniProt annotations
(Fig. 3C): 16% are DNA binding, 17% are RNA
binding, and 4% are nucleotide binding, consistent with reports of nucleotide-binding proteins in membraneless organelles (2, 8). Keratins
(5%), keratin-associated proteins (9%), and cornified envelope proteins (4%) are also enriched in LARKS.
The finding of keratins is consistent with experiments (28) showing that keratin granules are
trafficked to the cell cortex, where they merge
and eventually mature into filaments. Also rich
in LARKS are proteins found in ribonucleoprotein
particles such as the spliceosome or nucleolus
(Fig. 4). Nucleoporins with FG repeats, including nup54 and nup98,
are enriched in predicted LARKS,
and purified FG repeats form a hydrogel (27, 29).
The possibility that the FG repeats of nucleoporins may form LARKS in the diffusion barrier
of the pore is supported by our structure of
GFGNFGTS from nup98. We assigned additional
cellular functions to these 400 proteins from
their associated gene ontology (GO) terms. We
found GO terms enriched in the human proteome
700 9 FEBRUARY 2018 • VOL 359 ISSUE 6376 sciencemag.org SCIENCE
Fig. 3. Three-dimensional profiling to identify LARKS in LCDs of
human proteins. (A) Side chains are removed from the backbones of
one of our atomic structures of a LARKS. Then the sequence of interest
(hnRNPA2 shown) is threaded through the six-residue template by
placing the query side chains on the template backbone. Side chains are
repacked and a Rosetta energy function is used to estimate whether the
structure is favorable for the threaded sequence. The sequence then advances
through the template by one-residue increments, producing successive
models. (B) Distribution of the number of LARKS per protein among the 1725 human
proteins predicted to house at least two LARKS. Proteins having two or more LARKS are
predicted to have the capacity to form networks and possibly gels. (C) The
annotated functions of the 400 proteins with the most predicted LARKS.
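The window-by-window scan described in (A) can be illustrated with a minimal sketch. It covers only the bookkeeping: enumerating six-residue segments in one-residue increments and counting those that pass a cutoff. The function names, the `toy_score` placeholder, and the cutoff value are hypothetical and introduced here for illustration; the actual procedure repacks side chains on the template backbone and evaluates a Rosetta energy function, neither of which is reproduced.

```python
from typing import Callable, List, Tuple

WINDOW = 6  # six-residue template, as stated in the caption


def sliding_windows(sequence: str, width: int = WINDOW) -> List[Tuple[int, str]]:
    """Return (start_index, segment) pairs, advancing one residue at a time."""
    return [(i, sequence[i:i + width]) for i in range(len(sequence) - width + 1)]


def toy_score(segment: str) -> float:
    """Hypothetical stand-in for an energy score; lower means more favorable.

    This is NOT the Rosetta energy function. It only counts G, F, and Y
    residues so that the example runs without external software.
    """
    return -float(sum(segment.count(residue) for residue in "GFY"))


def count_favorable(
    sequence: str,
    score: Callable[[str], float],
    threshold: float,
    width: int = WINDOW,
) -> int:
    """Count windows whose score is at or below the (assumed) cutoff.

    Overlapping windows are counted separately, which is a simplification.
    """
    return sum(1 for _, seg in sliding_windows(sequence, width) if score(seg) <= threshold)


if __name__ == "__main__":
    query = "GFGNFGTS"  # nup98 segment named in the text
    for start, segment in sliding_windows(query):
        print(start, segment, toy_score(segment))
    print("favorable windows:", count_favorable(query, toy_score, threshold=-4.0))
```

Passing a different `score` callable would let the same loop wrap a real energy calculation. How overlapping favorable windows are merged into a single LARKS count is not specified in the caption, so the sketch leaves them unmerged.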
Fig. 4. Functions of the 400 proteins most enriched in LARKS and the dynamic intracellular bodies of which they are known to be a part.