havior in a given organism is difficult enough,
but doing so for many behaviors has been almost
impossible. The first step in mapping a circuit
that mediates a behavior is to identify neurons
whose activity is causally related to the behavior.
Such a list of neurons provides a starting point
for identifying the connectivity patterns between
the relevant neurons. Thus, to map circuits underlying many behaviors, one would need a comprehensive neuron-behavior atlas of the nervous
system that would list all neurons causally related with each behavior.
Generating such neuron-behavior maps has
been difficult for several reasons. First, the experimental tools to selectively manipulate small
sets of neurons while simultaneously observing
natural behavior were lacking. Fortunately, recent
advances in genetic toolkits allow reasonably selective manipulation of neuron types in genetic model organisms, such as Drosophila (1–3).
Advances in behavior tracking methods allow
high-resolution monitoring of the effect of such
manipulations (4, 5). Neural manipulation screens
can therefore be coupled with high-resolution
monitoring of motor outputs to causally link complex behaviors to correspondingly complex neural circuits.
Second, establishing the causal links between
neural manipulations and the resulting time-varying
behavioral responses is a daunting computational
statistics challenge. Existing supervised machine-
learning methods can detect only predetermined
behaviors (6); moreover, they are limited by the
speed with which humans can annotate training
data sets. An alternative approach uses unsuper-
vised clustering of the multidimensional time series.
However, the high-content and high-throughput
nature of the time-varying behavior data presents
both computational and statistical challenges.
We developed a methodology for data-driven
neuron-behavior mapping and applied it to larval
Drosophila. The nervous system of larval Drosoph-
ila consists of a well-developed brain and nerve
cord containing only about 10,000 neurons, rendering
it simple enough for a relatively comprehensive
characterization. Moreover,
there exist more than 1000 genetic GAL4 lines in
Drosophila larvae with recently characterized sparse
neuronal expression patterns that together cover
most of the 10,000 neurons in the larval nervous
system (http://flweb.janelia.org/cgi-bin/flew.cgi) (3).
Optogenetic Neural Activation Screen
We designed an optogenetic neural activation
screen (see supplementary materials) to obtain a
neuron line–behavior atlas of the larval nervous
system that would contain causal links between
neuron lines and the motor patterns they control.
We used 1049 distinct GAL4 lines to selectively
target channelrhodopsin-2 (ChR2) (7) to sparse
distinct subsets of neurons, with each line activating 2 to ~15 neurons. Because these lines
essentially span the entire set of larval neurons,
some lines activate sensory and motor neurons as
well as many neurons involved in decisions and
action selection. We included four positive control lines in the screen that drive expression in
nociceptive, mechanosensory, and proprioceptive
neurons, previously determined to reliably mediate distinct behaviors (8–10), as well as one
negative control line in which no neurons were
optogenetically activated (2, 10), for a total of
1054 lines and 37,780 animals tested. In each
experiment, we exposed dishes of larvae to
470-nm light stimuli (one exposure of 30 s followed by four exposures of 5 s, with a 30-s interval after the long exposure and a 10-s interval
between the short exposures) to optogenetically
activate ChR2-expressing neurons; we captured
video before, during, and after stimulation (Fig.
1A). The Multi-Worm Tracker (MWT) software
(4) tracked time-varying, two-dimensional closed
contours of larvae and sketched eight time-varying features that collectively characterize larval shape and motion (Fig. 1B). Streaming and
sketching reduced the data complexity by a factor
of more than 200,000, enabling a compressive yet
expressive representation of the data. These reduced data served as the input into the multiscale
unsupervised structure learning methodology to
reveal data-driven behavior types (Fig. 1C). Each
behavior type was then linked to the subset of lines
that mediate it (Fig. 1D).
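The stimulation protocol described above (one 30-s exposure, then four 5-s exposures, with a 30-s interval after the long pulse and 10-s intervals between the short pulses) can be sketched as a simple schedule of light-ON windows. The function and parameter names below are illustrative only, not taken from the authors' acquisition software.

```python
# Sketch of the 470-nm stimulus schedule: one long pulse, then short pulses.
# All names and the return format are illustrative assumptions.

def stimulus_schedule(long_on=30.0, long_off=30.0,
                      short_on=5.0, short_off=10.0, n_short=4):
    """Return a list of (start_s, end_s) light-ON windows, starting at t = 0."""
    windows = [(0.0, long_on)]          # single 30-s exposure
    t = long_on + long_off              # 30-s interval after the long pulse
    for i in range(n_short):
        windows.append((t, t + short_on))
        t += short_on
        if i < n_short - 1:             # 10-s gap between short pulses
            t += short_off
    return windows

print(stimulus_schedule())
# [(0.0, 30.0), (60.0, 65.0), (75.0, 80.0), (90.0, 95.0), (105.0, 110.0)]
```

Under this reading of the protocol, the whole stimulus epoch spans 110 s of recording per trial, with video also captured before and after.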
Discovery of Behavior Types via Multiscale
Unsupervised Structure Learning
As a first step, we sought to discover a large, inclusive, and nonpredetermined set of statistically
distinguishable behavioral responses performed by
the 37,780 animals during the first (30-s) optogenetic activation period. Recently developed
methods for multiscale unsupervised structure
learning (11–13) can be thought of as generalizations of manifold learning techniques, in that
they can learn structures more general than manifolds, such as unions of manifolds. We adopted
iterative denoising tree (IDT) methodology (11, 14),
which offers demonstrated utility across several
domains (15, 16).
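As a rough illustration of tree-structured unsupervised clustering in the spirit of such methods (the actual IDT methodology of refs. 11 and 14 adds denoising steps and adaptive stopping rules not shown here), one can recursively re-embed and partition the data so that different scales of structure emerge at different tree depths. The use of PCA and k-means, and all parameter values below, are simplifying assumptions, not the authors' algorithm.

```python
# Simplified sketch of recursive, multiscale clustering: each node re-embeds
# its own points (here, plain PCA) and splits them with k-means, so deeper
# nodes can capture finer-scale structure. Illustrative only; not IDT itself.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def cluster_tree(X, depth=0, max_depth=2, min_size=20, k=2):
    """Return a nested dict; leaves play the role of candidate clusters."""
    node = {"n": len(X), "children": []}
    if depth >= max_depth or len(X) < min_size:
        return node                                   # leaf node
    Z = PCA(n_components=2).fit_transform(X)          # local re-embedding
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(Z)
    for c in range(k):
        node["children"].append(
            cluster_tree(X[labels == c], depth + 1, max_depth, min_size, k))
    return node

def n_leaves(t):
    return 1 if not t["children"] else sum(n_leaves(c) for c in t["children"])

# Toy data: three well-separated 8-dimensional Gaussian blobs of 60 points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 0.3, size=(60, 8)) for m in (0.0, 3.0, 6.0)])
tree = cluster_tree(X)
print(tree["n"], n_leaves(tree))
```

In a real pipeline the input would be the 37,780 larval sketches, and the resulting leaves would loosely correspond to candidate behavior types.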
The input to IDT is the collection of all 37,780
larval sketches, irrespective of which line generated each sketch (Fig. 2A). IDT consists of five key
1Whiting School of Engineering, Johns Hopkins University,
Baltimore, MD 21218, USA. 2Department of Statistical Science,
Duke University, Durham, NC 27708, USA. 3Janelia Farm Research Campus, Ashburn, VA 20147, USA.
*These authors contributed equally to this work.
†These authors contributed equally to this work.
‡Corresponding author. E-mail: email@example.com (C.E.P.); zlaticm@
Fig. 1. Experimental design and methodology for obtaining neuron
line–behavior maps. (A) Optogenetic activation screen of 1054 lines while
digitally recording high-dimensional larval responses. (B) Streaming extracts
the contours of each larva from each video frame; sketching extracts eight
time-varying features from the contours that characterize the shape and
motion of each animal. (C) Machine-driven behavioral phenotyping learns
phenotype categories (called behaviotypes) from the sketches via multiscale
unsupervised structure learning. (D) Manifold testing discovers which neuron
lines evoke sets of behaviors that are different from negative controls, which
facilitates associating each such line with some number of behaviotypes.
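To make the sketching step concrete: below is a minimal sketch of reducing a closed two-dimensional contour to a few scalar shape features per frame. The paper's actual eight features are not specified in this passage; area, perimeter, and elongation are merely hypothetical examples of the kind of contour-derived quantities involved.

```python
# Illustrative "sketching": compress a closed 2-D contour into scalar shape
# features. Feature choices and names are assumptions, not the MWT's output.
import numpy as np

def contour_features(xy):
    """xy: (N, 2) array of contour points, ordered around the closed shape."""
    x, y = xy[:, 0], xy[:, 1]
    x2, y2 = np.roll(x, -1), np.roll(y, -1)
    area = 0.5 * abs(np.sum(x * y2 - x2 * y))          # shoelace formula
    perimeter = np.sum(np.hypot(x2 - x, y2 - y))
    # elongation: ratio of principal axes from the point covariance
    evals = np.linalg.eigvalsh(np.cov((xy - xy.mean(0)).T))
    elongation = float(np.sqrt(evals[-1] / max(evals[0], 1e-12)))
    return {"area": area, "perimeter": perimeter, "elongation": elongation}

# Unit-square contour: area 1, perimeter 4, elongation ~1.
square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], float)
print(contour_features(square))
```

Applying a handful of such per-frame features to every tracked animal is what makes the >200,000-fold data reduction described in the text plausible while retaining shape and motion information.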