24. P. Beltrán et al., J. Clin. Microbiol. 37, 581–590 (1999).
25. I. K. Wachsmuth et al., J. Infect. Dis. 167, 621–626 (1993).
26. A. Mutreja et al., Nature 477, 462–465 (2011).
27. M. Alam et al., Proc. Natl. Acad. Sci. U. S.A. 111, 9917–9922 (2014).
28. J. P. Guthmann, J. Trop. Med. Hyg. 98, 419–427 (1995).
29. C.-S. Chin et al., N. Engl. J. Med. 364, 33–42 (2011).
30. R. S. Hendriksen et al., MBio 2, e00157-e11 (2011).
31. J. W. Tappero, R. V. Tauxe, Emerg. Infect. Dis. 17, 2087–2093 (2011).
32. S. M. Moore, K. L. Shannon, C. E. Zelaya, A. S. Azman,
J. Lessler, PLOS Curr. 6, ecurrents.outbreaks.
33. J. A. Díaz-Quiñonez et al., Genome Announc. 2, e01123-14 (2014).
34. G. H. Rabbani, W. B. Greenough 3rd, J. Diarrhoeal Dis. Res. 17,
35. J. Martinez-Urtaza, J. Trinanes, N. Gonzalez-Escalona,
C. Baker-Austin, Nat. Microbiol. 1, 16018 (2016).
36. C. Seas et al., Am. J. Trop. Med. Hyg. 62, 513–517 (2000).
37. S. Y. Choi et al., MBio 7, e02160-e15 (2016).
38. World Health Organzation, Prevention and control of cholera
outbreaks: WHO policy and recommendations; www.who.int/
39. Centers for Disease Control and Prevention, Cholera (Vibrio
cholerae O1/O139). 1996 Case Definition; https://wwwn.cdc.
This study was supported by the Wellcome Trust (grant 098051), the
Institut Pasteur, Santé Publique France, and the French government’s
Investissement d’Avenir program, Laboratoire d’Excellence “Integrative
Biology of Emerging Infectious Diseases” (grant no. ANR-10-LABX-62-
IBEID). We thank J. Rauzier for technical assistance; A. J. Page, J. Keane,
and the sequencing teams at the Wellcome Trust Sanger Institute and
the Institut Pasteur; and M. Beale for helpful comments. M.J.D. is
supported by a Wellcome Trust Sanger Institute Ph.D. Studentship. A.E.M.
is supported by Biotechnology and Biological Sciences Research Council
fellowship BB/M014088/1. D.M.A. is supported by the Centre for
Genomic Pathogen Surveillance. M.-L.Q. is a member of the WHO
Global Task Force on Cholera Control and leader of the Laboratory
Surveillance Working Group, involved in developing and disseminating
technical guidance. J.P. is a member of the Institut Pasteur Scientific
Council and a paid consultant to Specific Technologies. The findings
and conclusions in this report are those of the author(s) and do not
necessarily represent the official position of the Centers for Disease
Control and Prevention. All data and code necessary to understand
and assess the conclusions of this research are available in the main
text, supplementary materials, and via the following repositories:
European Nucleotide Archive (ENA) ( www.ebi.ac.uk/ena), under
study accession numbers PRJEB9140, PRJEB2215, and PRJEB8764;
cholerae_in_the_Americas/5427253. Phylogeny and metadata can be
viewed interactively at https://microreact.org/project/
Materials and Methods
Tables S1 to S4
Figs. S1 to S11
27 June 2017; accepted 10 October 2017
A neural algorithm for a fundamental
Sanjoy Dasgupta,1 Charles F. Stevens,2,3 Saket Navlakha4*
Similarity search—for example, identifying similar images in a database or similar documents
on the web—is a fundamental computing problem faced by large-scale information retrieval
systems. We discovered that the fruit fly olfactory circuit solves this problem with a variant
of a computer science algorithm (called locality-sensitive hashing). The fly circuit assigns
similar neural activity patterns to similar odors, so that behaviors learned from one odor can
be applied when a similar odor is experienced. The fly algorithm, however, uses three
computational strategies that depart from traditional approaches. These strategies can be
translated to improve the performance of computational similarity searches. This
perspective helps illuminate the logic supporting an important sensory function and
provides a conceptually new algorithm for solving a fundamental computational problem.
An essential task of many neural circuits is to generate neural activity patterns in response to input stimuli, so that differ- ent inputs can be specifically identified. We studied the circuit used to process odors
in the fruit fly olfactory system and uncovered
computational strategies for solving a fundamental machine learning problem: approximate similarity (or nearest-neighbors) search.
The fly olfactory circuit generates a “tag” for
each odor, which is a set of neurons that fire when
that odor is presented (1). This tag is critical for
learning behavioral responses to different odors
(2). For example, if a reward (e.g., sugar water) or
a punishment (e.g., electric shock) is associated
with an odor, that odor becomes attractive (a fly
will approach the odor) or repulsive (a fly will
avoid the odor), respectively. The tags assigned
to odors are sparse—only a small fraction of the
neurons that receive odor information respond
to each odor (3–5)—and nonoverlapping: Tags for
two randomly selected odors share few, if any,
active neurons, so that different odors can be
easily distinguished (1).
The tag for an odor is computed by a three-step procedure (Fig. 1A). The first step involves
feedforward connections from odorant receptor
neurons (ORNs) in the fly’s nose to projection neurons (PNs) in structures called glomeruli. There
are 50 ORN types, each with a different sensitivity and selectivity for different odors. Thus, each
input odor has a location in a 50-dimensional
space determined by the 50 ORN firing rates.
For each odor, the distribution of ORN firing rates
across the 50 ORN types is exponential, with a
mean that depends on the concentration of the
odor (6, 7). For the PNs, this concentration de-
pendence is removed (7, 8); that is, the distri-
bution of firing rates across the 50 PN types is
exponential, with close to the same mean for all
odors and all odor concentrations (1). Thus, the
first step in the circuit essentially “centers the
mean”—a standard preprocessing step in many
computational pipelines—using a technique called
divisive normalization (8). This step is important
so that the fly does not mix up odor intensity with
The second step, where the main algorithmic
insight begins, involves a 40-fold expansion in
the number of neurons: Fifty PNs project to 2000
Kenyon cells (KCs), connected by a sparse, binary
random connection matrix (9). Each KC receives
and sums the firing rates from about six randomly
selected PNs (9). The third step involves a winner-
take-all (WTA) circuit in which strong inhibitory
feedback comes from a single inhibitory neuron,
called APL (anterior paired lateral neuron). As a
result, all but the highest-firing 5% of KCs are
silenced (1, 3, 4). The firing rates of these remaining
5% correspond to the tag assigned to the input odor.
From a computer science perspective, we view
the fly’s circuit as a hash function, whose input is
an odor and whose output is a tag (called a hash)
for that odor. Although tags should discriminate
odors, it is also to the fly’s advantage to associate
very similar odors with similar tags (Fig. 1B), so
that conditioned responses learned for one odor
can be applied when a very similar odor, or a
noisy version of the learned odor, is experienced.
This led us to conjecture that the fly’s circuit
produces tags that are locality-sensitive; that is,
the more similar a pair of odors (as defined by
the 50 ORN firing rates for that odor), the more
similar their assigned tags. Locality-sensitive hash
[LSH (10, 11)] functions serve as the foundation
for solving numerous similarity search problems
in computer science. We translated insights from
the fly’s circuit to develop a class of LSH algo-
rithms for efficiently finding approximate nearest
neighbors of high-dimensional points.
Imagine that you are provided an image of
an elephant and seek to find the 100 images—
out of the billions of images on the web—that
look most similar to your elephant image. This
is called the nearest-neighbors search problem,
which is of fundamental importance in infor-
mation retrieval, data compression, and machine
learning (10). Each image is typically represented
as a d-dimensional vector of feature values. (Each
odor that a fly processes is a 50-dimensional fea-
ture vector of firing rates.) A distance metric is
1Department of Computer Science and Engineering, University
of California San Diego, La Jolla, CA, USA. 2Molecular
Neurobiology Laboratory, The Salk Institute for Biological
Studies, La Jolla, CA, USA. 3Kavli Institute for Brain and Mind,
University of California San Diego, La Jolla, CA, USA.
4Integrative Biology Laboratory, The Salk Institute for Biological
Studies, La Jolla, CA, USA.
*Corresponding author. Email: email@example.com