Image: Techniques known as dimensionality reduction can help find patterns in the recorded activity of thousands of neurons. Instead of looking at all responses at once, these methods find a smaller set of dimensions (in this case, three) that capture as much structure in the data as possible. Each trace in these graphics represents the activity of the whole brain during a single presentation of a moving stimulus, and different versions of the analysis capture structure related either to the passage of time (left) or to the direction of the motion (right). The raw data are the same in both cases, but the analyses reveal different patterns (Photo courtesy of Jeremy Freeman, Nikita Vladimirov, Takashi Kawashima, Yu Mu, Nicholas).
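The caption does not say which dimensionality-reduction method produced these traces. As a generic illustration, the sketch below uses principal component analysis (PCA), one common choice, computed with NumPy's SVD on synthetic data. The function name `pca_trajectory` and the simulated recording are hypothetical, and this is not Thunder's API. It projects a time-by-neurons activity matrix onto three dimensions, giving one three-dimensional trace per recording.

```python
import numpy as np

def pca_trajectory(activity, k=3):
    """Project a (time x neurons) activity matrix onto its top-k principal components.

    Returns a (time x k) trajectory and the fraction of variance each component explains.
    """
    centered = activity - activity.mean(axis=0)   # remove each neuron's mean
    # The right singular vectors of the centered data are the principal axes,
    # already sorted by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    trajectory = centered @ vt[:k].T              # coordinates in the k-D space
    explained = s**2 / np.sum(s**2)               # variance fraction per component
    return trajectory, explained[:k]

# Hypothetical data: 200 time points from 1,000 simulated neurons driven by
# three shared latent signals plus a little noise.
rng = np.random.default_rng(0)
t = np.linspace(0, 4 * np.pi, 200)
latents = np.stack([np.sin(t), np.cos(t), np.sin(2 * t)], axis=1)  # (200, 3)
weights = rng.normal(size=(3, 1000))
activity = latents @ weights + 0.1 * rng.normal(size=(200, 1000))

trajectory, explained = pca_trajectory(activity, k=3)
print(trajectory.shape)   # (200, 3)
print(explained)
```

PCA finds the directions of greatest variance regardless of what caused it. The two panels in the figure show that other versions of the analysis on the same data can emphasize different structure, such as time or stimulus direction.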
New technologies for tracking brain activity are generating extraordinary quantities of information. These data may contain new clues about how the brain works, but only if researchers can interpret them. To help make sense of the data, neuroscientists can now exploit the power of distributed computing with Thunder, a library of analysis tools.
In an age of “big data,” a single computer cannot always find the solution that a user requires. Instead, computational tasks must be distributed across a collection of computers that analyze a massive data set together. This is how Facebook and Google mine individuals’ web histories to present them with targeted ads, and how Amazon and Netflix recommend books and movies; however, big data is about more than marketing.
Thunder was developed at the Howard Hughes Medical Institute’s (HHMI) Janelia Research Campus (Ashburn, VA, USA). The application speeds the analysis of data sets so enormous and complex that they would take days or weeks to analyze on a single workstation, if a single workstation could do it at all. Janelia group leaders Drs. Jeremy Freeman and Misha Ahrens, together with colleagues at Janelia and the University of California, Berkeley (USA), reported in the July 27, 2014, issue of the journal Nature Methods that they have used Thunder to quickly find patterns in high-resolution images collected from the brains of active zebrafish and mice with multiple imaging techniques.
Significantly, they have employed Thunder to analyze imaging data from a new microscope that Ahrens and colleagues developed to monitor the activity of nearly every individual cell in the brain of a zebrafish as it behaves in response to visual stimuli. That technology is described in a companion paper published in the same issue of Nature Methods.
Thunder can run on a private cluster or on Amazon’s cloud computing services. Researchers can find everything they need to begin using the open source library of tools online.
New microscopes are capturing images of the brain faster, with better spatial resolution, and across wider regions of the brain than ever before. However, all these advances come packaged in gigabytes or even terabytes of data. On a single workstation, even simple calculations can take hours. “For a lot of these data sets, a single machine is just not going to cut it,” Dr. Freeman noted.
It is not just the sheer volume of data that exceeds the limits of a single computer, the investigators noted, but also its complexity. “When you record information from the brain, you don’t know the best way to get the information that you need out of it. Every data set is different. You have ideas, but whether or not they generate insights is an open question until you actually apply them,” said Dr. Ahrens.
Distributed computing can accelerate analysis while exploring the full richness of a data set, and many platforms for it are available. Dr. Freeman decided to build on a new one called Spark. Developed at the University of California, Berkeley’s AMPLab, Spark is rapidly becoming a favored tool for large-scale computing across industry. Because Spark can cache data in memory, the complete data set needs to be loaded only for the first step, which removes a major bottleneck. This makes Spark well suited to interactive, exploratory analysis and to complex algorithms that operate repeatedly on the same data. Furthermore, Spark’s well-designed and versatile application programming interfaces (APIs) help simplify development. Thunder uses the Python API, which Dr. Freeman hopes will make it particularly easy for others to adopt, given Python’s increasing use in neuroscience and data science.
To make Spark suitable for analyzing a broad range of neuroscience data, Dr. Freeman first developed standardized representations of data that were amenable to distributed computing. He then worked to express typical neuroscience workflows in the computational language of Spark. From there, the biological questions that he and his colleagues were curious about drove development.
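The article does not describe Thunder's data formats in detail. One representation that suits distributed computing, assumed here for illustration, treats an imaging experiment as a collection of independent records, each keyed by a pixel coordinate and holding that pixel's time series. The plain Python/NumPy sketch below builds such records and applies one per-record step. The names `to_records` and `map_values`, and the choice of dF/F normalization as the example step, are hypothetical and are not Thunder's API.

```python
import numpy as np

def to_records(stack):
    """Split a (time, x, y) image stack into (pixel_key, time_series) records."""
    _, nx, ny = stack.shape
    return [((x, y), stack[:, x, y]) for x in range(nx) for y in range(ny)]

def delta_f_over_f(series, baseline_frames=10):
    """Normalize one pixel's trace to its baseline: (F - F0) / F0.

    Assumes a positive baseline F0, taken here as the mean of the first frames.
    """
    f0 = series[:baseline_frames].mean()
    return (series - f0) / f0

def map_values(records, fn):
    """Apply fn to every record's values while keeping keys. Each record is
    processed independently, which is what makes this step easy to distribute."""
    return [(key, fn(values)) for key, values in records]

# Hypothetical 50-frame, 4 x 4 pixel recording with one transiently active pixel.
stack = np.full((50, 4, 4), 100.0)
stack[20:30, 1, 2] = 150.0

records = to_records(stack)
normalized = dict(map_values(records, delta_f_over_f))
print(len(records))                # 16
print(normalized[(1, 2)].max())    # 0.5
```

Because no record depends on any other, the same per-record function could in principle be passed to PySpark's `RDD.mapValues` so that Spark spreads the work across a cluster; the plain list here only keeps the sketch runnable on one machine.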
Using the application, the investigators analyzed images of the brain in minutes, interacting with and revising analyses without the lengthy delays associated with previous methods. In images taken of a mouse brain with a two-photon microscope, for example, the researchers found cells whose activity varied with running speed. For much larger data sets, tools such as Thunder are not just helpful but vital, according to the scientists. This is especially true for the data gathered by the new microscope that the investigators developed for tracking whole-brain activity in response to visual stimuli.
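One simple way to look for cells whose activity varies with running speed is to correlate each cell's trace with the speed trace and keep the cells with large correlations. The NumPy sketch below does this on hypothetical synthetic traces. The Pearson correlation, the function name `correlate_with_behavior`, and the 0.5 threshold are illustrative assumptions, not necessarily the analysis the researchers used.

```python
import numpy as np

def correlate_with_behavior(activity, behavior):
    """Pearson correlation between each cell's trace (rows of activity) and a behavioral signal."""
    a = activity - activity.mean(axis=1, keepdims=True)  # center each cell's trace
    b = behavior - behavior.mean()                       # center the behavior trace
    num = a @ b
    den = np.sqrt((a**2).sum(axis=1) * (b**2).sum())
    return num / den

# Hypothetical running speed over one full cycle, and three simulated cells.
t = np.linspace(0, 2 * np.pi, 100, endpoint=False)
speed = 1.0 + np.sin(t)
activity = np.stack([
    2.0 * speed + 0.5,   # cell 0: rises with speed
    -speed,              # cell 1: falls with speed
    np.cos(t),           # cell 2: unrelated to speed
])

r = correlate_with_behavior(activity, speed)
speed_cells = np.flatnonzero(np.abs(r) > 0.5)   # 0.5 is an arbitrary, illustrative cutoff
print(np.round(r, 3))
print(speed_cells)
```

Each cell's correlation is computed independently of every other cell, so this kind of analysis splits naturally across machines when there are hundreds of thousands of cells.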
In 2013, Dr. Ahrens and Janelia group leader Dr. Philipp Keller used high-speed light-sheet imaging to engineer a microscope that captures neuronal activity cell by cell across nearly the entire brain of a larval zebrafish. That microscope produced striking images of neurons firing in the brain of an inactive fish. However, Dr. Ahrens wanted to use the technology to study the brain’s activity in more complex situations. Now, the scientists have combined their original technology with a virtual-reality swim simulator that Dr. Ahrens previously developed to provide fish with visual feedback that simulates movement.
Combining these two technologies lets Dr. Ahrens monitor activity throughout the brain as a fish modifies its behavior based on the sensory information it receives. The technique generates approximately a terabyte of data per hour, a data analysis challenge that helped motivate the development of Thunder. When Drs. Freeman and Ahrens applied their new tools to the data, patterns quickly emerged. For example, they identified cells whose activity was tied to movement in particular directions and cells that fired specifically when the fish was at rest, and they characterized the dynamics of those cells’ activity. Example analyses such as these, and example data sets, are available online (please see Related Links below).
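To find cells tied to movement in particular directions, one generic approach, assumed here rather than taken from the study, is to average each cell's response over all trials of each motion direction and pick the direction with the largest mean. The NumPy sketch below does that on hypothetical responses. The function name `direction_tuning`, the selectivity index, and its 0.5 cutoff are illustrative choices.

```python
import numpy as np

def direction_tuning(responses, directions):
    """Mean response per direction, preferred direction, and a simple selectivity index.

    responses:  (cells, trials) array of response amplitudes, assumed nonnegative
    directions: length-trials array of motion-direction labels in degrees
    """
    labels = np.unique(directions)
    # Average each cell's responses over all trials of each direction.
    tuning = np.stack(
        [responses[:, directions == d].mean(axis=1) for d in labels], axis=1
    )
    preferred = labels[np.argmax(tuning, axis=1)]
    peak, trough = tuning.max(axis=1), tuning.min(axis=1)
    # Near 1 when some directions evoke no response; 0 when all evoke the same response.
    selectivity = (peak - trough) / (peak + trough + 1e-12)
    return tuning, preferred, selectivity

# Hypothetical data: 20 trials cycling through four directions, three cells.
directions = np.array([0, 90, 180, 270] * 5)
responses = np.zeros((3, directions.size))
responses[0, directions == 90] = 1.0    # cell 0 responds to 90-degree motion
responses[1, directions == 270] = 2.0   # cell 1 responds to 270-degree motion
responses[2] = 0.5                      # cell 2 responds equally to every direction

tuning, preferred, selectivity = direction_tuning(responses, directions)
selective_cells = np.flatnonzero(selectivity > 0.5)   # illustrative cutoff
print(preferred[selective_cells])
```

The selectivity index matters because `argmax` always returns some direction, even for a cell like cell 2 that does not care about direction at all.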
Dr. Ahrens now plans to investigate more complex questions using the new technology, and both he and Dr. Freeman foresee expansion of Thunder. “At every level, this is really just the beginning,” Dr. Freeman stated.
Howard Hughes Medical Institute’s Janelia Research Campus
Example analyses and example data sets
Thunder Tool Library