Activity Testing in Mice with Behavioral Test Batteries

April 24, 2019

Introduction: Why Use Test Batteries?

Behavioral testing with mouse models forms a key part of scientific research in various fields. It is standard practice to use behavioral phenomena as an indicator of what is happening inside a model organism, in order to characterize the effects of diseases, drugs, genes and other factors on biological function. Standard behavioral tests are used to measure important behaviors, allowing experimental and control groups to be compared, and causal inferences to be made about any differences between them. This is a robust approach which has allowed for many important scientific discoveries.

However, the inherent complexity of living systems stands as an obstacle to making valid inferences from behavior to physiology. Organisms consist of a highly complex, hierarchical, dynamic architecture comprising many different systems, all constantly interacting with each other. This means, for instance, that a disease is unlikely to affect only one system and produce a single behavioral change; it will more likely affect multiple systems and alter patterns of behavior in broad and unpredictable ways. A single behavioral test would thus be inadequate to characterize these changes.

The most common solution is to apply a test battery, in which the same model organisms are subjected to a series of different tests, measuring different activities and different aspects of their behavioral change. This practice is intended to avoid narrowness and help capture the full range of behavioral changes resulting from a factor; by acknowledging that the organism is a whole, integrated system, it helps us to understand how behavior change emerges from the interaction of different processes.

In this article we will explore the practice of using behavioral test batteries, and explain how they have allowed for holistic characterization of a number of important diseases and conditions, as well as the investigation of genetic networks. We will explore the advantages and major uses of test batteries, but also question what confounding issues can arise in carrying them out, and how researchers can design their test batteries to best avoid these confounders.

Advantages of Test Batteries

Using a large battery of different tests in series, as opposed to only using a small number of tests or separating tests between multiple experiments, offers some major advantages:

  • Allowing us to address issues of complexity, and so to examine organisms and their behavior as an integrated system.
  • Being able to make comparisons between results in different paradigms with the exact same experimental subjects, as opposed to having to compare different mice.
  • Overcoming resource and time constraints in the lab, since running a single battery is more efficient than running multiple separate experiments.
  • Reducing ethical objections to using what some may see as an excessive number of experimental animals.
  • Mitigating biases that could be introduced by leaning on the correct set-up and execution of only a small number of tests.
  • Obtaining stronger reliability of results when similar results are seen across multiple paradigms (or inversely, questioning that reliability when paradigms give different results).

Applications of Test Batteries

Many different standard paradigms have been developed for studying behavior in mouse models. These include locomotor tests such as the rotarod and the running wheel, mood tests such as forced swim and elevated plus maze, tests of learning and memory such as novel object recognition and the T maze, and tests of social behavior such as tube dominance and the resident-intruder test. These tests can be combined to form a near-infinite range of different test batteries for different applications.


In a recent paper, Gorina and colleagues report how they used a test battery to examine the behavioral effects of ageing.[1] Groups of two-month-old and twelve-month-old mice were compared, with both groups being subjected to the elevated plus maze, light-dark box, open field, fear conditioning and a batch of five social interaction tests. There was an interval of at least a day between each test, with the battery lasting for eight days overall. The aged mice showed changes resembling those commonly seen in aged humans, such as increased anxiety and a reduced scope of, and deficits in, social interaction.

A different research group at the University of Colorado developed a narrower battery to look at how ageing affects motor function.[2] Mice at 2 months, 19 months and 24 months of age underwent the rotarod test, open field, tight-rope suspension, balance beam, and a cylinder test. Mice were tested for three hours per day, with the battery lasting four days and a whole day of rest left between the different kinds of rotarod test. A clear age-related decline in the different aspects of motor function was observed, and the progression of this decline in the mice was similar in many ways to that seen in humans.


The broad behavioral effects of a high-fat diet (HFD) were investigated via a test battery in a Japanese study published in 2016.[3] Paradigms employed here included motor tests (footprint test, rotarod), sensory tests (olfactory habituation and dishabituation), mood tests (open field, elevated plus maze), social behavior tests (social interaction test) and memory tests (novel object recognition). The researchers did not describe the overall battery time or inter-test interval. HFD was associated with deficits in olfaction and motor coordination, as well as increased social interaction and hyperhedonia, indicating the systemic consequences of this diet on health.

Learning & Memory

Wolf and colleagues created a battery for the comprehensive assessment of learning and memory performance, incorporating the Y maze, novel object recognition test, Morris water maze, and radial arm maze.[4] Tests were spread over eight days with a day’s interval left between the more invasive tests. They then used this battery to examine the differences between two mouse strains used to model dementia; while both strains showed memory deficits, the more fine-grained analysis permitted by having a range of tests showed some key interstrain differences, such as one showing much worse long-term memory.


Post-operative delirium is a condition common in elderly patients, where a person experiences temporary dementia and psychosis-like symptoms after a major operation under anesthesia. Despite affecting many patients all around the world, the condition is not well characterized. A 2016 paper from a Chinese research group attempted to rectify this by applying a behavioral battery to the study of delirium in mouse models.[5] The battery comprised the buried food test, open field, and Y maze. Inter-trial intervals were on the order of minutes. Results from using the battery suggested that energy deficits may play a key role in the progression of delirium.


One application of particular interest for test batteries has been their potential in investigating how networks of genes interact to affect different behavioral areas. The issue of complexity is at the forefront of genetics, with geneticists acutely aware of how the tens of thousands of genes in an organism encode hundreds of thousands of proteins, which in turn play a part in innumerable pathways and mechanisms, each contributing to a tiny part of the variance in an organism’s behavior.

Lad and colleagues, in a 2010 paper, present a battery designed to look comprehensively at how genomic differences between mouse strains affect behavioral patterns, and in turn link genetic loci with different behaviors.[6] The battery includes the open field test, novel object exploration, elevated plus maze, light-dark box, SHIRPA, puzzle box, Morris water maze, and tail suspension. Inter-trial resting periods lasted for a few minutes.

Design of Test Batteries

It is of paramount importance for researchers to design their test batteries in a way that both focuses on the particular aspects of mouse behavior they wish to examine and ensures that their conclusions are well substantiated. The practice of running multiple different tests in series, while offering many key advantages, also introduces new difficulties in implementation and new opportunities for confounding to arise.

Choosing the Right Tests

Which tests a researcher includes in their battery will depend first on what aspects of behavior they are interested in, and how comprehensively they wish to characterize behavioral effects. While one researcher may be interested in the systemic consequences of a disease and thus require all kinds of paradigms, another may be interested only in the locomotor effects of disease and so have no need to include tests investigating variables like mood or cognition.

Researchers are further limited by time and resource constraints. Even if a comprehensive, systemic evaluation of mouse behavior is desired, it may simply not be possible in the time frame available to perform every single test that exists. In that case, researchers should look at the correlation between tests, and avoid administering two tests that appear to measure the same underlying factor to the same extent. Furthermore, researchers may prefer tests that use less expensive equipment and are easier to administer.
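
As a rough illustration of checking the correlation between tests, the following Python sketch flags pairs of tests whose scores move together closely across the same animals. The pilot scores, the test names used as keys, and the 0.9 threshold are all hypothetical assumptions chosen for illustration; they are not values from any of the studies cited here.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Assumes neither list is constant (otherwise the denominator is zero).
    return cov / sqrt(var_x * var_y)


def redundant_pairs(scores, threshold=0.9):
    """Return (test_a, test_b, r) for pairs with |r| at or above the threshold."""
    flagged = []
    for a, b in combinations(sorted(scores), 2):
        r = pearson(scores[a], scores[b])
        if abs(r) >= threshold:
            flagged.append((a, b, r))
    return flagged


# Hypothetical pilot data: one score per mouse, same six mice in every test.
pilot_scores = {
    "elevated_plus_maze": [12, 30, 25, 8, 40, 18],     # % time in open arms
    "light_dark_box": [15, 33, 27, 10, 45, 20],        # % time in light side
    "rotarod": [95, 60, 110, 80, 70, 120],             # latency to fall (s)
}

for a, b, r in redundant_pairs(pilot_scores):
    print(f"{a} and {b} are highly correlated (r = {r:.3f})")
```

A high correlation in pilot data is only a hint that two tests may tap the same underlying factor; whether one of them can be dropped remains a judgment for the researcher.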

Test Order as a Confounder

A major issue which arises when taking any measurement in scientific research is the possibility of hysteresis. Hysteresis is the dependence of a system on the history of that system; it means that what measurements you get from that system may depend on the order in which you take those measurements. In the case of activity testing, hysteresis arises when running test batteries in a different order per se produces significantly different results.[7][8]
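
One simple way to look for such an order effect is to compare scores on the same test between two cohorts that received the battery in different orders. The Python sketch below computes a standardized mean difference (Cohen's d) for that comparison. The latencies are hypothetical numbers invented for illustration, and the choice of Cohen's d is an assumption of this sketch rather than the method used in the cited studies.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * stdev(group_a) ** 2 + (n_b - 1) * stdev(group_b) ** 2
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical rotarod latencies to fall (seconds).
# Cohort A ran the rotarod first; cohort B ran it after a forced swim test.
rotarod_first = [110, 95, 120, 105, 100]
rotarod_after_swim = [85, 90, 70, 95, 80]

d = cohens_d(rotarod_first, rotarod_after_swim)
print(f"Order effect on rotarod (Cohen's d): {d:.2f}")
```

A large value of d here would suggest that the position of the test in the battery, rather than the factor of interest, is shaping the result; a formal significance test would still be needed before drawing conclusions.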

Generally speaking, when we run a test battery we are not interested in characterizing the effects of test order on the mice, but only the holistic effects of some other factor(s). We want our measurements to perturb the mice as little as possible, allowing the values we obtain to reflect the states the mice would be in due solely to the factor of interest (a disease, drug, gene, etc.) and not due to the experimental setup itself.

Historically, the most common tactic to ensure this has been to order the tests by how “stressful” they are. The theory is that more stressful tests will perturb the mice to a greater extent, and so alter their behavior more on subsequent tests. This tactic, however, does not remove the effects of stress but only shunts them towards the end of the battery. It also assumes that the level of stress caused by each test is well characterized and does not vary significantly between individual mice.

One must also consider that performance on one test may improve a mouse’s performance on a subsequent test by acting as a form of preparation. Moreover, some mice may learn better than others and so make better use of this preparation. It may seem that for some mice the test order improves their performance whereas for others it makes them worse, an issue which could not be solved simply by rearranging the tests.

One solution is to increase the resting period between tests. Many researchers recommend leaving a few days in between each round of testing, on the expectation that any confounding effects between tests will “decay” during that time period (any stress will be relieved and any advantages gained lost). Of course, the longer the interval between experiments, the less the test battery resembles a battery and the more it resembles a cluster of separate experiments. Moreover, time is a scarce resource in a laboratory, and the ageing of the mice over a long battery could itself affect test results.
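
The trade-off between rest intervals and total battery length can be made concrete with a small Python sketch. It orders tests from least to most stressful and spaces them by a fixed number of rest days. The stress ranks below are hypothetical placeholders (as noted above, the stress caused by each test is not well characterized), and the sketch assumes each test occupies a single day.

```python
def build_schedule(stress_rank, rest_days):
    """Order tests from least to most stressful and assign each a day.

    stress_rank: dict mapping test name -> assumed stress rank (lower = milder)
    rest_days:   number of rest days left between consecutive tests
    Each test is assumed to occupy exactly one day.
    """
    ordered = sorted(stress_rank, key=stress_rank.get)
    schedule = []
    day = 1
    for test in ordered:
        schedule.append((day, test))
        day += 1 + rest_days
    return schedule


def battery_length(schedule):
    """Number of days from the first test to the last, inclusive."""
    if not schedule:
        return 0
    return schedule[-1][0] - schedule[0][0] + 1


# Hypothetical stress ranks, for illustration only.
ranks = {
    "forced_swim": 4,
    "open_field": 1,
    "rotarod": 3,
    "elevated_plus_maze": 2,
}

for rest in (0, 1, 3):
    plan = build_schedule(ranks, rest)
    print(f"rest_days={rest}: {battery_length(plan)} days total, plan={plan}")
```

Even with only four tests, moving from one to three rest days nearly doubles the length of the battery, which illustrates why long intervals quickly become costly.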

Research on the effects of test order in batteries is scarce. One paper by McIlwain et al. reports that tests sensitive to order include the forced swim, rotarod and hot plate, although most tests did not show this order dependency.[7] Another study found that the effect of learning from experience across the tests varied by strain.[9] Some researchers recommend leaving anxiety tests until the end, as these appear to be the most affected by prior experience. Grouping similar tests together is also highlighted as a good strategy.

Additional Confounding Factors

Test order is not the only confounding factor worth considering in designing test batteries. Running a large number of experiments in series will also require extensive handling of the mice. The mice will need to be removed from their enclosure and handled by the researchers repeatedly within a short time frame, which may induce additional stress. This additional stress could vary by age, sex, and strain, as well as ultimately by the individual, and may increase as more tests are performed (or perhaps decrease, depending on how acclimatized the mice become).

Choosing the right combination of test procedures can also be important. Some tests were developed specifically for use with mice while others, even though they are frequently used with mice, were developed originally for rat experiments. Protocols designed for mice and rats may not produce comparable results and so including them in the same test battery may lead to difficulties in data interpretation.

Finally, since part of the point of a test battery is to test the exact same mice on all paradigms, confounding could arise from failing to meet this ideal, for example if one or more of the mice die before the battery is completed, or if a mouse becomes too unwell to perform a test properly. Since mice are highly social creatures, a change in the makeup of their social group from one test to the next might also affect their future performance.


Conclusion

The use of test batteries is an experimental approach that allows researchers to characterize the activity of model mice in a rigorous, systematic way. Researchers avoid being too narrow in their focus, appreciating the integrated nature of their model systems and the complex effects that key factors can have on them.

Nevertheless, test batteries do require a well-thought-out set-up and throw up a few major hurdles to implementation, most notably test-order effects, and time and resource intensiveness. It is hoped that the information contained in this article will allow scientists to design and carry out more effective test batteries in the future.


References

  1. Ya. V. Gorina, Yu. K. Komleva, O. L. Lopatina, V. V. Volkova, A. I. Chernykh, A. A. Shabalova, A. A. Semenchukov, R. Ya. Olovyannikova, A. B. Salmina. 2017. The battery of tests for experimental behavioral phenotyping of aging animals. Advances in Gerontology. 7(2).
  2. Jamie N. Justice, Christy S. Carter, Hannah J. Beck, Rachel A. Gioscia-Ryan, Matthew McQueen, Roger M. Enoka, Douglas R. Seals. 2014. Battery of behavioral tests in mice that models age-associated changes in human motor function. Age. 36.
  3. Kenkichi Takase, Yousuke Tsuneoka, Satoko Oda, Masaru Kuroda, Hiromasa Funato. 2016. High-Fat Diet Feeding Alters Olfactory-, Social-, and Reward-Related Behaviors of Mice Independent of Obesity. Obesity. 24(4).
  4. Andrea Wolf, Björn Bauer, Erin L. Abner, Tal Ashkenazy-Frolinger, Anika M. S. Hartz. 2016. A Comprehensive Behavioral Test Battery to Assess Learning and Memory in 129S6/Tg2576 Mice. PLOS One. 11(1).
  5. Mian Peng, Ce Zhang, Yuanlin Dong, Yiying Zhang, Harumasa Nakazawa, Masao Kaneki, Hui Zheng, Yuan Shen, Edward R. Marcantonio, Zhongcong Xie. 2016. Battery of behavioral tests in mice to study postoperative delirium. Scientific Reports. 6: 29874.
  6. Lad HV, Liu L, Paya-Cano JL, Parsons MJ, Kember R, Fernandes C, Schalkwyk LC. 2010. Behavioural battery testing: evaluation and behavioural outcomes in 8 inbred mouse strains. Physiology and Behavior. 99(3): 301-16.
  7. Kellie L. McIlwain, Michelle Y. Merriweather, Lisa A. Yuva-Paylor, Richard Paylor. 2001. The use of behavioral test batteries: Effects of training history. Physiology and Behavior. 73.
  8. Richard E. Brown, Lianne Stanford, Heather M. Schellinck. 2000. Developing Standardized Behavioral Tests for Knockout and Mutant Mice. ILAR Journal. 41(3).
  9. V. Võikar, E. Vasar, H. Rauvala. 2004. Behavioral alterations induced by repeated testing in C57BL/6J and 129S2/Sv mice: Implications for phenotyping screens. Genes, Brain and Behavior. 3.

About Adam Fitchett

Adam Fitchett has an MSc in neuroscience from University College London and a BSc in biochemistry from Sussex University. He has conducted research into the molecular underpinnings of long term memory, as well as the treatment of neurodegeneration. Adam enjoys writing on a range of scientific topics for both a professional and a general audience.