Positive Punishment Using Operant Conditioning

In our previous article, we discussed positive reinforcement, a powerful form of learning that is established using operant conditioning and reinforcing stimuli such as food, sucrose water, or euphoria-inducing drugs. In positive reinforcement, a pleasurable stimulus is provided upon completion of a specific action, such as a nose poke or a lever press, which strengthens that behavior. But reinforcement is not the only way to implement operant conditioning. Punishment is an equally successful way to manipulate behavior, only this time the goal is to weaken the expression of a specific behavior.

Punishment protocols can be positive or negative. Although describing a punishment as positive seems self-contradictory, the terms positive and negative do not reflect whether the subject likes or dislikes the protocol. Positive punishment occurs when an aversive stimulus is delivered after the expression of an undesirable behavior, while negative punishment occurs when a pleasurable stimulus is removed upon the expression of an undesirable behavior. It is important to keep in mind that both positive and negative punishment protocols aim to weaken a behavior. Whether a protocol is characterized as positive or negative depends solely on whether the experimenter adds or removes an external stimulus.

In the present article, we will discuss positive punishment and elaborate on the brain regions involved in the expression of the behavior, the most widely used experimental protocols, and the relevant scientific questions.

Which brain regions orchestrate positive punishment learning?

Punishment learning is a complex process that requires the coordinated function of numerous brain regions. Several lines of experimental evidence indicate that serotonergic and noradrenergic signaling mediate punishment. Lesions of serotonin-containing terminals and systemic injections of serotonin antagonists or serotonin synthesis inhibitors have anti-punishment effects. Similarly, norepinephrine and norepinephrine agonists have strong anti-punishment effects, and anti-punishment anxiolytic drugs tend to increase norepinephrine activity and release.^[1] However, the neural correlates of punishment have not yet been fully elucidated, and new techniques now allow us to characterize specific aspects of this behavior further. Our current understanding of positive punishment learning in particular remains incomplete. Nonetheless, a few brain regions are implicated in punishment learning, because lesions and pharmacological disruption of their function impair the behavior.

Amygdala

The amygdala, and especially the basolateral nucleus (BLA), has long been implicated in coding fear in the brain. While its role in Pavlovian fear conditioning has been well established for decades, its involvement in punishment learning was less well understood. Several studies have shown that BLA lesions and inactivation interfere with the expression of punishment.^[1]

A recent study provided evidence that BLA activity is necessary for the acquisition and expression of positive punishment. The researchers infused GABA agonists locally into the BLA and assessed positive punishment learning using an operant conditioning apparatus. First, they trained the rodents to press a lever to obtain a food reward. Then, lever pressing was followed by both a food reward and a foot shock. The rodents in the control group quickly stopped pressing the lever, indicating that they had acquired punishment learning. However, the experimental group, which received GABA agonist infusions into the BLA, failed to suppress lever pressing, indicating an impairment in punishment learning. Therefore, intact BLA function is necessary for punishment learning.^[2]
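One way to put a number on how strongly lever pressing is suppressed is a suppression ratio that compares pressing under punishment with pressing at baseline. The sketch below is only an illustration and is not the measure reported in the cited study; the function name and the press counts are hypothetical.

```python
def suppression_ratio(baseline_presses: int, punished_presses: int) -> float:
    """Return punished / (punished + baseline).

    0.5 means pressing is unchanged by punishment (no suppression);
    0.0 means pressing stopped completely (full suppression).
    """
    total = baseline_presses + punished_presses
    if total == 0:
        raise ValueError("no lever presses recorded in either phase")
    return punished_presses / total


# Hypothetical counts: a control animal that stops pressing once shocks begin,
# and an animal that keeps pressing at its baseline rate.
control = suppression_ratio(baseline_presses=60, punished_presses=20)
impaired = suppression_ratio(baseline_presses=60, punished_presses=60)
print(control, impaired)
```

Under this convention, a group with impaired punishment learning would stay close to 0.5, while an intact group would drift towards 0.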

Prefrontal cortex (PFC)

The PFC has been linked to aversion learning and behavioral control. However, there is no compelling evidence attributing a specific role in positive punishment learning to any PFC subregion. Given the well-established participation of the PFC in the general encoding of fear-related signals, scientists suggest that it is very challenging to separate a function of the PFC in punishment learning from its function in fear.^[1]

Ventral tegmental area (VTA)

The role of the VTA and midbrain dopaminergic neurons in the reinforcement of behavior is well characterized. Since stimulation of these neurons is rewarding, it has been proposed that their inhibition could mediate the encoding of punishment. There is evidence that inhibition of dopaminergic signaling participates in aversion coding,^[3] but the exact role of the VTA in positive punishment learning has not been assessed yet.^[1]

How to perform a positive punishment experiment

Positive punishment experiments are performed in the operant conditioning apparatus. This is a large box that typically contains one or more levers, which the animal learns not to press in order to avoid punishment. The floor of the apparatus can be electrified so that the experimenter can deliver mild electric shocks as a punishment.

Mild electric shocks are the most widely used punishers in the literature, yet recent guidelines for animal research call for reducing pain-causing behavioral paradigms whenever possible. Therefore, researchers are shifting to other stressors that do not cause pain to the rodents. Characteristic examples include auditory stimuli, such as loud broadband noise and normal-intensity recordings of the ultrasonic vocalizations that rodents normally emit in aversive situations.^[4][5] Recently, exposure to light was also found to be an effective aversive stimulus in punishment experiments.^[6] Finally, brief air puffs can also serve as punishing stimuli for mice and rats.^[1]

The simplest experimental design for a positive punishment experiment is to teach the rodent that a specific action, such as a lever press or a nose poke, results in the delivery of a punishment. Since positive punishment experiments are a type of operant conditioning, the behavior that is being punished must be spontaneously expressed. The experimenter must wait for the rodent to press the lever of its own accord and only then initiate the schedule of punishment, so that the rodent associates the behavior with the negative consequence.

Other protocols additionally involve the presentation of a discriminative stimulus. This stimulus, which can be visual or auditory, signals that the punishment protocol is in effect. When the discriminative stimulus is absent, the punishment protocol is not applied and the rodent can express the behavior without receiving punishment. The discriminative stimulus adds a level of complexity to the experiment and increases its sensitivity, but it also requires additional time for the learning process to be completed.
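The two designs described above differ only in when the punishment contingency is active, which can be sketched in a few lines. This is a minimal illustration, not control software for a real apparatus: the function name, the press times, and the cue windows are all hypothetical.

```python
from typing import List, Optional, Tuple


def punished_presses(
    press_times: List[float],
    cue_windows: Optional[List[Tuple[float, float]]] = None,
) -> List[float]:
    """Return the lever-press times (s) that should trigger the punisher.

    press_times: times of spontaneous lever presses.
    cue_windows: (start, end) intervals during which the discriminative
        stimulus is on. None means the simple design, in which the
        punishment contingency is always active.
    """
    if cue_windows is None:
        return list(press_times)
    return [
        t for t in press_times
        if any(start <= t < end for start, end in cue_windows)
    ]


presses = [2.0, 7.5, 14.0, 31.0, 42.5]      # hypothetical session
cue_on = [(10.0, 20.0), (40.0, 50.0)]       # hypothetical cue periods

print(punished_presses(presses))            # simple design: every press is punished
print(punished_presses(presses, cue_on))    # signalled design: [14.0, 42.5]
```

In both cases the punisher is driven by the animal's own presses; the code never schedules a punisher without a preceding response.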

Finally, more complicated protocols may be used. For instance, a recent study assessing the aversive properties of a drug used the following strategy. The operant conditioning apparatus had two levers. If the rodent pressed lever 1, it received a food pellet. If it pressed lever 2, it received a food pellet plus an injection of the drug. Because the drug had aversive properties, the rodents learned to press only lever 1 and not lever 2. These types of studies are always combined with positive and negative controls, using drugs with well-known reinforcing or aversive properties, e.g. amphetamines or histamine, respectively.^[7]
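The outcome of such a two-lever choice procedure can be summarized as the fraction of presses made on the drug-paired lever. The sketch below assumes that an even split (0.5) indicates indifference; the function name and the counts are hypothetical and are not taken from the cited study.

```python
def drug_lever_choice(food_only_presses: int, food_plus_drug_presses: int) -> float:
    """Fraction of all presses made on the lever that also delivers the drug.

    Values well below 0.5 suggest the injection acts as a punisher,
    values well above 0.5 suggest it acts as a reinforcer.
    """
    total = food_only_presses + food_plus_drug_presses
    if total == 0:
        raise ValueError("the animal made no lever presses")
    return food_plus_drug_presses / total


# Hypothetical session: 45 presses on lever 1 (food), 5 on lever 2 (food + drug).
print(drug_lever_choice(food_only_presses=45, food_plus_drug_presses=5))  # 0.1
```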

Two important aspects of punishment experiments are contingency and contiguity. Contingency reflects the correlation between the behavior and the outcome. Strong contingency leads to fast learning, weak contingency leads to slow learning, and random contingency leads to no learning. In the majority of positive punishment experiments, a strong contingency is used: the behavior is followed by punishment every time it occurs.

Contiguity refers to the time between the behavior and the outcome. For punishment experiments, it is important to keep this interval short to achieve better learning. If other actions occur between the behavior and the punishment, the rodent may associate those actions with the punishment instead, which impedes the learning process.
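Both properties can be checked on a session log. The sketch below uses one common way to quantify contingency, the difference between the probability of punishment with and without the behavior, and the mean press-to-punisher delay for contiguity. The function names, the binning of the session, and the example values are assumptions made for illustration.

```python
from typing import List, Tuple


def delta_p(bins: List[Tuple[bool, bool]]) -> float:
    """Contingency as P(punisher | response) - P(punisher | no response).

    Each bin is (response_occurred, punisher_delivered).
    1.0 is a perfect contingency, 0.0 is a random one.
    """
    with_response = [punished for responded, punished in bins if responded]
    without_response = [punished for responded, punished in bins if not responded]
    if not with_response or not without_response:
        raise ValueError("need bins both with and without a response")
    return (sum(with_response) / len(with_response)
            - sum(without_response) / len(without_response))


def mean_delay(press_times: List[float], punisher_times: List[float]) -> float:
    """Contiguity as the mean delay (s) between each press and its punisher."""
    if not press_times or len(press_times) != len(punisher_times):
        raise ValueError("need one punisher time per press")
    delays = [q - p for p, q in zip(press_times, punisher_times)]
    return sum(delays) / len(delays)


strong = [(True, True)] * 4 + [(False, False)] * 4
random_schedule = [(True, True), (True, False), (False, True), (False, False)]

print(delta_p(strong))                        # 1.0
print(delta_p(random_schedule))               # 0.0
print(mean_delay([2.0, 7.5], [2.5, 8.0]))     # 0.5
```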


Comparing positive punishment with other operant conditioning protocols

When designing a positive punishment experiment, it is important to make sure that your experiment is indeed assessing positive punishment and not a similar yet distinct behavior. In this section, we briefly point out the differences between positive punishment and other aversive learning protocols, namely active avoidance and escape learning.

As mentioned above, positive punishment occurs when a behavior leads to the addition of an aversive stimulus. For example, if the rodent presses the lever, it hears a loud noise.

During active avoidance, the expression of a behavior prevents the presentation of an aversive stimulus. If the rodent presses the lever, it does not hear a loud noise that is otherwise played on a predetermined schedule, for example, every 20 seconds.

In the escape learning paradigm, the expression of the behavior results in the removal of a currently present aversive stimulus. Using our example, if the rodent presses the lever, a loud noise that is present in the background will stop. Both active avoidance and escape learning result in the strengthening of the behavior and thus are considered negative reinforcement protocols, while positive punishment results in the weakening of a behavior.

Last but not least, positive punishment must not be confused with Pavlovian fear conditioning, where the presentation of a cue is followed by an aversive stimulus, such as a loud noise, regardless of what the subject does. In all operant conditioning paradigms, including positive punishment, the subject must operate, that is, express a specific behavior, in order to receive an outcome.
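The distinctions drawn in this section come down to three questions: is a response required, is the stimulus added, removed, or prevented, and is it aversive or pleasant? The lookup below is a sketch of that decision logic; the function name and the string labels are hypothetical, and combinations that the article does not discuss raise an error.

```python
def classify_procedure(response_required: bool, change: str, valence: str) -> str:
    """Name the procedure from its defining features.

    change:  'added', 'removed' (stimulus already present) or
             'prevented' (a scheduled stimulus is cancelled by the response).
    valence: 'aversive' or 'pleasant'.
    """
    if not response_required:
        # The outcome follows a cue, not the subject's behavior.
        return "Pavlovian conditioning"
    table = {
        ("added", "aversive"): "positive punishment",
        ("removed", "pleasant"): "negative punishment",
        ("added", "pleasant"): "positive reinforcement",
        ("removed", "aversive"): "negative reinforcement (escape learning)",
        ("prevented", "aversive"): "negative reinforcement (active avoidance)",
    }
    if (change, valence) not in table:
        raise ValueError(f"combination not covered: {change}, {valence}")
    return table[(change, valence)]


# The lever-and-noise examples from the text:
print(classify_procedure(True, "added", "aversive"))      # press -> noise plays
print(classify_procedure(True, "prevented", "aversive"))  # press -> scheduled noise skipped
print(classify_procedure(True, "removed", "aversive"))    # press -> ongoing noise stops
print(classify_procedure(False, "added", "aversive"))     # cue -> noise, no press needed
```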

Which scientific questions can be addressed using positive punishment?

There are three main types of questions that can be studied using positive punishment experiments. The first concerns the neural correlates of this type of operant conditioning. Over the years, reinforcement studies have been more popular in the scientific community, which has left us with little mechanistic insight into punishment. This may originate from Skinner’s belief that reinforcement changes behavior more effectively than punishment. Nevertheless, punishment is still applied in many aspects of everyday life that shape our behavior, ranging from parenting to justice and law. It is important to know how we learn through punishment, which brain regions are involved in the process, and specifically how they facilitate punishment learning.

The second type of question concerns the efficiency of punishment protocols. We need to understand which type of punishment protocol is most efficient, in both the short and the long term. Positive punishment associates an undesired behavior with an unpleasant outcome but provides no information on the “correct” behavior. As a result, punished behaviors commonly reinstate very quickly once the punishing agent is removed. Since positive punishment requires the administration of aversive stimuli, a process that must inherently be limited in time, it is necessary to understand how to make the most of it. Different administration schedules or even combinations of aversive stimuli may be more successful, but experimental evidence is needed to answer these questions. Moreover, given the current reevaluation of acceptable experimental protocols in rodent research, we need to validate additional aversive stimuli, such as air puffs, intense light, and ultrasound, for use in punishment experiments.

Finally, the third type of question that can be addressed with positive punishment experiments concerns drugs with reinforcing or aversive properties. When a new compound is synthesized, it is important to prove its value as a therapeutic drug, but also to ensure that it does not have reinforcing or aversive properties that will later interfere with patients’ behavior. Combining punishment studies with the administration of drugs can provide critical information on their potential to modulate behavior under punishment, and thus to affect human behavior.

Concluding remarks

To conclude, positive punishment learning is a field with several open questions. Considering how widely it is applied in our lives, it is paradoxical that information about its neural basis remains scarce. Although positive punishment in parenting is nowadays less acceptable, at least in terms of physical punishment, other forms of positive punishment, such as giving children additional chores or homework, are gaining supporters. Depending on the severity of the punishing stimulus, the undesired behavior can be quickly and completely eliminated. Consequently, when a matter is of great importance and urgency, it is natural for parents to opt for this teaching strategy over milder approaches.

Keeping in mind the potential translational gap between rodents and humans, it is necessary to broaden our understanding of punishment learning in order to distinguish beneficial from damaging punishment and to improve our strategies towards future generations and ourselves.

References

1. Jean-Richard-Dit-Bressel, P., Killcross, S., & McNally, G. P. (2018). Behavioral and neurobiological mechanisms of punishment: implications for psychiatric disorders. Neuropsychopharmacology, 43(8), 1639.
2. Jean-Richard-Dit-Bressel, P., & McNally, G. P. (2015). The role of the basolateral amygdala in punishment. Learning & Memory, 22(2), 128-137.
3. Cohen, J. Y., Haesler, S., Vong, L., Lowell, B. B., & Uchida, N. (2012). Neuron-type-specific signals for reward and punishment in the ventral tegmental area. Nature, 482(7383), 85.
4. Reed, P., & Yoshino, T. (2008). Effect of contingent auditory stimuli on concurrent schedule performance: An alternative punisher to electric shock. Behavioural Processes, 78(3), 421-428.
5. Portfors, C. V. (2007). Types and functions of ultrasonic vocalizations in laboratory rats and mice. Journal of the American Association for Laboratory Animal Science, 46(1), 28-34.
6. Barker, D. J., Sanabria, F., Lasswell, A., Thrailkill, E. A., Pawlak, A. P., & Killeen, P. R. (2010). Brief light as a practical aversive stimulus for the albino rat. Behavioural Brain Research, 214(2), 402-408.
7. Minervini, V., Osteicoechea, D. C., Casalez, A., & France, C. P. (2019). Punishment and reinforcement by opioid receptor agonists in a choice procedure in rats. Behavioural Pharmacology, 30(4), 335-342.