LEARNING THEORY FOR TRAINERS
I shall be giving an overview of the theory behind our dog training. I believe this is relevant to all who choose to educate themselves and the various individuals in their lives. There is a plethora of information about behavior and learning theory in books, on tapes (audio and video), and on the Internet. I shall not attempt to discuss all the ramifications of each topic but rather give a brief explanation with examples. Perhaps this will encourage others to explore some of the topics more thoroughly. By understanding the why (and the theory) behind various teaching/learning methods, we can make more informed choices about which method to use and how to teach our dogs what is desired.
I will, however, share the names of books that I have found to be extremely helpful. The many books and articles by Karen Pryor focus more on operant conditioning. Her books that I have and enjoy are "Don’t Shoot the Dog!", "Lads Before the Wind", and a collection of her essays and research papers compiled in "On Behavior". Another author whom I also enjoy is Pamela J. Reid, PhD, who, in addition to her articles in Dogs in Canada, has written a book titled "Excel-erated Learning: Explaining in Plain English How Dogs Learn and How Best to Teach Them".
This article shall first focus on general learning theory, and then I’ll discuss four other topics leading to the final topic of teaching our dogs using a conditioned reinforcer.
Learning Theory
Vicarious Learning (Modeling)
In this type of learning, an individual learns through observation of another’s behavior. Part of vicarious learning is knowing not only what to do but when to do it. In simpler terms, it is mimicry. The individual watches others to see what behavior to do, when to do it, and what the likeliest results will be if/when the behavior is done. A common example of vicarious learning is how a young pup learns various household routines from the older dog(s) in the house. To expand: each of my housegirls has a special spot that she retires to when we have "quiet times." Not only do the girls go to their respective spots, but they must also lie down before getting a treat. Puppies rapidly learn that the treats come more quickly the faster they go and lie down in "their" spots.
Classical Conditioning
Pavlov recognized that a dog began to salivate involuntarily at the sound of a bell that preceded the dog’s feeding. He noted other associations that also stimulated the dog’s saliva production: the lab attendants walking to the laboratory and then the activity of the lab attendants at the food storage cupboard. Pavlov then conducted further experiments pairing feeding with other stimuli such as different types of sounds.
In the beginning of classical conditioning, the response of the animal (e.g. dog) is involuntary. Then the animal begins to experience and eventually memorizes the various contingencies in its environment. It learns about anticipation. In anticipation, the animal learns that there are predictable relationships between events. An example is the lab attendant walking into a lab, Pavlov’s dog anticipating being fed, and the dog salivating. The animal then learns to respond to the first event in anticipation of the second event. Even though there is anticipation, it must be emphasized that the response to the stimulus is involuntary (e.g. the dog couldn’t help salivating while the food was being prepared).
Anticipation can apply to behaviors other than those associated with food. For example, the leaping and howling that sled dogs do when they see their harnesses clearly demonstrate their association of the appearance of the harnesses with the joy of running through the snow. Pavlovian conditioning, or respondent conditioning (as termed by B.F. Skinner), is responsible for many actions and reactions in all organisms. It must be emphasized that the organism has little control, as the environment elicits the (responding) behavior from it.
Strong repetitive fear reactions are examples of classical conditioning to a previous event or events. For example, a thunderstorm terrifies a young puppy the first time he hears it. Every storm thereafter can elicit an involuntary fear reaction from the puppy.
Counter conditioning is utilized to reverse the fear-based conditioning that occurred previously. This is done by presenting a different unconditioned stimulus which elicits an unconditioned response that is incompatible with fear. Because the procedure of counter conditioning can be difficult, desensitization is often added. Desensitization involves a gradual increase in the fear-provoking stimulus until the full amount of the stimulus is reached. In the process, a new unconditioned stimulus is added to replace the fear-provoking stimulus.
In our example of a thunderstorm phobia, the counter conditioning ideally involves a recording of thunderstorms (such as the CD produced by Solitudes), lots of treats, patience, and timing. As for timing, the counter conditioning for thunderstorms should be done in the winter months when there are no thunderstorms. Initially play the recording very quietly and associate the sound of thunder with a treat. As the dog relaxes, the sound can be intensified. Once the dog is comfortable with the recorded sounds, the handler must remember to reinforce this counter conditioning at the onset of and during actual thunderstorms until the dog is able to relax on its own. Counter conditioning is a long process and, to be effective, should not be rushed at all.
It is important to understand classical conditioning. As Pamela Reid explains: "if you don’t understand how it works, you can end up incorporating procedures that interfere with what you are trying to accomplish. You can be in for an extremely difficult time if you try to teach a behavior that is incompatible with the behaviors that are elicited by reinforcement."
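The gradual increase in volume described for the thunderstorm recording can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name plan_volumes, the volume scale, the step size, and the rule "hold the same level whenever the dog was not relaxed" are all hypothetical choices made for the sketch, not a protocol from the article or from any of the books mentioned.

```python
def plan_volumes(relaxed_by_session, start=10, step=10, ceiling=100):
    """Return the playback volume used in each session.

    relaxed_by_session: one True/False per session, True if the dog stayed
    relaxed at the volume played in that session (hypothetical record kept
    by the handler).
    The volume is raised by `step` only after a relaxed session; otherwise
    the next session repeats the same level (an assumption of this sketch).
    The volume never exceeds `ceiling`.
    """
    volumes = []
    level = start
    for relaxed in relaxed_by_session:
        volumes.append(level)
        if relaxed:
            level = min(level + step, ceiling)
    return volumes


if __name__ == "__main__":
    # Four sessions: relaxed, not relaxed, relaxed, relaxed.
    print(plan_volumes([True, False, True, True], start=10, step=10, ceiling=30))
```

The point of the sketch is simply that the stimulus is intensified only as the dog relaxes, which is why the process cannot be rushed.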
Operant Conditioning
Edward L. Thorndike, a behaviorist pre-dating B.F. Skinner, proposed a "Law of Effect": "If a consequence is pleasant, the preceding behavior becomes more likely. If a consequence is unpleasant, the preceding behavior becomes less likely." Or, in another quote from Thorndike, "Pleasure stamps in, pain stamps out." In operant conditioning, there are four possible scenarios: two that increase the likelihood of the behavior recurring and two that decrease the likelihood that the behavior will recur. These are:
Positive Reinforcement: produces a pleasant effect. E.g. the dog sits on command and receives a treat.
As Operant Conditioning is a large topic and constitutes the basis of the majority of our dog training, I shall explore it further in its own section.
Operant Conditioning
To review, in operant conditioning, the subject changes its behavior in response to achieving a goal or desired effect. In 1938, B.F. Skinner developed the basic concept of operant conditioning. He claimed this type of learning was not the result of stimulus-response learning (i.e. classical/Pavlovian conditioning) but rather of the subject making a choice and the resultant reinforcer of that choice. Skinner chose the term "operant" conditioning to denote that the subject is the operator, as it were, of its choices and not just a passive participant.
Skinner learned there are two kinds of reinforcement that strengthen the subject’s response, thus increasing the probability that the behavior will recur. One of the reinforcements (positive) ADDS something pleasurable for the subject (e.g. a food treat). The other reinforcer (negative) accomplishes its effect by REMOVING something unpleasant or aversive from the subject’s environment (e.g. a tight choke collar or ear pinch).
He also learned there are two punishments which weaken the probability that the behavior will recur. The first, positive punishment, adds a direct or aversive effect to the subject (e.g. a strong jerk of the choke collar). The other punishment (negative punishment) has also been termed extinction, although strictly speaking negative punishment removes something the subject wants, while extinction withholds the reinforcement that has been maintaining a behavior. In extinction, an undesired behavior has no effect upon the subject’s environment and will then fade. (E.g. a dog who begs for a treat will eventually give up if he is totally ignored every time. Any attention or reaction from the person with the food, whether it be positive or negative attention, will only increase the begging behavior.)
Skinner found that the two methods of punishment are not as effective as the reinforcement methods. In punishment, the focus is not on the actual action of the subject but on what the behavior should be. Punishment can also elicit some undesired emotions (e.g. resentment, anger, apathy) and psychological problems, and is not conducive to continued relationships.
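The four scenarios of operant conditioning can be laid out as a small lookup table. The sketch below is only an illustration: the names Change, Stimulus, QUADRANTS, and classify are hypothetical, and the table uses the common textbook reading in which negative punishment means taking away something pleasant. Extinction, where a behavior is simply ignored until it fades, is not a cell in this table.

```python
from enum import Enum


class Change(Enum):
    """What happens to the stimulus right after the behavior."""
    ADDED = "added"
    REMOVED = "removed"


class Stimulus(Enum):
    """How the subject experiences the stimulus."""
    PLEASANT = "pleasant"
    UNPLEASANT = "unpleasant"


# (change, kind of stimulus) -> (name of the scenario, effect on the behavior)
QUADRANTS = {
    (Change.ADDED, Stimulus.PLEASANT): ("positive reinforcement", "more likely"),
    (Change.REMOVED, Stimulus.UNPLEASANT): ("negative reinforcement", "more likely"),
    (Change.ADDED, Stimulus.UNPLEASANT): ("positive punishment", "less likely"),
    (Change.REMOVED, Stimulus.PLEASANT): ("negative punishment", "less likely"),
}


def classify(change, stimulus):
    """Return (scenario name, effect on the preceding behavior)."""
    return QUADRANTS[(change, stimulus)]


if __name__ == "__main__":
    examples = [
        ("food treat given", Change.ADDED, Stimulus.PLEASANT),
        ("tight choke collar released", Change.REMOVED, Stimulus.UNPLEASANT),
        ("strong jerk of choke collar", Change.ADDED, Stimulus.UNPLEASANT),
        # Hypothetical example, not taken from the article:
        ("toy taken away", Change.REMOVED, Stimulus.PLEASANT),
    ]
    for description, change, stimulus in examples:
        name, effect = classify(change, stimulus)
        print(f"{description}: {name} (behavior becomes {effect})")
```

"Positive" and "negative" here mean only "added" and "removed"; whether the behavior becomes more or less likely depends on the combination.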
Specifically, in dog training these four effects of operant conditioning are generally known as Positive Reinforcement, Negative Reinforcement, Punishment, and Extinction. While the current trend is teaching dogs using positive reinforcement, there are times, I believe, when each of the other methods must be used. The danger in following one method exclusively is that the trainer becomes so blinded by "their method" that the individual dog and its unique needs are forgotten. What works well with one dog (or a particular breed) may be totally different from what is needed with a different dog or breed.
As mentioned previously, there is a plethora of information available on the various methods and cures for dog training and dog-related problems. It is impossible to explore the scope of each method thoroughly in an article, so I encourage readers to search out the information that is available. Every person who has trained many dogs has favorite methods, teachers, and ideas that they use. I urge each trainer to search for new ideas constantly. While you may not agree with all the methods a particular person uses, perhaps something they say or do might be useful to try. At this point, I’ll briefly cover the four methods of Operant Conditioning.
Positive Reinforcement
Negative Reinforcement
Punishment
Extinction
It is important to understand the difference between reinforcement and reward. Reinforcement (positive or negative) can be defined as anything that, used in conjunction with a behavior, increases the probability of that behavior recurring. It occurs during or immediately after the conclusion of the desired behavior. (In this way it gives the subject/dog immediate information about its behavior.) Reward and punishment both occur after the act is completed. There is little or no direct correlation between the behavior performed and the reward/punishment, as the emphasis is on the behavior desired. In the next section, I shall focus on the Positive Reinforcement method of Operant Conditioning.
Positive Reinforcement
People who use positive reinforcement in their dog training and daily life must become aware of the positive effects that are created. Who among us does not like receiving a "well done", a reward/surprise, or the opportunity to enjoy a favorite activity when we have done a task well? Our dogs are no different from us. They, too, like the praise, rewards, activities, and recognition that we strive for in our daily lives.
In review, in positive reinforcement (of operant conditioning) a subject chooses to change its behavior to receive a positive effect (such as praise, a reward, or a favorite activity) in its environment. A reinforcement occurs during or immediately after the behavior is done. In positive reinforcement, the subject wants to achieve the reward and changes its behavior until it gets it. Repeated attempts, with rewards following similar changes in behavior, ultimately change the behavior. This is pure behavior modification using incentives.
What can be used as reinforcements? The answer is simple: whatever works! The reinforcement must be species appropriate (e.g. while a dolphin would work happily for a fish, a horse wouldn’t) and be something that the subject enjoys. Attempting to use food as an incentive for a ball-crazy dog is as useless as trying to use a ball as an incentive for a strongly food-oriented dog. Changing the reinforcements, even during a training session, adds variety and increases the attention and enjoyment of the subject.
The size of the reinforcement in a training session is very important. The premise is to use as small an amount of the motivator as possible to get the subject (e.g. dog) to do the behavior. Small amounts of the motivator increase the subject’s interest in getting more of it and keep its focus on the training session and not, for instance, on playing ball.
When using food, use tiny amounts, as the idea is to get as many behaviors as possible, not to feed a meal during the training session. If the dog becomes satiated, he won’t have the same desire to work. When using a toy or activity as the motivation, keep the sessions very brief; otherwise the dog won’t be interested in working at all.
There is one exception to using small amounts of the motivator. Karen Pryor terms this exception a "jackpot". Jackpots are much larger than the normal motivator. (For instance, instead of giving a tiny piece of a hot dog, a whole hot dog is given.) They are big surprises and should not be over-used, so as to keep the surprise effect. Jackpots are wonderful for marking a large breakthrough in a training session and are extremely effective.
The timing of the reinforcements is extremely important. Generally, if a new trainer is having difficulty in teaching his dog, the problem is with the timing of the reinforcement. Given too early, it doesn’t reinforce the actual behavior desired but rather the behavior that precedes the desired behavior. Too early a reinforcement is also bribery, which is highly ineffective. If the reinforcement is given too late, the opportunity for acknowledging the actual behavior desired has been lost. Again, it is ineffective. Correctly timed reinforcements do change behavior. Reinforcements communicate information to the subject about its behavior and must be given during or immediately after the desired behavior is achieved.
There are three types of schedules that outline how reinforcements should be given to the subject. The first schedule, constant reinforcement, is to be used in the learning phase. This means the subject (dog) receives the positive reinforcement (treat) each time it changes its behavior on cue (lies down). The reinforcer acts as information, and the dog must learn that when the cue (in this case, "down") is given, he must change his behavior (i.e. lie down) in order to receive the positive reinforcement (treat). It has been shown through studies that while the subject continues to learn at a steady and moderate rate, overall there is a gradual slowing of the subject’s response times, with brief and predictable pauses between the cue and the behavior shown.
The second type of reinforcement schedule is the fixed ratio schedule. There is a fixed number of correct responses required of the subject before it receives a reward. This could be as often as 2:1 (two responses to one reward) or more spread out, such as five responses to one reward. Studies have shown that the subject responds at a high and steady rate except immediately after the reinforcement, before the next behavior is given. This is termed the post-reinforcement pause. The more responses the subject must make before it is rewarded, the longer the post-reinforcement pauses become.
The third schedule is the variable ratio schedule. In this, there is no set ratio of responses required for rewards. It allows great spontaneity for the trainer in rewarding the dog. For example, such a schedule might be one response/one reward; three responses/one reward; one response/one reward; five responses/one reward; two responses/one reward. The strength of the variable ratio schedule is its unpredictability. Studies have shown that the subject responds at a high, steady rate with a minimal post-reinforcement pause. A classic example is a slot machine.
Generally speaking, once a subject (e.g. dog) has learned a behavior, it should be put on a variable ratio schedule. There is, however, one exception to this. Any time a subject must make a choice or solve a puzzle (e.g. scent discrimination), the subject must be rewarded every time (constant reinforcement schedule). This is the only exception and should be adhered to without question.
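The three schedules can be compared side by side with a short Python sketch. It is an illustration only: the function names are hypothetical, each function simply answers "is this correct response rewarded?" for a run of responses, and the way the variable ratio schedule picks its next requirement (a random whole number between 1 and twice the average minus one) is an assumption of the sketch rather than something prescribed in the article.

```python
import random


def constant_schedule(n_responses):
    """Constant reinforcement: every correct response is rewarded."""
    return [True] * n_responses


def fixed_ratio_schedule(n_responses, ratio):
    """Fixed ratio: reward every `ratio`-th correct response (2 means 2:1)."""
    if ratio < 1:
        raise ValueError("ratio must be at least 1")
    return [i % ratio == 0 for i in range(1, n_responses + 1)]


def schedule_from_counts(counts):
    """Build a schedule from an explicit list of responses required per reward."""
    rewards = []
    for required in counts:
        rewards.extend([False] * (required - 1) + [True])
    return rewards


def variable_ratio_schedule(n_responses, mean_ratio, seed=None):
    """Variable ratio: the number of responses required changes unpredictably.

    Each requirement is drawn uniformly from 1 .. 2 * mean_ratio - 1, so it
    averages `mean_ratio` (an assumption made for this sketch).
    """
    if mean_ratio < 1:
        raise ValueError("mean_ratio must be at least 1")
    rng = random.Random(seed)
    rewards = []
    required = rng.randint(1, 2 * mean_ratio - 1)
    since_last_reward = 0
    for _ in range(n_responses):
        since_last_reward += 1
        if since_last_reward == required:
            rewards.append(True)
            since_last_reward = 0
            required = rng.randint(1, 2 * mean_ratio - 1)
        else:
            rewards.append(False)
    return rewards


if __name__ == "__main__":
    print(constant_schedule(4))
    print(fixed_ratio_schedule(6, 2))
    # The example from the text: 1, 3, 1, 5, then 2 responses per reward.
    print(schedule_from_counts([1, 3, 1, 5, 2]))
    print(variable_ratio_schedule(12, mean_ratio=3, seed=0))
```

Written out this way, the fixed ratio pattern is easy to predict, which fits the post-reinforcement pause described above, while the variable pattern gives the subject no way of knowing which response will pay off.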
Coupling the positive reinforcement techniques with an attitude of kindness, love, and clarity of purpose will give the trainer an obedient and educated dog. But this dog, unlike those trained exclusively with negative reinforcement, will be an individual who is confident, able to be flexible, has a developed sense of humor, can think and reason, and has a desire to learn more.
When a trainer has developed a clear and concise method of communicating exactly what is desired to his dog, the dog’s education will take a quantum leap forward. Such a method is clicker training which, when used with shaping techniques, enables a trainer to teach his dog easily and without force. So before a clicker is picked up and clicked for the first time, a trainer must have an idea how to "shape" the behavior he is teaching.
Shaping
The successive approximations are also used in a larger context to shape a series of learned behaviors/tasks. These are termed behavior chains. Teaching a behavior chain will not be successful, however, if even one of the components in the chain has not been solidly learned or the behavior of the subject (e.g. dog) has not been brought under stimulus control. Dogs who successfully learn behavior chains become multi-tasking individuals such as service dogs, obedience dogs, movie dogs, and search and rescue dogs.
The secret to behavior chains is teaching the last behavior in the chain first. The subject is rewarded after this behavior and learns to look for the reward after this particular behavior. Then the second-to-last behavior is taught, coupled with the last behavior before the reward. Other behaviors are then added in front of the previously learned behaviors until the subject has learned a series of tasks/behaviors before being rewarded.
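The order in which a behavior chain is taught, last behavior first, can be sketched as a short Python function. The function name back_chain_plan and the four behavior names in the example are hypothetical and chosen only for illustration; they are not the author’s own teaching steps.

```python
def back_chain_plan(chain):
    """Return the training stages for a behavior chain, last behavior first.

    chain: behaviors in the order the dog will finally perform them.
    Each stage lists what the dog performs before the reward at that stage:
    first only the last behavior, then the last two, and so on until the
    whole chain is performed before the reward.
    """
    chain = list(chain)
    return [chain[i:] for i in range(len(chain) - 1, -1, -1)]


if __name__ == "__main__":
    # Hypothetical chain, for illustration only.
    stages = back_chain_plan(["go out", "pick up", "return", "give"])
    for number, stage in enumerate(stages, start=1):
        print(f"Stage {number}: " + " -> ".join(stage) + " -> reward")
```

Because each new behavior is added in front of ones already learned, the dog is always working toward a part of the chain it knows ends in a reward.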
To illustrate, consider the many behaviors and steps in the retrieve on the flat. In the full exercise, the dog must learn to accept the dumbbell, carry it, and relinquish it upon command, as well as physically fetch the dumbbell. The dog must also know the finish, or return to heel, exercise. The steps that I use to teach retrieve on the flat are:
The following are Karen Pryor’s Ten Laws of Shaping from "Don’t Shoot the Dog!". I refer readers to this book for her thorough discussion of each point.
"Clicker training" is a popularized term for operant conditioning using a conditioned reinforcer to teach/train positively. It is a valuable tool in shaping behaviors precisely. The term comes from the widespread use of a popular metal clicker encased in a rectangular plastic box. (These can be obtained from The Clickerpet.com.) But the training can be done using other tools. Other conditioned reinforcers that have been used are objects making a unique and distinct noise (hair clip, ballpoint pen, stapler, small metal bottle cap with a "freshness pop up seal"), a small flashlight, a special touch on the side of a dog’s face, or even a thumbs up sign.
There is a distinct difference between an unconditioned and a conditioned reinforcer. An unconditioned or primary reinforcer is something the animal would want even without training. A conditioned or secondary reinforcer is an initially meaningless signal or stimulus that stands for one or more primary reinforcers. The animal, with training, learns to want the conditioned reinforcer. In the training, the conditioned reinforcer is paired directly with a primary reinforcer that the animal does want (such as a treat). The animal then learns that the desired object/treat will come after the sound, for instance, of the conditioned reinforcer. Thus the animal learns that click = treat = something he wants. When the pairing of the click and treat is solid, then the equation becomes click = something he wants. The something the animal wants can ultimately evolve into a desire for approval, recognition, and/or a sense of accomplishment.
The clicker is a teaching tool and, like many other aids in training, once the animal has learned the behavior/task that it was taught with the clicker, the clicker can be phased out of use.
The behavior that was learned is then maintained with praise, approval, and a variable schedule of reinforcement with a primary reinforcer such as a food treat. Then the trainer can use clicker work to teach other behaviors and tasks.
Clicker work or other conditioned reinforcers appear to be a more powerful tool than other methods. Here are some reasons why this is so. A trainer who starts to explore the uses of conditioned reinforcers, and clicker work in particular, can easily become a convert. The ability to teach a variety of behaviors to a variety of different species is limited only by the trainer’s imagination and patience. Clicker work requires trainers to think, plan, and develop patience. While the behaviors may seem to take longer to teach, once they are learned properly there is little need for problem solving/retraining. Refinements in behavior (such as a straighter sit, higher jump, or pricking ears) can be taught with clickers. Desired behaviors (such as choosing the right glove in directed retrieve, or downing quickly in drop on recall) can be positively reinforced at a distance without breaking the flow of the whole exercise. The dog is able to understand and process the click as information that he did the task correctly even while still doing the full exercise.
Fortunately, as clicker training increases in popularity, there are numerous resources that people can access for information. Many articles have appeared in various dog magazines. Videos and books are also available. Now, thanks to the Internet, there is a plethora of information on various websites devoted to clicker training. Two sites I enjoy are Karen Pryor’s www.clickertraining.com and Gary Wilkes’ www.clickandtreat.com. A newsletter (The Clicker Journal) is also available for clicker trainers. It can be accessed through www.clickertrain.com.
Clicker training can be used with any age of dog, from pups in the whelping box to seniors. Puppies and adolescents, who are typically unfocused with short attention spans, really benefit from clicker work. Reading about clicker work is no substitute, however, for experience.
Experience comes directly from clicking, treating, and evaluating the results constantly. A training journal is very useful to chronicle and evaluate the dog’s progress. Dog training instructors who teach clicker training are available. It is advisable to have an instructor assist a new clicker trainer, particularly until correct timing is learned. Clicker training, a precise way of teaching our dogs positively, is a powerful tool for a thinking handler. Best of all, it is fun for all. The trainer becomes more motivated and stimulated to "train" the dog, and the dog quickly becomes a highly educated Happy Dog!
In closing, I’d like to emphasize that it is important for us to recognize the various ways that our dogs (and others in our environment) learn. By recognizing the various behaviors which may or may not be interfering with what we are trying to teach, we will be better prepared to teach what is desired with more ease and less confusion and frustration for both instructor and student. Knowing the theory behind how and why we teach our dogs a particular skill empowers the trainer and ultimately benefits our companions.