|
Operant Conditioning
Operant conditioning, so named by psychologist B. F. Skinner, is the
modification of behavior (the actions of animals) brought about by the
consequences that follow upon the occurrence of the behavior. In simple
terms, behavior operates on the environment producing various effects.
The term operant conditioning marks a crucial distinction from Pavlovian
conditioning, which Skinner termed respondent conditioning: respondent
behavior, such as a dog's salivation or the knee-jerk reflex, has little
effect on the environment, and its occurrence is not changed by how effective
or ineffective it is in the environment. The two types of conditioning
are also conceptually different, as their names imply: operant behavior
is explained by its consequences (that is, functionally), while respondent
behavior is explained by its antecedents (that is, causally).
This distinction parallels the often-overlooked one between involuntary
behavior (reflexes) and voluntary behavior (acts). The former occur, given
some stimulus, essentially regardless of circumstances, and nothing ensures
that they act on the rest of the world; the latter are affected by how well
or poorly they work and hence are much more likely to do work for the
animal in the world.
Operant conditioning, sometimes called instrumental conditioning or instrumental
learning, was first extensively studied by Edward L. Thorndike (1874-1949).
Thorndike's most famous work investigated the behavior of cats trying
to escape from various home-made puzzle boxes. When first confined
in the boxes, the cats took a long time to escape from each. With experience,
however, ineffective responses occurred less frequently and successful
responses occurred more quickly, enabling the cats to escape in less and
less time over successive trials. In his Law of Effect, Thorndike theorized
that successful responses, those producing satisfying consequences, were
"stamped in" by the experience and thus occurred more frequently.
Unsuccessful responses, those producing annoying consequences, were "stamped
out" and subsequently occurred less frequently. In short, some consequences
strengthened behavior and some consequences weakened behavior. This effect
was (and sometimes still is) described as involving a strengthening of
the association between the response and its effect, suggesting some kind
of parallel to Pavlovian conditioning.
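The "stamping in" and "stamping out" described above can be illustrated
with a small numerical sketch. This is a toy model under stated assumptions,
not Thorndike's own procedure or data: the response names, the starting
strengths, and the stamp_in and stamp_out rates are all hypothetical, and
the probability of a response is simply assumed to be proportional to its
strength.

```python
def law_of_effect(trials=10, stamp_in=0.5, stamp_out=0.2):
    """Toy model: response strengths change with their consequences.

    Returns the probability of the successful response on each trial,
    assuming probability is proportional to strength (an assumption of
    this sketch, not a claim from the text).
    """
    # Hypothetical starting point: the cat has no initial preference.
    strengths = {"pull_loop": 1.0, "scratch_wall": 1.0, "push_bars": 1.0}
    successful = "pull_loop"  # assumed to be the only response that opens the box

    success_probability = []
    for _ in range(trials):
        total = sum(strengths.values())
        success_probability.append(strengths[successful] / total)
        for response in strengths:
            if response == successful:
                strengths[response] *= 1 + stamp_in   # "stamped in"
            else:
                strengths[response] *= 1 - stamp_out  # "stamped out"
    return success_probability


curve = law_of_effect()
```

In this sketch the successful response starts at a probability of one in
three and becomes steadily more likely on each trial, which mirrors the
shortening escape times Thorndike reported.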
Skinner expressed the idea behind the Law of Effect through the notion
of reinforcers. Reinforcers are those events that strengthen
a response, i.e., whose rate controls the rate of that response. This
neatly sidestepped Thorndike's appeal to satisfaction, resulting in a term
which was less theoretical and more simply descriptive: any event whose
presence or absence controls how often a response occurs is by definition
a reinforcer for that response. The problem became not what 'satisfying'
meant, but the better-defined question of which events would reinforce which
responses of which animals under which conditions. Skinner also introduced
new definitions of stimulus and response which were similarly to be adapted
to the behavior actually observed. To Skinner, the discriminative stimulus
(SD) was not a single physically defined kind of event, but an entire
class of events (possibly quite different physically) which set the occasion
for the same response. (In contrast with the reflex notion of a stimulus,
which elicits its response, a discriminative stimulus was held only to
increase the probability of the response.) Skinner's notion of the response
in operant conditioning, called an operant, was similarly distinct from
the physiologically defined reflex and from classically conditioned
responses, being a class of responses which shared a consequence: for example,
depressing a lever, which rats commonly do in several distinct
but functionally equivalent ways. The relation between the discriminative
stimulus, the operant response, and the reinforcer has often been called
the three-term contingency: under these (functional) conditions, this
(functional) response will yield this reinforcer.
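One way to picture the three-term contingency is as a small data structure
in which the stimulus and the response are classes defined by function
rather than by physical form. The following is only an illustrative sketch:
the class name ThreeTermContingency, its fields, and the example stimuli
and movements are hypothetical and are not drawn from Skinner's writings.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ThreeTermContingency:
    # Each term is a functional class, so it is modeled as a membership
    # test rather than as a single physically defined event.
    is_discriminative_stimulus: Callable[[str], bool]
    is_operant: Callable[[str], bool]
    reinforcer: str

    def outcome(self, situation: str, movement: str) -> Optional[str]:
        """Return the reinforcer if the contingency is met, else None."""
        if self.is_discriminative_stimulus(situation) and self.is_operant(movement):
            return self.reinforcer
        return None


# Hypothetical example: physically different events and movements that
# are functionally equivalent within this contingency.
lever_press = ThreeTermContingency(
    is_discriminative_stimulus=lambda s: s in {"light_on", "tone_on"},
    is_operant=lambda m: m in {"left_paw_press", "right_paw_press", "nose_press"},
    reinforcer="food_pellet",
)

met = lever_press.outcome("tone_on", "nose_press")
not_met = lever_press.outcome("light_off", "nose_press")
```

Here a press with either paw or with the nose counts as the same operant
because each has the same consequence, while the same movement in the
absence of a discriminative stimulus yields nothing.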
There are two kinds of reinforcement: positive and negative.
Positive reinforcement occurs when a behavior (response)
is followed by a pleasant stimulus that rewards it. In the Skinner box
experiment, an example of positive reinforcement is the rat pressing a lever
and receiving a food reward. Negative reinforcement occurs when a behavior
(response) is followed by the removal of an unpleasant stimulus. In the
Skinner box experiment, an example of negative reinforcement is a loud noise
sounding continuously inside the rat's cage until the rat presses the lever,
at which point the noise ceases. In both kinds of reinforcement, the
frequency of the response increases.
According to Skinner's theory of operant conditioning, there are two
methods of decreasing a behavior or response: punishment and extinction.
Punishment occurs when a behavior (response) is followed
by the addition of an unpleasant stimulus or the removal of a pleasant
stimulus. In the Skinner box experiment, an example is the rat pushing the
lever and receiving a painful electric shock directly afterward. Extinction
occurs when a behavior (response) that had previously been followed by
a pleasant stimulus is no longer followed by any stimulus. In the Skinner
box experiment, an example is the rat being rewarded with a food pellet
for pushing the lever several times, and then never receiving a food pellet
for pushing the lever again. Eventually the rat would learn that no
food would come, and would cease pushing the lever.
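The definitions in the two preceding paragraphs amount to a small lookup
table keyed on the kind of stimulus and on what happens to it after the
response. The sketch below simply restates those definitions in code; the
function name classify_consequence and the string labels are hypothetical
choices made for illustration.

```python
def classify_consequence(stimulus: str, change: str) -> tuple:
    """Name the procedure and its effect on the behavior.

    stimulus: "pleasant" or "unpleasant"
    change:   "added", "removed", or "withheld" (no longer delivered)
    """
    table = {
        ("pleasant", "added"): ("positive reinforcement", "increases"),
        ("unpleasant", "removed"): ("negative reinforcement", "increases"),
        ("unpleasant", "added"): ("punishment", "decreases"),
        ("pleasant", "removed"): ("punishment", "decreases"),
        ("pleasant", "withheld"): ("extinction", "decreases"),
    }
    try:
        return table[(stimulus, change)]
    except KeyError:
        # Combinations the text does not define are rejected, not guessed.
        raise ValueError(f"no procedure defined for {stimulus!r}, {change!r}")


food_for_press = classify_consequence("pleasant", "added")
noise_stops = classify_consequence("unpleasant", "removed")
shock_after_press = classify_consequence("unpleasant", "added")
pellets_stop = classify_consequence("pleasant", "withheld")
```

The four Skinner box examples map onto the table in order: food for a lever
press, a noise that stops, a shock after a press, and pellets that no longer
arrive.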
Both punishment and extinction serve to decrease behaviors, although
Skinner stressed that extinction was the more powerful of the two. In
real-life situations there are often other factors that cannot simply
be eliminated, and punishments are not a great enough deterrent to prevent
particular responses, because rewards are still associated with those
behaviors. According to Skinner, only by completely eliminating the
rewards (positive reinforcers) that follow particular behaviors will
people (or animals) be sufficiently discouraged from repeating those behaviors.
On this view, this is one reason why many convicted felons reoffend: prison
sentences are a form of punishment, but sometimes the punishment is not
enough. The felons return to the crimes they previously committed because
they receive the same rewards as before, and that behavior-reward connection
is a stronger motivator than the deterrent effect of the punishment.
|