In operant conditioning, reinforcement is any change in an animal's
surroundings that (a) occurs after the animal behaves in a given
way, (b) seems to make that behavior recur more often in the
future, and (c) is actually the cause of that more frequent
recurrence.
For example: you drop a coin in a slot on an unlabeled, unfamiliar
machine, and a potato chip immediately appears in an opening below.
If you then drop coins into the slot more often than you would have
if no potato chip had appeared, the appearance of the potato chip
is reinforcement for the coin-dropping behavior.
Note that it is the coin-dropping behavior that is reinforced,
not you. The potato chip serves as a reinforcer, reinforcing or
strengthening that behavior, only to the extent that such coin-dropping
subsequently occurs more often because of it.
The study of reinforcement has produced an enormous body of reproducible
experimental results. Reinforcement is the central concept and procedure
in the experimental analysis of behavior.
Schedules of reinforcement
[Figure: a chart demonstrating the different response rates of the
schedules of reinforcement; each hatch mark designates a reinforcer
being given.]
When enough of the variations in an animal's surroundings are
reduced or "controlled," its behavior patterns after reinforcement
are remarkably predictable. When rates of reinforcement are adjusted
in particular ways, even very complex behavior patterns can be predicted.
A schedule of reinforcement is the protocol for determining which
responses (i.e., which individual occurrences of a given behavior)
will be reinforced. The two extremes are continuous reinforcement,
in which every response results in reinforcement, and extinction,
in which no response is reinforced.
Other schedules include:
- Fixed ratio (FR), in which every nth response is reinforced.
- Fixed interval (FI), in which reinforcement occurs after the
passage of a specified length of time from the beginning of training
or from the last reinforcement, provided that at least one response
occurred in that time period.
- Variable ratio (VR), in which the number of responses required
between reinforcements varies, but on average equals a predetermined
number.
- Variable interval (VI), in which reinforcement occurs after
the passage of a varying length of time around an average, provided
that at least one response occurred in that period.
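The four schedules listed above can be read as small decision rules. The following Python sketch is an illustration only, under stated assumptions: the class names, the `respond` and `draw` methods, and the uniform distributions used for the variable schedules are hypothetical choices made here, not a standard formulation. The interval classes also take the common reading that the first response made after the interval has elapsed is the one reinforced.

```python
import random


class RatioSchedule:
    """Reinforce a response once enough responses have accumulated."""

    def __init__(self):
        self.count = 0
        self.required = self.draw()

    def respond(self, t=None):
        # The time t of the response is ignored by ratio schedules.
        self.count += 1
        if self.count < self.required:
            return False
        self.count = 0
        self.required = self.draw()
        return True


class FixedRatio(RatioSchedule):
    """FR: every n-th response is reinforced."""

    def __init__(self, n):
        self.n = n
        super().__init__()

    def draw(self):
        return self.n


class VariableRatio(RatioSchedule):
    """VR: the required number of responses varies around `mean`."""

    def __init__(self, mean, rng=None):
        self.mean = mean
        self.rng = rng if rng is not None else random.Random()
        super().__init__()

    def draw(self):
        # Assumption: uniform on 1 .. 2*mean - 1, which averages to `mean`.
        return self.rng.randint(1, 2 * self.mean - 1)


class IntervalSchedule:
    """Reinforce the first response made after a waiting time has elapsed."""

    def __init__(self, start=0.0):
        # Beginning of training, then the time of the last reinforcement.
        self.last = start
        self.wait = self.draw()

    def respond(self, t):
        if t - self.last < self.wait:
            return False
        self.last = t
        self.wait = self.draw()
        return True


class FixedInterval(IntervalSchedule):
    """FI: the waiting time is always `interval`."""

    def __init__(self, interval, start=0.0):
        self.interval = interval
        super().__init__(start)

    def draw(self):
        return self.interval


class VariableInterval(IntervalSchedule):
    """VI: the waiting time varies around `mean`."""

    def __init__(self, mean, rng=None, start=0.0):
        self.mean = mean
        self.rng = rng if rng is not None else random.Random()
        super().__init__(start)

    def draw(self):
        # Assumption: uniform on 0 .. 2*mean, which averages to `mean`.
        return self.rng.uniform(0, 2 * self.mean)
```

In this sketch, `FixedRatio(3)` reinforces the third, sixth, ninth response and so on, and continuous reinforcement corresponds to `FixedRatio(1)`.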
Ratio schedules produce higher rates of responding than interval
schedules. Variable schedules produce higher rates than fixed schedules.
The variable ratio schedule produces both the highest rate of responding
and the greatest resistance to extinction (that is, resistance to
"petering out"). Gambling is a notable example of behavior maintained
on a variable ratio schedule.
In the fixed ratio schedule, there is a pause after a reinforcer
is delivered. This is called a post-reinforcement pause. The fixed
interval schedule also produces post-reinforcement pauses, but the
response pattern is scallop-shaped: because responses made before
the interval has elapsed are not reinforced, the subject learns to
respond slowly at first and more rapidly as the end of the interval
approaches.
Positive vs. negative
Positive reinforcement changes the animal's surroundings by adding
a stimulus: a physical object (like a food pellet or paycheck) or
energy (like light from a lamp).
Negative reinforcement changes the surroundings by removing an
aversive stimulus - such as turning off a painful electric current
or removing a hated ex-spouse's picture. Speaking colloquially,
an aversive stimulus is something the animal finds "bad";
its removal is thus a "good" thing from the animal's point
of view.
| | some "bad" thing (aversive stimulus) | some "good" thing (reinforcing stimulus) |
| presented | | positive reinforcement |
| taken away | negative reinforcement | |
Distinguishing "positive" from "negative" in
these cases is largely a matter of emphasis. For example, in a very
warm room, a current of outside air serving as reinforcement may be
described as positive because it adds relatively cool air, or as negative
because it removes the uncomfortably hot air. Furthermore, the distinction
seems to have no real use in research or applied psychology, although
one may some day be found. Until then, many behavioral psychologists
simply refer to reinforcement or punishment—without polarity—to
cover all consequent environmental changes.
Punishment
Punishment is any change in an animal's surroundings that occurs
after a given behavior and seems to reduce the frequency of that
behavior. As with reinforcement, it is the behavior, not the animal,
that is punished. Whether a change is or is not punishing is only
known by its effect on the rate of the behavior, not by any "hostile"
features of the change. In positive punishment or type I punishment,
an experimenter punishes a response by adding an aversive stimulus
into the animal's surroundings (a brief electric shock, for example).
In negative punishment or type II punishment, a positive reinforcer
is removed (as in the removal of a feeding dish). As with reinforcement,
it is not usually necessary to speak of positive and negative in
regard to punishment.
Punishment is not a mirror effect of reinforcement. In experiments
with laboratory animals and studies with children, punishment decreases
the frequency of a previously reinforced response only temporarily,
and it can produce other "emotional" behavior (wing-flapping
in pigeons, for example) and physiological changes (increased heart
rate, for example) that have no clear equivalents in reinforcement.
Punishment is considered by some behavioral psychologists to be
a "primary process" – a completely independent phenomenon
of learning, distinct from reinforcement. Others see it as a category
of negative reinforcement, creating a situation in which any punishment-avoiding
behavior (even standing still) is reinforced.
Aversive stimulus, punisher, and punishing stimulus are synonyms.
Punishment may be used for (a) an aversive stimulus or (b) the occurrence
of any punishing change or (c) the part of an experiment in which
a particular response is punished.
Other reinforcement terms
- An unconditioned reinforcer, sometimes called a primary reinforcer,
is a stimulus or situation considered to be inherently reinforcing
(such as affection, food, or opportunity for sleep).
- A conditioned reinforcer, sometimes called a secondary reinforcer,
is a stimulus or situation that has acquired reinforcing power
after being paired in the animal's environment with an unconditioned
reinforcer or an earlier conditioned reinforcer (such as praise).
- A generalized reinforcer is a conditioned reinforcer that has
been paired with many other reinforcers (such as money).
- Differential reinforcement of incompatible behavior (DRI) is
used in reducing an already frequent behavior without punishing
it by reinforcing a specific incompatible response (like leaving
a room so that fighting with someone in it is not possible).
- In differential reinforcement of other behavior (DRO), any behavior
other than some undesired behavior is reinforced.
- Differential reinforcement of low response rate (DRL): a behavior
is reinforced only if it occurred infrequently. "If you ask
me for a potato chip no more than once every 10 minutes, I will
give it to you. If you ask more often, I will give you none."
- Differential reinforcement of alternative behavior (DRA): the reinforcers
for the undesirable behavior are used instead for a more desirable
behavior. For example, a teacher will pay attention to students
who sit rather than to those who walk or talk in class.
- In reinforcer sampling, a potentially reinforcing but unfamiliar
stimulus is presented to an animal without regard to any prior
behavior. The stimulus may then later be used more effectively
in reinforcement.
- Social reinforcement involves various sorts of access to and
interaction with others.
- Satiation occurs when a stimulus that had reinforced some behavior
no longer seems to do so.
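The DRL rule in the list above (a potato chip only for requests spaced at least 10 minutes apart) can be sketched as a check on the time between responses. This is a minimal illustration under stated assumptions: the function name `drl` and its parameters are hypothetical, the first request is treated as reinforced, and the gap is measured from the previous request whether or not that request was reinforced.

```python
def drl(request_times, min_gap=10.0):
    """Return, for each request time, whether it is reinforced under DRL."""
    outcomes = []
    previous = None
    for t in request_times:
        # Assumption: the first request is reinforced, and the gap is
        # measured from the previous request, reinforced or not.
        outcomes.append(previous is None or t - previous >= min_gap)
        previous = t
    return outcomes
```

With request times in minutes, `drl([0, 5, 20, 25, 40])` gives `[True, False, True, False, True]`: the requests at minutes 5 and 25 come too soon after the preceding ones.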
Shaping & chaining
Shaping involves reinforcing successive, increasingly accurate
approximations of a response desired by a trainer. In training a
rat to press a lever, for example, simply turning toward the lever
will be reinforced at first. Then, only turning and stepping toward
it will be reinforced. As training progresses, the response reinforced
becomes progressively more like the desired behavior. Chaining is
similar but involves reinforcing various simple behaviors separately
and then linking them together in a more complex series.
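Shaping, as described above, amounts to a reinforcement criterion that tightens over time. The sketch below is one hypothetical way to express that, not a training protocol. It assumes each response can be scored as an integer level of closeness to the target (for example 1 = turning toward the lever, 2 = stepping toward it), and that the criterion rises by one step after every reinforced response. Both the scoring and the function name `shape` are assumptions made for illustration.

```python
def shape(observed_levels, target_level):
    """Reinforce successive approximations to a target response.

    observed_levels: closeness of each observed response to the target,
    as an integer (0 = unrelated, target_level = the desired behavior).
    Returns one boolean per response: reinforced or not.
    """
    criterion = 1  # the loosest approximation accepted at first
    outcomes = []
    for level in observed_levels:
        reinforced = level >= criterion
        outcomes.append(reinforced)
        # Assumption: the criterion rises one step after each
        # reinforced response, until it reaches the target.
        if reinforced and criterion < target_level:
            criterion += 1
    return outcomes
```

A response that earned reinforcement early in training (level 1) no longer does once the criterion has moved to level 2, which is the progressive tightening the paragraph describes.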
Controversies
The standard idea of behavioral reinforcement has been criticized
as circular, since it appears to define a reinforcer by an effect it
will have in an as-yet unseen future. Other definitions have been
proposed, such as F. D. Sheffield's "consummatory behavior
contingent on a response," but these are not broadly used in
psychology.
History of the terms
In the 1920s Russian physiologist Ivan Pavlov may have been the
first to use the word reinforcement with respect to behavior, but
(according to Dinsmoor) he used its approximate Russian cognate
sparingly, and even then it referred to strengthening an already-learned
but weakening response. He did not use it, as it is today, for selecting
and strengthening new behavior. Pavlov's introduction of the word
extinction (in Russian) approximates today's psychological use.
In popular use, positive reinforcement is often used as a synonym
for reward, with people (not behavior) thus being "reinforced,"
but this is contrary to the term's consistent technical usage. Negative
reinforcement is often used by laypeople and even social scientists
outside psychology as a synonym for punishment. This is contrary
to modern technical use, but it was B. F. Skinner who first used
it this way in his 1938 book. By 1953, however, he followed others
in thus employing the word punishment, and he re-cast negative reinforcement
for the removal of aversive stimuli.