|
Psychometrics
Psychometrics is the field of study (connected to psychology and statistics)
concerned with the measurement of "psychological" aspects of
a person such as knowledge, skills, abilities, or personality. The field
of Psychometrics is primarily concerned with differences between individuals
and employs statistical tools such as the normal distribution and factor analysis.
Measurement of these unobservable phenomena is difficult, and much of the
research and accumulated art of this discipline is designed to define them
reliably and then quantify them. Critics, including "hard science"
practitioners and social activists, have argued that such definition and
quantification are impossibly difficult and that such measurements are
very often misused (although users of psychometric techniques can reply
that their critics often misuse data by not assessing them with psychometric
criteria). Figures who made significant contributions to psychometrics
include Karl Pearson, L. L. Thurstone, Georg Rasch and Arthur Jensen.
Significant critics include the late Stephen Jay Gould.
Much of the early work in psychometrics was developed in order to measure
intelligence. More recently, psychometric theory has been used in the measurement
of personality, attitudes and beliefs, and academic achievement, and, in
health-related fields, to measure quality of life.
Psychometric methods involve several distinct areas of study. First,
psychometricians have developed the theory of mental tests. This work
can be roughly divided into classical test theory (CTT) and the more recent
item response theory (IRT). Second, psychometricians have developed methods
for working with large matrices of correlations and covariances. Techniques
in this general tradition include factor analysis (finding important underlying
dimensions in the data), multidimensional scaling (finding a simple representation
for high-dimensional data) and data clustering (finding objects which
are like each other). In these multivariate descriptive methods, users
try to simplify large amounts of data. Structural equation modeling and
path analysis are more recent and more rigorous approaches to the problem
of large covariance matrices. These methods allow explicit statistical
models to be fitted to data and tested to determine whether they fit
adequately.
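As a rough illustration of finding an underlying dimension in a correlation
matrix, the following Python sketch extracts the first principal component by
power iteration. Principal component extraction is a simpler relative of
factor analysis, not factor analysis proper. The correlation matrix and the
function name dominant_component are hypothetical and chosen only for
illustration.

```python
def dominant_component(matrix, iterations=500):
    """Return (eigenvalue, unit eigenvector) of the largest eigenvalue.

    Uses plain power iteration, which is adequate for a small correlation
    matrix with positive entries. This is an illustrative sketch only.
    """
    n = len(matrix)
    v = [1.0 / n ** 0.5] * n  # start from a uniform unit vector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue for the converged vector.
    mv = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v[i] * mv[i] for i in range(n))
    return eigenvalue, v


# Hypothetical correlations among four subtests (made-up numbers).
R = [
    [1.00, 0.60, 0.50, 0.40],
    [0.60, 1.00, 0.45, 0.35],
    [0.50, 0.45, 1.00, 0.30],
    [0.40, 0.35, 0.30, 1.00],
]

eigenvalue, vector = dominant_component(R)
loadings = [eigenvalue ** 0.5 * x for x in vector]  # component loadings
share = eigenvalue / len(R)  # proportion of total variance accounted for

print("first-component loadings:", [round(x, 3) for x in loadings])
print("proportion of variance:", round(share, 3))
```

The proportion of variance is the eigenvalue divided by the number of
variables, because each standardized variable contributes a variance of one
to the total.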
The key concepts of classical test theory are reliability and validity.
A reliable measure measures something consistently, while a valid
measure measures what it is supposed to measure. A measure may be
reliable without being valid: for example, a broken ruler may
under-measure a quantity by the same amount each time (consistently),
but the resulting quantity is still wrong, that is, invalid. As another
example, a reliable rifle will place a tight cluster of bullets on the
target, while a valid one will center that cluster on the center of the
target.
Both reliability and validity may be assessed mathematically. Internal
consistency may be assessed by correlating performance on two halves of
a test (split-half reliability); the value of the Pearson product-moment
correlation coefficient is adjusted with the Spearman-Brown prediction
formula to correspond to the correlation between two full-length tests.
Other approaches include the intra-class correlation (the ratio of the variance
between targets to the total variance of all measurements). A commonly
used measure is Cronbach's α, which is equivalent to the mean of all possible
split-half coefficients. Stability over repeated measures is assessed
with the Pearson coefficient, as is the equivalence of different versions
of the same measure (different forms of an intelligence test, for example).
Other measures are also used.
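The split-half procedure and Cronbach's α described above can be sketched
in Python as follows. This is a minimal illustration, assuming an odd/even
split of the items and a small made-up table of right/wrong responses; the
function names and the data are hypothetical.

```python
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / (ss_x * ss_y) ** 0.5


def spearman_brown(r, factor=2):
    """Predicted reliability when test length is multiplied by `factor`."""
    return factor * r / (1 + (factor - 1) * r)


def split_half_reliability(scores):
    """Correlate odd-item and even-item half scores, then step up to full length.

    `scores` holds one list of item scores per person.
    """
    odd_half = [sum(person[0::2]) for person in scores]
    even_half = [sum(person[1::2]) for person in scores]
    return spearman_brown(pearson(odd_half, even_half))


def cronbach_alpha(scores):
    """Cronbach's alpha from item variances and total-score variance."""
    k = len(scores[0])
    item_variances = [pvariance([person[i] for person in scores]) for i in range(k)]
    total_variance = pvariance([sum(person) for person in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: six test-takers, four items scored 1 (right) or 0 (wrong).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
]

print("split-half (Spearman-Brown):", round(split_half_reliability(responses), 3))
print("Cronbach's alpha:", round(cronbach_alpha(responses), 3))
```

A single split-half value depends on which items fall into each half, which
is one reason α is often preferred as a summary.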
Validity may be assessed by correlating measures with a criterion measure
known to be valid. When the criterion measure is collected at the same
time as the measure being validated the goal is to establish concurrent
validity; when the criterion is collected later the goal is to establish
predictive validity. A measure has construct validity if it is related
to other variables as required by theory. Content validity, often loosely equated with face validity,
is simply a demonstration that the items of a test are drawn from the
domain being measured; it does not guarantee that the test actually measures
phenomena in that domain.
Predictive or concurrent validity cannot exceed the square root of the
reliability of the measure, that is, of the correlation between two
parallel versions of the same measure.
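The limit that reliability places on validity can be sketched from the
classical attenuation relation. The symbols here are introduced for
illustration only: r_{xy} is the observed correlation between a test x and a
criterion y, r_{xx'} and r_{yy'} are their reliabilities, and
\rho_{T_x T_y} is the correlation between their true scores, assuming
measurement errors are uncorrelated with true scores and with each other.

```latex
\[
  |r_{xy}|
  = |\rho_{T_x T_y}| \, \sqrt{r_{xx'} \, r_{yy'}}
  \le \sqrt{r_{xx'} \, r_{yy'}}
  \le \sqrt{r_{xx'}}
\]
```

For example, under these assumptions a test with a reliability of 0.81
cannot correlate more than 0.90 with any criterion.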
Item response theory models the relationship between latent traits and
responses to test items. Among other advantages, it can provide an
estimate of a test-taker's position on the latent trait that is comparable
across different sets of items. For example, a university student's knowledge of history
can be deduced from his or her score on a university test and then be
compared reliably with a high school student's knowledge deduced from
a less difficult test. Scores derived by classical test theory do not
have this characteristic, and ability can only be assessed relative to
other test-takers, by comparing scores to those of a norm group randomly
selected from the population. In fact, measures derived from classical
test theory depend on the sample tested, while those derived from item
response theory are, in principle, independent of it when the model fits
the data.
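A minimal Python sketch of this idea uses the one-parameter (Rasch) model,
one of several item response models. It assumes the item difficulties have
already been calibrated on a common scale, which is what makes scores from
different tests comparable. The function names, difficulty values and
response patterns are hypothetical.

```python
import math


def p_correct(theta, difficulty):
    """Rasch model: probability of a correct response for ability theta."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def estimate_ability(responses, difficulties, iterations=100):
    """Maximum-likelihood ability estimate with item difficulties held fixed.

    Uses Newton-Raphson with a clamped step. No finite estimate exists for
    all-correct or all-wrong response patterns, so those are rejected.
    """
    if sum(responses) in (0, len(responses)):
        raise ValueError("no finite estimate for a perfect or zero score")
    theta = 0.0
    for _ in range(iterations):
        probs = [p_correct(theta, b) for b in difficulties]
        gradient = sum(u - p for u, p in zip(responses, probs))
        information = sum(p * (1.0 - p) for p in probs)
        step = max(-1.0, min(1.0, gradient / information))
        theta += step
        if abs(step) < 1e-10:
            break
    return theta


# Hypothetical calibrated difficulties on one common scale.
easy_test = [-2.0, -1.5, -1.0, -0.5, 0.0]   # e.g. a high school test
hard_test = [0.0, 0.5, 1.0, 1.5, 2.0]       # e.g. a university test

# Both students answer three of five items correctly.
school_student = estimate_ability([1, 1, 1, 0, 0], easy_test)
university_student = estimate_ability([1, 1, 1, 0, 0], hard_test)

print("high school student, easy test:", round(school_student, 3))
print("university student, hard test:", round(university_student, 3))
```

Although the raw scores are identical, the estimate from the harder test is
higher, because the same number of correct answers was obtained on more
difficult items.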
For some, the field of psychometrics has controversial aspects. In part,
the controversy involves the very notion of standardized tests. For others,
the problematic aspects of psychometrics involve the history of the field,
which involves aspects of eugenics.
|