Chapter 1

Introduction to Educational and Psychological Measurement

  • subspecialty combining statistics and educational psychology
  • focuses on the development and vetting of educational and psychological assessments using a wide variety of tools and methodological approaches

  • goal is to create the highest quality measures for stakeholders
  • aspects of the process
    • specifying the target
    • assessing item and scale quality
    • estimating inter-rater agreement
    • providing evidence of construct validity
    • setting performance standards
    • identifying cognitive skills used to perform a task

Process outlined in: American Educational Research Association, American Psychological Association, National Council on Measurement in Education, & Joint Committee on Standards for Educational and Psychological Testing (U.S.). (2014). Standards for Educational and Psychological Testing. American Educational Research Association.

At the center of this process is the measurement professional who works to ensure that these scales are doing what they were designed to do and that scores are supported by the proper evidence to justify use. (p. 2)

  • the idea that a single number can predict success is not justified and is against the Standards
  • how can psychometricians better the situation around testing controversies?
  • Minnesota Multiphasic Personality Inventory (1943)
Modern testing
  • 1947 - Educational Testing Service (ETS)
  • early work in multi-factor analysis was slowed by a lack of computational power until the 1970s
Item response theory
  • examines individual item responses and uses that information to better understand the construct
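
A minimal sketch of what "examining individual item responses" can look like, assuming the two-parameter logistic (2PL) model, which is one common IRT model and is not named in these notes. The function name and the item parameters (discrimination `a`, difficulty `b`) are hypothetical and chosen only for illustration.

```python
import math


def prob_correct(theta: float, a: float, b: float) -> float:
    """2PL item response function (assumed model, not from the notes).

    theta: examinee's level on the latent construct
    a:     item discrimination (how sharply the item separates examinees)
    b:     item difficulty (the theta at which P(correct) = 0.5)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item with a = 1.2 and b = 0.5.
for theta in (-1.0, 0.5, 2.0):
    p = prob_correct(theta, a=1.2, b=0.5)
    print(f"theta={theta:+.1f}  P(correct)={p:.3f}")
```

The probability of a correct response rises with the examinee's level on the construct, and an examinee whose level equals the item difficulty has a 50% chance of answering correctly. That item-level link between response and construct is what the bullet above describes.
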
Recent developments
  • computerized adaptive testing
  • automated test assembly and scoring
  • automated item generation
  • cognitive diagnostic models
  • video games as assessments
  • use of assessments to delve deeper into IR

All of these come from advances in computing power, improved estimation algorithms, and user-friendly software.

Future developments
  • important to recognize that accelerating growth in computing power, large data sets, data mining, machine learning, etc. will lead to greater advances

Critical to remember that the work of psychometricians is to

try to understand something about an individual person to make an accurate and fair decision to the benefit of that individual. Too often, this goal can become lost in our psychometric machinery. (p. 6)

Types of measures

  • these categories are not immutable or well-defined, but fluid and overlapping

A standardized measure is one that has been administered following a set of consistent directions to a (typically) large sample of individuals from the target population. This ensures that testing conditions are the same for all individuals assessed. In turn, these data, from a representative sample, are used for the establishment of norms. Norms are estimates of typical performance in the population and serve as a way in which performance of future examinees can be judged. (p. 7)

  • GRE, SAT, Delis-Kaplan Executive Function System (D-KEFS)

A non-standardized assessment does not yield scores based on a norming sample, but instead typically report scores in terms of a raw value or a percentage. (p. 7)

  • standardized and non-standardized are not better or worse, just different in how the scores are obtained and reported
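
The difference in how scores are obtained and reported can be sketched as follows. This is an illustration under stated assumptions, not a procedure from the text: the norming sample is invented and far smaller than any real one, the function names are hypothetical, and the percentile rank uses the common "below plus half of ties" convention.

```python
from statistics import mean, stdev


def percent_correct(raw_score: float, n_items: int) -> float:
    """Non-standardized reporting: raw score as a percentage of items."""
    return 100.0 * raw_score / n_items


def z_score(raw_score: float, norm_scores: list[float]) -> float:
    """Norm-referenced reporting: distance from the norm-group mean in SD units."""
    return (raw_score - mean(norm_scores)) / stdev(norm_scores)


def percentile_rank(raw_score: float, norm_scores: list[float]) -> float:
    """Percentage of the norm group scoring below, counting ties as half."""
    below = sum(s < raw_score for s in norm_scores)
    equal = sum(s == raw_score for s in norm_scores)
    return 100.0 * (below + 0.5 * equal) / len(norm_scores)


# Invented norming sample of raw scores on a hypothetical 40-item test.
norm_sample = [12, 15, 18, 20, 20, 22, 25, 27, 30, 31]
examinee_raw = 25

print("percent correct:", percent_correct(examinee_raw, n_items=40))
print("z-score:", round(z_score(examinee_raw, norm_sample), 2))
print("percentile rank:", percentile_rank(examinee_raw, norm_sample))
```

The same raw score of 25 is reported once relative to the test itself (percent correct) and once relative to other examinees (z-score and percentile rank). Neither report is better; they answer different questions.
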
Type of construct measured (p. 7)

Achievement tests, which are extremely common in educational assessment, measure what an examinee has learned in a specific academic domain. In contrast, aptitude tests are designed to help determine an individual’s potential for learning or performance in some discipline.


The validity evidence for these exams is not sufficient to support an implication inference (i.e., students with high exam scores perform in college and life at higher levels than do students with lower exam scores), perhaps the inference we care most about. In fact, when this exam was given to professionals in Rhode Island, only 40% had scores high enough to earn a high school diploma (Borg, 2013). (p. 8)

Personality assessments may be best thought of as measuring an individual’s disposition, often (though by no means always) with an eye toward identifying potentially problematic issues.

Finally, employment counselors might use interest inventories as a way of assessing a person’s suitability for certain types of employment or career decision making. These scales yield scores that reflect how likely an individual is to find certain types of work interesting and rewarding, for example.

Type of data collected (p. 8)

Tests of maximal performance require examinees to engage in a task with the goal of completing it as well as they possibly can.

In contrast, self-report measures are designed to elicit responses reflecting the attitudes, feelings, and opinions of individuals about one or more issues.

Finally, some assessments are observational, meaning that an individual’s behavior or performance is scored by another person using a formal rubric and list of target behaviors.