Reconceptualizing Validity for Classroom Assessment

Moss, P. A. (2003). Reconceptualizing validity for classroom assessment. Educational Measurement: Issues and Practice, 22(4), 13–25.


This article explores the shortcomings of conventional validity theory for guiding classroom assessment practice and suggests additional theoretical resources from sociocultural theory and hermeneutics to complement and challenge conventional theory. To illuminate these concerns and possibilities in a concrete context, the author uses her own classroom experience in teaching a qualitative research methods course. The importance of examining cases of assessment practice in context for developing, teaching, and evaluating validity theory is discussed.

Keywords: classroom assessment, hermeneutics, sociocultural theory, validity

Key Ideas

The question I consider in this article is: to what extent does our understanding of validity in the measurement profession assure that the relevant issues are addressed in classroom assessment, and what role might other theoretical perspectives play in providing a more robust validity framework to guide thinking and action?

Assumptions of validity theory and alternative perspectives

The following assumptions are considered:

  • Assessment is a discrete activity.
  • The focus of validity theory is on an assessment-based interpretation and use.
  • The unit of analysis for an assessment is the individual.
  • Interpretations are constructed by aggregating discrete pieces of evidence to form an interpretable overall score.
  • Consequences are an aspect of validity only if they can be traced to a source of construct underrepresentation or construct-irrelevant variance (cf. AERA et al., 1999; Messick, 1989).

Psychometric characterizations of learning infer learning from observed changes in individuals' performances over time.

  • changes are internal to the learner
  • vertical hierarchy of increasingly generalized and abstract skills and knowledge

Sociocultural perspective suggests that learning is perceived through changing relationships among the learner, other human participants, and the material and symbolic tools available.

  • not just a new skill, but a new identity and social position within the discourse community or community of practice (CoP)
  • learning changes who we are by changing our ability to participate and belong (Wenger, 1998)

The author's thinking about validity is informed by the interpretive social sciences, especially philosophical hermeneutics. Like psychometrics, hermeneutics interprets human products, expressions, or actions by combining information across data sources and evidence [assessment as research].

  • In psychometrics, scores on independent tests are weighted and aggregated to form an overall grade;
  • in hermeneutics, evidence is integrated holistically: interpretation of human phenomena seeks to understand the whole in light of its parts, repeatedly testing interpretations against available evidence until each part can be accounted for in an integrated whole.
    • known as hermeneutic circle
    • evidence is used to illuminate and challenge bias

One Case of Classroom Assessment Practice

  • Qualitative Methods in Educational Research, co-designed and taught with Lesley Rex
  • different perspectives contribute to validity because they can challenge each other's biases
Conception of Assessment: "Assessment is a discrete activity"
  • educational measurement is predicated on the idea that assessment can and should be considered a discrete aspect of the context in which it is used.
  • assessment is seen as a distinct phase in the teaching and learning process
  • this has consequences for learning design
  • in a CoP, the goal is to engage learners in 'practices' and 'communities' to do the work of the community (Lave & Wenger)
  • assessment becomes a way of looking at the available evidence from learning activities that focus learners' practice as learners and researchers
Focus of Validity: "The focus of validity theory is on an assessment-based interpretation and use"
  • conventionally, validity is conceptualized as referring to an inference or interpretation, and a use or action based on a test score

    The 1999 Standards defines validity as "the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests" (p. 9). The validity argument focuses on an interpretation or action based on an instrument. While this focus is sometimes relevant and useful, it is both too small and too large for most of the decisions I need to make.

  • no need to draw and warrant fixed interpretations of student capabilities; rather the task is to make those interpretations obsolete.
  • need to make in-the-moment decisions that help students learn, as individuals and as members of a learning community
  • sometimes, the interpretation is so consequential that there is need to study the validity more systematically
    • however, evidence forming this inference rarely draws from a single instrument
  • inferences and interpretations drawn throughout a course should be ephemeral and intended to inform next steps
Unit of Analysis: “The unit of analysis for an assessment is the individual”
  • educational measurement is typically used to develop interpretations that characterize individuals, or rather, classes of individuals with the same scores.
  • group-level interpretations are based on aggregates of individual scores
  • whatever the unit of analysis, scores must be comparable across units and contexts --> standardized assessment
  • to enhance comparability, we control the influence of context (we fix the context) so that each individual experiences the same test and contexts of administration.
    • context is separate from inferences about the individuals
  • we make untested assumptions about the generalizability of students' performance to other contexts (Danziger, 1990)
  • authentic learner work makes the standardization infeasible

    As Wenger suggests: One problem of the traditional classroom format is that it is both too disconnected from the world and too uniform to support meaningful forms of identification. It offers unusually little texture to negotiate identities: a teacher sticking out and a flat group of students all learning the same thing at the same time. Competence, thus stripped of its social complexity, means pleasing the teacher, raising your hand first, getting good grades. (p. 269)

  • student performance is affected by the description of the assignment, but also myriad other factors and choices they make, resources they find, ongoing interactions with each other, and feedback they receive
    • changing any one of these features and 'accidents' changes the nature of the assessment.
    • in order to interpret and/or evaluate a student's performance, I need to understand the influence of the contexts in which it was produced and the factors that shape that performance.

      consistent with a sociocultural perspective, the most appropriate unit of analysis is the social situation, which entails the recursive relationship between person and context ... and claims about individuals must be grounded in interaction

As Mehan notes: "By moving beyond the states and traits of individuals to social situations as the unit of analysis, it does not blame low-achieving students' school difficulties on their lack of motivation, diminished linguistic skills, or deficient cognitive styles. ... [Students' performances can be] recast as collaboratively constructed and continuously embedded in face-to-face interaction in social environments." (pp. 251, 254)

  • the most useful sort of evidence is that which documents the interaction -- the ongoing effects of actions on other actions
  • requires explicit documentation of the interaction so that it can be viewed with various questions and associated analytical lenses. this could involve:
    • examining video/audio tapes of classroom discourse or interactions with individual students
    • case studies of individual students examining changes in their performance across time in light of the interactions that shaped them
    • the effect of a particular activity where I examine student work in response to variations in the activity and interview students about how they interpreted and used the resources available to them
    • growing tradition of action research in teaching
    • Teaching portfolios can prompt this sort of systematic analysis

This idea moves the focus from an individual blog post or other artifact to the network of others' ideas that is demonstrated, critiqued, interpreted, and assimilated into an individual's way of being. This network can become visible in the hyperlinks between posts and ideas on a networked course, both within and outside any particular cohort.

Combining Evidence: "Interpretations are constructed by aggregating judgments from discrete pieces of evidence to form an interpretable overall score"

Having multiple sources/pieces of evidence to inform a consequential interpretation/decision is a fundamental feature of the epistemology and ethics in any of the social science perspectives that I have encountered.

  • also, illuminating and challenging biases is fundamental
  • practices of educational measurement are full of techniques for aggregating evidence into a total score, but have little to offer when aggregation is not desirable
  • aggregation entails that judgments be made about discrete pieces of information so that the judgments can be algorithmically combined (weighted) to form a 'score' that has a predetermined interpretation associated with it.
  • validity is necessarily tied to test scores
  • however, even the testing Standards hedge on this...

    In educational settings, a decision or characterization that will have major impact on a student should not be made on the basis of a single test score. Other relevant information should be taken into account if it will enhance the overall validity of the decision. (p. 146)

  • yet, psychometrics has little to offer on how to combine 'other relevant information' into a warranted inference
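The aggregation mechanism described above can be sketched in a few lines. This is a hypothetical illustration, not anything from the article: the assignment names, weights, and cut-scores are invented, and real measurement practice involves far more (scaling, reliability, equating).

```python
# Minimal sketch of psychometric-style aggregation: discrete judgments are
# weighted, algorithmically combined into a single score, and mapped onto a
# predetermined interpretation. All names, weights, and cut-scores are
# hypothetical.

def aggregate(judgments: dict, weights: dict) -> float:
    """Weighted sum of discrete judgments (each scored 0-100)."""
    assert judgments.keys() == weights.keys(), "every judgment needs a weight"
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(judgments[k] * weights[k] for k in judgments)

def interpret(score: float) -> str:
    """Map the aggregate score onto fixed interpretation bands."""
    if score >= 90:
        return "exceeds expectations"
    if score >= 70:
        return "meets expectations"
    return "needs support"

judgments = {"essay": 85, "midterm": 72, "project": 90}
weights = {"essay": 0.4, "midterm": 0.3, "project": 0.3}

total = aggregate(judgments, weights)
print(total, interpret(total))
```

The point of the contrast in the notes is what this sketch leaves out: once the judgments are reduced to a weighted sum, the holistic, iterative testing of an interpretation against each piece of evidence (the hermeneutic circle) is no longer possible.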
Common features of validity in hermeneutics
  • interpretations are formed and refined through an iterative process of repeated testing against available evidence
  • multiple sources of evidence across time and contexts enhance validity
  • attempting to uncover problems with a proposed interpretation (looking for disconfirmatory evidence) is central to the development of warranted interpretations
  • interpretations and supporting evidence are best presented in a way that the learner can evaluate the interpretation and evidence and be allowed to offer counter interpretations and examples
  • valid interpretations are justified, not imposed
  • multiple readers/interpreters strengthen validity
    • disagreement among interpreters is also a source of validity
Role of Consequences: “Consequences are an aspect of validity only if they can be traced to a source of construct underrepresentation or construct irrelevant variance”

Whatever one's definition of validity, with classroom assessment, understanding these effects is crucial to sound practice. I might go so far as to argue that validity in classroom assessment, where the focus is on enhancing students' learning, is primarily about consequences. Assuming interpretations are intended to inform instructional decisions, and that instructional decisions entail interpretations about students' learning, their validity rests primarily on evidence of their (immediate, long-range, and cumulative) effects.