III. Rating Scales

Description of UTOP Sections

The UTOP is divided into four rating sections: Classroom Environment, Lesson Structure, Implementation, and Mathematics/Science Content.

The Classroom Environment section assesses the degree to which the classroom environment is conducive to the learning of mathematics and/or science, and how the teacher facilitates and creates this setting. This includes pre-existing structures the teacher has in place for managing the environment, such as classroom management routines and room setup.

The Lesson Structure section assesses how well the teacher plans for and organizes the lesson, such as the sequence of learning activities during the class period, and the degree to which this organization facilitates the learning of mathematics and/or science. The focus in this section is on the potential for student engagement and learning created by the instructional strategies and activities the teacher chooses to employ, not on the actual implementation of those strategies and activities.

The Implementation section assesses the instructional decisions, strategies, and practices the teacher actually employs during the lesson, how well the lesson activities flow, and whether the teacher ensures that all students remain engaged in and interact with the content and concepts that are the focus of the lesson. This section also assesses how critical and reflective the teacher is about his or her instruction after the lesson has concluded, based on data collected through a teacher interview or survey.

The Mathematics/Science Content section assesses the quality of the mathematics and/or science content being delivered by the teacher and constructed by students during the class period. Although some indicators within this section measure the teacher’s content knowledge, the primary focus of the section is the quality of the content students are exposed to and grapple with during class. Content to be learned by the students includes both what the teacher communicates directly and what is developed through other means such as lab activities, discussion, and independent practice. It is important to note that the synthesis rating descriptors (e.g., superficial content knowledge) are not meant to assess the teacher’s content knowledge but instead focus on the overall quality of the content students are learning during the class period.

Rating Lessons on the UTOP

To use the UTOP as intended, scores should be assigned only after the observation has taken place and the rater has had an opportunity to review the video or field notes as needed to provide evidence for each rating assigned. The UTOP is rated on a 1 to 5 Likert scale, with a Not Applicable (NA) option for five items. NA is an appropriate rating only for the five indicators that specifically mention an NA option:

  • 1.2 Classroom Interactions: Interactions reflected collegial working relationships among students.

  • 2.6 Lesson Reflection: The teacher was critical and reflective about his/her practice after the lesson, recognizing the strengths and weaknesses of their instruction.

  • 3.6 Implementation Safety: The teacher's instructional strategies included safe, environmentally appropriate, and ethical implementation of laboratory procedures and/or classroom activities.

  • 4.4 Content Assessments: Formal assessments used by teacher (if available) were consistent with content objectives (homework, lab sheets, tests, quizzes, etc.).

  • 4.5 Content Abstraction: Elements of mathematical/scientific abstraction were used appropriately.

All other indicators must be assigned a rating from 1 to 5, even if the rater feels the indicator is not applicable to the observed lesson. Rating boxes should not be left blank.
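
Observation scores are often compiled in a spreadsheet or simple scoring application. As an illustration only (the UTOP itself prescribes no software), the short Python sketch below encodes the two rules above: NA is accepted only for the five indicators listed, and every other indicator must carry a rating from 1 to 5 and may not be left blank. The indicator labels and the function name validate_ratings are hypothetical.

    # Hypothetical check of UTOP rating constraints (illustration only).
    # NA is permitted only for the five indicators that mention an NA option;
    # every other indicator must carry a numeric rating from 1 to 5.

    NA_ALLOWED = {"1.2", "2.6", "3.6", "4.4", "4.5"}

    def validate_ratings(ratings):
        """Return a list of problems found in an {indicator: rating} mapping."""
        problems = []
        for indicator, rating in ratings.items():
            if rating is None or rating == "":
                problems.append(f"Indicator {indicator}: rating box left blank.")
            elif rating == "NA":
                if indicator not in NA_ALLOWED:
                    problems.append(f"Indicator {indicator}: NA is not an allowed rating.")
            elif rating not in {1, 2, 3, 4, 5}:
                problems.append(f"Indicator {indicator}: rating must be from 1 to 5.")
        return problems

    # Example: 3.6 may be rated NA, but 2.1 may not be left blank.
    print(validate_ratings({"2.1": None, "3.6": "NA", "4.7": 4}))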

In general, the numerical values for the Likert scale on the UTOP can be interpreted as follows:

  1. Not observed at all / Not demonstrated at all
  2. Observed rarely / Demonstrated poorly
  3. Observed an adequate amount / Demonstrated adequately
  4. Observed often / Demonstrated well
  5. Observed to a great extent / Demonstrated to a great extent

Each numerical value on the rating scale corresponds to two descriptors: one that measures how frequently the indicator occurred (observed rarely, observed often, etc.) and one that captures the quality of its implementation (demonstrated poorly, demonstrated well, etc.).
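
If ratings are recorded electronically, the paired descriptors can be kept alongside the numeric values. The sketch below is a hypothetical Python encoding of the scale above, pairing each value with its frequency descriptor and its quality descriptor.

    # Hypothetical lookup pairing each UTOP rating value with its
    # frequency descriptor and its quality descriptor.
    DESCRIPTORS = {
        1: ("Not observed at all", "Not demonstrated at all"),
        2: ("Observed rarely", "Demonstrated poorly"),
        3: ("Observed an adequate amount", "Demonstrated adequately"),
        4: ("Observed often", "Demonstrated well"),
        5: ("Observed to a great extent", "Demonstrated to a great extent"),
    }

    frequency, quality = DESCRIPTORS[4]
    print(frequency, "/", quality)   # Observed often / Demonstrated well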

For some indicators, only one of the descriptors may be appropriate. For instance, indicator 2.1 reads, “The lesson was well organized and structured.” A measure of the frequency of the occurrence of this indicator would be inappropriate. In this case, the rater would need to refer only to the second set of descriptors that measure the quality of the lesson structure as described by the indicator.

For other indicators, descriptors of both frequency and quality may be appropriate. For instance, indicator 4.7 reads, “Appropriate connections were made to other areas of mathematics or science and/or to other disciplines (including non-school contexts).” When scoring this indicator, the rater should take into account the quality as well as the frequency of the connections the teacher is making.

With respect to scoring teachers on the frequency with which they implement indicators, it is important for the rater to remember that some lessons will include more opportunities to exhibit certain characteristics than others. How often the teacher demonstrates the characteristics of any indicator should be considered relative to the number of opportunities available.

Synthesis Ratings

Each of the four scored sections of the UTOP concludes with a Synthesis Rating that is intended to be an overall rating for each area. The synthesis rating boxes contain scores from 1 to 5 with corresponding descriptors.

The synthesis ratings are not intended to be a mathematical average of the indicator scores making up each section, but are designed to allow the rater to describe his or her overall impression, using a holistic view of the domain and providing a “human average” of the entire lesson. Evidence to support the score chosen should be typed in the open space after the Synthesis Ratings boxes.

Supporting Evidence

Immediately after each indicator in the UTOP, space is provided for raters to present specific supporting evidence for their scores. This is done so that raters and other researchers can understand why a specific score was given long after the observation has taken place, and so that raters can achieve inter-rater reliability by comparing and discussing the supporting evidence behind differing numeric scores. Supporting evidence must be entered for every indicator rating, without exception. Supporting evidence does not need to be entered for a synthesis rating but is recommended, particularly if the data is to be shared with the observed teacher as feedback for professional development and improvement of practice.
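
For electronically compiled observations, this requirement can be checked in the same spirit as the rating rules sketched earlier. The function below is a hypothetical illustration (not part of the UTOP instrument) that lists any indicator rated without accompanying evidence.

    # Hypothetical check that every rated indicator has supporting evidence.
    def missing_evidence(ratings, evidence):
        """Return the indicators that were rated but have no evidence entered."""
        return [
            indicator
            for indicator in ratings
            if not evidence.get(indicator, "").strip()
        ]

    # Example: 1.2 has evidence entered, 2.1 does not.
    print(missing_evidence(
        {"1.2": 4, "2.1": 3},
        {"1.2": "Students worked in groups and critiqued each other's methods."},
    ))   # ['2.1']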

In the next section, general descriptions for each possible rating are given for each indicator in order to promote consistency in the scoring across raters. Please carefully review the descriptions for each item prior to completing a UTOP observation.

Also provided in the next section are examples from specific lessons of each possible rating for each indicator. These examples show the types of supporting evidence typically cited at each level, as well as the typical format and level of detail of supporting evidence. Supporting evidence should be specific and factual (i.e., free of personal opinion) and typically ranges from two to six sentences.