And now for something entirely unrelated to superheroes.

### General Information

Title: Fundamentals of Item Response Theory

Authors: Ronald K. Hambleton, H. Swaminathan, and H. Jane Rogers

Original Publication Date: 1991

ISBN: 0-8039-3647-8

Cover Price: None available. The publisher’s website lists it for 28 British pounds.

Buy from: Amazon.com or Amazon.ca

### Subject Matter

This volume examines the basics of Item Response Theory, a form of test analysis that allows the examiner to assess and improve the quality of a test on a question-by-question basis.

### Required Background

The only conceptual requirement is a basic understanding of classical assessment theory. In retrospect, you really only need to know two things about classical testing, as the rest is mentioned in detail when the comparisons are made between classical theory and item response theory:

- Difficulty – an item’s classical difficulty is defined as the proportion of students who answer it correctly; i.e., if 90% of the class got question 4 correct, then question 4 has a difficulty of 0.90. Higher values counterintuitively indicate easier questions in the classical theory. (Item response theory also has a difficulty parameter, but its definition, while a little more complicated, behaves far more intuitively: an item’s difficulty is defined as the ability level at which a student has a 50% chance of getting the question correct.)
- Discrimination – classical theory has a quantity that measures how well an item “discriminates” between high- and low-performing students. A class is sorted by total test score and divided into halves, thirds, or quarters (depending on the size of the class). The discrimination is then calculated by comparing the proportion of students in the high-performing portion of the class who get the question correct to the proportion in the low-performing portion who do. Item response theory provides a means to define this discrimination more precisely, using the results of every student in the class without dividing them into chunks of almost arbitrary size.
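The two classical quantities above, and the more intuitive IRT notion of difficulty, can be sketched in a few lines of code. This is a hypothetical illustration, not from the book: the function names, the median-split variant of the discrimination index, and the sample data are all my own, and the 2PL curve shown is just one of the standard logistic models.

```python
import math

# Hypothetical 0/1 response matrix: rows are students, columns are items.

def classical_difficulty(responses, item):
    """Classical difficulty: proportion of students answering `item` correctly
    (higher values mean an easier item)."""
    return sum(row[item] for row in responses) / len(responses)

def classical_discrimination(responses, item):
    """Upper-group minus lower-group proportion correct, using a median split
    on total score (one common variant of the index)."""
    ranked = sorted(responses, key=sum, reverse=True)
    half = len(ranked) // 2
    upper, lower = ranked[:half], ranked[-half:]
    p_upper = sum(row[item] for row in upper) / half
    p_lower = sum(row[item] for row in lower) / half
    return p_upper - p_lower

def icc_2pl(theta, a, b):
    """Two-parameter logistic item characteristic curve: the probability that
    a student of ability `theta` answers correctly.  At theta == b the
    probability is exactly 0.5 -- the IRT definition of difficulty `b`."""
    return 1.0 / (1.0 + math.exp(-1.7 * a * (theta - b)))

# Hypothetical data: 6 students, 3 items.
data = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
print(classical_difficulty(data, 0))      # 4 of 6 correct -> ~0.67
print(classical_discrimination(data, 0))  # upper 3/3 minus lower 1/3 -> ~0.67
print(icc_2pl(0.5, a=1.2, b=0.5))         # ability equals difficulty -> 0.5
```

Note how `icc_2pl` makes the IRT convention intuitive: a harder item simply shifts the curve so that a higher ability is needed before the odds of success reach 50%.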

The mathematical requirements are minimal. If you can do high school algebra, you can read the text and follow along conceptually, though you may not be able to do some of the exercises at the ends of the chapters. To do all of the problems, you must also be comfortable enough with statistics to proceed with computations using only statements such as “calculate the point-biserial correlation” as a guide. To complete the derivations the authors chose to omit, you must also be able to compute the derivatives of rational expressions involving exponential and logarithmic functions. First-year post-secondary calculus and statistics will likely satisfy all of the mathematical requirements of the text.
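For readers who want a concrete picture of the kind of computation invoked by a bare instruction like “calculate the point-biserial correlation,” here is a minimal sketch. It is not taken from the book; the function name and the sample scores are hypothetical, and I use the standard textbook formula with the population standard deviation.

```python
import statistics

def point_biserial(item_scores, total_scores):
    """r_pb = (M1 - M0) / s * sqrt(p * (1 - p)), where M1 and M0 are the mean
    total scores of students who got the item right and wrong, s is the
    population standard deviation of all totals, and p is the proportion of
    students who answered correctly."""
    ones = [t for x, t in zip(item_scores, total_scores) if x == 1]
    zeros = [t for x, t in zip(item_scores, total_scores) if x == 0]
    p = len(ones) / len(item_scores)
    s = statistics.pstdev(total_scores)
    return (statistics.mean(ones) - statistics.mean(zeros)) / s * (p * (1 - p)) ** 0.5

# Hypothetical scores for 5 students on one item, with their test totals.
r = point_biserial([1, 1, 1, 0, 0], [9, 8, 7, 5, 4])
print(round(r, 3))  # roughly 0.924: the item tracks overall performance well
```

A high value means that students who do well on the test overall also tend to get this particular item right, which is exactly the intuition behind the discrimination indices discussed earlier.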

### High Point

I commend the authors on the mathematical approach taken. If you’re mathematically inclined, you’ve got everything you need to fill in the details. If you’re not mathematically inclined (as a frighteningly large number of educators aren’t), the authors derive everything for you, explain the important characteristics and limitations of the models, and even suggest software packages that can do all the dirty work for you, essentially reducing the complexity of the field to data entry and interpretation.

### Low Point

My only real complaint about the text is the seemingly disproportionate amount of time spent on the works of these three authors. This was my first introduction to the topic, so I don’t know enough about the names to know if they really are that good, or if this is just self-promotion. Everything presented is logical and reasonable, but they may spend six pages dealing with one particular model introduced in an earlier work by this team, only to end the section with a sentence or two mentioning that some other author or collaboration has proposed an alternative detailed in some citation. The nature of this alternative is rarely revealed in even the most cursory detail.

### The Scores

I found the *clarity* of the text to be excellent. Though I am confident that the non-mathematically inclined would also be pleased with the content, it’s hard for me to judge that myself. What I can assure readers is that the steps and details which would need to be worked out using post-secondary math, and even using high school math, have been done for you. If one is using one of the three main models presented here for item response theory, then every equation one might need has been derived and provided with the relevant variable(s) already isolated. The authors really focus on the concepts rather than the math. I give it 5 out of 6.

The *structure* was well planned, and laid out effectively in the introduction. The models and analysis procedures build in complexity in a natural order. I give it 5 out of 6.

The *examples* provided were adequate to cover the relevant topics, and were, at times, built from real-world data. Again, the mathematical details were sparse, and the focus is on interpretation of the results. (This is an education text rather than a math text, after all, and in real-life applications computers would probably do the analysis while educators simply interpret the results.) The examples also function well when clarifying a particular concept. I give it 6 out of 6.

The *exercises* were useful, and solutions were provided, but there should have been more of them. There are no more than three or four per chapter, and sometimes as few as one. The solutions, though welcome, appear immediately after the questions rather than in an appendix, which makes it tempting to peek rather than solve the problems yourself. I give it 3 out of 6.

It’s hard to judge the *completeness* of an introductory text. This one never pretends to offer anything beyond the fundamentals, and is not meant to be a comprehensive view of the topic. As such, it seems to hit many of the main points. I am a little concerned that it’s a 15-year-old text in a new and rapidly growing field, so I’m not sure how up to date it is. For example, this text presents parametrization of ability and item descriptions as a mandatory and indispensable part of the field, yet the publisher also offers this tempting volume as part of the same “Measurement Methods for the Social Sciences” series, published 11 years after the volume reviewed here. I give it 4 out of 6.

The *editing* was solid, with a smooth flow to the text and no errors or typos that I noticed. I give it 6 out of 6.

*Overall*, this is an effective introduction to the topic. I’d recommend it to all educators. I give it 5 out of 6.

In total, *Fundamentals of Item Response Theory* receives 34 out of 42.

Self-contained usefulness?

As a "fundamentals" book, they probably didn’t see the need for keeping absolutely on top of the changes in the field — aside from statements like you mention, the same basics are probably as important now as they were when the book was first written.

That said, "fundamentals" books are often intended to be introductions to further study. Will this book, on its own, help educators to improve their testing techniques?

An unrelated question: What kind of testing does IRT (or at least this book) cover?

Re: Self-contained usefulness?

IRT could, in theory, be the basis of any and every test given. If I were still a teacher in a traditional classroom, I’d use it for every test I wrote. It means more work after the test than simply marking it, but I think it would be worth it, particularly if you plan on banking test questions for use in later years. Because it rates difficulty question by question so effectively, I could even correlate the results by topic to find the topics that give students the most difficulty, and work on changing the way I approach teaching a topic, or simply spend more time on it.

Re: Self-contained usefulness?

Cool. So it’s a tool for evaluating the effectiveness of tests after they’ve been given. (Makes sense.)

Is there anything inherent to IRT to tell you whether a question’s difficulty is due to the wording of the question versus students’ understanding of the material? That would be important to distinguish if this is going to guide teaching, as opposed to test design.