Yes, I’ve been on something of an education kick lately. This time around, I’ve reviewed an advanced text on item response theory, which is the theory that governs the way people respond to questions, whether those are in-class questions or surveys.

General Information

Title: Introduction to Nonparametric Item Response Theory
Authors: Klaas Sijtsma and Ivo W. Molenaar
Original Publication Date: 2002
ISBN: 0-7619-0813-7
Cover Price: There isn’t one, but it can be found in the $50-$60 range.
Buy from: Amazon.com or Amazon.ca

Subject Matter

The previously reviewed text on item response theory focused on parametric item response theory, in which the assessor uses one to three parameters to describe the way individuals respond to test items, with emphasis on describing the test items themselves and determining the level of ability for each respondent. This later volume in the same series describes a model that is less concerned with determining the exact ability needed to answer a given question, and instead gears the models toward ranking the relative abilities of the respondents. So, the first volume can be used to determine that, say, Alice and Bob have abilities of 2.5 and 1.3 respectively (on some scale), while this volume provides a model that merely states that Alice has greater ability than Bob. With this model, one will determine a ranking of respondents with better accuracy, but will not be able to give their abilities any absolute meaning. The two models have areas of compatibility, which are clearly distinguished here, and may be used in conjunction for those areas.
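To make the ordinal idea concrete, here is a minimal sketch that ranks respondents by unweighted total score, which is my understanding of how ordering is usually done under the nonparametric models. The function name and the response data for Alice, Bob, and Carol are hypothetical and do not come from the book.

```python
def rank_by_total_score(responses):
    """Order respondents from highest to lowest total score.

    `responses` maps a respondent's name to a list of 0/1 item scores.
    The result says who outranks whom, and nothing about how far apart
    two respondents are on any absolute scale.
    """
    totals = {name: sum(items) for name, items in responses.items()}
    # sorted() is stable, so tied respondents keep their input order.
    return sorted(totals, key=lambda name: totals[name], reverse=True)


# Hypothetical data: five dichotomous items per respondent.
responses = {
    "Alice": [1, 1, 1, 0, 1],
    "Bob": [1, 0, 0, 0, 1],
    "Carol": [1, 1, 0, 0, 1],
}
ranking = rank_by_total_score(responses)
print(ranking)
```

The output places Alice ahead of Bob, but it carries no figure comparable to the 2.5 and 1.3 that a parametric model would estimate.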

The table of contents (listing chapters without appendices) is as follows:

  1. Models for Mental Measurement
  2. Philosophy and Assumptions Underlying Nonparametric IRT Models for Dichotomous Item Scores
  3. The Monotone Homogeneity Model Applied to Transitive Reasoning Data
  4. The Monotone Homogeneity Model: Scalability Coefficients
  5. Automated Item Selection Under the Monotone Homogeneity Model
  6. The Double Monotonicity Model
  7. Extension of Nonparametric IRT to Polytomous Item Scores
  8. Item Analysis Using Nonparametric IRT for Polytomous Items

Required Background

A person should have a background in parametric item response theory (covered sufficiently here) as well as a background in statistics sufficient to understand correlations, covariance, standard deviations, means, and expected results.

What I Was Looking For, And What I Found

I was looking for another way to model assessment questions to better evaluate students from a criterion-referenced standpoint. As it turns out, my assumptions about the field were entirely off the mark: this theory does not do what I was hoping for, and never claims to. Instead, it’s entirely about ranking relative levels of performance, or measuring latent traits. While it has tremendous applications for opinion polls, customer satisfaction surveys, evaluating candidates for entrance to programs restricted in size (such as college entrance), and the like, it provides little or no criterion-referenced information indicating how respondents perform on any absolute scale.

High Point

The scalability coefficients described in chapter four are widely applicable both with and without parametrized item response theory, and can be applied to a parametrized test to check the validity and reliability of test items.
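For readers who want a feel for the calculation, here is a minimal sketch for dichotomous items. It assumes the coefficient in question is Loevinger's H as used in Mokken scale analysis: the sum of the observed inter-item covariances divided by the sum of the largest covariances possible given each item's proportion of positive responses. The function name and the test data are hypothetical, and the book's own presentation may differ in detail.

```python
from itertools import combinations


def scalability_h(scores):
    """Overall scalability coefficient H for 0/1 item scores.

    `scores` is a list of respondent rows, each a list of 0/1 values
    with one entry per item. Assumes Loevinger's H (see the lead-in).
    """
    n = len(scores)
    k = len(scores[0])
    # Proportion of respondents scoring 1 on each item.
    p = [sum(row[i] for row in scores) / n for i in range(k)]
    cov_sum = 0.0
    cov_max_sum = 0.0
    for i, j in combinations(range(k), 2):
        # Proportion scoring 1 on both items i and j.
        p_ij = sum(row[i] * row[j] for row in scores) / n
        cov_sum += p_ij - p[i] * p[j]
        # Largest covariance attainable with these two marginals.
        cov_max_sum += min(p[i], p[j]) - p[i] * p[j]
    if cov_max_sum == 0:
        raise ValueError("H is undefined when the items do not vary")
    return cov_sum / cov_max_sum


# Hypothetical data: a perfect Guttman pattern, where anyone who passes
# a harder item also passes every easier one.
guttman = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
h_guttman = scalability_h(guttman)
print(h_guttman)
```

A perfect Guttman pattern gives H = 1, while items that are statistically unrelated give H = 0, so values in between indicate how closely a set of items approaches a consistent ordering.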

Low Point

The more involved and more interesting math isn’t even touched. This wouldn’t be a problem if things were consistent, but they’re not. In early chapters, every step is explicitly laid out, so that the reader can follow an entire analysis from start to finish. In the later chapters, this is replaced with an almost entirely qualitative approach with references to details in other works, often written by at least one of these authors. There are only 149 pages from the first page of chapter 1 to the first page of the first appendix; there was definitely room for more. The authors explain the lack of detail by essentially saying that there’s a lot of math involved. Fine; if it disrupts the flow of the chapter, leave it out of the chapter, but put it in an appendix so the readers can actually apply the contents of the last two chapters of the book! It really felt like, at the end, I could handle the simplest examples, but that the only way to handle the complicated examples would be to download the software package the authors created and sell through their website (which has changed since publication, and needed to be sought out) for 225 Euros or more, depending on the number of licenses purchased. The free demo version mentioned in the text (published in 2002) no longer exists, but given that it’s 8 years old, I suppose I should be pleased that the software still exists at all.

The Scores

The clarity starts out very well in the early chapters, and starts to fade. It’s almost as if the authors decided on a certain level of detail at the beginning, and then gradually backed off when they realized how much work that would be. There’s also an unusual order to things. Scalability coefficients are introduced in chapter three for the first time, and are described as the “standard” scalability coefficients. Chapter three was difficult to read, because I couldn’t reproduce any of their results with these coefficients. Then I hit chapter four, which described exactly how they were calculated (which is not any standard I’ve run across), at which point I could return to chapter three and reproduce all of the results. At no point in chapter three was it mentioned that this information would come in chapter four. There are similar issues with topic ordering throughout, but this was the most egregious example. I give it 3 out of 6.

The structure is well designed in the general sense, with each chapter ending with summarizing discussions, selections for additional reading, and select problems to be solved by the reader, complete with examples. Equations are numbered and cross-referenced, so that part is easy to follow. However, it’s a non-linear subject, and it can get difficult to follow when the authors fail to indicate when a particular branch in the topics will be returned to. By the time I reached chapter six, I had worked out the pattern: if they didn’t tell the readers when they would come back to a subject, they would still come back to it eventually. If they chose not to explore a particular avenue, this was revealed immediately and consistently. I give it 4 out of 6.

The quantity and quality of the examples were sufficient for a conceptual understanding. An understanding deep enough to actually apply these techniques to any but the most simplistic data set requires further reading. They do provide a number of sources to go to for this information, many of which were written by the same authors. I give it 3 out of 6.

The exercises are well designed, and can definitely be solved using only the material in the chapters. This is due in part to a shift in focus for the exercises. Early on, exercises involve actual data analysis, while the later chapters include exercises which invite non-mathematical discussion of the most mathematically intense aspects of the topic. Still, they seem to cover anything and everything one can do with the information provided. I give it 6 out of 6.

The completeness of the text can be argued. On the one hand, the title starts with the words “Introduction to,” which seems to preclude the thought that this might be a comprehensive view of the topic. On the other hand, the areas that were selected for omission felt as though they were designed to drive the readers directly to papers and software created by the same authors. As an educator, this grates severely on me: this doesn’t breed independence in the reader/student, but instead binds the reader/student to the works of Sijtsma and Molenaar. (When a proof was published by someone else, they reproduce it in the text. When a proof was published by either Sijtsma or Molenaar, they cite the paper and leave the proof out.) In the end, it would feel more complete if it covered fewer topics in more detail rather than pressing on to new ones. I’m not afraid of math by any stretch of the imagination, but the authors seem to assume their readers are, and enable this attitude by keeping the mathematical core out of the advanced topics. I give it 3 out of 6.

The editing was generally good. There were a few errors, but not enough to warrant a new printing. Instead, the book ships (from Amazon.com at least) with a paper insert listing the five known errata, all of which have corrections that would likely seem obvious to any readers who notice them. I give it 5 out of 6.

Overall, I found the second half ultimately unsatisfying. While I will likely apply the scalability coefficients in practice, I doubt I’ll refer to the other chapters often. I would recommend this book to the limited audience who would like rigorous mathematical analysis of survey results, but don’t want to do any of the actual math themselves, instead allowing the software to do that job and merely interpreting the results. Unfortunately, I am not a part of that group, and found only two out of eight chapters useful. I give the book 2 out of 6 overall.

In total, Introduction to Nonparametric Item Response Theory receives 26 out of 42.