Geek Test Milestone #1 Achieved

I would like to thank everybody who participated in the first part of the Geek Test here at Bureau 42. I got exactly the kind of data I was looking for FAR faster than I’d hoped. I’m excited about where this is going. Complete details are after the break.

I had a few objectives in mind when setting up the questions.

I wanted something short to increase volunteer participation.
I wanted to sort the questions from what I thought would be the easiest to the hardest.
I wanted to represent different categories of geekdom.
I wanted a question so easy that people who got it wrong would almost certainly be submitting garbage data.
I wanted two questions of comparable difficulty that appeal to completely different categories of geekdom.
I wanted something that would give a large spread of responses and final scores.

I failed on one of those counts, the ordering by difficulty: question 4 turned out to be slightly easier than question 3, revealing my own bias as a physics major who has seen little of the classic Doctor Who. Those were the two questions meant to be of similar difficulty, and they were.

For those who are interested, here is what the analysis found. Overall, the mean percentage correct (from the first 104 responses) was 68.46%. The median was 70% and the mode was 60%.

Question 1 was the “honeypot” question. People who cannot provide Clark Kent’s alternate identity likely fall into two categories: people who cannot answer a single test question correctly, and people who are feeding garbage data into the test. Of the five people who answered this question incorrectly, three got 0 on the test, and one missed that question and that question alone. These people enabled me to set up “garbage data” filters that seem to be working fairly well, so they provided more help than they probably realized. Using the standard definitions for single classroom analyses, the difficulty and discrimination of this question were 0.95 and 0.16 respectively. Difficulty here is the fraction of respondents who answered correctly, so this is an easy question (95% got it correct) with poor discrimination (getting it wrong tells us nothing about the person’s true ability; either the data is garbage, or the person who answered is so far down the scale that this test cannot measure his or her true level of geekiness).
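For anyone who wants to try this at home, here is a minimal sketch of one common classroom definition of discrimination, the upper-lower index: the fraction correct among the top scorers minus the fraction correct among the bottom scorers. The function name, the 27% group size, the tie handling, and the toy data are all assumptions for illustration. This is not necessarily the exact procedure behind the numbers in this post, and it should not be expected to reproduce them.

```python
def discrimination(responses, question, fraction=0.27):
    """Upper-lower discrimination index for one question.

    responses: list of tuples of 0/1 answers, one tuple per respondent.
    question:  zero-based index of the question to analyse.
    fraction:  share of respondents in each of the top and bottom groups
               (0.27 is a common textbook choice, assumed here).
    """
    # Rank respondents by total score, lowest first. Ties keep input order,
    # which is an arbitrary choice that can shift the result slightly.
    ranked = sorted(responses, key=sum)
    k = max(1, round(len(ranked) * fraction))
    lower, upper = ranked[:k], ranked[-k:]
    p_upper = sum(r[question] for r in upper) / k
    p_lower = sum(r[question] for r in lower) / k
    return p_upper - p_lower


# Hypothetical toy data: eight respondents, three questions.
toy = [
    (1, 1, 1), (1, 1, 1), (1, 1, 0), (1, 0, 1),
    (1, 1, 0), (1, 0, 0), (1, 0, 0), (0, 0, 0),
]

for q in range(3):
    print(f"question {q + 1}: D = {discrimination(toy, q):.2f}")
```

In the toy data the first question is answered correctly by almost everyone, so it separates the top and bottom groups less sharply than the other two, which is the same pattern the honeypot question shows.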

The second question, about James T. Kirk’s middle name, had a difficulty of 0.88 and a discrimination of 0.47. As a rule of thumb, teachers strive for a discrimination of 0.40-0.60. The discrimination value tells us how well a question separates high scorers from low scorers, which in turn is a check on whether the question is measuring the same thing as the rest of the test. We can be fairly confident that a respondent who answers this correctly has some level of geekiness, while those who answer incorrectly are unlikely to get anything but question 1 correct.

The third question, about physicists, was more difficult, and starts to distinguish differences amongst the geeky. It had a difficulty of 0.55 and a discrimination of 0.95. This is an astonishingly high discrimination; the top performers got the question right every time without exception, and the bottom performers usually got it wrong.

The fourth question, about Doctor Who, is quite comparable. The difficulty was 0.57 (slightly easier) and the discrimination was an identical 0.95. These questions together function as I’d hoped: they serve as gateway questions of the “are you a physics/Doctor Who geek” variety. If you are a physics geek, you got #3 right. If you are a Doctor Who geek, you got #4 right.

The fifth question was designed to be the hardest of the set, and it was. I solicited responses from the larryniven-l mailing list, so I expected it to look easier in these results than it really is. (I was hoping to generate more data, and I did.) The difficulty level was 0.46, and the discrimination was 0.84. So, physics and Doctor Who geeks were more common overall than Larry Niven geeks amongst those who responded to the test questions.

The final result patterns were as seen in the following chart. The first column is the number of people who provided a given response pattern, and columns 2-6 show the individual question responses (where “1” is correct and “0” is incorrect).

Number of results	Q1	Q2	Q3	Q4	Q5
20	1	1	1	1	1
14	1	1	1	1	0
13	1	1	0	1	0
10	1	1	1	0	0
9	1	1	0	0	0
9	1	1	0	1	1
8	1	1	1	0	1
7	1	1	0	0	1
4	1	0	0	0	0
3	0	0	0	0	0
2	1	0	1	1	0
2	1	0	0	0	1
1	1	0	1	0	1
1	0	1	1	1	1
1	0	1	1	1	0

The next stage of the Bureau 42 Geek Test will commence shortly. This test will likely be 20-30 questions from a single domain of geekdom. I ask that people who participate in the testing do the following:

Answer once only, doing the best you can possibly do.
Use the same “unique identifier” string on every subsequent test. If you submit a valid e-mail address, I will forward your overall results to you, but I will not reveal the correct answers to questions you answered incorrectly, in order to protect the validity of future results.
Share the link to the test with anyone and everyone who might wish to participate. In order for me to start moving to a computerized adaptive model, I’m going to need a lot of “clean” data.

Thanks again to everyone who participated in part A, and please come back for the later parts!
