[ Proceedings Contents ] [ Forum 1997 Abstracts ] [ WAIER Home Page ]

Boolean versus continuous variables as tools of measurement

Jonathan Hippisley
Graduate School of Education
University of Western Australia

Instruments yielding only a Boolean result are seldom used in natural science. A botanist, interested in the growth rate of a certain plant, does not venture out into the garden armed only with an unmarked measuring rod (Figure 1). If he did so, his contribution to science might be rather limited. Suppose he recorded the result that the plant was shorter than the measuring rod with a 0 and the result that the plant was longer than the measuring rod with a 1. His record of plant height over time might look like Table 1 (below).

Figure 1: A Boolean measuring rod, seldom used in botany [1]

Table 1: Plant height expressed as a Boolean variable against time


Day      1  2  3  4  5  6  7  8
Height   0  0  0  0  1  1  1  1

Nor does he trek out with a bundle of unmarked rods, in an attempt to convert a collection of Boolean scores into an arbitrary quantum score. In Figure 2, the plant is shorter than six rods, but taller than three, so we could award the plant a score of 3 out of 9, or approximately 33%. Such a score on its own would be quite without meaning, because nothing is said about the length of any of the rods with respect to each other or anything else. All that can be deduced from the score is that six rods are taller than the plant. Tomorrow, if the plant is taller than five of the rods, has it grown a lot or a little? Maybe two of the rods were just a little taller than the plant on the previous day. Maybe these two rods are themselves identical in length.

Natural scientists usually design their instruments to yield results on a scale which might not quite be continuous, but whose graduations are both regular and small enough to meet the needs of their study. A botanist will not use an unmarked rod to measure the height of his plants on a Boolean scale; nor will he grab a collection of unmarked rods to measure height on an arbitrary and undefined quantum scale. He may mark one of his rods with a regular scale, and because he is lucky enough to be measuring in a dimension which has been the subject of study for thousands of years, he may even choose to mark his rod with a standard scale, shared by other scientists, so that he can share his results with them.

Figure 2: A set of unmarked rods, also seldom used in botany [1]

Yet in education children have for much of the last hundred years been subject to examinations comprising items the result of which can only be expressed on a Boolean scale.

Figure 3: A Boolean measuring rod, applied to children [1]

Item 14 of an arithmetic test might be 29 x 3. Simon might notice that 29 is 30 - 1; he knows that 3 x 3 is 9 and hence that 30 x 3 is 90, and that 3 x 1 is 3, so he takes 3 from 90 and quickly writes down 87 as the answer. David might not notice the shortcut, but works through it longhand and still arrives at the right answer. Neil might not have a clue what the answer is, nor how to reach it; so he might look out of the window for a few minutes, and then look over Simon's shoulder to copy the answer from him. All three children put down the right answer and are indistinguishable by the Boolean score of 1 yielded by item 14 in the test.

In Figure 3, two children are measured on a Boolean scale by an unmarked rod. Both children are shorter than the rod, yielding a score of zero on the Boolean scale. The children are not identical in height, but they yield an identical height result on the Boolean scale.

Figure 4: Using a set of rods to create an arbitrary quantum scale [1]

In Figure 4 a collection of unmarked rods has been gathered together to create an arbitrary quantum scale. The scale is arbitrary because no attempt has been made to standardise the incremental difference in length between the rods in the collection. There is a big length gap between some of the rods, a small length gap between other rods, and no length gap between others. As it happens, the heights of both these children fall between the same two rods. So although they are not identical in height, on this arbitrary quantum scale, both children score six marks out of a possible nine.
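The arbitrary quantum scale described above can be sketched in a few lines of Python. This is only an illustration: the rod lengths and child heights below are invented (Figure 4 is not available), and the function name quantum_score is hypothetical.

```python
def quantum_score(height, rod_lengths):
    """Count the rods the child is taller than (one Boolean result per rod)."""
    return sum(1 for rod in rod_lengths if height > rod)

# Hypothetical rod lengths in cm: a big gap (110 to 150), a small gap
# (100 to 101) and no gap at all (150 and 150), as described for Figure 4.
rods = [60, 75, 80, 100, 101, 110, 150, 150, 170]

# Two children of different heights fall between the same two rods.
print(quantum_score(115, rods))  # 6
print(quantum_score(145, rods))  # 6
```

With these made-up lengths, a difference of 30 cm between the two children leaves the score unchanged at six out of nine, whereas a difference of 2 cm either side of the 100 cm and 101 cm rods would change it by two marks.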

Real-life examiners try to spread the length of their measuring rods over a range broad enough to cover the students in the study. Items with zero or perfect scores are usually excluded from the analysis (Ebel, 1979), and in some (rare) cases attempts are made to place the length of the rods, or, more literally, the difficulty of the items in a test, on a standard scale (Rasch, 1960).

Nevertheless, the fact remains that the entire examination, comprising ten, fifty or a hundred items, is reduced to a single measure. Children, unlike plants, do not stand still to be measured on a constant base: they run around and bend over and jump up and down, so any single measure, even of something as easy to define as their height, may not be entirely accurate. In any measurement, even in natural science, there will be errors, and it can be shown mathematically (Sijtsma, 1993) that the scale of the errors is reduced as the number of measurements is increased.
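In its simplest textbook form, the reduction of error with repeated measurement is the standard error of a mean. The following is a sketch only, assuming n independent measurements x_i of the same quantity, each with the same error variance sigma squared; these symbols are introduced here for illustration and this is not necessarily the form of the argument given in the cited source.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
\operatorname{SE}(\bar{x}) = \frac{\sigma}{\sqrt{n}}
```

Under these assumptions, quadrupling the number of measurements halves the expected error of their average.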

Using an entire examination to produce a single measure is not therefore a particularly efficient method of measurement. Rather than using an array of Boolean instruments to produce a single score, the botanist has the right idea when he uses a graduated instrument to produce a measure on a continuous (or sufficiently near to continuous for the purpose of his study) scale. He might then repeat each measure after an hour for increased accuracy, or ask an assistant to repeat each measure for him. If he has several assistants he might ask each one of them to record their own measurements of the plant each day.

How would it be in education if we could construct a device to measure each item in a test on a continuous scale? Every item on a test would then be like a whole exam in its own right. A test with ten items would be like ten exams. In theory we should expect a reduction of errors through averaging, or to use a term used widely in psychometrics, we should expect an increase in the reliability of the test.

Figure 5: Distribution of quantum scores from 50 Boolean items [1]

Figure 5 shows the distribution of quantum or raw scores from 50 Boolean items in an arithmetic test applied to over 400 children from primary schools in Western Australia. The distribution shows that, using the botany analogy, most of the measuring rods were shorter than the children (within their range of ability), because there were a lot of 1 scores on the items, adding up to very high scores on the quantum scale. In other words, although there were 50 rods (items), most of them were redundant, because they were all shorter than the children. This left a handful of about five rods (items) to record the varying heights (abilities) of the children. We would expect such a test to have a low level of reliability, and Cronbach's alpha turns out to be 0.55. If 0.7 is an acceptable level for Cronbach's alpha, 0.55 represents low reliability.
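For readers unfamiliar with the statistic, Cronbach's alpha can be computed from a table of item scores using the standard formula alpha = k/(k-1) x (1 - sum of item variances / variance of total scores), where k is the number of items. The Python sketch below is a minimal illustration of that formula; the function name and the tiny data set are invented for the example and are not the study data.

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """scores: one row per child, one column per item."""
    k = len(scores[0])                       # number of items
    items = list(zip(*scores))               # transpose: one tuple per item
    item_var = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in scores])
    # Population variance is used throughout; the ratio is the same
    # if sample variance is used for both numerator and denominator.
    return (k / (k - 1)) * (1 - item_var / total_var)

# Made-up Boolean results for four children on three items.
example = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(round(cronbach_alpha(example), 2))  # 0.75
```

The same function accepts continuous item scores as well as Boolean ones, since it relies only on variances. It is undefined when every child obtains the same total score, because the total variance is then zero.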

Figure 6: Adding a continuous scale to a Boolean device [1]

Suppose the botanist, unsatisfied with his unmarked measuring rods, finds that his pen will not mark the rods. Perhaps the ball of the pen has seized, or the surface of the rods is too shiny or flaky to accept the markings. But he finds some string and some old tape measures. The tapes themselves are shorter than the plant, but when attached to the rods they create a useful composite measuring device.

Figure 6 shows the same conceptual idea applied to a child. The rods are unmarked, and their lengths are unknown, so for the three rods which are taller than the child, the old Boolean score of zero has to be recorded; but for the other six rods, a variable score from the tapes can be read. If the same set of rods is used to measure a collection of children, not one, but many variable scores will be recorded for each child, thereby increasing the reliability with which the heights of the children may be compared.

Figure 7: Scoring Rates: a composite Boolean/variable measure [1]

Figure 7 shows the effect of appending a variable scale to the Boolean items used to construct Figure 5. The tape measure appended to the unmarked rod, or item, was time. For graphical convenience this was converted into a rate by dividing the time spent on the item into the old Boolean result. That yields a zero for incorrect answers, so for this composite measure items well within the ability range of the students are preferred. Figure 7 was constructed by dividing the total time taken into the quantum score for the test. It is shown to contrast the two distribution curves. Figure 5 shows that most of the items are within the ability range of most of the children. Figure 7 shows that while the students are getting most items right, some do so much more quickly than others. Figure 7 distinguishes between Simon, and David and Neil in the example given above; Figure 5 does not.
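The two calculations described above, the rate for a single item and the rate for the whole test, can be sketched as follows. The function names and the times given for Simon, David and Neil are hypothetical; the paper reports no individual timings.

```python
def item_rate(correct, seconds):
    """Boolean result (0 or 1) divided by the time spent on the item."""
    return correct / seconds

def whole_test_rate(results, times):
    """Quantum (raw) score divided by the total time taken for the test."""
    return sum(results) / sum(times)

# Hypothetical times in seconds for item 14; all three answers are correct,
# so the Boolean score is 1 for each child.
item_14_times = {"Simon": 5, "David": 40, "Neil": 200}
for child, seconds in item_14_times.items():
    print(child, item_rate(1, seconds))

# An incorrect answer yields a rate of zero however long it took.
print(item_rate(0, 30))
```

With these invented times, the three children share a Boolean score of 1 on item 14 but have rates of 0.2, 0.025 and 0.005 correct answers per second.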

By combining time with the Boolean item result, a variable measure is derived from every item answered correctly. Since many items have been answered correctly, the statistical rules of averaging should lead us to predict relatively lower errors, or conversely higher reliability. A calculation of Cronbach's alpha confirms this to be the case. Cronbach's alpha for scoring rate for the same children using the same test as used in the previous calculation was 0.96. This is exceptionally high by the standards of most written tests.

References

Ebel, R. L. (1979). Essentials of Educational Measurement. Prentice Hall, New Jersey.

Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. University of Chicago Press, Chicago.

Sijtsma, K. (1993). Current trends in theories and assessment of intelligence. In Hamers, J. H. M., Sijtsma, K. and Ruijssenaars, J. M. (Eds), Learning Potential Assessment: Theoretical, methodological and practical issues. Swets & Zeitlinger, Amsterdam N.L.

Notes

  1. Figures 1-7 were not available to the HTML editors at the time this file was prepared.

Please cite as: Hippisley, J. (1997). Boolean versus continuous variables as tools of measurement. Proceedings Western Australian Institute for Educational Research Forum 1997. http://www.waier.org.au/forums/1997/hippisley.html


Last revision: 1 June 2006. This URL: http://www.waier.org.au/forums/1997/hippisley.html
Previous URL 30 July 2001 to 16 May 2006: http://education.curtin.edu.au/waier/forums/1997/hippisley.html
Previous URL from 12 Aug 1999 to 30 July 2001: http://cleo.murdoch.edu.au/waier/forums/1997/hippisley.html
HTML: Roger Atkinson [rjatkinson@bigpond.com] and Clare McBeath [c.mcbeath@curtin.edu.au]