Issues with the WISC-V normative data – low N’s make for shaky norms
The Wechsler Intelligence Scale for Children-Fifth Edition (WISC-V) was released recently, in the fall of 2014, and has begun to replace its predecessor, the WISC-IV, as the gold standard in IQ testing. IQ tests have always been surrounded by controversy: cultural biases, the lack of an agreed definition of "IQ", and whether IQ tests, be it the WISC/WAIS or one of their few competitors, actually measure what they say they measure. Unfortunately, when you sit down and take a look at the most recent WISC-V manual, the issues with this test appear to be much more fundamental: they lie in the test design itself and in the publisher cutting corners.
The WISC-V normative data are divided into 33 four-month age groups covering ages 6 through 16 (11 years of age, three groups per year). The normative sample included 2,200 children, spread across these age groups (1). Assuming a completely even distribution, this works out to roughly 66.7 children per age group. This is problematic. Oftentimes I have seen people cite numbers like 30+ or 50+ online when discussing required sample sizes. Fortunately for us, math is our friend, and we can calculate a reasonable sample size given the population size, which we can estimate from census data.
The 2012 U.S. census (2) reports 62,260,000 individuals between the ages of 5 and 19. Spread evenly over those 15 years of age, that is roughly 4.15 million individuals per year of age, or approximately 45.7 million individuals in the 11 years from age 6 through age 16. Again assuming a roughly even distribution, this works out to approximately 1,383,556 children in each of the 33 age groups within the general population.
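For anyone who wants to check the arithmetic, here is a minimal sketch of the estimate. It uses only the figures quoted above and leans on the same even-distribution assumption; the function and variable names are my own and purely illustrative.

```python
# Figures quoted in the text. The even spread across ages is an assumption.
CENSUS_AGES_5_TO_19 = 62_260_000  # 2012 census count for ages 5 through 19
CENSUS_YEARS_COVERED = 15         # ages 5, 6, ..., 19
WISC_YEARS_COVERED = 11           # ages 6, 7, ..., 16
AGE_GROUPS = 33                   # normative age groups


def population_per_age_group(census_total, census_years, test_years, groups):
    """Return (population in the tested age range, population per age group)."""
    per_year_of_age = census_total / census_years
    in_range = per_year_of_age * test_years
    return in_range, in_range / groups


in_range, per_group = population_per_age_group(
    CENSUS_AGES_5_TO_19, CENSUS_YEARS_COVERED, WISC_YEARS_COVERED, AGE_GROUPS
)
print(f"Children aged 6 through 16: {in_range:,.0f}")
print(f"Children per age group:     {per_group:,.0f}")
```

The per-group figure rounds to 1,383,556, which is the population size carried into the sample size calculation.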
There are a number of great tools available online for calculating sample size. If you are familiar with any of them, feel free to plug the numbers into your preferred calculator and check for yourself. I am going to use the top Google result for the sake of simplicity.
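Given our population size of 1,383,556, a confidence level of 95%, and a confidence interval of 5 (that is, a margin of error of plus or minus 5 percentage points), a reasonable N for measurements of our population comes out to 384 individuals per age group. That is nearly six times the roughly 67 children per group in the actual normative sample.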
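If you would rather not trust an online calculator, the number can be reproduced by hand. The sketch below assumes the calculator uses the standard sample size formula for estimating a proportion (Cochran's formula with the worst-case proportion of 0.5) plus a finite population correction. That is my assumption about how such calculators typically work, and the function and parameter names are illustrative.

```python
from statistics import NormalDist


def required_sample_size(population, confidence=0.95, margin=0.05, proportion=0.5):
    """Cochran's sample size for a proportion, with finite population correction."""
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an effectively infinite population.
    n_infinite = z**2 * proportion * (1 - proportion) / margin**2
    # Finite population correction; negligible when the population is large.
    return n_infinite / (1 + (n_infinite - 1) / population)


n = required_sample_size(1_383_556)
print(round(n))
```

Under these assumptions the result rounds to 384. With a population this large the correction term barely moves the answer, so the exact census figure matters far less than the confidence level and margin of error you choose.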
Beyond this initial issue, let's take a closer look at why there might be good reason to break down the normative groups a bit further than by age alone.
For starters, gender is an interesting human dividing point. There are reams and reams of research available, pointing in enough different directions that a person can typically find some research to support their existing point of view regarding sex differences in cognitive functioning and IQ. Personally, I am a fan of the variability explanation of sex differences (3), which generally argues that there is little to no difference in averages between the sexes on a number of measures. However, nature, being the cruel mistress that she is, plays fast and loose with mutation in the Y chromosome relative to the X chromosome, leading to an increase in variability in performance on various traits for men (4). Without getting into what this means in a broader context, at the very least this increased variability, to me, justifies breaking these normative groups down by gender. Something to keep in mind regarding this point: because of the way the psychometrics work out for the WISC/WAIS tests, scores at the extremes, both the highs and the lows, are more strongly affected by a scoring difference of one or two points than more middle-of-the-road performances are.
Looking at things this way cuts our N per group in half, from about 66.7 to about 33.3.
Further, geographic location is another important factor. It is no big secret that there is a large amount of variability in performance on reading tests across the states (5). There is also some strong evidence for relationships between reading ability and performance on IQ testing, as well as with gains in IQ as a child grows and develops (6). If we were to go ahead and break down the normative sampling data by state, across 50 states, we would drop from about 33.3 to approximately 0.67 individuals per age, gender, and state group.
Finally, while the above is already getting convoluted enough, you can even go so far as to look at differences in IQ performance between rural and urban environments (7). Including that in the breakdown would drop us down to about 0.33 individuals per group.
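The whole chain of subdivisions can be tallied in a few lines. This is a rough sketch that assumes the sample is spread perfectly evenly across every cell and that "state" means 50 states. Both are simplifying assumptions, and the names are illustrative.

```python
NORMATIVE_SAMPLE = 2_200  # children in the normative sample, per the manual (1)

# (factor, number of levels). 50 states is an assumption made for illustration.
SPLITS = [
    ("age group", 33),
    ("gender", 2),
    ("state", 50),
    ("rural vs. urban", 2),
]


def average_cell_sizes(sample, splits):
    """Average children per cell after each successive split of the sample."""
    results = []
    cells = 1
    for factor, levels in splits:
        cells *= levels
        results.append((factor, cells, sample / cells))
    return results


breakdown = average_cell_sizes(NORMATIVE_SAMPLE, SPLITS)
for factor, cells, per_cell in breakdown:
    print(f"after splitting by {factor:<16} {cells:>5} cells, {per_cell:6.2f} children each")
```

By the last split there are 6,600 cells for 2,200 children, so even in the best case two out of every three cells would be empty.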
Now, what does this mean for the WISC, and what does it mean for the future? Probably nothing. The WISC is deeply entrenched in the educational system at this point and is, at least on the East Coast, the standard for testing in general as well as for academic admissions testing. In the long run, I think that we need to take a much harder and closer look at our normative data and our sampling procedures and standards in the mental health testing industry. Ideally, I feel that the National Institute of Mental Health should have a normative data division focused on collecting extremely comprehensive normative data for the commonly used psychological and neuropsychological testing measures. This type of research initiative would do wonders for improving the quality of the testing that is used to determine the futures of many of our young people, and it would provide an extremely comprehensive data set for research and analysis.