
APPLIED PSYCHOLOGY: AN INTERNATIONAL REVIEW, 1990, 39 (1), 89-103

Predictive Validity of School Grades - A Meta-analysis

Heinz Schuler, Uwe Funke, and Jutta Baron-Boldt
University of Stuttgart-Hohenheim, West Germany


School reports are frequently used as predictors of further academic or vocational training success. This is in contrast to doubts concerning the measurement quality of school marks, especially their predictive validity. Using Schmidt and Hunter's method of validity generalisation, a meta-analysis is calculated using 63 German studies with 102 independent samples and a total sample size of 29,422 subjects. The mean corrected validity of final school grades for the prediction of university examinations is ρ = 0.456; for the prediction of vocational training success it is ρ = 0.408, matching the validities of the best psychological predictors in personnel selection. The only distinctive moderators that could be found were publication form and age of the studies for the vocational group, showing higher validities for published and earlier studies than for unpublished and recent ones.

Requests for reprints should be sent to Professor Heinz Schuler, Universität Hohenheim, Lehrstuhl für Psychologie, Postfach 7005 62 (430), 7000 Stuttgart 70, F.R.G.

© 1990 International Association of Applied Psychology


INTRODUCTION

As background information for the following study, a short description of the German university and vocational training admittance procedure is given. University entrance is usually still based entirely on final school grades. Depending on the subject that high school graduates want to study, a national entrance board or the universities themselves check the final school reports. Vocational training in Germany is customary for most non-academic jobs and involves three years of theoretical and practical instruction for a particular vocation. Selection of young people for vocational training in firms is based at least partly on the final grades of the secondary or vocational school previously attended.

This prevalent use of school marks contrasts sharply with the fact that doubts are often raised concerning their value as evidence. This applies both to their theoretical measurement quality and to their predictive relevance (Althoff, 1986; Trost, 1985). For school grades to represent a meaningful and useful variable in admission practices, fulfilment of theoretical measurement quality criteria must be assumed. However, doubt has already been cast on their objectivity: teachers' judgements are liable to error depending on school subject, previous information about the student, sex of teacher and student, and social class of the student. In other words, the same performance is judged differently by different teachers. Criticism concerning insufficient objectivity implies a lack of reliability as well. There are conceptual problems with stability over time, because both the characteristics of students' performance and the demands put upon them, which are reflected in the grades, change in the course of time. Another aspect of stability is the repeated evaluation of the same performance by the same teacher, which can be very different at different points in time (cf. Birkel, 1978; Heller, Nickel, & Rosemann, 1978; Ingenkamp, 1975, 1977, 1985; Süllwold, 1983; Tent, Fingerhut, & Langfeldt, 1976).

However, it must be remembered that doubts about the theoretical measurement quality of school grades are mostly founded on investigations into individual teacher evaluations, whereas these individual evaluations are never used as predictors: it is always the composite of these which forms a final subject grade on the report, or even the grade point average (Süllwold, 1983), so that one can reckon with certain distortions cancelling each other out. But beyond this one can assume that there are factors (moderators) which cannot be eliminated by aggregation and which influence the comparability of grades and thus their predictive validity.

Accordingly, there has often been criticism concerning insufficient validity. One only needs to apply the assumptions of classical test theory: methods with insufficient objectivity and reliability cannot be very valid. Furthermore, in several empirical studies validities were found which, at most, lie in the medium order of magnitude. For example, Althoff (1986) reports, for the area of educational success prognosis, very low correlations regarding the concurrent validity of final school grades (general and vocational school grades were correlated with intelligence and performance test results), and likewise their predictive validity for educational and vocational success was low, excepting the mathematics grade. In contrast, Schuler, Barthel, and Fünfgelt (1984) found predictive validities of r = 0.21 to r = 0.67 for the same area. In the area of academic success, Trost and Bickel (1979) obtained, as a median value from 52 correlations between high school average final grades and pre-examination or examination success respectively in various study subjects, a predictive validity of r = 0.35; individual coefficients were scattered across a range from r = 0.02 to r = 0.53. In comparison, the American college Grade Point Average attains a predictive value of r = 0.55 for academic success prediction (according to Süllwold, 1983). Süllwold explains this on the basis of the greater similarity between school and university education in the USA compared to the FRG. With regard to the prediction of vocational training success, however, the relationship seems to be the reverse: the Grade Point Average of American high schools usually demonstrates lower predictive validity (Hunter & Hunter, 1984: r = 0.30; Reilly & Chao, 1982: r = 0.20). The validity level for German general and vocational schools is to be examined in this paper.

From a personnel selection point of view there are two unsolved questions: (1) What is the validity of final school grades in the prediction of educational, vocational, or academic success? (2) Are there, in view of the great variation in results, factors (moderators) which could be made responsible for this variation in the validity coefficients?

METHOD

Given sufficient studies, meta-analysis offers a method which enables us to resolve these questions. The mere integration of single-study results can be accomplished by several meta-analytic methods (e.g. Glass, McGaw, & Smith, 1981; Rosenthal, 1984), but the validity generalisation approach developed by Hunter, Schmidt, and Jackson (1982) was specifically developed for the integration of correlation coefficients (i.e. validity coefficients) from several single studies. A great advantage of this integration is the summation of the sample sizes of the respective studies, and thus an increase in test power.

In addition, and this is probably the most interesting and valuable aspect of this method, it enables corrections for various statistical artefacts frequently responsible for much of the variation in results. Validity generalisation corrects for sampling error caused by the small sample sizes of the single studies, for imprecise measurement of predictors and criteria, and for range restriction due to selection. In the area of cognitive tests, for example, these four artefacts account for an average of 68% of the variation in the validity coefficients from the single studies (Schmidt & Hunter, 1978). The sampling error is the most important artefact and accounts for a large part of the variance (mostly over 50%; Pearlman, Schmidt, & Hunter, 1980; Schmidt, Gast-Rosenberg, & Hunter, 1980; Schmidt, Hunter, & Caplan, 1981).
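
To make the role of sampling error concrete, the following minimal sketch in Python shows the bare-bones first step of such an analysis: the sample-size-weighted mean validity, the observed variance of the coefficients, and the variance expected from sampling error alone. The validity coefficients and sample sizes in the code are invented placeholders, not data from the studies analysed here, and the sampling-error term uses the common (1 - r̄²)²·k/N approximation.

```python
# Minimal sketch of the "bare bones" step of a Hunter-Schmidt style meta-analysis.
# The r and n values are invented placeholders for illustration only.
r = [0.25, 0.40, 0.31, 0.52, 0.18]   # observed validity coefficients of single studies
n = [120, 85, 300, 60, 150]          # corresponding sample sizes

N = sum(n)                           # total sample size
k = len(r)                           # number of independent samples
r_bar = sum(ni * ri for ni, ri in zip(n, r)) / N                  # weighted mean validity
s2_r = sum(ni * (ri - r_bar) ** 2 for ni, ri in zip(n, r)) / N    # observed variance
s2_e = (1 - r_bar ** 2) ** 2 * k / N                              # sampling error variance
s2_res = s2_r - s2_e                                              # residual variance
pct = 100 * s2_e / s2_r                                           # % of variance due to sampling error

print(f"r_bar = {r_bar:.3f}, s2_r = {s2_r:.4f}, s2_e = {s2_e:.4f}, "
      f"residual = {s2_res:.4f}, explained = {pct:.0f}%")
```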

Accordingly, further sources of variation which cannot be corrected for statistically cannot be assigned any decisive importance (e.g. problems with the criteria such as contamination and deficiency, copying and computation errors, differences in the predictors' factor structure), and it is unlikely that any moderators will be found. In conclusion it can be stated that validity generalisation achieves two aims: a much more exact estimation of the population parameter can be given than is possible from single studies, and it can be determined whether or not moderators exist in a particular validity context.

The computational details of the Schmidt-Hunter validity generalisation method applied in this study are not presented here. Readers interested in these details are referred to the introductory textbook by Hunter et al. (1982).

Of course, the method of validity generalisation has not gone without criticism. An excellent summary of the critique and the replies thereto can be found in Schmidt et al. (1985). Here only two particularly interesting points should be emphasised. First, it should be made clear that in this process the generalisability of validity and situation specificity are in no way mutually exclusive. The validity of a personnel selection procedure can be generalised, namely if the lower limit of the 90 or 95% confidence interval around the corrected average remains within a reasonable order of magnitude; furthermore, there can be additional situation-specific validity differences if appropriate moderators exist. This means that a particular validity is generally ascribed to the selection procedure, but this can vary according to the specific field in which it is applied. It is also possible that the area in which it is used can be changed to optimise the validity (see Schmidt et al., 1985, question 29, also 5 and 27).
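
As a numerical illustration of this decision rule, the minimal sketch below (assuming the usual ±1.96 multiplier for a 95% interval) reproduces the interval reported for the total university group in Table 1 from its corrected mean validity and corrected variance.

```python
import math

# Reproduce the 95% interval reported for the total university group (Table 1)
# from rho = 0.456 and corrected variance sigma2 = 0.005 (values from the text).
rho, sigma2 = 0.456, 0.005
half_width = 1.96 * math.sqrt(sigma2)
lower, upper = rho - half_width, rho + half_width
print(f"{lower:.3f} - {upper:.3f}")   # ~0.317 - 0.595, matching Table 1

# The validity is treated as generalisable if the lower bound of this interval
# stays within a reasonable (clearly positive) order of magnitude.
print("generalisable" if lower > 0 else "not generalisable")
```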

A second point of particular interest is that of artefact corrections, in particular those based on artefact distributions. For these corrections it is critical that overcorrection can occur due to unrealistic correction assumptions (low and widely differing values for the reliability of predictors and criteria, and considerable range restrictions). Therefore, as a safeguard, it is recommended to be conservative in the estimates of the correction values, i.e. it is better to assume high reliabilities and low range restrictions with little variation. Such a conservative procedure is also recommended when determining the correction factors from the artefact distributions.

In this paper the correction values are a = 1 for the unreliability of the predictors (i.e. not corrected), b = 0.761 for the unreliability of the criteria for academic performance, b = 0.801 for the unreliability of the criteria for performance in apprenticeship, and c = 0.981 for the range restriction. With these conservative values, a correction ought to be easily within the realm of possibility (see Schmidt et al., 1985, questions 3, 24, and 31). In general we maintain that validity generalisation stands up to criticism if it is sensibly applied. The method of validity generalisation is thus a procedure that is very well suited to moving towards an explanation of the problems surrounding the predictive validity of final school grades.
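
How these correction values act on the observed validities can be checked with a back-of-the-envelope calculation that treats them as a compound attenuation factor, in the style of the standard Hunter-Schmidt correction. This is only a rough check under that assumption; the published figures were obtained via artefact distributions rather than a single product, which presumably explains the small difference for the university group.

```python
# Back-of-the-envelope check of the artefact corrections reported in the text:
# divide the mean uncorrected validity by the product of the attenuation factors.
a = 1.0      # predictor unreliability: not corrected
c = 0.981    # range restriction

r_voc, b_voc = 0.321, 0.801   # vocational training group (values from the text)
r_uni, b_uni = 0.345, 0.761   # university group

print(f"vocational: {r_voc / (a * b_voc * c):.3f}")   # ~0.409 vs. the reported 0.408
print(f"university: {r_uni / (a * b_uni * c):.3f}")   # ~0.462 vs. the reported 0.456
```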

RESULTS

We applied the meta-analysis to individual studies on the predictive validity of final school grades. It was possible to collect a total of 63 individual studies in Germany from various sources (published in books or journals, unpublished research reports and dissertations, and investigations from industry). These 63 studies comprise a total of k = 119 independent samples. The number of independent samples and the respective validities were determined according to Hunter et al. (1982, Ch. 5). Unfortunately, a few of these studies could not be used for the meta-analysis because their results were not cited in an applicable or sufficiently detailed manner and thus were unsuitable for transformation. All these studies belong to the subgroup "prediction of academic success", in which not the examination grade but rather a binary system (pass versus fail, good versus bad results) served as the criterion, giving merely the average values of comparisons of extreme groups, but without results on variance. The sample sizes of the usable studies were in the range of n = 11 to n = 4688; the total sample amounted to n = 29,422. The individual studies were divided into two groups: in the one, success in vocational training is predicted by the final results of the secondary school previously attended (Haupt- or Realschule); in the other, success at university is predicted by the high school (Gymnasium) final record. These groups were analysed separately, as the two educational paths pose, at least to a certain extent, different demands on the students. For the latter, these are of a cognitive, theoretical nature, whereas those of the former include many job-oriented practical requirements. Thus one avoids putting together predictor and criterion types which cannot actually be compared with each other.


Analysis of the Whole "Collegiate Success Prediction" Group

Forty-six studies, consisting of k = 75 independent samples and a total sample size of n = 26,867, reported validity coefficients for high school final grades predicting success at university. The median sample size is n = 160, with a range from n = 12 to n = 4688. The mean weighted but uncorrected validity is r̄ = 0.345 with a variance of s² = 0.0078. Assuming that seven very large samples could bias the results through their weighting by sample size, we did the analysis twice, once with and once without those samples. As there was no real difference between the two conditions, we report the results including all k = 75 samples.

Subtracting the sampling error variance (s² = 0.0022, that is 28% of the total variance) from the, in itself not very large, total variance (s² = 0.0078), the remaining variance is s² = 0.0056. The chi-square test for the significance of this variance, as presented by Hunter et al. (1982), is highly significant; one could thus expect the existence of moderator variables if this significance, given such a small variance, could not be primarily attributed to the very large total sample. The explanatory value of the sampling error is likewise not very great, which can equally be explained, by the theory of sampling error, through the size of the total sample (McDaniel et al., 1986).
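
These figures can be reproduced from the summary values just given; the sketch below assumes the usual Hunter-Schmidt sampling-error estimator (1 - r̄²)²·k/N, an assumption about the exact formula used, but one that matches the reported numbers.

```python
# Check of the reported sampling error variance for the university group,
# using r_bar = 0.345, k = 75, N = 26,867 and s2_total = 0.0078 from the text.
r_bar, k, N, s2_total = 0.345, 75, 26_867, 0.0078

s2_e = (1 - r_bar ** 2) ** 2 * k / N
print(f"sampling error variance = {s2_e:.4f}")                      # ~0.0022, as reported
print(f"share of total variance = {100 * s2_e / s2_total:.0f}%")    # ~28%, as reported
print(f"remaining variance      = {s2_total - s2_e:.4f}")           # ~0.0056
```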

We did not correct for the unreliability of the predictors (final grades), because in the actual use of grades for the prediction of training success imprecision of measurement cannot be precluded either. To correct for the unreliability of criteria and for restriction of range, we had to follow the procedure of artefact distributions described by Hunter et al. (1982), because not all of the 46 studies contained the relevant data, and these data could not be obtained on request. This resulted in very conservative artefact distributions, i.e. ones which allowed one to expect only small corrections with regard to the mean validity and to the variance of validities.

As shown in Table 1, for the prediction of success at university the mean corrected validity is ρ = 0.456, with a confidence interval (P = 95%) from ρ = 0.317 to ρ = 0.595 and a variance of σ² = 0.005. Together, the three artefacts account for 64% of the variance. The validity of high school grade averages for predicting academic success is thus generalisable. A strict application of Schmidt and Hunter's 75% rule would mean that there are variables moderating the validity of high school grades for the prediction of academic success. Considering, however, that the use of the 75% rule all too often yields a "false alarm" (Type I error) (Sackett, Harris, & Orr, 1986), that here only three of the four artefacts are corrected, that the corrections resulting from the artefact distributions are very conservative, and, last but not least, the small amount of remaining corrected variance (σ² = 0.005), a search for moderators would probably not prove helpful (Hunter et al., 1982; McDaniel et al., 1986).


TABLE 1
Results of the Meta-analysis "Prediction of Success at University" (total sample in comparison to subgroups of different presumed moderators)

                                    n        k     ρ        σ²       %     CI (P = 95%)
Total                              26,867    75   0.456     0.005    64    0.317 - 0.595
Moderator: subject under study
  Psychology                        1,183    10   0.455    -0.008   185    0.455 - 0.455
  Medicine                          4,677    18   0.438     0.022    32    0.157 - 0.739
  Philosophy and Teaching           1,298    15   0.460    -0.009   181    0.460 - 0.460
  Economics                         1,063     4   0.557     0.006    67    0.405 - 0.709
  Law                                 808     2   0.377     0.026    21    0.061 - 0.693
  Mathematics and Science           3,682    17   0.446     0.003    78    0.339 - 0.553
Moderator: type of publication
  Published studies                 6,413    37   0.431     0.017    42    0.175 - 0.687
  Unpublished studies               6,863    31   0.470     0.005    70    0.331 - 0.609

Key: n = sample size in the respective analysis; k = number of independent samples in the respective analysis; ρ = mean corrected validity; σ² = corrected variance; % = percentage of total variance accounted for by artefacts; CI = confidence interval with P = 95% for ρ.

Analysis of the Total "Apprenticeship" Group

Fifteen studies, consisting of k = 27 independent samples with a total sample size of n = 2555, investigated the validity of grades from secondary, vocationally oriented schools for the prediction of vocational training success. In general, vocational training success was a composite score of the theoretical and practical standard examinations after the three years' training. The median sample size was n = 74, with a range from n = 30 to n = 222. The mean weighted but uncorrected validity is r̄ = 0.321 with a variance of s² = 0.016. The sampling error variance is s² = 0.009 and accounts for 53% of the total variance; the remaining variance is s² = 0.008. The chi-square test mentioned earlier is not significant. Again, there was no correction for the unreliability of the predictors, and the artefact distributions correcting for unreliability of criteria and restriction of range were likewise very conservative.


Regarding the prediction of vocational training success, Table 2 shows a corrected mean validity of ρ = 0.408 with a confidence interval (P = 95%) from ρ = 0.244 to ρ = 0.572 and a variance of σ² = 0.007. The three artefacts account for 71% of the total variance. Hence, a search for moderators will probably be dispensable in this category, too. Interestingly, the corrected variances in both categories (prediction of academic achievement and of vocational training success) are virtually identical.

Following these total group analyses we should once again address the question of the presence of moderators. One can justifiably question the 75% rule and the chi-square test and, leaving aside the previous results, critically examine the existence of moderators claimed so repeatedly in the literature. In doing so one must differentiate between true moderators (such as varying educational goals, form of publication, and age of the study) on the one hand, and on the other the different single predictors (particular school subjects) or different sub-criteria (theoretical versus practical performance, preparatory versus main exams) which result from a division of the total predictor (average grade) or the total criteria (academic or vocational success).

TABLE 2
Results of the Meta-analysis "Prediction of Vocational Training Success" (total sample in comparison to subgroups of different presumed moderators)

                                    n        k     ρ        σ²       %     CI (P = 95%)
Total                               2,555    27   0.408     0.007    71    0.244 - 0.572
Moderator: branch of training
  Public administration               510     5   0.403     0.016    50    0.155 - 0.651
  Metal-working                     1,151    13   0.409     0.003    85    0.302 - 0.516
  Electronics                         229     3   0.350     0.001    94    0.288 - 0.412
Moderator: type of publication
  Published studies                 1,856    20   0.461    -0.005   130    0.461 - 0.461
  Unpublished studies                 699     7   0.249     0.033    92   -0.142 - 0.256
Moderator: age of study
  Earlier studies (up to 1977)      1,569    15   0.489    -0.013   301    0.489 - 0.489
  Recent studies (since 1978)         986    12   0.259     0.001    94    0.197 - 0.321

Key: as in Table 1.


Various Sub-group Analyses

In the analysis of the total number of individual samples according to moderators as well as individual predictors and criteria, the following picture emerged for the academic training success prognosis (in the following analyses the seven largest samples were excluded in order to avoid the distortions mentioned above, because in the subgroups of the analysis the total sample was much smaller than in the total group; see Table 1).

Categorising according to different academic subjects yielded the highest validities for economics (ρ = 0.557) and the lowest for law (ρ = 0.377). Published studies yielded, on average, somewhat lower validities (ρ = 0.431) than unpublished ones (ρ = 0.470). Age of study did not influence the validity (r = -0.074 between year of publication and validity). As shown in Table 3, among the single subjects mathematics yielded the highest validity (ρ = 0.344) and sports the lowest (ρ = 0.069). Pre-examination performance was not predicted better (ρ = 0.446) than examination performance (ρ = 0.434).

TABLE 3
Results of the Meta-analysis "Prediction of Success at University" (total predictor and total criterion in comparison to single predictors and single criteria)

                                    n        k     ρ        σ²        %     CI (P = 95%)
Total predictor/criterion          26,867    75   0.456     0.005     64    0.317 - 0.595
Single predictors: single subjects
  German                            4,046    17   0.270     0.002     83    0.182 - 0.358
  English                           2,400    10   0.212     0.009     46    0.026 - 0.398
  French                            2,341     7   0.278     0.013     34    0.054 - 0.501
  Latin                             2,609    10   0.226     0.006     58    0.074 - 0.378
  Mathematics                       4,242    18   0.344     0.011     47    0.138 - 0.549
  Physics                           4,030    16   0.307     0.007     57    0.143 - 0.471
  Chemistry                         3,513    14   0.266     0.009     48    0.080 - 0.452
  Biology                           3,444    13   0.193     0.009     45    0.007 - 0.379
  Geography                         2,581    10   0.238     0.003     70    0.131 - 0.345
  History                           2,719    11   0.271    -0.0002   101    0.271 - 0.271
  Religion                          1,748     6   0.217     0.009     44    0.031 - 0.403
  Music                             1,855     7   0.173     0.008     46   -0.002 - 0.348
  Fine Arts                         2,151     9   0.143     0.003     70    0.036 - 0.250
  Sports                            1,512     5   0.069    -0.005    109    0.069 - 0.069
Single criteria: pre-examination vs. examination
  Pre-examination                   9,950    20   0.446     0.010     43    0.250 - 0.642
  Examination                       5,266    30   0.434     0.012     50    0.219 - 0.649

Key: as in Table 1.

Results for the prediction of vocational training success were similar (see Table 2): categorising according to branches of training resulted in only slight differences. Published studies yielded, on average, higher validities (ρ = 0.461) than unpublished ones (ρ = 0.249). Age of the study had a strong influence on validity (r = -0.578 between year of study and validity, highly significant): earlier studies (1977 and before) yielded higher validities (ρ = 0.489) than more recent ones (1978 and later; ρ = 0.259). Regarding single subjects, mathematics again predicted best (ρ = 0.301) and sports worst (ρ = 0.033) (see Table 4). Breaking up the total criterion showed (see Table 4) that theoretical training success can be significantly better predicted (ρ = 0.381) than practical training success (ρ = 0.254).

TABLE 4
Results of the Meta-analysis "Prediction of Vocational Training Success" (total predictor and total criterion in comparison to single predictors and single criteria)

                                    n        k     ρ        σ²        %     CI (P = 95%)
Total predictor/criterion           2,555    27   0.408     0.007     71    0.244 - 0.572
Single predictors: single subjects
  German                            1,483    16   0.211     0.008     69    0.036 - 0.386
  English                             673     9   0.154     0.003     89    0.047 - 0.261
  Mathematics                       1,479    16   0.301     0.005     80    0.162 - 0.440
  Physics                             886    10   0.258     0.014     57    0.026 - 0.490
  Chemistry                           545     8   0.228     0.011     68    0.022 - 0.434
  Biology                             672     9   0.262    -0.007    116    0.262 - 0.262
  Geography                           855     9   0.186              129    0.186 - 0.186
  History                             982     9   0.159     0.0001   100    0.139 - 0.179
  Sports                              808     8   0.033    -0.012    353    0.033 - 0.033
Single criteria: theoretical vs. practical training success
  Theoretical training success      1,539    15   0.381     0.012     58    0.166 - 0.596
  Practical training success        1,240    15   0.254     0.019     51   -0.016 - 0.524

Key: as in Table 1.


DISCUSSION

The mean corrected validity of the grammar school average grade for the prediction of success at university, ρ = 0.456, is of a magnitude which confirms it as the best single predictor of academic success (see Orlik, 1961; Trost, 1985). The corrected mean validity for the prediction of vocational training success from final school grades is ρ = 0.408. These are high validities for a single procedure. Other single procedures seldom reach higher values, as every prediction of performance over a long period of time can be hindered by many influential and unpredictable factors. If one does not foster completely unrealistic expectations concerning the level of predictive validity, these values can be considered very satisfactory (see also Schmidt et al., 1985, question 21). It is hardly surprising that university performance can be predicted better than vocational training success, as cognitively more demanding criteria can be predicted better by general measures of intelligence, and such measures are, more or less, what school grades amount to.

Given the corrected variances in the total groups and the variance explanation of 64 or 71%, it can be assumed that searching for moderators is probably superfluous. Due to the size of the total sample, the sampling error can naturally explain only very little of the variance; likewise, the absolute size of the variance as well as the confidence intervals do not leave much room for moderator effects (McDaniel et al., 1986), at least in the academic success prognosis.

With our subgroup analyses we wanted to investigate these arguments. According to the criteria of Hunter et al. (1982), the moderator analysis was not very successful, because moderators can only be present when (1) the mean validities in the subgroups diverge and (2) the corrected variances in the subgroups are lower than in the whole group. The second condition is, with regard to the moderator subgroups, on average not fulfilled, as the variances in the subgroups tend to be higher than in the whole group.

The only moderators tested that can be considered confirmed for the prediction of success in vocational training from secondary school final grades are age of the study and form of publication. Older studies report higher validities than more recent ones, and published studies report higher validities than unpublished ones. For the "age of study" moderator this could basically mean that the validity of final grades as a predictor of vocational training performance has decreased over time. This moderator analysis was meant as a test of the hypothesised inflation in grades. Such an inflation (with grading not related to performance and with reduced predictor variance) must result in decreasing validity coefficients. The following observation is therefore of particular interest: differences due to age of the study and to publication form are partially confounded (contingency coefficient C = 0.324), as most published studies are of an earlier and most unpublished studies of a later date. Certainly this phenomenon is partially caused by the fact that only a small number of the earlier unpublished studies were available. The older samples come from 14 published and 2 unpublished studies; for the more recent samples the relationship is 6 to 5.

In spite of the unequal size of the cells, we undertook a regression analysis on the uncorrected single effect sizes. Contrary to Hunter et al.'s recommendation (1982, p. 140 ff.), the use of uncorrected effect sizes seemed appropriate here, because the corresponding corrected values were not available for each individual study and because, in this context, correcting through artefact distributions does not seem very sound. The multiple correlation of the effect sizes with age of the study and publication form is R = 0.626; the beta weight for age is β = -0.467 (significant), and for publication form β = 0.271 (not significant). The partial correlations come to r = -0.364 and r = 0.297 respectively. Thus the age of the study clearly seems to be the stronger influence on the effect size.
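
The same kind of moderator regression can be sketched in a few lines. The arrays below are invented placeholders rather than the original single-study data (which are available from the senior author), and standardising the variables yields beta weights analogous to those reported.

```python
import numpy as np

# Sketch of the reported moderator regression: uncorrected effect sizes regressed
# on year of study and publication form (1 = published, 0 = unpublished).
# The data below are invented placeholders, not the original study values.
r    = np.array([0.55, 0.48, 0.51, 0.30, 0.22, 0.35, 0.41, 0.19])
year = np.array([1968, 1971, 1975, 1984, 1985, 1986, 1977, 1987], dtype=float)
pub  = np.array([1, 1, 1, 1, 0, 1, 1, 0], dtype=float)

def z(x):
    """Standardise a variable to mean 0 and standard deviation 1."""
    return (x - x.mean()) / x.std()

X = np.column_stack([z(year), z(pub)])         # standardised predictors
y = z(r)                                       # standardised effect sizes
beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # beta weights (no intercept needed)
R = np.corrcoef(X @ beta, y)[0, 1]             # multiple correlation

print(f"beta(year) = {beta[0]:.3f}, beta(publication) = {beta[1]:.3f}, R = {R:.3f}")
```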

However, one should recognise that the strong correlation between effect size and age of the study (r = -0.578) is not based on a homogeneous distribution of data points. Rather there are two groups: the earlier studies from 1966 to 1977 and the recent ones from 1984 to 1987. For the years 1978 to 1983 no studies were to be found. The form of publication has no influence among the earlier studies. Among the more recent ones, however, those that are published show a tendency towards higher corrected mean validity (ρ = 0.353) than those that are unpublished (ρ = 0.193). This could be attributed to the unpublished studies being more burdened by artefacts, especially with regard to restriction of range. This, however, cannot be clarified any further, as the individual studies do not contain the necessary data. Thus, a truly unequivocal conclusion as to whether the value of grades in predicting vocational training performance has decreased (as practitioners have often suspected) is still not possible.

The remaining subgroup variances differ widely in some cases. In the breakdown according to subjects of study, results range from σ² = 0.026 for law to σ² = -0.008 for psychology, corresponding to shares of 21% and 185% artefact variance respectively. Negative corrected variances occur when more variance can be explained by artefacts than is empirically present; this corresponds to an amount of artefact variance greater than 100% of the total variance. Practically speaking, such values simply mean that all the existing empirical variance can be explained by artefacts, i.e. in all these cases one can set σ² = 0 and the explanation of variance at 100%. From a theoretical point of view, however, these "negative variance percentages" are interesting to the extent that they function as an indicator. If the total sample tends to be small in relation to the number of independent samples, then a large sampling error occurs in relation to the variance that is empirically present. Such a constellation can lead to negative variances, as for example in the prediction of academic success in the subject "psychology". On the other hand, in the subgroup "law" the total sample is rather large in relation to the number of samples. Thus the relative sampling error tends to be small, thereby not improving the chances of explaining the relative variance. However, the analysis of these subgroups is burdened by a relatively high second-order sampling error, because with only two independent samples there is too little data for a meaningful meta-analysis. The differences between the corrected variances in the remaining subgroups can be explained in a similar manner. The resulting, not inconsiderable differences in the mean validities of the subgroups of the presumed moderators are also probably due to such second-order sampling errors: the number of independent samples in the subgroups is rather small compared to the total number of samples. Equivalent conditions exist for the size of the samples.

Separate analyses of the different individual predictors, that is of the individual school subjects, demonstrated a hierarchy of subjects for the prediction of academic success and for vocational training success. In both cases mathematics clearly headed the list. This confirms to some extent the reputation of the mathematics grade as the most valid single grade for prediction (e.g. Althoff, 1986). Furthermore, it is not surprising that in both cases all coefficients for individual predictors were lower than for the total predictor (average grade). Due to aggregation, the average grade fulfils the measurement requirements for a predictor better than single subject grades do (see Süllwold, 1983); for prediction purposes the average grade is more reliable than any particular single grade. In the separate analysis of the various individual criteria, no higher validity arises in the area of academic success for predicting pre-examination success than for predicting exam success, in contradiction to previously widely held opinion on that matter.

In the area of prediction of vocational training success, theoretical educational success can clearly be predicted better from final grade averages than practical achievement can, probably because far more cognitive components are required for success in theoretical training. This is of great importance in so far as in every skilled job the cognitive components will take on ever-increasing importance due to the necessity of lifelong learning and adaptation in our society. The coefficients for the individual criteria are lower than those for the corresponding total criteria; an analogous reason can be given as for the relationship between single and total predictors. For further variables which are often assumed to have a moderator effect (e.g. sex, type of school, etc.), no analysis was possible, as sufficient data or studies were not available.


In conclusion, the mean corrected validity of grades from the college-track high school (Gymnasium) for the prediction of success at university is ρ = 0.456; that of grades from general secondary school for the prediction of vocational training success is ρ = 0.408. These validities are higher than is usually presumed for school grades. The meta-analysis of German studies on the predictive validity of school grades yields coefficients as high as those in recent meta-analyses of the most valid diagnostic instruments (cognitive tests) predicting success at university and in vocational training. The higher comparative values from American studies on the prediction of academic success can be explained by the greater similarity between school and university education in the USA compared to the FRG. However, in the area of vocational training success prediction, German final school grades are of higher validity than the American ones. Using a similar argument, this may be due to the fact that the German vocational training system with its dual structure contains parallels to the school system, whereas this sort of uniform educational path is missing in America.

Manuscript received July 1988
Revised manuscript received June 1989

REFERENCES

(References and data of the single studies included in the meta-analysis can be obtained from the senior author.)

Althoff, K. (1986). Zur Aussagekraft von Schulzeugnissen im Rahmen der Eignungsdiagnostik [Relevance of school grades in personnel selection]. Psychologie und Praxis, 30, 77-85.

Birkel, P. (1978). Mündliche Prüfungen. Zur Objektivität und Validität der Leistungsbeurteilung [Objectivity and validity of oral examinations]. Bochum: Kamp.

Glass, G. V., McGaw, B., & Smith, M. L. (1981). Meta-analysis in social research. Beverly Hills, Calif.: Sage.

Heller, K., Nickel, H., & Rosemann, B. (1978). Beurteilen und Beraten. Psychologie in der Erziehungswissenschaft, Bd. IV [Assessment and counselling. Psychology in education, vol. IV]. Stuttgart: Klett-Cotta.

Hunter, J. E. & Hunter, R. F. (1984). The validity and utility of alternative predictors of job performance. Psychological Bulletin, 96, 72-98.

Hunter, J. E., Schmidt, F. L., & Jackson, G. B. (1982). Meta-analysis: Cumulating research findings across studies. Beverly Hills, Calif.: Sage.

Ingenkamp, K. (1975). Pädagogische Diagnostik [Pedagogical diagnosis]. Weinheim: Beltz.

Ingenkamp, K. (1977). Die Fragwürdigkeit der Zensurengebung [The dubious nature of school grades] (7. Auflage). Weinheim: Beltz.

Ingenkamp, K. (1985). Lehrbuch der Pädagogischen Diagnostik [Textbook of pedagogical diagnosis]. Weinheim: Beltz.

McDaniel, M. A., Hirsh, H. R., Schmidt, F. L., Raju, N. S., & Hunter, J. E. (1986). Interpreting the results of meta-analytic research: A comment on Schmitt, Gooding, Noe & Kirsch (1984). Personnel Psychology, 39, 141-148.

Orlik, P. (1961). Ein Beitrag zum Problem der Metrik und der diagnostischen Valenz schulischer Leistungsbeurteilungen [A contribution to the problem of metrics and the diagnostic value of performance assessments in school]. Zeitschrift für Experimentelle und Angewandte Psychologie, 8, 400-408.

Pearlman, K., Schmidt, F. L., & Hunter, J. E. (1980). Validity generalization results for tests used to predict job proficiency and training success in clerical occupations. Journal of Applied Psychology, 65, 373-406.

Reilly, R. R. & Chao, G. T. (1982). Validity and fairness of some alternative employee selection procedures. Personnel Psychology, 35, 1-62.

Rosenthal, R. (1984). Meta-analytic procedures for social research. Beverly Hills, Calif.: Sage.

Sackett, P. R., Harris, M. M., & Orr, J. M. (1986). On seeking moderator variables in the meta-analysis of correlational data: A Monte Carlo investigation of statistical power and resistance to Type I error. Journal of Applied Psychology, 71, 302-310.

Schmidt, F. L., Gast-Rosenberg, I., & Hunter, J. E. (1980). Validity generalization: Results for computer programmers. Journal of Applied Psychology, 65, 643-661.

Schmidt, F. L. & Hunter, J. E. (1978). Moderator research and the law of small numbers. Personnel Psychology, 31, 215-232.

Schmidt, F. L., Hunter, J. E., & Caplan, J. R. (1981). Validity generalization results for two occupations in the petroleum industry. Journal of Applied Psychology, 66, 261-273.

Schmidt, F. L., Pearlman, K., Hunter, J. E., Hirsh, H. R., Sackett, P. R., Tenopyr, M. L., Schmitt, N., Kehoe, J., & Zedeck, S. (1985). Forty questions about validity generalization and meta-analysis. Commentary on forty questions about validity generalization and meta-analysis. Personnel Psychology, 38, 697-798.

Schuler, H., Barthel, E., & Fünfgelt, V. (1984). Erfolg von Mädchen in gewerblich-technischen Ausbildungsberufen: Ein Modellversuch [Girls' success in mechanical-technical apprenticeships]. Psychologie und Praxis, 28, 67-78.

Süllwold, F. (1983). Pädagogische Diagnostik [Pedagogical diagnosis]. In K. J. Groffmann & L. Michel (Hrsg.), Intelligenz- und Leistungsdiagnostik [Intelligence and aptitude testing] (Enzyklopädie der Psychologie, Bd. B/II/2, pp. 307-386). Göttingen: Hogrefe.

Tent, L., Fingerhut, W., & Langfeldt, H.-P. (1976). Quellen des Lehrerurteils [Sources of teachers' assessments]. Weinheim: Beltz.

Trost, G. (1985). Pädagogische Diagnostik beim Hochschulzugang [Study entrance examinations]. In R. S. Jäger, R. Horn, & K. Ingenkamp (Hrsg.), Tests und Trends 4 (pp. 41-83). Weinheim: Beltz.

Trost, G. & Bickel, G. (1979). Studierfähigkeit und Studienerfolg [Scholastic aptitude and academic performance]. Munich: Minerva.