
Comment/Commentaire


The CRSA of November 1973 contains an article titled 'The influence of socio-economic variables on family size in Wentworth County, Ontario, 1871: a statistical analysis of historical micro-data' by Frank Denton and Peter George. It presents a multiple regression analysis of family size, based on a sample of households from the census of 1871, which attempts to show the relative effects of a number of different variables on the number of children in a household. Unfortunately, the analysis is marred by a number of technical errors serious enough to place all of the results in jeopardy. It is possible that these errors all cancel one another out, or that none is large enough to upset the conclusions Denton and George reach, but this can only be determined by reanalysing the data. I will describe the nature of my criticisms and in each case suggest remedial action.¹

THE SAMPLE

Denton and George base their analysis of family size on a systematic sample of the population of Wentworth County; this sample, however, excludes certain households. They explain: 'it was decided at the outset to employ only data for "normal" families, a "normal" family being defined for our purposes as one in which the husband and wife were both present, and the husband was between 20 and 59 years of age, inclusive. "Non-normal" families were not considered' (p. 337). This seems sensible, though it narrows the focus from households to the numbers of children of married couples.² Unfortunately, their method is likely to produce biased estimates of the numbers of children in a family, for two reasons. First, because older children are likely to move away from home, older parents will have smaller families than those of middle age. Second, differential mortality among ethnic groups, and among social classes, will also bias the results, since the death of either parent converts the family into a 'non-normal' family and thus excludes it from the sample.

¹ After I had written a first draft of this note, Professor Michael Katz of the History Department at York drew my attention to a later article on school attendance by George and Denton (1974), which employs a methodology very similar to that used here, and to criticisms of this article by Calhoun (1974) and by himself (Katz, 1974). A number of the criticisms advanced here are similar to those of Calhoun and Katz. Gordon Darroch provided helpful comments on a draft.

² We might note that some social historians have turned away from the simple concern with fertility and family size to concentrate on the larger question of household composition (e.g., see Hareven, 1974; Anderson, 1971). In the latter half of the nineteenth century in Canada, large numbers of households included boarders, relatives of the household head, and servants. In fact, this seems to have been an important characteristic of the society.

Rev. canad. Soc. & Anth./Canad. Rev. Soc. & Anth. 13(2) 1976

240 / Michael Ornstein

The difficulty could be overcome by performing cohort analyses of family size, with the cohorts defined by the age of the wife and perhaps of the husband. This simply involves splitting the sample into a number of groups based on the parental ages and then analysing the data for each group separately. Denton and George attempt to deal with this cohort effect by including the age of the wife, its square, and the husband-wife age difference in the regression. All have statistically significant effects. Indeed, they exert by far the most important influence on the number of children in the family; at the zero-order level the three age variables explain 25.6 per cent of the variation,³ compared to a total for 39 variables of 32.8 per cent. Unfortunately, there are likely to be strong correlations between age and the other variables in the equation, as well as interactions between them. A cohort analysis would make it possible to develop some hypotheses about the life cycle of these families.
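The cohort analysis proposed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Denton and George's procedure or data: the family records are invented, and the ten-year bands for the wife's age and the helper names `wife_age_cohort` and `split_into_cohorts` are hypothetical choices. Each band would then receive its own analysis; here the sketch only reports the number of families and the mean number of children in each band.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (wife's age, number of children in the household).
# These values are invented for illustration only.
families = [
    (22, 1), (27, 2), (29, 3),
    (33, 4), (36, 5), (38, 4),
    (44, 5), (47, 4),
    (52, 3), (58, 2),
]

def wife_age_cohort(age, width=10):
    """Assign a wife's age to a band such as '30-39' (band width is an assumption)."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def split_into_cohorts(records):
    """Group (age, children) records by the wife's age band."""
    cohorts = defaultdict(list)
    for age, children in records:
        cohorts[wife_age_cohort(age)].append(children)
    return dict(cohorts)

cohorts = split_into_cohorts(families)
for band in sorted(cohorts):
    sizes = cohorts[band]
    # Each band would get its own analysis; here only n and the mean are shown.
    print(band, len(sizes), round(mean(sizes), 2))
```

In an actual reanalysis, each band's full set of records, rather than its mean alone, would be passed to a separate regression, so that the effects of the other variables could differ across stages of the family life cycle.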

The sample is the sum of two systematic samples: a 10 per cent sample from Hamilton city and Dundas town (the two urbanized areas of Wentworth County) and a 20 per cent sample from the two rural districts, Wentworth South and Wentworth North. It is therefore not a self-weighting sample. However, the regression is apparently performed on the unweighted data. This procedure produces biased estimates of the regression coefficients, since the occupational, ethnic, religious, and other variables all have different distributions in the urban and rural areas. Because each sampled rural family stands for five families in the population while each sampled urban family stands for ten, the result overemphasizes the rural areas.
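A small sketch makes the weighting point concrete. Only the 10 and 20 per cent sampling fractions come from the text; the family sizes below are invented, and the function names are hypothetical. Weighting each family by the inverse of its sampling fraction (10 for an urban family, 5 for a rural one) restores each stratum's share of the population. The mean is the simplest case of a regression (a regression on a constant); the same weights would be supplied to a weighted least-squares fit of the full model.

```python
# Hypothetical family sizes; only the sampling fractions (10% urban,
# 20% rural) are taken from the text.
samples = {
    "urban": {"fraction": 0.10, "sizes": [2, 3, 3, 4]},
    "rural": {"fraction": 0.20, "sizes": [4, 5, 5, 6, 6, 4, 5, 5]},
}

def unweighted_mean(strata):
    """Pool every sampled family and treat each one equally."""
    values = [y for s in strata.values() for y in s["sizes"]]
    return sum(values) / len(values)

def weighted_mean(strata):
    """Weight each family by 1 / sampling fraction, i.e. by the number
    of families in the population it stands for."""
    total = weight_sum = 0.0
    for s in strata.values():
        w = 1.0 / s["fraction"]
        total += w * sum(s["sizes"])
        weight_sum += w * len(s["sizes"])
    return total / weight_sum

print(round(unweighted_mean(samples), 3))  # 4.333
print(round(weighted_mean(samples), 3))    # 4.0
```

Here the unweighted figure is pulled toward the larger rural families simply because rural families were sampled at twice the rate.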

We can obtain some idea of the nature of the sample of ‘normal’ families by examining the number of families coded, which Denton and George report as 429 for Hamilton city and Dundas town and 671 for the remaining rural areas of Wentworth County. Had all families been coded, the ten and twenty per cent sampling fractions would have yielded 569 families in Hamilton and Dundas town and 1035 in Wentworth South and North (see Denton and George, 1973: 335-6).

Thus they account for only 75.4 per cent and 64.8 per cent of the total families in the urban and rural areas respectively. Since the proportion of ‘non-normal’ families is greater in the rural areas, this difference further accentuates the bias produced by the choice of different urban and rural sampling fractions. In all, 31.5 per cent of the families are excluded because they are ‘non-normal.’
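The percentages in this paragraph can be checked directly from the reported counts. The sketch below uses only figures given in the text; the weights of 10 and 5 are the inverses of the urban and rural sampling fractions. On these counts the pooled exclusion comes to about 31.4 per cent, slightly below the 31.5 per cent stated above. The last two lines show how far the rural areas are over-represented among the analysed 'normal' families when no weights are applied.

```python
# Counts reported in the text (Denton and George, 1973: 335-6).
# name: (normal families coded, families if all had been coded, weight = 1 / sampling fraction)
strata = {
    "Hamilton and Dundas": (429, 569, 10),
    "Wentworth South and North": (671, 1035, 5),
}

for name, (coded, expected, _) in strata.items():
    print(f"{name}: {100 * coded / expected:.1f}% retained")  # 75.4%, then 64.8%

coded_total = sum(coded for coded, _, _ in strata.values())          # 1100
expected_total = sum(expected for _, expected, _ in strata.values())  # 1604
print(f"excluded overall: {100 * (1 - coded_total / expected_total):.1f}%")  # 31.4%

# Share of the analysed families that are rural, without and with weights
rural_coded, _, rural_w = strata["Wentworth South and North"]
urban_coded, _, urban_w = strata["Hamilton and Dundas"]
rural_share_unweighted = 100 * rural_coded / coded_total
rural_share_weighted = (100 * rural_coded * rural_w
                        / (rural_coded * rural_w + urban_coded * urban_w))
print(f"rural share, unweighted: {rural_share_unweighted:.1f}%")  # 61.0%
print(f"rural share, weighted:   {rural_share_weighted:.1f}%")    # 43.9%
```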

THE USE OF DUMMY VARIABLES TO MEASURE BIRTHPLACE

The birthplace of the husband is measured by five dummy variables which show the differences between men born in Canada and those born in England, Ireland, Scotland, the USA, and in ‘other’ countries. Similarly, a set of five variables is used to measure the birthplace of the wife. The results are rather curious: a husband born in Scotland lowers the number of children by 0.2655, but a wife born in Scotland has 0.6124 more children (than the base group of women born in Canada)! When both parents are born in the USA, the number of children is increased by 0.2300 for the father and lowered by 0.3979 for the wife! Since the regression model is additive, we should expect a Scottish-born wife and an American-born husband to have 1.5058 more children than a Scottish-born husband with an American-born wife.
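Because the model is additive, the 1.5058 figure is simply the combination of the four quoted coefficients, as a few lines of Python confirm. Only the Scottish and American coefficients from the text are included (Canadian birth is the base category, with effect zero); the dictionary and function names are hypothetical.

```python
# Coefficients quoted in the text (children, relative to a Canadian-born spouse).
husband = {"Canada": 0.0, "Scotland": -0.2655, "USA": 0.2300}
wife = {"Canada": 0.0, "Scotland": 0.6124, "USA": -0.3979}

def birthplace_effect(husband_born, wife_born):
    """Additive model: the two spouses' birthplace terms simply sum."""
    return husband[husband_born] + wife[wife_born]

a = birthplace_effect("USA", "Scotland")   # American husband, Scottish wife
b = birthplace_effect("Scotland", "USA")   # Scottish husband, American wife
print(round(a - b, 4))  # 1.5058
```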

This makes no sense. The curious differences of sign (for each of the five pairs of birthplace dummy variables, one for the husband and one for the wife) result from the high intercorrelations among the pairs of dummy variables. Naturally enough, Scottish-born men are far more likely to have Scottish-born wives than are men born elsewhere. The large intercorrelations produced by endogamous marriage patterns magnify the small differences between the correlations of each of the two dummy variables with the dependent variable, the number of children. A simple illustration will clarify the point. Let us assume that the correlation between a dummy variable measuring whether a wife is born in Country A and the size of her family is 0.2; that the corresponding correlation for the husband is 0.1; and that the correlation between the two place-of-birth variables is 0.8. Call the variables $A_h$, $A_w$, and $S$ (the two birthplace variables for the husband and wife, and size of family) and assume they are all standardized, with means of zero and variances equal to one. Then the equation predicting size from place of birth is $S = 0.33A_w - 0.17A_h$. The very high correlation between the two variables causes the regression coefficient of one of the variables to become negative!
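The coefficients in this illustration follow from the normal equations for two standardized predictors, $\beta_1 = (r_{1y} - r_{12}r_{2y})/(1 - r_{12}^2)$ and symmetrically for $\beta_2$. The Python sketch below (the function name is hypothetical) reproduces the 0.33 and -0.17, and shows that the sign reversal disappears when the two birthplace dummies are uncorrelated.

```python
def standardized_betas(r_1y, r_2y, r_12):
    """Standardized OLS coefficients of y on two predictors x1 and x2,
    computed from the three pairwise correlations."""
    denom = 1.0 - r_12 ** 2
    if denom == 0:
        raise ValueError("predictors are perfectly collinear")
    beta_1 = (r_1y - r_12 * r_2y) / denom
    beta_2 = (r_2y - r_12 * r_1y) / denom
    return beta_1, beta_2

# The text's illustration: wife r = 0.2, husband r = 0.1, spouses r = 0.8
b_wife, b_husband = standardized_betas(0.2, 0.1, 0.8)
print(round(b_wife, 2), round(b_husband, 2))  # 0.33 -0.17

# With uncorrelated spouses the coefficients equal the simple correlations
print(standardized_betas(0.2, 0.1, 0.0))      # (0.2, 0.1)
```

The denominator $1 - r_{12}^2$ is the determinant of the predictors' correlation matrix; as it shrinks, small differences between the two correlations with family size are inflated into coefficients of opposite sign.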

³ My computation is from the F statistic for this group (Denton and George, 1973: 340). Actually, the 25.6 per cent also includes a fourth variable, ‘children with different surname.’ However, almost all of the variance is explained by the three age variables.

The difficulty is that when the sets of dummy variables are constructed in this fashion, the additivity assumptions of linear regression no longer hold and some of the regression coefficients take on negative values. The corrective action is simple: the wife’s dummy variables can be constructed as at present; then the husband’s variables can be used to identify husbands who are not born in the same place as their wives. All the large correlation coefficients in the matrix will then be eliminated (and, incidentally, the accuracy of all the estimates in the regression will be improved, since the resulting matrix has a much larger determinant). It is interesting to note that for all of the birthplaces except the United States the place of birth is more strongly related to the wife than it is to the husband; for the US, the husband has the stronger effect. This suggests that in this case some other variable, like nation of origin, exerts the predominant effect, since Americans are almost entirely of European origin.
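The proposed recoding can be sketched as follows. The couples are invented to mimic endogamous marriage and are not Denton and George's data; reading the proposal as one dummy per country that equals 1 only when the husband was born there and his wife was not is an assumption, as are the helper names. The sketch compares the correlation between the wife's and husband's Scotland dummies, and the determinant of their 2-by-2 correlation matrix, before and after recoding.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical couples (husband's birthplace, wife's birthplace),
# invented to show endogamy; not Denton and George's data.
couples = ([("Canada", "Canada")] * 4 + [("Scotland", "Scotland")] * 3
           + [("Scotland", "Canada"), ("Canada", "Scotland"), ("Ireland", "Ireland")])

wife_scot = [int(w == "Scotland") for h, w in couples]
# Original coding: the husband's own birthplace dummy
husb_scot = [int(h == "Scotland") for h, w in couples]
# Assumed recoding: husband born in Scotland AND born somewhere other than his wife
husb_scot_diff = [int(h == "Scotland" and h != w) for h, w in couples]

for label, h in [("original", husb_scot), ("recoded", husb_scot_diff)]:
    r = pearson(wife_scot, h)
    # Determinant of the 2-by-2 correlation matrix [[1, r], [r, 1]]
    print(label, round(r, 3), round(1 - r ** 2, 3))
```

In this toy sample the correlation falls in magnitude from about 0.58 to about 0.27 (changing sign along the way), and the determinant rises accordingly, which is the improvement in the estimates that the text describes.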

OTHER DUMMY VARIABLES

Two other groups of variables are constructed in the same way as those measuring birthplace. Denton and George create ten variables to measure the religion of the husband and wife (five for each) and eight variables to measure their ethnic origins (two groups of four). These variables suffer from the same problem as birthplace, described above.

CONCLUSION

The only data presented in the article are for the large regression. We are not even given the mean value or variance of the number of children in the family, the dependent variable, or of any of the independent variables. Nor are the relationships between the religions, birthplaces, and ethnic origins of husbands and wives shown. The omission of these simple descriptive statistics makes the data more difficult to understand.

Katz (1974: 234) concludes his criticism of a similar article by George and Denton on school attendance (1974) by arguing that his own cross-tabulation methodology is ‘quite straightforward, but it does take time, experimentation, and a great deal of thought to create meaningful categories. This is what multiple regression avoids. Despite its complexity, it is an easy way out; a formula for everything. The problem is that the results are not very instructive, that is, if one wants to know quite exactly and concretely who it was that did and did not go to school.’ Katz’s criticism should be directed at the specific work, not at the method. Multiple regression can be used to deal with problems that are completely beyond the reach of tabular analysis, a method which is almost useless when more than three or four variables are employed, because the numbers of cases in the cells become too small to be meaningful while the number of cells becomes unmanageable. Good regression analysis requires categories that are every bit as carefully constructed as those in the most meticulous historical cross-tabulations. Historians and other social scientists are now collecting large bodies of quantitative historical data, and the multivariate methods developed by sociologists to deal with contemporary sample surveys will no doubt come to assume an important place in historical research. They must be used with care.
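The arithmetic behind the claim that tables quickly become unmanageable is easy to show. The sketch below crosses only the husband's and wife's birthplace, religion, and ethnic origin, assuming each set of dummies in the article has one omitted base category (five birthplace dummies imply six categories, five religion dummies six, four ethnic-origin dummies five), and divides the 1,100 'normal' families coded by Denton and George among the resulting cells.

```python
# Categories per variable, assuming each dummy set in the article has one
# omitted base category (e.g. five birthplace dummies -> six birthplaces).
categories = [
    ("husband birthplace", 6), ("wife birthplace", 6),
    ("husband religion", 6), ("wife religion", 6),
    ("husband ethnic origin", 5), ("wife ethnic origin", 5),
]
families = 429 + 671  # 'normal' families coded in the urban and rural areas

cells = 1
for name, k in categories:
    cells *= k  # each added variable multiplies the number of table cells
    print(f"+ {name:<22} {cells:>6} cells  {families / cells:8.2f} families per cell")
```

By the fourth variable there is fewer than one family per cell on average, before age, occupation, or any of the other variables in the regression have been added.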

REFERENCES

Anderson, Michael 1971 Family Structure in Nineteenth Century Lancashire. Cambridge: Cambridge University Press.

Calhoun, Daniel 1974 ‘Letter’ [on George and Denton, 1974]. Forthcoming in History of Education Quarterly 14: 545-6.

Denton, Frank T., and Peter J. George 1973 ‘The influence of socio-economic variables on family size in Wentworth County, Ontario, 1871: a statistical analysis of historical micro-data.’ Canadian Review of Sociology and Anthropology 10: 334-45.

George, Peter J., and Frank T. Denton 1974 ‘Socio-economic influences on school attendance: a study of a Canadian county in 1871.’ History of Education Quarterly 14: 223-32.

Hareven, Tamara K. 1974 ‘The family as process: the historical study of the life cycle.’ Journal of Social History 7: 322-9.

Katz, Michael 1974 ‘Reply’ [to George and Denton, 1974]. History of Education Quarterly 14: 233-4.