Int Roe Pi


  • 8/7/2019 Int Roe Pi


    A Short Introduction to Epidemiology

    Second Edition

    Neil Pearce

    Occasional Report Series No 2

    Centre for Public Health Research
    Massey University Wellington Campus
    Private Box 756
    Wellington, New Zealand


    Centre for Public Health Research
    Massey University Wellington Campus
    Private Box 756
    Wellington, New Zealand
    Phone: 64-4-3800-606
    Fax: 64-4-3800-600
    E-mail: [email protected]
    Website: http://www.publichealth.ac.nz/

    Copies of this publication can be purchased in hard copy through our website (NZ$36.744 incl GST), or downloaded for free in pdf form from the website.

    2nd edition

    February 2005

    ISBN 0-476-01236-8

    ISSN 1176-1237


    To Irihapeti Ramsden


    Preface

    Who needs another introductory epidemiology text? Certainly, there are many introductory epidemiology books currently in print, and many of them are excellent. Nevertheless, there are four reasons why I believe that this new text is justified.

    Firstly, it is much shorter than most introductory texts, many of which contain more material than is required for a short introductory course. This is a short introduction to epidemiology, and is not intended to be comprehensive.

    Secondly, I have endeavoured to show clearly how the different basic epidemiologic methods fit together in a logical and systematic manner. For example, I attempt to show how the different possible study designs relate to each other, and how they are different approaches to a common task. Similarly, I attempt to show how the different study design issues (confounding and other types of bias) relate to each other, and how the principles and methods of data analysis are consistent across different study designs and data types.

    Thirdly, in this context, rather than attempt a comprehensive review of available methods (e.g. multiple methods for estimating confidence intervals for the summary risk ratio), I have attempted to select only one standard method for each application, which is reasonably robust and accurate, and which is consistent and coherent with the other methods presented in the text.

    Finally, the field of epidemiology is changing rapidly, not only with regards to its basic methods, but also with regards to the hypotheses which these methods are used to investigate. In particular, in recent years there has been a revival in public health applications of epidemiology, not only at the national level, but also at the international level, as epidemiologists tackle global problems such as climate change. This text does not attempt to review the more complex measures used to consider such issues. However, it does provide a coherent and systematic summary of the basic methods in the field, which can be used as a logical base for the teaching and development of research into these more complex issues.

    Chapter 1 gives a brief introduction to the field, with an emphasis on the broad range of applications and situations in which epidemiologic methods have been used historically, and will continue to be used in the future.

    Part 1 then addresses study design options. Chapter 2 discusses incidence studies (including cohort studies) and describes the basic study design and the basic effect measures (i.e. incidence rates and rate ratios). It then presents incidence case-control studies as a more efficient means of obtaining the same findings. Chapter 3 similarly discusses prevalence studies, and prevalence case-control studies. Chapter 4 then considers study designs incorporating other axes of classification, continuous outcome measures (e.g. blood pressure) such as cross-sectional studies and longitudinal studies, or more complex study designs such as ecologic and multi-level studies.


    Part 2 then addresses study design issues. Chapter 5 discusses issues of study size and precision. Chapter 6 considers general issues of validity, namely selection bias, information bias, and confounding. Chapter 7 discusses effect modification.

    Part 3 then discusses the practical issues of conducting a study. Chapter 8 addresses issues of measurement of exposure and disease. Chapters 9-11 then discuss the conduct of cohort studies, case-control studies and cross-sectional studies respectively.

    Finally, Part 4 considers what happens after the data are collected, with chapter 12 addressing data analysis and chapter 13 the interpretation of the findings of epidemiologic studies.

    I should stress that this book provides no more than a very preliminary introduction to the field. In doing so I have attempted to use a wide range of examples, which give some indication of the broad range of situations in which epidemiologic methods can be used. However, there are undoubtedly many other types of epidemiologic hypotheses and epidemiologic studies which are not represented in this book. In particular, my focus is on the use of epidemiology in public health, particularly with regard to non-communicable disease, and I include few examples from clinical epidemiology or from communicable disease outbreak investigations. Nevertheless, I hope that the book will be of interest not only to epidemiologists, but also to others who have other training but are involved in epidemiologic research, including public health professionals, policy makers, and clinical researchers.

    Neil Pearce

    Centre for Public Health Research
    Massey University Wellington Campus
    Private Box 756
    Wellington, New Zealand

    Acknowledgements

    During the writing of this text, my salary was funded by the Health Research Council of New Zealand. I wish to thank Sander Greenland and Jonny Myers for their comments on the draft manuscript. I also wish to thank Massey University for support for my research programme.


    A Short Introduction to Epidemiology

    Contents

    1. Introduction
       Germs and miasmas
       Risk factor epidemiology
       Epidemiology in the 21st century

    PART 1: STUDY DESIGN OPTIONS

    2. Incidence studies
       Incidence studies
       Incidence case-control studies

    3. Prevalence studies
       Prevalence studies
       Prevalence case-control studies

    4. More complex study designs
       Other axes of classification
       Continuous outcome measures
       Ecologic and multilevel studies

    PART 2: STUDY DESIGN ISSUES

    5. Precision
       Basic statistics
       Study size and power

    6. Validity
       Confounding
       Selection bias
       Information bias

    7. Effect modification
       Concepts of interaction
       Additive and multiplicative models
       Joint effects

    PART 3: CONDUCTING A STUDY

    8. Measurement of exposure and health status
       Exposure
       Health status

    9. Cohort studies
       Defining the source population and risk period
       Measuring exposure
       Follow-up

    10. Case-control studies
        Defining the source population and risk period
        Selection of cases
        Selection of controls
        Measuring exposure

    11. Prevalence studies
        Defining the source population
        Measuring health status
        Measuring exposure

    PART 4: ANALYSIS AND INTERPRETATION OF STUDIES

    12. Data analysis
        Basic principles
        Basic analyses
        Controlling for confounding

    13. Interpretation
        Appraisal of a single study
        Appraisal of all of the available evidence


    CHAPTER 1. Introduction

    (In: Pearce N. A Short Introduction to Epidemiology. 2nd ed. Wellington, CPHR, 2005)

    Public health is primarily concerned with the prevention of disease in human populations. It differs from clinical medicine both in its emphasis on prevention rather than treatment, and in its focus on populations rather than individual patients (table 1.1). Epidemiology is the branch of public health which attempts to discover the causes of disease in order to make disease prevention possible. Epidemiological methods can be used in other contexts (particularly in clinical research), but this short introductory text focuses on the use of epidemiology in public health, i.e. on its use as part of the wider process of discovering the causes of disease and preventing its occurrence in human populations.

    In this context, epidemiology has been defined as (Last, 1988):

    "the study of the distribution and determinants of health-related states or events in specified populations, and the application of this study to control of health problems"

    This broad definition could in theory include a broad range of research methodologies including qualitative research and quantitative randomised controlled trials. Some epidemiologists recognise the complementary nature of the former (McKinlay, 1993), and some texts include the latter in their definition of epidemiology. However, the key feature of epidemiological studies is that they are quantitative (rather than qualitative) observational (rather than experimental) studies of the determinants of disease in human populations (rather than individuals). This will be my focus here, while recognising the value, and complementary nature, of other research methodologies. The observational approach is a major strength of epidemiology as it enables a study to be conducted in a situation where a randomized trial would be unethical or impractical (because of the large numbers of subjects required). It is also the main limitation of epidemiological studies in that the lack of randomization means that the groups being compared may differ with respect to various causes of disease (other than the main exposure under investigation). Thus, epidemiological studies, in general, experience the same potential problems as randomized controlled trials, but may suffer additional problems of bias because exposure has not been randomly allocated and there may be differences in baseline disease risk between the populations being compared.
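The point about differences in baseline disease risk can be made concrete with a small numerical sketch. The counts below are entirely hypothetical (they are not taken from any study cited in this text), and the names `HYPOTHETICAL`, `risk_ratio` and `pooled` are introduced only for illustration. In this invented population the exposure has no effect within either stratum of baseline risk, but exposed people are concentrated in the high-risk stratum, so the crude comparison makes the exposure appear harmful.

```python
# Hypothetical counts (NOT from the text): (people, cases) in each group,
# split by a second factor that determines baseline disease risk.
HYPOTHETICAL = {
    "high baseline risk": {"exposed": (80, 16), "unexposed": (20, 4)},
    "low baseline risk": {"exposed": (20, 1), "unexposed": (80, 4)},
}


def risk(people, cases):
    """Proportion of a group that develops the disease."""
    return cases / people


def risk_ratio(exposed, unexposed):
    """Risk in the exposed divided by risk in the unexposed."""
    return risk(*exposed) / risk(*unexposed)


def pooled(group):
    """Add up people and cases for one exposure group across all strata."""
    people = sum(stratum[group][0] for stratum in HYPOTHETICAL.values())
    cases = sum(stratum[group][1] for stratum in HYPOTHETICAL.values())
    return people, cases


# Risk ratio within each stratum of baseline risk.
stratum_rr = {
    name: risk_ratio(groups["exposed"], groups["unexposed"])
    for name, groups in HYPOTHETICAL.items()
}

# Crude risk ratio, ignoring the baseline-risk factor altogether.
crude_rr = risk_ratio(pooled("exposed"), pooled("unexposed"))

for name, rr in stratum_rr.items():
    print(f"Risk ratio within '{name}' stratum: {rr:.2f}")
print(f"Crude risk ratio (strata ignored): {crude_rr:.2f}")
```

In a randomised trial, allocation would be expected to balance the two baseline-risk strata between the groups on average, which is why the crude and stratum-specific comparisons would then tend to agree.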

    Table 1.1

    The defining features of public health: populations and prevention

                   Prevention              Treatment
    ----------------------------------------------------------------------
    Populations    Public health           Health systems research

    Individuals    Primary health care/    Medicine (including primary
                   Health education        health care)


    1.1 Germs and Miasmas

    Epidemiology is as old as public health itself, and it is not difficult to find epidemiological observations made by physicians dating back to Hippocrates who observed that:

    Whoever wishes to investigate medicine properly should proceed thus: in the first place to consider the seasons of the year, and what effects each of them produces when one comes into a city in which he is a stranger, he should consider its situation, how it lies as to the winds and the rising of the sun... One should consider most attentively the waters which the inhabitants use... and the ground and the mode in which the inhabitants live, and what are their pursuits, whether they are fond of drinking and eating to excess, and given to indolence, or are fond of exercise and labor. (Hippocrates, 1938; quoted in Hennekens and Buring, 1987)

    Many other examples of epidemiological reasoning were published through the ages. However, epidemiology was founded as an independent discipline in a number of Western countries in parallel with the industrial revolution of the 19th century. In Anglophone countries it is considered to have been founded by the work of Chadwick, Engels, Snow and others who exposed the appalling social conditions during the industrial revolution, and the work of Farr and others who revealed major socioeconomic differences in disease in the 19th century. At that time, epidemiology was generally regarded as a branch of public health and focused on the causes and prevention of disease in populations, in comparison with the clinical sciences which were branches of medicine and focussed on disease pathology and treatment of disease in individuals. Thus, the emphasis was on the prevention of disease and the health needs of the population as a whole. In this context, the fundamental importance of population-level factors (the urban environment, housing, socioeconomic factors, etc) was clearly acknowledged (Terris, 1987).

    Table 1.2

    Deaths and death rates from cholera in London 1854 in households supplied by the Southwark and Vauxhall Water Company and by the Lambeth Water Company

                                          Cholera    Deaths per
                               Houses     deaths     10,000 houses
    ----------------------------------------------------------------
    Southwark and Vauxhall     40,046      1,263          315
    Lambeth Company            26,107         98           37
    Rest of London            256,423      1,422           59
    ----------------------------------------------------------------
    Source: (Snow, 1936; quoted in Winkelstein, 1995)


    Perhaps the most commonly quoted epidemiologic legend is that of Snow who studied the causes of cholera in London in the mid-19th century (Winkelstein, 1995). Snow was able to establish that the cholera death rate was much higher in areas supplied by the Southwark and Vauxhall Company which took water from the Thames downstream from London (i.e. after it had been contaminated with sewage) than in areas supplied by the Lambeth Company which took water from upstream, with the death rates being intermediate in areas served by both companies. Subsequently, Snow (1936) studied the area supplied by both companies, and within this area walked the streets to determine for each house in which a cholera death had occurred, which company supplied the water. The death rate was almost ten times as high in houses supplied with water containing sewage (table 1.2).
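The arithmetic behind table 1.2 can be checked with a few lines of code. This is a minimal sketch: the counts are transcribed from the table, while the names `TABLE_1_2` and `deaths_per_10000_houses` are introduced here only for illustration and do not appear in the original text.

```python
# Houses, cholera deaths and the tabulated rate, transcribed from table 1.2.
TABLE_1_2 = {
    "Southwark and Vauxhall": (40_046, 1_263, 315),
    "Lambeth Company": (26_107, 98, 37),
    "Rest of London": (256_423, 1_422, 59),
}


def deaths_per_10000_houses(houses, deaths):
    """Cholera deaths per 10,000 houses supplied."""
    return 10_000 * deaths / houses


# Recompute the rate for every row of the table.
rates = {
    name: deaths_per_10000_houses(houses, deaths)
    for name, (houses, deaths, _tabulated) in TABLE_1_2.items()
}

for name, (_houses, _deaths, tabulated) in TABLE_1_2.items():
    print(f"{name:<24} computed {rates[name]:6.1f}   tabulated {tabulated}")

# Ratio of the two company-specific death rates.
rate_ratio = rates["Southwark and Vauxhall"] / rates["Lambeth Company"]
print(f"Rate ratio, Southwark and Vauxhall vs Lambeth: {rate_ratio:.1f}")
```

On these house-level counts the recomputed rates for the two companies agree with the table (about 315 and 37.5 per 10,000 houses), giving a rate ratio of roughly 8.4. The recomputed figure for the rest of London comes out at about 55 rather than the tabulated 59; the table is reproduced here as it is given in the source.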

    Although epidemiologists and other researchers continue to battle over Snow's legacy and its implications for epidemiology today (Cameron and Jones, 1983; Loomis and Wing, 1991; Samet, 2000; Vandenbroucke, 1994), it is clear that Snow was able to discover, and establish convincing proof for, the mode of transmission of cholera, and to take preventive action several decades before the biological basis of his observations was understood. Thus, it was not until several decades after the work of Snow that Pasteur and others established the role of the transmission of specific pathogens in what became known as the infectious diseases, and it was another century, in most instances, before effective vaccines or antibiotic treatments became available. Nevertheless, a dramatic decline in mortality from these diseases occurred from the mid-nineteenth century long before the development of modern pharmaceuticals. This has been attributed to improvements in nutrition, sanitation, and general living conditions (McKeown, 1979) although it has been argued that specific public health interventions on factors such as urban congestion actually played the major role (Szreter, 1988).

    1.2 Risk Factor Epidemiology

    This decline in the importance of communicable disease was accompanied by an increase in morbidity and mortality from non-communicable diseases such as heart disease, cancer, diabetes, and respiratory disease. This led to major developments in the theory and practice of epidemiology, particularly in the second half of the 20th century. There has been a particular emphasis on aspects of individual lifestyle (diet, exercise, etc) and in the last decade the human genome project has seen an accelerated interest in the role of genetic factors (Beaty and Khoury, 2000).

    Thus, epidemiology became widely recognized with the establishment of tobacco smoking as a cause of lung cancer in the early 1950s (Doll and Hill, 1950; Wynder and Graham, 1950), although this association had already been established in Germany in the 1930s (Schairer and Schöniger, 2001). Subsequent decades have seen major discoveries relating to other causes of chronic disease such as asbestos, ionizing radiation, viruses, diet, outdoor air pollution, indoor air pollution, water pollution, and genetic factors. These epidemiologic successes have in some cases led to successful preventive interventions without the need for major social or political change. For example, occupational carcinogens can, with some difficulty, be controlled through regulatory measures, and exposures to known occupational carcinogens have been reduced in industrialized countries in recent decades. Another example is the successful World Health Organisation (WHO) campaign against smallpox. More recently, some countries have passed legislation to restrict advertising of tobacco and smoking in public places and have adopted health promotion programmes aimed at changes in "lifestyle".

    Individual lifestyle factors would ideally be investigated using a randomised controlled trial, but this is often unethical or impractical (e.g. tobacco smoking). Thus, it is necessary to do observational studies, and epidemiology has made major contributions to the understanding of the role of individual lifestyle factors in health. Because such factors would ideally be investigated in randomised controlled trials, and in fact would be ideally suited to such trials if it were not for the ethical and practical constraints, epidemiologic theory and practice have, quite appropriately, been based on the theory and practice of randomised trials. Thus, the aim of an epidemiologic study investigating the effect of a specific risk factor (e.g. smoking) on a particular disease (e.g. lung cancer) is to obtain the same findings that would have been obtained from a randomised controlled trial. Of course, an epidemiologic study will usually experience more problems of bias than a randomised controlled trial, but the randomised trial is the gold standard.

    This approach has led to major developments in epidemiologic theory (presented most elegantly and comprehensively in Rothman and Greenland, 1998). In particular, there have been major developments in the theory of cohort studies (which mimic a randomised trial, but without the randomisation) and case-control studies (which attempt to obtain the same findings as a full cohort study, but in a more efficient manner). It is these basic methods, which follow a randomised controlled trial paradigm, which receive most of the attention in this short introductory text. However, while presenting these basic methods, it is important to also recognise their limitations, and to also consider different or more complex methods that may be more appropriate when epidemiology is used in the public health context.

    1.3 Epidemiology in the 21st Century

    In the last decade there has been increasing concern expressed about the limitations of the risk factor approach, and considerable debate about the future direction of epidemiology (Saracci, 1999). In particular, it has been argued that there has been an overemphasis on aspects of individual lifestyle, and little attention paid to the population-level determinants of health (Susser and Susser, 1996a, 1996b; Pearce, 1996; McMichael, 1999). Furthermore, the success of risk factor epidemiology has been more temporary and more limited than might have been expected. For example, the limited success of legislative measures in industrialised countries has led the tobacco industry to shift its promotional activities to developing countries so that more people are exposed to tobacco smoke than ever before (Barry, 1991; Tominaga, 1986). Similar shifts have occurred for some occupational carcinogens (Pearce et al, 1994). Thus, on a global basis the "achievement" of the public health movement has often been to move public health problems from rich countries to poor countries and from rich to poor populations within the industrialized countries.

    It should be acknowledged that not all epidemiologists share these concerns (e.g. Savitz, 1994; Rothman et al, 1998; Poole and Rothman, 1998), and some have regarded these discussions as an attack on the field itself, rather than as an attempt to broaden its vision. Nevertheless, the debate has progressed and there is an increasing recognition of the importance of taking a more global approach to epidemiologic research and of the importance of maintaining an appropriate balance and interaction between macro-level (population), individual-level (e.g. lifestyle), and micro-level (e.g. genetic) research (Pearce, 2004).

    There are three crucial concepts which have received increasing attention in this regard.

    The Importance of Context

    The first, and most important, issue is the need to consider the population context when conducting epidemiologic studies. Even if one is focusing on individual lifestyle risk factors, there is good reason to conduct studies at the population level (Rose, 1992). Moreover, every population has its own history, culture, and economic and social divisions which influence how and why people are exposed to specific risk factors, and how they respond to such exposures. For example, New Zealand (Aotearoa) was colonised by Great Britain more than 150 years ago, resulting in major loss of life among the indigenous people (the Māori). It is commonly assumed that this loss of life occurred primarily due to the arrival of infectious diseases to which Māori had no natural immunity. However, a more careful analysis of the history of colonisation throughout the Pacific reveals that the indigenous people mainly suffered major mortality from imported infectious diseases when their land was taken (Kunitz, 1994), thus disrupting their economic base, food supply and social networks. This example is not merely of historical interest, since it is these same infectious diseases that have returned in strength in Eastern Europe in the last decade, after lying dormant for nearly a century (Bobak and Marmot, 1996). Similarly, the effects of occupational carcinogens may be greater in developing countries where workers may be relatively young or may be affected by malnutrition or other diseases (Pearce et al, 1994).

    These issues are likely to become more important because, not only is epidemiology changing, but the world that epidemiologists study is also rapidly changing. We are seeing the effects of economic globalization, structural adjustment (Pearce et al, 1994) and climate change (McMichael, 1993, 1995), and the last few decades have seen the occurrence of the informational revolution which is having effects as great as the previous agricultural and industrial revolutions (Castells, 1996). In industrialized countries, this is likely to prolong life expectancy for some, but not all, sections of the population. In developing countries, the benefits have been even more mixed (Pearce et al, 1994), while the countries of Eastern Europe are experiencing the largest sudden drop in life expectancy that has been observed in peacetime in recorded human history (Bobak and Marmot, 1996) with a major rise in alcoholism and forgotten diseases such as tuberculosis and cholera.

    This increased interest in population-level determinants of health has been particularly marked by increased interest in techniques such as multilevel modelling which allow individual lifestyle risk factors to be considered in context and in parallel with macro-level determinants of health (Greenland, 2000). Such a shift in approach is important, not only because of the need to emphasize the role of diversity and local knowledge (Kunitz, 1994), but also because of the more general moves within science to consider macro-level systems and processes (Cohen and Stewart, 1994) rather than taking a solely reductionist approach (Pearce, 1996).

    Problem-Based Epidemiology

    A second issue is that a problem-based approach may be particularly valuable in encouraging epidemiologists to focus on the major public health problems and to take the population context into account (Pearce, 2001; Thacker and Buffington, 2001). A problem-based approach to teaching clinical medicine has been increasingly adopted in medical schools around the world. The value of this approach is that theories and methods are taught in the context of solving real-life problems. Starting with the problem at the population level provides a reality check on existing etiological theories and identifies the major public health problems which new theories must be able to explain. A fruitful research process can then be generated with positive interaction between epidemiologists and other researchers. Studying real public health problems in their historical and social context does not exclude learning about sophisticated methods of study design and data analysis (in fact, it necessitates it), but it may help to ensure that the appropriate questions are asked (Pearce, 1999).

    Appropriate Technology

    A related issue is the need to use appropriate technology to address the most important public health research questions. In particular, as attention moves upstream to the population level (McKinlay, 1993) new methods will need to be developed (McMichael, 1995). One example of this, noted above, is the recent rise in interest in multilevel modelling (Blakely and Woodward, 2000; Pearce, 2000), although it is important to stress that it is an increase in multilevel thinking in the development of epidemiologic hypotheses and the design of studies that is required, rather than just the use of new statistical techniques of data analysis. The appropriateness of any research methodology depends on the phenomenon under study: its magnitude, the setting, the current state of theory and knowledge, the availability of valid measurement tools, and the proposed uses of the information to be gathered, as well as the community resources and skills available and the prevailing norms and values at the national, regional or local level (Pearce and McKinlay, 1998). Thus, there has been increased interest in the interface between epidemiology and social science (Krieger, 2000), and in the development of theoretical and methodological frameworks appropriate for epidemiologic studies in developing countries (Barreto et al, 2001; Barreto, 2004; Loewenson, 2004), and in indigenous people in Western countries (Durie, 2004). As noted above, this short introductory text focuses on the most basic epidemiologic methods, but I attempt to refer to more complex issues, and the potential use of more complex methods, where this is appropriate.

    Summary

    Public health is primarily concerned with the prevention of disease in human populations, and epidemiology is the branch of public health which attempts to discover the causes of disease in order to make disease prevention possible. It thus differs from clinical medicine both in its emphasis on prevention (rather than treatment) and in its focus on populations (rather than individual patients). Thus, the epidemiological approach to a particular disease is intended to identify high-risk subgroups within the population, to determine the causes of such excess risks, and to determine the effectiveness of subsequent preventive measures. Although the epidemiological approach has been used for more than a century for the study of communicable diseases, epidemiology has considerably grown in scope and sophistication in the last few decades as it has been increasingly applied to the study of non-communicable diseases. At the beginning of the 21st century, the field of epidemiology is changing rapidly, not only with regards to its basic methods, but also with regards to the hypotheses which these methods are used to investigate. In particular, in recent years there has been a revival in public health applications of epidemiology, not only at the national level, but also at the international level, as epidemiologists tackle global problems such as climate change. This text does not attempt to review the more complex methods used to study such issues. However, it does provide a coherent and systematic summary of the basic methods in the field, which can be used as a logical base for the teaching and development of research into these more complex issues.


    References

    Barreto ML (2004). The globalization of epidemiology: critical thoughts from Latin America. Int J Epidemiol 33: 1132-7.

    Barreto ML, Almeida-Filho N, Breilh J (2001). Epidemiology is more than discourse: critical thoughts from Latin America. J Epidemiol Comm Health 55: 158-9.

    Barry M (1991). The influence of the U.S. tobacco industry on the health, economy, and environment of developing countries. New Engl J Med 324: 917-20.

    Beaty TH, Khoury MJ (2000). Interface of genetics and epidemiology. Epidemiologic Reviews 22: 120-5.

    Blakely T, Woodward AJ (2000). Ecological effects in multi-level studies. J Epidemiol Comm Health 54: 367-74.

    Bobak M, Marmot M (1996). East-West mortality divide and its potential explanations: proposed research agenda. Br Med J 312: 421-5.

    Cameron D, Jones IG (1983). John Snow, the Broad Street pump and modern epidemiology. Int J Epidemiol 12: 393-6.

    Castells M (1996). The information age: Economy, society and culture. Vol 1. The rise of the network society. Oxford: Blackwell.

    Cohen J, Stewart I (1994). The collapse of chaos: discovering simplicity in a complex world. London: Penguin.

    Doll R, Hill AB (1950). Smoking and carcinoma of the lung. Br Med J 2: 739-48.

    Durie M (2004). Understanding health and illness: research at the interface between science and indigenous knowledge. Int J Epidemiol 33: 1138-43.

    Greenland S (2000). Principles of multilevel modelling. Int J Epidemiol 29: 158-67.

    Hennekens CH, Buring JE (1987). Epidemiology in medicine. Boston: Little, Brown.

    Hippocrates (1938). On airs, waters and places. Med Classics 3: 19.

    Krieger N (2000). Epidemiology and social sciences: towards a critical reengagement in the 21st century. Epidemiologic Reviews 22: 155-63.

    Kunitz S (1994). Disease and social diversity. New York: Oxford University Press.

    Last JM (ed) (1988). A dictionary of epidemiology. New York: Oxford University Press.

    Loewenson R (2004). Epidemiology in the era of globalization: skills transfer or new skills? Int J Epidemiol 33: 1144-50.

    Loomis D, Wing S (1991). Is molecular epidemiology a germ theory for the end of the twentieth century? Int J Epidemiol 19: 1-3.

    McMichael AJ (1993). Planetary overload: global environmental change and the health of the human species. Cambridge: Cambridge University Press.

    McMichael AJ (1995). The health of persons, populations, and planets: epidemiology comes full circle. Epidemiol 6: 633-5.

    McMichael AJ (1999). Prisoners of the proximate: loosening the constraints on epidemiology in an age of change. Am J Epidemiol 149: 887-97.

    McKeown T (1979). The role of medicine. Princeton, NJ: Princeton University Press.

    McKinlay JB (1993). The promotion of health through planned sociopolitical change: challenges for research and policy. Soc Sci Med 36: 109-17.

    Pearce N (1996). Traditional epidemiology, modern epidemiology, and public health. AJPH 86: 678-83.

    Pearce N (1999). Epidemiology as a population science. Int J Epidemiol 28: S1015-8.

    Pearce N (2000). The ecologic fallacy strikes back. J Epidemiol Comm Health 54: 326-7.

    Pearce N (2001). The future of epidemiology: a problem-based approach using evidence-based methods. Australasian Epidemiologist 8.1: 3-7.

    Pearce N (2004). The globalization of epidemiology: introductory remarks. Int J Epidemiol 33: 1127-31.

    Pearce N, McKinlay J (1998). Back to the future in epidemiology and public health. J Clin Epidemiol 51: 643-6.

    Pearce NE, Matos E, Vainio H, Boffetta P, Kogevinas M (eds) (1994). Occupational cancer in developing countries. Lyon: IARC.

    Poole C, Rothman KJ (1998). Our conscientious objection to the epidemiology wars. J Epidemiol Comm Health 52: 613-4.

    Rose G (1992). The strategy of preventive medicine. Oxford: Oxford University Press.

    Rothman KJ, Greenland S (1998). Modern epidemiology. 2nd ed. Philadelphia: Lippincott-Raven.

    Rothman KJ, Adami H-O, Trichopoulos D (1998). Should the mission of epidemiology include the eradication of poverty? Lancet 352: 810-3.

    Samet JM (2000). Epidemiology and policy: the pump handle meets the new millennium. Epidemiologic Reviews 22: 145-54.

    Saracci R (1999). Epidemiology in progress: thoughts, tensions and targets. Int J Epidemiol 28: S997-9.

    Savitz DA (1994). In defense of black box epidemiology. Epidemiology 5: 550-2.

    Schairer E, Schöniger E (2001). Lung cancer and tobacco consumption. Int J Epidemiol 30: 24-7.

    Snow J (1936). On the mode of communication of cholera. (Reprint). New York: The Commonwealth Fund, pp 11-39.

    Susser M, Susser E (1996a). Choosing a future for epidemiology: I. Eras and paradigms. Am J Publ Health 86: 668-73.

    Susser M, Susser E (1996b). Choosing a future for epidemiology: II. From black boxes to Chinese boxes. Am J Publ Health 86: 674-8.

    Szreter S (1988). The importance of social intervention in Britain's mortality decline c.1850-1914: a reinterpretation of the role of public health. Soc Hist Med 1: 1-37.

    Terris M (1987). Epidemiology and the public health movement. J Publ Health Policy 7: 315-29.

    Thacker SB, Buffington J (2001). Applied epidemiology for the 21st century. Int J Epidemiol 30: 320-5.

    Tominaga S (1986). Spread of smoking to the developing countries. In: Zaridze D, Peto R (eds). Tobacco: a major international health hazard. Lyon: IARC, pp 125-33.


    Vandenbroucke JP (1994). New public health and old rhetoric. Br Med J 308: 994-5.

    Winkelstein W (1995). A new perspective on John Snow's communicable disease theory. Am J Epidemiol 142: S3-9.

    Wynder EL, Graham EA (1950). Tobacco smoking as a possible etiologic factor in bronchiogenic carcinoma. J Am Med Assoc 143: 329-38.


    Part I

    Study Design Options


    CHAPTER 2. Incidence Studies

    (In: Pearce N. A Short Introduction to Epidemiology. 2nd ed. Wellington, CPHR, 2005)

    In this chapter and the next one I review the possible study designs for the simple situation where individuals are exposed to a particular risk factor (e.g. a particular chemical) and a dichotomous outcome is under study (e.g. being alive or dead, or having or not having a particular disease). Thus, the aim is to estimate the effect of a (dichotomous) exposure on the occurrence of a (dichotomous) disease outcome or health state.

    It should first be emphasized that all epidemiologic studies are (or should be) based on a particular source population (also called the study population or base population) followed over a particular risk period. Within this framework a fundamental distinction is between studies of disease incidence (i.e. the number of new cases of disease over time) and studies of disease prevalence (i.e. the number of people with the disease at a particular point in time). Studies involving dichotomous outcomes can then be classified according to two questions:

    a. Are we studying incidence or prevalence?

    b. Is there sampling on the basis of outcome?

    The responses to these two questions yield four basic types of epidemiologic studies (Morgenstern and Thomas, 1993; Pearce, 1998):

    1. Incidence studies
    2. Incidence case-control studies
    3. Prevalence studies
    4. Prevalence case-control studies

    These four study types represent cells in a two-way cross-classification (table 2.1). Such studies may be conducted to describe the occurrence of disease (e.g. to estimate the burden of diabetes in the community by conducting a prevalence survey), or to estimate the effect of a particular exposure on disease (e.g. to estimate whether the incidence of new cases of diabetes is greater in people with a high fat diet than in people with a low fat diet) in order to find out how we can prevent the disease occurring. In the latter situation we are comparing the occurrence of disease in an exposed group with that in a non-exposed group, and we are estimating the effect of exposure on the occurrence of the disease, while controlling for other known causes of the disease.

    Table 2.1

    The four basic study types in studies involving a dichotomous health outcome

    ------------------------------------------------------------------------
                        Sampling on outcome
                        ----------------------------------------------------
    Study outcome       No                    Yes
    ------------------------------------------------------------------------
    Incidence           Incidence studies     Incidence case-control studies
    Prevalence          Prevalence studies    Prevalence case-control studies
    ------------------------------------------------------------------------


    Thus, we might conclude that lung cancer is five times more common in asbestos workers than in other workers, even after we have controlled for differences in age, gender, and smoking. In some instances we may have multiple categories of exposure (high, medium, low) or individual exposure scores, but we will start with the simple situation in which individuals are classified as exposed or non-exposed.

    In this chapter I discuss incidence studies, and in the following chapter I consider prevalence studies. In chapter 4, I then consider studies involving more complex measurements of health status (e.g. continuous lung function or blood pressure measurements) and more complex study designs (ecologic and multilevel studies). As noted in chapter 1, the latter situation is perhaps the norm, rather than the exception, when conducting studies in the public health context. However, for logical and practical reasons I will first address the simpler situation of a dichotomous exposure (in individuals) and a dichotomous health outcome measure.

    2.1 Incidence Studies

    The most comprehensive approach involves collecting data on the experience of the entire source population over the risk period in order to estimate disease incidence (the development of a disease for the first time) or mortality (i.e. death, which is a particular type of incidence measure). Figure 2.1 shows the experience of a source population in which all persons are followed from a particular date. For simplicity, I will initially assume that the source population is confined to persons born in a particular year, i.e. a birth cohort. In the hypothetical study shown in figure 2.1, the outcome under study is the "event" of developing a particular disease. However, the concept of incidence applies equally to studies of other health events, such as hospitalisation or death. The key feature of incidence studies is that they involve an event (e.g. developing a disease for the first time) which occurs at a particular point in time, rather than a state (e.g. having a disease) which can exist over an extended period of time.

    In the hypothetical study shown in figure 2.1, people enter the study when they are born, and some of them subsequently develop disease. Of these, some subsequently "lose" their disease (although they may "regain" it at a later date), and some have the condition all their lives; some persons die from the disease under study, but most eventually die from another cause. However, the information is "censored" since the study cannot last indefinitely; i.e. follow-up stops by a particular age, at which time some members of the study population have died, and some have been lost to follow-up for other reasons (e.g. emigration). For example, several people in figure 2.1 were censored before follow-up finished, either because they died of the disease we were studying (if we were studying the incidence of disease, rather than deaths, they would be censored as soon as they developed the disease), because they died of something else, or because they were lost to follow-up. Each person only contributes person-time to the study until they are censored, and after that we stop counting them. This approach is followed because we may not get a fair comparison between the exposed and the non-exposed groups if they have been followed for different lengths of time, e.g. if one group has many more people lost to follow-up than the other group.

    However, the person-time approach would be necessary even if no-one was lost to follow-up and both groups were followed for the same length of time. For example, consider a cohort study of 1,000 exposed and 1,000 non-exposed people in which no-one was lost to follow-up and everyone was followed until they died. Assume also that the exposure causes some deaths so the exposed group, on the average, died at a younger age than the non-exposed group. If we only calculated the percentage of people who died, then it would be 100% in both groups, and we would see no difference. However, if we take into account the person-time contributed by each group, then it becomes clear that both groups had the same number of deaths (1,000), but that in the exposed group these deaths occurred earlier and the person-time contributed was therefore lower. Thus, the average age at death would be lower in the exposed group; to say the same thing another way, the death rate (deaths divided by person-years) would be higher. To see this, we need to consider not only how many people were in each group, but how much person-time they contributed, i.e. how long they were followed for.
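The argument in the preceding paragraph can be illustrated with a minimal sketch. The ages at death used here (70 years for every exposed person and 75 years for every non-exposed person) are hypothetical values chosen purely for illustration, and follow-up is assumed to start at birth, so that each person's person-time equals their age at death.

```python
# Hypothetical illustration: 1,000 exposed and 1,000 non-exposed people,
# all followed from birth until death, with no loss to follow-up.
# The ages at death are invented for this sketch, not taken from any study.
GROUP_SIZE = 1000
exposed_ages_at_death = [70.0] * GROUP_SIZE
non_exposed_ages_at_death = [75.0] * GROUP_SIZE


def proportion_dead(ages_at_death, persons):
    """Number of deaths divided by the number of persons in the group."""
    return len(ages_at_death) / persons


def death_rate(ages_at_death):
    """Deaths divided by person-years (person-years = age at death here)."""
    deaths = len(ages_at_death)
    person_years = sum(ages_at_death)
    return deaths / person_years


prop_exposed = proportion_dead(exposed_ages_at_death, GROUP_SIZE)
prop_non_exposed = proportion_dead(non_exposed_ages_at_death, GROUP_SIZE)
rate_exposed = death_rate(exposed_ages_at_death)
rate_non_exposed = death_rate(non_exposed_ages_at_death)

print("Proportion dead:", prop_exposed, prop_non_exposed)
print("Death rate per person-year:", rate_exposed, rate_non_exposed)
print("Rate ratio:", rate_exposed / rate_non_exposed)
```

The proportion who died is identical in the two groups, whereas the death rate is higher in the exposed group because the same number of deaths is spread over fewer person-years.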

    Figure 2.1

    Occurrence of disease in a hypothetical population followed from birth

    [Figure not reproduced. Time axis: birth to end of follow-up. Legend: death from disease under study; other death; lost to follow-up; at risk; disease symptoms; severe symptoms.]


    Example 2.1

    Martinez et al (1995) studied 1246 newborns in the Tucson, Arizona area enrolled between May 1980 and October 1984. Parents were contacted shortly after the children were born, and completed a questionnaire about their history of respiratory illness, smoking habits, and education. Further parental questionnaires were completed during the child's second year of life and again at six years. At the age of six years, 51.5% of the children had never wheezed, 19.9% had had at least one lower respiratory tract illness with wheezing during the first three years of life but had no wheezing at six years, 15.0% had no wheezing before the age of three years but had wheezing at six years, and 13.7% had wheezing both before three years of age and at six years. The authors concluded that the majority of infants with wheezing have transient conditions and do not have increased risks of asthma or allergies later in life.

    In some circumstances, a study might be conducted to study the "natural history" of a disease (e.g. diabetes). In such clinical epidemiology studies, the population (denominator) under study comprises people who already have a particular disease or condition, and the goal is to ascertain which factors affect the disease prognosis. More typically, one might be interested in a particular hypothesis about developing disease, such as "a high cholesterol diet increases the risk of developing ischaemic heart disease". In this situation, the population under study comprises healthy individuals and we are interested in factors that determine who develops the disease under study (and who doesn't). The data generated by such an incidence study involve comparing exposed and non-exposed groups and are similar to those generated by a randomised controlled trial, except that dietary exposure has not been randomly allocated.

    Incidence studies ideally measure exposures, confounders and outcome times on all population members. When the source population has been formally defined and enumerated (e.g. a group of workers exposed to a particular chemical) then the study may be termed a cohort study or follow-up study (Rothman and Greenland, 1998) and the former terminology will be used here. Incidence studies also include studies where the source population has been defined but a cohort has not been formally enumerated by the investigator. Perhaps the most common examples are descriptive studies, e.g. of national death rates. In fact, as Rothman and Greenland (1998) note, no qualitative distinction distinguishes descriptive variables from the variables that are studied in analytic studies of risk factors. Thus, the distinction between descriptive incidence studies and analytic incidence studies is at best only a distinction based on data source (e.g. obtaining information from routine records rather than collecting the information specifically for the study).

    Similarly, there is no fundamental distinction between incidence studies based on a broad population (e.g. all workers at a particular factory, or all persons living in a particular geographical area) and incidence studies involving sampling on the basis of exposure, since the latter procedure merely redefines the source population (cohort) (Miettinen, 1985).

    Measures of Disease Occurrence

    I will briefly review the basic measures of disease occurrence that are used in incidence studies, using the notation depicted in table 2.2 which shows the findings of a hypothetical incidence study of 20,000 persons followed for 10 years (statistical analyses using these measures are discussed further in chapter 12).

    Three measures of disease incidence are commonly used in incidence studies.

    Perhaps the most common measure of disease occurrence is the person-time incidence rate (or hazard rate, force of mortality or incidence density (Miettinen, 1985)) which is a measure of the disease occurrence per unit population time, and has the reciprocal of time as its dimension. In this example (table 2.2), there were 952 cases of disease diagnosed in the non-exposed group during the ten years of follow-up, which involved a total of 95,163 person-years; this is less than the total possible person-time of 100,000 person-years since people who developed the disease before the end of the ten-year period were no longer at risk of developing it, and stopped contributing person-years at that time (for simplicity I have ignored the problem of people whose disease disappears and then reoccurs over time, and I have assumed that we are studying the incidence of the first occurrence of disease). Thus, the incidence rate in the non-exposed group (b/Y0) was 952/95,163 = 0.0100 per person-year (or 1000 per 100,000 person-years).

    A second measure of disease occurrence is the incidence proportion or average risk which is the proportion of people who experience the outcome of interest at any time during the follow-up period (the incidence proportion is often called the cumulative incidence, but the latter term is also used to refer to cumulative hazards (Breslow and Day, 1987)). Since it is a proportion it is dimensionless, but it is necessary to specify the time period over which it is being measured. In this instance, there were 952 incident cases among the 10,000 people in the non-exposed group, and the incidence proportion (b/N0) was therefore 952/10,000 = 0.0952 over the ten year follow-up period. When the outcome of interest is rare over the follow-up period (e.g. an incidence proportion of less than 10%), then the incidence proportion is approximately equal to the incidence rate multiplied by the length of time that the population has been followed (in the example, this product is 0.1000 whereas the incidence proportion is 0.0952). I have assumed, for simplicity, that no-one died or was lost to follow-up during the study period (and therefore stopped contributing person-years to the study). However, as noted above, when this assumption is not valid (i.e. when a significant proportion of people have died or have been lost to follow-up), then the incidence proportion cannot be estimated directly, but must be estimated indirectly from the incidence rate (which takes into account that follow-up was not complete) or from life tables (which stratify on follow-up time).
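One way to sketch the indirect estimation of the incidence proportion from the incidence rate is the standard exponential relation R = 1 - exp(-I x t). This formula is not stated in the text; it is an assumption of this sketch, and it holds only if the incidence rate is constant over follow-up and there are no competing causes of removal from the population at risk. The function name `risk_from_rate` is illustrative.

```python
import math


def risk_from_rate(rate_per_year, years):
    """Incidence proportion implied by a constant incidence rate.

    Assumes a constant rate and no competing risks (an assumption of
    this sketch, not a statement from the text).
    """
    return 1.0 - math.exp(-rate_per_year * years)


FOLLOW_UP_YEARS = 10

for label, rate in [("non-exposed", 0.01), ("exposed", 0.02)]:
    simple_product = rate * FOLLOW_UP_YEARS           # rare-disease approximation
    exponential = risk_from_rate(rate, FOLLOW_UP_YEARS)
    print(f"{label}: rate x time = {simple_product:.4f}, "
          f"1 - exp(-rate x time) = {exponential:.4f}")
```

Under these assumptions the exponential relation gives values close to the incidence proportions of 0.0952 and 0.1813 used in the hypothetical example, whereas the simple product of rate and time overstates them slightly.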


    A third possible measure of disease occurrence is the incidence odds (Greenland, 1987) which is the ratio of the number of people who experience the outcome (b) to the number of people who do not experience the outcome (d). As for the incidence proportion, the incidence odds is dimensionless, but it is necessary to specify the time period over which it is being measured. In this example, the incidence odds (b/d) is 952/9,048 = 0.1052. When the outcome is rare over the follow-up period then the incidence odds is approximately equal to the incidence proportion. Once again, if loss to follow-up is significant, then the incidence odds cannot be estimated directly, but must be estimated indirectly from the incidence rate (via the incidence proportion, or via life-table methods). The incidence odds is not very interesting or useful as a measure of disease occurrence, but it is presented here because the incidence odds is used to calculate the incidence odds ratio which is estimated in certain case-control studies (see below).

    These three measures of disease occurrence all involve the same numerator: the number of incident cases of disease (b). They differ in whether their denominators represent person-years at risk (Y0), persons at risk (N0), or survivors (d).

    Table 2.2

    Findings from a hypothetical cohort study of 20,000 persons followed for 10 years

    ---------------------------------------------------------------------------
                                           Exposed       Non-exposed    Ratio
    ---------------------------------------------------------------------------
    Cases                                  1,813 (a)     952 (b)
    Non-cases                              8,187 (c)     9,048 (d)
    ---------------------------------------------------------------------------
    Initial population size                10,000 (N1)   10,000 (N0)
    ---------------------------------------------------------------------------
    Person-years                           90,635 (Y1)   95,163 (Y0)
    ---------------------------------------------------------------------------
    Incidence rate                         0.0200 (I1)   0.0100 (I0)    2.00
    Incidence proportion (average risk)    0.1813 (R1)   0.0952 (R0)    1.90
    Incidence odds                         0.2214 (O1)   0.1052 (O0)    2.11
    ---------------------------------------------------------------------------
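The three occurrence measures can be recomputed from the counts and person-years of the hypothetical cohort in table 2.2. This is a minimal sketch; the class name `Group` and its method names are illustrative choices, not notation from the text.

```python
from dataclasses import dataclass


@dataclass
class Group:
    """Counts for one exposure group of the hypothetical cohort (table 2.2)."""
    cases: int
    non_cases: int
    person_years: float

    @property
    def persons(self):
        # Initial population size: everyone is either a case or a non-case.
        return self.cases + self.non_cases

    def incidence_rate(self):
        # Cases per person-year at risk.
        return self.cases / self.person_years

    def incidence_proportion(self):
        # Cases per person at risk at the start of follow-up.
        return self.cases / self.persons

    def incidence_odds(self):
        # Cases per person who did not develop the disease.
        return self.cases / self.non_cases


exposed = Group(cases=1813, non_cases=8187, person_years=90635)
non_exposed = Group(cases=952, non_cases=9048, person_years=95163)

for label, group in [("Exposed", exposed), ("Non-exposed", non_exposed)]:
    print(f"{label}: rate = {group.incidence_rate():.4f} per person-year, "
          f"proportion = {group.incidence_proportion():.4f}, "
          f"odds = {group.incidence_odds():.4f}")
```

All three measures share the same numerator within each group; only the denominator changes, which is why they diverge as the disease becomes more common over the follow-up period.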


    Measures of Effect in Incidence Studies

    Corresponding to these three measures of disease occurrence, there are three principal ratio measures of effect which can be used in incidence studies. The measure of interest is often the rate ratio (incidence density ratio), the ratio of the incidence rate in the exposed group (a/Y1) to that in the non-exposed group (b/Y0). In the example in table 2.2, the incidence rates are 0.02 per person-year in the exposed group and 0.01 per person-year in the non-exposed group, and the rate ratio is therefore 2.00.

    A second commonly used effect measure is the risk ratio (incidence proportion ratio or cumulative incidence ratio) which is the ratio of the incidence proportion in the exposed group (a/N1) to that in the non-exposed group (b/N0). In this example, the risk ratio is 0.1813/0.0952 = 1.90. When the outcome is rare over the follow-up period the risk ratio is approximately equal to the rate ratio.

    A third possible effect measure is the incidence odds ratio which is the ratio of the incidence odds in the exposed group (a/c) to that in the non-exposed group (b/d). In this example the odds ratio is 0.2214/0.1052 = 2.11. When the outcome is rare over the study period the incidence odds ratio is approximately equal to the incidence rate ratio.

    These three multiplicative effect measures are sometimes referred to under the generic term of relative risk. Each involves the ratio of a measure of disease occurrence in the exposed group to that in the non-exposed group. The various measures of disease occurrence all involve the same numerators (incident cases), but differ in whether their denominators are based on person-years, persons, or survivors (people who do not develop the disease at any time during the follow-up period). They are all approximately equal when the disease is rare during the follow-up period (e.g. an incidence proportion of less than 10%). However, the odds ratio has been severely criticised as an effect measure (Greenland, 1987; Miettinen and Cook, 1981), and has little intrinsic meaning in incidence studies, but it is presented here because it is the standard effect measure in incidence case-control studies (see below).
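The three ratio measures can be computed directly from the cell counts of the hypothetical cohort in table 2.2. In this sketch the variable names simply follow the table's notation (a, b, c, d, N1, N0, Y1, Y0).

```python
# Counts and person-years from the hypothetical cohort in table 2.2.
a, b = 1813, 952          # cases: exposed, non-exposed
c, d = 8187, 9048         # non-cases: exposed, non-exposed
n1, n0 = 10000, 10000     # initial population sizes
y1, y0 = 90635, 95163     # person-years at risk

rate_ratio = (a / y1) / (b / y0)   # ratio of incidence rates
risk_ratio = (a / n1) / (b / n0)   # ratio of incidence proportions
odds_ratio = (a / c) / (b / d)     # ratio of incidence odds

print(f"Rate ratio: {rate_ratio:.2f}")
print(f"Risk ratio: {risk_ratio:.2f}")
print(f"Odds ratio: {odds_ratio:.2f}")
```

Computed from the rounded counts, the odds ratio comes out at roughly 2.10 rather than the 2.11 quoted in the text; the small difference presumably reflects rounding of the tabulated counts. The ordering of the three measures (risk ratio closest to 1, odds ratio furthest from 1) is what the text leads one to expect when the disease is not especially rare.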

    Finally, it should be noted that an analogous approach can be used to calculate measures of effect based on differences rather than ratios, in particular the rate difference and the risk difference. Ratio measures are usually of greater interest in etiologic research, because they have more convenient statistical properties, and it is easier to assess the strength of effect and the possible role of various sources of bias when using ratio measures (Cornfield et al, 1951). Thus, I will concentrate on the use of ratio measures in the remainder of this text. However, other measures (e.g. risk difference, attributable fraction) may be of value in certain circumstances, such as evaluating the public health impact of a particular exposure, and I encourage readers to consult standard texts for a comprehensive review of these measures (e.g. Rothman and Greenland, 1998).
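As a brief sketch of the difference measures, the rate difference and risk difference can be computed from the same hypothetical cohort (table 2.2). The attributable fraction is not defined in the text; the version shown here, the attributable fraction among the exposed calculated as (R1 - R0)/R1, is one common definition and is an assumption of this sketch.

```python
# Counts and person-years from the hypothetical cohort in table 2.2.
cases_exposed, cases_non_exposed = 1813, 952
persons_exposed, persons_non_exposed = 10000, 10000
person_years_exposed, person_years_non_exposed = 90635, 95163

rate_exposed = cases_exposed / person_years_exposed
rate_non_exposed = cases_non_exposed / person_years_non_exposed
risk_exposed = cases_exposed / persons_exposed
risk_non_exposed = cases_non_exposed / persons_non_exposed

# Difference measures: absolute excess occurrence in the exposed group.
rate_difference = rate_exposed - rate_non_exposed      # per person-year
risk_difference = risk_exposed - risk_non_exposed      # over ten years

# Attributable fraction among the exposed (assumed definition, see lead-in).
attributable_fraction_exposed = risk_difference / risk_exposed

print(f"Rate difference: {rate_difference:.4f} per person-year")
print(f"Risk difference: {risk_difference:.4f} over ten years")
print(f"Attributable fraction (exposed): {attributable_fraction_exposed:.3f}")
```

Unlike the ratio measures, the rate difference carries units (per person-year), and the risk difference must be reported together with the follow-up period over which it applies.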


    2.2. Incidence Case-Control Studies

    Incidence studies are the most comprehensive approach to studying the causes of disease, since they use all of the information about the source population over the risk period. However, they are very expensive in terms of time and resources. For example, the hypothetical study presented in table 2.2 would involve enrolling 20,000 people and collecting exposure information (on both past and present exposure) for all of them. The same findings can be obtained more efficiently by using a case-control design.

    An incidence case-control study involves studying all (or a sample) of the incident cases of the disease that occurred in the source population over the risk period, and a control group sampled from the same population over the same period (the possible methods of sampling controls are described below).

    Table 2.3 shows the data from a hypothetical case-control study, which involved studying all of the 2,765 incident cases which would have been identified in the full incidence study, and a sample of 2,765 controls (one for each case). Such a case-control study would achieve the same findings as the full incidence study, but would be much more efficient, since it would involve ascertaining the exposure histories of 5,530 people (2,765 cases and 2,765 controls) rather than 20,000. When the outcome under study is very rare, an even more remarkable gain in efficiency can be achieved with very little reduction in the precision of the effect estimate.

    Table 2.3

    Findings from a hypothetical incidence case-control study based on the cohort in table 2.2

    ------------------------------------------------------------------------------------
                                                 Exposed      Non-exposed    Odds Ratio
    ------------------------------------------------------------------------------------
    Cases                                        1,813 (a)    952 (b)

    Controls:
      from survivors (cumulative sampling)       1,313 (c)    1,452 (d)      2.11
      from source population
        (case-cohort sampling)                   1,383 (c)    1,383 (d)      1.90
      from person-years (density sampling)       1,349 (c)    1,416 (d)      2.00
    ------------------------------------------------------------------------------------


    Measures of Effect in Incidence Case-Control Studies

    In case-control studies, the relative risk is estimated using the odds ratio.

    Suppose that a case-control study is conducted in the study population shown in table 2.2; such a study might involve all of the 2,765 incident cases and a group of 2,765 controls (table 2.3). The effect measure which the odds ratio obtained from this case-control study will estimate depends on the manner in which controls are selected. Once again, there are three main options (Miettinen, 1985; Pearce, 1993; Rothman and Greenland, 1998).

    One option, called cumulative (or cumulative incidence) sampling, is to select controls from those who do not experience the outcome during the follow-up period, i.e. the survivors (those who did not develop the disease at any time during the follow-up period). In this instance, the ratio of exposed to non-exposed controls will estimate the exposure odds (c/d = 8187/9048 = 1313/1452) of the survivors, and the odds ratio obtained in the case-control study will therefore estimate the incidence odds ratio in the source population over the study period (2.11). Early presentations of the case-control approach usually assumed this context (Cornfield, 1951), and it was emphasised that the odds ratio was approximately equal to the risk ratio when the disease was rare.

    It was later recognised that controls can be sampled from the entire source population (those at risk at the beginning of follow-up), rather than just from the survivors (those at risk at the end of follow-up). This approach, which was previously used by Thomas (1972) and Kupper et al (1975), has more recently been termed case-cohort sampling (Prentice, 1986), or case-base sampling (Miettinen, 1982). In this instance, the ratio of exposed to non-exposed controls will estimate the exposure odds in the source population of persons at risk at the start of follow-up (N1/N0 = 10000/10000 = 1383/1383), and the odds ratio obtained in the case-control study will therefore estimate the risk ratio in the source population over the study period (1.90). In this instance the method of calculation of the odds ratio is the same as for any other case-control study, but minor changes are needed in the standard methods for calculating confidence intervals and p-values to take into account that some cases may also be selected as controls (Greenland, 1986).

    The third approach is to select controls longitudinally throughout the course of the study (Sheehe, 1962; Miettinen, 1976); this is sometimes described as risk-set sampling (Robins et al, 1986), sampling from the study base (the person-time experience) (Miettinen, 1985), or density sampling (Kleinbaum et al, 1982). In this instance, the ratio of exposed to non-exposed controls will estimate the exposure odds in the person-time (Y1/Y0 = 90635/95163 = 1349/1416), and the odds ratio obtained in the case-control study will therefore estimate the rate ratio in the study population over the study period (2.00).

    Case-control studies have traditionally been presented in terms of cumulative sampling (e.g. Cornfield, 1951), but most case-control studies actually involve density sampling (Miettinen, 1976), often with matching on a time variable such as calendar time or age, and therefore estimate the rate ratio without the need for any rare disease assumption (Sheehe, 1962; Miettinen, 1976; Greenland and Thomas, 1982).
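The dependence of the odds ratio on the control sampling scheme can be checked with the hypothetical counts in table 2.3. This is a minimal sketch; the function name `odds_ratio` and the dictionary layout are illustrative choices.

```python
def odds_ratio(cases_exposed, cases_non_exposed,
               controls_exposed, controls_non_exposed):
    """Exposure odds in cases divided by exposure odds in controls."""
    return (cases_exposed / cases_non_exposed) / (
        controls_exposed / controls_non_exposed)


# Cases are the same under every sampling scheme (table 2.3).
CASES_EXPOSED, CASES_NON_EXPOSED = 1813, 952

# Exposed and non-exposed controls under each sampling scheme (table 2.3),
# with the cohort measure that each odds ratio is intended to estimate.
control_samples = {
    "cumulative sampling (odds ratio)": (1313, 1452),
    "case-cohort sampling (risk ratio)": (1383, 1383),
    "density sampling (rate ratio)": (1349, 1416),
}

estimates = {
    scheme: odds_ratio(CASES_EXPOSED, CASES_NON_EXPOSED, ctrl_exp, ctrl_non)
    for scheme, (ctrl_exp, ctrl_non) in control_samples.items()
}

for scheme, estimate in estimates.items():
    print(f"{scheme}: {estimate:.2f}")
```

The calculation is identical in all three cases; only the population from which the controls are drawn changes, and with it the cohort effect measure that the odds ratio estimates.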


    Example 2.2

    Gustavsson et al (2001) studied the risk of myocardial infarction from occupational exposure to motor exhaust, other combustion products, organic solvents, lead, and dynamite. They identified first-time, nonfatal myocardial infarctions among men and women aged 45-70 years in Stockholm County from 1992-1994. They selected controls from the general population living in the same County during the same period (i.e. density sampling), matched for sex, age, year, and hospital catchment area. The odds ratio (estimating the rate ratio) of myocardial infarction was 2.11 (95% CI 1.23-3.60) among those highly exposed occupationally, and 1.42 (95% CI 1.05-1.92) in those moderately exposed, compared with persons not occupationally exposed to combustion products from organic material.

    Summary

    When a dichotomous outcome is under study (e.g. being alive or dead, or having or not having a disease) a fundamental distinction is between studies of incidence and studies of prevalence. Thus, four main types of studies can be identified: incidence studies, incidence case-control studies, prevalence studies, and prevalence case-control studies (Morgenstern and Thomas, 1993; Pearce, 1998). These various study types differ according to whether they involve incidence or prevalence data and whether or not they involve sampling on the basis of the outcome under study. Incidence studies involve collecting and analysing data on the exposure and disease experience of the entire source population. They may resemble randomized trials, but they may involve additional problems of confounding because exposure has not been randomly assigned. The other potential study designs all involve sampling from the source population, and therefore may include additional biases arising from the sampling process (chapter 6). In particular, incidence case-control studies involve sampling on the basis of outcome, i.e. they usually involve all incident cases generated by the source population and a control group (of non-cases) sampled at random from the source population.


    References

    Breslow NE, Day NE (1987). Statistical methods in cancer research. Vol II: The analysis of cohort studies. Lyon, France: IARC.

    Checkoway H, Pearce N, Kriebel D (2004). Research methods in occupational epidemiology. 2nd ed. New York: Oxford University Press.

    Cornfield J (1951). A method of estimating comparative rates from clinical data: applications to cancer of the lung, breast and cervix. JNCI 11: 1269-75.

    Greenland S (1986). Adjustment of risk ratios in case-base studies (hybrid epidemiologic designs). Stat Med 5: 579-84.

    Greenland S (1987). Interpretation and choice of effect measures in epidemiologic analyses. Am J Epidemiol 125: 761-8.

    Greenland S, Thomas DC (1982). On the need for the rare disease assumption in case-control studies. Am J Epidemiol 116: 547-53.

    Gustavsson P, Plato N, Hallqvist J, et al (2001). A population-based case-referent study of myocardial infarction and occupational exposure to motor exhaust, other combustion products, organic solvents, lead and dynamite. Epidemiol 12: 222-8.

    Kleinbaum DG, Kupper LL, Morgenstern H (1982). Epidemiologic research. Principles and quantitative methods. Belmont, CA: Lifetime Learning Publications.

    Kupper LL, McMichael AJ, Spirtas R (1975). A hybrid epidemiologic design useful in estimating relative risk. J Am Stat Assoc 70: 524-8.

    Martinez FD, Wright AJ, Taussig LM, et al (1995). Asthma and wheezing in the first six years of life. New Engl J Med 332: 133-8.

    Miettinen OS (1976). Estimability and estimation in case-referent studies. Am J Epidemiol 103: 226-35.

    Miettinen OS, Cook EF (1981). Confounding: essence and detection. Am J Epidemiol 114: 593-603.

    Miettinen O (1982). Design options in epidemiologic research: an update. Scand J Work Environ Health 8 (suppl 1): 7-14.

    Miettinen OS (1985). Theoretical epidemiology. New York: Wiley.

    Morgenstern H, Thomas D (1993). Principles of study design in environmental epidemiology. Environ Health Perspectives 101: S23-S38.

    Pearce N (1993). What does the odds ratio estimate in a case-control study? Int J Epidemiol 22: 1189-92.

    Pearce N (1998). The four basic epidemiologic study types. J Epidemiol Biostat 3: 171-7.

    Prentice RL (1986). A case-cohort design for epidemiologic cohort studies and disease prevention trials. Biometrika 73: 1-11.

    Robins JM, Breslow NE, Greenland S (1986). Estimation of the Mantel-Haenszel variance consistent with both sparse-data and large-strata limiting models. Biometrics 42: 311-23.

    Rothman KJ, Greenland S (1998). Modern epidemiology. 2nd ed. Philadelphia: Lippincott-Raven.

    Sheehe PR (1962). Dynamic risk analysis of matched pair studies of disease. Biometrics 18: 323-41.

    Thomas DB (1972). The relationship of oral contraceptives to cervical carcinogenesis. Obstet Gynecol 40: 508-18.

    CHAPTER 3. Prevalence Studies

    (In: Pearce N. A Short Introduction to Epidemiology. 2nd ed. Wellington, CPHR, 2005)

    Incidence studies are ideal for studying events such as mortality or cancer incidence, since they involve collecting and analysing all of the relevant information on the source population and we can get better information on when exposure and disease occurred. However, incidence studies involve lengthy periods of follow-up and large resources, in terms of both time and funding, and it may be difficult to identify incident cases of non-fatal chronic conditions such as diabetes. Thus, in some settings (e.g. some developing countries) and/or for some conditions (e.g. chronic non-fatal disease) prevalence studies are the only option. Furthermore, in some instances we may be more interested in factors which affect the current burden of disease in the population. Consequently, although incidence studies are usually preferable, there is also an important role for prevalence studies, both for practical reasons, and because such studies enable the assessment of the level of morbidity and the population disease burden for a non-fatal condition.

    3.1. Prevalence Studies

    The term prevalence denotes the number of cases of the disease under study existing in the source population at a particular time. This can be defined as point prevalence, estimated at one point in time, or period prevalence, which denotes the number of cases that existed during some time interval (e.g. one year).

    The prevalence is a proportion, and the statistical methods for calculating a confidence interval for the prevalence are identical to those for calculating a confidence interval for the incidence proportion (see chapter 12).
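    As a minimal sketch of this calculation, the following Python function computes a prevalence and an approximate 95% confidence interval using the simple normal (Wald) approximation for a proportion. The function name and the choice of the Wald interval are assumptions made for illustration; the methods recommended in this text are those of chapter 12. The example figures (476 cases among 10,000 people) are those of the non-exposed group in table 3.1.

```python
import math

def prevalence_ci(cases, population, z=1.96):
    """Return (prevalence, lower, upper) using a Wald interval (illustrative)."""
    if population <= 0 or not 0 <= cases <= population:
        raise ValueError("need population > 0 and 0 <= cases <= population")
    p = cases / population
    se = math.sqrt(p * (1 - p) / population)  # standard error of a proportion
    lower = max(0.0, p - z * se)              # keep the interval inside [0, 1]
    upper = min(1.0, p + z * se)
    return p, lower, upper

# Non-exposed group of table 3.1: 476 prevalent cases among 10,000 people.
p, lower, upper = prevalence_ci(476, 10_000)
print(f"prevalence = {p:.4f}, 95% CI {lower:.4f} to {upper:.4f}")
```

    The Wald interval is used here only because it is short to write down; it can behave poorly when the prevalence is very close to 0 or 1 or the sample is small.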

    In some instances, the aim of a prevalence study may simply be to compare the disease prevalence among a specific population with that in other communities or countries. This may be done, for example, in order to discover differences in disease prevalence and to thus suggest possible risk factors for the disease. Further studies may then involve testing specific hypotheses by comparing prevalence in subgroups of people who have or have not been exposed to a particular risk factor (e.g. passive smoking) in the past.

    Prevalence studies often represent a considerable saving in resources compared with incidence studies, since it is only necessary to evaluate disease prevalence at one point in time, rather than continually searching for incident cases over an extended period of time. On the other hand, this gain in efficiency is achieved at the cost of greater risk of biased inferences, since it may be much more difficult to understand the temporal relationship between various exposures and the occurrence of disease. For example, an exposure that increases the risk of death in people with pre-existing chronic heart disease will be negatively associated with the prevalence of heart disease (in people who are alive!), and will therefore appear to be protective against heart disease in a prevalence study.

    Example 3.1

    The International Study of Asthma and Allergies in Childhood (ISAAC) (Asher et al, 1995; Pearce et al, 1993) involved a simple Phase I global asthma symptom prevalence survey and a more in-depth Phase II survey. The emphasis was on obtaining the maximum possible participation across the world in order to obtain a global overview of childhood asthma prevalence, and the Phase I questionnaire modules were designed to be simple and to require minimal resources to administer. In addition, a video questionnaire involving the audio-visual presentation of clinical signs and symptoms of asthma was developed in order to minimise translation problems. The population of interest was schoolchildren aged 6-7 years and 13-14 years within specified geographical areas. The older age-group was chosen to reflect the period when morbidity from asthma is common and to enable the use of self-completed questionnaires. The younger age-group was chosen to give a reflection of the early childhood years, and involved parent-completion of questionnaires. The Phase I findings, involving more than 700,000 children, showed striking international differences in asthma symptom prevalence (ISAAC Steering Committee, 1998a, 1998b). Figure 3.1 shows the findings for current wheeze (i.e. wheeze in the previous 12 months). There are a number of interesting features of the figure: (i) there is a particularly high prevalence of reported asthma symptoms in English-speaking countries; (ii) centres in Latin America also had particularly high symptom prevalence; (iii) there is also high asthma prevalence in Western Europe, with lower prevalences in Eastern and Southern Europe - for example, there is a clear Northwest-Southeast gradient within Europe, with the highest prevalence in the world being in the United Kingdom, and some of the lowest prevalences in Albania and Greece; (iv) Africa and Asia generally showed relatively low asthma prevalence. These striking findings call into question many of the established theories of asthma causation, and have played a major role in the development of new theories of asthma causation in recent years (Douwes and Pearce, 2003).

    Figure 3.1

    Twelve month period prevalence of asthma symptoms in 13-14 year old children in Phase I of the International Study of Asthma and Allergies in Childhood (ISAAC)

    Source: ISAAC Steering Committee (1998b)

    Measures of Effect in Prevalence Studies

    Figure 3.2 shows the relationship between incidence and prevalence of disease in a steady state population. Assume that the population is in a steady state (stationary) over time, in that the numbers within each subpopulation defined by exposure, disease and covariates do not change with time (this usually requires that incidence rates and exposure and disease status are unrelated to the immigration and emigration rates and population size), and that average disease duration (D) does not change over time. Then, if we denote the prevalence of disease in the study population by P, the prevalence odds is equal to the incidence rate (I) times the average disease duration (Alho, 1992):

    $$\frac{P}{1-P} = ID$$

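    Figure 3.2

    Relationship between prevalence and incidence in a steady state population

    [Flow diagram: non-asthmatics, N(1-P), become asthma cases, NP, at a rate of N(1-P) x I; asthma cases leave the prevalence pool at a rate of NP/D. In a steady state, P/(1-P) = I x D, where P = prevalence, I = incidence, D = duration, N = population.]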
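    The steady-state relationship in figure 3.2 can be written out as a short worked derivation. This is a sketch under the assumptions already stated (a stationary population with a constant incidence rate and a constant average duration), using the symbols of the figure; the final rearrangement for P is added here for convenience.

```latex
\begin{align*}
\text{new cases per unit time} &= N(1-P)\,I \\
\text{cases resolving per unit time} &= \frac{NP}{D} \\
N(1-P)\,I = \frac{NP}{D}
  \;&\Longrightarrow\; \frac{P}{1-P} = ID
  \;\Longrightarrow\; P = \frac{ID}{1+ID}
\end{align*}
```

    For instance, an incidence rate of 0.01 per year with an average duration of 5 years gives a prevalence odds of 0.05 and a prevalence of 0.05/1.05, or about 0.0476.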

    Now suppose that we compare two populations (indexed by 1=exposed and 0=non-exposed) and that both satisfy the above conditions. Then, the prevalence odds is directly proportional to the disease incidence, and the prevalence odds ratio (POR) satisfies the equation:

    $$\mathrm{POR} = \frac{P_1/(1-P_1)}{P_0/(1-P_0)} = \frac{I_1 D_1}{I_0 D_0}$$

    An increased prevalence odds ratio may thus reflect the influence of factors that increase the duration of disease, as well as those that increase disease incidence. However, in the special case where the average duration of disease is the same in the exposed and non-exposed groups (i.e. D1 = D0), the prevalence odds ratio satisfies the equation:

    $$\mathrm{POR} = \frac{I_1}{I_0}$$

    i.e. under the above assumptions, the prevalence odds ratio directly estimates the incidence rate ratio (Pearce, 2004). However, it should be emphasised that prevalence depends on both incidence and average disease duration, and a difference in prevalence between two groups could entirely depend on differences in disease duration (e.g. because of factors which prolong or exacerbate symptoms) rather than differences in incidence. Changes in incidence rates, disease duration and population sizes over time can also bias the POR away from the rate ratio, as can migration into and out of the population at risk or the prevalence pool.
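    The dependence of the prevalence odds ratio on disease duration can be illustrated with a small calculation. The sketch below relies only on the steady-state relationship (prevalence odds = incidence rate x average duration); the function name and the numerical values are hypothetical and are chosen purely for illustration.

```python
def prevalence_odds_ratio(i1, d1, i0, d0):
    """POR under steady-state assumptions: (I1 x D1) / (I0 x D0)."""
    return (i1 * d1) / (i0 * d0)

# Same incidence rate in both groups, but the exposure doubles the average
# duration: the POR is about 2 although the incidence rate ratio is 1.
por_duration_only = prevalence_odds_ratio(i1=0.01, d1=10, i0=0.01, d0=5)

# Incidence rate doubled and equal durations: the POR equals the rate ratio.
por_equal_duration = prevalence_odds_ratio(i1=0.02, d1=5, i0=0.01, d0=5)

print(round(por_duration_only, 2), round(por_equal_duration, 2))
```

    Both scenarios yield the same prevalence odds ratio, so prevalence data alone cannot distinguish an effect on incidence from an effect on duration.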

    Table 3.1

    Findings from a hypothetical prevalence study of 20,000 persons

                          Exposed        Non-exposed    Ratio
    ----------------------------------------------------------
    Cases                 909 (a)        476 (b)
    Non-cases             9,091 (c)      9,524 (d)
    ----------------------------------------------------------
    Total population      10,000 (N1)    10,000 (N0)
    ----------------------------------------------------------
    Prevalence            0.0909 (P1)    0.0476 (P0)    1.91
    Prevalence odds       0.1000 (O1)    0.0500 (O0)    2.00

    Table 3.1 shows data from a prevalence study of 20,000 people. This is based on the incidence study represented in table 2.2 (chapter 2), with the assumptions that, for both populations, the incidence rate and population size are constant over time, that the average duration of disease is five years, and that there is no migration of people with the disease into or out of the population (such assumptions may not be realistic, but are made here for purposes of illustration). In this situation, the number of cases who "lose" the disease each year is balanced by the number of new cases generated from the source population. For example, in the non-exposed group, there are 476 prevalent cases, and 95 (20%) of these "lose" their disease each year; this is balanced by the 95 people who develop the disease each year (0.0100 of the susceptible population of 9,524 people). With the additional assumption that the average duration of disease is the same in the exposed and non-exposed groups, the prevalence odds ratio (2.00) validly estimates the incidence rate ratio (see table 2.2).
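    The balance between new and resolving cases described above can be checked numerically. The following sketch iterates a simple yearly inflow/outflow model until the number of prevalent cases stabilises. The model, the function name and the starting value of zero cases are illustrative assumptions. The incidence rate of 0.0100 per year in the non-exposed group and the five-year average duration come from the text, and the rate of 0.0200 per year for the exposed group is the value implied by the rate ratio of 2.00.

```python
def steady_state_cases(population, incidence_rate, duration, years=500):
    """Iterate yearly flows until the number of prevalent cases settles."""
    cases = 0.0
    for _ in range(years):
        new_cases = incidence_rate * (population - cases)  # arise among non-cases
        lost_cases = cases / duration                      # cases that "lose" the disease
        cases += new_cases - lost_cases
    return cases

exposed = steady_state_cases(10_000, 0.02, 5)
non_exposed = steady_state_cases(10_000, 0.01, 5)

odds_exposed = exposed / (10_000 - exposed)
odds_non_exposed = non_exposed / (10_000 - non_exposed)
prevalence_odds_ratio = odds_exposed / odds_non_exposed

print(round(exposed), round(non_exposed))
print(round(prevalence_odds_ratio, 2))
```

    Starting from zero cases is arbitrary: any starting value converges to the same equilibrium, because each year's net change shrinks as inflow and outflow approach balance.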

    3.2. Prevalence Case-Control Studies

    Just as an incidence case-control study can be used to obtain the same findings as a full incidence study, a prevalence case-control study can be used to obtain the same findings as a full prevalence study in a more efficient manner.

    For example, in a prevalence study, obtaining exposure information may be difficult or costly, e.g. if it involves lengthy interviews, or expensive testing of biological samples. In this situation, a considerable gain in efficiency can be achieved by only obtaining exposure information on the prevalent cases and a sample of controls selected at random from the non-cases, rather than collecting exposure information for everyone in the prevalence study.

    Measures of Effect in Prevalence Case-Control Studies

    Suppose that a nested case-control study is conducted in the study population (table 3.1), involving all of the 1,385 prevalent cases and a group of 1,385 controls (table 3.2). The usual approach is to select controls from the non-cases. The ratio of exposed to non-exposed controls will then estimate the exposure odds (c/d in table 3.1) of the non-cases, and the odds ratio obtained in the prevalence case-control study will therefore estimate the prevalence odds ratio in the source population (2.00), which in turn estimates the incidence rate ratio provided that the assumptions described above are satisfied in the exposed and non-exposed populations.

    Table 3.2

    Findings from a hypothetical prevalence case-control study based on the populationrepresented in table 3.1

                          Exposed        Non-exposed    Ratio
    ----------------------------------------------------------
    Cases                 909 (a)        476 (b)
    Controls              676 (c)        709 (d)
    ----------------------------------------------------------
    Prevalence odds       1.34 (O1)      0.67 (O0)      2.00
    ----------------------------------------------------------
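    As a hedged illustration, the odds ratio in table 3.2 can be recomputed from the four cell counts, together with the number of controls expected in each exposure group when 1,385 controls are sampled at random from the non-cases in table 3.1. The variable names are illustrative, and no confidence interval is attempted here.

```python
# Cell counts from table 3.2 (a, b = cases; c, d = controls).
a, b = 909, 476
c, d = 676, 709

# Cross-product odds ratio for the case-control data.
odds_ratio = (a * d) / (b * c)

# Expected split of 1,385 controls drawn at random from the
# 9,091 exposed and 9,524 non-exposed non-cases of table 3.1.
n_controls = a + b
non_cases_exposed, non_cases_unexposed = 9_091, 9_524
total_non_cases = non_cases_exposed + non_cases_unexposed
expected_exposed_controls = n_controls * non_cases_exposed / total_non_cases
expected_unexposed_controls = n_controls * non_cases_unexposed / total_non_cases

print(round(odds_ratio, 2))
print(round(expected_exposed_controls), round(expected_unexposed_controls))
```

    In a real study the control counts would differ from these expected values through sampling variation, so the estimated odds ratio would only approximate the prevalence odds ratio of the source population.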

    Example 3.2

    Studies of congenital malformations usually involve estimating the prevalence of malformations at birth (i.e. this is a prevalence rather than an incidence measure). Garcia et al (1999) conducted a (prevalence) case-control study of occupational exposure to pesticides and congenital malformations in Comunidad Valenciana, Spain. A total of 261 cases and 261 controls were selected from those infants born in eight public hospitals during 1993-1994. For mothers who were involved in agricultural activities in the month before conception and the first trimester of pregnancy, the adjusted prevalence odds ratio for congenital malformations was 3.2 (95% CI 1.1-9.0). There was no such association with paternal agricultural work.

    Summary

    When a dichotomous outcome is under study (e.g. being alive or dead, or having or not having a disease) four main types of studies can be identified: incidence studies, incidence case-control studies, prevalence studies, and prevalence case-control studies (Morgenstern and Thomas, 1993; Pearce, 1998). Prevalence studies involve measuring the prevalence of the disease in the source population at a particular time, rather than the incidence of the disease over time. Prevalence case-control studies involve sampling on the basis of outcome, i.e. they usually involve all prevalent cases in the source population and a control group (of non-cases) sampled from the source population.

    References

    Alho JM (1992). On prevalence, incidence, and duration in general stable populations. Biometrics 48: 587-92.

    Asher I, Keil U, Anderson HR, et al (1995). International study of asthma and allergies in childhood (ISAAC): rationale and methods. Eur Resp J 8: 483-91.

    Douwes J, Pearce N (2003). Asthma and the Westernization package. Int J Epidemiol 31: 1098-1102.

    Garcia AM, Fletcher T, Benavides FG, Orts E (1999). Parental agricultural work and selected congenital malformations. Am J Epidemiol 149: 64-74.

    ISAAC Steering Committee (1998a). Worldwide variation in prevalence of symptoms of asthma, allergic rhinoconjunctivitis and atopic eczema: ISAAC. Lancet 351: 1225-32.

    ISAAC Steering Committee (1998b). Worldwide variations in the prevalence of asthma symptoms: International Study of Asthma and Allergies in Childhood (ISAAC). Eur Respir J 12: 315-35.

    Morgenstern H, Thomas D (1993). Principles of study design in environmental epidemiology. Environ Health Perspectives 101: S23-S38.

    Pearce N (1998). The four basic epidemiologic study types. J Epidemiol Biostat 3: 171-7.

    Pearce N (2004). Effect measures in prevalence studies. Environmental Health Perspectives 112: 1047-50.

    Pearce NE, Weiland S, Keil U, et al (1993). Self-reported prevalence of asthma symptoms in children in Australia, England, Germany and New Zealand: an international comparison using the ISAAC protocol. Eur Resp J 6: 1455-61.

    CHAPTER 4. More Complex Study Designs

    (In: Pearce N. A Short Introduction to Epidemiology. 2nd ed. Wellington, CPHR, 2005)

    In the previous two chapters I reviewed the possible study designs for the simple situation where individuals are exposed to a particular risk factor (e.g. a particular chemical) and when a dichotomous outcome is under study (e.g. being alive or dead, or having or not having a particular disease). I now consider studies involving other axes of classification, continuous measurements of health status (e.g. continuous lung function or blood pressure measurements) and more complex study designs (ecologic and multilevel studies).

    4.1: Other Axes of Classification

    The four basic study types discussed in chapters 2 and 3 are defined in terms of: (a) the type of outcome under study (incidence or prevalence); and (b) whether there is sampling on the basis of outcome. They do not involve any consideration of the nature of the exposure data, which provides additional axes of classification.

    Continuous Exposure Data

    Firstly, it should be noted that in discussing the above classification we have assumed that exposure is dichotomous (i.e. study participants are exposed or not exposed). In reality, there may be multiple exposure categories (e.g. high, medium and low exposure), or exposure may be measured as a continuous variable (see chapter 8). However, although this requires minor changes to the data analysis (see chapter 12), it does not alter the four-fold categorisation of study design options presented above.

    The Timing of Collection of Exposure Information

    Perhaps the feature that has received the most attention in various classification schemes is the timing of the collection of exposure information. This has dominated discussions of directionality, particularly with regard to case-control studies. In fact, for all of the four basic study types, exposure information can be collected prospectively or retrospectively. For example, an incidence study or incidence case-control study of occupational cancer may collect exposure information prospectively, or use historical information that was collected prospectively but abstracted retrospectively by the investigator (e.g. occupational hygiene monitoring records), or use exposure information that was collected retrospectively (e.g. recall of duration and intensity of pesticide use). An unfortunate aspect of some discussions of the merits of case-control studies is that they have often been labelled as retrospective studies, when this is in fact not an inherent part of their design. The potential problem of bias due to exposure ascertainment errors (e.g. recall bias) arises from the retrospective collection of exposure information, irrespective of whether the study is an incidence, incidence case-control, prevalence, or prevalence case-control study.

    Sources of Exposure Information

    Another set of issues that occur in practice involves the sources of exposure information (e.g. routine records, job-exposure-matrices, questionnaires, biological samples). However, as noted above, these issues are important in understanding sources of bias but are not fundamental to the classification of study types since, as with issues of directionality, they do not affect the parameterization of the exposure-outcome association.

    The Level of Measurement of Exposure

    A third additional axis of classification involves the level of measurement of exposure. In particular, in ecologic studies exposure information may be collected on a group rather than on individuals (e.g. average level of meat consumption), although other information may still be available for individuals (e.g. age, gender). This situation is discussed in section 4.3.

    4.2: Continuous Outcome Measures

    Cross-Sectional Studies

    In chapters 2 and 3, the health outcome under study was a state (e.g. having or not having hypertension). Studies could involve observing the incidence of the event of acquiring the disease state (e.g. the incidence of being diagnosed with hypertension), or the prevalence of the disease state (e.g. the prevalence of hypertension). More generally, the health state under study may have multiple categories (e.g. non-hypertensive, mild hypertension, moderate hypertension, severe hypertension) or may be represented by a continuous measurement (e.g. blood pressure). Since these measurements are taken at a particular point in time, such studies are often referred to as cross-sectional studies. Prevalence studies (see chapter 3) are a subgroup of cross-sectional studies in which the disease outcome is dichotomous.

    Although cross-sectional studies are sometimes described as studies in which exposure and disease information is collected at the same point in time (e.g. Kramer and Boivin, 1988; Last, 1988), this is not in fact an inherent feature of such studies. In most cross-sectional studies (including prevalence studies), information on exposure will be physically collected by the investigator at the same time that information on disease is collected. Nonetheless, exposure information may include factors that do not change over time (e.g. gender) or change in a predictable manner (e.g. age) as well as factors that do change over time. The latter may have been measured at the time of data collection (e.g. current levels of airborne dust exposure), or at a previous time (e.g. from historical records on past exposure levels) or integrated over time. The key feature of cross-sectional studies is that they involve studying disease at a particular point in time. Exposure information can be collected for current and/or historical exposures, and a wide variety of exposure assessment methods can be used within this general category of study (these are discussed further in chapter 8).

    Just as a prevalence case-control study can be based on a prevalence survey, a cross-sectional study can also involve sampling on the basis of the disease outcome. For example, a cross-sectional study of bronchial hyperresponsiveness (BHR) could involve testing all study participants for BHR and then categorising the test results into severe BHR, mild BHR, and no BHR, and then obtaining exposure information on all severe BHR cases and from random samples of the other two groups.

    Measures of Effect in Cross-Sectional Studies

    In a simple cross-sectional study involving continuous outcome data, the basic methods of statistical analysis involve comparing the mean level of the outcome in exposed and non-exposed groups, e.g. the mean levels of blood pressure in exposed and non-exposed people. Standard statistical methods of analysis for comparing means (perhaps after a suitable transformation to normalise the data), and calculating confidence intervals (and associated p-values) for differences between means, can be used to analyse such studies (see chapter 12). More generally, regression methods can be used to model the relationship between the level of exposure (measured as a continuous variable) and the level of the outcome measure (also measured as a continuous variable) (e.g. Armitage et al, 2002).

    Example 4.1

    Nersesyan et al (2001) studied chromosome aberrations in lymphocytes of persons exposed to an earthquake in Armenia. They collected blood samples from 41 victims of the 1988 earthquake and from 47 reference blood donors. Those exposed to the earthquake had a higher proportion of cells with chromosome aberrations (3.1% (SD 2.1)) than the referents (1.7% (SD 1.3)). The differences persisted when the data were adjusted for age and gender. The authors suggested that the findings could be due either to environmental exposures related to the earthquake or to severe psychogenic stress. They noted that studies in wild rodents living in seismic regions have shown similar findings.
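    The comparison of means described in this section can be sketched using the summary figures reported in example 4.1. This is an unadjusted, large-sample normal approximation computed only from the published means, standard deviations and group sizes; it is an illustrative assumption and not the analysis carried out by Nersesyan et al. The function name is hypothetical.

```python
import math

def mean_difference_ci(mean1, sd1, n1, mean0, sd0, n0, z=1.96):
    """Difference in means with an approximate 95% CI (unequal variances)."""
    diff = mean1 - mean0
    se = math.sqrt(sd1**2 / n1 + sd0**2 / n0)  # standard error of the difference
    return diff, diff - z * se, diff + z * se

# Percentage of cells with chromosome aberrations:
# 41 earthquake victims (3.1, SD 2.1) versus 47 referents (1.7, SD 1.3).
diff, lower, upper = mean_difference_ci(3.1, 2.1, 41, 1.7, 1.3, 47)
print(f"difference = {diff:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

    With groups of this size a t-based interval would be slightly wider, and percentages of this kind are often skewed, so a transformation may be appropriate before means are compared.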

    Longitudinal Studies

    Longitudinal studies (cohort studies) involve repeated observation of study participants over time (Pearce et al, 1998). Incidence studies (chapter 2) are a subgroup of longitudinal studies in which the outcome measure is dichotomous. More generally, longitudinal studies may involve repeated assessment of categorical or continuous outcome measures over time (e.g. a series of linked cross-sectional studies in the same population). They thus can involve incidence data, a series of prevalence surveys, or a series of cross-sectional continuous outcome measures.

    General longitudinal studies

    A simple longitudinal study may involve comparing the disease outcome measure, or more usually changes in the measure over time, between exposed and non-exposed groups. For example, rather than comparing the incidence of hypertension (as in an incidence study), or the prevalence at a particular time (as in a prevalence study), or the mean blood pressure at a particular point in time (as in a cross-sectional study), a longitudinal study might involve measuring baseline blood pressure in exposed and non-exposed persons and then comparing changes in mean blood pressure (i.e. the change from the baseline measure) over time in the two groups. Such a comparison of means can be made using standard statistical methods for comparing means and calculating confidence intervals and associated p-values for the difference between the means (Armitage et al, 2002; Beaglehole et al, 1993). More generally, regression methods (Diggle et al, 1994) might be used to model the relationship between the level of exposure (mea