Copyright 2010 by Pearson Education, Inc., Upper Saddle River, New Jersey 07458

All rights reserved.

Statistics and Data Analysis for Nursing Research, Second Edition, Denise F. Polit

Statistics and Data Analysis for Nursing Research

Second Edition

Chapter 9

Correlation and Simple Regression

Pearson's r

A descriptive index that summarizes the magnitude and nature (direction) of a relationship between two variables in a sample

Can also be used to make inferences about relationships in the population

Basic Hypotheses

Correlational hypotheses are about ρ (rho), the population correlation coefficient

Basic null hypothesis: rho is zero, H0: ρ = .00

The alternative (nondirectional) hypothesis is the opposite, H1: ρ ≠ .00

Sampling Distribution

The mean of a sampling distribution of the correlation coefficient is ρ, the population coefficient

When the null hypothesis is true (when ρ = .00):

The theoretical sampling distribution is centered on .00

The sampling distribution is approximately normal

Assumptions and Requirements

Pearson's r is suitable for (a) interval- and ratio-level variables and (b) detecting linear relationships

Pearson's r can be used inferentially:

If the variables have an underlying distribution that is bivariate normal (scores on X normally distributed for each value of Y)

If values on both variables are homoscedastic (for each value of X, the variability of Y scores is about the same)

Testing Significance

The value of a computed r must be compared to critical values in a table, for which the degrees of freedom must be known and a significance criterion (α) established

For Pearson's r: df = N - 2
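
As an alternative to looking up a critical r, the same test can be run by converting r to a t statistic with df = N - 2. The sketch below is only an illustration, not the textbook's procedure: the function name t_for_r is hypothetical, and the critical t of 2.306 (two-tailed, alpha = .05, df = 8) is hard-coded from a standard t table rather than computed.

```python
import math

def t_for_r(r, n):
    """Return the t statistic and degrees of freedom for testing H0: rho = .00."""
    df = n - 2
    t = r * math.sqrt(df / (1.0 - r ** 2))
    return t, df

# r = .955 with N = 10, the values reported in this chapter's illustration.
t_value, df = t_for_r(0.955, 10)

# Assumption: critical t for df = 8, two-tailed alpha = .05, taken from a t table.
T_CRITICAL = 2.306

# The equivalent critical r for the same df and alpha.
r_critical = T_CRITICAL / math.sqrt(T_CRITICAL ** 2 + df)

print(f"t({df}) = {t_value:.2f}, critical r = {r_critical:.3f}")
print("Reject H0" if abs(t_value) > T_CRITICAL else "Retain H0")
```

Comparing the computed t with the critical t leads to the same decision as comparing the computed r with the critical r.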

r-to-z Transformation

Testing differences between two correlations requires that the two correlation coefficients be transformed using the r-to-z transformation

The normal distribution can then be used, with the appropriate formulas
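
A minimal sketch of this test for two independent samples follows. It assumes the usual Fisher transformation (the inverse hyperbolic tangent of r) and a standard error of sqrt(1/(n1 - 3) + 1/(n2 - 3)); the function name and the example values are hypothetical, not taken from the textbook.

```python
import math
from statistics import NormalDist

def compare_independent_rs(r1, n1, r2, n2):
    """z test for the difference between two independent correlations."""
    z1, z2 = math.atanh(r1), math.atanh(r2)          # r-to-z transformation
    se_diff = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se_diff
    p_two_tailed = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed

# Hypothetical example: r = .50 in one sample of 50, r = .20 in another sample of 50.
z, p = compare_independent_rs(0.50, 50, 0.20, 50)
print(f"z = {z:.2f}, two-tailed p = {p:.3f}")
```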

Magnitude of Effect

Pearson's r provides direct information about the direction and magnitude of effects

Pearson's r can be directly used as the effect size index in meta-analysis

But the magnitude of effect is more often presented as r²

Coefficient of Determination

r² is sometimes called the coefficient of determination

r² indicates the proportion of variability in one variable shared with or explained by variability in the other

r² is analogous to eta² (η²): It represents the ratio of explained variance to total variance:

r² = SS_Explained / SS_Total
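
The ratio can be checked numerically. The sketch below uses the midterm and final exam scores from this chapter's regression illustration and takes "explained" variability to mean the variability of the values predicted by the best-fitting straight line; the variable names are illustrative only.

```python
# Midterm (X) and final exam (Y) scores from this chapter's illustration.
midterm = [2, 6, 5, 9, 7, 9, 3, 4, 1, 4]
final = [3, 7, 6, 8, 9, 10, 4, 6, 2, 5]

n = len(midterm)
mean_x = sum(midterm) / n
mean_y = sum(final) / n

# Sums of squares and cross products of deviations from the means.
ss_x = sum((x - mean_x) ** 2 for x in midterm)
ss_total = sum((y - mean_y) ** 2 for y in final)
sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(midterm, final))

r = sp_xy / (ss_x * ss_total) ** 0.5

# Variability in Y accounted for by the best-fitting straight line:
# each predicted value deviates from the mean of Y by slope * (x - mean_x).
slope = sp_xy / ss_x
ss_explained = sum((slope * (x - mean_x)) ** 2 for x in midterm)

r_squared = ss_explained / ss_total
print(f"r = {r:.3f}, r squared = {r_squared:.3f}")
```

The ratio of sums of squares equals the square of r computed directly from the deviations.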

Precision and Pearson's r

Confidence intervals can be built around the value of r to indicate the precision of the population estimate

For example, with a sample of 50, the 95% CI around r = .26 is -.02 to .50

This includes the possibility that the population correlation is zero (the null hypothesis)
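
One common way to build such an interval is through the r-to-z transformation: transform r, add and subtract the critical z times 1/sqrt(N - 3), and transform the limits back. The sketch below assumes that method (the slide does not say which method produced its interval); the function name is hypothetical.

```python
import math
from statistics import NormalDist

def r_confidence_interval(r, n, level=0.95):
    """Confidence interval for a correlation via the r-to-z transformation."""
    z = math.atanh(r)                                  # r-to-z
    se = 1.0 / math.sqrt(n - 3)                        # standard error of z
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)   # about 1.96 for 95%
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

lower, upper = r_confidence_interval(0.26, 50)
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

With r = .26 and N = 50 this reproduces the limits of -.02 and .50 given above.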

Power and Pearson's r

In power analysis, r = effect size index

Tables can be used to estimate sample size needs (to minimize the risk of a Type II error)

As a last resort, small, medium, and large effects correspond to r values of .10, .30, and .50, respectively

These effect sizes correspond to needed Ns of 785, 85, and 29 participants, respectively
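
When no table is at hand, a rough sample size can be approximated through the r-to-z transformation. The sketch below is an approximation swapped in for the table lookup, not the source of the tabled values: it assumes a two-tailed alpha of .05 and power of .80 (neither is stated on the slide), and the function name is hypothetical. Its results land close to, but not exactly on, the tabled Ns.

```python
import math
from statistics import NormalDist

def approx_n_for_r(r, alpha=0.05, power=0.80):
    """Approximate N needed to detect a population correlation of size r."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # two-tailed criterion
    z_power = NormalDist().inv_cdf(power)
    return ((z_alpha + z_power) / math.atanh(r)) ** 2 + 3.0

for effect in (0.10, 0.30, 0.50):
    print(f"r = {effect:.2f}: N of about {approx_n_for_r(effect):.0f}")
```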

Factors Affecting r

The magnitude of r can be affected (often reduced) by:

Having variables with restricted ranges of values

Using groups at both extremes of a distribution of values

Having outliers in the data

Measuring the variables with instruments having low reliability (attenuation)

Nonparametric Correlations

Nonparametric options can be used if the data are ordinal or if assumptions for Pearson's r are seriously violated

Spearman's rho (r_s): Based on ranks of the original data values

Kendall's tau (τ): A complex formula, sometimes a preferred index because of its statistical properties

Both range from -1.00 through .00 to 1.00
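
To make "based on ranks" concrete, the sketch below computes Spearman's rho as Pearson's r applied to ranks, with tied values given the average of their ranks. The helper names are hypothetical, and the data are the midterm and final exam scores from this chapter's regression illustration; Kendall's tau is not shown.

```python
def average_ranks(values):
    """Rank values from 1 to N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover every value tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Pearson's r from deviations and cross products."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp_xy / (ss_x * ss_y) ** 0.5

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

midterm = [2, 6, 5, 9, 7, 9, 3, 4, 1, 4]
final = [3, 7, 6, 8, 9, 10, 4, 6, 2, 5]
rho = spearman_rho(midterm, final)
print(f"Spearman's rho = {rho:.3f}")
```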

SPSS and Correlations

Analyze → Correlate → Bivariate

Move all variables of interest into the Variables slot

Select Pearson's r, Kendall's tau, and/or Spearman's rho

Regression

Regression: Techniques used to analyze relationships between variables and to make predictions about values of variables

Strong link between correlation and regression

Simple Linear Regression

Simple linear regression involves regressing one variable (Y) on another variable (X)

Y: The dependent (outcome) variable

X: The independent variable, but often called the predictor variable in regression analysis

The goal is to be able to predict new values of Y based on values of X

Linear Regression

Linear regression builds on the equation for a straight line because the relationship between the two variables is assumed to be linear

A straight line should yield the best fit of the data points in a scatterplot (a linear model)

Equation for a Straight Line

Any straight line can be described by this equation: Y = a + bX

Y = Values on one variable

X = Values on the other variable

a = The intercept constant (the point at which the line crosses the vertical (Y) axis)

b = The slope of the line

Regression Equation

Identifies the straight line that runs through the scatterplot data with the best possible fit

Y′ = a + bX

Y′ = Predicted value of Y

X = Actual value of X

a = Intercept constant

b = The slope of the line, but in this context it is called the regression coefficient

Solving for a and b

The values of the intercept constant (a) and regression coefficient (b) in the regression equation are calculated using formulas that involve:

Means

Deviations from means

Cross products of deviations of X and Y scores from their respective means
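
The slide does not print the formulas themselves. For reference, the standard least-squares expressions built from exactly these ingredients are shown below, with bars denoting means; this is the usual textbook form, offered as a reminder rather than a quotation from the source.

```latex
b = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^{2}},
\qquad
a = \bar{Y} - b\,\bar{X}
```

The numerator of b is the sum of cross products of deviations, the denominator is the sum of squared deviations of X, and a follows from the two means once b is known.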

Illustration

Textbook example: Predicting students' final exam scores based on midterm scores:

Midterm scores: 2, 6, 5, 9, 7, 9, 3, 4, 1, 4

Final scores: 3, 7, 6, 8, 9, 10, 4, 6, 2, 5

r = .955

Regression equation: Y′ = 1.5 + .90X
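
These values can be reproduced from the ten pairs of scores. The sketch below applies the standard least-squares formulas (slope = sum of cross products divided by the sum of squared X deviations; intercept = mean of Y minus slope times mean of X); the function name fit_line is hypothetical.

```python
midterm = [2, 6, 5, 9, 7, 9, 3, 4, 1, 4]
final = [3, 7, 6, 8, 9, 10, 4, 6, 2, 5]

def fit_line(xs, ys):
    """Return the intercept (a) and slope (b) of the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    b = sp_xy / ss_x
    a = mean_y - b * mean_x
    return a, b

a, b = fit_line(midterm, final)
print(f"Predicted final = {a:.3f} + {b:.3f} * midterm")
```

Rounded to two decimals, the intercept and slope match the 1.5 and .90 in the regression equation above.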

Graphic Representation

The regression line crosses the Y axis at the intercept constant, a = 1.5

The slope is such that for every 10 points on the X axis, you go up 9 on the Y axis: b = .90

Prediction and Regression

Regression equations yield predictions of new values of Y based on known values of X

E.g., for the equation Y′ = 1.5 + .90X:

X    Actual Y    Predicted Y
1    2           2.4
5    6           6.0
9    10          9.6

Errors of Prediction

Errors of prediction: Differences between actual and predicted values of Y

Symbolized as e

Also called residuals

X    Actual Y    Predicted Y    e
1    2           2.4            -0.4
5    6           6.0            0.0
9    10          9.6            0.4
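
The residuals in the table are simply actual minus predicted values. A short sketch, using the rounded equation with a = 1.5 and b = .90 and the three cases shown (the function name predict is hypothetical):

```python
def predict(x, a=1.5, b=0.90):
    """Predicted Y from the rounded regression equation."""
    return a + b * x

# (X, actual Y) for the three cases shown in the table.
cases = [(1, 2), (5, 6), (9, 10)]

residuals = []
for x, actual in cases:
    predicted = predict(x)
    e = actual - predicted          # error of prediction (residual)
    residuals.append(e)
    print(f"X = {x}: actual = {actual}, predicted = {predicted:.1f}, e = {e:.1f}")
```

A negative residual means the equation overpredicted the actual score; a positive residual means it underpredicted.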

Least Squares

The regression equation uses a least-squares criterion in solving for a and b

The sum of the squared errors of prediction (Σe²) is minimized

Standard regression is sometimes called ordinary least-squares (OLS) regression for this reason
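
The least-squares property can be checked directly: nudging either a or b away from the fitted values makes the sum of squared errors larger. The sketch below uses the chapter's midterm and final exam scores; the size of the nudges is arbitrary and the names are illustrative.

```python
midterm = [2, 6, 5, 9, 7, 9, 3, 4, 1, 4]
final = [3, 7, 6, 8, 9, 10, 4, 6, 2, 5]

def sum_squared_errors(a, b):
    """Sum of squared errors of prediction for the line a + b * X."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(midterm, final))

# Least-squares solution from means, deviations, and cross products.
n = len(midterm)
mean_x = sum(midterm) / n
mean_y = sum(final) / n
ss_x = sum((x - mean_x) ** 2 for x in midterm)
sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(midterm, final))
b_hat = sp_xy / ss_x
a_hat = mean_y - b_hat * mean_x

best = sum_squared_errors(a_hat, b_hat)
print(f"Least-squares line: sum of squared errors = {best:.3f}")

# Arbitrary small changes to the intercept or slope always do worse.
alternatives = []
for da, db in [(0.2, 0.0), (-0.2, 0.0), (0.0, 0.05), (0.0, -0.05)]:
    other = sum_squared_errors(a_hat + da, b_hat + db)
    alternatives.append(other)
    print(f"a {da:+.2f}, b {db:+.2f}: sum of squared errors = {other:.3f}")
```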

Standard Error of Estimate

Standard error of estimate: An index of how wrong, on average, a predicted value of Y is

The larger the correlation coefficient between the two variables in the regression, the smaller the standard error of estimate (SE_Estimate)
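
As a sketch of the computation, the standard error of estimate can be taken as the square root of the sum of squared residuals divided by N - 2. That divisor is an assumption here (the slide gives no formula), chosen because it reproduces the value of .81 that SPSS reports for the chapter's midterm and final exam data.

```python
import math

midterm = [2, 6, 5, 9, 7, 9, 3, 4, 1, 4]
final = [3, 7, 6, 8, 9, 10, 4, 6, 2, 5]

n = len(midterm)
mean_x = sum(midterm) / n
mean_y = sum(final) / n
ss_x = sum((x - mean_x) ** 2 for x in midterm)
sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(midterm, final))

# Least-squares slope and intercept.
b = sp_xy / ss_x
a = mean_y - b * mean_x

# Residuals: actual minus predicted final exam scores.
residuals = [y - (a + b * x) for x, y in zip(midterm, final)]

# Assumed formula: square root of (sum of squared residuals / (N - 2)).
see = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
print(f"Standard error of estimate = {see:.2f}")
```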

Proportion of Variability in Y

As a proportion of all variability in Y scores, the squared residuals are what is left to be explained (residual variation) after the correlation between the two variables is taken into account:

Σe² / (Total variability in Y) = 1 - r²
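
A quick numerical check of this identity, using the chapter's midterm and final exam scores and treating "total variability in Y" as the sum of squared deviations of Y from its mean (variable names are illustrative):

```python
midterm = [2, 6, 5, 9, 7, 9, 3, 4, 1, 4]
final = [3, 7, 6, 8, 9, 10, 4, 6, 2, 5]

n = len(midterm)
mean_x = sum(midterm) / n
mean_y = sum(final) / n
ss_x = sum((x - mean_x) ** 2 for x in midterm)
ss_total = sum((y - mean_y) ** 2 for y in final)
sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(midterm, final))

r = sp_xy / (ss_x * ss_total) ** 0.5
b = sp_xy / ss_x
a = mean_y - b * mean_x

# Sum of squared residuals around the regression line.
ss_residual = sum((y - (a + b * x)) ** 2 for x, y in zip(midterm, final))

residual_proportion = ss_residual / ss_total
print(f"Residual proportion = {residual_proportion:.3f}")
print(f"1 - r squared       = {1 - r ** 2:.3f}")
```

Both lines print the same value, roughly 9% of the variability in final exam scores.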

SPSS and Regression

Analyze → Regression → Linear

Commands will be explained in the next chapter

SPSS Output: Model Summary

Model    R       R Square    Adjusted R Square    Standard Error of Estimate
1        .955    .912        .901                 .81

SPSS calculates an adjusted R square using a formula that adjusts for sample size and number of predictors
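
The slide does not give the adjustment formula. The sketch below assumes the commonly used one, 1 - (1 - R²)(N - 1)/(N - k - 1), where k is the number of predictors; with R² = .912, N = 10, and k = 1 it reproduces the .901 in the output above. The function name is hypothetical.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R square for n cases and k predictors (assumed formula)."""
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - k - 1)

# Values from the Model Summary: R square = .912, N = 10 students, 1 predictor.
adjusted = adjusted_r_squared(0.912, n=10, k=1)
print(f"Adjusted R square = {adjusted:.3f}")
```

The adjustment shrinks R² more when the sample is small or the number of predictors is large.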

SPSS Output: Coefficients

            Unstandardized         Standardized                     95% Confidence
            Coefficients           Coefficients                     Interval for B
Model       B        Std. Error    Beta            t        Sig.    Lower Bound    Upper Bound
Constant    1.515    .556                          2.727    .026    .234           2.796
Midterm     .897     .099          .955            9.106    .000    .670           1.124

Dependent variable: Final exam scores

The information in the column Unstandardized Coefficients embodies the regression equation: a (constant) = 1.515 and b (slope) = .897
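
Most of this table can be reproduced from the ten pairs of scores. The sketch below uses the standard formulas for the standard errors of the slope and intercept in simple regression; it is an illustration under stated assumptions, not a description of SPSS internals. The critical t of 2.306 (df = 8, two-tailed alpha = .05) is hard-coded from a t table, and the Sig. column is not computed.

```python
import math

midterm = [2, 6, 5, 9, 7, 9, 3, 4, 1, 4]
final = [3, 7, 6, 8, 9, 10, 4, 6, 2, 5]

n = len(midterm)
mean_x = sum(midterm) / n
mean_y = sum(final) / n
ss_x = sum((x - mean_x) ** 2 for x in midterm)
ss_y = sum((y - mean_y) ** 2 for y in final)
sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(midterm, final))

# Unstandardized coefficients.
b = sp_xy / ss_x
a = mean_y - b * mean_x

# Standard error of estimate, then standard errors of b and a.
ss_residual = sum((y - (a + b * x)) ** 2 for x, y in zip(midterm, final))
see = math.sqrt(ss_residual / (n - 2))
se_b = see / math.sqrt(ss_x)
se_a = see * math.sqrt(1.0 / n + mean_x ** 2 / ss_x)

# Standardized coefficient (equals r when there is one predictor).
beta = b * math.sqrt(ss_x / ss_y)

# Assumption: critical t for df = 8, two-tailed alpha = .05, from a t table.
T_CRITICAL = 2.306

rows = {}
for label, estimate, std_error in (("Constant", a, se_a), ("Midterm", b, se_b)):
    rows[label] = {
        "B": estimate,
        "se": std_error,
        "t": estimate / std_error,
        "lower": estimate - T_CRITICAL * std_error,
        "upper": estimate + T_CRITICAL * std_error,
    }
    row = rows[label]
    print(f"{label}: B = {row['B']:.3f}, SE = {row['se']:.3f}, t = {row['t']:.3f}, "
          f"95% CI = {row['lower']:.3f} to {row['upper']:.3f}")
print(f"Beta for Midterm = {beta:.3f}")
```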