  • Copyright 2010 by Pearson Education, Inc., Upper Saddle River, New Jersey 07458

    All rights reserved.

    Statistics and Data Analysis for Nursing Research, Second Edition, Denise F. Polit

    Chapter 9: Correlation and Simple Regression

  • Pearson's r

    A descriptive index that summarizes magnitude and nature (direction) of a relationship between two variables in a sample

    Can also be used to make inferences about relationships in the population

  • Basic Hypotheses

    Correlational hypotheses are about ρ (rho), the population correlation coefficient

    Basic null hypothesis: rho is zero, H0: ρ = .00

    The alternative (nondirectional) hypothesis is the opposite, H1: ρ ≠ .00

  • Sampling Distribution

    The mean of a sampling distribution of the correlation coefficient is ρ, the population coefficient

    When the null hypothesis is true (when ρ = .00):

      • The theoretical sampling distribution is centered on .00

      • The sampling distribution is approximately normal

  • Assumptions and Requirements

    Pearson's r is suitable for (a) interval- and ratio-level variables and (b) detecting linear relationships

    Pearson's r can be used inferentially:

      • If the variables have an underlying distribution that is bivariate normal (scores on X normally distributed for each value of Y)

      • If values on both variables are homoscedastic (for each value of X, variability of Y scores is about the same)

  • Testing Significance

    The value of a computed r must be compared to critical values in a table, for which the degrees of freedom must be known and a significance criterion (α) must be established

    For Pearson's r: df = N - 2
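
    The table lookup can also be expressed through the t distribution, because a critical r is just a critical t rescaled by the degrees of freedom. The sketch below assumes a hypothetical sample (r = .40, N = 30) that is not from the slides, and takes the two-tailed critical t for df = 28 at α = .05 (2.048) from a standard t table rather than computing it.

```python
import math

def r_to_t(r, n):
    """Convert a sample correlation to a t statistic with df = n - 2."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1.0 - r * r)
    return t, df

def critical_r(t_crit, df):
    """Smallest |r| that reaches significance for a given critical t."""
    return t_crit / math.sqrt(t_crit ** 2 + df)

# Hypothetical sample, chosen only for illustration.
sample_r, sample_n = 0.40, 30
t_stat, df = r_to_t(sample_r, sample_n)

# Tabled two-tailed critical t for df = 28, alpha = .05 (an input, not computed).
r_crit = critical_r(2.048, df)
significant = abs(sample_r) >= r_crit

print(f"t({df}) = {t_stat:.2f}, critical r = {r_crit:.3f}, significant: {significant}")
```

    Comparing |r| with the critical r and comparing |t| with the critical t always give the same decision.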

  • r-to-z Transformation

    Testing differences between two correlations requires that the two correlation coefficients be transformed using the r-to-z transformation

    The normal distribution can then be used, with the appropriate formulas
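
    As a minimal sketch of one such formula, the code below compares two correlations from independent samples using the Fisher r-to-z transformation (the inverse hyperbolic tangent). The function names and the input values (r = .50 and r = .30, each from 103 cases) are hypothetical; the slides do not give a worked example.

```python
import math
from statistics import NormalDist

def fisher_z(r):
    """Fisher r-to-z transformation: z = 0.5 * ln((1 + r) / (1 - r))."""
    return math.atanh(r)

def compare_independent_correlations(r1, n1, r2, n2):
    """Return the z statistic and two-tailed p for H0: rho1 = rho2."""
    z1, z2 = fisher_z(r1), fisher_z(r2)
    # Standard error of the difference between two transformed correlations.
    se_diff = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z_obs = (z1 - z2) / se_diff
    p_two_tailed = 2.0 * (1.0 - NormalDist().cdf(abs(z_obs)))
    return z_obs, p_two_tailed

# Hypothetical correlations from two independent samples.
z_obs, p_value = compare_independent_correlations(0.50, 103, 0.30, 103)
print(f"z = {z_obs:.2f}, two-tailed p = {p_value:.3f}")
```

    This version applies only to correlations from separate samples; correlations that share cases or variables need a different formula.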

  • Magnitude of Effect

    Pearson's r provides direct information about the direction and magnitude of effects

    Pearson's r can be directly used as the effect size index in meta-analysis

    But the magnitude of effect is more often presented as r²

  • Coefficient of Determination

    r² is sometimes called the coefficient of determination

    r² indicates the proportion of variability in one variable shared with or explained by variability in the other

    r² is analogous to eta-squared (η²): It represents the ratio of explained variance to total variance: r² = SS_Explained / SS_Total

  • Precision and Pearson's r

    Confidence intervals can be built around the value of r to indicate the precision of the population estimate

    For example, with a sample of 50, the 95% CI around r = .26 is -.02 to .50

    This includes the possibility that the population correlation is zero, the null hypothesis
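
    The slide does not say how the interval is built. One common approach, assumed here, transforms r to Fisher's z, builds a normal-theory interval with standard error 1 / sqrt(N - 3), and transforms the limits back to the r scale. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def pearson_r_ci(r, n, level=0.95):
    """Approximate confidence interval for a correlation via Fisher's z."""
    z = math.atanh(r)                       # r-to-z transformation
    se = 1.0 / math.sqrt(n - 3)             # standard error on the z scale
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    # Back-transform the limits to the r scale.
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

lower, upper = pearson_r_ci(0.26, 50)
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

    With r = .26 and N = 50 the limits round to -.02 and .50, which agrees with the interval quoted above. The interval is not symmetric around r because the back-transformation is nonlinear.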

  • Power and Pearson's r

    In power analysis, r = effect size index

    Tables can be used to estimate sample size needs (to minimize the risk of a Type II error)

    As a last resort, small, medium, and large effects correspond to r values of .10, .30, and .50, respectively

    This corresponds to needed Ns of 785, 85, and 29 participants
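
    When a table is not at hand, a rough sample size can be approximated with the Fisher z transformation. The sketch below assumes power = .80 and a two-tailed α = .05; the slide does not state these settings, and the function name is hypothetical. It is an approximation, not the method used to build the tabled values, so the results land close to the Ns quoted above without matching them exactly.

```python
import math
from statistics import NormalDist

def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate N needed to detect a population correlation of size r."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-tailed criterion
    z_beta = NormalDist().inv_cdf(power)
    n = ((z_alpha + z_beta) / math.atanh(r)) ** 2 + 3
    return math.ceil(n)

# Conventional small, medium, and large effects.
needed = {effect: n_for_correlation(effect) for effect in (0.10, 0.30, 0.50)}
for effect, n in needed.items():
    print(f"r = {effect:.2f}: about {n} participants")
```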

  • Factors Affecting r

    The magnitude of r can be affected (often reduced) by:

      • Having variables with restricted ranges of values

      • Using groups at both extremes of a distribution of values

      • Having outliers in the data

      • Measuring the variables with instruments having low reliability (attenuation)

  • Nonparametric Correlations

    Nonparametric options can be used if the data are ordinal or if assumptions for Pearson's r are seriously violated

    Spearman's rho (r_s): Based on ranks of the original data values

    Kendall's tau (τ): A complex formula, sometimes a preferred index because of its statistical properties

    Both range from -1.00 through .00 to 1.00
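
    To show what "based on ranks" means, the sketch below computes Spearman's rho as Pearson's r applied to the ranks of the data, with tied values sharing the mean of their ranks. The helper names and the small data set are hypothetical and were chosen so that the relationship is perfectly monotonic but curved.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Pearson's r from deviations and cross products."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data: y always increases with x, but not in a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(f"Pearson r = {pearson_r(x, y):.3f}, Spearman rho = {spearman_rho(x, y):.3f}")
```

    Because the ranks agree perfectly, rho reaches its maximum while Pearson's r falls short of it, which is the practical difference between a monotonic index and a linear one.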

  • SPSS and Correlations

    Analyze → Correlate → Bivariate

    Move all variables of interest into the Variables slot

    Select Pearson's r, Kendall's tau, and/or Spearman's rho

  • Regression

    Regression: Techniques used to analyze relationships between variables and to make predictions about values of variables

    Strong link between correlation and regression

  • Simple Linear Regression

    Simple linear regression involves regressing one variable (Y) on another variable (X)

      • Y: The dependent (outcome) variable

      • X: The independent variable, often called the predictor variable in regression analysis

    The goal is to be able to predict new values of Y based on values of X

  • Linear Regression

    Linear regression builds on the equation for a straight line because the relationship between the two variables is assumed to be linear

    A straight line should yield the best fit of the data points in a scatterplot (a linear model)

  • Equation for a Straight Line

    Any straight line can be described by this equation: Y = a + bX

      • Y = Values on one variable

      • X = Values on the other variable

      • a = The intercept constant (the point at which the line crosses the vertical (Y) axis)

      • b = The slope of the line

  • Regression Equation

    Identifies the straight line that runs through the scatterplot data with the best possible fit

    Y′ = a + bX

      • Y′ = Predicted value of Y

      • X = Actual value of X

      • a = Intercept constant

      • b = The slope of the line, which in this context is called the regression coefficient

  • Solving for a and b

    The values of the intercept constant (a) and regression coefficient (b) in the regression equation are calculated using formulas that involve:

      • Means

      • Deviations from means

      • Cross products of deviations of X and Y scores from their respective means

  • Illustration

    Textbook example: Predicting students' final exam scores based on midterm scores:

    Midterm scores: 2, 6, 5, 9, 7, 9, 3, 4, 1, 4

    Final scores: 3, 7, 6, 8, 9, 10, 4, 6, 2, 5

    r = .955

    Regression equation: Y′ = 1.5 + .90X
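
    The slides name the ingredients of the formulas (means, deviations, cross products) without writing them out. A minimal sketch of the usual least-squares formulas, b = Σ(x·y) / Σ(x²) on deviation scores and a = mean(Y) - b·mean(X), applied to the midterm and final scores above; the function name is hypothetical.

```python
midterm = [2, 6, 5, 9, 7, 9, 3, 4, 1, 4]
final = [3, 7, 6, 8, 9, 10, 4, 6, 2, 5]

def fit_line(x, y):
    """Return the intercept a and slope b of the least-squares line."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Deviations of each score from its own mean.
    dev_x = [xi - mean_x for xi in x]
    dev_y = [yi - mean_y for yi in y]
    cross_products = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    ss_x = sum(dx ** 2 for dx in dev_x)
    b = cross_products / ss_x
    a = mean_y - b * mean_x
    return a, b

a, b = fit_line(midterm, final)
print(f"a = {a:.3f}, b = {b:.3f}")
```

    The unrounded results, a = 1.515 and b = .897, are the values that the slides round to 1.5 and .90.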

  • Graphic Representation

    The regression line crosses the Y axis at the intercept constant, a = 1.5

    The slope is such that for every 10 points along the X axis, the line goes up 9 points on the Y axis: b = .90

  • Prediction and Regression

    Regression equations yield predictions of new values of Y based on known values of X

    E.g., for the equation Y′ = 1.5 + .90X:

    X    Actual Y    Predicted Y
    1        2           2.4
    5        6           6.0
    9       10           9.6

  • Errors of Prediction

    Errors of prediction: Differences between actual and predicted values of Y

      • Symbolized as e

      • Also called residuals

    X    Actual Y    Predicted Y      e
    1        2           2.4        -0.4
    5        6           6.0         0.0
    9       10           9.6         0.4
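
    The table can be reproduced in a few lines. This sketch uses the rounded equation from the slides (a = 1.5, b = .90) and the three cases shown; the function name is hypothetical.

```python
def predict(x, a=1.5, b=0.90):
    """Predicted Y for a given X, using the rounded regression equation."""
    return a + b * x

# (X, actual Y) pairs taken from the table.
cases = [(1, 2), (5, 6), (9, 10)]

residuals = []
for x, actual in cases:
    predicted = predict(x)
    e = actual - predicted  # error of prediction (residual)
    residuals.append(round(e, 1))
    print(f"X = {x}, actual = {actual}, predicted = {predicted:.1f}, e = {e:.1f}")
```

    A negative residual means the equation overpredicted the score, and a positive residual means it underpredicted.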

  • Least Squares

    The regression equation uses a least-squares criterion in solving for a and b

    The sum of the squared errors of prediction (Σe²) is minimized

    Standard regression is sometimes called ordinary least-squares (OLS) regression for this reason
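
    One way to see the criterion at work is to compare the sum of squared errors for the least-squares line with that of nearby lines. The sketch below uses the chapter's midterm and final scores; the alternative lines are arbitrary perturbations chosen only for illustration.

```python
midterm = [2, 6, 5, 9, 7, 9, 3, 4, 1, 4]
final = [3, 7, 6, 8, 9, 10, 4, 6, 2, 5]

def sum_squared_errors(a, b, x, y):
    """Sum of squared differences between actual and predicted Y."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# Least-squares solution from deviations and cross products.
mean_x = sum(midterm) / len(midterm)
mean_y = sum(final) / len(final)
ss_x = sum((x - mean_x) ** 2 for x in midterm)
cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(midterm, final))
b_ols = cross / ss_x
a_ols = mean_y - b_ols * mean_x

sse_ols = sum_squared_errors(a_ols, b_ols, midterm, final)
# Arbitrary alternative lines: a steeper slope and a higher intercept.
sse_steeper = sum_squared_errors(a_ols, b_ols + 0.10, midterm, final)
sse_higher = sum_squared_errors(a_ols + 0.50, b_ols, midterm, final)

print(f"OLS: {sse_ols:.3f}, steeper: {sse_steeper:.3f}, higher: {sse_higher:.3f}")
```

    Any change to a or b away from the least-squares values increases the sum of squared errors.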

  • Standard Error of Estimate

    Standard error of estimate: An index of how wrong, on average, a predicted value of Y is

    The larger the correlation coefficient between the two variables in the regression, the smaller the standard error of estimate

  • Proportion of Variability in Y

    As a proportion of all variability in Y scores, the squared residuals are what is left to be explained (residual variation) after the correlation between the two variables is taken into account:

    Σe² / Total variability in Y = 1 - r²

  • SPSS and Regression

    Analyze → Regression → Linear

    Commands will be explained in the next chapter

  • SPSS Output: Model Summary

    Model    R       R Square    Adjusted R Square    Standard Error of Estimate
    1        .955    .912        .901                 .81

    SPSS calculates an adjusted R square using a formula that adjusts for sample size and number of predictors
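
    The four values in the model summary can be recomputed by hand from the midterm and final scores. The sketch below assumes the usual adjustment formula, adjusted R² = 1 - (1 - R²)(N - 1) / (N - k - 1), with k = 1 predictor; the slide does not print the formula SPSS uses.

```python
import math

midterm = [2, 6, 5, 9, 7, 9, 3, 4, 1, 4]
final = [3, 7, 6, 8, 9, 10, 4, 6, 2, 5]

n = len(midterm)
k = 1  # number of predictors in simple regression

mean_x = sum(midterm) / n
mean_y = sum(final) / n
ss_x = sum((x - mean_x) ** 2 for x in midterm)
ss_y = sum((y - mean_y) ** 2 for y in final)  # total variability in Y
cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(midterm, final))

r = cross / math.sqrt(ss_x * ss_y)
r_squared = r ** 2
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# Residual variability is the share of total variability left unexplained.
ss_residual = ss_y * (1 - r_squared)
standard_error_estimate = math.sqrt(ss_residual / (n - k - 1))

print(f"R = {r:.3f}, R square = {r_squared:.3f}, "
      f"adjusted = {adjusted_r_squared:.3f}, SEE = {standard_error_estimate:.2f}")
```

    The rounded results agree with the SPSS output: .955, .912, .901, and .81.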

  • SPSS Output: Coefficients

    Model       B        Std. Error    Beta    t        Sig.    Lower Bound    Upper Bound
    Constant    1.515    .556                  2.727    .026    .234           2.796
    Midterm     .897     .099          .955    9.106    .000    .670           1.124

    Dependent variable: Final exam scores

    B and Std. Error are the unstandardized coefficients, Beta is the standardized coefficient, and the bounds are the 95% confidence interval for B

    The information in the unstandardized coefficients column embodies the regression equation: a (constant) = 1.515 and b (slope) = .897
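
    The remaining columns follow from the same data. The sketch below assumes the standard simple-regression formulas for the standard errors of a and b, computes t as coefficient divided by standard error, and builds the 95% interval with the tabled two-tailed critical t for df = 8 (2.306), which is supplied as a constant rather than computed. The Sig. column is not reproduced because it requires the t distribution function.

```python
import math

midterm = [2, 6, 5, 9, 7, 9, 3, 4, 1, 4]
final = [3, 7, 6, 8, 9, 10, 4, 6, 2, 5]
T_CRIT_DF8 = 2.306  # tabled two-tailed critical t, df = 8, alpha = .05

n = len(midterm)
mean_x = sum(midterm) / n
mean_y = sum(final) / n
ss_x = sum((x - mean_x) ** 2 for x in midterm)
cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(midterm, final))

b = cross / ss_x
a = mean_y - b * mean_x

# Standard error of estimate from the residuals, with df = N - 2.
ss_residual = sum((y - (a + b * x)) ** 2 for x, y in zip(midterm, final))
see = math.sqrt(ss_residual / (n - 2))

se_b = see / math.sqrt(ss_x)
se_a = see * math.sqrt(1.0 / n + mean_x ** 2 / ss_x)

results = {}
for label, coef, se in (("Constant", a, se_a), ("Midterm", b, se_b)):
    t = coef / se
    lower = coef - T_CRIT_DF8 * se
    upper = coef + T_CRIT_DF8 * se
    results[label] = (coef, se, t, lower, upper)
    print(f"{label}: B = {coef:.3f}, SE = {se:.3f}, t = {t:.3f}, "
          f"95% CI = {lower:.3f} to {upper:.3f}")
```

    Neither interval includes zero, which is consistent with both coefficients being significant at the .05 level in the Sig. column.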