Research in Applied Econometrics
Chapter 1. R
Pr. Philippe Polomé, Université Lumière Lyon 2
M1 APE Analyse des Politiques Économiques
M1 RISE Gouvernance des Risques Environnementaux
2017 – 2018



Outline

SWIRL

Data Management

R graphics

Linear Regressions

Discussing Regressors and Model Building

Document Edition Functionalities


SWIRL

- Do Course 1, “R Programming”, Lessons 1-9 + 14 by yourself
  - To quit a lesson: Esc
  - Answer “no” to any proposal to “register”
  - When swirl shows “...”, press Enter
  - Sometimes there is a lot of text to read: that is a good exercise
- Follow the commands in the script RAE2017.R
  - They follow the slides
- In class, we do only Lesson 1
  - To make sure you can start the other lessons by yourself


SWIRL R Programming course overview

1: Basic Building Blocks
2: Workspace and Files
3: Sequences of Numbers
4: Vectors
5: Missing Values
6: Subsetting Vectors
7: Matrices and Data Frames
8: Logic
9: Functions
10: lapply and sapply
11: vapply and tapply
12: Looking at Data
13: Simulation
14: Dates and Times
15: Base Graphics


A few commands outside of SWIRL

- In RStudio, create a new project (upper-right button)
  - Call it “RAE”, for example
  - Store it where you can find it again
- Execute the commands in RAE2017.R to see the output
- Usual math functions: log, exp, sign, sqrt, abs, min, max
  - log(exp(sin(pi/4)^2)*exp(cos(pi/4)^2)): type it in the Console and press Enter
- Special vectors
  - ones <- rep(1, 10)
  - even <- seq(from = 2, to = 20, by = 2)
  - trend <- 1981:2005
- diag(4): identity matrix of size 4
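A minimal base-R sketch of these first commands (the names x and I4 are ours, not from RAE2017.R). The long expression is a check on the usual identities: exp(a)·exp(b) = exp(a + b) and sin² + cos² = 1, so the result is log(e) = 1.

```r
# exp(a) * exp(b) = exp(a + b) and sin^2 + cos^2 = 1, so this is log(exp(1))
x <- log(exp(sin(pi/4)^2) * exp(cos(pi/4)^2))
x                                          # 1, up to floating-point rounding

# Special vectors
ones  <- rep(1, 10)                        # ten ones
even  <- seq(from = 2, to = 20, by = 2)    # 2, 4, ..., 20
trend <- 1981:2005                         # 25 consecutive years
length(trend)

I4 <- diag(4)                              # 4 x 4 identity matrix
```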


Mtx Operations
- A <- matrix(1:6, nrow = 2)
  - Type A to see what it looks like and how R indicates the position of the elements
- Look at your Environment window: A is now there
  - It remains in your project until erased (the brush icon)
- t(A) = transpose of A (not A')
- dim(A) = dimensions of A (rows, then columns)
- nrow(A); ncol(A): number of rows; number of columns
- A[i,j] extracts element (i,j)
  - This does not remove it from the matrix
- A[,j] extracts column j (all the rows) into one vector
- A[i,] does the same for row i
- A1 <- A[1:2, c(1, 3)]: A1 has 2 rows containing the elements in rows 1 to 2 and columns 1 & 3 of A
  - For this particular matrix, A[,-2] gives the same result
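The indexing rules above can be checked directly; this is a small sketch in base R, where the only name not on the slide is tA. Note that matrix() fills column by column.

```r
A <- matrix(1:6, nrow = 2)   # filled column by column:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
tA <- t(A)                   # 3 x 2 transpose
dim(A)                       # 2 3: rows, then columns
A[2, 3]                      # element in row 2, column 3: 6
A[, 2]                       # column 2 as a vector: 3 4
A[1, ]                       # row 1 as a vector: 1 3 5
A1 <- A[1:2, c(1, 3)]        # rows 1-2, columns 1 and 3
identical(A1, A[, -2])       # TRUE for this particular A (it has only 3 columns)
```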


Mtx Operations

- det(A1): determinant
- eigen(A1): eigenvalues (and eigenvectors)
- chol(A1): Cholesky decomposition; it requires a symmetric positive-definite matrix (type ?chol in the Console)
- solve(A1): inverse
- A %*% B: matrix product
- A*A: element-by-element product
- kronecker(A, B): Kronecker product ⊗ (type ?kronecker)
- crossprod(A, B): efficient calculation of A'B
- diag(A1): extracts the diagonal
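A sketch of these operations on the matrices defined earlier. Since A1 is not symmetric positive definite, chol(A1) would stop with an error; the matrix S = t(A1) %*% A1 used for chol() and eigen() is our illustrative choice, not one from the slides.

```r
A  <- matrix(1:6, nrow = 2)
A1 <- A[1:2, c(1, 3)]          # 2 x 2, rows (1, 5) and (2, 6)
d1 <- det(A1)                  # 1*6 - 5*2 = -4
A1inv <- solve(A1)             # inverse of A1
A1 %*% A1inv                   # identity matrix, up to rounding
A * A                          # element-by-element product (same dimensions as A)
AtA <- crossprod(A, A)         # t(A) %*% A, a 3 x 3 matrix
K <- kronecker(diag(2), A1)    # 4 x 4 block-diagonal matrix, A1 twice on the diagonal
diag(A1)                       # 1 6
# chol() needs a symmetric positive-definite matrix, which A1 is not;
# S is an illustrative choice (not from the slides): t(A1) %*% A1 is symmetric PD
S <- crossprod(A1)
U <- chol(S)                   # upper-triangular U such that t(U) %*% U equals S
eigen(S)$values                # eigenvalues of S (both positive)
```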


Mtx Operations

- cbind(1, A1): “combines” (binds side by side) a column of ones and A1
- rbind(A1, diag(4, 2)): “stacks” A1 on top of a 2 × 2 diagonal matrix with 4 on the diagonal
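A short sketch of the two binding commands with the same A1 as before. The column of ones is what a regression design matrix uses for the intercept, which is why cbind(1, ...) shows up so often.

```r
A  <- matrix(1:6, nrow = 2)
A1 <- A[1:2, c(1, 3)]
X  <- cbind(1, A1)           # a column of ones next to A1: 2 x 3
#      [,1] [,2] [,3]
# [1,]    1    1    5
# [2,]    1    2    6
Z  <- rbind(A1, diag(4, 2))  # A1 on top of a 2 x 2 matrix with 4 on the diagonal: 4 x 2
```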

Data Management

Dataframe

- “Frame” = “context”
- In R, a “dataframe” is a data matrix
  - a collection of vectors of the same length
  - stacked together horizontally (side by side)
- Each vector = 1 column = a “variable”
  - Possibly of different natures: quantitative (numeric), but also qualitative (character, factor), dates...
  - It may further contain metadata, e.g. variable types or category names
- Each row = 1 observation in the sample
- An “array” is, in R, a more general object, as it may have more than 2 dimensions (but all its elements share a single type)
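A small illustration of columns of different natures in one dataframe. The values are made up for the example, and str() is used here only to show the type stored for each column.

```r
# A small dataframe mixing column types (made-up values, for illustration only)
df <- data.frame(
  wage   = c(354.9, 123.5, 370.4),                                 # numeric
  region = factor(c("south", "west", "south")),                    # categorical, with its levels
  date   = as.Date(c("1988-03-01", "1988-03-02", "1988-03-03"))    # dates
)
str(df)                                # one line per column: its type and first values
nrow(df)                               # 3 observations (rows)
arr <- array(1:24, dim = c(2, 3, 4))   # a 3-dimensional array: one type for all elements
dim(arr)                               # 2 3 4
```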


Dataframe Creation

- Several ways
  - keyboard (cf. SWIRL R Programming, lesson 7)
  - read an R file
  - import
- Keyboard example
  - alternative 1: mydata <- data.frame(one = 1:10, two = 11:20, three = 21:30)
  - alternative 2: mydata <- as.data.frame(matrix(1:30, ncol = 3)), and then names(mydata) <- c("one", "two", "three")
- R is not very convenient for entering data manually
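A quick check that the two keyboard alternatives give the same dataframe. mydata2 is our name for the second version; matrix() fills by column, so each column receives 10 consecutive numbers.

```r
mydata  <- data.frame(one = 1:10, two = 11:20, three = 21:30)   # alternative 1
mydata2 <- as.data.frame(matrix(1:30, ncol = 3))                 # alternative 2 ...
names(mydata2) <- c("one", "two", "three")                       # ... then name the columns
all.equal(mydata, mydata2)                                       # TRUE: same dataframe
```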


attach

- A dataframe is “attached” with the command attach
  - then the variable names in the dataframe may be used directly in commands
- For example
  - mean(two) produces an error message (no object called two exists)
  - attach(mydata) and then mean(two) produces the average of variable “two”
  - detach(mydata) is self-explanatory
- Why detach? e.g. to avoid confusion between objects with the same names
- To attach for a single operation:
  - with(mydata, mean(two))
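The three ways of reaching a column, side by side; try() is used only so that the expected error does not stop the script. The result names res, m1, m2 and m3 are ours.

```r
mydata <- data.frame(one = 1:10, two = 11:20, three = 21:30)
res <- try(mean(two), silent = TRUE)   # error: no object 'two' in the workspace
attach(mydata)
m1 <- mean(two)                        # 15.5: the columns are now visible by name
detach(mydata)
m2 <- with(mydata, mean(two))          # same result, attached for one operation only
m3 <- mean(mydata$two)                 # or extract the column explicitly with $
```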


Subset Selection

- As seen in swirl, a subset of a dataframe can be accessed with [ or $
  - $ extracts a single variable
- The command subset sometimes works better (e.g. for conditional selection)
  - e.g. mydata.sub <- subset(mydata, two <= 16, select = -two)
  - selects all the observations of variables one & three
  - for which the observations of variable two are ≤ 16
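The same selection written with subset() and with [ ], as a sketch. mydata.sub2 is our name for the second version.

```r
mydata <- data.frame(one = 1:10, two = 11:20, three = 21:30)
mydata.sub  <- subset(mydata, two <= 16, select = -two)       # keep rows with two <= 16, drop column two
mydata.sub2 <- mydata[mydata$two <= 16, c("one", "three")]    # the same selection with [ ]
nrow(mydata.sub)                                               # 6 rows (two = 11, ..., 16)
```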


Export (write) a dataframe

- write.table(mydata, file = "mydata.txt", col.names = TRUE)
  - creates a text file mydata.txt in the working directory
  - normally where your project is
  - metadata are not passed
- The text file format is

  "one" "two" "three"
  "1" 1 11 21
  "2" 2 12 22
  ...

- So it looks like the column headers are shifted left: the first column holds the row names and has no header
  - Take that into account in the software you use to open it (or use row.names = FALSE in write.table)
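A sketch that makes the shifted header visible by reading the written file back as plain text. It writes to a temporary file (tempfile()) rather than to your project folder.

```r
mydata <- data.frame(one = 1:10, two = 11:20, three = 21:30)
f <- tempfile(fileext = ".txt")      # a temporary file, so nothing is written to your project
write.table(mydata, file = f, col.names = TRUE)
lines <- readLines(f, n = 3)         # the first 3 lines of the file, as text
lines                                # header: 3 quoted names; data lines: row name + 3 values
write.table(mydata, file = f, row.names = FALSE)   # same file without the row-name column
lines2 <- readLines(f, n = 2)        # now the header sits exactly above the values
```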


Import (read) a dataframe
- From a text file (.txt or .csv)
  - newdata <- read.table("mydata.txt", header = TRUE)
  - reads a text file in which the 1st row holds the variable names
  - the result is stored in a dataframe called newdata
  - for .csv files, read.csv() (base R) or read_csv() (readr package) read the file into a dataframe
- read.table accepts many options
  - column separator (sep): , or ;
  - decimal separator (dec): . or ,
  - French is your enemy here: French-language software typically uses ; as column separator and , as decimal mark (read.csv2() assumes exactly that)
  - ?read.table
- The Environment window has a button (Import Dataset) that makes it very easy
  - a preview is generated
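A sketch of reading a French-style file. The two-line file is made up and written to a temporary location so that the example is self-contained.

```r
# Made-up file in French format: ';' between columns, ',' as decimal mark
f <- tempfile(fileext = ".csv")
writeLines(c("region;wage", "south;354,94", "west;123,46"), f)
good  <- read.table(f, header = TRUE, sep = ";", dec = ",")  # options set explicitly
good2 <- read.csv2(f)                # read.csv2 has sep = ";" and dec = "," as defaults
str(good)                            # wage is read as a number, not as text
```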


Import a dataframe
- scan is used for data that are not in matrix form
  - ?scan
- Import from other software: Excel, Stata, SAS...
  - Easiest: if you have access to the software, export the data file as txt or csv
    - loss of metadata
  - RStudio offers several formats (Import Dataset button)
    - It does not always work, as these programs often change their file formats
  - Use Google
    - e.g. “R import Stata 17 data”
  - Also www.statmethods.net/input/importingdata.html
    - for a few formats
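A minimal scan() sketch. The text argument stands in for a file; with a real file you would pass its name instead. The numbers are arbitrary.

```r
# scan() reads a stream of values into a vector, without assuming rows and columns
x <- scan(text = "3 1 4 1 5 9 2 6", quiet = TRUE)
m <- matrix(x, nrow = 2, byrow = TRUE)   # reshape afterwards if a table is needed
m
```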

R graphics

Plot

- First SWIRL
  - course R Programming, lesson 15 Base Graphics
- A few additional graphic elements using the base graphics function plot()
- Packages lattice and ggplot2 are better
  - http://varianceexplained.org/RData/code/code_lesson2/#segment1
- R can produce many publication-quality graphics
  - But the commands are not very intuitive
- plot() is the default graphic command for many objects:
  - dataframes, time series, fitted linear models
  - it is also an old, crude command


Examples with data("CPS1988")

- The dataframe CPS1988 comes with the AER package (library(AER) then data("CPS1988"))
  - Current Population Survey, March 1988, US Census Bureau
  - 28,155 obs., cross-section
  - Men, 18-70 years old
  - Income > US$ 50 in 1988
  - Not self-employed, not working without pay
- summary(CPS1988)
- Quantitative data
  - wage: US$/week
  - education & experience (= age - education - 6), in years
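Loading the data, as a guarded sketch: it assumes AER is installed and simply does nothing otherwise, so that it can run anywhere. The 28,155 rows are the count given on the slide.

```r
# Assumes the AER package is installed; otherwise run install.packages("AER") first
if (requireNamespace("AER", quietly = TRUE)) {
  data("CPS1988", package = "AER")   # puts the dataframe CPS1988 in the workspace
  print(dim(CPS1988))                # number of observations and of variables
  print(summary(CPS1988))            # counts for factors, quartiles for numeric variables
}
```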


“Scatterplots” (XY dispersion plots)
- Probably the most common plots in statistics (with histograms)
- We use CPS1988: a census data file on wages and their determinants
  - From the AER package
  - attach(CPS1988)
- plot(education, log(wage))
  - The first argument is on the x-axis, the 2nd on the y-axis
- rug(education)
- rug(log(wage), side = 2)
  - a rug (“carpet”) is a 1-D plot of the observations along an axis
- detach(CPS1988)
- plot(log(subs) ~ log(citeprice), data = Journals)
  - formula interface with a data argument: an alternative that avoids attaching the dataframe (Journals is another AER dataset)


R Graphic Parameters

- The appearance of a plot may be modified in many ways
  - e.g. argument type controls whether the plot is made of points (type = "p"), lines (type = "l"), both (type = "b"), steps (type = "s") or others
- Several dozen parameters may be modified
  - See ?par
  - They may be set with the command par(), which then applies to the subsequent plots
  - Or they can be supplied in the plot() command, e.g.
    plot(log(wage) ~ education, data = CPS1988, pch = 20, col = "blue", ylim = c(4, 10), xlim = c(0, 20), main = "Wage by education years")
- The main parameters are listed below


Argument          Description
axes              should axes be drawn?
bg                background color
cex               size of a point or symbol
col               color
las               orientation of axis labels
lty, lwd          line type and line width
main, sub         main title and subtitle
mar               size of margins
mfcol, mfrow      array defining the layout of several graphs on one plot
pch               plotting symbol
type              plot type (points, lines, both, steps...)
xlab, ylab        axis labels
xlim, ylim        axis ranges
xlog, ylog, log   logarithmic scales
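A sketch combining the scatterplot, the rugs and several parameters from the table. The wage-like data are simulated (hypothetical), not CPS1988. pdf(NULL) draws off-screen; drop it and the dev.off() line to see the plots in the Plots window.

```r
set.seed(1)
education <- sample(6:18, 200, replace = TRUE)                 # hypothetical years of schooling
lwage     <- 4.5 + 0.08 * education + rnorm(200, sd = 0.5)   # hypothetical log weekly wage
pdf(NULL)                       # draw off-screen
op <- par(mfrow = c(1, 2))      # 1 row, 2 panels; par() returns the previous values
plot(lwage ~ education, pch = 20, col = "blue", ylim = c(4, 10), xlim = c(0, 20),
     main = "Wage by education years", xlab = "education", ylab = "log(wage)")
rug(education)                  # 1-D "carpet" of x values along the bottom axis
rug(lwage, side = 2)            # and of y values along the left axis
plot(sort(lwage), type = "l", lty = 2, lwd = 2, main = "Sorted log wages")
par(op)                         # restore the previous settings
dev.off()
```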


R Graphic Parameters

- Add layer(s) to a plot: lines(), points(), text(), legend()
- Add a straight line: abline(a, b)
  - a intercept, b slope
- One plot over another
  - x <- rnorm(50)
  - x2 <- rnorm(50, -1)
  - plot(ecdf(x), xlim = range(c(x, x2)))
  - ecdf: empirical cumulative distribution function
  - plot(ecdf(x2), add = TRUE, lty = "dashed")
- Barplots, pie charts, boxplots, QQ plots & histograms
  - barplot(), pie(), boxplot(), qqplot(), hist()
  - We'll see them later
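The overlay above, completed with a legend and a reference line. A point that is easy to miss: ecdf() returns a function, which can be evaluated as well as plotted. The seed and the name Fx are ours.

```r
set.seed(42)
x  <- rnorm(50)          # 50 draws from N(0, 1)
x2 <- rnorm(50, -1)      # 50 draws from N(-1, 1)
Fx <- ecdf(x)            # a step function: Fx(t) = share of x values <= t
pdf(NULL)                # off-screen; drop to see the plot
plot(Fx, xlim = range(c(x, x2)), main = "Two empirical CDFs")
plot(ecdf(x2), add = TRUE, lty = "dashed")
abline(v = 0, col = "grey")        # abline also draws vertical (v =) or horizontal (h =) lines
legend("topleft", legend = c("x", "x2"), lty = c("solid", "dashed"))
dev.off()
Fx(0)                    # share of x values <= 0
```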


Export graphics
- To use R graphics in other software
- “Export” = send the graph to a “device”
  - In practice: a file, e.g. .pdf or .jpg
- All devices work similarly in R, see ?Devices
  1. The device is opened by a command that bears its name, e.g. pdf()
  2. Then, the plot is executed
  3. Finally, the device is closed: dev.off()
- Example
  - pdf("myfile.pdf", height = 5, width = 6) (size in inches)
  - plot(1:20, pch = 1:20, col = 1:20, cex = 2)
  - dev.off()
- Look for myfile.pdf on your laptop: it is in the working directory
- Simplest: the “Export” button in the Plots window
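The three steps as a runnable sketch. The file goes to tempdir() instead of the working directory, so that the example leaves nothing behind in your project.

```r
f <- file.path(tempdir(), "myfile.pdf")       # tempdir() instead of the working directory
pdf(f, height = 5, width = 6)                 # 1. open the device (size in inches)
plot(1:20, pch = 1:20, col = 1:20, cex = 2)   # 2. draw: 20 symbols in 20 colors
invisible(dev.off())                          # 3. close it: the file is written now
file.exists(f)                                # TRUE
```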


Math Formulas in a Plot

- R can write a math formula in a plot using plotmath expressions, whose syntax resembles LaTeX
  - see ?plotmath
- Example
  - plot of the standard normal density with its mathematical definition
  - curve(dnorm, from = -5, to = 5, col = "slategray", lwd = 3, main = "Density of the Standard Normal Distribution")
  - text(-5, 0.3, expression(f(x) == frac(1, sigma ~~ sqrt(2*pi)) ~~ e^{-frac((x - mu)^2, 2*sigma^2)}), adj = 0)
- Unfortunately, you need to know this LaTeX-like syntax
  - & the parameters are not easy


Histograms & boxplots

- We continue with the CPS1988 data on wages & their determinants
- summary(CPS1988) reveals that some variables are categorical
  - Categorical variables are called factors in R
- Factors are vectors of categories
  - sometimes with metadata, e.g. category names
- g <- rep(0:1, c(2, 4))
- g <- factor(g, levels = 0:1, labels = c("male", "female"))
  - names the categories (0, 1) of g “male” (= 0) & “female” (= 1)
  - so g is [1] male male female female female female
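A sketch of what the factor stores: the labels are metadata, and the internal codes are 1, 2, ..., not the original 0/1.

```r
g <- rep(0:1, c(2, 4))           # 0 0 1 1 1 1
g <- factor(g, levels = 0:1, labels = c("male", "female"))
print(g)                         # the labels, then a "Levels: male female" line
levels(g)                        # "male" "female": the metadata stored with the factor
as.integer(g)                    # 1 1 2 2 2 2: internal codes, not the original 0/1
table(g)                         # counts per level: male 2, female 4
```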


Factors in CPS1988

- In CPS1988, the factors are
  - ethnicity: Caucasian “cauc” or African-American “afam”
  - smsa: residence in an urban area (standard metropolitan statistical area)
  - region
  - parttime: part-time work
- Plots depend on the data type
  - numerical (quantitative) or categorical
  - a single variable, or 2 in relation


One numerical variable: histogram & density
- hist(wage, freq = FALSE)
  - option freq = FALSE: relative frequencies (density scale), else absolute frequencies (counts)
  - option breaks: chooses the bins (“bin” = container), i.e. the width of the base of the rectangles
- hist(log(wage), freq = FALSE)
- lines(density(log(wage)), col = 4)
  - Command density actually computes a nonparametric (kernel) estimate of the density function (next year)
- Remarks
  - the distribution of log wages is less asymmetrical than that of the raw data
  - data in logs are often closer to a normal
  - That is often the case with economic data, & a rationale for the normality hypothesis
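A sketch on simulated log-normal "wages" (hypothetical, not CPS1988). It checks that with freq = FALSE the bar areas sum to one, and compares the asymmetry of raw and logged data. The asymmetry measure skew() is our own crude choice: mean minus median, divided by the standard deviation.

```r
set.seed(123)
wage <- rlnorm(1000, meanlog = 6, sdlog = 0.6)   # hypothetical right-skewed weekly wages
pdf(NULL)                                         # off-screen; drop to see the plot
h <- hist(log(wage), freq = FALSE, breaks = 30)  # breaks: suggested number of bins
lines(density(log(wage)), col = 4)               # kernel density estimate on top
dev.off()
sum(h$density * diff(h$breaks))                  # 1: with freq = FALSE, bar areas sum to one
# crude asymmetry check (our choice): (mean - median) / sd, near 0 for a symmetric variable
skew <- function(v) (mean(v) - median(v)) / sd(v)
c(raw = skew(wage), log = skew(log(wage)))
```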


One categorical variable
- With categorical data
  - mean & variance have no meaning
  - but frequencies do
- summary(region): absolute frequencies (counts)
- tab <- table(region): stores these frequencies in a table called tab
- prop.table(tab): computes the proportions (relative frequencies)
- Barplots & pie charts often visualize categorical data quite well
  - barplot(tab)
  - pie(tab)
  - These plots can be modified using parameters
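The same commands on a short made-up factor, so that the counts can be checked by eye.

```r
# Made-up regions for 8 individuals
region <- factor(c("south", "west", "south", "northeast", "midwest", "south", "west", "midwest"))
summary(region)          # counts per level (same as table() for a factor)
tab <- table(region)
prop.table(tab)          # proportions; they sum to 1
pdf(NULL); barplot(tab); pie(tab); dev.off()   # off-screen; drop pdf(NULL) to see them
```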


Two categorical variables
- Usually presented in a contingency table
- xtabs() with a formula interface:
  - e.g. xtabs(~ ethnicity + region, data = CPS1988)
  - data is optional if CPS1988 is still attached
- table(ethnicity, region) gives the same results
- A plot of that is a “spine plot”
  - plot(ethnicity ~ region): formula
  - plot(ethnicity, region): what differences?


Two numerical variables
- The correlation coefficient r is the usual measure
- For positive & asymmetrical variables: Spearman's ρ
  - the correlation of ranks, instead of values, is often preferred because r is not robust to asymmetry
- cor(log(wage), education)
- cor(log(wage), education, method = "spearman")
  - Results differ a bit
- plot(log(wage) ~ education)
  - the scatterplot shows little correlation
  - but the log makes it difficult to see graphically
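A sketch on simulated data (hypothetical, not CPS1988). It shows that Spearman's ρ is simply Pearson's r computed on the ranks, which is why it is insensitive to monotone transformations such as the log.

```r
set.seed(7)
education <- sample(6:18, 500, replace = TRUE)               # hypothetical years of schooling
wage <- exp(4.5 + 0.08 * education + rnorm(500, sd = 0.7))   # hypothetical, right-skewed
r_p <- cor(log(wage), education)                              # Pearson
r_s <- cor(log(wage), education, method = "spearman")         # Spearman's rho
r_s_manual <- cor(rank(log(wage)), rank(education))           # Pearson on the ranks
r_s_raw <- cor(wage, education, method = "spearman")          # log or not: same ranks
c(pearson = r_p, spearman = r_s)
```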


1 numerical & 1 categorical

- Often, conditional moments are calculated
  - e.g. average wage by ethnicity
- tapply(log(wage), ethnicity, mean)
  - “applies” the command mean to log(wage) separately within each level of ethnicity
  - mean may be replaced by any valid function, e.g. quantile
- Box plots & QQ (quantile-quantile) plots are often used
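A small made-up example where the group means can be checked by hand. Extra arguments after the function name are passed on to it; here probs = 0.5 gives the median.

```r
lwage     <- c(5.1, 6.0, 5.8, 6.3, 5.5, 6.1)                       # made-up log wages
ethnicity <- factor(c("afam", "cauc", "afam", "cauc", "afam", "cauc"))
m   <- tapply(lwage, ethnicity, mean)                    # one mean per level: afam, cauc
med <- tapply(lwage, ethnicity, quantile, probs = 0.5)   # probs is passed on to quantile()
m
```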


1 numerical & 1 categorical : Box plot

- A box plot is a crude representation of an empirical distribution
- The box is bounded by “hinges” (1st & 3rd quartiles) and shows the median
- Outside of the box, 2 lines (“whiskers”) extend to the smallest & largest obs.
  - that lie within 1.5 × the size of the box from the closest hinge
- Any obs. beyond that is represented by a separate point
- boxplot(log(wage) ~ ethnicity)
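boxplot.stats() returns the numbers a box plot is drawn from, which makes the whisker rule above checkable. Note that R computes the hinges with fivenum() (Tukey's hinges), which can differ slightly from the quartiles given by quantile(). The data are made up.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)   # made-up data with one large value
s <- boxplot.stats(x)                   # what boxplot() computes, without drawing
s$stats   # lower whisker end, lower hinge, median, upper hinge, upper whisker end
s$out     # points farther than 1.5 x box length from the hinges: here 30
```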


1 numerical & 1 categorical: QQ plot
- A QQ plot matches the quantiles of 2 (empirical) distributions
- Recall that quantiles are values of the variable
  - e.g. the 1st quartile of afam wages is the wage such that 25% of afam earn less & 75% more
- If the 2 distributions are identical: QQ plot = diagonal
- Otherwise, if e.g. cauc earn more than afam, then
  - with cauc on the x-axis, the QQ plot lies below the diagonal (and above it with afam on the x-axis, as in qqplot(awage, cwage) below)
  - A bit like the plot of income inequality, but with 2 variables
- awage <- subset(CPS1988, ethnicity == "afam")$wage
- cwage <- subset(CPS1988, ethnicity == "cauc")$wage
- qqplot(awage, cwage)
- abline(0, 1): overlays the diagonal (intercept 0, slope 1)
- detach(CPS1988) to close CPS1988
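A sketch with two simulated wage distributions (hypothetical), where the second group earns more by construction. qqplot() returns the matched quantiles, so the position relative to the diagonal can be checked numerically.

```r
set.seed(99)
awage <- rlnorm(300, meanlog = 5.8, sdlog = 0.6)   # hypothetical group with lower wages
cwage <- rlnorm(900, meanlog = 6.1, sdlog = 0.6)   # hypothetical group with higher wages
pdf(NULL)                    # off-screen; drop to see the plot
q <- qqplot(awage, cwage)    # x: sorted awage; y: cwage quantiles at matching probabilities
abline(0, 1)                 # the diagonal: identical distributions would lie on it
dev.off()
mean(q$y > q$x)              # share of points above the diagonal (close to 1 here)
```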

Linear Regressions

Basic Regression Commands in R
- Linear Regression Model (LRM)

$$y_i = x_i'\beta + \varepsilon_i, \qquad i = 1, \dots, n$$

- In matrix form: $y = X\beta + \varepsilon$
- Typical hypotheses in cross-sections
  - $E(\varepsilon \mid X) = 0$ (exogeneity)
  - $\operatorname{Var}(\varepsilon \mid X) = \sigma^2 I_n$ (“sphericity”: homoscedasticity & no autocorrelation)
- In R, models are usually fitted by calling a command
  - For the LRM in cross-section: fm <- lm(formula, data, ...)
  - The argument ... stands for a series of further arguments
    - describing the model
    - or choosing the computation method (algorithm)
    - or options
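A sketch linking lm() to the matrix commands of the SWIRL section. On simulated data with β = (1, 2), the coefficients from lm() coincide with (X'X)⁻¹X'y computed with cbind(), crossprod() and solve().

```r
set.seed(2017)
n <- 200
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)          # simulated LRM with beta = (1, 2)
fm <- lm(y ~ x)                    # formula interface; data taken from the workspace here
X <- cbind(1, x)                   # design matrix with an intercept column
beta_hat <- solve(crossprod(X), crossprod(X, y))   # (X'X)^{-1} X'y
cbind(lm = coef(fm), matrix = drop(beta_hat))      # same numbers
```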


Basic Regression Commands in R

- The lm command returns an object
  - Here: the fitted model, stored under the name fm
  - It may be visualised in many ways, or summarized
- The lm object can be used to compute:
  - predictions & fitted values, residuals, ... by means of fm$... (see RAE2017.R)
  - tests & several post-estimation diagnostics
- Most estimation commands work the same way
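A sketch of what can be pulled out of an lm object, on simulated data. Besides the fm$... components, base R provides extractor functions (fitted(), residuals(), confint()) that give the same quantities.

```r
set.seed(1)
d <- data.frame(x = rnorm(50))
d$y <- 1 + 2 * d$x + rnorm(50)
fm <- lm(y ~ x, data = d)
names(fm)                          # components stored in the object: coefficients, residuals, ...
head(fm$fitted.values)             # direct access with $
all.equal(fitted(fm), fm$fitted.values)   # extractor functions give the same thing
summary(fm)$coefficients           # estimates, std. errors, t values, p values
confint(fm)                        # 95% confidence intervals
```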


SWIRL
- Do Lessons 1-6 of the course “Regression Models” in Swirl
  - The others: later
  - Concentrate on the code, you know the econometrics
  - Remember to close files that may have remained open from the previous session
1. “Introduction”
  - To remember: “A coefficient will be within 2 standard errors of its estimate about 95% of the time”
2. “Residuals” is more difficult (reading + programming + concepts)
  - Explains loops
  - Forces you to re-read previous commands
  - Make sure to execute the program res_eqn.r when it shows up


SWIRL

3. “Least Squares Estimation”: nothing in particular
4. “Introduction to Multivariable Regression”
  - Install the package manipulate beforehand
  - I am not sure this lesson is stable
  - Do not edit the function myplot, which will show up
  - ! cor(gpa_nor, gch_nor) will differ from β̂ while SWIRL expects them to be equal: this is a bug
5. “Residual Variation”
  - “Gaussian elimination” shows that a regression with k regressors
  - may be seen as a succession of k regressions with 1 regressor each
  - DO NOT interpret this as a way to build or present a model, or to select results
6. “MultiVar Examples”: nothing in particular
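The elimination idea of lesson 5 is the Frisch-Waugh-Lovell result. Here is a sketch on simulated data: the coefficient of x2 in the full regression equals the slope from regressing the part of y not explained by x1 on the part of x2 not explained by x1.

```r
set.seed(3)
n  <- 100
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)             # correlated regressors (simulated)
y  <- 1 + 2 * x1 - 1 * x2 + rnorm(n)
b_full <- coef(lm(y ~ x1 + x2))[["x2"]]
# Eliminate x1 (and the intercept) from both y and x2, then regress residual on residual
ey  <- resid(lm(y ~ x1))
ex2 <- resid(lm(x2 ~ x1))
b_step <- coef(lm(ey ~ ex2 - 1))[["ex2"]]
c(full = b_full, stepwise = b_step)   # identical up to rounding
```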


Multivariate Linear Regression with Factors
- The purpose of this example is to demonstrate various R tools
  - that are used to transform & combine regressors
- Dataframe: CPS1988, as before
- SWIRL course “Regression Models”
  - lesson 7: “MultiVar Examples2”
  - use the Plots window for the box plot
  - sapply: use the help window


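Wage Equation

$$\log(\text{wage}) = \beta_1 + \beta_2\,\text{experience} + \beta_3\,\text{experience}^2 + \beta_4\,\text{education} + \beta_5\,\text{ethnicity} + \varepsilon$$

cps_lm <- lm(log(wage) ~ experience + I(experience^2) + education + ethnicity, data = CPS1988)

- “Insulation” function I()
  - tells R that ^2 is to be understood as the arithmetic square of experience
  - otherwise, R reads ^ as a formula operator, experience^2 reduces to experience, and the squared term silently disappears
- This might be clearer with a formula y ~ a + (b + c)
  - Are there 2 variables on the RHS of the formula, a and (b + c), or are there 3?
  - In a formula, + adds terms, so there are 3: a, b and c
  - To get a single regressor equal to the sum b + c, write y ~ a + I(b + c)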
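A sketch that makes the effect of I() visible through the columns R actually builds (model.matrix()) and the number of estimated coefficients. The small dataframe d is simulated.

```r
set.seed(4)
d <- data.frame(a = rnorm(20), b = rnorm(20), c = rnorm(20))
d$y <- d$a + d$b + d$c + rnorm(20)
colnames(model.matrix(y ~ a + b^2, data = d))     # "(Intercept)" "a" "b": ^2 is a formula operator
colnames(model.matrix(y ~ a + I(b^2), data = d))  # "(Intercept)" "a" "I(b^2)": arithmetic square
length(coef(lm(y ~ a + (b + c), data = d)))       # 4: intercept, a, b, c
length(coef(lm(y ~ a + I(b + c), data = d)))      # 3: intercept, a, and the single sum b + c
```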


Results & Testing

- summary(cps_lm)
  - The return to education (on the wage) is about 8.57% per year
  - % interpretation because the model is in logs (100 × coefficient is an approximation; the exact effect is exp(0.0857) - 1 ≈ 8.9%)
- Categorical variables are handled by R
  - which selects the reference category (by default, the first level of the factor)
- Compare nested models: ANOVA (analysis of variance) table
  - Regression + constraint
  - cps_noeth <- lm(log(wage) ~ experience + I(experience^2) + education, data = CPS1988)
  - Usually, the test is on more than one variable
  - anova(cps_noeth, cps_lm)
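A sketch of the nested-model comparison on simulated wage-like data (hypothetical coefficients, not CPS1988 estimates). With a single restriction, as here, the ANOVA F statistic equals the square of the t statistic of the dropped coefficient; with several restrictions only the F test is available.

```r
set.seed(5)
n <- 500
d <- data.frame(educ  = sample(8:18, n, replace = TRUE),
                exper = sample(0:40, n, replace = TRUE),
                eth   = factor(sample(c("cauc", "afam"), n, replace = TRUE, prob = c(0.9, 0.1)),
                               levels = c("cauc", "afam")))   # cauc first: the reference
d$lwage <- 4.3 + 0.08 * d$educ + 0.07 * d$exper - 0.001 * d$exper^2 -
           0.2 * (d$eth == "afam") + rnorm(n, sd = 0.5)
full  <- lm(lwage ~ exper + I(exper^2) + educ + eth, data = d)
restr <- lm(lwage ~ exper + I(exper^2) + educ, data = d)   # constraint: no ethnicity effect
tab <- anova(restr, full)                                   # F test of the restriction
t_eth <- summary(full)$coefficients["ethafam", "t value"]
c(F = tab$F[2], t_squared = t_eth^2)                        # one restriction: F equals t^2
```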


Interactions: effects of combined regressors
- e.g. in labor economics: the combined effect of education & ethnicity
  - Does one year of education have the same return for different ethnicities?
- This is modeled with multiplicative terms
- Consider

$$\log(\text{wage}) = \beta_1 + \beta_2\,\text{ethnicity} + \beta_3\,\text{ethnicity} \times \text{education} + \beta_4\,\text{education} + \varepsilon$$

- Then $\partial \log(\text{wage}) / \partial\,\text{education} = \beta_3\,\text{ethnicity} + \beta_4$
  - If ethnicity = 0, the effect of 1 year of education is $\beta_4$
  - If ethnicity = 1, the effect of 1 year of education is $\beta_3 + \beta_4$
- Let a, b, c be three factors
  - so that each has several discrete levels
- and x, y two continuous (quantitative) variables


Several Models/Formulas with Interactions
- y ~ a + x: no interaction
  - A single slope (of x), but one intercept for each level of factor a
- y ~ a*x: same as the previous model, plus
  - one interaction term with x for each level of a (different slopes)
  - In a more formal notation, let $d_{a_i} = \mathbb{1}(a = i)$:

$$[\, y \sim a * x \,] \iff \Big[\, y = \sum_i \beta_{a_i} d_{a_i} + \sum_i \gamma_{a_i}\, x\, d_{a_i} \,\Big]$$
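A sketch on simulated data with a 3-level factor a. It lists the columns each formula creates and checks that the slopes of the fully interacted model y ~ a*x are the ones you would get from a separate regression within each level. The level names l1, l2, l3 are hypothetical.

```r
set.seed(6)
d <- data.frame(a = factor(rep(c("l1", "l2", "l3"), each = 20)), x = rnorm(60))
d$y <- c(1, 2, 3)[as.integer(d$a)] + c(0.5, 1, 1.5)[as.integer(d$a)] * d$x + rnorm(60, sd = 0.1)
colnames(model.matrix(y ~ a + x, data = d))   # one common slope x, intercept shifts al2, al3
colnames(model.matrix(y ~ a * x, data = d))   # adds al2:x, al3:x, i.e. slope shifts
b <- coef(lm(y ~ a * x, data = d))
slopes <- c(b[["x"]], b[["x"]] + b[["al2:x"]], b[["x"]] + b[["al3:x"]])
# the same slopes from one regression per level of a
sep <- sapply(split(d, d$a), function(s) coef(lm(y ~ x, data = s))[["x"]])
rbind(interacted = slopes, separate = sep)
```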


Formulas with Interactions

- y ~ (a + b + c)^2
  - models all the 2-way interactions
  - but not the 3-way one
  - So it is like having as many dummy variables as there are combinations of levels, $d_{a_i b_j} = \mathbb{1}(a = i \wedge b = j)$ for a & b
  - and similarly for a & c and for b & c
- SWIRL course Regression Models
  - Lesson 8: MultiVar Examples3
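The expansion can be read directly from the formula, without data, through its terms. This is a sketch contrasting ^2 with the full crossing a * b * c.

```r
tl <- attr(terms(y ~ (a + b + c)^2), "term.labels")
tl                                          # "a" "b" "c" "a:b" "a:c" "b:c": no a:b:c
tl3 <- attr(terms(y ~ a * b * c), "term.labels")
tl3                                         # the same terms plus the 3-way term "a:b:c"
```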


Interactions in the wage equation: ethnicity & education
- cps_int <- lm(log(wage) ~ experience + I(experience^2) + education*ethnicity, data = CPS1988)
  - Only one of the “+” from cps_lm has been replaced by *
- coeftest(cps_int)
  - A more compact version of summary() (package lmtest, loaded with AER)
  - That can also be used on some other regression commands
- The regression outputs the effects of education & ethnicity
  - called “main effects”
  - and the product of education & an indicator for the level “afam” of ethnicity
- Why afam? Because R takes the first level of the factor (here cauc) as the reference category, and creates indicators for the other levels
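A sketch of how the reference category drives the coefficient names, on simulated data (hypothetical, not CPS1988). relevel() switches the reference, and the return to education for afam is the main effect plus the interaction.

```r
set.seed(8)
n <- 400
d <- data.frame(educ = sample(8:18, n, replace = TRUE),
                eth  = factor(sample(c("cauc", "afam"), n, replace = TRUE),
                              levels = c("cauc", "afam")))   # cauc first: the reference
d$lwage <- 4.5 + 0.09 * d$educ - 0.1 * (d$eth == "afam") -
           0.01 * d$educ * (d$eth == "afam") + rnorm(n, sd = 0.4)
m1 <- lm(lwage ~ educ * eth, data = d)
names(coef(m1))          # "(Intercept)" "educ" "ethafam" "educ:ethafam"
d$eth2 <- relevel(d$eth, ref = "afam")      # make afam the reference instead
m2 <- lm(lwage ~ educ * eth2, data = d)
names(coef(m2))          # now the indicator is for cauc
# return to education for afam: main effect + interaction in m1, main effect in m2
coef(m1)[["educ"]] + coef(m1)[["educ:ethafam"]]
```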


Interactions in the wage equation: ethnicity & education
- afam has a negative effect on the intercept
  - lower average wage for African-Americans
- AND on the slope of education
  - lower return to education for African-Americans
- The effect is only weakly significant, though
  - since significance at 5% with a sample of nearly 30,000 individuals is not very convincing


Predictions
- First, define the values for which you want to predict
  - We simplify the model to experience & education for ease of presentation
  - Let's say we want to show the effect of experience at an average level of education
  - Create a new dataframe with a column of average education & a column of all the possible values of experience
  - Note that in the Census, some people have negative experience!
    - This is due to the way we compute experience (age - education - 6)
- Use the predict() command on
  - the lm object of interest: cps_lm here
  - the new dataset for which we want predictions: cps2 here
- predict() can give not only a prediction but also bounds (argument interval = "confidence" or "prediction")
- Plot that on the data
- detach(CPS1988) when you are done, to avoid confusion
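The steps above on simulated data (hypothetical, not CPS1988). The names fit and new stand in for cps_lm and cps2. Education is fixed at its sample mean, experience runs over all observed values (including negative ones, as in the Census), and 95% confidence bounds are drawn around the fitted curve.

```r
set.seed(9)
n <- 300
d <- data.frame(education  = sample(8:18, n, replace = TRUE),
                experience = sample(-2:45, n, replace = TRUE))
d$lwage <- 4.3 + 0.08 * d$education + 0.06 * d$experience - 0.001 * d$experience^2 +
           rnorm(n, sd = 0.5)
fit <- lm(lwage ~ experience + I(experience^2) + education, data = d)
# new data: every experience value, education fixed at its sample mean
new <- data.frame(experience = seq(min(d$experience), max(d$experience)),
                  education  = mean(d$education))
pr <- predict(fit, newdata = new, interval = "confidence", level = 0.95)
head(pr)                   # columns fit, lwr, upr
pdf(NULL)                  # off-screen; drop to see the plot
plot(lwage ~ experience, data = d, pch = 20, col = "grey")
lines(new$experience, pr[, "fit"], lwd = 2)
lines(new$experience, pr[, "lwr"], lty = 2)
lines(new$experience, pr[, "upr"], lty = 2)
dev.off()
```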

Discussing Regressors and Model Building

When building a model, there are 2 opposing forces

- If we omit a regressor that is in fact relevant
  - unobserved heterogeneity & inconsistency of the LS estimators (whenever the omitted regressor is correlated with the included ones)
  - we can sometimes deal with that using instruments or panel data
- If we include irrelevant regressors that are correlated with relevant ones
  - we create multicollinearity, with the consequence that both relevant & irrelevant regressors may appear non-significant
  - That may even occur with 2 relevant regressors, e.g. in a quantity-price relation, the prices of substitute goods are relevant but may be correlated with the good's own price
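A small simulation of the second force, with invented data. Here `x3` is irrelevant but strongly correlated with the relevant `x1`. Adding it inflates the standard error of the `x1` coefficient. The variance inflation factor, computed by hand as 1/(1 − R²), is the factor by which that variance is multiplied compared with uncorrelated regressors.

```r
# Simulated data: x1 is relevant, x3 is irrelevant but strongly correlated with x1
set.seed(4)
n <- 500
x1 <- rnorm(n)
x3 <- x1 + rnorm(n, sd = 0.2)     # correlation with x1 of about 0.98
y  <- 1 + 0.5 * x1 + rnorm(n)     # x3 plays no role in y

se_x1 <- function(m) summary(m)$coefficients["x1", "Std. Error"]
m_short <- lm(y ~ x1)          # correct model
m_long  <- lm(y ~ x1 + x3)     # adds the irrelevant, collinear regressor
c(short = se_x1(m_short), long = se_x1(m_long))

# Variance inflation factor of x1: 1 / (1 - R^2 of x1 on the other regressor)
vif_x1 <- 1 / (1 - summary(lm(x1 ~ x3))$r.squared)
vif_x1
```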


Collinearity – Endogeneity Trade-off

- From a statistical point of view, 2 highly collinear variables carry nearly the same information
  - Their separate influences on the dependent variable cannot be assessed in the present sample
  - Be pragmatic: drop one of the 2, or merge them in some way that makes sense in context
- It is not really possible to escape such a trade-off
  - Especially since, in a particular sample, a relevant regressor may coincidentally appear non-significant (if the sample is not large)
- Theory does not help, by nature
  - since an empirical model is a test of a theoretical model
  - theory helps interpret the results, not guide them
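A sketch of the pragmatic options on simulated data: two relevant, nearly collinear regressors with equal invented coefficients, standing in for, e.g., two price series. Estimated separately, each coefficient has a large standard error. Merging them into their average gives one precisely estimated effect. Dropping one makes the kept regressor pick up both effects. Averaging is only one possible merge; it makes sense here only because the two effects are assumed equal.

```r
# Simulated data: two relevant, nearly collinear regressors (invented values)
set.seed(5)
n <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)
y  <- 0.5 * x1 + 0.5 * x2 + rnorm(n)

both <- lm(y ~ x1 + x2)        # separate effects: large standard errors
z <- (x1 + x2) / 2             # merged regressor: the average of the two
merged <- lm(y ~ z)
dropped <- lm(y ~ x1)          # keep only one of the two

summary(both)$coefficients[, c("Estimate", "Std. Error")]
summary(merged)$coefficients["z", c("Estimate", "Std. Error")]
coef(dropped)[["x1"]]          # close to 0.5 + 0.5 = 1: x1 also stands in for x2
```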


Progressive Inclusion

- This is an old way of looking at model building

1. Among the potential regressors x, take the one with the highest correlation with y
2. Regress y on that single regressor
   - Is it significant?
     - No: you don't have a model
     - Yes: estimate the one-regressor model & compute its residuals
3. Among the remaining regressors, take the one with the highest correlation with the residuals
4. Repeat the previous steps with progressively more regressors
   - until the newly added one is non-significant
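The steps above can be sketched as a short function. The name `progressive_inclusion` is hypothetical and the 5% threshold is an assumption. The code makes the procedure concrete; it is not a recommendation to use it. The data are simulated: `x1` and `x2` are relevant, `x3` and `x4` are not.

```r
# Progressive (forward) inclusion, following steps 1-4 (hypothetical helper)
progressive_inclusion <- function(y, X, alpha = 0.05) {
  selected <- character(0)
  candidates <- colnames(X)
  res <- y                                   # step 1 uses the correlation with y itself
  while (length(candidates) > 0) {
    # pick the remaining regressor most correlated with the current residuals
    r <- abs(cor(X[, candidates, drop = FALSE], res))
    best <- candidates[which.max(r)]
    fit <- lm(y ~ ., data = data.frame(y = y, X[, c(selected, best), drop = FALSE]))
    # stop as soon as the newly added regressor is not significant
    if (summary(fit)$coefficients[best, "Pr(>|t|)"] > alpha) break
    selected <- c(selected, best)
    res <- residuals(fit)
    candidates <- setdiff(candidates, best)
  }
  selected
}

# Simulated data (invented): x1 and x2 relevant, x3 and x4 irrelevant
set.seed(6)
n <- 500
X <- matrix(rnorm(n * 4), n, 4, dimnames = list(NULL, paste0("x", 1:4)))
y <- 2 * X[, "x1"] + 1 * X[, "x2"] + rnorm(n)
sel <- progressive_inclusion(y, X)
sel
```

Each round refits the model with all regressors selected so far, as in step 2, and the loop stops at the first non-significant candidate, as in step 4.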


Progressive Inclusion

- The issue with this approach is that, if there are several relevant regressors,
  - then at least the first step may be inconsistent
  - because at least one relevant regressor is missing (and is in general correlated with the included one)
- This is a very serious issue that can lead to nonsensical results
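A simulation of the problem, with invented data in which both `x1` and `x2` are relevant and correlated. The first step picks `x1`, the regressor most correlated with `y`. The resulting one-regressor model estimates β1 + β2·Cov(x1, x2)/Var(x1) rather than β1.

```r
# Simulated data: two relevant, correlated regressors (invented coefficients)
set.seed(7)
n <- 10000
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.6)   # Var(x2) = 1, Cov(x1, x2) = 0.8
y  <- 1 + 2 * x1 + 1 * x2 + rnorm(n)

c(cor_x1 = cor(x1, y), cor_x2 = cor(x2, y))   # step 1 picks x1

b_first <- coef(lm(y ~ x1))[["x1"]]        # first step: x2 omitted
b_full  <- coef(lm(y ~ x1 + x2))[["x1"]]   # both relevant regressors included
c(first_step = b_first, full = b_full)
# b_first is close to 2 + 1 * 0.8 / 1 = 2.8, not to the true value 2
```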


Progressive Elimination

- Instead, consider the "largest reasonable set of regressors"
  - it can be linked to the theory you want to test or to previous experience
- It is risky to just run this "encompassing" regression and report the results
  - because of multicollinearity
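Before reporting an encompassing regression, one can check for multicollinearity. The sketch below uses simulated data and a hypothetical helper `vif_by_hand()` that computes variance inflation factors, VIF_j = 1/(1 − R_j²), where R_j² comes from regressing regressor j on all the others. The car package provides a ready-made `vif()` function.

```r
# Variance inflation factors by hand (hypothetical helper)
vif_by_hand <- function(X) {
  X <- as.data.frame(X)
  sapply(names(X), function(v) {
    r2 <- summary(lm(reformulate(setdiff(names(X), v), response = v), data = X))$r.squared
    1 / (1 - r2)
  })
}

# Simulated data (invented): x2 irrelevant but nearly collinear with x1
set.seed(8)
n <- 300
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.3)
x3 <- rnorm(n)
x4 <- rnorm(n)
y  <- 1 + x1 + 0.5 * x3 + rnorm(n)
X  <- data.frame(x1, x2, x3, x4)

encompassing <- lm(y ~ x1 + x2 + x3 + x4)
summary(encompassing)$coefficients
vifs <- vif_by_hand(X)
round(vifs, 1)   # large values for x1 and x2 flag the collinearity
```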


Progressive Elimination
- Gradually remove regressors, one by one
  - Examine how the estimates of the remaining regressors evolve
  - If there is a noticeable increase in significance
    - but not much change in the estimates
    - collinearity was an issue
  - If the estimated coefficients change wildly
    - endogeneity due to an omitted (relevant) regressor
- However
  - dropping a collinear regressor could lead to jumps in the coefficient estimates
    - after all, collinearity affects their variance
  - dropping a relevant regressor does not necessarily lead to major changes in the other coefficients
    - when that regressor is not much correlated with the others
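A sketch of progressive elimination on simulated data (invented). Here `x2` is irrelevant but collinear with `x1`, `x3` is relevant but uncorrelated with the others, and `x4` is irrelevant. Each regressor is dropped in turn from the full model, and the remaining estimates and t values are compared. The helper name `get_stat` is hypothetical.

```r
# Simulated data (invented values)
set.seed(9)
n <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.3)   # irrelevant, but collinear with x1
x3 <- rnorm(n)                  # relevant, uncorrelated with x1
x4 <- rnorm(n)                  # irrelevant, uncorrelated
y  <- 1 + x1 + 0.5 * x3 + rnorm(n)
d  <- data.frame(y, x1, x2, x3, x4)
regs <- c("x1", "x2", "x3", "x4")

# One column of the coefficient table, NA for a dropped regressor (hypothetical helper)
get_stat <- function(m, what) {
  s <- summary(m)$coefficients
  setNames(s[match(regs, rownames(s)), what], regs)
}

# Full model, then each regressor dropped in turn
models <- c(list(full = lm(reformulate(regs, response = "y"), data = d)),
            setNames(lapply(regs, function(v)
              lm(reformulate(setdiff(regs, v), response = "y"), data = d)),
              paste0("drop_", regs)))
est  <- sapply(models, get_stat, what = "Estimate")
tval <- sapply(models, get_stat, what = "t value")
round(est, 2)    # rows: regressors; columns: full model, then one regressor dropped
round(tval, 1)
```

In this setup, dropping `x2` should mainly raise the t value of `x1` (collinearity). Dropping `x1` should make the `x2` coefficient jump (omitted relevant regressor). Dropping `x3` should leave the `x1` estimate roughly unchanged (relevant but uncorrelated).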


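Summing up
- Model $y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i$ (no missing relevant regressor)
- Estimation by OLS when $x_1$ and $x_2$ are correlated
  - if they are not, there are NO serious consequences for $\hat\beta_1$
- "Not relevant but correlated with a relevant regressor" might not be empirically common

| $x_2$ | | Consequences on $\hat\beta_1$ | on $\hat\beta_2$ |
|---|---|---|---|
| relevant | included | May appear insignificant | May appear insignificant |
| relevant | not incl. | Inconsistent | – |
| not relevant | included | May appear insignificant | Should be close to 0 |
| not relevant | not incl. | Consistent (correct model) | – |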
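The table follows from two standard results for this model. They assume the usual OLS conditions, plus homoskedastic errors for the variance formula. $\sigma^2$ is the error variance and $R_1^2$ is the R² from regressing $x_1$ on $x_2$:

```latex
\begin{aligned}
x_2 \text{ omitted:}\quad & \operatorname{plim} \hat\beta_1 = \beta_1 + \beta_2 \, \frac{\operatorname{Cov}(x_1, x_2)}{\operatorname{Var}(x_1)} \\
x_2 \text{ included:}\quad & \operatorname{Var}(\hat\beta_1 \mid X) = \frac{\sigma^2}{\mathrm{SST}_1 \, (1 - R_1^2)}, \qquad \mathrm{SST}_1 = \sum_i (x_{1i} - \bar{x}_1)^2
\end{aligned}
```

The bias term vanishes when $\beta_2 = 0$ (irrelevant $x_2$) or when $x_1$ and $x_2$ are uncorrelated. The variance grows without bound as $R_1^2 \to 1$, which is why an included collinear regressor can make $\hat\beta_1$ appear insignificant.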


Document Edition Functionalities


Writing with R

- A few packages are designed to use R to write reports directly

1. The text is written directly in the script, in the Editor window
   - Math formulas in LaTeX may be included
   - and, of course, R commands (graphics, regressions...)
2. If the data or the model change, everything is adjusted automatically
3. LaTeX helps choose an appropriate format
   - report, paper, presentation


Sweave – knitr – R Markdown

- Sweave runs the R code embedded in a LaTeX document and passes the whole result to LaTeX
- knitr does the same, but combines features of other packages and solves some issues of Sweave
- R Markdown is the current standard
  - The script is rendered directly through LaTeX (PDF), to Word, or to HTML (web page)
- Self-teach (I won't cover it)
  - http://rmarkdown.rstudio.com/lesson-1.html
  - https://www.r-bloggers.com/how-to-create-reports-with-r-markdown-in-rstudio/
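A minimal sketch, assuming the rmarkdown package (and pandoc, which RStudio bundles) is installed. The R code writes a tiny R Markdown file with a YAML header, a LaTeX formula and one R chunk, then renders it to HTML with `rmarkdown::render()`. The chunk uses the built-in `cars` data as a placeholder, and the file name `report.Rmd` is arbitrary.

```r
# Write a minimal R Markdown file: YAML header, text with LaTeX math, one R chunk
tick3 <- strrep("`", 3)
rmd <- c(
  "---",
  "title: \"A first report\"",
  "output: html_document",
  "---",
  "",
  "The model is $y_i = \\beta_0 + \\beta_1 x_i + \\varepsilon_i$.",
  "",
  paste0(tick3, "{r}"),
  "fit <- lm(dist ~ speed, data = cars)   # built-in example data",
  "summary(fit)",
  tick3
)
path <- file.path(tempdir(), "report.Rmd")
writeLines(rmd, path)

# Render only if rmarkdown and pandoc are available; report.html is written next to the .Rmd
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  rmarkdown::render(path, quiet = TRUE)
}
```

If the data or the model in the chunk change, re-rendering updates the report automatically.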


Should we sum up?

- Anything?