MAP 553 Apprentissage statistiquegiraud/MAP55… · 0.03 0.04 0.05 variable zeta^2 zeta^2 Figure:Valeurs de 2 i. En bleu: seuil a ˝ 2 = 2˙2=n. En rouge: seuil a ˝2 = 2˙2 log(p)=n

MAP 553Apprentissage statistique

Christophe Giraud

CMAP, Ecole Polytechnique

PC6

1/25

Christophe Giraud MAP 553 Apprentissage statistique

1 Seuillage dur et Cp de Mallows

2 Estimateur Lasso et seuillage doux

3 Analyse Lineaire Discriminante

2/25


Seuillage dur

3/25


Regression lineaire

Modele: y = Xθ + ξavec

observations: y ∈ Rn

design: X matrice n × p connue (et fixe)

parametre: θ ∈ Rp a estimer

bruit: ξ ∼ N (0, σ2In)

Exemple:Xi ,j = ϕj(xi ) avec xi = i/n et ϕj la base trigonometrique.

4/25


0.0 0.2 0.4 0.6 0.8 1.0

-3-2

-10

12

3

observations

x

y

5/25


Hypothese (ORT)

Design orthogonal: 1nXTX = Ip

Transformation des donnees:

z =1

nXT y = θ +

1

nXT ξ︸︷︷︸= ζ

avec ζ ∼ N (0, σ2

n Ip).

6/25


Notation

Pour m ⊂ {1, . . . , p}, on note θm l’estimateur defini par

(θm)j = zj1j∈m, pour j = 1, . . . , p.

Autrement dit: θm = Projvect{ej , j∈m}(z) avec (e1, . . . , ep) basecanonique de Rp.

Estimateur ”progressif”

Estimateurs comme a la PC5: θ{1,...,M}, M = 1, . . . , p.

Estimateur progressif: θprog = θ{1,...,M} avec M minimisant

Cp

(θ{1,...,M}

)= |z − θ{1,...,M}|22 +

2σ2M

n.

7/25


0.0 0.2 0.4 0.6 0.8 1.0

-3-2

-10

12

3

pour differentes valeurs de M

x

y

M=4M=17M=40

8/25


0 10 20 30 40 50 60

80

100

120

140

160

180

200

le critere Cp de Mallows

M

Cp

0.0 0.2 0.4 0.6 0.8 1.0

-3-2

-10

12

3

estimateur progressif

x

y

signalprogressif

9/25


Seuillage dur

Estimateur par seuillage dur: θH defini pour τ > 0 par

θHj = zj1|zj |>τ , pour j = 1, . . . , p.

Quel avantage de θH sur θprog ?

10/25


0.0 0.2 0.4 0.6 0.8 1.0

-3-2

-10

12

3

progressif versus seuillage dur

x

y

signalseuillage durprogressif

11/25


Quel niveau de seuillage?

Le choix τ2 = 2σ2/n ne convient pas!

Il faut prendre τ2 = 2σ2 log(p)/n ou plus grand.

0.0 0.2 0.4 0.6 0.8 1.0

-3-2

-10

12

3different niveaux de seuillage dur

y

signalseuillage à sqrt(2/n)seuillage à sqrt(2*log(p)/n)

Figure: En bleu: τ 2 = 2σ2/n. En rouge: τ 2 = 2σ2 log(p)/n.

12/25


Explication:

il faut seuiller a τ =√

2σ2 log(p)/n pour ne pas prendre trop de”bruit”.

0.0 0.2 0.4 0.6 0.8 1.0

0.00

0.01

0.02

0.03

0.04

0.05

variable zeta^2

zeta^2

Figure: Valeurs de ζ2i . En bleu: seuil a τ 2 = 2σ2/n.

En rouge: seuil a τ 2 = 2σ2 log(p)/n. 13/25


Seuillage doux et Lasso

14/25


Estimateur Lasso

Defini pour τ > 0 par:

θL = arg minθ∈Rp

{1

n|y − Xθ|22 + 2τ

p∑j=1

|θj |}. (1)

Design orthogonal

Sous l’hypothese (ORT), le probleme (1) est equivalent a


{ p∑j=1

(zj − θj)2 + 2τ

p∑j=1

|θj |}.

avec z = 1nXT y .

15/25


Estimateur Lasso avec Design Orthogonal

Sous l’hypothese (ORT), l’estimateur Lasso


{1

n|y − Xθ|22 + 2τ

p∑j=1

|θj |}.

est donne par

θLj = zj

(1− τ

|zj |

)+

, j = 1, . . . , p

ou (x)+ = max(x , 0) et z = 1nXT y .

16/25


0.0 0.2 0.4 0.6 0.8 1.0

-3-2

-10

12

3

seuillage doux et seuillage dur

y

signalseuillage durseuillage doux

Figure: Seuillage dur (en bleu) et seuillage doux (en rouge) avecτ 2 = 2σ2 log(p)/n

17/25


Design non-orthogonal

Cadre non-orthogonal: lorsque (ORT) n’est pas verifiee, leprobleme d’optimisation

minθ∈Rp

{1

n|y − Xθ|22 + τ2

p∑j=1

1{θj 6=0}

}est NP-hard en general alors que le probleme

minθ∈Rp

{1

n|y − Xθ|22 + 2τ

p∑j=1

|θj |}

est un probleme d’optimisation convexe pour lequel il existe desalgorithmes efficaces.

18/25


Classification

19/25


Donnees:

points Xi ∈ Rp avec label Yi ∈ {0, 1} pour i = 1, . . . , n.

0

0

0

0

0

0 00

00

00

00

00

0

0

0

0

0

0

0

0

0

0

0

0

0

0

-4 -2 0 2 4

-6-4

-20

24

6

1

1

111

1

1 1

1

1

1

1

1

1

1

1

111

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

111

1

1

11

x ?

Objectif: predire la classe d’un nouveau point x .20/25


Cadre statistique

On supposera les (Xi ,Yi ) i.i.d. avec

P(Yi = k) = πk , pour k = 0, 1

Loi(Xi |Yi = k) = gk(x) dx , pour k = 0, 1.

Classifieur de Bayes

Le classifieur h∗ : Rp → {0, 1} defini par

h∗(x) = 1{π1g1(x)>π0g0(x)}

verifieP(h∗(X ) 6= Y ) = min

hP(h(X ) 6= Y ).

21/25


Preuve:

L’egalite min(π0g0, π1g1) = (π0g01h∗=1 + π1g11h∗=0) donne

P(h(X ) 6= Y ) = π0P(h(X ) = 1|Y = 0) + π1P(h(X ) = 0|Y = 1)

=

∫π0g01h=1 +

∫π1g11h=0

≥∫

(π0g01h∗=1 + π1g11h∗=0)(1h=1 + 1h=0)

≥∫π0g01h∗=1 +

∫π1g11h∗=0︸︷︷︸

= P(h∗(X )6=Y )

.

�

22/25


Cadre gaussien

Loi(Xi |Yi = k) = N (µk ,Σk), pour k = 0, 1

c’est a dire

gk(x) = (2π)−p/2√

det(Σ−1k ) exp

(−1

2(x − µk)T Σ−1

k (x − µk)

).

Cas ou Σ0 = Σ1 = Σ

Lorsque Σ0 = Σ1 = Σ on a

h∗(x) = 1 ⇐⇒ (µ1 − µ0)T Σ−1

(x − µ1 + µ0

2

)> log(π0/π1).

23/25


0

0

0

0

0

0 00

00

00

00

00

0

0

0

0

0

0

0

0

0

0

0

0

0

0

-4 -2 0 2 4

-6-4

-20

24

6

Frontière du classifieur de Bayes

1

1

111

1

1 1

1

1

1

1

1

1

1

1

111

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

111

1

1

11

xh*(x)=0

Cas ou Σ1 = Σ0 = Σ

24/25


ACP versus ALD pour reduire la dimension

Exemple:

X = mesure de la composition chimique (55 composes) den = 162 souches de E. coli.Y = souche commensale ou pathogene

-1 0 1 2 3

-1.0

-0.5

0.0

0.5

1.0

1.5

ACP

ComExPECInPEC

-4 -2 0 2 4

-4-2

02

4

ALD

InPECInPEC

InPEC

ExPEC

InPEC

ExPEC

InPEC

InPEC

InPECInPEC

InPEC

InPEC

InPECInPEC

ExPEC

ExPEC InPEC

InPEC

ExPEC

ExPEC

ExPECExPEC

ExPECExPECExPEC

ExPEC

ExPEC

ExPEC

InPEC

ExPEC

ExPEC

InPECInPEC

InPECInPEC

InPEC

InPEC

InPEC

InPEC

InPEC

InPEC

InPEC

InPECInPEC

InPEC

ExPEC

ExPEC

ExPECCom

InPEC

InPECInPEC

InPEC

ComCom

ExPECExPEC

ExPEC

ExPECExPEC

ExPEC

ExPEC

ExPEC

InPEC

ComCom

Com

ComCom

Com

ComComCom

Com

Com

Com

ComCom

ComCom

ComComCom

Com

Com

Com

Com

ComComCom

ComCom

Com

Com

ComComCom

Com

Com

ComCom

Com

Com

ComCom

Com

ComCom

Com ComComCom

Com

Com

ComCom

Com Com

Com

Com

Com

ComCom

ComCom

Com

Com

Com

ComCom

ComCom

Com

Com

Com

Com

Com

Com

Com

ComCom

Com

Com

ComCom

Com

Com

Com

Com

ComCom

Com

Com Com

ComCom

Com

InPEC

ExPECExPEC

InPEC

ExPEC

25/25


Documents

MAP 553 Apprentissage statistiquegiraud/MAP55… · 0.03 0.04 0.05 variable zeta^2 zeta^2 Figure:Valeurs de 2 i. En bleu: seuil a ˝ 2 = 2˙2=n. En rouge: seuil a ˝2 = 2˙2 log(p)=n