31
Factorisation en matrices (semi-)non-négatives : problématique, applications et tendances Matthieu PUIGT Travail collaboratif incluant au LISIC G. ROUSSEL G. DELMAIRE M. OMIDVAR A. LIMEM C. DORFFER R. CHREIKY F. YAHAYA Univ. Littoral Côte d’Opale, EA 4491 – LISIC, F-62228 Calais, France [email protected] http://www-lisic.univ-littoral.fr/~puigt/ 18 juin 2018 M. Puigt (Semi-)NMF: problem, applications, and trends 1

Factorisation en matrices (semi-)non-négatives

  • Upload
    others

  • View
    8

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Factorisation en matrices (semi-)non-négatives

Factorisation en matrices (semi-)non-négatives :problématique, applications et tendances

Matthieu PUIGT

Travail collaboratif incluant au LISIC

G. ROUSSEL G. DELMAIRE M. OMIDVARA. LIMEM C. DORFFER R. CHREIKY F. YAHAYA

Univ. Littoral Côte d’Opale, EA 4491 – LISIC, F-62228 Calais, France

[email protected]

http://www-lisic.univ-littoral.fr/~puigt/

18 juin 2018

M. Puigt (Semi-)NMF: problem, applications, and trends 1

Page 2: Factorisation en matrices (semi-)non-négatives

Foreword: matrix factorizationMany problems in machine learning and in signal/image processing can berewritten as a system of equations X ≈ A · S :

X A S

≈ ·

M. Puigt (Semi-)NMF: problem, applications, and trends 2

Page 3: Factorisation en matrices (semi-)non-négatives

Foreword: matrix factorizationMany problems in machine learning and in signal/image processing can berewritten as a system of equations X ≈ A · S :

X A S

≈ ·

source localization/separationX observed data matrix, A mixing matrix, S source matrix

If A estimated (and sensor array geometry is known)

î mixture estimation (source localization )

If S estimated (and sensor array geometry is known)

î source separation (beamforming )

M. Puigt (Semi-)NMF: problem, applications, and trends 2

Page 4: Factorisation en matrices (semi-)non-négatives

Foreword: matrix factorizationMany problems in machine learning and in signal/image processing can berewritten as a system of equations X ≈ A · S :

X A S

≈ ·

source localization/separationX observed data matrix, A mixing matrix, S source matrix

Is source separation a Signal Processing or an AI topic? Ask Google(Google AI blog submitted to SIGGRAPH’18).

M. Puigt (Semi-)NMF: problem, applications, and trends 2

Page 5: Factorisation en matrices (semi-)non-négatives

Foreword: matrix factorizationMany problems in machine learning and in signal/image processing can berewritten as a system of equations X ≈ A · S :

X A S

≈ ·

Low-rank matrix approximationX noisy matrix (possibly with missing entries), A weight matrix, S latentvariable matrix

Topic modeling (e.g., Google News)

Collaborative filtering (e.g., Netflix Prize)

Graph analysis (after a transformation into a matrix – e.g., yellow cab /velib network)

Mobile sensor calibration (Ph.D. theses of Clément Dorffer & FaroukYahaya)

M. Puigt (Semi-)NMF: problem, applications, and trends 2

Page 6: Factorisation en matrices (semi-)non-négatives

NMF: why is it so popular?

In many problems, matrices X ≈ A · S are non-negativechemical source separationhyperspectral imagerymobile sensor calibration

î Non-negativity on A and/or S yield better interpretability

NMF applied to face dataset (source: Lee & Seung, 1999)

M. Puigt (Semi-)NMF: problem, applications, and trends 3

Page 7: Factorisation en matrices (semi-)non-négatives

NMF: why is it so popular?

In many problems, matrices X ≈ A · S are non-negativechemical source separationhyperspectral imagerymobile sensor calibration

î Non-negativity on A and/or S yield better interpretability

Principal Component Analysis applied to face dataset (source: Lee & Seung, 1999)

M. Puigt (Semi-)NMF: problem, applications, and trends 3

Page 8: Factorisation en matrices (semi-)non-négatives

Application (1)Separation of pollutant sources (Limem et al., 2012–2014, Chreiky et al., 2015–2017)

Collaborative work with UCEIV (D. Courcot, F. Ledoux)

Observation of air-suspended fine particulate matter over time

Particles are analyzed by chemists who provide chemical concentrationsof each species (e.g., iron, zinc, calcium, sodium, etc.) in ng/m3

î Information stored in X ' A · SObserved data are mixtures of “chemical profiles" (=chemical signatureswhich are specific to each emission source – factory chimney, exhaust,marine particles, etc – lines of S)

We proposed Informed NMF : using partial expert’s (in S) andmeteorological knowledge (informing A) to improve the separationenhancement

Application on Dunkirk air quality

M. Puigt (Semi-)NMF: problem, applications, and trends 4

Page 9: Factorisation en matrices (semi-)non-négatives

Application (1)Separation of pollutant sources (Limem et al., 2012–2014, Chreiky et al., 2015–2017)

On estimated source profile (from S)

M. Puigt (Semi-)NMF: problem, applications, and trends 4

Page 10: Factorisation en matrices (semi-)non-négatives

Application (1)Separation of pollutant sources (Limem et al., 2012–2014, Chreiky et al., 2015–2017)

Global origin of sensed PM with respect to time (A)

M. Puigt (Semi-)NMF: problem, applications, and trends 4

Page 11: Factorisation en matrices (semi-)non-négatives

Application (1)Separation of pollutant sources (Limem et al., 2012–2014, Chreiky et al., 2015–2017)

Contribution of iron source in PM (from A)

M. Puigt (Semi-)NMF: problem, applications, and trends 4

Page 12: Factorisation en matrices (semi-)non-négatives

Application (2)Mobile sensor calibration (Dorffer et al., 2015–2018)

Sensed phenomenon =⇒ voltageVoltage =⇒ phenomenon?

Sensor calibration neededNot always physically possible

î Blind sensor calibration

M. Puigt (Semi-)NMF: problem, applications, and trends 5

Page 13: Factorisation en matrices (semi-)non-négatives

Application (2)Mobile sensor calibration (Dorffer et al., 2015–2018)

Sensed phenomenon =⇒ voltageVoltage =⇒ phenomenon?

Sensor calibration neededNot always physically possible

î Blind sensor calibration

M. Puigt (Semi-)NMF: problem, applications, and trends 5

Page 14: Factorisation en matrices (semi-)non-négatives

Application (2)Definitions

A rendezvous is a temporal and spatial vicinity between two sensors(Saukh et al., 2013).

A scene S is a discretized area observed during a time interval[t , t + ∆t). A spatial pixel has a size lower than ∆d , where ∆t and ∆ddefine the vicinity of the rendezvous.

Sensor 1

Sensor 2

Sensor 3

Rendezvous

Scene S

stacking

Column

Observed matrix X

Sensors

Spatialsamples

M. Puigt (Semi-)NMF: problem, applications, and trends 6

Page 15: Factorisation en matrices (semi-)non-négatives

Application (2)Factorization

Blind calibration revisited as a weighted NMF problem (affine model)

W ◦

x(1, 1) · · · x(1,m − 1) y(1)...

......

x(n, 1) · · · x(n,m − 1) y(n)

︸ ︷︷ ︸

X

' W ◦

y(1) 1

......

y(n) 1

︸ ︷︷ ︸

A

·[α1 α2 · · · αm−1 1β1 β2 · · · βm−1 0

]︸ ︷︷ ︸

S

We proposed methods:

adding information on A (sparse assumptions) and S (sensorinformation)

handling more complex calibration models (e.g., nonlinear)

Estimation of S

0 20 40 60 80 9510−16

10−12

10−8

10−4

100

Missing Value (%)

RMSE

IN-CalSaukh et al.

0 20 40 60 80 9510−2

10−1

100

101

Missing Value (%)

RMSE

IN-CalSaukh et al.

M. Puigt (Semi-)NMF: problem, applications, and trends 7

Page 16: Factorisation en matrices (semi-)non-négatives

Application (2)Reconstruction accuracy

Estimation of ATheoretical scene

x coordinate

ycoordinate

20 40 60 80 100

20

40

60

80

100

First informed NMF (IN-Cal)

x coordinate

ycoordinate

20 40 60 80 100

20

40

60

80

100

M. Puigt (Semi-)NMF: problem, applications, and trends 8

Page 17: Factorisation en matrices (semi-)non-négatives

Application (2)Reconstruction accuracy

Estimation of ATheoretical scene

x coordinate

ycoordinate

20 40 60 80 100

20

40

60

80

100

+ using sparse assumptions on A(SpIN-Cal)

x coordinate

ycoordinate

20 40 60 80 100

20

40

60

80

100

M. Puigt (Semi-)NMF: problem, applications, and trends 8

Page 18: Factorisation en matrices (semi-)non-négatives

The future of NMF... in LISIC

We now briefly discuss some of our theoretical work directions for the next 5years:

1 being fast to process the data deluge (Ph.D. thesis of F. Yahaya)

2 Not a public information (submitted ANR)3 Not a public information (starting work by G. Delmaire, G. Roussel, D.

Courcot et al.)4 Not a public information (starting work with S. Verel, G. Roussel et al.)5 Not a public information (starting work with G. Delmaire, G. Roussel et

al.)

M. Puigt (Semi-)NMF: problem, applications, and trends 9

Page 19: Factorisation en matrices (semi-)non-négatives

The future of NMF... in LISIC

We now briefly discuss some of our theoretical work directions for the next 5years:

1 being fast to process the data deluge (Ph.D. thesis of F. Yahaya)2 Not a public information (submitted ANR)

3 Not a public information (starting work by G. Delmaire, G. Roussel, D.Courcot et al.)

4 Not a public information (starting work with S. Verel, G. Roussel et al.)5 Not a public information (starting work with G. Delmaire, G. Roussel et

al.)

M. Puigt (Semi-)NMF: problem, applications, and trends 9

Page 20: Factorisation en matrices (semi-)non-négatives

The future of NMF... in LISIC

We now briefly discuss some of our theoretical work directions for the next 5years:

1 being fast to process the data deluge (Ph.D. thesis of F. Yahaya)2 Not a public information (submitted ANR)3 Not a public information (starting work by G. Delmaire, G. Roussel, D.

Courcot et al.)

4 Not a public information (starting work with S. Verel, G. Roussel et al.)5 Not a public information (starting work with G. Delmaire, G. Roussel et

al.)

M. Puigt (Semi-)NMF: problem, applications, and trends 9

Page 21: Factorisation en matrices (semi-)non-négatives

The future of NMF... in LISIC

We now briefly discuss some of our theoretical work directions for the next 5years:

1 being fast to process the data deluge (Ph.D. thesis of F. Yahaya)2 Not a public information (submitted ANR)3 Not a public information (starting work by G. Delmaire, G. Roussel, D.

Courcot et al.)4 Not a public information (starting work with S. Verel, G. Roussel et al.)

5 Not a public information (starting work with G. Delmaire, G. Roussel etal.)

M. Puigt (Semi-)NMF: problem, applications, and trends 9

Page 22: Factorisation en matrices (semi-)non-négatives

The future of NMF... in LISIC

We now briefly discuss some of our theoretical work directions for the next 5years:

1 being fast to process the data deluge (Ph.D. thesis of F. Yahaya)2 Not a public information (submitted ANR)3 Not a public information (starting work by G. Delmaire, G. Roussel, D.

Courcot et al.)4 Not a public information (starting work with S. Verel, G. Roussel et al.)5 Not a public information (starting work with G. Delmaire, G. Roussel et

al.)

M. Puigt (Semi-)NMF: problem, applications, and trends 9

Page 23: Factorisation en matrices (semi-)non-négatives

The future of NMF... in LISIC

We now briefly discuss some of our theoretical work directions for the next 5years:

1 being fast to process the data deluge (Ph.D. thesis of F. Yahaya)2 Not a public information (submitted ANR)3 Not a public information (starting work by G. Delmaire, G. Roussel, D.

Courcot et al.)4 Not a public information (starting work with S. Verel, G. Roussel et al.)5 Not a public information (starting work with G. Delmaire, G. Roussel et

al.)

M. Puigt (Semi-)NMF: problem, applications, and trends 9

Page 24: Factorisation en matrices (semi-)non-négatives

Being fast to process the data deluge

Historical NMF techniques andtheir extensions (including some ofours)

Using slow techniques for theupdate rules (MU)

î

Fastening NMF

Distributed computing (e.g.,Liu et al., 2010)

Online factorization (e.g.,Mairal et al., 2010)

Fast solver (e.g., Guan et al.,2012, Dorffer et al., 2017)

Randomized strategies (e.g.,Zhou et al., 2012 or Tepper &Sapiro, 2016,

Yahaya et al.,submitted)

M. Puigt (Semi-)NMF: problem, applications, and trends 10

Page 25: Factorisation en matrices (semi-)non-négatives

Being fast to process the data deluge

Historical NMF techniques andtheir extensions (including some ofours)

Using slow techniques for theupdate rules (MU)

î

Fastening NMF

Distributed computing (e.g.,Liu et al., 2010)

Online factorization (e.g.,Mairal et al., 2010)

Fast solver (e.g., Guan et al.,2012, Dorffer et al., 2017)

Randomized strategies (e.g.,Zhou et al., 2012 or Tepper &Sapiro, 2016,

Yahaya et al.,submitted)

M. Puigt (Semi-)NMF: problem, applications, and trends 10

Page 26: Factorisation en matrices (semi-)non-négatives

Being fast to process the data deluge

Historical NMF techniques andtheir extensions (including some ofours)

Using slow techniques for theupdate rules (MU)

î

Fastening NMF

Distributed computing (e.g.,Liu et al., 2010)

Online factorization (e.g.,Mairal et al., 2010)

Fast solver (e.g., Guan et al.,2012, Dorffer et al., 2017)

Randomized strategies (e.g.,Zhou et al., 2012 or Tepper &Sapiro, 2016, Yahaya et al.,submitted)

M. Puigt (Semi-)NMF: problem, applications, and trends 10

Page 27: Factorisation en matrices (semi-)non-négatives

Randomized strategies for NMFKey idea

Compress some matrices in the NMF problem in order to reduce its size

Compression based on Random Projections (preserving pairwisedistances)

X A S

≈ ·

M. Puigt (Semi-)NMF: problem, applications, and trends 11

Page 28: Factorisation en matrices (semi-)non-négatives

Randomized strategies for NMFKey idea

Compress some matrices in the NMF problem in order to reduce its size

Compression based on Random Projections (preserving pairwisedistances)

M. Puigt (Semi-)NMF: problem, applications, and trends 11

Page 29: Factorisation en matrices (semi-)non-négatives

Randomized strategies for NMFKey idea

Compress some matrices in the NMF problem in order to reduce its size

Compression based on Random Projections (preserving pairwisedistances)

Reconstruction of rank-15 matrices of size 500×500 (left), 5000×5000 (mid-dle) and 10000× 10000 (right). Perf criterion:

RRE =‖X − AS‖F‖X‖F

0 5 10 1510

−6

10−4

10−2

100

CPU time (s)

RRE

0 5 10 1510

−5

100

CPU time (s)

RRE

0 5 10 1510

−4

10−3

10−2

10−1

100

CPU time (s)

RRE

In blue: the fast NeNMF method

In red: our proposed faster-than-fast randomized NeNMF

M. Puigt (Semi-)NMF: problem, applications, and trends 11

Page 30: Factorisation en matrices (semi-)non-négatives

Randomized strategies for NMFKey idea

Compress some matrices in the NMF problem in order to reduce its size

Compression based on Random Projections (preserving pairwisedistances)

Reconstruction of rank-15 matrices of size 500×500 (left), 5000×5000 (mid-dle) and 10000× 10000 (right). Perf criterion:

RRE =‖X − AS‖F‖X‖F

0 0.1 0.2 0.3 0.4 0.510

−5

100

0 5 10 1510

−5

100

CPU time (s)

RRE

0 5 10 1510

−4

10−3

10−2

10−1

100

CPU time (s)

RRE

In blue: the fast NeNMF method

In red: our proposed faster-than-fast randomized NeNMF

M. Puigt (Semi-)NMF: problem, applications, and trends 11

Page 31: Factorisation en matrices (semi-)non-négatives

Conclusion

NMF = powerful tool for signal, image processing and machine learning

Good expertise in LISIC, with methodological developments

Still many ideas and room for innovative work

Questions?

M. Puigt (Semi-)NMF: problem, applications, and trends 12