Soumission du manuscrit prévue pour fin 2013 « Exploration temps …pagesperso.univ-brest.fr/~ailliot/SlidesBerder/Saulquin.pdf · 2013-07-12 · I - Analyse temps-fréquence et

Saulquin Bertrand, ACRI-ST Antoine Mangin, ACRI-ST

Grégoire Mercier, Telecom Bretagne Ronan Fablet, Telecom Bretagne

"Time-frequency exploration of geophysical time series: avenues for practical use?"

Manuscript submission planned for late 2013

Time series analysis: what is at stake

Fields of activity concerned:

Signal unmixing => +++ Research, +++ Applications

Climate studies => +++ Research, + Applications

Feedback to sensors, optimization of monitoring networks => + Research, +++ Applications

Learning / Modelling / Forecasting => + Research, +++ Applications

I - Time-frequency analysis and signal unmixing

Saulquin B., Fablet R., Mangin A., Mercier G., Fanton d'Andon O. “Multi-scale event-based mining in geophysical time series: an application to the satellite-derived Sea Surface Temperature from 1985 to 2009”, submitted to IEEE in February 2013.

Observation:

The methods generally used to study time series (EOF, PCA) have two limitations:

- They require a stationary signal and a Gaussian distribution of the studied variable (roughly speaking, no extremes and no trend).

- The significance of a statistical parameter is often addressed poorly or not at all (significant correlation levels ...).

In this article we propose a new approach to time series, built on wavelet decompositions and based on:

- the extraction of events that are significant with respect to the local noise.

- the characterization of these events with synthetic indices.

- the exploitation of an event database with advanced statistical tools.

To illustrate the approach, we apply it to the sea surface temperature (SST) from 1985 to 2009.


The concept: representing time series by events that are significant with respect to a local auto-correlated noise.

[Figure labels: theoretical (Fourier) energy spectrum of a colored noise; Nino; energy significance threshold; wavelet transform]
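One common way to obtain an energy significance threshold against a colored background, given here only as a sketch, models the noise as an AR(1) (red-noise) process and scales its theoretical Fourier spectrum by a chi-square quantile, in the style of the Torrence and Compo wavelet significance test. The slide does not state that this exact test was used; the function names and the 95% level are assumptions made for illustration.

```python
import numpy as np


def lag1_autocorrelation(x):
    """Lag-1 autocorrelation of a series, used as the AR(1) coefficient."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.sum(x[1:] * x[:-1]) / np.sum(x * x))


def red_noise_spectrum(alpha, freqs):
    """Normalized theoretical Fourier power spectrum of AR(1) noise.

    freqs are in cycles per sample (0 to 0.5); the result equals 1
    everywhere for white noise (alpha = 0).
    """
    freqs = np.asarray(freqs, dtype=float)
    return (1.0 - alpha**2) / (
        1.0 + alpha**2 - 2.0 * alpha * np.cos(2.0 * np.pi * freqs)
    )


def significance_threshold(variance, alpha, freqs, level=0.95):
    """Power above which a coefficient is deemed significant (assumed test)."""
    # Quantile of a chi-square law with 2 degrees of freedom (closed form).
    chi2_quantile = -2.0 * np.log(1.0 - level)
    return variance * red_noise_spectrum(alpha, freqs) * chi2_quantile / 2.0
```

Because the coefficient `alpha` and the variance are estimated per time series, the threshold adapts to the local noise, which is the property the slide emphasizes.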

Each event is characterized by the following indicators:

- S, T, dS, dT.

- Its energy

- The axes of the ellipse (propagation information)

A 2D time series (of dimension n) is then represented by n_events * n_parameters values.

Advantages:

A better discretization of the significant information in time and in scale.

New tools to study the causal links between phenomena.

Drawback:

Increased complexity, which requires elaborate statistical tools to explore the databases (multiple parameters, complex parameter estimation).
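The event representation can be sketched as follows: threshold a time-scale power map, group the significant cells into connected regions, and summarize each region by a few indicators. This is a minimal illustration, not the authors' implementation. It assumes that S and T denote the scale and time of the power peak and that dS and dT denote the extents of the region; the `Event` class and `extract_events` function are hypothetical names, and the ellipse axes are left out.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Event:
    scale: float    # S: scale of the power peak (assumed meaning)
    time: float     # T: time of the power peak (assumed meaning)
    d_scale: float  # dS: extent of the region in scale (assumed meaning)
    d_time: float   # dT: extent of the region in time (assumed meaning)
    energy: float   # total power inside the region


def extract_events(power, threshold, scales, times):
    """Group cells with power above threshold into 4-connected events."""
    power = np.asarray(power, dtype=float)
    scales = np.asarray(scales, dtype=float)
    times = np.asarray(times, dtype=float)
    mask = power > threshold
    visited = np.zeros(mask.shape, dtype=bool)
    n_scales, n_times = mask.shape
    events = []
    for i0 in range(n_scales):
        for j0 in range(n_times):
            if not mask[i0, j0] or visited[i0, j0]:
                continue
            # Flood fill from this seed cell to collect one region.
            stack = [(i0, j0)]
            visited[i0, j0] = True
            cells = []
            while stack:
                i, j = stack.pop()
                cells.append((i, j))
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    a, b = i + di, j + dj
                    inside = 0 <= a < n_scales and 0 <= b < n_times
                    if inside and mask[a, b] and not visited[a, b]:
                        visited[a, b] = True
                        stack.append((a, b))
            rows = np.array([c[0] for c in cells])
            cols = np.array([c[1] for c in cells])
            values = power[rows, cols]
            peak = int(np.argmax(values))
            events.append(Event(
                scale=float(scales[rows[peak]]),
                time=float(times[cols[peak]]),
                d_scale=float(scales[rows.max()] - scales[rows.min()]),
                d_time=float(times[cols.max()] - times[cols.min()]),
                energy=float(values.sum()),
            ))
    return events
```

Each time series then contributes a short list of `Event` records, which is the n_events * n_parameters table mentioned above.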


A simple example of analysis of the event database: extraction of the characteristic scales of the temperature anomalies from 1989 to 2009:

[Figure panels: distribution in scale of 32,000 time series; automatic extraction of the characteristic scales (Gaussian mixture)]
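Extracting characteristic scales with a Gaussian mixture can be illustrated by a small one-dimensional EM fit on the event scales. This is a sketch under assumptions: the slide does not say how the mixture was fitted or on which transform of the scale, and `fit_gaussian_mixture_1d` is a hypothetical helper with a simple quantile-based initialization.

```python
import numpy as np


def fit_gaussian_mixture_1d(x, n_components, n_iter=200):
    """Fit a 1D Gaussian mixture by EM; returns weights, means, variances."""
    x = np.asarray(x, dtype=float)
    # Deterministic start: component means spread over the sample quantiles.
    means = np.quantile(x, (np.arange(n_components) + 0.5) / n_components)
    variances = np.full(n_components, np.var(x))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        log_pdf = -0.5 * (
            (x[:, None] - means) ** 2 / variances
            + np.log(2.0 * np.pi * variances)
        )
        log_resp = np.log(weights) + log_pdf
        log_resp -= log_resp.max(axis=1, keepdims=True)
        resp = np.exp(log_resp)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update weights, means and variances.
        nk = resp.sum(axis=0)
        weights = nk / x.size
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + 1e-9
    order = np.argsort(means)
    return weights[order], means[order], variances[order]
```

The fitted means play the role of characteristic scales, and the weights indicate how many events fall in each scale category.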

Application I: analysis of the spatial distribution of the characteristic scales

Benefit of discretizing in scale

[Map labels: < 4 months; 3.36 y; MEI]

Application I: analysis of the spatial distribution of the characteristic scales

Benefit of discretizing in scale

[Map labels: 3.36 y; =1.41 y]

Application I: analysis of the temporal distribution of the characteristic scales

Benefit of discretizing in time

Application III: predicting El Niño

We identified a specific distribution of the density of the high-frequency SSTA events related to the ENSO regimes, showing an increase of 10% during El Niño periods. The PDF of the time shifts between the starting times of the high-frequency events and of the low-frequency events at the 3.36-year scale showed an increase of 20% in the number of high-frequency events 12 months in advance in the Pacific, and an increase of 13% 10 months in advance in the Indian Ocean. These three specific indexes underline the added value of the proposed event-based mining approach (ENSO is usually detected 2 months ahead).

Distributions of the time shifts between the starts of collocated high-frequency and low-frequency events. As the reference low-frequency event category, we consider the 3.36-year characteristic scale, i.e. a scale exhibited by both our analysis and the power spectrum of the ENSO signal.

Benefit of discretizing in time
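The time-shift distribution can be sketched with a few lines of NumPy: take every pair of high-frequency and low-frequency event starts at a location, keep the shifts within a chosen window, and histogram them. The function names, the window and the one-month bins are assumptions for illustration; the normalization used for the published PDFs is not given on the slide.

```python
import numpy as np


def start_time_shifts(high_freq_starts, low_freq_starts, max_lead):
    """Shifts (high-frequency start minus low-frequency start) within a window.

    A negative shift means the high-frequency event starts before the
    low-frequency one.
    """
    hf = np.asarray(high_freq_starts, dtype=float)
    lf = np.asarray(low_freq_starts, dtype=float)
    shifts = (hf[:, None] - lf[None, :]).ravel()
    return shifts[np.abs(shifts) <= max_lead]


def shift_histogram(shifts, max_lead, bin_width=1.0):
    """Counts of shifts per bin, from -max_lead to +max_lead."""
    edges = np.arange(-max_lead, max_lead + bin_width, bin_width)
    counts, edges = np.histogram(shifts, bins=edges)
    return counts, edges
```

For example, with starts expressed in months, `start_time_shifts([0, 10, 30], [12, 40], max_lead=24)` keeps the four pairs that are at most 24 months apart; an excess of counts at negative shifts would correspond to high-frequency events preceding the low-frequency one.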

Conclusions on the event-based data mining method.

Our approach relies on a normalized representation through the detection of significant time-frequency events, which accounts for the auto-correlation and noise level of each time series. This is regarded as a key feature for the considered geophysical time series, which exhibit high auto-correlation levels (typically from 0.3 to 0.8) and a large spatial variability of the variance.

Our main methodological contribution lies in the definition and detection of the significant elementary time-frequency events. It provides a simple and quantitative means to unmix the processes of interest and study their time/scale relationships.

The proposed event-based analysis enables a joint analysis of a large set of time series through the identification of event categories and associated characteristic scale ranges.

This is of key interest, as small-scale processes in time and space might affect large-scale ones in time and space, and vice versa. Frankignoul [6] showed with a simple model that short-scale wind forcing at local scale could introduce a large geographical scale in the SSTA.

Our event-based methodology opens new perspectives for the analysis of multivariate time series such as wind and SST, light and chlorophyll-a, or light, SST and chlorophyll-a. While we considered here the interaction between events of the same geophysical variable, this type of analysis could be applied to two or more different variables. The ability to track events among different time series is also a possible topic of research. Finally, the event-based detection could be used to remove the contribution of non-stationary signals such as ENSO, which can affect the long-term trend estimation [32] and correlation analysis.


II - Climate studies, feedback to sensors, optimization of monitoring networks

Saulquin B., Fablet R., Mangin A., Mercier G., Antoine D., Fanton d'Andon O. “Detection of linear trends in multi-sensor time series in presence of auto-correlated noise: application to the chlorophyll-a SeaWiFS and MERIS datasets and extrapolation to the incoming Sentinel 3 - OLCI mission”, accepted at JGR in March 2013.

What is a trend: a philosophical question?

Our definition: a trend is the slope estimated in the signal from which the stationary components (seasonal cycle) and, where possible, the non-stationary ones (events, El Niño ...) have been removed.

The linear model is too simple, but conservative!
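The definition above can be illustrated on a monthly series: subtract the mean seasonal cycle, then fit a straight line to the anomalies. This is a minimal sketch that assumes a gap-free monthly series starting in the first month of a cycle and handles only the stationary (seasonal) component; removing events such as El Niño is not covered, and the function names are hypothetical.

```python
import numpy as np


def remove_seasonal_cycle(monthly_values):
    """Subtract the monthly climatology (mean of each calendar month)."""
    x = np.asarray(monthly_values, dtype=float)
    months = np.arange(x.size) % 12
    climatology = np.array([x[months == m].mean() for m in range(12)])
    return x - climatology[months]


def linear_trend(values):
    """Least-squares slope (per time step) and intercept of a series."""
    t = np.arange(len(values), dtype=float)
    slope, intercept = np.polyfit(t, values, 1)
    return float(slope), float(intercept)
```

Note that the climatology absorbs a small part of the trend (each calendar month is averaged over all years), so the slope estimated on the anomalies is slightly damped for short records.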



Trend detection: the problem?

The detection of long-term trends in geophysical time series is a key issue in climate change studies. This detection is affected by many factors: the amplitude of the trend to be detected, the length of the available datasets, and the noise properties. Although the auto-correlation observed in geophysical time series does not bias the trend estimate, it affects the estimation of its uncertainty and consequently the ability to detect, or not, a significant trend. Ignoring the auto-correlation level typically leads to an over-detection of significant trends.

Satellite time series have been providing remote observations of the sea surface for several decades. Because satellite lifetimes are usually between 5 and 10 years, these time series do not cover the same period and are acquired by different sensors with different characteristics. These differences lead to unknown level shifts (biases) between the datasets, which affect the trend detection.

Estimating the level shift between the two time series with an inter-calibration procedure, prior to estimating the shared linear trend, is statistically relevant if one accounts for the uncertainty of the level shift in the variance of the trend estimate. Neglecting this uncertainty amounts to assuming a null shift.




Trend detection: case of a single-sensor dataset

The trend uncertainty:

(4)
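Equation (4) itself is not legible on this slide. As an illustration only, a widely used approximation from the trend-detection literature (Weatherhead et al., 1998) expresses the standard deviation of a linear trend estimated from monthly data with AR(1) noise; whether Eq. (4) has exactly this form is an assumption. The function names and the factor-of-2 significance rule below are likewise illustrative choices.

```python
import math


def trend_uncertainty(noise_std, phi, n_years):
    """Approximate standard deviation of the trend (units per year).

    noise_std: standard deviation of the monthly noise
    phi:       lag-1 auto-correlation of the noise (|phi| < 1)
    n_years:   length of the record in years
    """
    inflation = math.sqrt((1.0 + phi) / (1.0 - phi))
    return noise_std / n_years**1.5 * inflation


def is_significant(trend, sigma_trend, factor=2.0):
    """Trend deemed significant when it exceeds `factor` standard deviations."""
    return abs(trend) > factor * sigma_trend
```

With `phi = 0` the inflation term equals 1, which shows how ignoring a positive auto-correlation underestimates the uncertainty and therefore over-detects trends.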




Trend detection: application to the SeaWiFS (1998-2009) and MERIS (2002-2011) datasets



Trend detection: case of a multi-sensor dataset

Here the time t is always relative to the start of the first time series, which is taken as the reference. T0 is the starting time of the second time series, and n1 and n2 are the lengths of the first and second time series, respectively. μ and ω are the intercept and the linear trend shared by the two time series. δ is the unknown level shift of the second time series relative to the first one, assumed here to be constant in time. U = 1 for t >= T0 and U = 0 for t < T0. N1t and N2t are the auto-correlated noises of the two time series.
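The model equation did not survive on this slide. One form consistent with the definitions above is sketched below; the symbol Y for the observations and the index ranges are assumptions, and the published equation may differ in notation.

```latex
\begin{aligned}
Y_{1,t} &= \mu + \omega\, t + N_{1t}, & t &= 1, \dots, n_1, \\
Y_{2,t} &= \mu + \omega\, t + \delta\, U_t + N_{2t}, & t &= T_0, \dots, T_0 + n_2 - 1,
\end{aligned}
\qquad
U_t =
\begin{cases}
1, & t \ge T_0, \\
0, & t < T_0.
\end{cases}
```

In this form the trend ω is shared by both sensors, while δ absorbs the constant offset of the second sensor, so the uncertainty on δ propagates into the uncertainty on ω.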



Trend detection: application to the SeaWiFS & MERIS (1998-2011) datasets

The use of the joint chl-a SeaWiFS-MERIS dataset over the period 1998-2011 led to the detection of 60% of significant trends, compared to 41% for the SeaWiFS dataset only and 50% for the MERIS dataset only, contributing to a better characterization of region-specific patterns in the detected trends.

Global          2.83 x 10^-4    1.59 x 10^-1   1.0 x 10^-2    3.20 x 10^-3
Atlantic        8.27 x 10^-4   -1.59 x 10^-1   1.0 x 10^-2    4.60 x 10^-3
Pacific         7.27 x 10^-4   -7.49 x 10^-2   1.0 x 10^-2    2.80 x 10^-3
Indian Ocean   -1.40 x 10^-3   -1.14 x 10^-1   9.60 x 10^-3   2.00 x 10^-3



Trend detection: optimisation of an observation network

Effect of the time overlap or the gap time (in months) between two time series of 60 months on the trend uncertainty coefficient G (Eq. 9).

[Figure: two 60-month time series with a varying overlap or gap DT (the varying parameter)]

Figure 5: Effect of the length of the second time series on the trend uncertainty coefficient G (Eq. 9) with (a) a one-year overlap and (b) a one-year gap.

[Figure: a 60-month first time series and a second time series of varying length L (the varying parameter)]

Estimated duration of the Sentinel 3 - OLCI measurements needed to improve the joint SeaWiFS - MERIS detection of long-term linear trends, from simulations of the model (Eq. 9, see text for details)




Conclusions on the study of trends in geophysical time series

The two major statistical factors governing trend estimation and detection in a single-sensor time series are the auto-correlation and the variance of the noise. The estimated noise auto-correlation showed a latitudinal distribution, with a greater mean value of 0.35 at low latitudes, compared to 0.25 at greater latitudes. This difference leads to an increase of 16% in the uncertainty on the estimation of the same trend in these two areas.

When two time series are available, the trend detection depends on the uncertainty on the level shift between the datasets. In case of an overlap, the shift uncertainty is reduced. The use of the joint chl-a SeaWiFS-MERIS dataset over the period 1998-2011 led to the detection of 60% of significant trends, compared to 41% for the SeaWiFS dataset only and 50% for the MERIS dataset only, contributing to a better characterization of region-specific patterns in the detected trends.

Optimizing an observation network for long-term monitoring implies minimizing the effect of the unknown level shift by organizing time overlaps between successive missions. From our analysis, and for a noise auto-correlation level greater than 0.3, as observed on average for our dataset, an overlap of 12 months was found optimal to lower the uncertainty on the level shift and to minimize the uncertainty on the trend estimate from two time series of 60 months.




With no overlap between time series, the estimation of a potential level shift and its uncertainty is needed (**).

We estimated a mean value of 53 months for the needed Sentinel 3 - OLCI observations, with region-dependent fluctuations from 40 to 68 months. This simulation was carried out using an uncertainty level on the shift between OLCI and MERIS of the same magnitude as the one estimated between SeaWiFS and MERIS. These results are consistent with the expected lifetime of the Sentinel 3 - OLCI mission, and suggest that the analysis of the global long-term patterns should benefit from the joint analysis of the SeaWiFS, MERIS and Sentinel 3 - OLCI datasets.



III - Learning / Modelling / Forecasting

The last part of the thesis will estimate different modes of co-variation between geophysical variables (regressions), and then use these estimated modes to forecast geophysical quantities from surface observations only.

Observation:

- the dynamics linking the parameters are not constant over time or across geographical areas (e.g. Chl = f(PAR, SST, CHL)).

Solution (methodology under development ...):

Estimation of probabilistic mixture models of modes (regressions)

Conditional probability that a random variable belongs to a regression (geophysical dynamics ...), given X

For a mixture (θ), the conditional probability (to be maximized) becomes:

Prediction

Posterior probability that an individual belongs to mode k:
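The equations of this slide were not preserved. For orientation, the standard form of a mixture of K Gaussian linear regressions with constant mixing proportions is given below; the notation (π_k, β_k, σ_k, γ_k) is assumed here and may not match the slide, and the authors' model may use a different parameterization.

```latex
\begin{aligned}
p(y \mid x, \theta_k) &= \mathcal{N}\!\left(y;\ x^{\top}\beta_k,\ \sigma_k^{2}\right)
  && \text{(one regression mode)} \\
p(y \mid x, \theta) &= \sum_{k=1}^{K} \pi_k\, \mathcal{N}\!\left(y;\ x^{\top}\beta_k,\ \sigma_k^{2}\right)
  && \text{(mixture, to be maximized over } \theta\text{)} \\
\hat{y}(x) &= \sum_{k=1}^{K} \pi_k\, x^{\top}\beta_k
  && \text{(prediction)} \\
\gamma_k(x, y) &= \frac{\pi_k\, \mathcal{N}\!\left(y;\ x^{\top}\beta_k,\ \sigma_k^{2}\right)}
                      {\sum_{j=1}^{K} \pi_j\, \mathcal{N}\!\left(y;\ x^{\top}\beta_j,\ \sigma_j^{2}\right)}
  && \text{(posterior probability of mode } k\text{)}
\end{aligned}
```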


To be optimized:

- the number of classes (dynamics between the geophysical parameters)

- the order of the parameters (temporal recursion)

=> by maximizing the explained variance

To be interpreted:

- the different geophysical dynamics
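A mixture of regression modes is typically estimated with the EM algorithm. The sketch below fits K Gaussian linear regressions with constant mixing proportions; it is an illustration under assumptions, since the methodology is described as still under development. The function name and the initialization (binning the residuals of one global regression) are choices made here, and a temporal recursion would simply add lagged values as extra columns of `X`.

```python
import numpy as np


def fit_mixture_of_regressions(X, y, n_modes=2, n_iter=100):
    """EM fit of a mixture of Gaussian linear regressions.

    Returns mixing weights, coefficients (intercept first), noise
    variances, and the posterior probabilities of each mode per sample.
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(y.size), np.asarray(X, dtype=float)])
    n = y.size
    # Initialization (assumption): bin the residuals of one global fit.
    beta0, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta0
    edges = np.quantile(resid, np.linspace(0.0, 1.0, n_modes + 1)[1:-1])
    resp = np.eye(n_modes)[np.searchsorted(edges, resid)]
    for _ in range(n_iter):
        # M-step: weighted least squares for each mode.
        weights = resp.sum(axis=0) / n
        betas = np.empty((n_modes, X.shape[1]))
        variances = np.empty(n_modes)
        for k in range(n_modes):
            w = np.sqrt(resp[:, k])
            betas[k], *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
            r = y - X @ betas[k]
            variances[k] = (resp[:, k] * r**2).sum() / resp[:, k].sum() + 1e-9
        # E-step: posterior probability of each mode for each sample.
        mu = X @ betas.T
        log_p = np.log(weights) - 0.5 * (
            np.log(2.0 * np.pi * variances) + (y[:, None] - mu) ** 2 / variances
        )
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
    return weights, betas, variances, resp
```

The number of modes and the regression order would then be compared across fits, for instance with the explained variance criterion mentioned above.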


Example on the chlorophyll-a anomaly

Publication ... in progress, but the application remains to be defined, towards high resolution

To be continued ... act III ...

