Gestion efficace de Séries Temporelles en P2P Application à l'analyse technique et...

Preview:

Citation preview

Gestion efficace de Séries Temporelles en P2P

Application à l'analyse technique et l'étude des objets mobiles

G. Gardarin, B. Nguyen, L. Yeh, K. Zeitouni, B. Butnaru, I. Sandu-Popa

Laboratoire PRiSM – Université de Versailles Saint-Quentin

BDA’09 - Namur

Gardarin et al. -- BDA'09 2

Motivation

Technical Analysis (Economy) Determine buy / sell operations based on time

series calculations parameter tuning Empirical / Tuning : many simulations to be run Very large time series (quotes every 15 secs over

thousands of items, with years of data) Objective : delegate computing power and caching

in a P2P network Mobile objects

Compute aggregate queries over time series of sensor data (see paper for more details)

Gardarin et al. -- BDA'09 3

/!\ Time Series vs. Data Streams /!\

Time Series Persistent data queried on demand using complex queries. Historical type data. Size is an issue.Current commercial performance : under 1M per second

Data Stream Transient data queried by simple continuous

queries (event detection). Real Time oriented.

GBS 30445.860360101000

Gardarin et al. -- BDA'09 4

Contributions

Extensible TS model with functional operators compatible with XQuery 1.1

Efficient P2P techniques for XQuery 1.1 with special TS management

Current XQ 1.1 engines have very poor performance.

Gardarin et al. -- BDA'09 5

Outline

IntroductionTime Series ModelP2P TS computingExperimentsConclusion

Time Series Model

Gardarin et al. -- BDA'09 7

Date (ISO)

Value (xs:double)

TS Entry

ROSeS Model : an infinite vector

Date 2007-01-05 2007-01-08 … 2009-03-06

Value 5517.35 5518.59 … 2534.45

PX1-Close (CAC)

TS define a (metric) vector spaceTS3=TS1+s*TS2

Gardarin et al. -- BDA'09 8

Granularity

Some choice of semantics needs to be made in order to perform scale change.

Adopted semantics : Entry Date = interval start (included) Next entry date = interval end (excluded)

Need to define beginning of day with ms. precision

Gardarin et al. -- BDA'09 9

Null values

Unknown value (?)Undefined value (!)

Date 2007-01-05 2007-01-06 2007-01-08 2007-01-09 2009-03-06

Value 5517.35 ! 5518.59 ? 2534.45

??*

!*!

s

s

???

!?!

!!!

!!

??

val

val

Assume value ! for all dates preceding the first one.Management of “end” of TS needs ! value

Gardarin et al. -- BDA'09 10

Relational like operators

Filter Map

Gardarin et al. -- BDA'09 11

Union and Intersection

???

!?!

!!!

!!

!?

!21

val

val

valval

valvalval

???

??!

!!!

!

??

?21

valval

val

valval

valvalval

Gardarin et al. -- BDA'09 12

K-ary joins

JOINfun(S1, ... Sk) =

{[t, m] | [t, val1] in S1 and … [t, valk] in Sk and m = fun(val1, …valk)}

/!\ Must define behavior for null values.

Gardarin et al. -- BDA'09 13

Some window functions

w

tSwtStSMAVGw

w

tStSMAVG w

t

wtw

][][]1,[*1][],[

Moving Average

][][

][*100][

iLiG

iGiR

ww

ww

Relative Strength Index (RSI) Moving Average Convergence/Divergence (MACD)

)()(

)(

26129 SMAVGSMAVGXAVG

SMACD

In general : “Constant” or Linear complexity in wLinear in t

Gardarin et al. -- BDA'09 14

SELL = SEL>1.1(MAVG26(S) /MAVG12(S)))

BUY = SEL>0(XAVG9(MAVG12(S) - MAVG26(S)))

Some buy/sell rules

Gardarin et al. -- BDA'09 15

TS/XML : a practical exchange format

TS Schema

XQ 1.1 mavg implementation (naïve)

/!\ Limited maths functions

Benefit from the expressive power of XQ to write rules !

Gardarin et al. -- BDA'09 16

Preliminary resultsN W

Our JAVA

System X

1000 10 <1 16

1000 50 <1 45

1000 100 <1 91

2000 10 <1 28

2000 50 <1 90

2000 100 <1 178

4000 10 <1 53

4000 50 <1 179

4000 100 <1 357

16000 10 4 212

16000 50 4 765

16000 100 4 1404

100000 10 25 1914

100000 50 25 5026

100000 100 25 9251

500000 10 128 9862

500000 50 130 28259

500000 100 129 49347

Xeon-X5450@3.00GHzJava 6, 1GB Heap

mavg

MACD (java)

Gardarin et al. -- BDA'09 17

XQ Problems

Important overhead with XML type-checking and structure (limit to xs:double)

Limited Maths functionsTS are in fact manipulated in let clauses

Enhance our XQ processor with non-XQ functions on XML-TS data that respect our schema

P2P TS Computing

Gardarin et al. -- BDA'09 19

How can we achieve scalability ?

Observation : Many runs of a given user share intermediate

results Many users share intermediate results

Divide computation cost by nDivide disk read/write time by nDivide memory usage by n

Gardarin et al. -- BDA'09 20

TS Distribution – horizontal partitioning

CHORD DHT

/!\ Overlap is necessary /!\Choice limits window size

N/K

(N)

Gardarin et al. -- BDA'09 21

DHT

Two sorts of “key/value” pairs Key : TSName list of slices IDs (numbered) Key : TSName+SliceID peer containing the slice

Connect/Disconnect is managed by the DHT

Computation algorithm P1 wants to compute Q1 P1 gets the location of all TS Slices needed Ship query to peers Compute query on peer (if possible) Transfer results to P1

Gardarin et al. -- BDA'09 22

(Naïve) Caching

JOIN(MAVG(CAC40,10), SCALE(RSI(CAC40, 14), 100), SUM)

[7, 8, 9]

Limitation : equivalent expressionsOpen issue : how to choose peer ?

Experiments

Gardarin et al. -- BDA'09 24

XQ2P Prototype

98% XQuery 1.0 compliant database (java)

XQ 1.1 window functionalitiesOptimized external TS functionsP2P storage and computing

http://cassiopee.prism.uvsq.fr/XQ2P/

Gardarin et al. -- BDA'09 25

P2PTester infrastructure

Gardarin et al. -- BDA'09 26

Experimental evaluation using P2PTester (4 machines)

P TINDEX TR TP TQ

TNETTP2P

8 7,1 56,8 4473 <1 400 4930

16 6,3 100,8 2176 <1 400 2677

32 7,4 236,8 1106 <1 400 1743

64 8,4 537,6 580 <1 400 1518

128 8,9 1139,2 286 <1 400 1825

256 9,7 2483,2 140 <1 400 3023

NETQPRPP TTTTT 2

NETPSC TTT /

Gardarin et al. -- BDA'09 27

Relative gain simulation (no caching)

Conclusion

Already efficient, still lots to do…

Gardarin et al. -- BDA'09 29

Current / Future Work

TS Granularity operators“Enhanced” Caching (canonical form

transformation)Date-based join optimizationTS Distance computation, top-kXQ 1.1 window operator optimizationOther XQ2P improvements

Merci !

Recommended