Upload
judith-little
View
226
Download
7
Embed Size (px)
Citation preview
Gestion efficace de Séries Temporelles en P2P
Application à l'analyse technique et l'étude des objets mobiles
G. Gardarin, B. Nguyen, L. Yeh, K. Zeitouni, B. Butnaru, I. Sandu-Popa
Laboratoire PRiSM – Université de Versailles Saint-Quentin
BDA’09 - Namur
Gardarin et al. -- BDA'09 2
Motivation
Technical Analysis (Economy) Determine buy / sell operations based on time
series calculations parameter tuning Empirical / Tuning : many simulations to be run Very large time series (quotes every 15 secs over
thousands of items, with years of data) Objective : delegate computing power and caching
in a P2P network Mobile objects
Compute aggregate queries over time series of sensor data (see paper for more details)
Gardarin et al. -- BDA'09 3
/!\ Time Series vs. Data Streams /!\
Time Series Persistent data queried on demand using complex queries. Historical type data. Size is an issue.Current commercial performance : under 1M per second
Data Stream Transient data queried by simple continuous
queries (event detection). Real Time oriented.
GBS 30445.860360101000
Gardarin et al. -- BDA'09 4
Contributions
Extensible TS model with functional operators compatible with XQuery 1.1
Efficient P2P techniques for XQuery 1.1 with special TS management
Current XQ 1.1 engines have very poor performance.
Gardarin et al. -- BDA'09 5
Outline
IntroductionTime Series ModelP2P TS computingExperimentsConclusion
Time Series Model
Gardarin et al. -- BDA'09 7
Date (ISO)
Value (xs:double)
TS Entry
ROSeS Model : an infinite vector
Date 2007-01-05 2007-01-08 … 2009-03-06
Value 5517.35 5518.59 … 2534.45
PX1-Close (CAC)
TS define a (metric) vector spaceTS3=TS1+s*TS2
Gardarin et al. -- BDA'09 8
Granularity
Some choice of semantics needs to be made in order to perform scale change.
Adopted semantics : Entry Date = interval start (included) Next entry date = interval end (excluded)
Need to define beginning of day with ms. precision
Gardarin et al. -- BDA'09 9
Null values
Unknown value (?)Undefined value (!)
Date 2007-01-05 2007-01-06 2007-01-08 2007-01-09 2009-03-06
Value 5517.35 ! 5518.59 ? 2534.45
??*
!*!
s
s
???
!?!
!!!
!!
??
val
val
Assume value ! for all dates preceding the first one.Management of “end” of TS needs ! value
Gardarin et al. -- BDA'09 10
Relational like operators
Filter Map
Gardarin et al. -- BDA'09 11
Union and Intersection
???
!?!
!!!
!!
!?
!21
val
val
valval
valvalval
???
??!
!!!
!
??
?21
valval
val
valval
valvalval
Gardarin et al. -- BDA'09 12
K-ary joins
JOINfun(S1, ... Sk) =
{[t, m] | [t, val1] in S1 and … [t, valk] in Sk and m = fun(val1, …valk)}
/!\ Must define behavior for null values.
Gardarin et al. -- BDA'09 13
Some window functions
w
tSwtStSMAVGw
w
tStSMAVG w
t
wtw
][][]1,[*1][],[
Moving Average
][][
][*100][
iLiG
iGiR
ww
ww
Relative Strength Index (RSI) Moving Average Convergence/Divergence (MACD)
)()(
)(
26129 SMAVGSMAVGXAVG
SMACD
In general : “Constant” or Linear complexity in wLinear in t
Gardarin et al. -- BDA'09 14
SELL = SEL>1.1(MAVG26(S) /MAVG12(S)))
BUY = SEL>0(XAVG9(MAVG12(S) - MAVG26(S)))
Some buy/sell rules
Gardarin et al. -- BDA'09 15
TS/XML : a practical exchange format
TS Schema
XQ 1.1 mavg implementation (naïve)
/!\ Limited maths functions
Benefit from the expressive power of XQ to write rules !
Gardarin et al. -- BDA'09 16
Preliminary resultsN W
Our JAVA
System X
1000 10 <1 16
1000 50 <1 45
1000 100 <1 91
2000 10 <1 28
2000 50 <1 90
2000 100 <1 178
4000 10 <1 53
4000 50 <1 179
4000 100 <1 357
16000 10 4 212
16000 50 4 765
16000 100 4 1404
100000 10 25 1914
100000 50 25 5026
100000 100 25 9251
500000 10 128 9862
500000 50 130 28259
500000 100 129 49347
[email protected] 6, 1GB Heap
mavg
MACD (java)
Gardarin et al. -- BDA'09 17
XQ Problems
Important overhead with XML type-checking and structure (limit to xs:double)
Limited Maths functionsTS are in fact manipulated in let clauses
Enhance our XQ processor with non-XQ functions on XML-TS data that respect our schema
P2P TS Computing
Gardarin et al. -- BDA'09 19
How can we achieve scalability ?
Observation : Many runs of a given user share intermediate
results Many users share intermediate results
Divide computation cost by nDivide disk read/write time by nDivide memory usage by n
Gardarin et al. -- BDA'09 20
TS Distribution – horizontal partitioning
CHORD DHT
/!\ Overlap is necessary /!\Choice limits window size
N/K
(N)
Gardarin et al. -- BDA'09 21
DHT
Two sorts of “key/value” pairs Key : TSName list of slices IDs (numbered) Key : TSName+SliceID peer containing the slice
Connect/Disconnect is managed by the DHT
Computation algorithm P1 wants to compute Q1 P1 gets the location of all TS Slices needed Ship query to peers Compute query on peer (if possible) Transfer results to P1
Gardarin et al. -- BDA'09 22
(Naïve) Caching
JOIN(MAVG(CAC40,10), SCALE(RSI(CAC40, 14), 100), SUM)
[7, 8, 9]
Limitation : equivalent expressionsOpen issue : how to choose peer ?
Experiments
Gardarin et al. -- BDA'09 24
XQ2P Prototype
98% XQuery 1.0 compliant database (java)
XQ 1.1 window functionalitiesOptimized external TS functionsP2P storage and computing
http://cassiopee.prism.uvsq.fr/XQ2P/
Gardarin et al. -- BDA'09 25
P2PTester infrastructure
Gardarin et al. -- BDA'09 26
Experimental evaluation using P2PTester (4 machines)
P TINDEX TR TP TQ
TNETTP2P
8 7,1 56,8 4473 <1 400 4930
16 6,3 100,8 2176 <1 400 2677
32 7,4 236,8 1106 <1 400 1743
64 8,4 537,6 580 <1 400 1518
128 8,9 1139,2 286 <1 400 1825
256 9,7 2483,2 140 <1 400 3023
NETQPRPP TTTTT 2
NETPSC TTT /
Gardarin et al. -- BDA'09 27
Relative gain simulation (no caching)
Conclusion
Already efficient, still lots to do…
Gardarin et al. -- BDA'09 29
Current / Future Work
TS Granularity operators“Enhanced” Caching (canonical form
transformation)Date-based join optimizationTS Distance computation, top-kXQ 1.1 window operator optimizationOther XQ2P improvements
Merci !