Using Load Tests to Automatically Compare the Subsystems of a Large Enterprise System

Haroon Malik, Bram Adams & Ahmed E. Hassan
Software Analysis and Intelligence Lab (SAIL), Queen's University, Kingston, Canada

Parminder Flora & Gilbert Hamann
Performance Engineering, Research In Motion, Waterloo, Canada



Page 2

Today's large-scale systems (LSS) are composed of many underlying subsystems.

These LSS grow rapidly in size to handle growing traffic, complex services and business-critical functionality.

Performance analysts face the challenge of dealing with performance bugs, as processing is spread across thousands of subsystems and millions of hardware nodes.

Page 3

LOAD TESTING

Page 4

LOAD TESTING

[Diagram: two load generators (Load Generator-1, Load Generator-2) drive the system under test; a monitoring tool writes performance counter logs into a performance repository.]

Page 5

CURRENT PRACTICE

1. Environment setup
2. Load test execution
3. Load test analysis
4. Report generation

Page 6

CHALLENGES…

Page 7

LARGE NUMBER OF PERFORMANCE COUNTERS

Page 8

LIMITED TIME

Page 9

RISK OF ERROR

2 + 2 = 5

Page 10

Automated Methodology Required

Page 11

METHODOLOGY

[Diagram: word clouds of raw data for three performance counters (PC-1, PC-2, PC-3); our methodology condenses this "lot of data" into a compact performance signature.]

Page 12

METHODOLOGY

[Diagram: the same reduction applied per subsystem (Database, Mail, Web), yielding one performance signature per subsystem.]

Page 13

METHODOLOGY

[Diagram: the database subsystem's signature counters (Commits/Sec, Writes/Sec, CPU Utilization, Database Cache % Hit) compared between the base-line test and Load Test 1; matching counters score near 1 (e.g., 1, 0.99), while a deviating counter scores 0.59.]
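To make the match/deviation numbers concrete: a minimal sketch of comparing two subsystem signatures, assuming cosine similarity as the match measure (the slide does not specify the exact measure) and hypothetical counter values.

```python
import numpy as np

def match_score(baseline_sig, test_sig):
    # Cosine similarity between two signature vectors: a value near 1
    # means the subsystem behaves as in the base-line test; a lower
    # value flags a performance deviation.
    a, b = np.asarray(baseline_sig, float), np.asarray(test_sig, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical signature values for the four database counters
# (Commits/Sec, Writes/Sec, CPU Utilization, Database Cache % Hit).
baseline    = [0.95, 0.93, 0.90, 0.97]
load_test_1 = [0.94, 0.92, 0.40, 0.96]  # CPU Utilization deviates

print(round(match_score(baseline, load_test_1), 2))
```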

Page 14

METHODOLOGY STEPS

1. Data Preparation
2. Counter Normalization
3. Dimension Reduction
4. Crafting Performance Signatures
5. Extracting Performance Deviations
6. Report Generation
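A minimal sketch of steps 2-4, assuming standard-score normalization and PCA for the dimension-reduction step (the slide names the step but not the technique), with a counter's importance taken as its largest absolute loading on the retained components; that last choice is an illustrative assumption.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def performance_signature(counter_log, variance=0.95):
    """counter_log: observations x counters matrix of raw counter values.
    Returns one importance value per counter (the performance signature)."""
    normalized = StandardScaler().fit_transform(counter_log)            # step 2
    pca = PCA(n_components=variance, svd_solver="full").fit(normalized) # step 3
    # Step 4: rate each counter by its maximum absolute loading across
    # the principal components retained to explain 95% of the variance.
    return np.abs(pca.components_).max(axis=0)

# Hypothetical log: 120 observations of 6 performance counters.
rng = np.random.default_rng(0)
print(np.round(performance_signature(rng.normal(size=(120, 6))), 2))
```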

Page 15

CASE STUDY

Page 16

MEASURING THE PERFORMANCE

[Diagram: base-line and Test 1 timelines divided into intervals t1-t6, marking deviations predicted (P) and deviations occurred (O).]

P ∩ O: deviations both predicted and occurred

Precision = |P ∩ O| / |P| = 1/4 = 0.25

Recall = |P ∩ O| / |O| = 1/3 = 0.33
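The same arithmetic as a worked example; which intervals fall in P and O is hypothetical, only the set sizes (|P| = 4, |O| = 3, |P ∩ O| = 1) match the slide.

```python
predicted = {"t1", "t2", "t4", "t6"}  # deviations predicted (P), |P| = 4
occurred  = {"t2", "t3", "t5"}        # deviations occurred  (O), |O| = 3

overlap = predicted & occurred        # P ∩ O = {"t2"}, so |P ∩ O| = 1
precision = len(overlap) / len(predicted)  # 1/4 = 0.25
recall    = len(overlap) / len(occurred)   # 1/3 = 0.33

print(f"precision={precision:.2f}, recall={recall:.2f}")
```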

Page 17

RESEARCH QUESTIONS

Can our methodology identify the subsystems of an LSS that show performance deviations relative to prior tests?

Can we save time by stopping unnecessary load tests early, by identifying performance deviations across the different subsystems of an LSS before the test completes?

How is the performance of our methodology affected by different sampling intervals?

Page 18

RQ-1

Can our methodology identify the subsystems of an LSS that show performance deviations relative to prior tests?

Page 19

APPROACH

4 load tests, 8 hours each
~700 performance counters each
Monitoring interval: 15 sec (1,922 instances)

Baseline test: 85% data reduction
Test 1: baseline test reproduction
Test 2: synthetic fault injection via mutation
Test 3: increased workload intensity (8X)
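Test 2 injects synthetic faults by mutating the data; the slide does not show the mutation operators, so here is one plausible operator as a sketch (scaling a counter over an observation window; the function and its parameters are hypothetical).

```python
import numpy as np

def inject_fault(counter_log, counter_idx, start, end, scale=3.0):
    # Mutate one counter over an observation window to synthesize a
    # fault, e.g., tripling Writes/Sec for a stretch of the test.
    mutated = np.array(counter_log, dtype=float)  # copy, leave input intact
    mutated[start:end, counter_idx] *= scale
    return mutated

# Hypothetical usage: scale counter 2 between observations 400 and 500.
rng = np.random.default_rng(1)
log = rng.normal(10.0, 1.0, size=(1922, 700))  # 1,922 instances x 700 counters
faulty = inject_fault(log, counter_idx=2, start=400, end=500)
```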

Page 20

[Charts: counter importance (y-axis, 0.8-1.0) for the top performance counters (x-axis) of each subsystem (Database, Web Server A, Web Server B, Application System), comparing the Base Line Test, Test A, the Synthesized Test, and the 8X Load test.]

Page 21

FINDINGS

Our methodology helps performance analysts identify subsystems with performance deviations relative to prior tests.

Subsystem       Test A   Synthesized   8X Load
Database        0.997    0.732         0.826
Web Server A    1.000    0.701         0.795
Web Server B    1.000    0.700         0.790
Application     1.000    0.623         0.681

Page 22

RQ-2

Can we save time by stopping unnecessary load tests early, by identifying performance deviations across the different subsystems of an LSS before the test completes?

Page 23

[Chart: % CPU utilization (35-80%) over roughly 1,000 observations at the 15-sec monitoring rate.]

Page 24

[Chart: % CPU utilization (35-80%) over 41 sampled observations.]

Page 25

APPROACH

[Chart: % CPU utilization (38-88%) over time (0-120 min) for the baseline and the load test, with a CPU stress spike at the 60th minute.]

Two load tests, 2 hours each
Monitoring rate: 15 sec
CPU stress on the database server at the 60th minute, for 15 sec
Test comparison: removed a 12% sample (10 min; 6% + 6%)
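The 15-second CPU stress could be produced by any stress tool; the slide does not say which one was used. A minimal, illustrative Python sketch that saturates all cores for a fixed duration:

```python
import multiprocessing
import time

def burn(stop_at):
    # Busy-loop until the deadline to keep one core at ~100% CPU.
    while time.time() < stop_at:
        pass

def cpu_stress(seconds=15):
    # Saturate every core for `seconds`, mimicking the injected stress.
    stop_at = time.time() + seconds
    workers = [multiprocessing.Process(target=burn, args=(stop_at,))
               for _ in range(multiprocessing.cpu_count())]
    for w in workers:
        w.start()
    for w in workers:
        w.join()

if __name__ == "__main__":
    cpu_stress(15)  # the case study stressed the CPU for 15 sec
```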


Page 31

[Charts: counter importance (y-axis, 0.8-1.0) for the database's top performance counters (x-axis), comparing the Base-Line Test and the Load Test using the first 30, 15, 10, and 5 minutes of data.]

Page 32

FINDINGS

Time (Observations)   Database
30 mins (120)         1
15 mins (60)          1
10 mins (40)          0.9893
5 mins (20)           0.8255

Early identification of deviations within 10 minutes, i.e., 40 observations.
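The early-identification idea can be phrased as a monitoring loop: re-craft the test's signature every few observations and stop the test once it deviates from the baseline. A sketch, where `craft_signature`, `match_score`, the 40-observation check interval, and the 0.95 threshold are assumptions carried over from the earlier sketches:

```python
def monitor_load_test(observations, craft_signature, match_score,
                      baseline_signature, check_every=40, threshold=0.95):
    # Stream counter observations; every `check_every` observations,
    # re-craft the test's signature and compare it against the baseline.
    seen = []
    for obs in observations:
        seen.append(obs)
        if len(seen) % check_every == 0:
            sig = craft_signature(seen)
            if match_score(baseline_signature, sig) < threshold:
                return len(seen)  # deviation found: stop the test early
    return None  # test ran to completion with no deviation
```

With the slide's numbers, a deviation would be flagged at 40 observations, i.e., 10 minutes into a 2-hour test.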

Page 33

RQ-3

How is the performance of our methodology affected by different sampling intervals?

Page 34

APPROACH

Two load tests, 2 hours each
Monitoring rate: 15 sec
Fault: stopped the load generators 10 times, for 15 sec each
Measured the performance of the methodology at different sampling intervals
30 min interval: 4 samples; 15 min interval: 8 samples

[Diagram: baseline and Load Test 1 timelines divided into 30-min and 15-min samples.]
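The sample counts follow directly from the 15-sec monitoring rate over a 2-hour test; a quick check:

```python
TEST_MINUTES = 120  # two-hour load test
MONITOR_SEC = 15    # one observation every 15 seconds

for interval in (30, 15, 10, 5):  # sampling interval in minutes
    obs_per_sample = interval * 60 // MONITOR_SEC
    samples = TEST_MINUTES // interval
    print(f"{interval}-min interval: {samples} samples of {obs_per_sample} observations")
```

This reproduces the figures on the next findings slide: 30 min gives 4 samples of 120 observations, 15 min gives 8 of 60, 10 min gives 12 of 40, and 5 min gives 24 of 20.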


Page 37

FINDINGS

Small samples yield high RECALL; large samples yield high PRECISION.

Test run              Database       Web Server A   Web Server B   Application    Average
Min   Obs   Samples   Recall  Prec   Recall  Prec   Recall  Prec   Recall  Prec   Recall  Prec
30    120   4         0.50    1.00   0.50    1.00   0.30    1.00   0.25    1.00   0.325   1.000
15    60    8         0.62    1.00   0.62    1.00   0.62    1.00   0.50    1.00   0.590   1.000
10    40    12        1.00    0.90   1.00    0.90   1.00    0.90   0.90    0.69   0.975   0.847
5     20    24        1.00    0.70   1.00    0.70   1.00    0.80   1.00    0.66   1.000   0.715
All   -     -         0.78    0.90   0.78    0.90   0.73    0.92   0.66    0.83   0.738   0.890

The methodology performs best at the 10-minute interval, with a good balance of recall and precision.
