Fakultät für Elektrotechnik und Informationstechnik
Technische Universität München

Memory Efficient Signature Matching in Deep Packet Inspection Applications at Line Rates

Shiva Shankar Subramanian

Complete reprint of the dissertation approved by the Fakultät für Elektrotechnik und Informationstechnik of the Technische Universität München for the award of the academic degree of Doktor-Ingenieur (Dr.-Ing.).

Chair: Prof. Dr.-Ing. Georg Sigl
Examiners of the dissertation:
1. Prof. Dr. sc. techn. Andreas Herkersdorf
2. Prof. Dr.-Ing. Ulf Schlichtmann

The dissertation was submitted to the Technische Universität München on 19.12.2018 and accepted by the Fakultät für Elektrotechnik und Informationstechnik on 19.08.2019.
Abstract
Deep Packet Inspection (DPI) is the process of inspecting all the headers and the payload in a network packet. Various content-aware networking applications such as intrusion detection/prevention, load balancing, copyright enforcement and content-aware Quality-of-Service (QoS) drive the need for performing DPI in next-generation networks. Comparing the payload bytes against the signatures, which are defined through string and regular expression patterns, is the most time-critical function in DPI. It is therefore essential to accelerate the signature matching function to perform DPI at multi-gigabit rates (∼10 Gbps) in modern embedded network processors.
The signatures or patterns which are used for payload inspection are represented through either the Non-Deterministic Finite Automaton (NFA) or the Deterministic Finite Automaton (DFA). The DFA is generally preferred to represent the signatures, especially for high-speed signature matching applications, as it is highly processing efficient in comparison to the NFA. On the other hand, the DFA is highly storage inefficient due to the presence of redundant state transitions and the exponential state blow-up corresponding to certain signature sets. So, the DFA is generally compressed before being stored in the memory, while the payload bytes are actually compared against the compressed DFA.
The transition compression algorithms play a significant role in defining the memory footprint of the compressed DFA, as well as in determining the rate at which the signature matching function can be performed. The state-of-the-art transition compression algorithms either focus on effectively compressing the DFA [1, 2, 3] or on compressing the DFA in such a way that the signature matching can be performed at multi-gigabit rates [4, 5], but not both. Addressing this shortcoming, two transition compression methods, the MSBT and the LSCT, are proposed and evaluated in this dissertation, which can effectively compress the DFA as well as perform the signature matching function at about 10 Gbps. The proposed methods efficiently compress the redundant transitions through a combination of bitmaps and bitmasks. Additionally, the linearity in the arrangement of state transitions in the DFA is further leveraged to compress the redundant bitmasks in the compressed DFA. The MSBT and the LSCT are capable of achieving transition compression rates of over 98%, primarily due to the introduction of bitmasks, which is 4-5% higher than the state-of-the-art bitmap based compression algorithms. The improvement in the transition compression rates and the bitmask compression reduces the memory footprint of the compressed DFA by ∼70% in comparison to the state-of-the-art transition compression algorithms.
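The basic bitmap idea underlying such transition compression can be sketched in a few lines. The following simplified Python illustration is not the MSBT or LSCT algorithms themselves, and the function names are chosen only for illustration: a per-state bitmap marks each character whose next state differs from that of the previous character, so only one transition per run of identical next states is stored, and a population count over the bitmap recovers the original entry.

```python
def compress_state(transitions):
    """Bitmap-compress one DFA state's next-state row.

    A bit is set for character c when its next state differs from that
    of character c-1; only those "unique" transitions are retained.
    """
    bitmap = [0] * len(transitions)
    unique = []
    for c, nxt in enumerate(transitions):
        if c == 0 or nxt != transitions[c - 1]:
            bitmap[c] = 1
            unique.append(nxt)
    return bitmap, unique


def lookup(bitmap, unique, c):
    """Decompress one entry: the population count of bitmap[0..c]
    gives the index of the matching transition in the unique list."""
    idx = sum(bitmap[: c + 1]) - 1
    return unique[idx]
```

For example, the row [5, 5, 5, 7, 7, 5] compresses to the bitmap [1, 0, 0, 1, 0, 1] with the unique transition list [5, 7, 5], and every original entry is recoverable through `lookup`. Redundancy within and across such bitmaps is what the secondary bitmask layer proposed in this dissertation exploits further.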
A bitmap based signature matching engine called BiSME is proposed and implemented which performs the decompression corresponding to the MSBT. BiSME encompasses two efficient storage architectures, the Packed Storage Methodology and the Shared Memory Methodology, to effectively store the compressed DFA in the on-chip memories in a configurable manner without resulting in memory wastage. The Bitmask Optimized BiSME (BOBiSME) extends the BiSME architecture to support bitmask compression, further reducing the area requirements of BiSME. The BiSME and the BOBiSME were effectively pipelined to perform the signature matching function at 9.3 Gbps and 10.6 Gbps, respectively. The signature matching engines were designed and synthesized on the TSMC 28 nm technology node and are capable of storing a maximum of 1000 string signatures. The BiSME and BOBiSME require 1.43 mm² and 1.2 mm² of silicon area and consume only 155 mW and 170 mW of power, respectively, making them area-efficient and power-efficient hardware coprocessor architectures. The BiSME and the BOBiSME architectures were prototyped on the Cadence Palladium platform, and network traffic traces of over 2 GB with different traffic characteristics were injected to validate the hardware implementation.
Acknowledgments
This dissertation would not have been possible without the support
and encouragement from a number of people to whom I would like to
express my gratitude from the bottom of my heart.
First and foremost, I would like to thank Prof. Dr. sc. techn.
Andreas Herkersdorf for agreeing to be my doctoral advisor. The
profound technical discussions which I had with him greatly helped
me to structurally formulate my research problems and propose
innovative solutions during my research work. His extensive
experience in working with industry-based research programs enabled
me to not only make quality contributions to the academic
community, but also to conceive effective solutions which could be
easily adopted by the industry. Additionally, I would like to thank
him for accommodating me as a visiting researcher in his institute,
which also allowed me to experience the research environment in
TUM.
I would like to thank Dr. Pinxing Lin from Intel Technology Asia
Pte. Ltd. for accepting to be my industry supervisor for the
research program. He has always been the go-to person during all
the challenging times and I would like to thank him for all the
great discussions which he has had with me over these times. His calm demeanor always put me at ease, a quality which I have always admired in him.
I would also like to thank Dr.-Ing. Thomas Wild for his technical
insights during various stages of my research work. I really
enjoyed the discussions which I had with him and I really
appreciate his prompt and timely feedback over the course of the
research project.
I would like to express my gratitude to Mr. Mario Traeber for
believing in me and giving me an opportunity to be the first
candidate for the pioneering Industry PhD Program (IPP). I would
like to thank Ms. Ko Kah Goh and Mario for the enormous effort
which they put in to define the organization of this program. I
would also like to thank Mr. Bing Tao Xu, for supporting my career
aspirations and allowing me to proceed with the IPP. My sincere
thanks to Mr. Daniel Artusi for his continuous moral support
throughout the course of the IPP. Finally, I would like to thank
Intel Technology Asia Pte. Ltd., Singapore for funding this
research program and providing me with an opportunity to work on such a
great research topic.
I would like to acknowledge the technical contribution of certain
members within Intel without whose help this work would not have
been possible. My thanks to Roshini John who setup the software
simulations on automata models which helped me to get a deeper
understanding on automata based signature matching. My thanks to
Hariharan for helping me to prepare the synthesis setup for the
hardware implementation of the signature matching engine which was
developed in this research work. My thanks to Anmol Prakash
Surhonne, Nihar Sriram and Sagar Paramesh for their support in
the
preparation of the evaluation setup to verify the hardware
implementation. Additionally, I would like to thank all my
colleagues in Intel Technology Asia Pte. Ltd who have helped me
over the course of this program. My special thanks to Abishek Godaa
for all the thought-provoking discussions which he has had with me.
Many thanks to Jyothi, Kavitha, Dharmesh, Seshagiri, Giridhar,
Jincy, Rekha, Suresh Nagaraj, Suresh Kumar for being great friends
and colleagues at work. Many thanks to all my colleagues at the Institute for Integrated Systems at TUM who made my stay memorable in Munich. In particular, my special thanks to Michael Vonbun for all
his help during my stay in Munich and during the course of the
preparation of this dissertation. I would also like to thank Mrs.
Doris Zeller, Mrs. Draga Verena and Mrs. Gabriele Spohrle for all
their secretarial support during the course of the program.
Last but not least, I would like to thank my family for their
unconditional love and affection which gave me the strength to take
up such a monumental journey. I would also like to thank my wife
Varsha Sharma for her love. Her support and encouragement were, in the end, what made this dissertation possible. Knowing that these
words are wholly inadequate, I express my deepest thanks to them
for bearing with me, for being so constant and supportive, and for
giving me their faith and trust. It is to them that I dedicate this
dissertation.
Contents
List of Figures
List of Tables
1 Introduction
  1.1 Motivation
  1.2 Introduction to Deep Packet Inspection
    1.2.1 Packet Inspection Methodologies - Overview
    1.2.2 Steps in Deep Packet Inspection
    1.2.3 Signature Matching - Requirements & Challenges
  1.3 Thesis Contributions
  1.4 Dissertation Organization

2 State-of-the-Art
  2.1 String Based Signature Matching Engines
  2.2 Automata Based Approaches
    2.2.1 Introduction to NFA & DFA
    2.2.2 Automata Based Signature Matching - Complexity Analysis
  2.3 DFA Compression
    2.3.1 State Compression
      2.3.1.1 State Explosion Problem
      2.3.1.2 State Compression Solutions
    2.3.2 Transition Compression
      2.3.2.1 Software Oriented Algorithms
      2.3.2.2 Hardware Oriented Algorithms
      2.3.2.3 Alphabet Compression
    2.3.3 Summary of DFA Compression
  2.4 Line Rate Signature Matching Engine Implementations
    2.4.1 FPGA - Logic Based Implementations
    2.4.2 GPU Based Signature Matching Engines
    2.4.3 Hardware Accelerators - ASIC
      2.4.3.1 Automata Storage - TCAM
  2.5 State-of-the-Art Summary

3 Bitmask: A Secondary Indexing Layer for Bitmap based Transition Compression
  3.1 Member State Bitmask Technique
    3.1.1 MSBT - An Example
    3.1.2 Compressed DFA Organization
    3.1.3 Decompression
      3.1.3.2 Hardware Decompression Engine for MSBT - Logical Block Level Description
    3.1.4 Summary
    3.2.1 Leader Transition Bitmask
    3.2.2 Compressed DFA Organization - LSCT
    3.2.3 Decompression
      3.2.3.2 Hardware Decompression Engine for LSCT - Logical Block Level Description
    3.2.4 Summary
    3.3.3 Estimated Memory Usage
    3.3.4 Functional Evaluation - Software Model
  3.4 Conclusion

4 Memory Footprint Optimizations - MSBT & LSCT
  4.1 Motivation
  4.2 Improved Transition Compression through State Grouping
    4.2.1 Background
      4.2.2.1 The Divide Step
      4.2.2.2 State Reorganization
      4.2.2.4 Optimal Leader State Identification
    4.2.3 Complexity Analysis Comparison
    4.2.4 Experimental Evaluation
    4.2.5 Discussion & Summary
  4.3 Alphabet Compression
    4.3.1 Combining Alphabet Compression with MSBT & LSCT
    4.3.2 Experimental Evaluation
      4.3.2.1 Transition Compression Rate
      4.3.2.2 Estimated Memory Usage
    4.3.3 Discussion & Summary
  4.4 Bitmask Compression
    4.4.1 Background
    4.4.2 Bitmask Optimized Member State Bitmask Technique
      4.4.2.1 Bitmask Compression
      4.4.2.2 Memory Organization
      4.4.2.3 Transition Decompression
      4.4.2.4 Hardware Engine for Decompression - Logical Block Level Description
    4.4.3 Experimental Evaluation
      4.4.3.1 Estimated Memory Usage
      4.4.3.2 Functional Evaluation - Software Model
    4.4.4 Discussion & Summary
  4.5 Conclusion

5 Hardware Coprocessors for Signature Matching
  5.1 Overview
  5.2 Bitmask Storage
    5.2.1 Requirements
    5.2.2 Packed Storage Methodology
      5.2.2.1 Split Memory Implementation
      5.2.2.2 Bitmask Extraction
    5.2.3 Summary
  5.3 Compressed Transition Storage
    5.3.1 Requirements
    5.3.2 Shared Memory Methodology
      5.3.2.1 Transition Memory Access
    5.3.3 Discussion & Summary
  5.4 BiSME - Internal Architecture
    5.4.1 Memory Shell
    5.4.2 Address Decoder & Memory Access Multiplexer
    5.4.3 Decompression Engine
    5.4.4 BOBiSME - Modified BiSME to support Bitmask Compression
    5.4.5 Throughput
  5.5 Deep Packet Inspection Accelerator
    5.5.1 DPIA Interfaces
    5.5.2 DPIA Internal Architecture
    5.5.3 Network Data Management Engine
      5.5.3.1 Configurable Stream Mapping
  5.6 Experimental Evaluation & Discussion
    5.6.1 Synthesis Results
    5.6.2 Signature Capacity
      5.6.2.1 BiSME
      5.6.2.2 BOBiSME
    5.6.3 Hardware Implementation Validation
      5.6.3.1 Traffic Generation
      5.6.3.2 Internal Architecture of the Evaluation Setup
      5.6.3.3 Results & Discussion
  5.7 Summary
List of Figures
1.1 A smart home which has many IoT devices connected to it (image downloaded from [6])
1.2 An overview of various packet inspection methodologies
1.3 Overview of the various steps to be performed in deep packet inspection
2.1 Classification of various signature matching engines proposed in the literature
2.2 A signature set with 2 signatures abc, def.*12 represented as (a) NFA and (c) DFA. The states traversed during (b) NFA and (d) DFA based signature matching
2.3 (a) State explosion scenario 1 (b) state explosion scenario 2 (c) state explosion scenario 3
2.4 (a) A DFA with state transitions for 5 states and 4 characters (b) The compressed D2FA representation of the DFA (c) δFA representation of the DFA
2.5 A signature a[b-d]ef represented as a DFA and through RCDFA
2.6 (a) A sequence of 8 transitions within a state (b) The bitmap corresponding to the state transitions to identify if a state transition is compressed (c) Compressed state transition representation after bitmap based compression stored in a unique transition list (d) Transition decompression - Example
2.7 Example to explain the FEACAN transition compression (a) An uncompressed DFA with 8 states and state transitions for 8 characters for each state (b) The compressed DFA after bitmap based intra-state transition compression (c) The DFA states grouped into subsets of states after intra-state compression (d) The compressed DFA after inter-state transition compression
2.8 (a) An uncompressed DFA with 8 states and state transitions for 8 characters for each state (b) The compressed DFA after RCDFA (c) The bitmaps across the vertical state axis (d) Unique bitmap after being combined
2.9 (a) Bitmaps corresponding to character 'd' and 'f' combined (b) Additional transitions stored corresponding to the unique transition list for character 'f'
2.10 Transition compression through Alphabet Compression in a DFA
3.1 (a) Original DFA before compression (b) The DFA after bitmap based intra-state transition compression (c) The DFA states grouped into subset of states (d) Encoded state representation of the DFA states (e) The DFA after inter-state transition compression (f) The MTB and the cumulative sum of transitions after inter-state compression (g) Compressed DFA generated after FEACAN
3.2 The compressed transitions stored in the Leader and Member Transition Tables while the control information is stored in the Address Mapping Table and Member Bitmask Table
3.3 Representation of the compressed DFA with respect to the MSBT decompression system
3.4 Functional description of the hardware based decompression architecture for MSBT
3.5 (a) Compressed DFA after the MSBT (b) Compressed DFA after LSCT (c) The LTB & the MTB for each state
3.6 The compressed DFA organized into the AMT, BT and the TT after LSCT transition compression
3.7 Representation of the compressed DFA with respect to the LSCT decompression system
3.8 Logical block level description of the hardware based decompression engine for the LSCT
3.9 Comparison of transition compression rate across various techniques
3.10 Comparison of transition and control memory usage across various techniques
3.11 Overview of the software based simulation environment to verify the transition compression
4.1 (a) An uncompressed DFA with 8 states and 8 characters (b) DFA after intra-state transition compression step in MSBT (c) The DFA split into 3 groups after the state clustering step (d) DFA after inter-state transition compression step in MSBT (e) State 7 merged with G1 as it shares the same bitmap defying the constraint with transition threshold (f) More transitions compressed when inter-state transition compression is performed after merging G1 and G2
4.2 Example of DFA states clustered into a set of 5 groups using the divide step
4.3 Example of state reorganization step performed on the states after the divide step
4.4 State 9 merged with G2 as part of conquer step
4.5 States 5 and 8 swapped in G3 as part of optimal leader state identification
4.6 Percentage reduction in the Leader and Member transitions between FEACAN and DC grouping in MSBT, T=80%, B=256
4.7 Estimated on-chip SRAM memory usage to store the compressed DFA
4.8 (a-c) MSBT transition compression in a DFA without Alphabet Compression (d-f) MSBT transition compression in a DFA after Alphabet Compression
4.9 Example showing the LSCT implemented after Alphabet Compression
4.10 Estimated on-chip SRAM memory usage to store the compressed DFA
4.11 Example of redundant bitmasks generated during MSBT transition compression
4.12 Compressing the MTBs to reduce the storage requirements of bitmasks
4.13 Organization of the compressed DFA between the control and the data memories
4.14 Compressed transition fetch from the MTT when the unique bitmask bit for the member state is 1
4.15 Compressed transition fetch from the MTT when the unique bitmask bit for the member state is 0
4.16 Functional description of the hardware accelerator to perform the transition decompression
4.17 The total number of MTBs generated before and after Bitmask Compression
4.18 Comparison of estimated on-chip SRAM memory usage to store the compressed DFA before and after bitmask compression
4.19 Comparison of the control memory usage between MSBT and BOMSBT
5.1 Overview of the Deep Packet Inspection Accelerator
5.2 MTB width variation across signature sets and within a signature set
5.3 Example of how the bitmasks are stored using the "Packed Storage Methodology"
5.4 Packed Storage Methodology - split memory storage implementation
5.5 Packed Storage Methodology - split memory implementation data reorganization
5.6 Extracting the bitmask from the data fetched from the memory
5.7 Hardware logic used to extract the bitmask from the memory
5.8 Shared memory architecture with state transitions flexibly stored in the LTT, the MTT and the shared memories
5.9 Overview of memory block allocation during compile time
5.10 Overview of the transition memory access using the shared memory methodology
5.11 (a) Internal architecture of the Signature Matching Engine (b) SRAM interface description
5.12 (a) DFA state encoding (b) Organization of the compressed transitions in the memory (c) Table to store the MTB and the cumulative sum of transitions (d) Table to store the base addresses and other control information
5.13 Details of the byte stream interface and signature match output interface
5.14 (a) DFA state encoding (b) Organization of the compressed transitions in the memory (c) Member Bitmask Table to store the MTB and the cumulative sum of transitions (d) Table to store the base addresses and other control information
5.15 (a) ALS & LTBFS split into multiple pipeline stages in BiSME (b) ALS and LTBFS split into multiple pipeline stages in BOBiSME (c) Pipelined operation of BiSME with context based byte interleaving
5.16 Block level description of the Deep Packet Inspection Accelerator
5.17 Functional description of the programmable rule based stream to context mapping
5.18 The accepting states used as the hash index to identify the postprocessing function associated with a signature match
5.19 (a) DPIA with multiple instances of signature matching engine (b) Signature count scalability (c) Signature matching throughput scalability
5.20 A pictorial representation of the evaluation setup used to verify the SME hardware
5.21 Functional overview of traffic generation
5.22 Detailed block level internal architecture of the evaluation setup
5.23 Extraction of the payload bytes and injection into the signature matching engine
5.24 Signature matching results on the exact match signature set in BiSME for 10 passes
5.25 Signature matching results after 2.3 GB of controlled traffic inspection on the exact match signature set
5.26 Signature matching results after 2.3 GB of controlled traffic inspection on the bro217 signature set
5.27 Signature matching results after 1 GB of random traffic inspection on the bro217 signature set
5.28 Signature matching results after 2.3 GB of controlled traffic inspection on the exact match signature set
5.29 Signature matching results after 2.3 GB of controlled traffic inspection on the bro217 signature set
5.30 Signature matching results after 1 GB of random traffic inspection on the bro217 signature set
A.1 Example DFA to explain the trace generation mechanism
B.1 Example of the Accumulative Parallel Adder circuitry to perform the population count function
List of Tables
2.1 Processing and storage complexity associated with NFA & DFA
2.2 Comparison of various platforms with respect to signature matching engine implementation
3.1 Signature sets used for evaluation
3.2 Signature characteristics after bitmap compression and state grouping
3.3 Comparison of the average number of transitions in the member state after compression
3.4 Average number of transitions fetched before fetching the compressed state transition
3.5 Parameters used for memory usage estimation
3.6 Comparison of signature matching results across the compression methods across different PM values
4.1 Algorithmic complexity of various algorithms proposed in DC
4.2 Comparison of the number of groups and the average members per group between FEACAN & DC, B=256
4.3 Comparison of transition compression rate achieved in MSBT with FEACAN and DC state grouping, B=256
4.4 Comparison of transition compression achieved in LSCT with FEACAN and DC state grouping, B=256
4.5 Comparison of transition compression achieved in MSBT & LSCT when the maximum number of states in the groups, T=80%
4.6 Summary of the signature set before and after alphabet compression
4.7 Compressed transitions before and after alphabet compression
4.8 Comparison of signature matching results across the compression methods across different PM values
5.1 MTB address calculation example
5.2 Example values used for the parameters to explain the Shared Memory Methodology
5.3 Shared memory architecture - memory addressing
5.4 Synthesis area results
5.5 Comparison of BiSME against other hardware engines
5.6 Signature sets compiled into BiSME
5.7 Signature sets compiled into BOBiSME
5.8 Evaluation of BOBiSME string signature capacity
5.9 Source and destination MAC address combinations to identify a context
1 Introduction
1.1 Motivation
The evolution of the Internet of Things (IoT) and the possibility to connect numerous electronic devices through a multitude of communication technologies is revolutionizing the way in which we interact with these devices. The ability to remotely monitor and manage these devices enables systems to make data-driven decisions, saving time for people and businesses and improving the quality of our lives. According to a report from Accenture [7], the evolution of IoT devices is still in its nascent stages and the total number of connected devices is expected to reach about 40 billion by 2024. The growth in the number of IoT devices is spread across a multitude of environments such as industrial automation, wearables, transportation, infrastructure applications and home automation.
A wide range of IoT devices and applications are emerging for use in the home, including connected devices such as personal computers, mobile phones, tablets, smart televisions, gaming consoles, smart cameras and many other smart appliances, as shown in Figure 1.1. Though these devices bring additional convenience to our lives, their networking capability presents cybercriminals with an opportunity to exploit them [8]. Issues such as competitive cost, technical constraints and time-to-market pressure challenge the manufacturers to adequately design effective security features into these devices. Distributed Denial-of-Service (DDoS) attacks such as the Mirai botnet [9] are an example of the devastation that can be caused by cybercriminals who capitalize on the always-on network connectivity of these devices. Additionally, the sheer growth in the number of IoT devices kindles the attackers' interest in targeting them. The connected devices in the home network not only allow attackers to execute complex attacks from the compromised devices, but also give them access to precious user data within the devices and the other devices connected to the home network. Various studies have found that home appliances such as smart refrigerators or smart televisions are susceptible to compromise [10, 11], and the biggest worry is that the end users are completely unaware of it. Cisco's annual security report states that the patching rate for these IoT devices is minimal and in most cases the vulnerabilities are not patched at all [12]. So, securing the IoT devices in the home network is a major issue which has to be addressed before our homes can truly become Smart Homes.
The Residential Gateway Router (RGR) is the centralized networking hub in the connected home. The RGR has traditionally been the networking node which manages the network communication between the network service provider and the devices connected to the network. However, due to the explosion in the number of connected devices in the home network, the RGR now has to take on the additional responsibility of securing and managing the devices connected to it.

Figure 1.1: A smart home which has many IoT devices connected to it. (Image downloaded from [6]).

So, it is essential to implement Content-Aware Networking (CAN) services such as intrusion detection/prevention, load balancing, content-aware Quality-of-Service (QoS) and copyright enforcement in the RGR, in addition to enabling connectivity with the network service provider. In this way, the RGR becomes the central networking device in the home network through which the various connected devices can be effectively secured and managed.
A Network Processor (NP) is the heart of the RGR and is responsible for analyzing the network packets generated by the devices connected to the home network. The data communicated in a network packet is generally classified into the header and the payload. The payload consists of the actual application data exchanged by the network devices, while the header information in a packet helps to identify the next-hop device to which the packet is sent. In order to perform the packet forwarding function (switching/routing), the NP in the RGR generally inspects the packet header across the various layers specified as part of the Open Systems Interconnection (OSI) specification [13]. However, in order to perform the content-aware networking functions, it is essential to inspect the complete network packet, including the headers at the various layers and the content of the payload. So, it is essential to perform Deep Packet Inspection (DPI) in the NP, the technology which allows a network packet to be completely inspected to enable content-aware networking in the next-generation RGR.
A typical network processor used in the RGR consists of multiple wireline¹ and wireless network interfaces² through which it can enable communication between multiple connected devices. The incredible development in the wireline (2.5/5 Gbps Ethernet [17]) and wireless (802.11ax [18]) communication technologies used in the home networking ecosystem is allowing devices to communicate at multi-gigabit rates, even within the home network. Moreover, the deployment of services such as fiber-to-the-home (FTTH) [19, 20] mandates the network processors used in the RGR to perform packet processing at multi-gigabit rates with the network service provider [21]. As more applications begin to leverage the available bandwidth in the home network, it is crucial to perform DPI at multi-gigabit rates to process packets in a content-aware manner.
The next section provides a short introduction to various packet inspection methodologies and further describes the challenges associated with performing DPI.
1.2 Introduction to Deep Packet Inspection
1.2.1 Packet Inspection Methodologies - Overview
Figure 1.2 shows the various layers³ which are inspected as part of network packet processing. Based on the depth of the layers up to which a packet is inspected, the packet inspection methodologies are classified into Shallow Packet Inspection (SPI), Medium depth Packet Inspection (MPI) and Deep Packet Inspection (DPI) [22].
The SPI and the MPI methodologies primarily inspect the packet header alone. The difference between the two methodologies lies in the specific header layers which are inspected in a packet. Only the packet headers corresponding to layers 2, 3 and 4 are inspected as part of the SPI, while the header information across all the header layers is inspected as part of the MPI. Various rules are defined to inspect specific portions of the headers in a packet, and the rules also specify actions which have to be applied after a rule match. It should be noted that the locations of the header fields which are inspected in a packet are known a priori, as the headers adhere to standard network protocols. So, the processing associated with the SPI and the MPI primarily involves searching for specific fields in the header whose locations are known.

¹The current-generation network processors in an RGR consist of 4 Ethernet ports, each of which is capable of communicating at 10/100/1000 Mbps [14, 15, 16].
²Next-generation network processor architectures used for the RGR are equipped with multiple (typically 4) WiFi baseband processors which provide a consolidated wireless throughput of more than multiple gigabits per second (∼6 Gbps) [14, 15, 16].
³According to the OSI model, the information in a network packet is primarily split into 7 layers: the physical, data link, network, transport, session, presentation and application layers. The physical layer forms the lowest level of abstraction, while the application layer is the highest. The physical layer represents the medium over which the data is communicated, so header information is generally added for the other 6 layers and packet inspection is performed corresponding to these 6 layers.

Figure 1.2: An overview of various packet inspection methodologies
As shown in Figure 1.2, in the case of DPI, the packet payload is inspected in addition to the packet headers across the various layers. So, the rules used for packet inspection in DPI consist of two parts. The first part consists of specific fields which have to match in the packet header. After the required fields in the packet headers match, a signature is searched for in the payload portion of the packet. The signatures used for payload inspection are composed of either strings or regular expressions. Unlike header inspection, in which the location of the inspected portion in the header is known, the location of the signature being searched for in the payload is unknown. Moreover, the signature can start at any byte location within the sequence of payload bytes and can even be split across the payload portions of multiple packets. So, all the payload bytes across multiple packets have to be sequentially inspected to check if the signatures are found within the payload byte sequence. Thus, payload inspection is a computationally challenging task in DPI in comparison to header inspection.
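To make the cross-packet requirement concrete, the following minimal sketch (not taken from this dissertation; the signature b"attack" and the carry-over scheme are purely illustrative) shows why a payload scanner must retain state between packets of the same stream:

```python
# Illustrative sketch: naive cross-packet string search. A signature may be
# split across packet payloads, so the scanner keeps the last len(sig)-1
# payload bytes as carry-over for the next packet of the stream.
def make_scanner(signature: bytes):
    carry = b""
    def scan(payload: bytes) -> bool:
        nonlocal carry
        window = carry + payload
        found = signature in window
        # retain the tail that may be the prefix of a future match
        carry = window[-(len(signature) - 1):]
        return found
    return scan

scan = make_scanner(b"attack")
print(scan(b"...benign...att"))  # False: signature only partially seen
print(scan(b"ack follows..."))   # True: match completed across two packets
```

A per-stream scanner state such as `carry` is exactly what makes payload inspection stateful, in contrast to header inspection, which can be performed on each packet in isolation.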
1.2.2 Steps in Deep Packet Inspection
The various functions performed as part of DPI can be broadly classified into packet normalization, packet reordering, packet prefiltering, signature matching and postprocessing [23]. Figure 1.3 shows a logical ordering of these functions, and a short description of each function is outlined below.
Figure 1.3: Overview of the various steps to be performed in deep
packet inspection
• Normalization: Normalization [24] is the first step in DPI, in which various sanity checks are performed on a network packet before performing the payload inspection. Various researchers [24, 25] have identified that the network and transport layer headers can be synthetically configured to evade the signature matching function. So, the main aim of normalization is to eliminate such malformed packets and to ensure that only those packets which comply with the network protocols are inspected.
• Packet Reordering: The packets processed by a network processor are generated by different network devices and have to be initially split into network packet streams before performing the payload inspection. Furthermore, the packets corresponding to a stream should also be reordered to make sure that they are inspected in the right sequence [26]. The normalization and the packet reordering functions have to be performed on every individual packet.
• Packet Prefiltering: Since the main aim of DPI is to enable content-aware networking, the signatures are generally associated with a certain higher-layer application protocol, e.g., HTTP, FTP etc. So, the signature database is also divided into subsets of signatures, where each subset identifies the signatures associated with a specific application protocol. The application protocol information is extracted after inspecting the transport layer or the application layer headers. The extracted information can then be used to compare the packet payloads against a signature subset, instead of comparing the payload bytes against all the signatures. The process of comparing the packet payload against a signature subset after inspecting the packet headers is called packet pre-filtering and has been used by various DPI implementations [27, 28, 29]. Packet pre-filtering is a one-time step which is generally performed during the packet classification stages, where a new stream is categorized into a specific higher-layer protocol.
• Signature Matching: Signature matching is the most important step in DPI, in which the payload bytes are compared against the signatures. In this step, each and every byte in the payload is compared against the database of signatures. Since there can be multiple signatures in a signature subset, it is essential to match the payload bytes against all the signatures at once to perform the function in a computationally effective manner. Moreover, since the application data is split into multiple packet payloads, a signature may be split across multiple network packets. So, it is essential to match signatures across multiple network packets. To summarize, signature matching is the most computationally challenging task in DPI and is also the most suitable function for hardware acceleration to perform DPI at multi-gigabit line rates [30]. So, considering the fact that network traffic at such high rates has to be inspected by the network processors in the RGR, hardware acceleration of signature matching becomes essential to perform line-rate signature matching within the home network [31].
• Postprocessing: Once a signature is successfully matched, the action corresponding to the signature is performed on the packet or the stream. The action associated with a signature is generally described as part of its definition, while the criticality of the postprocessing function varies depending on the individual signatures.
To summarize, signature matching is the most time-critical function in DPI, and accelerating it enables the network processors to perform DPI at line rates. The next section discusses the computational and storage challenges associated with the signature matching function.
1.2.3 Signature Matching - Requirements & Challenges
The signatures used for DPI are represented through either strings or regular expressions. Since the signatures cannot be directly processed by modern processor architectures, they are converted into either the Deterministic Finite Automaton (DFA) or the Non-deterministic Finite Automaton (NFA). Both are machine readable state machine representations which are functionally equivalent to the signature set. Since the processing complexity of comparing a payload byte against the DFA is constant (O(1)), it is the preferred form to represent the signatures, especially for high speed signature matching applications [32]. On the contrary, the DFA is highly storage ineffective due to the presence of redundant state transitions. Moreover, when the signatures are described through regular expressions, the total number of states generated in the DFA can grow exponentially; this is referred to as the state explosion problem [33].
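The O(1) per-byte cost and the storage cost of the DFA can both be seen in a small sketch (a textbook single-pattern DFA construction, not this dissertation's signature compiler; the pattern "abc" is illustrative):

```python
# Illustrative sketch: a DFA for the single string signature "abc" over the
# 256-symbol byte alphabet. Each state stores a full 256-entry row, so one
# payload byte costs exactly one table lookup (O(1) processing), but the
# storage grows as |states| * 256 transitions, most of them redundant.
PATTERN = b"abc"

def build_dfa(pattern: bytes):
    m = len(pattern)
    dfa = [[0] * 256 for _ in range(m + 1)]
    for state in range(m + 1):
        for byte in range(256):
            # next state = longest prefix of `pattern` that is a suffix of
            # the consumed input followed by `byte` (brute-force restart)
            seen = pattern[:state] + bytes([byte])
            nxt = min(m, len(seen))
            while pattern[:nxt] != seen[len(seen) - nxt:]:
                nxt -= 1
            dfa[state][byte] = nxt
    return dfa

def match(dfa, payload: bytes) -> bool:
    state = 0
    for byte in payload:           # exactly one lookup per payload byte
        state = dfa[state][byte]
        if state == len(PATTERN):  # accepting state reached
            return True
    return False

dfa = build_dfa(PATTERN)
print(match(dfa, b"xxabcxx"))  # True
print(match(dfa, b"ababab"))   # False
```

Even this 4-state automaton stores 1024 transitions, nearly all of which point back to the same few states; this redundancy is exactly what the transition compression algorithms discussed next target.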
So, the standard approach proposed in academia is to compress the DFA, store the compressed DFA in memory and eventually perform the signature matching against it [34]. The state explosion problem during DFA generation is addressed by performing state compression, while the redundant transitions are compressed through transition compression algorithms. Since these approaches address orthogonal problems, the transition and the state compression algorithms are orthogonal to each other [35].
The transition compression algorithms play an important role in defining the memory footprint of the compressed DFA [1]. Achieving high transition compression rates reduces the memory footprint of the compressed DFA. However, it is also important to make sure that the algorithmic approach towards transition compression enables the decompression to be performed in a dedicated hardware accelerator. This would enable the signature matching function, and in turn DPI, to be performed at line rates.
The transition compression algorithms proposed in the literature either focus on efficiently compressing the DFA or on performing the decompression at line rates, but not both. The transition compression algorithms [1, 32, 36, 3] which belong to the former category achieve high transition compression rates, typically of the order of over 95%. However, in these techniques, the redundant transitions are compressed at the cost of increased memory bandwidth, which prevents the decompression from being performed through dedicated hardware accelerators [4]. On the other hand, the transition compression algorithms which belong to the latter category [4, 5] focus on compressing the DFA through bitmaps. Compressing the redundant state transitions through a bitmap maintains the same constant processing complexity as the DFA, which allows the decompression to be performed through dedicated accelerators. However, these algorithms require the bitmap to be stored together with the compressed DFA to identify the location of the compressed transition. So, in order to reduce the number of bitmaps stored together with the compressed DFA, the bitmap based compression solutions proposed in the literature compromise on the transition compression rates and only achieve ∼90-95%. Considering the strategic importance of transition compression, there is a requirement for a class of transition compression algorithms which can effectively compress the DFA as well as enable hardware acceleration of the decompression function to perform signature matching at line rates.
The signature matching throughput achieved by a hardware accelerator engine in DPI applications depends completely on the organization of the compressed state transitions in the memory. As in any computing system, the data access latency plays a big role in deciding the throughput that is achieved in signature matching applications. The signature matching throughput is highest when the compressed DFA is stored entirely in the on-chip memories, as the data can be accessed at low latencies. However, if the compressed DFA is stored in a combination of on-chip and off-chip memories, the throughput of the hardware accelerator depends completely on the latency to fetch the data from the off-chip memories [37]. There is a possibility to reduce the access latencies through the introduction of cache memories, as in [37]. However, loading specific portions of the compressed DFA into the cache memories can become highly volatile, as this depends completely on the characteristics of the payload bytes being inspected. Thus, a great amount of effort would be required to effectively design a caching algorithm to load the data into the cache memories. So, if the DFA is compressed efficiently in such a way that it can be stored entirely in the on-chip memories, the hardware accelerator can be effectively used to perform the signature matching function at line rates.
Considering that the hardware accelerator will be part of the network processor used in the RGR, the following requirements should be satisfied so that the architecture becomes reusable across multiple generations of systems.
• Scalability: The signature matching engine should be scalable to support increasing network bandwidth requirements and increasing signature counts. While there is no specific data currently available regarding the signature counts required for the RGR, the significant growth in new malware [38] will require supporting signature counts of the order of a few thousand. The signature matching throughput to be supported by the engine also depends on the network bandwidth and the application scenarios. Considering that multiple network interfaces are available in the RGR which can support network bandwidth at multiple gigabits per second, the signature matching throughput to be supported will cross the 10 Gbps barrier in the future for the RGR.
• Flexibility & Programmability: When the compressed DFA is stored entirely in the on-chip memories, it is essential to make sure that the data is effectively stored in the physical memories. Effective storage architectures should be proposed to store the compressed DFA in a flexible way so that the on-chip memory space is effectively utilized. Additionally, the access to the physical memory and the sharing of the physical memory space should be made programmable to introduce a certain level of flexibility with respect to compressed DFA storage. In this way, when the accelerator is integrated with a network processor, the processor can configure the functionality of the accelerator depending on specific application requirements.
1.3 Thesis Contributions
Addressing the challenges mentioned above, two transition compression methods which can efficiently compress the DFA and also enable the decompression to be performed in dedicated hardware accelerators are proposed in this dissertation. Following up on the compression algorithms, a flexible, scalable and programmable hardware accelerator is proposed which enables the signature matching function to be performed at ∼10 Gbps. The following are the key contributions of this dissertation:
• Transition Compression Through Bitmaps & Bitmasks: Though the bitmap based transition compression methods proposed in the literature perform the signature matching through dedicated hardware accelerators, they compromise on the transition compression rates to reduce the number of bitmaps stored together with the compressed DFA. Addressing this drawback, two bitmap based transition compression methods called the Member State Bitmask Technique (MSBT) and the Leader State Compression Technique (LSCT) are proposed in this dissertation. These methods achieve transition compression rates of about 97-98%, a 4-5% improvement over the state-of-the-art solutions [39, 40]. The redundant state transitions in the DFA are compressed through a combination of bitmaps and bitmasks, a secondary layer of indexing introduced in these methods. Though the bitmasks have to be stored together with the compressed transitions, the improvement in the transition compression rates reduces the memory footprint of the compressed DFA by 50% in comparison to the state-of-the-art bitmap based solutions. The compressed DFA generated after bitmap based transition compression mainly consists of the compressed transitions and the control data (bitmaps, bitmasks etc.), which enables the compressed transitions to be located. The main idea behind the proposed solutions is to add more control data in the form of bitmasks to effectively index the redundant state transitions. Thus, even a small improvement in the transition compression rate considerably reduces the overall memory footprint of the compressed DFA.
• Optimizing the Memory Footprint of the Compressed DFA after the MSBT & the LSCT: Since the compressed DFA is intended to be stored entirely in the on-chip memories after the MSBT and the LSCT, it is paramount to make sure that the memory footprint of the compressed DFA is well optimized. The following three methods are proposed to further improve the memory footprint of the compressed DFA.
1. Divide & Conquer State Grouping Method: State grouping is one of the integral steps in the proposed methods, in which the states are grouped into subsets after which the redundant transitions are compressed. A compression-aware Divide and Conquer (DC) state grouping method is proposed for this step, through which the transition compression rates are improved by a variable factor of 0.5-2%, thus reaching overall compression rates of about 98-99% [41].
2. Bitmask Compression: As part of the MSBT and the LSCT, a bitmask is generated for every state in the DFA, which enables the redundant state transitions to be effectively compressed after bitmap based compression. However, a majority of the bitmasks generated in this process are redundant and are compressed through the bitmask compression process. Experimental evaluation shows that about ∼60-70% of the bitmasks are redundant and are efficiently compressed through the proposed method.
3. Combining Alphabet Compression with Bitmap Compression: Alphabet compression is a well known method which is used to compress the indistinguishable characters in an alphabet, i.e., the ASCII character set in the case of DPI applications. Alphabet compression is proposed to be combined with the MSBT and the LSCT, as the bitmap alone cannot compress certain redundant state transitions in the DFA.
All of the methods proposed above are orthogonal to each other and can be implemented together with the MSBT and the LSCT to optimize the memory footprint of the compressed DFA. When combined with the MSBT and the LSCT and compared with the state-of-the-art solutions, the overall memory footprint of the compressed DFA reduces by 70%, an additional improvement of 20%.
• Hardware Coprocessor for Signature Matching: Utilizing the proposed methods to compress the DFA, a Bitmap based Signature Matching Engine called BiSME is proposed to perform the signature matching function in a dedicated accelerator to achieve line-rate DPI [42]. BiSME is a programmable, flexible and scalable coprocessor which stores the compressed DFA in on-chip memories after performing the MSBT. The compressed DFA is flexibly stored in the on-chip SRAMs through various efficient storage architectures. The Shared Memory Methodology efficiently stores the dynamically varying compressed transitions, whose count is signature dependent, within the predefined on-chip memory boundaries. The Packed Storage Methodology enables the unstructured bitmasks to be stored flexibly, without wasting precious on-chip memory resources. BiSME is effectively pipelined to achieve a signature matching throughput of 9.3 Gbps. Furthermore, an extension to BiSME called BOBiSME is proposed, which implements the bitmask decompression in hardware and is capable of performing signature matching at 10.6 Gbps. The proposed signature matching engines were synthesized on a commercial 28nm technology library operating at 0.81V. BiSME occupies 1.43 mm² of silicon area and consumes 0.155 W of power, while BOBiSME occupies 1.18 mm² of silicon area and consumes 0.167 W of power. A compiler was designed to convert the signature sets into the BiSME and BOBiSME memory formats. The functionality of the hardware implementation was thoroughly verified on the Cadence Palladium platform by injecting over 2 GB of identical traffic into BiSME, BOBiSME and a DFA based signature matching engine. The identical signature matching results verified the functional correctness of the hardware implementation.
1.4 Dissertation Organization
The dissertation is organized as follows. Chapter 2 first provides
a detailed overview of the state-of-the-art signature matching
engines and summarizes the advantages and disadvantages of the
various methods proposed in the literature.
Chapter 3 proposes and evaluates the MSBT and the LSCT in which the
bitmasks are introduced to significantly improve the transition
compression rates in comparison to existing bitmap based
solutions.
Chapter 4 proposes and evaluates three different methods, i.e., the Divide and Conquer state grouping method, alphabet compression and bitmask compression, which focus on further reducing the memory footprint of the compressed DFA.
Chapter 5 first discusses the two flexible storage architectures through which the compressed DFA can be efficiently stored in the on-chip memories, and then the resulting hardware engines through which the transition decompression is performed after compressing the DFA through the Member State Bitmask Technique. Furthermore, this chapter discusses the key results with respect to the achievable signature matching throughput and the functional evaluation of the proposed hardware accelerators.
Chapter 6 finally concludes the dissertation and discusses the
directions for some of the future work.
2 State-of-the-Art
String and regular expression based signature matching is a problem which has been well addressed by the research community [34]. The signatures used for DPI have evolved greatly over time. Initially, the signatures were primarily composed using simple string patterns. However, due to the high false positive rate observed during string based signature matching, Sommer et al. [43] proposed the usage of regular expressions to describe the signatures. Since regular expressions offer better flexibility in comparison to strings, most applications prefer to use regular expressions to define the signatures [44].
The signature matching engines have correspondingly evolved in relation to the evolution in the signature representations. Based on the type of signatures which are processed, the signature matching engines can primarily be classified into string based and automata based signature matching engines, as shown in Figure 2.1 [34]. The string based signature matching engines are capable of processing string signatures alone, while the automata based signature matching engines are capable of processing both string and regular expression signatures. Due to this advantage, a majority of the research work has primarily focused on automata based signature matching engines [34].
Existing research work on automata based signature matching engines primarily focuses on two aspects, i.e., automata compression and line-rate signature matching implementations [34, 23]. This chapter provides an overview of solutions which address both of these aspects.
This chapter is organized as follows. Section 2.1 provides an
overview of the various string based signature matching engines
proposed in the literature. Section 2.2 provides an introduction to
the theory of automata and the basics of automata based signature
matching. Section 2.3 provides an overview of the various DFA
compression approaches while Section 2.4 provides an overview of
the state-of-the-art solutions which target the problem of line
rate signature matching. Finally, Section 2.5 provides an overall
summary of the state-of-the-art approaches with respect to automata
based signature matching.
2.1 String Based Signature Matching Engines
The very first signature matching engines proposed in the literature were targeted at processing string based signatures and are classified under the string based signature matching engines [34]. Initially, various classical string matching algorithms such as Knuth-Morris-Pratt (KMP) [45], Aho-Corasick (AC) [46], Boyer-Moore (BM) [47], etc., were used to match the payload bytes against the string signatures.

Figure 2.1: Classification of various signature matching engines proposed in the literature

Dharmapurikar et al. [48] identified that these algorithms are primarily tuned towards software based signature matching implementations and are not useful for line-rate signature matching. So, they proposed to use Bloom filters [49] for string based line-rate signature matching implementations.
The Bloom filter is a time and space efficient probabilistic data structure that stores a database of strings compactly in a memory vector. However, the result of querying the Bloom filter only indicates whether an element (signature) is likely to be present or definitely not present, i.e., false positives are possible as part of the querying process but false negatives will never occur. So, a secondary source is required to confirm whether the signatures were actually identified after the querying process.
The usage of bloom filters for signature matching involves a filter
programming step, in which k different hash functions are used to
set k different bits in an m-bit vector. For each of the signatures
in the signature set, the hash functions generate a set of indices
at which the bits in the vector are set to 1. As part of signature
matching, the payload bytes are passed to the k different hash
functions to generate indices at which the memory vector is
queried. If the bits at all of the generated indices are set, a
signature is said to be identified within the payload byte stream.
Since a single m-bit vector is used to store multiple signatures,
the queried bits could possibly have been set by a different
signature than the one reported as matched. So, a secondary analyzer
is used to verify the signature matching results. However, the most
important property of this data structure is that the computation
time involved in performing the query is independent of the number
of strings stored in the vector.
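The programming and querying steps described above can be sketched as follows. This is a minimal illustration, not the hardware design of [48]; the salted SHA-256 hashing scheme and the parameters m and k are assumptions made for the example.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter sketch: k hash functions set/probe bits in an m-bit vector."""

    def __init__(self, m=1024, k=4):
        self.m, self.k = m, k
        self.bits = [0] * m  # the m-bit memory vector

    def _indices(self, item):
        # Derive k indices via salted SHA-256 digests (an illustrative choice;
        # hardware designs typically use cheaper hash families).
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def program(self, signature):
        # Filter programming step: set the k bits for this signature.
        for idx in self._indices(signature):
            self.bits[idx] = 1

    def query(self, payload_substring):
        # All k bits set -> "possibly present" (may be a false positive);
        # any bit clear -> definitely not present (no false negatives).
        return all(self.bits[idx] for idx in self._indices(payload_substring))

bf = BloomFilter()
for sig in ["attack", "exploit", "shellcode"]:
    bf.program(sig)
print(bf.query("attack"))   # programmed signatures always query True
```

Note that the query time depends only on k, not on the number of stored strings, which is the property highlighted above; a positive answer must still be confirmed by a secondary analyzer.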
The very first bloom filter based signature matching methodology
was proposed in [48], where multiple bloom filters in parallel were
used to accelerate the string matching function. A hash based
secondary analyzer was used to distinguish between a false positive
and a true positive. On the other hand, Nourani et al. [30]
proposed a bloom filter
based prefilter to enable first level filtering, and a hash based
hardware engine to perform signature matching at multi-gigabit
rates. There were various other enhancements which were proposed
using the bloom filter based architectures for signature matching
following [48] and [30]. However, due to the emergence of regular
expressions as the preferred mode to describe the signatures, the
research community has primarily focused on automata based
signature matching engines which are capable of processing both
string and regular expression signatures [44, 34, 23, 50, 33]. The
next section provides an introduction to the theory behind the
automata and further describes the basics of automata based
signature matching approaches.
2.2 Automata Based Approaches
2.2.1 Introduction to NFA & DFA
An automaton or a Finite State Machine (FSM) is a state machine
which can understand the language expressed by a set of signatures
comprising both string and regular expression signatures. To
simplify, an automaton is an equivalent description of a signature
set in a machine readable format. A finite automaton [51] is
represented as a 5-tuple (Q, Σ, δ, q0, F) as shown below:
• Q, is a finite set of states,
• Σ is a finite set of characters called alphabet,
• δ: Q × Σ → Q¹, is the state transition function,
• q0 ∈ Q, is the initial state,
• F ⊆ Q, is the set of accepting states.
An automaton primarily consists of a finite set of nodes called the
states, Q and labeled directed edges between the states which are
called the state transitions. The labels in the directed edges
correspond to either one or multiple characters in the alphabet, Σ.
An automaton consists of a single initial state called the root
state, q0 from which the state machine starts the signature
matching function. The state transition function, δ takes a state
and a character² as an input and generates a single or multiple
next states as the output. In an automaton, F is the subset of all
the states in Q, which identify a signature match and is referred
to as the set of accepting states.
As part of the signature matching, the root state is first assigned
as the current state. Subsequently, each and every payload byte is
input to the state transition function which computes the next
state transition corresponding to a current state and payload byte
combination. The computed next state is assigned as the current
state and the
¹ In the case of the Non-Deterministic Finite Automaton, the state
transition function generates a set of next states P(Q)
corresponding to a state-character combination, i.e., δ: Q × Σ →
P(Q)
² As part of the dissertation, the terms character and payload byte
are used interchangeably to represent an individual payload byte in
the network packets.
subsequent character is input to the machine to compute the next
state transition. This process continues until all the payload
bytes have been compared against the automaton. At any point of
time in this process, if the computed next state belongs to the
accepting states, then a signature is said to be detected by the
automaton.
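The matching loop described above follows directly from the 5-tuple. The toy DFA below, which detects the substring ‘ab’ in a byte stream, is a hypothetical example constructed for illustration; δ is stored as a dictionary that defaults to the root state.

```python
# A toy DFA (Q, Sigma, delta, q0, F) that detects the substring "ab".
# Transitions not listed in delta fall back to the root state q0.
Q = {0, 1, 2}
q0 = 0
F = {2}
delta = {(0, 'a'): 1, (1, 'a'): 1, (1, 'b'): 2}

def dfa_match(payload):
    state = q0                              # start at the root state
    for ch in payload:
        state = delta.get((state, ch), q0)  # exactly one next state per byte
        if state in F:                      # accepting state => signature detected
            return True
    return False

print(dfa_match("xxaby"))  # True: "ab" occurs in the payload
```

Each payload byte triggers exactly one table lookup, which is the constant per-byte processing cost of the DFA discussed later in this chapter.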
Since the 8-bit extended American Standard Code for Information
Interchange (ASCII)³ character set is used for internet
communication, the signatures for DPI applications are constructed
using the ASCII character set. A signature set can either be
represented through the Non-deterministic Finite Automaton (NFA)
or the
Deterministic Finite Automaton (DFA). The primary difference
between the representations is the output of the state transition
function, which also affects the total number of states in the
machine⁴. In the case of the NFA, the number of next states
generated by the state transition function is non-deterministic,
i.e., the state transition function can either generate a single
next state, multiple next states or even no next state. On the
other hand, in the case of the DFA, the state transition function
always generates a single next state resulting in a deterministic
output. It should be noted that both the machines (NFA & DFA)
are equivalent representations of the language represented by a
signature set.
A signature set which consists of two different signatures, abc and
def.*12 is further used to explain the concepts behind the NFA
& DFA representations. The former is a string signature and is
only matched if there is a sequence of an ‘a’, followed by a ‘b’
and a ‘c’ in the payload. On the other hand, the latter is a
regular expression which is only matched if there is a sequence of
a ‘def’, followed by a ‘12’ with zero or more characters in between
‘def’ and ‘12’. Figure 2.2(a) and (c), respectively show the NFA
and the DFA representations corresponding to the signature set. The
initial state in both automaton representations has been shown
using a green circle, while the red circles represent the accepting
states.
Figure 2.2(b) and (d), respectively show the sequence of states
traversed in the NFA and the DFA when the byte sequence to be
inspected is composed of ‘defabc12’. It can be seen that the byte
sequence being inspected consists of both the signatures which are
part of the signature definitions. In the case of the NFA, due to
the non-determinism associated with the state transitions, multiple
states are active at the same time during the signature matching as
seen in Figure 2.2(b). The states which are active during the
signature matching operation are called the ‘active set’ of states.
So, the state transition function corresponding to a character has
to be executed for all of the states in the active set. On the
other hand, it can be seen from Figure 2.2(d) that only a single
state is active during DFA based signature matching. Irrespective
of the automaton representations, it can be seen from Figure 2.2
that the signatures matched by the sequence of bytes are identical
in both the NFA and the DFA, which further proves their functional
equivalence.
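The active-set behaviour can be reproduced with a small hand-built NFA for the two example signatures; the state numbering below is an illustrative assumption and need not match Figure 2.2.

```python
# Hand-built NFA for the signature set {abc, def.*12}.
# State 0 is the root (it stays active on every byte, so matching is
# unanchored); ANY marks transitions taken on every character (the ".*").
ANY = object()
trans = {
    (0, 'a'): {1}, (1, 'b'): {2}, (2, 'c'): {3},   # abc
    (0, 'd'): {4}, (4, 'e'): {5}, (5, 'f'): {6},   # def
    (6, ANY): {6}, (6, '1'): {7}, (7, '2'): {8},   # .*12
}
accepting = {3: "abc", 8: "def.*12"}

def nfa_match(payload):
    active = {0}                  # the active set starts at the root
    matches = set()
    for ch in payload:
        nxt = {0}                 # root self-loops on every character
        for s in active:          # transition function runs once per active state
            nxt |= trans.get((s, ch), set())
            nxt |= trans.get((s, ANY), set())
        active = nxt
        matches |= {accepting[s] for s in active if s in accepting}
    return matches

print(nfa_match("defabc12"))  # both signatures are found
```

Stepping through "defabc12" shows several states active at once (e.g. the "abc" progress and the ".*" self-loop of "def.*12" simultaneously), which is exactly the per-byte processing overhead of the NFA.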
Converting the signatures into an automaton consists of various
steps. A variety of algorithms [52, 53, 54] have been proposed in
the literature to convert a signature into
³ With respect to DPI applications, the extended ASCII character set
is generally referred to as the ASCII character set. So as part of
the dissertation, the usage of ASCII character set refers to the 8
bit extended ASCII character set.
⁴ In the rest of the dissertation, the term state-character
combination will be used to refer to the inputs to the state
transition function.
Figure 2.2: A signature set with 2 signatures abc, def.*12
represented as (a) NFA and (c) DFA. The states traversed during (b)
NFA and (d) DFA based signature matching.
the NFA, which is the first step in converting the signatures into
an automaton. After converting the signatures into the NFA, the
subset construction algorithm [52] is used to convert the NFA into
the DFA. As part of the subset construction algorithm, all the
unique active set combinations seen in the NFA are converted to
unique DFA states. In theoretical worst-case scenarios, if there
are ‘N’ states in the NFA, the equivalent DFA could consist of a
maximum of 2^N states. However, this exponential state blow-up
called the state explosion, only results when the regular
expressions in the signatures consist of constrained and
unconstrained repetitions of wildcards or large character ranges
[33, 50, 44]. Once the DFA is generated from the signature set, the
state minimization algorithm [55] is used to generate the most
compact DFA equivalent to the signature set.
2.2.2 Automata Based Signature Matching - Complexity Analysis
Table 2.1 shows a comparison of the processing complexity and the
storage cost incurred to store the NFA and the DFA. Since the
state transition function in the DFA results in a single next
state, the processing complexity associated with comparing an
individual payload byte is always constant. On the other hand,
since the state transition function has to be executed across
multiple states in the active set, the worst case processing
complexity in comparing a payload byte in the case of the NFA is
quadratic in the number of states⁵. However, with
respect to automata storage, in worst case scenarios, the amount of
memory required to store the DFA exponentially grows with respect
to the number of states while it is linear in the case of the NFA.
To summarize, the DFA is processing efficient and is storage
inefficient while the NFA is processing inefficient and is storage
efficient.
Table 2.1: Processing and storage complexity associated with NFA & DFA

        Processing complexity / payload byte    Storage cost
  NFA   O(N^2)                                  O(N)
  DFA   O(1)                                    O(2^N)
Due to the constant processing complexity associated with
processing a payload byte, the DFA is generally preferred for high
speed signature matching applications [32]. Though the constant
processing complexity of the DFA enables signature matching at
high rates, the storage problem has to be addressed for efficient
signature matching implementations [1]. To address the storage
problem in the DFA, the research community has primarily focused
on compressing the DFA before it is stored in the memory [23, 34].
The next section provides a detailed overview of the various DFA
compression algorithms discussed in the literature.
2.3 DFA Compression
Since the DFA is a deterministic state machine representation, it
stores a single state transition for each of the characters in the
alphabet, Σ for every state. Consequently, the total number of
state transitions which are stored in the DFA, linearly depends on
the total number of states generated as well as the total number of
characters in the alphabet. Assuming that the DFA consists of ‘S’
states6, the total number of state transitions (DFA Trans) that
have to be stored in the memory is represented by (2.1).
DFA Trans = S × Σ (2.1)
⁵ The number of states here represents the total number of states
generated in the NFA when a signature set is converted into the
automata.
⁶ It is a convention in the field of automata theory to represent a
DFA with ‘S’ states corresponding to an NFA with ‘N’ states.
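As a back-of-the-envelope illustration of (2.1): the state count and the per-transition storage width used below are assumed example values, not measurements taken from the literature.

```python
S = 15_000             # assumed number of DFA states (illustrative value)
SIGMA = 256            # 8-bit extended ASCII alphabet
dfa_trans = S * SIGMA  # equation (2.1): one transition per state-character pair

bytes_per_trans = 2    # assumed width of a stored next-state pointer
print(dfa_trans)                    # 3840000 stored transitions
print(dfa_trans * bytes_per_trans)  # 7680000 bytes, i.e. roughly 7.7 MB
```

Even a modest DFA therefore requires megabytes of transition memory, which motivates the transition compression techniques below.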
The DFA compression methods proposed in the literature can be
broadly classified into the state compression and the transition
compression approaches [34]. The state compression algorithms
primarily focus on reducing the number of states in the DFA, while
the transition compression algorithms focus on compressing the
redundant state transitions in the DFA. A detailed description of
both the approaches are described below.
2.3.1 State Compression
2.3.1.1 State Explosion Problem
Under certain circumstances, the conversion of an NFA into the DFA
can potentially result in an exponential explosion in the number of
states created in the DFA. This is called the state explosion
problem [50] and is typically addressed by the state compression
algorithms. The exponential explosion of states primarily occurs
when the regular expressions consist of constrained or
unconstrained repetitions either over the wildcard character (.) or
over character ranges. Various analyses [50, 33] have identified
that the state explosion problem can be narrowed down to the
following 3 scenarios⁷ as described below:
• Scenario 1: Length Restriction on an Anchored Regular Expression
State explosion occurs when the signature contains the regular
expression of the following format: ^A+[A-Z]{k}B. This specific
regular expression pattern can be broken down into 3 parts. The
first part is the prefix (A+), the middle part is the portion with
the length restriction on the character range or the wildcard
([A-Z]{k}) and the final part is the suffix (B). The state explosion
in this case occurs due to the overlap in the character(s) seen in
the prefix portion with the character range seen in the middle
portion. In this scenario, the generated DFA consists of states
which record all possible combinations in the overlap occurrence in
order to maintain functional correctness to the original signature
representation. In addition to the states generated as part of the
prefix and the suffix, O(k^2) states are further generated to track
the overlap occurrence. For example, Figure 2.3(a) shows this
scenario where the DFA corresponding to the signature ^A+[^\n]{3}B
consists of 9 states (states 1-10) to keep track of all possible
combinations which potentially lead to a signature match. It should
be noted that an anchored regular expression (expressions which
have ^ as the first symbol) is matched in the payload bytes, if and
only if the pattern exactly matches from the first byte within the
payload byte sequence.
• Scenario 2: Length Restriction on an Un-anchored Regular
Expression The second scenario is very similar to the first, but
results in a case where the regular expression does not have the
anchor symbol, i.e., the regular expression is of the format,
.*A[A-Z]{k}B. In this scenario, the state explosion also occurs
because of the overlap between the characters in the prefix and the
ones in the character
⁷ The examples associated with the 3 scenarios have been taken from
[34]
Figure 2.3: (a) state explosion scenario 1 (b) state explosion
scenario 2 (c) state explosion scenario 3
class in the middle portion. Since this is not an anchored regular
expression, there is no restriction with respect to the occurrence
of the signature within the sequence of payload bytes. For example,
an ‘A’ seen anywhere in the sequence of bytes would potentially
lead the DFA to search for the signature, irrespective of whether
the previous few bytes resulted in a partial signature match. In
this scenario, O(2^k) additional states are required to keep track
of all possible combinations leading towards a potential signature
match. Figure 2.3(b) shows an example of this scenario, where a DFA
is shown corresponding to the regular expression .*A.{2}B.
• Scenario 3: Regular Expression Combinations Unlike the previous
scenarios where the state explosion resulted because of the
specific characteristics of the individual signature, this scenario
occurs due to the interaction of multiple regular expressions
with wildcard terms in it. Two regular expressions of the format
RE1=sub1.*sub2 and RE2=sub3.*sub4 are used to show the state
explosion in this scenario, where each subi could refer to a
sequence of characters. The states corresponding to the
sub-expressions are duplicated as shown in Figure 2.3(c) resulting
in state explosion. This scenario occurs as the unrestricted
wildcard character in
a signature could potentially match the sub expressions of other
signatures. So, the resulting DFA should be able to keep track of
the following scenarios:
1. After matching a sub-expression within a regular expression
(e.g. sub1 in RE1 / sub3 in RE2), the second sub-expression should
be tracked for a possible signature match (e.g. sub2 in RE1/sub4 in
RE2).
2. After matching the first sub-expression in a certain regular
expression (e.g. sub1 in RE1 / sub3 in RE2), the sub-expressions
which lead to a different signature should be tracked
simultaneously (e.g. sub3 in RE2 / sub1 in RE1).
Thus when multiple regular expressions with unrestricted wildcard
character repetitions are converted into a single DFA, the number
of states in the resulting DFA grows rapidly.
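The interaction in scenario 3 can be observed directly in the NFA's active set: after reading a prefix of one signature followed by a prefix of the other, the NFA tracks progress in both simultaneously, and an equivalent DFA must dedicate one state to each such combination. The sketch below uses the hypothetical pair RE1=ab.*cd and RE2=ef.*gh with an illustrative state numbering.

```python
# Hand-built NFA for {ab.*cd, ef.*gh}; ANY marks the ".*" self-loops.
ANY = object()
trans = {
    (0, 'a'): {1}, (1, 'b'): {2}, (2, ANY): {2},   # RE1 = ab.*cd
    (2, 'c'): {3}, (3, 'd'): {4},
    (0, 'e'): {5}, (5, 'f'): {6}, (6, ANY): {6},   # RE2 = ef.*gh
    (6, 'g'): {7}, (7, 'h'): {8},
}

def active_after(payload):
    active = {0}
    for ch in payload:
        nxt = {0}                                  # root self-loops (unanchored)
        for s in active:
            nxt |= trans.get((s, ch), set())
            nxt |= trans.get((s, ANY), set())
        active = nxt
    return active

# After "abef", the progress of BOTH sub-expressions is live at once:
print(active_after("abef"))   # state 2 ("ab" seen) and state 6 ("ef" seen)
```

An equivalent DFA needs distinct states for the active sets {0,2}, {0,6} and {0,2,6}; with n such wildcard signatures the combinations, and hence the DFA states, multiply rapidly.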
2.3.1.2 State Compression Solutions
Various solutions have been proposed in the literature targeting
the state explosion problem. Based on the algorithmic approach
used to solve the state explosion problem, the solutions can be
classified into rule grouping, semi-deterministic FAs and the
decomposed FAs [34]. A brief overview of each of the approaches is
described below:
• Rule Grouping: Since the state explosion primarily happens when
multiple signatures are combined together to create a single DFA,
one of the ways to avoid this problem is to create multiple DFAs
corresponding to a signature set. Solutions such as [33, 56] have
proposed rule grouping algorithms to split the signatures into ‘m’
subsets, so that a single DFA is created for each of the subsets.
Since the payload bytes corresponding to the network streams have
to be compared against the original signature set, they have to be
matched against each of the ‘m’ individual DFAs. So, the cost paid
for the rule grouping is the increase in the processing
complexity, i.e., O(m), to process each of the individual payload
bytes. For example, the solution proposed in [56], which is the
state-of-the-art rule grouping algorithm, generates 7 DFAs
corresponding to a signature set consisting of only 107 regular
expressions. When the signature set is directly converted into a
single DFA, it consists of more than a million states. After rule
grouping, the 7 DFAs only consist of a total of 15k states.
However, with an increase in the number of signatures to be
supported, the total number of groups into which the signatures
have to be split also increases, which linearly reduces the
achievable signature matching throughput. One of the methods which
has been discussed in the literature is the usage of multicore
processors to perform the signature matching function against
multiple DFAs in parallel to hide the linear reduction in the
signature matching throughput. However, the number of DFAs into
which the signatures are split completely depends on the
characteristics of the signature set. So, the scalability of this
approach to support an increasing number of signatures is a major
concern [34].
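The O(m) per-byte cost of rule grouping can be seen in a sketch where a payload is stepped through m DFAs in lockstep; the two toy DFAs below, detecting the strings ‘ab’ and ‘cd’, are illustrative and not taken from any cited rule grouping algorithm.

```python
# Two toy DFAs (one per rule group); each maps (state, char) -> next state,
# with missing entries falling back to the root state 0.
dfa_ab = {"delta": {(0, 'a'): 1, (1, 'a'): 1, (1, 'b'): 2},
          "accept": {2}, "name": "ab"}
dfa_cd = {"delta": {(0, 'c'): 1, (1, 'c'): 1, (1, 'd'): 2},
          "accept": {2}, "name": "cd"}
groups = [dfa_ab, dfa_cd]        # m = 2 rule groups

def match_groups(payload):
    states = [0] * len(groups)
    matched = set()
    for ch in payload:           # every byte is processed m times: O(m)/byte
        for i, dfa in enumerate(groups):
            states[i] = dfa["delta"].get((states[i], ch), 0)
            if states[i] in dfa["accept"]:
                matched.add(dfa["name"])
    return matched

print(match_groups("xabcd"))     # both groups report a match
```

Running the m per-group steps on parallel cores hides the slowdown, which is exactly the multicore arrangement mentioned above.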
• Semi-deterministic FA: Solutions such as the Hybrid Finite
Automata (HFA) [50], the Tunable Finite Automata (TFA) [57], the
DFA with extended character set (DFA-eC) [58], etc., fall into this
category and are semi-deterministic automata representations. In
the case of the HFA, a certain portion of the automaton is
deterministic while the signature structures which lead to state
explosion are made non-deterministic. So, the number of states in
the active set during signature matching varies depending on
whether the deterministic or the non-deterministic portion of the
HFA is traversed. On the other hand, solutions such as [57, 58]
propose to have a fixed number of states which are always active
during signature matching. In the case of the TFA, the number of
states which are active is greater than ‘1’ and is predefined to a
fixed value which is configurable (i.e., 2/3/4), while in the case
of DFA-eC, it is always set to 2. With respect to state
compression, more than 90% of the states on average are compressed
by all of these methods while the individual number of states
compressed vary depending on the characteristics of the specific
signature sets. Since there is no standard signature set which is
used for evaluation purposes, evaluating the state compression
results across the techniques discussed above is not
straightforward. In general, the cost paid for the state compression in
these solutions is the increase in the memory bandwidth due to the
semi-deterministic nature of the generated automata. In the case of
the HFA, the memory bandwidth varies dynamically depending on the
specific portion of the automata that is being traversed. In worst
case scenarios, the memory bandwidth required in the case of the
HFA is equivalent to that of the NFA. In the case of the DFA-eC,
the memory bandwidth doubles which also reflects through the 50%
reduction in signature matching performance in comparison to that
of the DFA.
• Decomposed FA: Decomposed FAs are a collection of solutions in
which certain additional control information is augmented to the
state transitions to combat the state explosion. Solutions such as
the eXtended Finite Automata (XFA) [44], the Jump Finite Automata
(JFA) [35], etc., belong to this category. For example, if the
regular expression is of the format ab.*cd, a set bit is used to
indicate that the sub expression ab has been matched in the state
transition. The state transition leading to the state after
matching ‘c’ is only transitioned to, after checking the status of
the set bit. In this way, the state explosion resulting from the
combination of regular expressions with unrestricted wildcard
characters is easily combated. The additional information which is
augmented into the state transitions varies depending on the
characteristics of the signature set. On the other hand, Becchi et
al. [59] proposed the usage of multiple counters to keep track of
the characters seen in the length restricted regular expressions⁸,
to evade the state explosion in the DFA. So, the decomposed FAs
introduce additional control information along with the state
transitions, to solve the state explosion problem. Unlike the semi-
deterministic FAs, the memory bandwidth associated with the state
transition fetch in the case of the decomposed FAs is the same as
that of the DFA. However, the additional processing which is
introduced as part of the execution of the state
⁸ Explained as part of scenario 1 and scenario 2 in the previous
section.
transition function depends on the information augmented in the
state transition [34].
After performing the state compression, the total number of states
in the DFA reduces from S to S′ (S′ < S). However, each of the states
in the state compressed DFA stores the state transitions for all
the characters in the alphabet Σ. So, even after performing the
state compression, a major portion of the state transitions are
redundant and have to be compressed through the transition
compression algorithms. So, performing the transition compression
is mandatory in the case of the DFA irrespective of whether the
state compression is performed or not.
2.3.2 Transition Compression
Though there are 256 state transitions for each of the states in
the DFA, a majority of these state transitions are redundant and
leads the machine towards the root state or one of states closer to
the root state [1]. The process of compressing these redundant
state transitions in the DFA is called as transition compression.
As discussed previously, transition compression can either be
performed on the DFA or a state compressed DFA. Various transition
compression algorithms have been proposed in the literature and
they can be broadly classified into the hardware oriented and
software oriented algorithms [39]. This section also discusses the
alphabet compression mechanism which com