Upload
sebastien
View
219
Download
2
Embed Size (px)
Citation preview
21
978-1-4244-2332-3/08/$25.00 © 2008 IEEE
A Recursive Multifunction Circuit for Leading-Digit
Detection and Comparison
Sébastien Roy
Département de génie électrique et de génie informatique
Université Laval, Québec, Canada, G1K 7P4
T: 1-418-656-2131 ext. 2981, F: 1-418-656-3159, e-mail: [email protected]
Abstract— It is known that fast, fully combinational leading-digit detector circuits can be generated efficiently by recognizingtheir inherent hierarchical structure. It is shown herein that thisstructure is not only hierarchical, but also recursive. This recur-sivity fully defines a minimal-complexity circuit, thus garanteeingoptimal circuit synthesis. Such a circuit having an N -bit operandgenerates all output bits with log
2(N) combinational stages.
It also makes possible a recursive parameterizable descriptionin VHDL or other hardware description languages supportingrecursion. For standard cell generation, it is amenable to efficient,automatic recursive routing. Furthermore, the same recursivestructure can serve as the basis of other useful arithmeticfunctions, such as a fast comparator (log
2(N) stages). Therefore,
such a multifunction circuit could be employed in the designof fast, low-complexity arithmetic-logic units (ALUs) inside mi-croprocessors, digital signal processors, or application-specificsystem-on-chip (SoC) designs.
I. INTRODUCTION
A. Combination synthesis
It is well known that some combinational functions are
difficult to implement in an efficient manner and / or with
a short critical path when the operands are wide because each
output bit is a function of most or all input bits. The classical
case is that of addition where the naive case (ripple-carry
adder) does not scale well, having a delay that is proportional
to operand width. This problem was identified very early in
the history of computing and fundamental techniques, based
on the concepts of propagating and generating positions, were
proposed dating back to the works of Babbage in 1851.
One possibility proposed at the time is the carry-lookahead
concept, where the carry at each position is generated as
an optimal two-stage sum-of-products (SOP) or product-of-
sum (POS) combinational circuit using e.g. Karnaugh maps
or their automated equivalent, the Quine-McClusky algorithm.
Unfortunately, this approach provides little gain since carry
positions of high weight comprise gates of proportionally high
fan-in, thus leading to a significant slowdown in conventional
circuits (e.g. CMOS) in spite of the fact that only two levels
of gates are traversed. Furthermore, the complexity (gate
and / or transistor count) of each individual circuit grows
proportionally with the weight of the position, eventually
making wide adders based on this technique impractical.
This shows that for such circuits, the use of automated
synthesis does not necessarily lead to a good solution. From
a basic truth table description, an optimal two-level synthesis
process would yield the carry-lookahead structure, with all of
its drawbacks. If multi-level synthesis is used, it is not clear
what would be generated, since there is no clear optimality
criterion beyond two levels. It is certain, however, that the
natural structure of the problem would not be recognized. This
requires recognizing the inherent hierarchy in such problems,
leading to the synthesis of circuits yielding intermediate si-
gnals across a number of stages. This would lead to one of
the many known parallel-prefix adder structures.
This shows that finding the optimal architecture for a given
circuit function through automated synthesis remains an open
problem. However, Verma, Brisk and Lenne [1] have proposed
a progressive decomposition approach to impose hierarchy in
the automated synthesis. While this seems promising, it does
not reach the level of optimality possible by applying hierarchy
prior to automated synthesis.
Aside from the difficulty in recognizing hierarchy in some
circuits, automated combinational synthesis suffers from two
drawbacks :
1. When the number of logic levels is allowed to exceed
2 (it is generally the case), no unambiguous definition of
optimality exists.
2. When multiple functions are implemented, it is not clear
how to identify common intermediate signals / circuit
modules which could potentially be shared.
In other words, there is currently no synthesis engine to
which one can say :
“Here are two circuits realizing two distinct functions which
we believe to be related ; synthesize a single circuit which
realizes both functions by maximizing the degree of sharing.”
B. Leading-digit detectors
In any floating-point processor, operand normalization is
a crucial operation. It consists of shifting the operand to the
left until the most significant non-zero digit is in the left-most
position. The number of positions to shift is a function of the
position of the most significant (or leading) non-zero digit.
This is one of many instances where a special circuit is
called for to detect the position of the leading digit. For
normalisation, it makes sense to implement this circuit in such
a way as to obtain the position of the leading-digit by counting
from the left. The said position then corresponds to the number
of leading zeros and gives exactly the number of positions to
shift. In this context, the circuit is referred to as a Leading
22
Zero Detector (LZD) since it counts the number of leading
zeros. The term Leading-Digit Detector (LDD) is used herein
because it is more generic in the sense that : (a) it does not
necessarily imply computing the position from the left ; (b)
the leading-digit is not necessarily a ’1’, depending on the
application and the arithmetic representation.
Fully combinational design of an LDD is no light affair
since each output bit is a complicated function of all the inputs.
It is therefore related to the addition problem since all output
bits are high fan-in functions. Classical logic minimization
techniques such as Karnaugh maps would prove cumbersome
with this problem, and would not yield an efficient solu-
tion. The similarity with addition suggests that hierarchical,
parallel-prefix-like structures are possible. Furthermore, it will
be seen that the optimal solution involves joint generation
of all output bits. However, even software synthesis tools
which have the capability of exploiting commonalities between
many combinational functions will not recognize the simple
recursive structure presented herein.
Okobdzija [2] has identified the hierarchical nature of the
problem. Exploiting that structure, he has shown that parti-
tioning the problem prior to logic synthesis was better than
logic synthesis alone both in terms of area and propagation
delay. Other authors have published similar solutions [3],
[4] afterwards. Furthermore, there are many patents on priority
encoders (synonymous with lead-digit detector), most notably
one by Xilinx [5] from 2006.
None of these works, however, have clearly shown that the
circuit is in fact recursive, i.e. it is self-similar across stages.
This fact leads to a systematic design procedure which can be
embodied in a parameterizable HDL description. Furthermore,
it will be shown that comparison is a related function which
can be handled by the same circuit with little additional logic,
thus yielding a multifunction recursive circuit suitable for
automated recursive synthesis such as described in [6].
II. CIRCUIT ARCHITECTURE
A. Leading-digit detector
To bring out the recursive structure of the leading-digit
detection circuit, we start by examining the case of a 2-
bit operand. In the following discussion, we will assume
without loss of generality that the position Pn we will extract
corresponds to the bit with weight 2n, i.e. the position is
counted from the right, starting with 0. We will further assume
that the leading digit to be detected is a ’1’. The 2-bit LDD
has two outputs : position P and existence E. The existence
output indicates whether there is in fact a ’1’ in the operand.
Hence, if E = 0, the operand is "00" and the P output has no
meaning. The circuit operation as described leads to a fairly
simple truth table (see Table I).
A single OR-gate suffices to implement the 2-bit LZD, as
shown in Figure 1a. This circuit constitutes what we will call
the LDD foundation block and is represented henceforth by
the symbol of Figure 1b.
We can build a 4-bit LZD by replicating the 2-bit LZD
and adding some glue logic. We obtain the circuit of Figure
I1 I0 P E
0 0 X 00 1 0 11 0 1 11 1 1 1
TABLE I
TRUTH TABLE FOR 2-BIT LDD WITH INPUT VECTOR (I1I0).
I1 I0
EP
(a) circuit
LDDFB
P E
I1 I0
(b) symbol
Fig. 1. 2-bit leading-digit detector
2, where the added logic amounts to one gate and one
multiplexer. All outputs are generated with a 2-stage delay.
An arborescent structure becomes apparent when studying a
16-bit LDD circuit, such as pictured in Figure 3. The existence
bits at all levels are generated by a binary tree of two-input OR
gates. From the existence bits at the various levels, position bit
P0 is provided through a binary tree of two-input multiplexers.
A shorter, but otherwise identical, multiplexer tree generates
P1. Likewise, P2 is generated by a single multiplexer. Finally,
P3 is directly equal to E(1, 3) (existence bit of subtree 1 at
stage 3), i.e. it is generated from a zero-height tree.
It is noteworthy that much circuitry is shared by all outputs,
yet P0, P1 and P2 are generated from relatively independent
multiplexer subcircuits. The only common circuitry they share
is the OR-gate tree which generates their inputs. Furthermore,
it should be obvious that the structure can readily be generali-
zed to any desired depth h to handle operands of width 2h. All
outputs are produced with an h-stage delay. In technologies
where multiplexers have shorter delays than logic gates (e.g.
transmission-gate based multiplexers in CMOS), the position
vector would be produced least-significant bit first.
However, the recursive nature of this circuit is not im-
mediately apparent. The OR-gate tree by itself is definitely
recursive, being composed of two identical subtrees of h − 1stages. The various multiplexer trees are also dyadic trees,
E
Sel0
1
I1 I0I3 I2
P0
P1
Fig. 2. 4-bit leading-digit detector circuit.
23
Sel0
1
Sel0
1
Sel0
1
Sel0
1
I1 I0I3 I2I5 I4I7 I6I9 I8I11 I10I13 I12I15 I14
Sel0
1
Sel0
1
Sel0
1
Sel0
1
E
Sel0
1
P0
Sel0
1
Sel0
1
P1
P2
P3
Fig. 3. 16-bit leading-digit detector circuit.
LDD, h
LDD, h − 1 LDD, h − 1
Sel01
E E
Ph−1 E
h − 1
P P
(Ph−2 · · ·P0)
(I2h−1−1 · · · I0)
I
(I2h−1 · · · I2h−1)
I
Fig. 4. Recursive structure of 2h-bits, h-stages LDD.
but complications arise because their inputs are taken at
various points and stages of the OR-gate tree. Nonetheless,
this conceptual difficulty can be lifted, simply by representing
the circuit as a single tree. The recursive structure is illustrated
in Figure 4. It can be seen the right-hand side h − 1-stages
LDD takes I2h−1−1 through I0 as inputs, while the left-hand
side handles inputs I2h−1 through I2h−1 .
B. Comparator
The same structure can act as a comparator by interlacing
the two operands and keeping only the least-significant bit
(P0) of the position output vector (see Figure 5). The output
P0 now indicates which operand is larger while the existence
output E indicates whether the two operands are equal, in
which case output P0 becomes irrelevant. Comparing Figure
5 with the LDD circuit of Figure 4, the only difference is the
foundation block and the absence of the subtrees for P1 and
P2.
The circuit works by extracting the position of the leading
differing digit in the interlaced operands. Only the positions
where the two operands differ are of interest, therefore the
foundation block takes the form of an exclusive-OR gate
(see Figure 5). Indeed, the existence output E will then
Sel0
1
Sel0
1
Sel0
1
Sel0
1
A0 B0A1 B1A2 B2A3 B3A4 B4A5 B5A6 B6A7 B7
Sel0
1
Sel0
1
E
Sel0
1
P0
Fig. 5. 8-bit comparator circuit.
indicate whether all bits are equal (in which case E = 0).
Furthermore, the output P0 will yield the least-significant bit
of the position of the most significant digit which is different in
both operands. Referring to Figure 5, an odd position indicates
that operand A = (A7A6A5 · · ·A0) is larger, while an even
position indicates that A < B.
With slight modifications, such a circuit can easily support
two’s complement operands. It can be verified that if the
sign bits are inverted and fed to the comparator as standard
msb’s, all combinations of signs will be correctly treated.
In other words, the operands in Figure 5 would become(
A7A6A5 · · ·A0
)
and(
B7B6B5 · · ·B0
)
. Another option is to
treat the sign bits separately and feed only the remaining bits
to the tree structure.
C. Generalized comparator / LDD
A slightly more sophisticated tree structure can be evolved
by incorporating two modifiers called “flip” and “invert.”
Both modifiers have advantages for both comparator and LDD
functions. Hence, they will be described here in the context
of a bifunction circuit capable of both comparisons and lead-
digit detection. This duality in functionality affects only the
foundation blocks. The rest of the circuit remains unchanged,
although it is slightly modified from the previous sections to
accommodate the “flip” modifier.
The new foundation block is detailed in Figure 6. Three
new input signals appear which were not present in either the
LDD or comparator circuits : 1- signal F , which corresponds
to the “flip” modifier ; 2- signal Inv, which corresponds to
the “invert” modifier, and 3- signal C/L̄ which selects the
comparator or LDD function.
While the functionality of the third signal should be obvious
(circuit acts as a comparator if it is high, as an LDD otherwise),
the other two signals require some explanations. When Invis low, the circuit functions normally. But when Inv = 1,
all inputs are inverted. For the LDD function, this implies
that the position of the leading zero will be isolated instead
of the leading one. This can be useful for instance when
working with a negative two’s complement operand. Likewise,
the comparator function would then isolate the leading zero
24
An Bn
Sel01
Sel01
Inv C/L̄
Sel0
1
F
P
Sel0 1
E
′0′
Fig. 6. Foundation block for generalized comparator / LDD function
C / LDD, h
C / LDD, h − 1 C / LDD, h − 1
Sel01
E E
Ph−1 E
h − 1
Sel0 1
F
FF
P P
(Ph−2 · · ·P0)
(I2h−1−1 · · · I0)
I
(I2h−1 · · · I2h−1)
I
Fig. 7. Recursive structure of 2h bits, h-stages generalized comparator /
LDD circuit
in the positions that differ in the two interlaced operands.
This identifies the larger magnitude of two negative two’s
complement operands.
The signal F flips the logical weight ordering of the circuit
so that trailing digit positions are identified. With the LDD
function and Inv = 1, this makes it possible to find the
trailing zero position, a prerequisite for a fast incrementer
function (described below). Likewise, combining F = 1 and
Inv = 0 isolates the trailing ’1’ position, a prerequisite for
decrementation.
Figure 7 shows the overall recursive structure of the ge-
neralized circuit. With respect to the comparator and LDD
functions, the only addition in the unifying components is a
2-to-1 multiplexer, necessary to correctly implement the flip
modifier. The Inv and C/L̄ inputs are omitted for clarity.
III. SYNTHESIS RESULTS
Table II shows synthesis results on a Virtex-4 target, more
specifically a 4VFX140 device with a speed grade of 11, for
a 32-bit LDD circuit, 16-bit comparator circuit, and 32-bit
comp / LDD circuit. Table III show similar synthesis results
in CMOS for a 0.18 µm technology by TSMC with a supply
voltage of 1.8V. It can be seen that moving to the joint Comp
/ LDD circuit has no significant impact on the critical path.
On the other hand, it may seem surprising that the number
of FPGA slices is slightly more for the Comp / LDD circuit
than the sum of the two elementary functions. This is easy
Circuit slices LUTs critical path
LDD 32 bits 19 33 8.925 ns
Comp 16 bits 13 22 9.072 ns
Comp / LDD 39 71 11.275 ns
TABLE II
SYNTHESIS RESULTS ON VIRTEX-4 (4VFX140-11) FPGA TARGET
Circuit area critical path
LDD 32 bits 701.9 µm2 2.30 ns
Comp 16 bits 901.5 µm2 1.64 ns
Comp / LDD 3153.4 µm2 2.47 ns
TABLE III
SYNTHESIS RESULTS ON TSMC 0.18 µM CMOS
to explain given the considerable added complexity of the
foundation block which provides much additional flexibility
with its Flip and Inv signals. However, the area of the Comp
/ LDD circuit in CMOS is surprisingly high and warrants
additional investigation in the future.
IV. CONCLUSION
Three recursive circuits have been presented for fundamen-
tal arithmetic operations which are amenable to automatic
scaling and synthesis through structural recursion in VHDL.
This highlights the recursive structure inherent in many arith-
metic operations which are considered to be problematic for
automated synthesis. Exploiting this recursive structure leads
to an elegant design which exhibits many desirable proper-
ties : scalability, minimum depth, equal depth for all outputs,
constant and low fan-in and fan-out. Also, the combined
Comp / LDD circuit shows how two distinct functions can be
combined advantageously in a single circuit. The full potential
of this circuit, which has many secondary modes, has not been
fully explored due to lack of space. It can, among other things,
be used as a basis to produce other functions, such as fast
incrementation / decrementation.
Finally, the concepts presented herein are substantiated by
synthesis results in both Virtex-4 FPGA and 0.18 µm CMOS
technologies.
REFERENCES
[1] A. K. Verma, P. Brisk, and P. Ienne, “Progressive decomposition : Aheuristic to structure arithmetic circuits,” in Proc. Design and Automa-
tion Conference (DAC), San Diego, California, June 4-8, 2007.
[2] V. G. Oklobdzija, “An algorithmic and novel design of a leading zerodetector circuit : comparison with logic synthesis,” IEEE Trans. VLSI
Systems, vol. 2, no. 1, March 1994.
[3] H. Suzuki and al., “Leading-zero anticipatory logic for high-speedfloating point addition,” IEEE J. Solid-State Circuits, vol. 31, pp. 1157–1169, Aug. 1996.
[4] E. Antelo et al., “A novel design of a two-operand normalization circuit,”IEEE Transactions VLSI Systems, vol. 6, no. 1, March 1998.
[5] S. Secatch and J. E. Ogden, “Binary priority encoder,” U. S. Patent7,057,546 B1 assigned to Xilinx Inc., granted June 6, 2006.
[6] P. J. Ashenden, “Recursive and Repetitive Hardware Models in VHDL,”Technical Report TR 160/12/93/ECE, Dept. of Electrical and ComputerEng., U. Cincinnati, Ohio, USA ; also published as Technical Report93–19, Dept. of Computer Science, U. Adelaide, South Australia,Australia