# Cairo University Electronics and Communications Department

Computer Arithmetic:

Recent adders

Hossam A. H. Fahmy



Time delays in the blocks of an adder (one-path algorithm) 1/17

© Hossam A. H. Fahmy

The improved two-path



2/17

## Now comes redundancy



#### Now comes redundancy

### Now comes redundancy



Time delays with redundancy

4/17

The representation



The proposed SD format for floating point numbers





- Normalized numbers in the IEEE format are (almost) a subset of the presented format.
- The conversion back to the IEEE format takes place only when a register is to be stored in the main memory and the conversion delay is overlapped with the store operation.

Therefore, contrary to previous designs using SD numbers, the proposed system effectively hides the conversion delay.



General blocks of the proposed adder (two-path algorithm) 8/17

Circuit statistics and simulation results

| For regular binary, if the result starts by few zeros like $0011$  | 0 · · · |
|--------------------------------------------------------------------|---------|
| we need to shift that result to the left in order to normalize it. |         |

Beside that simple case, in the current design we might get:

An extension of N and P recoders, originally proposed by Daumas and Matula in 1997, is used to correctly find the leading digit in such cases.

9/17

Cost?

|                  |                   | n = 60 | n = 80 | n = 120 |
|------------------|-------------------|--------|--------|---------|
| 4 bits per digit | width increase(%) | 33.3   | 31.6   | 29.6    |
|                  | speed up (%)      | 10.5   | 12.5   | 16.7    |
| 8 bits per digit | width increase(%) | 29.3   | 21.1   | 18.5    |
|                  | speed up (%)      | 7.9    | 10     | 14.3    |

Floating point adder trade off when the fan-in limit is 3.

|                         | FP-Adder            | FP-Multiplier       |
|-------------------------|---------------------|---------------------|
| nodes                   | 46845               | 76523               |
| NMOS                    | 63589               | 104695              |
| PMOS                    | 61649               | 105037              |
| Test vectors            | 5000                | 10000               |
| Model delay             | 34 <i>FO4</i>       | 35 <i>FO</i> 4      |
| Sim delay( $0.6\mu m$ ) | 14ns                | 14.8ns              |
|                         | (33.35 <i>FO</i> 4) | (35.25 <i>FO</i> 4) |
| Sim delay( $0.3\mu m$ ) | 6ns                 | 6.4 <i>ns</i>       |
|                         | (32.40 <i>FO</i> 4) | (34.60 <i>FO</i> 4) |

- The addition is the most frequent operation.
- Two subsequent *dependent* additions will take a long time on a pipelined machine.
- Nielsen *et al.* forward a redundant result to the following addition after just two clock cycles.
- In the third clock cycle, the rounding decision is calculated and is forwarded to the dependent addition.
- The fourth clock cycle produces the full non-redundant result.

A detailed design

- The adder accepts one of the operands in a redundant form and in two "packets".
- It is possible to start a new dependent addition every two clock cycles.
- The result is available, correctly rounded, in the IEEE format in four clock cycles.

13/17

## A comparative analysis

Smith *et al.* in 1999 provided a good amount of details for their various designs.

Their paper is instructive to teach you about the trade-off in considering the amounts of shifts needed and the decision on which is the largest significand.

This is one of a few papers that speak about the exponents, the exceptional inputs, and shows a real layout for a fabricated chip.

Seidel and Even in 2001 presented a nice comparative analysis of a large number of adders from the literature.

Although it is not exhaustive (and has minor mistakes such as in reference to work of Smith mentioned earlier) it is a very good source to understand the various trade-offs.

12/17

- The subnormal numbers are not easy to handle. For those interested, see the corresponding paper.
- Some machines must support multiple formats. The IBM mainframes still provide the old hexadecimal side by side with the binary.
- The need to decimal FP is growing and that too should be supported.

16/17

- Addition is very important and hence it received the attention of too many people over time.
- There is still a possibility to improve, especially when it comes to energy consumption.
- Another direction for future research is the support of multiple formats: binary, hexadecimal, and decimal.

17/17