# Digital VLSI Design

# Lecture 3: Logic Synthesis Part 1

Semester A, 2018-19

Lecturer: Dr. Adam Teman

November 7, 2018



Emerging Nanoscaled Integrated Circuits and Systems Labs

Disclaimer: This course was prepared, in its entirety, by Adam Teman. Many materials were copied from sources freely available on the internet. When possible, these sources have been cited; however, some references may have been cited incorrectly or overlooked. If you feel that a picture, graph, or code example has been copied from you and either needs to be cited or removed, please feel free to email adam.teman@biu.ac.il and I will address this as soon as possible.

### **Lecture Outline**





# Introduction

....what is logic synthesis?







# What is Logic Synthesis?

- Synthesis is the process that converts RTL into a technology-specific gate-level netlist, optimized for a set of pre-defined constraints.
- You start with:
  - A behavioral RTL design
  - A standard cell library
  - A set of design constraints

- You finish with:
  - A gate-level netlist, mapped to the standard cell library
  - (For FPGAs: LUTs, flip-flops, and RAM blocks)
  - Hopefully, it's also efficient in terms of speed, area, power, etc.

```
module counter(
    input clk, rstn, load,
    input [1:0] in,
    output reg [1:0] out);
always @(posedge clk)
    if (!rstn) out <= 2'b0;
    else if (load) out <= in;
    else out <= out + 1;
endmodule
```



```
module counter ( clk, rstn, load, in, out );
input [1:0] in;
output [1:0] out;
input clk, rstn, load;
wire N6, N7, n5, n6, n7, n8;
FFPQ1 out_reg_1 (.D(N7),.CK(clk),.Q(out[1]));
FFPQ1 out_reg_0 (.D(N6),.CK(clk),.Q(out[0]));
NAN2D1 U8 (.A1(out[0]),.A2(n5),.Z(n8));
NAN2D1 U9 (.A1(n5),.A2(n7),.Z(n6));
INVD1 U10 (.A(load),.Z(n5));
OA211D1 U11 (.A1(in[0]),.A2(n5),.B(rstn),.C(n8),.Z(N6));
OA211D1 U12 (.A1(in[1]),.A2(n5),.B(rstn),.C(n6),.Z(N7));
EXNOR2D1 U13 (.A1(out[1]),.A2(out[0]),.Z(n7));
endmodule
```
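One way to convince yourself that the gate-level netlist really implements the RTL counter is to compare the two exhaustively. The Python sketch below is only an illustration, not part of any synthesis tool. It assumes the usual meaning of the cell names (`NAN2D1` = 2-input NAND, `INVD1` = inverter, `EXNOR2D1` = 2-input XNOR, `OA211D1` = `(A1 | A2) & B & C`); these functions should be checked against the documentation of the actual library.

```python
from itertools import product


def rtl_next(rstn, load, din, out):
    """Next value of `out` according to the behavioral RTL (2-bit counter)."""
    if not rstn:
        return 0
    if load:
        return din
    return (out + 1) & 0b11  # 2-bit wrap-around


def netlist_next(rstn, load, din, out):
    """Next value of `out` computed from the gate-level netlist.

    Cell functions are assumptions based on common naming conventions.
    """
    in0, in1 = din & 1, (din >> 1) & 1
    out0, out1 = out & 1, (out >> 1) & 1

    n5 = 1 - load                  # INVD1    U10
    n8 = 1 - (out0 & n5)           # NAN2D1   U8
    n7 = 1 - (out1 ^ out0)         # EXNOR2D1 U13
    n6 = 1 - (n5 & n7)             # NAN2D1   U9
    N6 = (in0 | n5) & rstn & n8    # OA211D1  U11, D input of out_reg_0
    N7 = (in1 | n5) & rstn & n6    # OA211D1  U12, D input of out_reg_1
    return (N7 << 1) | N6


def equivalent():
    """Compare both models over all 64 input/state combinations."""
    return all(
        rtl_next(r, l, d, o) == netlist_next(r, l, d, o)
        for r, l, d, o in product((0, 1), (0, 1), range(4), range(4))
    )


if __name__ == "__main__":
    print("netlist matches RTL:", equivalent())
```

Because the design has only six input/state bits, brute force is enough here; real equivalence checkers use formal methods rather than enumeration.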

# What is Logic Synthesis?

- Given: Finite-State Machine  $F(X, Y, Z, \lambda, \delta)$  where:
  - X: Input alphabet
  - Y: Output alphabet
  - Z: Set of internal states
  - $\lambda: X \times Z \rightarrow Z$  (next state function)
  - $\delta: X \times Z \rightarrow Y$  (output function)

- Target: Circuit C(G, W) where:
  - G: set of circuit components
    - $G = \{\text{Boolean gates, flip-flops, etc.}\}$
  - W: set of wires connecting G



### Motivation

### Why perform logic synthesis?

- Automatically manages many details of the design process:
  - Fewer bugs
  - Improves productivity
  - Abstracts the design data (HDL description) from any particular implementation technology
  - Designs can be re-synthesized targeting different chip technologies
    - E.g.: first implement in FPGA, then later in ASIC
- In some cases, leads to a more optimal design than could be achieved by manual means (e.g.: logic optimization)

### Why not logic synthesis?

- May lead to less than optimal designs in some cases

### **Simple Example**



© Adam Teman, 2017

# **Goals of Logic Synthesis**

### Minimize area

- In terms of literal count, cell count, register count, etc.

### Minimize power

- In terms of switching activity in individual gates, deactivated circuit blocks, etc.

### Maximize performance

- In terms of maximal clock frequency of synchronous systems, throughput for asynchronous systems

### Any combination of the above

- Combined with different weights
- Formulated as a constraint problem
  - "Minimize area for a clock speed > 300MHz"

### More global objectives

- Feedback from layout
  - Actual physical sizes, delays, placement and routing
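The constraint formulation mentioned above ("Minimize area for a clock speed > 300MHz") can be sketched as a simple selection problem. The candidate implementations and their numbers below are entirely made up for illustration; a real synthesizer explores a far larger space and does not work from a fixed list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str          # hypothetical implementation label
    area_um2: float    # total cell area
    fmax_mhz: float    # maximum clock frequency


def min_area_meeting_clock(candidates, min_fmax_mhz):
    """Return the smallest-area candidate whose fmax exceeds the constraint.

    Returns None when no candidate meets the constraint.
    """
    feasible = [c for c in candidates if c.fmax_mhz > min_fmax_mhz]
    return min(feasible, key=lambda c: c.area_um2, default=None)


# Made-up numbers, for illustration only.
CANDIDATES = [
    Candidate("ripple-carry", area_um2=120.0, fmax_mhz=250.0),
    Candidate("carry-select", area_um2=180.0, fmax_mhz=340.0),
    Candidate("carry-lookahead", area_um2=210.0, fmax_mhz=420.0),
]

if __name__ == "__main__":
    best = min_area_meeting_clock(CANDIDATES, min_fmax_mhz=300.0)
    print(best.name if best else "no feasible implementation")
```

The point of the sketch is the ordering of priorities: the clock constraint filters the candidates first, and area is minimized only among those that pass.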

### How does it work?

#### Variety of general and ad-hoc (special case) methods:

- Instantiation:
  - Maintains a library of primitive modules (AND, OR, etc.) and user defined modules
- "Macro expansion"/substitution:
  - A large set of language operators (+, -, Boolean operators, etc.) and constructs (if-else, case) expand into special circuits
- Inference:
  - Special patterns are detected in the language description and treated specially (e.g., inferring memory blocks from variable declarations and read/write statements, or FSM detection and generation from `always @(posedge clk)` blocks)
- Logic optimization:
  - Boolean operations are grouped and optimized with logic minimization techniques
- Structural reorganization:
  - Advanced techniques, including sharing of operators, retiming of circuits (moving FFs), and others

# **Basic Synthesis Flow**

- Syntax Analysis:
  - Read in HDL files and check for syntax errors.
  - `read_hdl -verilog sourceCode/toplevel.v`
- Library Definition:
  - Provide standard cells and IP libraries.
  - `read_libs "/design/data/my_fab/digital/lib/TT1V25C.lib"`
- Elaboration and Binding:
  - Convert RTL into Boolean structure.
  - State reduction, encoding, register inferring.
  - Bind all leaf cells to provided libraries.
  - `elaborate toplevel`
- Pre-mapping Optimization
- Constraint Definition:
  - Define clock frequency and other design constraints.
  - `read_sdc sdc/constraints.sdc`
- Technology Mapping
- Post-mapping Optimization
- Report and Export:
  - `write_hdl > export/netlist.v`



# Compilation

....but aren't we talking about synthesis?







# Compilation in the synthesis flow

- Before starting to synthesize, we need to check the syntax for correctness.
- Synthesis vs. Compilation:
  - Compiler
    - Recognizes all possible constructs in a formally defined programming language
    - Translates them to a machine language representation of the execution process
  - Synthesis
    - Recognizes a target dependent subset of a hardware description language
    - Maps to collection of concrete hardware resources
    - Iterative tool in the design flow



[Figure: levels of program translation. High Level Language Program (e.g., C) → Compiler → Assembly Language Program (e.g., MIPS) → Assembler → Machine Language Program (MIPS) → Machine Interpretation → Control Signal Specification]

# **Compilation with NC-Verilog**

- To compile your Verilog code for syntax checking, use the NC-Verilog tool:
  - `ncvlog <filename.v>`
- This will quickly run compilation on your Verilog source code and point you to syntax errors.
- Alternatively, use the irun super command:
  - `irun -compile <filename.v>`









### It's all about the standard cells...

- The library definition stage tells the synthesizer where to look for leaf cells for binding and the target library for technology mapping.
  - We can provide a list of *paths* to search for libraries in:
    - `set_db init_lib_search_path "/design/data/my_fab/digital/lib/"`
  - And we have to provide the name of a specific library, usually characterized for a single corner:
    - `read_libs "TT1V25C.lib"`
- We also need to provide .lib files for IPs, such as memory macros, I/Os, and others.

Make sure you understand all the warnings about the libs that the synthesizer spits out, even though you probably can't fix them.

### But what is a library?

- A standard cell library is a collection of well defined and appropriately characterized logic gates that can be used to implement a digital design.
- Similar to LEGO, standard cells must meet predefined specifications to be flawlessly manipulated by synthesis, place, and route algorithms.
- Therefore, a standard cell library is delivered with a collection of files that provide all the information needed by the various EDA tools.




### Example

- NAND standard cell layout
- Pay attention to:
  - Cell height
  - Cell width
  - Voltage rails
  - Well definition
  - Pin Placement
  - PR Boundary
  - Metal layers

Ideally, standard cells should be routed entirely in M1!





- Fillers, Tap cells, Antennas, DeCaps, EndCaps, Tie Cells

# **Multiple Drive Strengths and VTs**

- Multiple Drive Strength
  - Each cell will have various sized output stages.
  - Larger output stage → better at driving fanouts/loads.
  - Smaller drive strength → less area, leakage, input cap.
  - Often called X2, X3, or D2, D3, etc.
- Multiple Threshold (MT-CMOS)
  - A single additional mask can provide more or less doping in a transistor channel, shifting the threshold voltage.
  - Most libraries provide equivalent cells with three or more VTs: SVT, HVT, LVT. This enables a tradeoff between speed and leakage.
  - All threshold varieties have the same footprint and therefore can be swapped without any placement/routing iterations.



### **Clock Cells**

- General standard cells are optimized for speed.
  - That doesn't mean they're balanced...

$$\min t_{pd} = \min\left(\frac{t_{p,LH} + t_{p,HL}}{2}\right) \neq t_{p,LH} = t_{p,HL}$$

- This isn't good for clock nets...
  - Unbalanced rising/falling delays will result in unwanted skew.
  - Special "clock cells" are designed with balanced rising/falling delays to minimize skew.
  - These cells are usually less optimal for data and so should not be used on data paths.
- In general, only buffers/inverters should be used on clock nets
  - But sometimes, we need gating logic.
  - Special cells, such as *integrated clock gates*, provide logic for the clock networks.
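To see why balanced rise/fall delays matter on a clock net, consider a clock pulse passing through a chain of identical non-inverting buffers. The sketch below uses made-up delay numbers (in ps) and a simplified model in which every stage adds the same delay; it is meant only to illustrate the arithmetic, not to model a real library.

```python
def average_delay(tp_lh, tp_hl):
    """Average propagation delay, the quantity minimized in the expression above."""
    return (tp_lh + tp_hl) / 2


def high_time_after_chain(high_time, tp_lh, tp_hl, n_buffers):
    """High time of a clock pulse after n identical non-inverting buffers.

    The rising edge is delayed by tp_lh per stage and the falling edge by
    tp_hl per stage, so the high time changes by (tp_hl - tp_lh) per stage.
    """
    return high_time + n_buffers * (tp_hl - tp_lh)


if __name__ == "__main__":
    # Hypothetical cells: (tp_lh, tp_hl) in ps.
    cells = {"data buffer": (60, 40), "clock buffer": (52, 52)}
    for name, (lh, hl) in cells.items():
        print(name,
              "avg delay:", average_delay(lh, hl),
              "high time after 10 stages:", high_time_after_chain(500, lh, hl, 10))
```

In this example the unbalanced buffer has the lower average delay, yet ten stages of it shrink a 500 ps high phase to 300 ps, while the balanced buffer leaves the pulse shape intact.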





## **Sequentials**

- Flip Flops and Latches, including
  - Positive/Negative Edge Triggered
  - Synchronous/Asynchronous Reset/Set
  - Q/QB Outputs
  - Enable
  - Scan
  - etc., etc.






### **Level Shifters**

- Level shifter cells are placed between voltage domains to pass signals from one voltage to another.
- HL (high-to-low) shifter
  - Requires only one voltage
  - Single height cell
- LH (low-to-high) shifter
  - Needs 2 voltages
  - Often double height




# Filler and Tap Cells

- Filler cells must be inserted in empty areas in rows
  - Ensure well and diffusion mask continuity
  - Ensure density rules on bottom layers
  - Provide dummy poly for scaled technologies
  - Sometimes, special cells are needed at the boundaries of rows - "End Caps"
  - Other fillers may include MOSCAPs between VDD and GND for voltage stability - "DeCAP cells"
- Well taps are needed to ensure local body voltage
  - Eliminate latch-up
  - No need to tap every single cell
- Back or forward biasing for performance/leakage optimization
  - N-well voltage different from VDD
  - Substrate or P-well (triple well process) voltage different from VSS
  - Bias voltage routed as signal pin or special power net



# Engineering Change Order (ECO) Cells

- An Engineering Change Order (ECO) is a very late change in the design.
  - ECOs are usually done after place and route.
  - However, re-spins of a chip are often done without recreating all masks. This is known as a "Metal-Fix".
- ECOs usually require small changes in logic.
  - How can we do this after placement?
  - Or worse, after tapeout?
- Solution: Spare (Bonus) Cells!
  - Cells without functionality
  - Cells are added during design (fill)
  - In case of problems (after processing), a new metal and via mask gives the cells their wanted functionality
  - Cell combinations can create more complex functions
    - Ex. AND, NAND, NOR, XOR, FF, MUX, INV,...
  - Special standard cells are used to differentiate from real cells.




# My favorite word... ABSTRACTION!

- So, what is a cell?
  - I guess that the detailed layout is sufficient to know (guess) anything and everything about a standard cell.
  - Or it would be easier, if we got the whole Open Access database of the cell...
- But do we really need to know everything?
  - For example, does logic simulation need to know if your inverter is CMOS or Pseudo-NMOS?
  - And does a logic synthesizer need to know what type of transistors you used?
- No!
  - To make life (and calculations) simpler, we will abstract away this info.
  - Each tool will get only the data it really needs.

### What files are in a standard cell library?

- Behavioral Views:
  - Verilog (or VITAL) description used for simulation, logic equivalence.
- Physical Views:
  - Layout of the cells (GDSII format) for DRC, LVS, Custom Layout.
  - Abstract of the cells (LEF format) for P&R, RC extraction.
- Transistor Level:
  - **Spice/Spectre** netlist for LVS, transistor-level simulation.
  - Often provided both with parasitics (post-layout) and without.
- Timing/Power:
  - Liberty files with characterization of timing and power for STA.
- Power Grid Views:
  - Needed for IR Drop analysis.
- Others:
  - Symbols for displaying the cells in various tools.
  - **OA Libraries** for easy integration with Virtuoso.











# Library Exchange Format (LEF)

- Abstract description of the layout for P&R
  - Readable ASCII Format.
  - Contains detailed PIN information for connecting.
  - Does not include front-end-of-line (poly, diffusion, etc.) data.
- Abstract views only contain the following:
  - Outline of the cell (size and shape)
  - Pin locations and layer (usually on M1)
  - Metal blockages (areas in a cell where metal of a certain layer is being used, but is not a pin)




# Technology LEF

Technology LEF files contain (simplified) information about the technology for use by the placer and router:

- Layers
  - Name, such as M1, M2, etc.
  - Layer type, such as routing, cut (via)
  - Electrical properties (R, C)
  - Design Rules
  - Antenna data
  - Preferred routing direction
- **SITE** (x and y grid of the library)
  - CORE sites are minimum standard cell size
  - Can have site for double height cells!
  - IOs have special SITE.
- Via definitions
- Units
- Grids for layout and routing

```
SITE CORE
  CLASS CORE ;
  SIZE 0.2 X 12.0 ;
END CORE

LAYER MET1
  TYPE ROUTING ;
  PITCH 3.5 ;
  WIDTH 1.2 ;
  SPACING 1.4 ;
  DIRECTION HORIZONTAL ;
  RESISTANCE RPERSQ .7E-01 ;
  CAPACITANCE CPERSQDIST .46E-04 ;
END MET1

LAYER VIA
  TYPE CUT ;
END VIA
```

Additional files provide parasitic extraction rules. These can be basic ("cap tables") or more detailed ("QRC techfile"). These may be provided as part of the PDK.


# Technology LEF

- Cell height is measured in Tracks
  - A Track is one M1 pitch
  - E.g., An 8-Track Cell has room for 8 horizontal M1 wires.
- The more tracks, the wider the transistors, the faster the cells.
  - 7-8 *low-track* libraries for area efficiency
  - 11-12 tall-track libraries for performance, but have high leakage
  - 9-10 *standard-track* libraries for a reasonable area-performance tradeoff



| Parameter              | Symbol         |
|------------------------|----------------|
| Cell height (# tracks) | H              |
| Power rail width       | W <sub>1</sub> |
| Vertical grid          | W <sub>2</sub> |
| Horizontal grid        | W <sub>3</sub> |
| N-Well height          | W <sub>4</sub> |


# **Technology LEF**

- Cells must fit into a predefined grid
  - The minimum Height X Width is called a SITE.
  - Must be a multiple of the minimum X-grid unit and row height.
  - Cells can be double-height, for example.
- Pins should coincide with routing tracks
  - This enables easy connection of higher metals to the cell.
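The grid rule can be checked with a few lines of arithmetic. The sketch below takes the site size from the `SIZE 0.2 X 12.0` line of the technology LEF example earlier in this lecture; the cell dimensions passed in are hypothetical. `Decimal` is used so that sizes such as 1.4 divide exactly by 0.2, which binary floating point would not guarantee.

```python
from decimal import Decimal

SITE_WIDTH = Decimal("0.2")    # from "SIZE 0.2 X 12.0" in the LEF example
SITE_HEIGHT = Decimal("12.0")


def fits_site_grid(width, height):
    """True if a cell's size is a whole number of sites in both directions."""
    w, h = Decimal(str(width)), Decimal(str(height))
    return w > 0 and h > 0 and w % SITE_WIDTH == 0 and h % SITE_HEIGHT == 0


def sites_occupied(width, height):
    """(columns, rows) of sites covered by a cell that fits the grid."""
    if not fits_site_grid(width, height):
        raise ValueError("cell does not fit the site grid")
    cols = int(Decimal(str(width)) / SITE_WIDTH)
    rows = int(Decimal(str(height)) / SITE_HEIGHT)
    return cols, rows


if __name__ == "__main__":
    print(fits_site_grid(1.4, 12.0))   # single-height cell, 7 sites wide
    print(fits_site_grid(1.5, 12.0))   # width is not a multiple of 0.2
    print(sites_occupied(1.4, 24.0))   # double-height cell
```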






### **The Chip Hall of Fame**

After checking out two Intel chips, we better not forget the Acorn Computers ARM1 Processor.

- Racking up Kahoot points on your smartphone? Then you probably should pay tribute to the granddaddy of that chip inside.
- Release date: April 1985
- Manufactured by VLSI Technology
- Transistor Count: 25,000
- Process: 3 um CMOS
- 32-bit ARMv1 architecture
- ARM stands for "Acorn RISC Machine"
- The reference design was written in 808 lines of BASIC!
- Never sold as a commercial product, only as a co-processor for the BBC Micro.

2017 Inductee to the IEEE Chip Hall of Fame







# Liberty (.lib): Introduction

- How do we know the delay through a gate in a logic path?
  - Running SPICE is way too complex.
  - Instead, create a *timing model* that will simplify the calculation.
- Goal:
  - For every timing arc, calculate:
    - Propagation Delay (t<sub>pd</sub>)
    - Output transition  $(t_{rise}, t_{fall})$
  - Based on:
    - Input net transition  $(t_{rise}, t_{fall})$
    - Output Load Capacitance (C<sub>load</sub>)

Note that every .lib will provide timing/power/noise information for a single corner, i.e., process, voltage, temperature, RCX, etc.




# Liberty (.lib): General

- Timing data of standard cells is provided in the Liberty format.
  - Library:
    - General information common to all cells in the library.
    - For example, operating conditions, wire load models, look-up tables
  - Cell:
    - Specific information about each standard cell.
    - For example, function, area.
  - Pin:
    - Timing, power, capacitance, leakage, functionality, etc. characteristics of each pin in each cell.




# Liberty (.lib): Timing Models

- Non-Linear Delay Model (NLDM)
  - Driver model:
    - Ramp voltage source
    - Fixed drive resistance
  - Receiver model:
    - Min/max rise/fall input caps
  - Very fast
  - Doesn't model cap variation during transition.
  - Loses accuracy in technologies below 130nm





# Liberty (.lib): Timing Models

### Non-Linear Delay Model (NLDM)

- Delay calculation by interpolation

#### Cell Fall

| Cap\Tr | 0.05 | 0.2  | 0.5  |
|--------|------|------|------|
| 0.01   | 0.02 | 0.16 | 0.30 |
| 0.5    | 0.04 | 0.32 | 0.60 |
| 2.0    | 0.00 | 0.64 | 1.20 |

#### Cell Rise

| Cap\Tr | 0.05 | 0.2         | 0.5  |
|--------|------|-------------|------|
| 0.01   | 0.03 | 0.18        | 0.33 |
| 0.5    | 0.06 | 0.36        | 0.66 |
| 2.0    | 0.09 | (illegible) | 1.32 |

#### Fall Transition

| Cap\Tr | 0.05 | 0.2  | 0.5  |
|--------|------|------|------|
| 0.01   | 0.01 | 0.09 | 0.15 |
| 0.5    | 0.03 | 0.27 | 0.45 |
| 2.0    | 0.06 | 0.04 | 0.90 |

[Figure: interpolation example annotated with 0.1ns and 0.12ns, giving Fall delay = 0.178ns, Rise delay = 0.261ns, Fall transition = 0.147ns, and Rise transition = ...]
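A table lookup between grid points can be sketched with bilinear interpolation. This is one common scheme and is used here as an assumption; the exact interpolation performed by a given timing tool may differ. The table below holds only the four upper-left entries of the Cell Fall table above, and the query point is chosen for illustration; it is not meant to reproduce the values annotated in the slide's figure.

```python
from bisect import bisect_right

# Upper-left 2x2 corner of the "Cell Fall" table above.
CAPS = [0.01, 0.5]            # output load (rows)
TRANSITIONS = [0.05, 0.2]     # input transition (columns)
CELL_FALL = [
    [0.02, 0.16],             # cap = 0.01
    [0.04, 0.32],             # cap = 0.5
]


def _bracket(axis, x):
    """Index i of the axis segment [axis[i], axis[i + 1]] used for x.

    The index is clamped, so points outside the table are extrapolated
    linearly from the nearest segment.
    """
    i = bisect_right(axis, x) - 1
    return max(0, min(i, len(axis) - 2))


def nldm_lookup(caps, transitions, table, cap, tr):
    """Bilinear interpolation in a table indexed as table[cap][transition]."""
    i, j = _bracket(caps, cap), _bracket(transitions, tr)
    u = (cap - caps[i]) / (caps[i + 1] - caps[i])
    v = (tr - transitions[j]) / (transitions[j + 1] - transitions[j])
    low_cap = table[i][j] * (1 - v) + table[i][j + 1] * v
    high_cap = table[i + 1][j] * (1 - v) + table[i + 1][j + 1] * v
    return low_cap * (1 - u) + high_cap * u


if __name__ == "__main__":
    # Hypothetical query: halfway between the grid points on both axes.
    print(nldm_lookup(CAPS, TRANSITIONS, CELL_FALL, cap=0.255, tr=0.125))
```

At a grid point the function returns the table entry itself, and halfway between the four corners it returns their average.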




# Liberty (.lib): Timing Models

- Current Source Models (CCS, ECSM)
  - Model a cell's nonlinear output behavior as a current source
  - Driver model:
    - Nonlinear current source
  - Receiver model:
    - Changing capacitance
  - Requires many more values
  - Requires a bit more calculation
  - Essential under 130nm
  - Within 2% of SPICE.





Courtesy: Synopsys

# Liberty (.lib): Wire Load Models

- How do you estimate the parasitics (RC) of a net before placement and routing?
- Wire Load Models estimate the parasitics based on the *fanout* of a net.

| Net Fanout | Resistance (kΩ) | Capacitance (pF) |
|------------|-----------------|------------------|
| 1             | 0.00498          | 0.00312           |
| 2             | 0.01295          | 0.00812           |
| 3             | 0.02092          | 0.01312           |
| 4             | 0.02888          | 0.01811           |



```
library (myLib) {
  wire_load ("WLM1") {
    resistance : 0.0006 ;        /* R per unit length */
    capacitance : 0.0001 ;       /* C per unit length */
    area : 0.1 ;                 /* Area per unit length */
    slope : 1.5 ;                /* Used for linear extrapolation */
    fanout_length (1, 0.002) ;   /* for fo=1, Lwire=0.002 */
    fanout_length (2, 0.006) ;   /* for fo=2, Lwire=0.006 */
    fanout_length (3, 0.009) ;   /* for fo=3, Lwire=0.009 */
    fanout_length (4, 0.015) ;   /* for fo=4, Lwire=0.015 */
    fanout_length (5, 0.020) ;   /* for fo=5, Lwire=0.020 */
    fanout_length (6, 0.028) ;   /* for fo=6, Lwire=0.028 */
  }
} /* end of library */
```
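The way a wire load model turns a fanout into parasitics can be sketched as follows. The numbers are copied from the `WLM1` example above; units are not stated there, so the results are in whatever units the library defines. The extrapolation rule (length of the last table entry plus `slope` per additional fanout) reflects the usual reading of the `slope` attribute and should be treated as an assumption.

```python
# Values copied from the "WLM1" example above.
R_PER_LENGTH = 0.0006
C_PER_LENGTH = 0.0001
SLOPE = 1.5
FANOUT_LENGTH = {1: 0.002, 2: 0.006, 3: 0.009, 4: 0.015, 5: 0.020, 6: 0.028}


def wire_length(fanout):
    """Estimated net length; extrapolate linearly with SLOPE past the table."""
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    max_fanout = max(FANOUT_LENGTH)
    if fanout <= max_fanout:
        return FANOUT_LENGTH[fanout]
    return FANOUT_LENGTH[max_fanout] + SLOPE * (fanout - max_fanout)


def wire_rc(fanout):
    """(resistance, capacitance) estimate for a net with the given fanout."""
    length = wire_length(fanout)
    return R_PER_LENGTH * length, C_PER_LENGTH * length


if __name__ == "__main__":
    for fo in (1, 3, 6, 8):
        r, c = wire_rc(fo)
        print(f"fanout={fo}: length={wire_length(fo)}, R={r:.3g}, C={c:.3g}")
```

Note that the estimate depends only on fanout, not on where the cells end up, which is why correlation with post-layout results can be poor.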


# **Physical-Aware Synthesis**

- Due to the lack of accuracy, wireload models lead to very poor correlation between synthesis and post-layout in nanometer technologies.
- Instead, use physical information during synthesis
  - Synopsys calls this "Topographical Mode"
  - Cadence calls this "Physical Synthesis"
- Physical-Aware Synthesis basically runs placement inside the synthesizer to obtain more accurate parasitic estimation:
  - Without a floorplan, just using .lef files
  - After first iterations, import a floorplan .def to the synthesizer.

`syn_opt -physical`









# Other contents of SC Library

- Many other files and formats may be provided as part of a standard cell library:
  - GDS
  - Verilog
  - ATPG
  - Power Grid Models
  - OA Databases
  - Spice Models
  - etc.



### **Documentation and Datasheets**

- So, are we just supposed to look through and see what the vendor decided to provide us with?
  - Yes!
  - However, they probably supplied some PDFs describing the library.
  - And usually there are data sheets with numbers for each corner.



#### Figure 9.6. Logic Symbol of NAND

Table 9.11. NAND Truth Table (n=2,3,4)

| IN1 | IN2 | ... | INn | QN |
|-----|-----|-----|-----|----|
| 0   | X   | ... | X   | 1  |
| X   | 0   | ... | X   | 1  |
| ... | ... | ... | ... | 1  |
| X   | X   | ... | 0   | 1  |
| 1   | 1   | 1   | 1   | 0  |

Table 9.12. NAND Electrical Parameters and Areas

Operating Conditions: VDD=1.2 V DC, Temp=25 Deg.C, Operating Frequency: Freq=300 MHz, Standard Load: Csl=13 fF

| Cell Name | Cload     | Prop Delay (Avg) (ps) | Leakage Power (VDD=1.32 V DC, Temp=25 Deg.C) (nW) | Dynamic Power (nW/MHz) | Area (um<sup>2</sup>) |
|-----------|-----------|-----------------------|---------------------------------------------------|------------------------|-----------------------|
| NAND2X1   | 1 x Csl   | 51                    | 336                                               | 15                     | 5.5296                |
| NAND2X2   | 2 x Csl   | 51                    | 673                                               | 28                     | 9.2160                |
| NAND3X1   | 1 x Csl   | 130                   | 492                                               | 38                     | 11.9808               |
| NAND3X2   | 2 x Csl   | 142                   | 770                                               | 59                     | 12.9024               |
| NAND4X0   | 0.5 x Csl | 66                    | 400                                               | 22                     | 8.2944                |
| NAND4X1   | 1 x Csl   | 127                   | 716                                               | 57                     | 12.9024               |


Source: www.vlsi.ce.titech.ac.jp/kunieda/lecture

### And what about other IPs?

- All IPs will be provided as a library, including most of the views a standard cell library will have.
- These are required for integration of the hard macros in the standard design flow (simulation, synthesis, P&R, verification, etc.)
- Memories (SRAMs) are a special case, as they usually come with a memory compiler that generates the particular memory cut the designer requires.

