

# Introduction to Physical Compiler and ILM Flow

Timothy Chiun
Simon Koval
Corporate Applications Engineering



# Agenda

- Introduction to Physical Compiler
- Hierarchical Physical Synthesis with Interface Logic Models



# Introduction to Physical Compiler

Timothy Chiun CAE, Physical Compiler



#### Introduction

- Problems
- Current Flows and Issues
- Synopsys Physical Synthesis Solution
- What is Physical Compiler?
- Running Physical Compiler
- Physically Integrated Methodologies
- Summary



#### Problems



- Achieving Timing Closure
- Scaling the Design Process for Multi-Million Gate Chips



#### Synthesis

- WLMs are statistical
- Constraints are estimated
  - set\_input\_delay, set\_output\_delay, set\_load, set\_driving\_cell, set\_clock\_skew, etc.





#### Place/Route

Estimates for wire delays are off!

Nets with the same fanout have very different delays in the placed design

ECOs are required

Timing closure becomes a moving target





# Flows and Issues – Traditional Flow





# Unifying Synthesis & Placement

## 1. Front-end timing is becoming unreliable

 With traditional flows, all nets with the same fanout have the same estimated interconnect delay during front-end design





# Unifying Synthesis & Placement is the Best Technical Solution

1. Front-end timing is becoming unreliable
2. Placement can change timing dramatically
   - After placement, it is obvious that nets with the same fanout will not have the same interconnect delay
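
The contrast between the two estimates can be sketched in a few lines of Python. This is only an illustration: the wireload table values, the pin coordinates, and the half-perimeter bounding-box estimate are assumptions chosen for the example, not data or algorithms taken from Physical Compiler.

```python
# Illustrative only: the table values and coordinates below are made up.

# Hypothetical wireload model: estimated wire length (um) indexed by fanout.
WIRELOAD_LENGTH_UM = {1: 20.0, 2: 45.0, 3: 75.0, 4: 110.0}


def wlm_length(fanout):
    """Statistical estimate: depends on fanout only."""
    return WIRELOAD_LENGTH_UM[fanout]


def placed_length(pins):
    """Half-perimeter of the bounding box around the placed pins (x, y in um)."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


# Two nets with the same fanout (one driver, two loads) but different placement.
net_local = [(10.0, 10.0), (15.0, 12.0), (12.0, 18.0)]
net_cross_chip = [(10.0, 10.0), (900.0, 40.0), (450.0, 700.0)]

for name, pins in [("local", net_local), ("cross_chip", net_cross_chip)]:
    fanout = len(pins) - 1
    print(name, "WLM:", wlm_length(fanout), "placed:", placed_length(pins))
```

Both nets get the same fanout-based estimate, while the placement-based lengths differ by roughly two orders of magnitude.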





# Unifying Synthesis & Placement is the Best Technical Solution

1. Front-end timing is becoming unreliable
2. Placement can change timing dramatically
3. Detailed routing has only a minor effect when good global routing is done to model interconnect





# Placement is Key!





# Physical Compiler: Power of Synthesis + Placement

- Faster timing closure
- Better timing correlation between synthesis result and post-layout result
- Resultant netlist is more intelligently created, resulting in reduced die area
- World class cell placement technology produces highly routable designs that meet timing





# Synopsys Physical Synthesis

Complete RTL to GDSII Solution








# What is Physical Compiler?





# Physical Compiler is...

Not using wireload models



- Unifies synthesis and placement
- Produces an optimized gate-level netlist AND cell placement from :
  - RTL description
  - Existing gate-level netlist
- Works on a single, flat block of physical hierarchy
  - Logical hierarchy is maintained



### RTL or Gates? Your choice

- RTL to Placed Gates (RTL2PG)
  - Creates an optimized netlist concurrent with placement starting from RTL
  - Provides the highest level of flexibility in architectural choices to meet design goals
- Gates to Placed Gates (G2PG)
  - Optimizes the netlist concurrent with placement starting from a gate-level netlist
  - Least impact to existing design flows
  - Higher capacity



# Physical Compiler RTL2PG Flow





# Physical Compiler G2PG Flow





## Incremental Optimizations

#### Placed Gates to Placed Gates

- Fine-tune after insertion of additional cells
- Invoke additional effort to deal with design-specific issues (timing/congestion)

#### After Routing

- Clear up post-extraction timing violations
- Maintain as much back-annotated timing information as possible
- Meant for last minute timing fixes



# Physical Compiler Package

- Translation Utilities
  - lef2plib
  - def2pdef
  - db2def5
- Shell Tool psyn\_shell
- GUI Tool psyn\_gui



## Physical Compiler GUI





## Viewing the Floorplan





# Analyzing the Histograms





## Schematic with Selection





## Floorplan with Selected Path






# Data Required to Run PC: PC Input/Output files



# SAN JOSE 2002

# LEF - Cell Data (Physical Library)





## Physical Library Utility - lef2plib

- 'lef2plib' is part of the standard installation
- Converts LEF syntax to Synopsys physical library syntax (PLIB)
- Physical library file can be saved into a binary DB (PDB) using the read\_lib / write\_lib commands




## Input Files – Floorplan Data





# Input Files – Logical and Physical Hierarchy

- The logical hierarchy is maintained throughout the Physical Compiler flow
- Physical Compiler manages both physical and logical hierarchies
- Both are persistently stored in the .db file



# Logical & Physical Hierarchy Example - Optimizing a Whole Chip

- Only 1 PDEF is annotated to 'TOP'
- Physical hierarchy is flat





# Logical & Physical Hierarchy Example - Optimizing Floorplan Blocks

- Three PDEFs are annotated to modules
- Floorplan hierarchy is maintained





#### Floorplanning Objects





#### Floorplan Utility - def2pdef

- 'def2pdef' is part of the standard installation
- Converts DEF floorplan information to IEEE PDEF syntax
- DEF v5.3 is supported
- Compiled technology file (.pdb) must exist





#### Handling Obstructions

- PDEF describes all floorplan data
- Power nets can be full or partial blockages
- RAMs can have placement keepouts
- User can add additional obstructions
  - Routing layers or placement



#### Floorplanning Commands

 PC provides several commands that can be used to help floorplan the design (if the incoming PDEF needs additional information).

| Command | Description |
|---------|-------------|
| set\_placement\_area | creates the core area for coarse placement |
| create\_site\_row | creates sites for detailed placement |
| create\_obstruction | creates a layer-specific or placement obstruction |
| set\_cell\_location | sets the x y location of a specific cell |
| set\_port\_location | sets the x y location of a specific port |
| set\_dont\_touch\_placement | creates the 'fixed\_placement' restriction for a cell |
| set\_bounds | controls grouping of specific cells |
| set\_keepout\_margin | creates keepouts for specific cells |
| set\_dont\_touch | prevents optimization but allows placement |
| set\_ideal\_net | specifies a net as having zero weight for placement, which prevents clumping of cells; typically used for clocks, resets, test enables, etc. |



# Setting up for Physical Compiler

- Create your RTL as usual
- Apply your design constraints as usual
- Invoke your TCL scripts as usual
- Apply floorplan data
- Forget about wireload models



#### Example RTL2PG Script

```
# ---- Main script ----
# Compile RTL using floorplan information.
read_verilog top_bob.v
current_design uncle
uniquify
link
source constraints.tcl
# Read Floorplan Info...
read_pdef bob.pdef
# Compile...
compile_physical -congestion
report_timing -input_pins -nets -physical
write_pdef -v3.0 -output placed_bob.pdef
write -format db -hierarchy -output placed_bob.db
exit

# ---- constraints.tcl (sourced by the main script) ----
set_operating_conditions { WCCOM }
set_load [ load_of Core/Buf1/A ] [all_outputs]
create_clock -period 8 CLK
set_dont_touch_network [ get_clock CLK ]
set_input_delay 0.5 -clock CLK [all_inputs]
set_output_delay 1.5 -clock CLK [all_outputs]
set_max_fanout 5 uncle
set_max_transition 0.8 uncle
```



#### Example G2PG Script

```
# ---- Main script ----
# Gates to placed gates using floorplan information.
read_db bob.db
current_design uncle
link
source constraints.tcl
# Read Floorplan Info...
read_pdef bob.pdef
# Compile...
physopt -congestion
report_timing -input_pins -nets -physical
write_pdef -v3.0 -output placed_bob.pdef
write -format db -hierarchy -output placed_bob.db
exit

# ---- constraints.tcl (sourced by the main script) ----
set_operating_conditions { WCCOM }
set_load [ load_of Core/Buf1/A ] [all_outputs]
create_clock -period 8 CLK
set_dont_touch_network [ get_clock CLK ]
set_input_delay 0.5 -clock CLK [all_inputs]
set_output_delay 1.5 -clock CLK [all_outputs]
set_max_fanout 5 uncle
set_max_transition 0.8 uncle
```



#### Running compile\_physical (RTL2PG)

- Uses all floorplan information present
  - Optimizes RTL design based on placement
  - Produces a fully legal result
- Same switches as 'compile' plus
  - -congestion
  - -timing\_driven\_congestion
  - -congestion\_effort



#### Running physopt (G2PG)

- Uses all floorplan information present
  - Optimizes an existing gate level design based on placement
  - Produces a fully legal result
- Same switches as 'compile\_physical' plus
  - -check\_only



#### Incremental Optimizations

- physopt -incremental
  - Uses existing placement as starting point for physopt, initial placement is skipped
- physopt -incremental -eco
  - Allows the merge of new / changed leaf cells
- physopt -incremental -post\_route
  - Maintains backannotated timing
- -size\_only
  - Allows only cell sizing to take place
  - Works on all the above incremental modes



#### Incremental -size\_only





## Incremental -in\_place\_size\_only





#### Post-Route Physical Compiler Flow





#### Congestion-Driven Placement

- Additional checks provided during placement prevent routing congestion.
  - Congestion estimates used to modify placement
  - Techniques such as "adaptive tuning of cell density" in areas of high congestion
- Use congestion mode only when the design has a congestion problem.



#### **Tactical Commands**





#### Reporting Commands

PC provides several 'report' commands that detail physical information.

| Command | Description |
|---------|-------------|
| report\_area -physical | shows the size of the core area and aspect ratio |
| report\_lib -physical | shows physical library information |
| report\_cell -physical -only\_physical | shows the cell location and orientation |
| report\_net -physical -only\_physical | shows the net total length and pre-routes |
| report\_clusters | shows the physical cluster hierarchy |
| report\_port -physical | shows the physical location of the port |
| report\_design -physical | shows size, area, aspect ratio, orientation, utilization and obstruction information |
| report\_congestion | shows the congestion prediction for the current placement |
| report\_timing -physical | shows location of pins and capacitive loads on the nets in the reported timing path |
| report\_keepout\_margin | lists keepout margins for specified cells |
| report\_bounds | lists type and size of cell groupings |



#### Powerful and Flexible

- Different sets of commands/switches address different problems
  - Each design has its own unique problems
  - The commands provide flexible, powerful solutions for all your designs
- Proceed step by step, and check your logs
  - Use the reporting commands and the GUI



#### Output Files





#### Preparing Routing Files

psyn\_shell-t>
 change\_names -rules verilog
 write -format db -hierarchy -output design.db
 write -format verilog -hierarchy -output design.v

 write\_pdef -v3.0 -output design.pdef



#### Creating DEF

- All floorplan data is written to DEF v5.2
- UNIX > db2def5

```
-search "<lib_dir1> <lib_dir2>"
        Directory containing library files
-pdb <pdb1> -pdb <pdb2>
        Name of PDB file(s) to use
-out <DEF_file>
<design_DB>
```






# Physical Compiler in the Flow

- Floorplanning from Chip Architect
- Timing analysis with PrimeTime
- Scan Methodology
  - Placement-based scan ordering
- Power Optimization
  - Clock gating and logic optimization
- Datapath Optimization
  - Use structured placement from MC (or not)






#### Summary - Physical Compiler

- Concurrent Synthesis + Placement Tool
  - Produces netlist & highly routable placement that meet timing
  - Improves Productivity, reduces iterations
- Easy to Adopt
  - Proven in Cadence, Avant! and IBM Flows
- Best Technology
- Proven Customer Success
  - 200+ tapeouts



# Hierarchical Physical Synthesis with Interface Logic Models

Simon Koval Physical Synthesis CAE



- ILM Goals
- Modeling Concepts
- ILM Flow
- ILM and Test
- Modeling Results
- Roadmap
- Summary



#### ILM Goals



- Provide design abstraction capability that can be used throughout the hierarchical design implementation flow
- Improve runtime and capacity compared to using original netlist
- Generate highly accurate model
- Make ILMs easy to use
- Easy to debug






### **Extracted Timing Models**





Figure: original block and its ILM (ports X, B, Y, CLK).



#### ETM vs. ILM Comparison



#### **Extracted Timing Models**

- Moderate model generation times, runtime improvement, reasonable accuracy
- Use for IP Reuse, 3rd Party Tools, non-STA Tools
- Hides implementation details



#### **Interface Logic Models**

- Fast model generation times, highly accurate, context independent model
- Use for Hierarchical STA / Chip-level optimizations
- Improves runtime and decreases memory for chip-level tasks



#### PrimeTime ILM vs. Synthesis ILM

- PT ILM is flat
- Has no physical information
- Written out as a verilog netlist
- Supports distributed parasitics and SDF
- Used for STA

- PC ILM maintains original block logical hierarchy
- Can be generated with physical information
- Can be written out in DB format with no loss of attributes
- Supports set\_load and SDF
- Can be used for STA as well as design implementation
- Placement and back annotation data can be propagated up to the top-level



#### ILM Flow



- Define clocks for the design
- Apply constraints on the design (optional)
- Apply back annotation data (set\_load & SDF) onto the design (optional step for post route flow)
- Create ILM for some/all top-level blocks
- Create pdbs for the blocks in CA (optional)
- Replace original designs with ILMs
- Run physopt at the top-level
- Write out the top-level design and top-level PDEF





Figure: ILM generation. The block db (post physopt) is read into Physical Compiler (read\_db block1.db; link; create\_clock -period pd clk; set\_clock\_skew -ideal clk), and the resulting model db replaces the original design in memory.

## ILM script example

```
read_db top.db
# Replace the original design with the ILM
remove_design block1
read_db block1_ilm.db
current_design top
link
read_pdef top.pdef
# Propagate placement and timing information for all ILMs
propagate_placement_up -adjust_location -verbose
propagate_annotated_delay_up
physopt
```


### ILM script example (continued)

```
# Save the top-level design
write -f db -out top_level_only_post_physopt.db

# Write out the top-level PDEF to route the design
# in a hierarchical manner
write_pdef -no_hierarchy -o top_level_only_post_physopt.pdef
exit

# To write the top-level DEF from the UNIX shell
% db2def5 -no_hierarchy top_level_only_post_physopt.db \
    -out top_level_only_post_physopt.def
```



#### ILM Commands/Reporting

```
    identify_interface_logic [-latch_levels levels] \
        [-ignore_ports port_list]
    extract_ilm -output filename [-physical] [-verbose] \
        [-include_side_load boundary | all | none]
    propagate_annotated_delay_up [cell_list]
    propagate_placement_up [-adjust_location] [-verbose] \
        [cell_list]
    report_area
    report_design
    report_cell -physical
    write_pdef [-no_hierarchy]
    db2def5 [-no_hierarchy]
```



### Identifying Interface Logic

- identify\_interface\_logic [-latch\_levels levels] [-ignore\_ports port\_list]
- Default behavior includes all interface latches in the ILM
- Use -latch\_levels levels to limit the number of latch levels through which time borrowing can occur in latch chains that are part of the interface logic. For example, with levels = 1, a path originating from an input port continues through the first latch encountered but stops at the second latch in the path; the second latch is treated as an edge-triggered register.
- Use -ignore\_ports for ports like reset and scan\_enable that fan out to all registers. Ignored ports are included in the ILM, but the fanout / fanin logic of these ports is not.
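
The latch-level rule can be illustrated with a small Python sketch. It is a simplified model of the behavior described above, not the tool's implementation; the path representation (a list of `(name, kind)` pairs) and the function name `interface_path` are hypothetical.

```python
def interface_path(path, latch_levels):
    """Return the prefix of `path` kept as interface logic.

    `path` is an ordered list of (name, kind) pairs starting at an input
    port, where kind is "comb", "latch", or "reg". Time borrowing is
    followed through at most `latch_levels` latches; the next latch (or any
    edge-triggered register) ends the path and is kept as its endpoint.
    """
    kept = []
    latches_passed = 0
    for name, kind in path:
        kept.append(name)
        if kind == "reg":
            break
        if kind == "latch":
            if latches_passed == latch_levels:
                break  # treated as an edge-triggered register
            latches_passed += 1
    return kept


# Hypothetical path: input port -> U1 -> L1 -> U2 -> L2 -> U3 -> R1
path = [("U1", "comb"), ("L1", "latch"), ("U2", "comb"),
        ("L2", "latch"), ("U3", "comb"), ("R1", "reg")]

print(interface_path(path, 1))  # ['U1', 'L1', 'U2', 'L2']
```

With `latch_levels` set to 1 the path passes through L1 and ends at L2, matching the example in the text.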



## ILM Sideload Description





## ILM Propagate Placement Up





#### ILM Propagate Placement Up



If the lower left co-ordinate of the reference design for Block1 is (0,0), when Block1 is placed in Top at (x2,y2), the location of the cell U1 placed at (x,y) in Block1 will become (x+x2, y+y2) after running the command propagate\_placement\_up -adjust\_location.
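
The translation is a plain coordinate offset, sketched below in Python. The function name and the sample coordinates are hypothetical, and the sketch assumes the block is placed without rotation or mirroring.

```python
def propagate_up(cell_xy, block_xy):
    """Translate a cell location from block coordinates to top-level coordinates.

    Assumes the block's reference origin is (0, 0) and the block is not
    rotated or mirrored when placed at `block_xy` in the top-level design.
    """
    x, y = cell_xy
    x2, y2 = block_xy
    return (x + x2, y + y2)


# Hypothetical values: U1 at (12.5, 40.0) in Block1, Block1 placed at (1000.0, 2000.0).
print(propagate_up((12.5, 40.0), (1000.0, 2000.0)))  # (1012.5, 2040.0)
```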



## ILM Delay Estimation

- Use propagate\_annotated\_delay\_up before running physopt
- physopt uses ILM delay and capacitance annotations for nets fully within the ILM (e.g. ILM2/n2).
- For nets crossing the ILM boundary, ILM cell and port locations are used to compute wirelength and estimate the delay (for example, n1 and n3)
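
A rough Python sketch of this choice follows. The net records, the Manhattan-distance wirelength, and the linear delay-per-micron factor are all assumptions made for illustration; the actual estimation model used by physopt is not described here.

```python
def net_delay_ps(net, unit_delay_ps_per_um):
    """Use the annotated delay for nets inside an ILM, else estimate from location."""
    if net["inside_ilm"]:
        return net["annotated_delay_ps"]
    (x1, y1), (x2, y2) = net["driver_xy"], net["load_xy"]
    wirelength_um = abs(x1 - x2) + abs(y1 - y2)  # Manhattan distance (assumed)
    return wirelength_um * unit_delay_ps_per_um  # linear model (assumed)


# Hypothetical nets: n2 lies fully inside an ILM, n1 crosses the ILM boundary.
n2 = {"inside_ilm": True, "annotated_delay_ps": 120.0}
n1 = {"inside_ilm": False, "driver_xy": (100, 200), "load_xy": (400, 600)}

print(net_delay_ps(n2, 0.5))  # 120.0
print(net_delay_ps(n1, 0.5))  # 350.0
```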





#### Obstructions for ILMs

- Create a physical library cell for each Synthesis ILM block to provide obstruction information to PC.
  - Can be done using Chip Architect (write\_abstraction) as .pdb or LEF.
  - Required for rectilinear blocks / over-the-block routing
- If an ILM block does not have a pdb cell, physopt will automatically derive the obstruction information
  - Derived obstruction is rectangular.
  - Creates a placement and all-layer routing obstruction over ILM blocks.
- Cells within the ILM are automatically marked as dont\_touch and dont\_touch\_placement



- Use the -ignore\_ports option of identify\_interface\_logic for ports such as reset and scan\_enable that fan out to all registers; otherwise the ILM could be excessively large.
- Specify the number of latch levels to include with the -latch\_levels option of identify\_interface\_logic.
- Use the -physical option of extract\_ilm to create an ILM with physical information.
- Use a pdb model for each ILM to avoid the restrictions of a derived obstruction (rectangular shape, routing obstruction on all layers).
- Designs with registered inputs and outputs will see the greatest reduction in size (original netlist vs. ILM)
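
A toy Python model shows why ignoring high-fanout ports matters for model size. It is a deliberately simplified view (each input port maps directly to the registers it reaches); the data structure and function name are hypothetical and do not reflect how identify\_interface\_logic works internally.

```python
def interface_registers(port_fanout, ignore_ports=()):
    """Collect the registers pulled into the model by tracing from each input port."""
    kept = set()
    for port, regs in port_fanout.items():
        if port in ignore_ports:
            continue  # the port itself stays in the model; its fanout is not traced
        kept.update(regs)
    return kept


# Hypothetical block with 1000 registers; only two are fed directly by data inputs.
all_regs = [f"r{i}" for i in range(1000)]
port_fanout = {
    "din": ["r0", "r1"],
    "reset": all_regs,
    "scan_enable": all_regs,
}

print(len(interface_registers(port_fanout)))  # 1000
print(len(interface_registers(port_fanout, ignore_ports=("reset", "scan_enable"))))  # 2
```

In this toy block, tracing reset and scan\_enable drags every register into the model, while ignoring them leaves only the registers on the data interface.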



<sup>\*</sup> Use PT ILMs if using distributed parasitics



## Full-Chip Analysis / Top-level Optimization with ILMs



- ILMs allow accurate analysis of paths between blocks
  - User can efficiently find timing problems due to top-level routing, snake paths, etc.



## Block Analysis / Budgeting with ILM



 Efficiently analyze block-level netlist timing in the context of the entire chip by representing surrounding blocks as ILMs






## Using ILM in DFT Compiler

- DFT Compiler is using a similar approach to address capacity and performance issues
- Test behavior of a design is abstracted into a "Test Model"
  - Test model is based on IEEE Proposed Standard Core Test Language (CTL)
  - Contains information such as: scan-in, scan-out, scan enable, async set/reset, scan clock, chain count, chain length, scan shift timing, etc
- At top level, DFT Compiler uses information contained in Test Models to perform Test Design Rule Checking and scan insertion / assembly
- CTL is stored as an attribute attached to the design; use of test models is transparent to users



## ILM containing Test model

#### Block level scan insertion:





#### Block level scan insertion

```
set test_use_test_models true
rtldrc
...
set_scan_configuration ...
preview_dft -physical
insert_dft -physical
check_dft
list_test_models
write -f verilog
write -f db
...
extract_ilm
```

#### Top level scan insertion

```
set test_use_test_models true
read_db <mdb>
list_test_models
...
set_scan_configuration ...
preview_dft -physical
insert_dft -physical
check_dft
list_test_models
...
write -f verilog
write -f db
```



#### Modeling Results



Design 'A'

Full chip memory reduction: 63% (2.7X) Original: 2659 MB; With ILM: 994 MB





Design 'A'

Full chip reduction: 81%

Original: 510,009 ILM: 96,939





memory reduction: 74% (3.9X) original: 334 MB; ILM: 85 MB





memory reduction: 86% (7X) original: 1427 MB; ILM: 203 MB





## Results (cell count)

#### Design 'D'

|      | netlist | ilm   | % reduction |
|------|---------|-------|-------------|
| blka | 268918  | 8708  | 96.76       |
| blkb | 233655  | 3958  | 98.30       |
| blkc | 235448  | 31536 | 86.60       |
| blkd | 153656  | 7986  | 94.80       |
| blke | 200921  | 9757  | 95.14       |
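
The reduction figures can be recomputed from the raw counts with a short Python sketch. The numbers are copied from the Design 'D' table and the Design 'A' memory result above; the helper name `reduction` is hypothetical, and the printed percentages may differ from the table in the last digit depending on rounding.

```python
def reduction(original, ilm):
    """Return (percent reduction, original-to-ILM ratio)."""
    return 100.0 * (1.0 - ilm / original), original / ilm


# Cell counts from the Design 'D' table: block -> (netlist, ilm)
cell_counts = {
    "blka": (268918, 8708),
    "blkb": (233655, 3958),
    "blkc": (235448, 31536),
    "blkd": (153656, 7986),
    "blke": (200921, 9757),
}

for name, (netlist, ilm) in cell_counts.items():
    pct, ratio = reduction(netlist, ilm)
    print(f"{name}: {pct:.2f}% reduction, {ratio:.1f}X smaller")

# Design 'A' full-chip memory in MB: original 2659, with ILM 994
pct, ratio = reduction(2659, 994)
print(f"Design A memory: {pct:.0f}% reduction, {ratio:.1f}X")
```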






#### 2001.08 release

- Provided ILM extraction capability as Beta feature
- Enhanced PC to use ILM

#### 2001.08-PSYN-JET release

- Support ILM extraction and usage in PC as a production feature
- Provide commands to propagate ILM data to top-level

#### **Future Enhancements**

- Top-level scan insertion using ILMs
- Top-level clock tree synthesis using ILMs
- Add commands to derive / remove obstructions
- Enable optimization within ILM (resizing / buffering)






#### Advantages of ILM

- High accuracy, easy to use/debug
- Model is context independent
- Contains physical info such as port and cell locations
- Preserves logical hierarchy and constraints
- Can be used by Physical Compiler, Design Compiler, Chip Architect, DFT Compiler, \*PrimeTime (\*use PT ILMs with distributed parasitics)
- Can be written out in DB format