# Hammer VLSI Flow and Scaling out with Chiplets

Vikram Jain UC Berkeley vikramj@berkeley.edu



Berkeley Architecture Research



#### **Tutorial Roadmap**






#### Agenda



- Hammer applications
- Overview of Hammer's abstractions
- Hammer community development
- Infrastructure for scale-out with chiplets
- Chiplet-yard for generating chiplets
  - Die-to-die interface generators


#### Hammer for Real Tapeouts



- Raven, Hurricane: ST 28nm FDSOI
- SWERVE: TSMC 28nm
- EOS: IBM 45nm SOI
- CRAFT: TSMC 16nm


### Many Different Chips!



|                  | Eagle [1]            | HugeFlyingSoC         | NavRx               | WaterSerpent            | MythicChip           | OsciBear [2]   | HDBinaryCore                 |
|------------------|----------------------|-----------------------|---------------------|-------------------------|----------------------|----------------|------------------------------|
| Description      | 9-core RISC-V<br>SoC | 22-core RISC-V<br>SoC | GPS<br>receiver SoC | MU-MIMO<br>baseband SoC | RISC-V SoC<br>for ML | Bluetooth SoC  | Hyperdim.<br>computing proc. |
| Foundry Node     | A 16nm               | A 16nm                | A 16nm              | B 22nm                  | C 12nm               | A 28nm, Sky130 | A 28nm                       |
| Signoff Freq.    | 1.05 GHz             | 1.05 GHz              | 500 MHz             | 2 GHz                   | 1.1 GHz              | 50 MHz         | -                            |
| Hierarchy levels | 3                    | 3                     | 1                   | 3                       | 2                    | 1              | 1                            |
| Person-months    | 22                   | 10                    | 6                   | 5                       | 4                    | 8, 1           | 8                            |







[1] C. Schmidt et al., *ISSCC 2021*

[2] D. Fritchman et al., *IEEE SSCS Magazine, Spring 2022*

# Hammer in Courses

- Introduced in undergraduate digital circuits and systems labs:
  - http://github.com/EECS150 (ASAP7 and Sky130 plugins)
- Special topics 'tapeout' class:
  - Spring 2024: 68 students, a mix of undergraduate and graduate students

Chips pictured:

- 2021 EE194/290C: OsciBear (TSMC 28nm)
- 2022 EE194/290C: BearlyML (left) & SCuM-V (right) (Intel 16nm)
- Sky130 MPW-2 (Skywater 130nm)
- SCuM-V'23: 32b RISC-V core, BLE + 802.15.4, LDOs, references radar
- SCuM-V'24
- DSPChip'24
- BearlyML'24











**Tech plugins**

| Foundry   | Node                       |
|-----------|----------------------------|
| A         | 16nm FinFET<br>28nm Planar |
| B         | 16nm FinFET<br>22nm FinFET |
| C         | 12nm FinFET<br>14nm FinFET |
| D         | 28nm SOI                   |
| Education | ASAP7<br>FreePDK45         |
| Skywater  | 130nm                      |

**Tool plugins**

| Action          | Tool                                                              |
|-----------------|-------------------------------------------------------------------|
| Logic synthesis | Genus<sup>C</sup>, Yosys, Vivado<sup>X</sup>, DC<sup>S</sup>      |
| Place and Route | Innovus<sup>C</sup>, Vivado<sup>X</sup>, OpenROAD, ICC<sup>S</sup> |
| DRC/LVS         | Calibre<sup>M</sup>, ICV<sup>S</sup>, Magic/Netgen                |
| Simulation      | VCS<sup>S</sup>, Xcelium<sup>C</sup>                              |
| Power, EM/IR    | Joules<sup>C</sup>, Voltus<sup>C</sup>                            |
| LEC             | Conformal<sup>C</sup>, Yosys                                      |

<sup>C</sup> Cadence, <sup>S</sup> Synopsys, <sup>M</sup> Siemens Mentor, <sup>X</sup> Xilinx



#### Next generation technology nodes





**Tech plugins**

| Foundry   | Node                                              |
|-----------|---------------------------------------------------|
| A         | 16nm FinFET<br>28nm Planar                        |
| B         | 16nm FinFET<br>22nm FinFET<br>18A Gate-All-Around |
| C         | 12nm FinFET<br>14nm FinFET                        |
| D         | 28nm SOI                                          |
| Education | ASAP7<br>FreePDK45                                |
| Skywater  | 130nm                                             |

**Tool plugins**

| Action          | Tool                                                                                         |
|-----------------|----------------------------------------------------------------------------------------------|
| Logic synthesis | Genus<sup>C</sup>, Yosys, Vivado<sup>X</sup>, DC<sup>S</sup>                                 |
| Place and Route | Innovus<sup>C</sup>, Vivado<sup>X</sup>, OpenROAD, ICC<sup>S</sup>, Fusion Compiler<sup>S</sup> |
| DRC/LVS         | Calibre<sup>M</sup>, ICV<sup>S</sup>, Magic/Netgen                                           |
| Simulation      | VCS<sup>S</sup>, Xcelium<sup>C</sup>                                                         |
| Power, EM/IR    | Joules<sup>C</sup>, Voltus<sup>C</sup>                                                       |
| LEC             | Conformal<sup>C</sup>, Yosys                                                                 |

<sup>C</sup> Cadence, <sup>S</sup> Synopsys, <sup>M</sup> Siemens Mentor, <sup>X</sup> Xilinx


#### Hammer Design Principles

#### 1. Separation of Concerns

- Decouple design-, tool-, and tech-specific concerns

#### 2. Standardization

- Data interchange schema for constraints, options, files

#### 3. Modularity

- Interchangeable & shareable tool & tech plugins

#### 4. Incremental Adoption

- Mix reusable & custom solutions





Hammer is:

- ... a Python framework for abstracting and building standardized flows
- ... not a typical CAD tool—it generates scripts and manages tool execution
- ... proven for architecture exploration, teaching, and research chips
- ... open-source!








#### Hammer Software Architecture






#### Hammer Intermediate Representation (IR)



- Standard data interchange format
  - Constraints, options, intermediate files, etc.
  - YAML for humans, JSON for programs (annotation format)
  - De-embeds designer intent and expertise from Tcl scripts
- IR Metaprogramming
  - Modify any IR key with traceable history, type- and validity-checking
  - Mechanism for partitioning and customizing design intent

| design.yml     |
|----------------|
| `vlsi.inputs:` |
| `power_spec:`  |
| `clocks:`      |
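The layering idea behind the IR (later sources override earlier ones, with a record of where each value came from and a basic type check) can be sketched in plain Python. This is an illustrative model only, not Hammer's implementation: `LayeredConfig`, `flatten`, the file names, and the values are all hypothetical, and only the key names `vlsi.inputs`, `power_spec`, and `clocks` come from the slide.

```python
from typing import Any, Dict, List


def flatten(tree: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Turn nested dicts into dotted keys, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat: Dict[str, Any] = {}
    for name, value in tree.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


class LayeredConfig:
    """Toy key store: later layers override earlier ones and every write is recorded."""

    def __init__(self) -> None:
        self.values: Dict[str, Any] = {}
        self.history: Dict[str, List[str]] = {}

    def load(self, source: str, layer: Dict[str, Any]) -> None:
        for key, value in flatten(layer).items():
            # Minimal type check: an override must keep the type of the existing value.
            if key in self.values and type(self.values[key]) is not type(value):
                raise TypeError(f"{source}: {key} changes type")
            self.values[key] = value
            self.history.setdefault(key, []).append(source)


config = LayeredConfig()
# Hypothetical defaults, as a tech plugin might supply them.
config.load("tech_defaults.yml", {"vlsi": {"inputs": {"clocks": [], "power_spec": "auto"}}})
# Hypothetical designer file overriding one key.
config.load("design.yml", {"vlsi.inputs": {"clocks": [{"name": "clock", "period": "2ns"}]}})

print(config.values["vlsi.inputs.clocks"])
print(config.history["vlsi.inputs.clocks"])
```

The history list is what makes an override traceable: for any key it shows which files touched it and in what order.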



### Tool and Tech Plugins






#### Hooks and Drivers


#### Hooks = customization

- Replace, modify, insert flow steps (inject Tcl)
- Written by designer or supplied by tech plugin
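As a rough illustration of what hooks do, a flow can be treated as an ordered list of named steps, each producing a Tcl snippet, with hooks as list edits. The sketch below is a simplified model and does not use Hammer's actual hook API; the helper names, step names, and Tcl comments are hypothetical.

```python
from typing import Callable, List, Tuple

# A step is (name, function returning a Tcl snippet).
Step = Tuple[str, Callable[[], str]]


def _index(steps: List[Step], target: str) -> int:
    for i, (name, _) in enumerate(steps):
        if name == target:
            return i
    raise KeyError(f"no step named {target!r}")


def insert_after(steps: List[Step], target: str, new_step: Step) -> List[Step]:
    """Return a new flow with new_step placed right after target."""
    i = _index(steps, target)
    return steps[: i + 1] + [new_step] + steps[i + 1:]


def replace(steps: List[Step], target: str, new_step: Step) -> List[Step]:
    """Return a new flow with target swapped for new_step."""
    i = _index(steps, target)
    return steps[:i] + [new_step] + steps[i + 1:]


def remove(steps: List[Step], target: str) -> List[Step]:
    """Return a new flow without target."""
    i = _index(steps, target)
    return steps[:i] + steps[i + 1:]


# Hypothetical default place-and-route steps.
default_steps: List[Step] = [
    ("floorplan", lambda: "# default floorplan commands"),
    ("place", lambda: "# default placement commands"),
    ("route", lambda: "# default routing commands"),
]

flow = insert_after(default_steps, "floorplan", ("add_fillers", lambda: "# designer-supplied Tcl"))
flow = replace(flow, "route", ("custom_route", lambda: "# custom routing Tcl"))

# The generated script is the concatenation of every step's Tcl.
script = "\n".join(body() for _, body in flow)
print(script)
```

Because each helper returns a new list, the default flow stays untouched, which mirrors the idea that a designer customizes a flow without editing the plugin that supplies it.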



#### Hammer Driver

- Parses all IR, hooks
- Auto-generates hierarchical flow graph as Makefile
- Easy-to-use CLI
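To make the "flow graph as Makefile" idea concrete, the sketch below turns a small dependency graph into Makefile rules using Python's standard `graphlib`. It is a flat, single-level toy: the action names are common abbreviations, `run-flow` is a placeholder command rather than Hammer's CLI, and the output is not the Makefile Hammer actually generates.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Tuple

# Hypothetical flow graph: each action maps to the actions it depends on.
flow_graph: Dict[str, List[str]] = {
    "syn": [],
    "par": ["syn"],
    "drc": ["par"],
    "lvs": ["par"],
}


def emit_makefile(graph: Dict[str, List[str]], command: str = "run-flow") -> Tuple[List[str], str]:
    """Return (a valid execution order, Makefile text with one rule per action)."""
    order = list(TopologicalSorter(graph).static_order())
    rules = []
    for action in order:
        deps = " ".join(graph[action])
        # rstrip() drops the trailing space when an action has no dependencies.
        rules.append(f"{action}: {deps}".rstrip() + f"\n\t{command} {action}\n")
    return order, "\n".join(rules)


order, makefile = emit_makefile(flow_graph)
print(makefile)
```

Expressing the flow as Make rules means that re-running a late step such as `drc` only re-runs the upstream steps Make considers out of date.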




#### How: provide sensible defaults with methods to override

| Sensible default                                              | Override method                                   |
|---------------------------------------------------------------|---------------------------------------------------|
| A default set of flow steps for every action (syn, par, etc.) | Hooks - inject your own steps anywhere            |
| Auto-generated timing (SDC) & power (CPF) constraints         | Use your own custom SDC and CPF files             |
| Auto-generated power meshes from high-level parameters        | Use foundry-provided or your own mesh generator   |
| Auto-generated Makefile implementing flow graph               | Running Hammer via command line, custom Makefiles |

Result: gets you 80-90% of the way there out of the box

- Easily learn the VLSI flow, get early design feedback
- Chipyard examples with ASAP7, Sky130

### Tutorial: TinyRocket RTL-to-GDS

https://chipyard.readthedocs.io/en/latest/VLSI/Sky130-OpenROAD-Tutorial.html

### Summary

- Physical design is hard—there are good reasons why most people try to avoid it.
  - Chips are growing in complexity
  - Unnatural evolution of the EDA/PDK stack
- Hammer helps separate design, tool, and technology concerns
  - Enables re-use
  - Enables advanced abstractions and generators
- Easy power and area evaluation
  - Using Hammer, open source PDK, commercial EDA









- GitHub: https://github.com/ucb-bar/hammer/
- Documentation: https://hammer-vlsi.readthedocs.io/
- Chipyard-specific documentation: https://chipyard.readthedocs.io/en/latest/VLSI/index.html
- Discussions/forum: https://github.com/ucb-bar/hammer/discussions
- Mentor plugin access request:
  - hammer-plugins-access@lists.berkeley.edu
- UCB Digital Design labs: https://github.com/EECS150/asic_labs_sp23
  - Full lab releases coming soon!


### Chiplets in academia



- Industry is embracing chiplets to optimize the cost of high-volume, high-performance products
  - A secondary consideration is reducing the non-recurring engineering (NRE) cost of domain-specific solutions



- Academic goal: enable development of complete, functional, and performant domain-specific systems by de-risking critical pieces (focus on NRE)
  - Chiplets can be reused if they are based on standard interfaces
  - Chiplets enable a bottom-up approach to scaling
  - Design cost is lower, but not negligible
  - We need to keep innovating while demonstrating complete systems

# Our motivation for chiplets



- We have designed increasingly complex chips to test our design methodologies
- EagleX: 20-core (X-tile) RISC-V SoC
  - 7.5mm x 7.5mm die in TSMC 16FFC
  - Boots Linux
  - Hard to justify repeatedly building chips like these in academia
- Build a **base chiplet** that can be shared across multiple platforms
  - Innovation focused on partner chiplets
  - Build many small chiplets to scale up and scale out




#### Framework for Chiplet Systems

- Need a systematic framework for building, evaluating, and testing complete chiplet-based systems
- Need standard interfaces between chiplets for plug-and-play support with our own and third-party IP
- Tasks:
  - Chiplet-yard: extending the Chipyard SoC generator to generate chiplets
  - Die-to-die standard interface generators
  - Networks-on-package generators
  - FireSim-based FPGA emulation of multi-chiplet systems








### **Objective 1: Reusable Base chiplet**



- Build a base chiplet that can be reused across multiple system configurations
- The base chiplet allows IP to be reused, with innovation focused on partner chiplets
- The base chiplet would consist of a mix of CPUs, NPUs, memory, and D2D interfaces





### **Objective 2: Multi-chiplet configurations**

- Chiplet-yard will enable:
  - scaling of chiplets
  - broad spectrum of mix-and-match systems







#### Objective 3: Full stack ecosystem



Chiplet-yard will enable a full-stack ecosystem for rapid evaluation and system-on-package (SoP) generation



## Extending Chipyard for Chipletization

- Take advantage of existing IP
- Extend bringup infrastructure
  - Configurable bus connections off-chip
  - Make chip-to-FPGA APIs generic for multi-ChipTop designs







#### Symmetric Config - Homogeneous chiplets



Config file: `$CYDIR/generators/chipyard/src/main/scala/config/ChipletConfigs.scala`

- 2 identical chips
  - 1 Rocket core
  - 2 Serial-TL Ports: chip<->ext mem, chip<->chip
  - Offchip bus (OBUS) off of system bus (SBUS)
- Each chip can access the memory of the other chip
- Manage 3 clock domains: chip-clock, outgoing-clock, incoming-clock
- Bring up multiple chips simultaneously





- We currently support multi-chip/chiplet configs and simulation in Chipyard
  - Current configs use non-coherent data exchange

Future work:

- Enable full cache coherency across chiplets
- Integrate UCIe for chip-to-chip communication
- Support chiplet simulation in FireSim
- Network-on-package generator






### UCIe Protocol Stack

- Layered protocol
  - Each layer performs a distinct function
- Three layers
  - Protocol
  - Die-to-Die Adapter (Link Layer)
  - Physical Layer
- Layers are connected with standardized interfaces
  - Flit-aware D2D Interface (FDI)
  - Raw D2D Interface (RDI)





#### UCIe-lite



We are building generators for a reduced version of UCIe called UCIe-lite

Incremental development: UCIe-lite will be extended to full UCIe in later versions

**Protocol Layer**

- Streaming mode
- Raw 64B flit mode
- CRC added by protocol layer
- Single protocol stack
- Carries TileLink Uncached and Cached protocol
- No support for CXL/PCIe
- No support for other flit modes

**D2D Adapter**

- Stall req/ack
- FDI/RDI state machines
- Link testing with parity
- Retrain
- LinkError
- Dynamic clk gating
- Power management
- No ARB/MUX
- No DLLP
- No flit cancel mechanism
- No CRC and retry

**Physical Layer**

- Link initialization/training
- Byte-to-lane mapping for data transmission over lanes
- Transmitting and receiving sideband messages
- Scrambling and training pattern generation
- Free-running clock mode only
- Streaming mode only
- No PHY retrain
- No lane reversal
- No multi-module link

**Sideband**

- Full spec implementation
- Reduced message set for streaming mode (raw only) and for limited states of D2D/PHY
- Three types of packets: register access packets, messages without data, messages with data payload
- Credit-based flow control
- 8ms timeout



#### UCIe-Lite: Protocol Layer

- Protocol adapter connects to a bus/NoC in an SoC
- Sideband node uses memory-mapped registers for Sideband messaging
- We use TileLink as our frontend protocol; it can be extended to other bus protocols such as AXI or CHI
- Supports TL-Uncached and TL-Cached packet translation
- Converts TL messages to UCIe raw flits and sends them over the FDI to the D2D adapter
- Adds a checksum using a Hamming encoder/decoder
- Credit-based flow control provides backpressure in multi-die systems
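Credit-based flow control, as mentioned in the last bullet, can be illustrated with a small software model: the sender starts with one credit per receiver buffer slot, spends a credit per flit, and stalls at zero until the receiver returns credits. The class names, buffer depth, and flit labels below are hypothetical, and the sketch says nothing about how the UCIe-lite hardware encodes credits.

```python
from collections import deque


class CreditedSender:
    """Sends a flit only while it holds at least one credit."""

    def __init__(self, credits: int) -> None:
        self.credits = credits

    def try_send(self, flit: str, link: deque) -> bool:
        if self.credits == 0:
            return False  # backpressure: receiver has no free slot
        self.credits -= 1
        link.append(flit)
        return True

    def return_credits(self, count: int) -> None:
        self.credits += count


class Receiver:
    """Buffers flits; every drained flit frees one slot, i.e. one credit."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.buffer: deque = deque()

    def accept(self, link: deque) -> None:
        while link:
            # Credits guarantee this never overflows.
            assert len(self.buffer) < self.depth
            self.buffer.append(link.popleft())

    def drain(self, count: int) -> int:
        freed = min(count, len(self.buffer))
        for _ in range(freed):
            self.buffer.popleft()
        return freed


link: deque = deque()
rx = Receiver(depth=2)
tx = CreditedSender(credits=2)

sent = [tx.try_send(f"flit{i}", link) for i in range(3)]  # third send is refused
rx.accept(link)
tx.return_credits(rx.drain(1))  # receiver frees one slot and returns one credit
resumed = tx.try_send("flit2", link)
print(sent, resumed)
```

The sender never needs to see the receiver's buffer directly; the credit count is a conservative local estimate of free space, which is why the scheme works across a die-to-die link with latency.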





#### UCIe-Lite: Flit Format v1.0



- Initial version of UCIe Raw 64B flit format used in our implementation
- Reserved bits provided for future expansion



Can be TL channel type, AXI, ReRoCC, Debug, reserved for future

**Credits:** Carries the credit return for the different channels

Reserved bits in cmd can be used for QoS, discovery, etc.

#### UCIe-Lite: D2D Adapter

- FDI/RDI state machines
- LinkInit module handles the link initialization steps of the UCIe specification
- Handles stall req/ack, clk req/ack and wake req/ack






# UCIe-Lite: Logical PHY

- Maps bytes to lanes
- Link training: a RISC-V core triggers training and handles the training parameters
  - Link training parameters are stored as MMIO registers for the core to probe and update
- Link initialization state machines for mainband and sideband
- Pattern generators for testing and scrambler functionality
- Error detector for link training
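Byte-to-lane mapping can be pictured as striping a flit across the lanes of the link. The sketch below assumes simple round-robin striping (byte *i* goes to lane *i* mod *N*) over 16 lanes; both the rule and the lane count are assumptions made for illustration, and the UCIe specification remains the authority on the exact mapping.

```python
from typing import List


def map_bytes_to_lanes(data: bytes, num_lanes: int) -> List[bytes]:
    """Stripe data across lanes: byte i is carried by lane i % num_lanes (assumed rule)."""
    lanes = [bytearray() for _ in range(num_lanes)]
    for index, byte in enumerate(data):
        lanes[index % num_lanes].append(byte)
    return [bytes(lane) for lane in lanes]


def unmap_lanes(lanes: List[bytes], length: int) -> bytes:
    """Inverse of map_bytes_to_lanes: reassemble the original byte order."""
    num_lanes = len(lanes)
    return bytes(lanes[i % num_lanes][i // num_lanes] for i in range(length))


flit = bytes(range(64))                # a 64-byte flit with recognisable contents
lanes = map_bytes_to_lanes(flit, 16)   # 16 lanes, 4 bytes per lane
print(list(lanes[0]))
print(unmap_lanes(lanes, len(flit)) == flit)
```

With 16 lanes, a 64-byte flit occupies four byte times per lane, and the receiver recovers the flit by applying the inverse mapping.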





#### UCIe-Lite: Sideband

- A sideband module is needed at every layer to send sideband messages
- Sideband node handles serialization/deserialization and credit flow of sideband (SB) packets
- Sideband switcher controls the flow of messages to submodules or stack transfer


## Integration into Chipyard

- UCIe stack can be attached to Sbus/Obus in Chipyard SoC
- It can also be instantiated as a Tile in Constellation NoC











- Die-to-die interfaces are vital to enable performant and energy-efficient chiplet systems
- Open standards help build an open ecosystem
- We want to provide an open-source generator for the UCIe controller and PHY, so that academia and industry can leverage this ecosystem
- In doing so, we want to showcase some interesting prototypes of heterogeneous chiplet systems
- UCIe-digital: https://github.com/ucb-ucie/uciedigital
- UCIe-analog: https://github.com/ucb-ucie/ucieanalog



