Future Architecture Demands for More Aggressive Packaging





Josh Fryman, PhD, Office of the CTO, Intel Fellow. Feb 23, 2024



#### Moore's "Page 3" refocused upon by DARPA's ERI programs:



"The total cost of making a particular system function must be minimized. To do so, we could amortize the engineering over several identical items, or evolve flexible techniques for the engineering of large functions so that no disproportionate expense need be borne by a particular array.

It may prove to be more economical to build large systems out of smaller functions, which are separately **packaged and** interconnected. The availability of large functions, combined with functional design and construction, should allow the manufacturer of large systems to design and construct a considerable variety of equipment both rapidly and economically."

- Gordon Moore, Electronics, Vol. 38, No. 8, April 19, 1965







"... there is no spoon."

(There is no box ... package ... limit ...)

IEEE HIR 2024

Images licensed under <u>CC BY-NC</u>, -ND, and -SA



### It's about *systems*, not just ingredients






© Intel Corporation

Image Credits: Intel Corp, TomsHardware, ServeTheHome, Argonne National Lab










Total Cost of Ownership → System Technology Co-Optimization

AI, HPC, Mobile, Medical, AR/VR, Contact Lens, ...

By 2030 scaling up to 50 TB/s DRAM BW, 20 TB/s IO BW, and 100 AI PFLOPS as a single module!



### Are we really going all-in?

- Monolithic
- Polylithic multi-chip
- Heterogeneous polylithic tiled



### What about big-opportunity thinking?





"We need to think about packaging 60,000 mm<sup>2</sup> for processing all those LLMs..."

- UCLA Prof. Subramanian Iyer, Director of NAPMP

Assume the equivalent of ~80 SotA GPUs at ~800 mm<sup>2</sup> each

| sq mm  | X,Y dims (mm) | pJ/b wires | pJ/b stops | pJ/b d2d | W intern | W extern |
|--------|---------------|------------|------------|----------|----------|----------|
| 64,000 | 253           | 1,265      | 316        | 5        | 46       | -        |
| 800    | 28            | 141        | 35         | 1        | 5        | 36       |

Notes: 3 GHz, 12 TB/s mesh, 10% activity, no external switches, only counting fabrics.

#### If that 80-GPU part prices at \$2M, is that okay?

| # units | W fabrics total | OpEx \$M annually | OpEx 5 yr life \$M |
|---------|-----------------|-------------------|--------------------|
| 1       | 46              | 0.16              | 0.82               |
| 80      | 3,287           | 11.83             | 59.17              |
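The jump from \$0.82M to \$59.17M follows from the fabric power column alone. A minimal Python sketch of that arithmetic, assuming OpEx scales linearly with fabric power. The slide does not state the cost rate, so `RATE_MUSD_PER_W_YEAR` is back-derived from the 80-unit row and is an inference, not a source figure:

```python
# Back-of-envelope check of the fabric OpEx table above.
# ASSUMPTION: OpEx scales linearly with fabric power. The rate below is
# back-derived from the 80-unit row; the slide does not state it.
RATE_MUSD_PER_W_YEAR = 11.83 / 3287.0  # hypothetical, inferred from the table
LIFETIME_YEARS = 5


def opex_musd(fabric_watts: float, years: int = 1) -> float:
    """Fabric OpEx in $M for the given power over the given number of years."""
    return fabric_watts * RATE_MUSD_PER_W_YEAR * years


single_part = opex_musd(46.0, LIFETIME_YEARS)   # one 64,000 mm^2 part
discrete = opex_musd(3287.0, LIFETIME_YEARS)    # 80 discrete 800 mm^2 GPUs
gap_musd = discrete - single_part

print(f"single part, 5 yr: {single_part:.2f} $M")
print(f"80 discrete, 5 yr: {discrete:.2f} $M")
print(f"5-year gap:        {gap_musd:.1f} $M")
```

Because the inputs are the table's rounded values, the results agree with the table only to within rounding. Under these assumptions the five-year gap comes out near \$58M, well above the \$2M part price the slide asks about.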



Simplistic model, but \$58M is a critical opportunity for optimization! Or is it? (..., substrate, memory, ...)

Image credit: https://samueli.ucla.edu/chips-for-america-taps-ucla-engineering-professor-to-lead-rd-program/



| layers | sq mm  | X,Y dims (mm) | pJ/b walk | pJ/b stops | pJ/b d2d | W intern | W extern |
|--------|--------|---------------|-----------|------------|----------|----------|----------|
| 1      | 64,000 | 253           | 1,265     | 316        | 5        | 46       | -        |
| 1      | 800    | 28            | 141       | 35         | 0        | 5        | 36       |
| 10     | 64,000 | 80            | 400       | 100        | 1        | 14       | -        |
| 100    | 64,000 | 25            | 126       | 32         | 2        | 5        | -        |
| 800    | 64,000 | 9             | 45        | 11         | 3        | 2        | -        |
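The X,Y, walk, and stops columns can be reproduced from the layer count alone. The sketch below assumes a square footprint and two per-mm coefficients that are inferred from the table's own ratios (1,265 pJ/b and 316 pJ/b across a 253 mm side); the slide does not state them, and the d2d and W columns are not modeled here:

```python
import math

# ASSUMPTION: both coefficients are inferred from the table's ratios,
# not stated on the slide.
WALK_PJ_PER_BIT_PER_MM = 5.0    # 1,265 pJ/b over a 253 mm side
STOP_PJ_PER_BIT_PER_MM = 1.25   # 316 pJ/b over a 253 mm side
TOTAL_AREA_MM2 = 64_000.0


def stack_row(layers: int, total_area_mm2: float = TOTAL_AREA_MM2):
    """Return (side_mm, walk_pj_per_bit, stops_pj_per_bit) for a square stack."""
    side_mm = math.sqrt(total_area_mm2 / layers)  # footprint shrinks with sqrt(layers)
    walk = WALK_PJ_PER_BIT_PER_MM * side_mm
    stops = STOP_PJ_PER_BIT_PER_MM * side_mm
    return side_mm, walk, stops


for layers in (1, 10, 100, 800):
    side_mm, walk, stops = stack_row(layers)
    print(f"{layers:>4} layers: {side_mm:6.1f} mm side, "
          f"{walk:7.1f} pJ/b walk, {stops:6.1f} pJ/b stops")
```

The point of the sketch is the square-root relationship: spreading the same silicon area over N layers shortens each lateral traversal by a factor of sqrt(N), which is where the drop in fabric energy in the table comes from.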

| # units | W fabrics total | OpEx \$M annually | OpEx 5 yr life \$M |
|---------|-----------------|-------------------|--------------------|
| 1       | 46        | 0.16         | 0.82          |
| 80      | 3,287     | 11.83        | 59.17         |
| 1       | 14        | 0.05         | 0.26          |
| 1       | 5         | 0.02         | 0.08          |
| 1       | 2         | 0.01         | 0.03          |

Each layer thinned to ~10um

Real 3D


Now a \$60M opportunity that doesn't risk fundamental economic hurdles ... only technical hurdles. How far can we go? (Still grossly simplistic)




### Package sprawl vs. package scrapers



- Assume ~8mm x ~8mm die size
  - 800 layers makes it ~8mm tall
- HBI-TSV density drives bandwidth by pitch
  - 9um → 12k/mm<sup>2</sup> → 800K total
  - 3um → 111k/mm<sup>2</sup> → 7.2M total
- Assume ~67% for power/gnd, ~33% for IO
  - ~0.27-2.4M IO signals per 1 GHz
  - ~0.26-2.4 Pbps total bidir BW per GHz
- 16 TB/s/dir to 150 TB/s/dir IO capable per GHz
  - Lots of floorplanning, thermal concerns
  - Future delivery of ≤1 um HBI-TSV resolves wiring challenges
  - Bottom die PHY area limits IO sustainable
- Some open hurdles
  - Time on tools
  - EDA, DFX, DFT, FA, etc.
  - Lack of lateral connectivity between high-rises
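The via and bandwidth figures in the list above follow from pitch and die area. A rough Python sketch, assuming an 8 mm x 8 mm die, a square via grid, one bit per IO signal per clock cycle, and an even split of bidirectional bandwidth between the two directions (the last two are assumptions of this sketch, not statements from the slide):

```python
DIE_SIDE_MM = 8.0      # from the slide: ~8 mm x ~8 mm die
IO_FRACTION = 0.33     # from the slide: ~33% of vias carry IO


def hbi_tsv_budget(pitch_um: float, clock_ghz: float = 1.0) -> dict:
    """Estimate via count and IO bandwidth for a square HBI-TSV grid."""
    density_per_mm2 = (1000.0 / pitch_um) ** 2          # vias per mm^2
    total_vias = density_per_mm2 * DIE_SIDE_MM ** 2
    io_signals = total_vias * IO_FRACTION
    # ASSUMPTION: each IO signal moves one bit per cycle, so Gb/s = signals * GHz.
    total_pbps = io_signals * clock_ghz / 1e6           # 1 Pb/s = 1e6 Gb/s
    # ASSUMPTION: bidirectional total splits evenly; 8 bits per byte.
    tb_per_s_per_dir = total_pbps * 1000.0 / 8.0 / 2.0
    return {
        "density_per_mm2": density_per_mm2,
        "total_vias": total_vias,
        "io_signals": io_signals,
        "total_pbps": total_pbps,
        "tb_per_s_per_dir": tb_per_s_per_dir,
    }


for pitch_um in (9.0, 3.0):
    b = hbi_tsv_budget(pitch_um)
    print(f"{pitch_um:.0f} um pitch: {b['density_per_mm2']:,.0f}/mm^2, "
          f"{b['total_vias']:,.0f} vias, {b['io_signals']:,.0f} IO, "
          f"{b['total_pbps']:.2f} Pbps, {b['tb_per_s_per_dir']:.0f} TB/s/dir")
```

The slide's numbers are rounded, so this lands slightly below its 7.2M and 150 TB/s/dir figures at 3 um while matching the 9 um case closely.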




#### **Capacity per Core**

<sup>^</sup>Not counting overheads for access





### Challenges to achieving dense 3D

Power / Ground

#### Manufacturing:

- edge polish
- right-angle attach
- via density
- bonding speed
- rework support
- redundancy
- tooling Z-height
- wafer thinning

#### Power and Thermals:

- thermal density
- power delivery
- cooling layers
- heterogeneous material

High Bandwidth <u>Large</u> Memory Stack

Photonics



#### EDA:

- rotated die
- taper point isolation
- non-planar libraries
- formal verification
- DF\<all\> 1,000 layers
- scan chain time
- tests and coverage

#### Design:

- graceful degradation
- extreme interop
- built-in redundancy
- pluggable modules
- abstractions for all

Legacy I/O


Modularity in design, reworkable components, interop standards, and time to develop these tools and flows are critical to enabling this 3D "solution in a cube" future




### About roadmaps ... hope is not a strategy







**Building An Open Chiplet Economy**

High-performance, power-efficient three-dimensional system-in-package designs with universal chiplet interconnect express (Debendra Das Sharma, Gerald Pasdast, Sathya Tiagaraj & Kemal Aygün)

- Some packaging directions to 2035
  - 10 yrs to "catch up" and then reassess
- Every 2 yr scaling factor targets
  - X, Y dimension (1.5x, 1.5x)
  - Z dimension (4x)
  - pJ/b/d2d xing (0.7x)
  - Cooling W/mm<sup>2</sup> (4x), /mm<sup>3</sup> (8x)
  - BW/mm (2x), /mm<sup>2</sup> (4x), /mm<sup>3</sup> (8x)
  - IO escape BW/mm (4x), /mm<sup>2</sup> (8x)
- AI will save us! Pixie dust on all!
  - AI will reduce design time
  - It will not innovate – yet
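To get a feel for what the two-year scaling targets above imply by 2035, the per-step factors can be compounded. A small Python sketch, assuming five two-year steps between 2025 and 2035 (the slide gives the end year and the cadence but not the start year, so `START_YEAR` is an assumption):

```python
# Per-two-year scaling targets as listed on the slide.
TARGETS_PER_2YR = {
    "X dimension": 1.5,
    "Y dimension": 1.5,
    "Z dimension": 4.0,
    "pJ/b per d2d crossing": 0.7,
    "BW/mm": 2.0,
    "BW/mm^2": 4.0,
    "BW/mm^3": 8.0,
    "IO escape BW/mm": 4.0,
    "IO escape BW/mm^2": 8.0,
    "Cooling W/mm^2": 4.0,
    "Cooling W/mm^3": 8.0,
}

START_YEAR = 2025   # ASSUMPTION: not stated on the slide
END_YEAR = 2035     # from the slide: "directions to 2035"
PERIOD_YEARS = 2


def compounded(factor: float, start: int = START_YEAR, end: int = END_YEAR) -> float:
    """Compound a per-period factor over the whole number of periods in [start, end]."""
    steps = (end - start) // PERIOD_YEARS
    return factor ** steps


for name, factor in TARGETS_PER_2YR.items():
    print(f"{name:>24}: {factor}x per step -> {compounded(factor):,.2f}x by {END_YEAR}")
```

Under these assumptions the Z target compounds to 1,024x, the same order of magnitude as the 1,000-layer figure in the EDA challenge list, while energy per die-to-die crossing falls to roughly 17% of its starting value.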
