# Architecting Chiplets for Product Manufacturing Test Resiliency

Abram Detofsky, IEEE Member and Intel Principal Engineer, Intel Corporation, 2200 Mission College Blvd, Santa Clara, CA 95054; abram.m.detofsky@intel.com

Abstract—The number and variety of chiplet-based designs is expected to flourish in the coming years, driven by manufacturing optimization, R&D costs, product velocity and Design flexibility needs, amongst others. This article describes the architectural Design and Test challenges and proposed strategies to enable resilient products in a chiplet-based world.

Keywords— Semiconductor, Multi-die Packages, Heterogeneous Integration (HI), Test, Functional Test, Structural Test, System Level Test (SLT), Production Test Flow.

### I. INTRODUCTION

The relentless evolution of performance-hungry and diverse customer workloads can be addressed by innovations in silicon scaling, algorithm, software, and architectural changes and through advanced packaging [1]. Fig. 1 shows a number of major innovation vectors that need to be pursued for continued compute performance scaling. Historically, the emphasis has been primarily on the silicon and software-based vectors. In recent years there has been a renaissance of advanced packaging innovation to meet the needs of new customer workloads. Gordon Moore predicted this day back in 1965 when he stated, "It may prove to be more economical to build large systems out of smaller functions which are separately packaged and interconnected." [2]



Fig. 1: To continue to deliver compute performance improvements, scaling is needed along numerous hierarchy vectors. [3]

Chiplets are integrated circuit blocks that have been specifically designed to work with other similar chiplets to form larger, more complex chips [4]. Common drivers for creating a "disaggregated" chiplet-based design are silicon yield optimization, product velocity and design flexibility. However, once chiplets are brought together to form 2.5D and 3D assemblies on a package substrate, creating robust package assembly, manufacturing test and component debug processes can be quite daunting. Package assembly is challenged by the sheer number and densities of interconnects that need to be assembled reliably on an organic substrate or silicon interposer and that guarantee reliability over the full range of customer use conditions. Manufacturing Test is challenged to find defects in chiplets and chiplet-to-chiplet interfaces which often span differing process nodes, IP vendors, chiplet supplier quality goals and test coverage. Component debug is challenged to quickly provide power-on silicon health feedback to design and manufacturing, as well as to root-cause customer returns in a chiplet-based design where fault isolation may not be possible without the right silicon design features in place.

One definition of "resiliency" is "the capacity to recover quickly from difficulties; toughness." [5] Compute systems that are resilient are tolerant of a certain number of failures without noticeably impacting system performance or the user experience. These failures can emerge during the manufacturing process itself or develop later in life once deployed in the field due to aging effects [6].

Resiliency in disaggregated heterogeneous products is even more important than in monolithic IC designs. Assembling defective die into a package with good die will often require scrapping the entire package assembly. Using chiplets that can be individually, comprehensively, and efficiently tested prior to assembly, together with a chiplet interconnect and packaging integration approach that enables efficient and comprehensive post-assembly testing, can help mitigate this disadvantage. Manufacturing Test resiliency involves using different design and methodology techniques that can more easily detect failures, and quite often correct for them when they do occur. Intentionally designing for product resiliency is the key to chiplet manufacturing success.
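The cost of assembling a defective die with good die can be illustrated with a simple compounding calculation. The sketch below assumes independent failures and uses hypothetical probabilities chosen only for illustration; `package_yield` and its parameters are not taken from any cited source.

```python
from math import prod


def package_yield(die_good_probs, assembly_yield=1.0):
    """Probability that a multi-die package is fully good.

    die_good_probs: probability that each assembled die is actually good
                    after pre-assembly test (assumed independent).
    assembly_yield: probability that the assembly step itself succeeds.
    """
    return prod(die_good_probs) * assembly_yield


# Hypothetical numbers, for illustration only.
# Four chiplets with weaker pre-assembly test coverage:
loose = package_yield([0.95] * 4, assembly_yield=0.99)
# The same four chiplets with more comprehensive pre-assembly test:
tight = package_yield([0.999] * 4, assembly_yield=0.99)

print(f"{loose:.3f}")  # 0.806
print(f"{tight:.3f}")  # 0.986
```

Under these assumed numbers, nearly one package in five would be scrapped in the first case, which is why comprehensive per-chiplet testing before assembly matters more as the die count grows.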

In this paper, we will look at chiplet resiliency from several Manufacturing Test perspectives. Section II will look at resiliency in the context of Manufacturing Test in IC and packaging design. Section III will look at resiliency and how it enables component debug. Lastly, Section IV will discuss resiliency as it relates to quality and reliability.

### II. CHIPLET TEST RESILIENCY IN IC AND PACKAGING DESIGN

In some ways, building heterogeneous integrated packaging designs from chiplets is similar to populating a motherboard with a heterogeneous collection of components, but at a much tighter level of integration. The level of integration, however, poses a series of challenges unique to chiplets.

First, chiplets are challenged by functional interoperability requirements between chiplets that demand a certain set of "rules-of-the-road" be adopted for them to operate as a harmonious system on package. Not only do they need compatible protocols and a package/interposer ball-pattern with high-speed IO definition and constraints, but they also need to have interoperable Test methodologies. This is where a properly constrained and widely adopted standard is foundational to product success. Adoption of a standard such as the Universal Chiplet Interconnect Express (UCIe) [7] is a critical lowest-common-denominator criterion for a chiplet-based design's success. The chosen standard must contain just enough Manufacturing Test design constraints to ensure unambiguous interpretation by all partners that use it, so that a misinterpretation won't cause Manufacturing Test to fail to execute. For example, when a standard specifies support for pattern generation and checking Design for Test (DFT) for Manufacturing Test, it is insufficient to say that one of several pseudo-random test patterns may be used. In such a case, it is very likely that different chiplet suppliers will choose different test patterns from among them, thus causing the test methodology to fail when chiplets designed by different suppliers are brought together. Instead, it is important to specify one or more specific minimum-required test patterns that all chiplet suppliers must adhere to so that the test methodology is successful. More generally, a robust minimum Testability requirements list is critical for a manufacturable chiplet-based design. Design IP providers such as Synopsys [8] and Cadence [9] are now providing full UCIe turnkey solutions which address many of these challenges, but there is still work to be done to guarantee full manufacturing testability interoperability.
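The pattern-mismatch failure mode described above can be illustrated with a minimal sketch. It assumes a transmitter that drives a pseudo-random bit sequence from a linear-feedback shift register and a receiver whose checker predicts each bit from the bits it has already received. The PRBS7 and PRBS9 polynomials are the commonly used ones; the function names are hypothetical, and nothing here represents the pattern set required by UCIe or any other standard.

```python
# Feedback taps (1-indexed bit positions) for two common PRBS polynomials.
PRBS7 = (7, 6)  # x^7 + x^6 + 1
PRBS9 = (9, 5)  # x^9 + x^5 + 1


def prbs_generate(taps, count, seed=1):
    """Return `count` bits from a Fibonacci LFSR with the given taps."""
    nbits = max(taps)
    mask = (1 << nbits) - 1
    state = seed & mask
    if state == 0:
        raise ValueError("LFSR seed must be non-zero")
    bits = []
    for _ in range(count):
        new_bit = 0
        for t in taps:
            new_bit ^= (state >> (t - 1)) & 1
        state = ((state << 1) | new_bit) & mask
        bits.append(new_bit)
    return bits


def prbs_check(taps, received):
    """Count bits that violate the recurrence implied by `taps`.

    The checker is self-synchronizing: it predicts bit i from earlier
    received bits, so it does not need to know the transmitter's seed.
    """
    errors = 0
    for i in range(max(taps), len(received)):
        expected = 0
        for t in taps:
            expected ^= received[i - t]
        errors += int(expected != received[i])
    return errors


stream = prbs_generate(PRBS7, 1000)           # supplier A transmits PRBS7
errors_same_pattern = prbs_check(PRBS7, stream)   # supplier B checks PRBS7
errors_other_pattern = prbs_check(PRBS9, stream)  # supplier B checks PRBS9
print(errors_same_pattern)       # 0
print(errors_other_pattern > 0)  # True
```

A defect-free link reports failures whenever the two sides pick different patterns, so the test fails for reasons unrelated to silicon health. A mandated minimum pattern set removes that ambiguity.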

Secondly, chiplet-based designs contain tens to many hundreds of thousands of parallel IO connections. Defect densities and process skew deltas between chiplets cause an ever-increasing probability that an interconnect lane will either contain a defect or will not be sufficiently performant to the chosen interconnect specification for the given die pairing. Unlike PCB technologies, the small, highly integrated nature of chiplets makes repair or replacement an impossible task, resulting in very difficult economics. As a result, the chiplet interfaces must contain a standardized DFT for defect isolation, lane redundancy to provide back-up options, a DFT scheme for in-line repair, and mission-mode protocol repair awareness so that the repair change doesn't impact behavior on either side of the chiplet IO link. For example, the UCIe standard establishes clear requirements for redundancy, data and clock lane repair, and lane reversal rules, which are all designed to recover from faulty lanes in a manufacturing-compatible way as shown in Fig. 2. Both transmitter and receiver need to implement this process in the same way for it to function correctly. [10]
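The requirement that both ends apply the same repair can be sketched with a generic shift-style remap. The example assumes a single spare physical lane placed after the last data lane. It is an illustration of the idea only, not the lane mapping defined in the UCIe specification, and all function names are hypothetical.

```python
def build_lane_map(num_logical, faulty_physical=None):
    """Map each logical lane to a physical lane, skipping one faulty lane.

    Assumes num_logical + 1 physical lanes (one spare at the end).
    Logical lanes at or beyond the faulty position shift up by one.
    """
    if faulty_physical is None:
        return list(range(num_logical))
    if not 0 <= faulty_physical <= num_logical:
        raise ValueError("faulty lane index out of range")
    return [lane if lane < faulty_physical else lane + 1
            for lane in range(num_logical)]


def transmit(word, lane_map, num_physical):
    """Drive logical bits onto physical lanes; unused lanes stay None."""
    physical = [None] * num_physical
    for logical, phys in enumerate(lane_map):
        physical[phys] = word[logical]
    return physical


def receive(physical, lane_map):
    """Recover logical bits by sampling the mapped physical lanes."""
    return [physical[phys] for phys in lane_map]


word = [1, 0, 1, 1, 0, 0, 1, 0]        # 8 logical lanes
repaired = build_lane_map(8, faulty_physical=3)
unrepaired = build_lane_map(8)

on_package = transmit(word, repaired, num_physical=9)
print(repaired)                                  # [0, 1, 2, 4, 5, 6, 7, 8]
print(receive(on_package, repaired) == word)     # True
print(receive(on_package, unrepaired) == word)   # False
```

The last line shows why a repair applied on only one side of the link corrupts data: the receiver samples the abandoned lane and misaligns every lane after it.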



Fig. 2: Single Lane Redundancy and Repair in UCIe [10]

### III. CHIPLET RESILIENCY IN COMPONENT DEBUG

A resilient design for component debug is one that allows for the visibility, access, and control of internal silicon logic while in the presence of multiple unknown defects and performance deficiencies. Even though a chiplet may be designed to be debuggable by including proper test modes and test access mechanisms, it still may not be resilient in the presence of real-world defects that are typical in a multi-chiplet package. A chiplet-based design will contain multiple silicon elements which may come from multiple design houses and fab processes and may be at various levels of manufacturing maturity. At initial power-on, it is critical to be able to compartmentalize the design, as each of these elements may not be well-behaved or fully characterized.

The first best-practice is one of isolation. To the maximal extent possible, it is important to ensure for Debug that the chiplets do not have any complex dependencies on one another. This means that each chiplet should have test modes that are not dependent on local PLLs which may be faulty, firmware which may be buggy, complex regulator power sequencing which may have timing or trimming problems, or test ports that are daisy-chained between multiple chiplets, causing dependent points of failure deep in the package assembly. If any of these non-isolating elements exist in a chiplet-based assembly, then the package assembly may be very fragile during Debug, which can lengthen debug time by weeks or months and increase its cost significantly. Critical signals or nets for test enabling should be routed out to die bumps that make their way directly to package bumps/pads for direct automated test equipment (ATE) access.

The second debug resiliency best-practice is proper internal observability DFT for debugging a chiplet-based assembly. Clock distribution and reset sequencing can be debugged with a PLL lock test DFT with internal state registers that can be read using the test port. Analog monitors can be put on any internal regulators to allow for correct voltage verification and trimming correction if needed.

The third principle for resilient debug is to ensure the ability to do fault isolation across multiple chiplet-to-chiplet boundaries. Standardized interfaces such as UCIe need to contain test features such as IEEE 1149.1 EXTEST instructions that are captured both at the pad and via near-end loopback where the transmitter has a local functional or test-only receiver to measure the data that is being sent. If both die in a chiplet pairing contain such DFT, then isolating the interconnect defect/fault is possible [11, 12].
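The isolation reasoning above can be written as a small decision rule. The sketch assumes two pass/fail observations per lane: the transmitter's near-end loopback result and the value captured at the receiving die's pad. The names and the three-way classification are hypothetical simplifications. A real diagnosis flow would need more observations, for example to separate a shorted net that also corrupts the near-end reading.

```python
from enum import Enum


class Suspect(Enum):
    NONE = "no fault observed by this test"
    TX_CHIPLET = "transmitting chiplet (driver or its local path)"
    LINK_OR_RX_PAD = "interconnect or receiving pad"


def isolate_lane_fault(near_end_loopback_ok, far_end_capture_ok):
    """Classify a lane from two boundary observations.

    near_end_loopback_ok: the transmitter's local receiver saw the data
                          that was driven.
    far_end_capture_ok:   the receiving die captured the expected value
                          at its pad.
    """
    if not near_end_loopback_ok:
        # The data was already wrong before leaving the transmitting die.
        return Suspect.TX_CHIPLET
    if not far_end_capture_ok:
        # Data left the transmitter intact but did not arrive intact.
        return Suspect.LINK_OR_RX_PAD
    return Suspect.NONE


# Hypothetical per-lane observations: (near-end ok, far-end ok).
observations = {0: (True, True), 1: (True, False), 2: (False, False)}
diagnosis = {lane: isolate_lane_fault(*obs)
             for lane, obs in observations.items()}

for lane, suspect in diagnosis.items():
    print(lane, suspect.name)
```

If only one die in the pairing provides its observation point, the second and third cases cannot be told apart, which is why the DFT is needed on both sides.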

### IV. CHIPLET TEST RESILIENCY FOR QUALITY AND RELIABILITY

Quality and reliability resiliency is the ability of a heterogeneously integrated product both to guarantee a low rate of defects sent to a customer and to ensure that various transistor and packaging aging effects don't compromise quality or performance once deployed at the customer.

Ensuring quality and reliability of complex chiplet-based designs can be a daunting task. Like all manufacturing processes, in a chiplet-based assembly process it is critical to understand the reliability and aging characteristics of the assembly and its impact on the embedded chiplet-to-chiplet interconnects [13]. Incorporating in-line sensors which can measure the IC across process, voltage and temperature over time allows us to monitor performance and degradation at many internal points and potentially react to this data in novel ways. Sensors like these can be used to detect process shift, process variation, local voltage droop/sag, and local temperature. Synopsys and Siemens each provide such turnkey design IP [14, 15].

As another example, proteanTecs is an IP and analytics services provider that creates on-silicon telemetry DFT that monitors chiplet-to-chiplet interfaces during real-time mission-mode execution [16]. Siemens makes a related product called Tessent MissionMode for real-time analytics [17]. Sensors like these can not only be used to collect data to profile interfaces and help identify performance bottlenecks, but they can also be used to monitor performance degradation over time. One may imagine that such sensors could be used to monitor for aging effects on a chiplet-to-chiplet interconnect and perform in-line repair of such an interface in the manufacturing flow if a performance degradation threshold is exceeded or a defect signature is detected. These same features could also be used for in-situ repairs in the field if so desired, as can be done for soft memory repair [18].
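The threshold-triggered repair imagined above could be sketched as follows. The example assumes each lane reports a scalar margin reading (for instance a timing margin in arbitrary units) and flags a lane when its margin has dropped by more than a chosen fraction of its baseline. The function name, data layout, and the 20% threshold are all hypothetical and do not reflect any vendor's telemetry interface.

```python
def lanes_needing_repair(baseline, latest, max_drop_fraction=0.2):
    """Return lanes whose margin fell by more than the allowed fraction.

    baseline: {lane: margin recorded at manufacturing test}
    latest:   {lane: most recent margin reading}
    """
    flagged = []
    for lane, base in baseline.items():
        drop = (base - latest[lane]) / base
        if drop > max_drop_fraction:
            flagged.append(lane)
    return sorted(flagged)


# Hypothetical readings in arbitrary margin units.
baseline = {0: 100.0, 1: 100.0, 2: 80.0}
latest = {0: 95.0, 1: 70.0, 2: 78.0}

flagged = lanes_needing_repair(baseline, latest)
print(flagged)  # [1]
```

The flagged lanes could then be handed to whatever lane-repair mechanism the interface provides, either in the manufacturing flow or in the field.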

### V. SUMMARY

Chiplet-based heterogeneous integrated products are an emerging market that is starting to accelerate as cost and other barriers to customer entry continue to come down. It is imperative that Test and Debug resiliency techniques and features evolve with interface standards so that these complex products are manufacturable and are delivered on time and at a desirable cost.

### VI. ACKNOWLEDGMENT

The author would like to thank Intel's Rob Munoz and Debendra Das Sharma for their feedback and guidance. The author would also like to thank proteanTecs' Nir Sever and Siemens' Joe Reynick for inspiration and feedback on sensor DFT technologies and their usage.

### VII. REFERENCES

- [1] A. B. Kelleher, "Celebrating 75 years of the transistor: A look at the evolution of Moore's Law innovation", International Electron Devices Meeting, IEEE, 2022.
- [2] Gordon E. Moore, "Cramming More Components onto Integrated Circuits", Electronics, Volume 38, Number 8, April 19, 1965, pp. 114-117.
- [3] Heterogeneous Integration Roadmap [2021] Chapter 11 MEMS and Sensor Integration
- [4] https://en.wikipedia.org/wiki/Chiplet
- [5] https://www.merriam-webster.com/dictionary/resiliency
- [6] Halak, Basel, "Ageing of Integrated Circuits: Causes, Effects and Mitigation Techniques", Springer, 1st Ed. 2020
- [7] Debendra Das Sharma, Gerald Pasdast, Zhiguo Qian, and Kemal Aygun, "Universal Chiplet Interconnect Express (UCIe): An Open Industry Standard for Innovations With Chiplets at Package Level", IEEE Transactions on Components, Packaging, and Manufacturing Technology, September 2022, pp. 1423-1431.
- [8] https://www.synopsys.com/designware-ip/technical-bulletin/ucie-multi-die-soc
- [9] https://www.cadence.com/en\_US/home/tools/ip/design-ip/chiplet-and-d2d-connectivity/ucie-phy-and-controller.html
- [10] https://www.uciexpress.org/specification
- [11] Letchumanan Devanraj, ITC 2020 3D Chiplet Test, "Who's At Fault? A Creative Way To Isolate and Debug Internal IO"
- [12] https://cmte.ieee.org/eps-test/wpcontent/uploads/sites/132/2022/01/IEEE\_EPS\_Test\_Het\_Int\_Product\_\_Testability\_BKM\_Final\_v1\_0-1-14-22-1.pdf
- [13] https://en.wikipedia.org/wiki/High-temperature\_operating\_life
- [14] https://www.synopsys.com/glossary/what-are-pvt-sensors.html
- [15] https://www.tessentembeddedanalytics.com/
- [16] https://www.proteantecs.com/resources/a-novel-approach-to-in-field-in-mission-reliability-monitoring-based-on-deep-data-publication
- [17] https://eda.sw.siemens.com/en-US/ic/tessent/test/missionmode/
- [18] https://eda.sw.siemens.com/en-US/ic/tessent/test/memorybist/