## **Semiconductor Test Thermal Management**

M. Jensen, Intel Corporation. IEEE Electronics Packaging Society Newsletter, July 2021

Semiconductor test is an exciting field and has over the past decades become increasingly challenging – requiring test engineers of all disciplines to push their limits in search of better, faster, and cheaper solutions.

Most packaged semiconductor products go through a test flow that includes elements of a burn-in process, a pattern-based structural test process, and, finally, a system-level test process.



Figure 1 Simplified semiconductor test flow

While semiconductor devices that successfully make it through the overall test flow are gift-wrapped and sent off to customers and end users, devices that fail portions of the test flow become the object of detailed investigations to enable continuous improvements in manufacturing and test capabilities.

# Why is Thermal Management a Big Deal?

While the test process touches all engineering disciplines, thermal management is a fundamental challenge that requires the best of engineers in order to battle the limitations and contradictions in fundamental physics in pursuit of optimal performance.

For instance, in a burn-in process, test development engineers juggle the trade-off between how hard to stress semiconductor devices (i.e., voltage) and the need to prevent thermal runaway. The harder a device can be stressed, the shorter the burn-in time, and the lower the test cost, can be.

Looking at the pattern-based test instance, test engineers are challenged with a game of frequency binning. Overcooling a device during this test can result in achieving lower frequencies and thus cut into the highly desirable profit margins of higher-frequency devices. Undercooling a device during this test can cause device instabilities or even thermal shutdown, ultimately leading to test fails and yield losses.

As devices enter a final system-level test process, a test that often mimics end-user conditions more closely, the focus shifts to quality control and DPM (Defects Per Million) screens. Again, all eyes are on the thermal engineers to ensure that the semiconductor device temperature is controlled within a narrow window to prevent both overcooling, potentially resulting in undertesting (a quality control compromise), and undercooling, potentially resulting in overtesting (yield loss).

### **The Power Metrics**

While each type of test in the test process relies heavily on thermal management for different reasons, all test steps share a fundamental challenge of thermal power management.

Typically, thermal solutions offer active control, with the ability to both cool and heat a semiconductor as needed to maintain a temperature set point despite the power fluctuations of the silicon itself. While there are different hardware technologies in the industry, a simplified, conceptual test thermal solution (interfacing a bare-die semiconductor device) can be seen in figure 2.

Figure 2 Simplified test thermal stackup

To control the temperature of the semiconductor during test, thermal engineers often look to three key power-related metrics:

- Total power dissipation
- Spatial power distribution
- Transient power profile

Total power dissipation capability is often used to refer to the heat dissipation capability of a given thermal solution. This capability is often a function of a few main drivers, such as the cooling technology utilized and the overall die size that needs to be cooled. From a semiconductor packaging standpoint, total power dissipation from a die can vary, but it is often a function of core count. As such, large die, such as server products, will often have greater power dissipation demands.

Total power dissipation capability of a thermal solution may be a metric that is somewhat easy to relate to. However, it is not as straightforward as it may sound as there are several factors that can influence the effective total power dissipation capability of a given thermal solution – other than die size.

One factor that can influence the effective power dissipation capability is the spatial power distribution. Spatial power distribution across a die surface becomes critical if there are significant non-uniform power densities across a surface. In such cases, it becomes increasingly difficult for a thermal solution to remove the heat due to the localized power, and DUT (Device Under Test) temperatures can no longer be controlled at the same power levels.

Note that power density can be a problem even if the total die power does not change. Figure 3 shows three spatial power distributions for a fictitious die surface. The die on the far left has a uniform power distribution, with all sub-cells at identical power levels. The middle die has two zones with different power densities: an outer zone with a lower-than-average sub-cell power density and an inner zone with a higher-than-average sub-cell power density. Again, the combined power of all sub-cells (i.e., the total power) is unchanged from the uniform case on the left. Lastly, on the right is a case with four distinct regions of higher-than-average sub-cell power density. In this case, the maximum sub-cell power density is far greater than in the left or middle die. As such, the die on the right will be more difficult to manage thermally and would likely see greater temperature increases during a test.



Figure 3 Hypothetical spatial power maps with identical total power.
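The three cases can be mimicked with a few lines of Python. This is only a sketch: the 6x6 grid, the 180 W total, and the hot-zone fractions are hypothetical values chosen for illustration, not data from the figure. It checks that the total power is the same in every map while the peak-to-average sub-cell power differs.

```python
def uniform_map(n, total_power):
    """n x n grid in which every sub-cell dissipates the same power."""
    cell = total_power / (n * n)
    return [[cell] * n for _ in range(n)]


def hotspot_map(n, total_power, hot_cells, hot_fraction):
    """Put hot_fraction of the total power into hot_cells, spread the rest evenly."""
    hot = set(hot_cells)
    hot_cell = total_power * hot_fraction / len(hot)
    cool_cell = total_power * (1.0 - hot_fraction) / (n * n - len(hot))
    return [[hot_cell if (r, c) in hot else cool_cell for c in range(n)]
            for r in range(n)]


def summarize(grid):
    """Return (total power, peak-to-average sub-cell power ratio)."""
    cells = [p for row in grid for p in row]
    total = sum(cells)
    return total, max(cells) / (total / len(cells))


# Hypothetical die: 6 x 6 sub-cells, 180 W total in every case.
N, TOTAL_W = 6, 180.0
inner_zone = [(r, c) for r in range(1, 5) for c in range(1, 5)]
four_hotspots = [(1, 1), (1, 4), (4, 1), (4, 4)]

maps = {
    "uniform": uniform_map(N, TOTAL_W),
    "two-zone": hotspot_map(N, TOTAL_W, inner_zone, 0.6),
    "four hotspots": hotspot_map(N, TOTAL_W, four_hotspots, 0.4),
}

for name, grid in maps.items():
    total, ratio = summarize(grid)
    print(f"{name:14s} total = {total:6.1f} W, peak/average = {ratio:.2f}")
```

With these assumed numbers the peak-to-average ratio rises from 1.0 (uniform) to about 1.35 (two-zone) and 3.6 (four hotspots), even though each map sums to the same 180 W.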

As such, there may be test scenarios where the layout of the die creates high power densities, making it difficult to control temperature despite a relatively low total power dissipation.

This phenomenon can also be shown in a simple plot correlating the power dissipation capability for a thermal solution as a function of the power density.



Figure 4 Power dissipation capability vs. spatial power density

Figure 4 shows a typical performance curve for a thermal solution, where the effective power dissipation is the power at which the maximum temperature stays below a given threshold. As the power distribution approaches uniformity (low peak power density), the power dissipation capability increases. Conversely, as power density increases, the effective total power dissipation capability decreases.

While the factors discussed earlier may have alluded to steady-state thermal environments in which power densities form based on static power maps, and a fixed total power across the die, the reality is that die power is anything but static.

In addition to spatial power distribution, another factor that can influence the effective power dissipation capability of a thermal solution is the transient behavior of the die power during test.

Die power fluctuates with the test content being applied. Sometimes, even the spatial power distribution changes as different functional areas of the die power on and off based on test patterns and applications.

The challenge with the transient power behavior becomes apparent when engineers compare the time constant of the thermal solution with the time constant of the die power fluctuations. This reveals a mismatch that is often 1-2 orders of magnitude! As a result, the die power can change significantly faster than the thermal solution can remove heat, which reduces the ability to control temperature dynamically and ultimately limits the effective power dissipation capability advertised for a given thermal solution.
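To put rough numbers on this mismatch, the sketch below assumes the thermal solution behaves as a simple first-order lag, which is a simplification made here and not a claim about any particular hardware. The 1 s time constant is hypothetical. The function returns how much of a power step the thermal solution has responded to by the time the power level changes again.

```python
import math


def fraction_tracked(dwell_s, tau_s):
    """Fraction of a power step a first-order system has responded to after dwell_s."""
    return 1.0 - math.exp(-dwell_s / tau_s)


TAU_THERMAL_S = 1.0  # hypothetical thermal-solution time constant

for ratio in (1, 10, 100):
    dwell = TAU_THERMAL_S / ratio  # how long the die holds one power level
    tracked = fraction_tracked(dwell, TAU_THERMAL_S)
    print(f"power dwell {dwell * 1e3:7.1f} ms ({ratio:3d}x faster): "
          f"{tracked:.1%} of the step tracked")
```

Under this assumption, a power level that lasts one time constant is about 63% tracked, while one that is 10x or 100x shorter is only about 10% or 1% tracked before the power changes again.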

To look at the impact of transient power capability in graphical form, figure 5 shows a Bode plot of a typical thermal solution capability (active control) in the frequency domain.

At very high power-fluctuation frequencies, die power pulses are so short that little heat builds up in the die, and the effective thermal dissipation capability is good.



Figure 5 Typical Thermal Solution control capability vs power frequency

As the frequency decreases, while still faster than the thermal solution can respond, the ability to control temperature worsens: the thermal solution simply cannot keep up, yet the power pulses are long enough to build heat in the die. As a result, the temperature rises, as shown by the dT/dP curve approaching a maximum.

As the frequency decreases further and enters the region in which the thermal solution *can* keep up, the control capability improves, as indicated by the drop in the curve and the smaller resulting temperature rise.

At even lower frequencies, well within the dynamic capability of the thermal solution, the control capability improves further as the active control system is now much faster than the changes in power.

With this, it is apparent that the effective power dissipation capability can drop significantly if the die power pulse frequency is faster than what the thermal solution time constant can keep up with - it is simply a hardware technology limitation.
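The shape of this curve can be reproduced with a deliberately simple model, offered here as a sketch rather than a description of any real thermal solution. It assumes a single lumped thermal mass (resistance `R`, heat capacity `C`), a PI controller (`Kp`, `Ki`), and a first-order actuator lag (`tau_act`); all names and values are hypothetical. The function returns the magnitude of the temperature change per unit of fluctuating die power, |dT/dP|, at a given frequency.

```python
import math


def dT_dP(freq_hz, R=0.2, C=5.0, Kp=20.0, Ki=20.0, tau_act=0.05):
    """Closed-loop |dT/dP| in K/W at freq_hz for an assumed lumped model.

    R [K/W] and C [J/K] describe one lumped thermal mass; Kp [W/K], Ki [W/(K s)]
    and tau_act [s] describe an assumed PI controller with actuator lag.
    """
    s = 2j * math.pi * freq_hz
    plant = R / (1 + s * R * C)                     # power -> temperature
    controller = (Kp + Ki / s) / (1 + s * tau_act)  # temperature error -> corrective power
    return abs(plant / (1 + plant * controller))    # disturbance rejection


# Sweep 1 mHz to 1 kHz, four points per decade.
freqs = [10 ** (e / 4) for e in range(-12, 13)]
response = [dT_dP(f) for f in freqs]
peak_index = max(range(len(freqs)), key=response.__getitem__)

for f, m in zip(freqs, response):
    print(f"{f:10.3f} Hz  |dT/dP| = {m:.5f} K/W")
print(f"worst case near {freqs[peak_index]:.3f} Hz")
```

With these illustrative values the response is small at both ends of the sweep and largest in between, at a few tenths of a hertz. The location and height of the maximum depend entirely on the assumed parameters.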

### Thermal mass – Too much or too little?

One design trade-off that engineers battle with thermal solutions is the desire to have, and the desire NOT to have, thermal mass.

Reduced thermal mass typically allows for a faster response time (lower time constant), which effectively shifts the performance curve to the right, as indicated in figure 6 (the maximum point will occur at a higher frequency). However, with reduced thermal mass also comes a greater sensitivity to die power at the higher frequencies, as there is less mass to absorb the (short) bursts in power, resulting in a negative impact on control capability.



Figure 6 Impact of thermal mass to control performance
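One way to see both sides of this trade-off is a single lumped thermal mass, which is an assumption made here for illustration rather than a model taken from the figure. With $C$ the heat capacity of the thermal mass (J/K) and $R$ its thermal resistance to the coolant (K/W), the uncontrolled temperature response to a sinusoidal power fluctuation at angular frequency $\omega$ is:

$$
\tau = RC, \qquad \left|\frac{\Delta T}{\Delta P}\right| = \frac{R}{\sqrt{1 + (\omega R C)^2}} \approx \frac{1}{\omega C} \quad \text{for } \omega \gg \frac{1}{RC}
$$

Under this assumption, halving $C$ halves the time constant $\tau$, so the system can respond faster, but it also doubles the temperature swing per watt for power bursts well above $1/\tau$.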

With this, the challenge to the thermal industry continues: how can a thermal solution have fast response time capabilities and yet still have enough thermal mass to absorb short bursts in power?

#### Summary

While thermal control during test may seem like a small problem area to many, it can have a strikingly large impact on the profits of a semiconductor manufacturing company if not managed properly, as test cost, test yields, and binning performance can take a sizeable hit.

And while test thermal solution vendors may advertise specific power dissipation capabilities, it is important to note that the effective capability is a function of many things that are not always within the control of the thermal vendors, including the die area, spatial die power distribution, and test content dynamics.