Introduction
EMBEDDED SYSTEMS are computers incorporated in consumer products or other devices to perform application-specific functions. The product user is usually not even aware of the existence of these systems. From toys to medical devices, from ovens to automobiles, the range of products incorporating microprocessor-based, software-controlled systems has expanded rapidly since the introduction of the microprocessor in 1971. The lure of embedded systems is clear: They promise previously impossible functions that enhance the performance of people or machines. As these systems gain sophistication, manufacturers are using them in increasingly critical applications—products that can result in injury, economic loss, or unacceptable inconvenience when they do not perform as required.
Embedded systems can contain a variety of computing devices, such as microcontrollers, application-specific integrated circuits, and digital signal processors. A key requirement is that these computing devices continuously respond to external events in real time. Makers of embedded systems take many measures to ensure safety and reliability throughout the lifetime of products incorporating the systems. Here, we consider techniques for identifying faults during normal operation of the product—that is, online-testing techniques. We evaluate them on the basis of error coverage, error latency, space redundancy, and time redundancy.
Embedded-system test issues
Cost constraints in consumer products typically translate into stringent constraints on product components. Thus, embedded systems are particularly cost sensitive. In many applications, low production and maintenance costs are as important as performance. Moreover, as people become dependent on computer-based systems, their expectations of these systems’ availability increase dramatically. Nevertheless, most people still expect significant downtime with computer systems—perhaps a few hours per month. People are much less patient with computer downtime in other consumer products, since the items in question did not demonstrate this type of failure before embedded systems were added. Thus, complex consumer products with high availability requirements must be quickly and easily repaired. For this reason, automobile manufacturers, among others, are increasingly providing online detection and diagnosis, capabilities previously found only in very complex and expensive applications such as aerospace systems. Using embedded systems to incorporate functions previously considered exotic in low-cost, everyday products is a growing trend.
Since embedded systems are frequently components of mobile products, they are exposed to vibration and other environmental stresses that can cause them to fail. Embedded systems in automotive applications are exposed to extremely harsh environments, even beyond those experienced by most portable devices. These applications are proliferating rapidly, and their more stringent safety and reliability requirements pose a significant challenge for designers. Critical applications and applications with high availability requirements are the main candidates for online testing.
Embedded systems consist of hardware and software, each usually considered separately in the design process, despite progress in the field of hardware-software codesign. A strong synergy exists between hardware and software failure mechanisms and diagnosis, as in other aspects of system performance. System failures often involve defects in both hardware and software. Software does not “break” in the common sense of the term. However, it can perform inappropriately due to faults in the underlying hardware or specification or design flaws in either hardware or software. At the same time, one can exploit the software to test for and respond to the presence of faults in the underlying hardware.
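To make the last point concrete, here is a minimal sketch of software testing the underlying hardware: a firmware routine that exercises a RAM region with alternating bit patterns and reports any mismatch as a hardware fault. The function name and the assumption that the region's contents may be safely overwritten (for example, at startup or in a reserved scratch area) are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>

/* Sketch: check a RAM region for stuck-at faults by writing alternating
 * bit patterns and reading them back. Assumes the region may be safely
 * overwritten (e.g., during startup, before the RAM is in use). */
bool ram_pattern_test(volatile uint32_t *region, size_t words)
{
    static const uint32_t patterns[] = { 0x55555555u, 0xAAAAAAAAu };

    for (size_t p = 0; p < sizeof patterns / sizeof patterns[0]; p++) {
        for (size_t i = 0; i < words; i++)      /* write the pattern */
            region[i] = patterns[p];
        for (size_t i = 0; i < words; i++)      /* read back and compare */
            if (region[i] != patterns[p])
                return false;                   /* mismatch: hardware fault */
    }
    return true;
}
```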
Online software testing aims at detecting design faults (bugs) that avoid detection before the embedded system is incorporated and used in a product. Even with extensive testing and formal verification of the system, some bugs escape detection. Residual bugs in well-tested software typically behave as intermittent faults, becoming apparent only in rare system states. Online software testing relies on two basic methods: acceptance testing and diversity [1]. Acceptance testing checks for the presence or absence of well-defined events or conditions, usually expressed as true-or-false conditions (predicates), related to the correctness or safety of preceding computations. Diversity techniques compare replicated computations, either with minor variations in data (data diversity) or with procedures written by separate, unrelated design teams (design diversity). This chapter focuses on digital hardware testing, including techniques by which hardware tests itself, built-in self-test (BIST).
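As an illustration of acceptance testing, the sketch below checks a square-root computation with a simple inverse predicate: squaring the result should reproduce the input within a tolerance. The function names and the tolerance are assumptions made for this example; a real recovery scheme would invoke an alternate routine when the test fails.

```c
#include <math.h>
#include <stdbool.h>

/* Acceptance test: the result of a square-root computation is accepted
 * only if squaring it reproduces the input within a tolerance. */
static bool sqrt_result_acceptable(double x, double r, double tol)
{
    if (x < 0.0 || r < 0.0)
        return false;                          /* domain predicate violated */
    return fabs(r * r - x) <= tol * (x + 1.0); /* inverse-operation check */
}

double checked_sqrt(double x)
{
    double r = sqrt(x);                        /* primary computation */
    if (!sqrt_result_acceptable(x, r, 1e-9))
        r = -1.0;   /* reject the result; a recovery block would retry with
                       an alternate routine or signal the error upward */
    return r;
}
```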
Nevertheless, we must consider the role of software in detecting, diagnosing, and handling hardware faults. If we can use software to test hardware, why should we add hardware to test hardware? There are two possible answers. First, it may be cheaper or more practical to use hardware for some tasks and software for others. In an embedded system, programs are stored online in hardware-implemented memories such as ROMs (for this reason, embedded software is sometimes called firmware). This program storage space is a finite resource whose cost is measured in exactly the same way as other hardware. A function such as a test is “soft” only in the sense that it can easily be modified or omitted in the final implementation.
The second answer involves the time that elapses between a fault’s occurrence and a problem arising from that fault. For instance, a fault may induce an erroneous system state that can ultimately lead to an accident. If the elapsed time between the fault’s occurrence and the corresponding accident is short, the fault must be detected immediately. Acceptance tests can detect many faults and errors in both software and hardware. However, their exact fault coverage is hard to measure, and even when coverage is complete, acceptance tests may take a long time to detect some faults. BIST typically targets relatively few hardware faults, but it detects them quickly.
These two issues, cost and latency, are the main parameters in deciding whether to use hardware or software for testing and which hardware or software technique to use. This decision requires system-level analysis. We do not consider software methods here. Rather, we emphasize the appropriate use of widely implemented BIST methods for online hardware testing. These methods are components in the hardware-software trade-off.
Online testing
Faults are physical or logical defects in the design or implementation of a digital device. Under certain conditions, they lead to errors—that is, incorrect system states. Errors induce failures, deviations from appropriate system behavior. If the failure can lead to an accident, it is a hazard. Faults can be classified into three groups: design, fabrication, and operational. Design faults are made by human designers or CAD software (simulators, translators, or layout generators) during the design process. Fabrication faults result from an imperfect manufacturing process; for example, shorts and opens are common manufacturing defects in VLSI circuits. Operational faults result from wear or environmental disturbances during normal system operation. Such disturbances include electromagnetic interference, operator mistakes, and extremes of temperature and vibration. Some design and fabrication faults escape detection and combine with wear and environmental disturbances to cause problems in the field. Operational faults are usually classified by their duration: permanent faults persist until they are repaired, intermittent faults appear and disappear repeatedly, and transient faults appear briefly and then vanish, typically as the result of environmental disturbances.
One generally uses online testing to detect operational faults in computers that support critical or high-availability applications. The goal of online testing is to detect fault effects, or errors, and take appropriate corrective action. For example, in some critical applications, the system shuts down after an error is detected. In other applications, error detection triggers a reconfiguration mechanism that allows the system to continue operating, perhaps with some performance degradation (the sketch after this paragraph illustrates both responses). Online testing can take the form of external or internal monitoring, using either hardware or software. Internal monitoring, also called self-testing, takes place on the same substrate as the circuit under test (CUT). Today, this usually means inside a single IC—a system on a chip. There are four primary parameters to consider in designing an online-testing scheme: error coverage, the fraction of possible errors that the scheme detects; error latency, the time between an error's first occurrence and its detection; space redundancy, the extra hardware the scheme requires; and time redundancy, the extra operating time it requires.
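As a minimal sketch of the corrective actions just mentioned, the following hypothetical dispatch routine either forces a safe shutdown or reconfigures around the faulty unit and continues in a degraded mode. Every name in it is a placeholder rather than part of any real system.

```c
/* Hypothetical error-handling policy: a critical error forces a fail-stop
 * shutdown to a safe state, while a recoverable error triggers
 * reconfiguration and continued, degraded operation. */
typedef enum { ERR_NONE, ERR_RECOVERABLE, ERR_CRITICAL } error_class_t;

static void enter_safe_state(void)     { /* de-energize outputs, halt */ }
static void disable_faulty_unit(void)  { /* mask out the failed module */ }
static void run_in_degraded_mode(void) { /* continue with reduced function */ }

void handle_detected_error(error_class_t err)
{
    switch (err) {
    case ERR_CRITICAL:
        enter_safe_state();          /* fail-stop response */
        break;
    case ERR_RECOVERABLE:
        disable_faulty_unit();       /* reconfigure around the faulty unit */
        run_in_degraded_mode();
        break;
    default:
        break;                       /* ERR_NONE: nothing to do */
    }
}
```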
The ideal online-testing scheme would have 100% error coverage, an error latency of one clock cycle, no space redundancy, and no time redundancy. It would require no redesign of the CUT and impose no functional or structural restrictions on it. Most BIST methods meet some of these constraints without addressing others. Considering all four parameters in the design of an online-testing scheme may create conflicting goals. High coverage requires high error latency, space redundancy, and/or time redundancy. Schemes with immediate detection (error latency of one cycle) minimize time redundancy but require more hardware. On the other hand, schemes with delayed detection (error latency greater than one cycle) reduce time and space redundancy at the expense of increased error latency. Several proposed delayed-detection techniques assume equiprobability of input combinations and try to establish a probabilistic bound on error latency [2]. As a result, certain faults remain undetected for a long time because tests for them rarely appear at the CUT's inputs. For example, if the only input combination that detects a given fault occurs with probability p in each clock cycle, the expected error latency is 1/p cycles, which can be very large for rarely occurring inputs.
To cover all the operational fault types described earlier, test engineers use two different modes of online testing: concurrent and non-concurrent. Concurrent testing takes place during normal system operation, and non-concurrent testing takes place while normal operation is temporarily suspended. One must often combine these test modes to provide a comprehensive online-testing strategy at acceptable cost.
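As an illustration of how a non-concurrent test can be woven into normal operation, the sketch below advances an additive checksum over a ROM image a few words at a time from the idle loop, so that normal operation is suspended only briefly at each step. The structure, slice size, and the idea of a build-time reference checksum are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>

/* State for a ROM checksum pass that is advanced in small slices. */
typedef struct {
    const uint32_t *rom;        /* start of the ROM image to check */
    size_t          words;      /* image length in 32-bit words */
    uint32_t        reference;  /* expected checksum (stored at build time) */
    size_t          next;       /* next word to add */
    uint32_t        sum;        /* running checksum */
} rom_check_t;

#define WORDS_PER_STEP 64u      /* work done per idle-loop call */

/* Called from the idle loop. Returns true when a full pass has completed,
 * with *ok indicating whether the ROM contents matched the reference. */
bool rom_checksum_step(rom_check_t *c, bool *ok)
{
    for (size_t i = 0; i < WORDS_PER_STEP && c->next < c->words; i++)
        c->sum += c->rom[c->next++];

    if (c->next < c->words)
        return false;               /* pass still in progress */

    *ok = (c->sum == c->reference); /* full pass done: compare */
    c->next = 0;                    /* reset state for the next pass */
    c->sum = 0;
    return true;
}
```

In use, the application would initialize a rom_check_t with the image bounds and reference value and call rom_checksum_step whenever it has nothing else to do; a failed pass would feed the same error-handling path used for other detected faults.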