Introduction to Real Time Systems - 3 | Embedded Systems (Web) - Computer Science Engineering (CSE) PDF Download

Safety and Reliability 

In traditional systems, safety and reliability are normally considered to be independent issues. It is therefore possible to identify traditional systems that are safe but unreliable, and systems that are reliable but unsafe. Consider the following two examples. Word-processing software may not be very reliable but is safe: a failure of the software does not usually cause any significant damage or financial loss. It is therefore an example of an unreliable but safe system. On the other hand, a hand gun can be unsafe but is reliable. A hand gun rarely fails, but it is an unsafe system because if it does fail for some reason, it can misfire or even explode and cause significant damage. It is an example of an unsafe but reliable system. These two examples show that for traditional systems, safety and reliability are independent concerns - it is therefore possible to increase the safety of a system without affecting its reliability and vice versa.

In real-time systems, on the other hand, safety and reliability are coupled together. Before analyzing why safety and reliability are no longer independent issues in real-time systems, we first need to understand what exactly is meant by a fail-safe state.

A fail-safe state of a system is a state which, if entered when the system fails, results in no damage.

To give an example, the fail-safe state of a word processing program is one where the document being processed has been saved onto the disk. All traditional non-real-time systems have one or more fail-safe states, which help separate the issues of safety and reliability - even if a system is known to be unreliable, it can always be made to fail in a fail-safe state, and consequently it would still be considered a safe system.

If no damage can result when a system enters a fail-safe state just before it fails, then through a careful transition to a fail-safe state upon a failure, it is possible to turn an extremely unreliable and unsafe system into a safe system. In many traditional systems this technique is in fact frequently adopted. For example, consider a traffic light controller that controls the flow of traffic at a road intersection. Suppose the traffic light controller fails frequently and is known to be highly unreliable. Though unreliable, it can still be considered safe if, whenever it fails, it enters a fail-safe state where all the traffic lights are orange and blinking. This is a fail-safe state, since motorists seeing blinking orange traffic lights become aware that the traffic light controller is not working and proceed with caution. Of course, making all lights green is not a fail-safe state, since severe accidents could occur. Similarly, all lights turned red is also not a fail-safe state - it may not cause accidents, but it would bring all traffic to a standstill, leading to traffic jams. However, in many real-time systems there are no fail-safe states. Therefore, any failure of the system can cause severe damage. Such systems are said to be safety-critical systems.
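The traffic light example can be sketched in code. The following is a minimal illustration only, not a real controller: the `ControllerFault` exception, the two road names, and the rule that a negative step counts as a fault are all assumptions made up for the example.

```python
from enum import Enum


class Light(Enum):
    RED = "red"
    GREEN = "green"
    BLINKING_ORANGE = "blinking orange"  # the fail-safe indication


class ControllerFault(Exception):
    """Hypothetical exception raised when the controller detects a fault."""


def normal_cycle(step: int) -> dict:
    """Normal operation: the two roads alternate between green and red."""
    if step < 0:
        # Assumed fault condition, used only to trigger the fail-safe path.
        raise ControllerFault("invalid sequencing step")
    ns_green = step % 2 == 0
    return {
        "north-south": Light.GREEN if ns_green else Light.RED,
        "east-west": Light.RED if ns_green else Light.GREEN,
    }


def fail_safe_state() -> dict:
    """All lights blink orange, so motorists proceed with caution."""
    return {
        "north-south": Light.BLINKING_ORANGE,
        "east-west": Light.BLINKING_ORANGE,
    }


def control(step: int) -> dict:
    """Return the light settings, falling back to the fail-safe state on a fault."""
    try:
        return normal_cycle(step)
    except ControllerFault:
        return fail_safe_state()
```

Calling `control(-1)` simulates a failure: instead of leaving the lights in an arbitrary state, the controller reports blinking orange on both roads.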

A safety-critical system is one whose failure can cause severe damage.

An example of a safety-critical system is a navigation system on-board an aircraft. An on-board navigation system has no fail-safe states. When the computer on-board an aircraft fails, switching off the engine is certainly not a fail-safe state! In a safety-critical system, the absence of fail-safe states implies that safety can only be ensured through increased reliability. Thus, for safety-critical systems the issues of safety and reliability become interrelated. It should now be clear why safety-critical systems need to be highly reliable.

Just to give an example of the level of reliability required of safety-critical systems, consider the following. In any fly-by-wire aircraft, most of the vital parts are controlled by a computer. Any failure of the controlling computer is clearly not acceptable. The standard reliability requirement for such aircraft is at most 1 failure per 10^9 flying hours (that is, roughly 114,000 years of continuous flying!). We examine how a highly reliable system can be developed in the next section.

How to Achieve High Reliability?  

If you are asked by your organization to develop software which should be highly reliable, how would you proceed to achieve it?  Highly reliable software can be developed by adopting all of the following three important techniques: 

  • Error Avoidance: For achieving high reliability, the possibility of errors being introduced should be minimized as far as possible during product development. This can be achieved by adopting a variety of means: using well-founded software engineering practices, using sound design methodologies, adopting suitable CASE tools, and so on.
  • Error Detection and Removal: In spite of using the best available error avoidance techniques, many errors still manage to creep into the code. These errors need to be detected and removed. This can be achieved to a large extent by conducting thorough reviews and testing. Once errors are detected, they can be easily fixed.
  • Fault-Tolerance: No matter how meticulously error avoidance and error detection techniques are used, it is virtually impossible to make a practical software system entirely error-free. A few errors still persist even after carrying out thorough reviews and testing. Errors cause failures. That is, failures are manifestations of the errors latent in the system. Therefore, to achieve high reliability even in situations where errors are present, the system should be able to tolerate the faults and compute the correct results. This is called fault-tolerance. Fault-tolerance can be achieved by carefully incorporating redundancy.

 

Legend: C1, C2, C3: Redundant copies of the same component

Fig. 28.11 Schematic Representation of TMR 

It is relatively simple to design hardware equipment to be fault-tolerant. The following are two methods that are popularly used to achieve hardware fault-tolerance:

  • Built In Self Test (BIST): In BIST, the system periodically performs self tests of its components. Upon detection of a failure, the system automatically reconfigures itself by switching out the faulty component and switching in one of the redundant good components.
  • Triple Modular Redundancy (TMR): In TMR, as the name suggests, three redundant copies of all critical components are made to run concurrently (see Fig. 28.11). Observe that in Fig. 28.11, C1, C2, and C3 are the redundant copies of the same critical component. The system performs voting on the results produced by the redundant components to select the majority result. TMR can help tolerate the occurrence of only a single failure at any time. (Can you answer why a TMR scheme can effectively tolerate a single component failure only?) An assumption that is implicit in the TMR technique is that at any time only one of the three redundant components can produce erroneous results. The majority result after voting would be erroneous if two or more components fail simultaneously (more precisely, before a repair can be carried out). In situations where two or more components are likely to fail (or produce erroneous results), greater amounts of redundancy need to be incorporated. A little thinking shows that at least 2n+1 redundant components are required to tolerate simultaneous failures of n components, since the n+1 correct components must out-vote the n faulty ones.
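The voting step of TMR can be sketched as a small majority voter. This is an illustrative software model only; the function names are made up for the example, and the results are assumed to be hashable values that can be compared for equality.

```python
from collections import Counter


def majority_vote(results):
    """Return the value produced by a strict majority of the redundant copies.

    Raises ValueError when no value has a strict majority, i.e. when too many
    copies disagree for the voter to mask the fault.
    """
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        raise ValueError("no majority: too many components failed")
    return value


def copies_needed(n: int) -> int:
    """Redundant copies needed to out-vote n simultaneously faulty copies."""
    return 2 * n + 1
```

With three copies, `majority_vote([42, 7, 42])` masks the single faulty result and returns 42. If two copies fail in the same way, as in `majority_vote([7, 7, 42])`, the voter accepts the erroneous value 7, which is why TMR tolerates only one failure at a time.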

As compared to hardware, software fault-tolerance is much harder to achieve. To investigate the reason behind this, let us first discuss the techniques currently being used to achieve software fault-tolerance. We do this in the following subsection.

Software Fault-Tolerance Techniques 

Two methods are now popularly being used to achieve software fault-tolerance: N-version programming and recovery block techniques. These two techniques are simple adaptations of the basic techniques used to provide hardware fault-tolerance. We discuss these two techniques in the following. 

N-Version Programming: This technique is an adaptation of the TMR technique for hardware fault-tolerance. In the N-version programming technique, independent teams develop N different versions (the value of N depends on the degree of fault-tolerance required) of a software component (module). The redundant modules are run concurrently (possibly on redundant hardware). The results produced by the different versions of the module are subjected to voting at run time, and the result on which the majority of the versions agree is accepted. The central idea behind this scheme is that independent teams would commit different types of mistakes, which would be eliminated when the results produced by them are subjected to voting. However, this scheme is not very successful in achieving fault-tolerance, and the problem can be attributed to statistical correlation of failures. Statistical correlation of failures means that even though individual teams worked in isolation to develop the different versions of a software component, the different versions still tend to fail for identical reasons. In other words, the different versions of a component show similar failure patterns. This does not mean that the versions contain identical code; rather, their errors cluster in the same places. The reason for this is not far to seek: programmers commit errors in those parts of a problem which they perceive to be difficult - and what is difficult to one team is usually difficult to all teams. So, similar errors remain in the most complex and least understood parts of a software component.

Recovery Blocks: In the recovery block scheme, the redundant components are called try blocks. Each try block computes the same end result as the others but is intentionally written using a different algorithm compared to the other try blocks. In N-version programming, the different versions of a component are written by different teams of programmers, whereas in the recovery block scheme different algorithms are used in different try blocks. Also, in contrast to the N-version programming approach where the redundant copies are run concurrently, in the recovery block approach they are run one after another (as shown in Fig. 28.12). The results produced by a try block are subjected to an acceptance test (see Fig. 28.12). If the test fails, then the next try block is tried. This is repeated in sequence until the result produced by a try block successfully passes the acceptance test. Note that in Fig. 28.12 we have shown acceptance tests separately for different try blocks to help understand that the tests are applied to the try blocks one after the other, though it may be the case that the same test is applied to each try block.
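Statistical correlation of failures can be illustrated with a toy example. The three leap-year functions below are hypothetical stand-ins for versions written by independent teams; the assumption that two teams overlook the same century rule is invented for the illustration.

```python
def leap_v1(year: int) -> bool:
    # Team 1: handles the common case only (the century rule is overlooked).
    return year % 4 == 0


def leap_v2(year: int) -> bool:
    # Team 2: written independently, yet it overlooks the same century rule.
    return not year % 4


def leap_v3(year: int) -> bool:
    # Team 3: implements the full Gregorian rule.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def n_version_result(year: int) -> bool:
    """Run all versions and accept the answer the majority agrees on."""
    votes = [leap_v1(year), leap_v2(year), leap_v3(year)]
    return votes.count(True) > len(votes) // 2
```

For ordinary years such as 2023 or 2024 all versions agree. For 1900, which is not a leap year, the two versions that share the blind spot out-vote the correct one, so `n_version_result(1900)` accepts the wrong answer. Voting cannot mask an error that the majority of versions commit in the same way.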

Fig. 28.12 A Software Fault-Tolerance Scheme Using Recovery Blocks 
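The recovery block scheme can be sketched as follows. This is a minimal illustration under stated assumptions: the task (computing a square root), the two algorithms, the tolerance in the acceptance test, and the deliberately seeded fault in the first try block are all chosen for the example and are not from the text.

```python
import math


def acceptance_test(x: float, root: float) -> bool:
    """Accept a candidate square root if its square is close enough to x."""
    return (
        math.isfinite(root)
        and root >= 0
        and abs(root * root - x) <= 1e-6 * max(1.0, x)
    )


def try_newton(x: float) -> float:
    """Primary try block: Newton's iteration, with a seeded fault for x < 1."""
    if x < 1:
        return x  # hypothetical bug, present only to exercise the fallback
    guess = x
    for _ in range(60):
        guess = 0.5 * (guess + x / guess)
    return guess


def try_bisection(x: float) -> float:
    """Alternate try block: a different algorithm (bisection), same end result."""
    lo, hi = 0.0, max(1.0, x)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid * mid < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def recovery_block(x: float) -> float:
    """Run the try blocks one after another until one passes the acceptance test."""
    for try_block in (try_newton, try_bisection):
        result = try_block(x)
        if acceptance_test(x, result):
            return result
    raise RuntimeError("all try blocks failed the acceptance test")
```

For `recovery_block(0.25)` the first try block returns a wrong value, the acceptance test rejects it, and the second try block supplies the result. The try blocks here are pure functions, so there is no state to restore between attempts; a scheme whose try blocks modify system state would also need to restore that state before running the next block.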

As was the case with N-version programming, the recovery block approach also does not achieve much success in providing effective fault-tolerance. The reason behind this is again statistical correlation of failures: different try blocks fail for identical reasons, as was explained in the case of the N-version programming approach. Besides, this approach suffers from a further limitation: it can only be used if the task deadlines are much larger than the task computation times (i.e. tasks have large laxity), since the different try blocks are executed one after the other when failures occur. The recovery block approach poses special difficulty when used with real-time tasks with very short slack time (i.e. short deadline and considerable execution time), because deadlines may be missed as the try blocks are tried out one after the other. Therefore, in such cases the later try blocks usually contain only skeletal code, which produces only approximate results and therefore takes much less computation time than the first try block.

Fig. 28.13 Checkpointing and Rollback Recovery

Checkpointing and Rollback Recovery: Checkpointing and rollback recovery is another popular technique to achieve fault-tolerance. In this technique, as the computation proceeds, the system state is tested each time some meaningful progress in computation is made. Immediately after a state-check test succeeds, the state of the system is backed up on stable storage (see Fig. 28.13). In case the next test does not succeed, the system can be made to roll back to the last checkpointed state. After a rollback, a fresh computation can be initiated from the checkpointed state. This technique is especially useful if there is a chance that the system state may be corrupted as the computation proceeds, for example due to data corruption or processor failure.
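The checkpoint and rollback cycle can be sketched as follows. This is only an illustration: the class name, the state layout, the consistency test, and the `corrupt` flag used to simulate corruption are all assumptions, and an in-memory copy stands in for stable storage.

```python
import copy


class CheckpointedComputation:
    """Keeps a running state plus the last state that passed the state check."""

    def __init__(self, initial_state: dict):
        self.state = copy.deepcopy(initial_state)
        # Stand-in for stable storage; a real system would write to disk.
        self._checkpoint = copy.deepcopy(initial_state)

    def state_check(self) -> bool:
        """Hypothetical consistency test: the total must equal the sum of items."""
        return self.state["total"] == sum(self.state["items"])

    def step(self, value: int, corrupt: bool = False) -> bool:
        """Do one unit of work, then checkpoint on success or roll back on failure.

        Returns True if the step was committed, False if it was rolled back.
        """
        self.state["items"].append(value)
        self.state["total"] += value + (1 if corrupt else 0)  # simulated corruption
        if self.state_check():
            self._checkpoint = copy.deepcopy(self.state)
            return True
        self.state = copy.deepcopy(self._checkpoint)
        return False
```

After a step that fails the state check, the computation is back at the last checkpointed state and the same step can simply be retried.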

Types of Real-Time Tasks

We have already seen that a real-time task is one for which quantitative expressions of time are needed to describe its behavior. This quantitative expression of time usually appears in the form of a constraint on the time at which the task produces results. The most frequently occurring timing constraint is a deadline constraint which is used to express that a task is required to compute its results within some deadline. We therefore implicitly assume only deadline type of timing constraints on tasks in this section, though other types of constraints (as explained in Sec.) may occur in practice. Real-time tasks can be classified into the following three broad categories: 

A real-time task can be classified as a hard, soft, or firm real-time task depending on the consequences of the task missing its deadline.

It is not necessary that all tasks of a real-time application belong to the same category. It is possible that different tasks of a real-time system can belong to different categories. We now elaborate these three types of real-time tasks. 

Hard Real-Time Tasks 

A hard real-time task is one that is constrained to produce its results within certain predefined time bounds. The system is considered to have failed whenever any of its hard real-time tasks does not produce its required results before the specified time bound. 

An example of a system having hard real-time tasks is a robot. The robot cyclically carries out a number of activities including communication with the host system, logging all completed activities, sensing the environment to detect any obstacles present, tracking the objects of interest, path planning, effecting the next move, etc. Now consider that the robot suddenly encounters an obstacle. The robot must detect it and try to avoid colliding with it as soon as possible. If it fails to respond quickly (i.e. the concerned tasks are not completed before the required time bound), then it would collide with the obstacle and the robot would be considered to have failed. Therefore detecting obstacles and reacting to them are hard real-time tasks.

Another application having hard real-time tasks is an anti-missile system. An anti-missile system consists of the following critical activities (tasks): it must first detect all incoming missiles, properly position the anti-missile gun, and then fire to destroy the incoming missile before the incoming missile can do any damage. All these tasks are hard real-time in nature, and the anti-missile system would be considered to have failed if any of its tasks fails to complete before the corresponding deadline.

Applications having hard real-time tasks are typically safety-critical. (Can you think of an example of a hard real-time system that is not safety-critical?) This means that any failure of a real-time task, including its failure to meet the associated deadlines, would result in severe consequences. This makes hard real-time tasks extremely critical. Criticality of a task can range from extremely critical to not so critical. Task criticality is therefore a different dimension from the hard or soft characterization of a task. Criticality of a task is a measure of the cost of a failure - the higher the cost of failure, the more critical the task.

For hard real-time tasks in practical systems, the time bounds usually range from several microseconds to a few milliseconds. It may be noted that a hard real-time task does not need to be completed within the shortest time possible; it is merely required that the task complete within the specified time bound. In other words, there is no reward in completing a hard real-time task much ahead of its deadline. This is an important observation, and it will play a central part in our discussions on task scheduling in the next two chapters.

Firm Real-Time Tasks 

Every firm real-time task is associated with some predefined deadline before which it is required to produce its results. However, unlike a hard real-time task, even when a firm real-time task does not complete within its deadline, the system does not fail. The late results are merely discarded. In other words, the utility of the results computed by a firm real-time task becomes zero after the deadline. Fig. 28.14 schematically shows the utility of the results produced by a firm real-time task as a function of time. In Fig. 28.14 it can be seen that if the response time of a task exceeds the specified deadline, then the utility of the results becomes zero and the results are discarded. 

Fig. 28.14 Utility of Result of a Firm Real-Time Task with Time 

Firm real-time tasks typically abound in multimedia applications. The following are two examples of firm real-time tasks:

  • Video conferencing: In a video conferencing application, video frames and the accompanying audio are converted into packets and transmitted to the receiver over a network. However, some frames may get delayed during transit on a packet-switched network due to congestion at different nodes. This may result in varying queuing delays experienced by packets traveling along different routes. Even when packets traverse the same route, some packets can take much more time than others due to the specific transmission strategy used at the nodes. If a frame arrives at the receiver when a later frame is already being played, then the late frame is of no use and is discarded. For this reason, when a frame is delayed by more than, say, one second, it is simply discarded at the receiver end without carrying out any processing on it.
  • Satellite-based tracking of enemy movements: Consider a satellite that takes pictures of an enemy territory and beams them to a ground station computer frame by frame. The ground computer processes each frame to find the positional difference of different objects of interest with respect to their position in the previous frame, to determine the movements of the enemy. When the ground computer is overloaded, a new image may be received even before an older image is taken up for processing. In this case, the older image is of not much use. Hence the older images may be discarded and the most recently received image processed.

For firm real-time tasks, the associated time bounds typically range from a few milliseconds to several hundreds of milliseconds.
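The behaviour of a firm real-time task can be sketched as a step-shaped utility function plus a filter that drops late results, in the spirit of the video conferencing example. The function names and the `(frame_id, delay)` input format are hypothetical, chosen only for this illustration.

```python
def firm_utility(response_time: float, deadline: float) -> float:
    """Utility of a firm real-time result: full value up to the deadline, zero after."""
    return 1.0 if response_time <= deadline else 0.0


def frames_to_play(frames, deadline: float):
    """Keep frames that arrived in time; late frames are discarded unprocessed.

    `frames` is a list of (frame_id, delay_in_seconds) pairs (assumed format).
    """
    return [
        frame_id
        for frame_id, delay in frames
        if firm_utility(delay, deadline) > 0
    ]
```

For instance, with a one-second deadline, `frames_to_play([(1, 0.2), (2, 1.4), (3, 0.9)], 1.0)` keeps frames 1 and 3 and drops frame 2. The system does not fail; the late result is simply thrown away.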

Fig. 28.15 Utility of the Results Produced by a Soft Real-Time Task as a Function of Time 

Soft Real-Time Tasks 

Soft real-time tasks also have time bounds associated with them. However, unlike hard and firm real-time tasks, the timing constraints on soft real-time tasks are not expressed as absolute values. Instead, the constraints are expressed either in terms of the average response times required or in statistical terms, such as the percentage of requests that must be served within a given time.

An example of a soft real-time task is web browsing. Normally, after a URL (Uniform Resource Locator) is clicked, the corresponding web page is fetched and displayed within a couple of seconds on the average. However, when it takes several minutes to display a requested page, we still do not consider the system to have failed, but merely say that the performance of the system has degraded.

Another example of a soft real-time task is a task handling a request for a seat reservation in a railway reservation application. Once a request for reservation is made, the response should occur within 20 seconds on the average. The response may either be in the form of a printed ticket or an apology message on account of unavailability of seats. Alternatively, we might state the constraint on the ticketing task as: for at least 95% of reservation requests, the ticket should be processed and printed in less than 20 seconds.

Let us now analyze the impact of the failure of a soft real-time task to meet its deadline, taking the example of the railway reservation task. If the ticket is printed in about 20 seconds, we feel that the system is working fine and get a feel of having obtained instant results. As already stated, missed deadlines of soft real-time tasks do not result in system failures. However, the utility of the results produced by a soft real-time task falls continuously with time after the expiry of the deadline, as shown in Fig. 28.15. In Fig. 28.15, the utility of the results is 100% if they are produced before the deadline, and after the deadline has passed the utility of the results slowly falls off with time. For soft real-time tasks that typically occur in practical applications, the time bounds usually range from a fraction of a second to a few seconds.
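A soft real-time constraint can be sketched in the same way. The exponential fall-off and its `decay` rate below are assumptions made purely for illustration, since the text says only that the utility falls continuously after the deadline; the function names are likewise hypothetical.

```python
import math


def soft_utility(response_time: float, deadline: float, decay: float = 0.1) -> float:
    """Full utility up to the deadline, then a gradual fall-off.

    The exponential shape and the decay rate are illustrative assumptions.
    """
    if response_time <= deadline:
        return 1.0
    return math.exp(-decay * (response_time - deadline))


def meets_percentile_constraint(response_times, bound: float, fraction: float = 0.95) -> bool:
    """True if at least `fraction` of the requests finished in less than `bound`."""
    within = sum(1 for t in response_times if t < bound)
    return within >= fraction * len(response_times)
```

With the railway reservation constraint, 19 requests served in 5 seconds and one served in 60 seconds still satisfy "95% in less than 20 seconds", whereas two slow requests out of 20 do not. Unlike the firm case, a late result keeps some utility, which only shrinks as the delay grows.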

Non-Real-Time Tasks 

A non-real-time task is not associated with any time bounds. Can you think of any example of a non-real-time task? Most of the interactive computations you perform nowadays are handled by soft real-time tasks. However, about two or three decades back, when computers were not interactive, almost all tasks were non-real-time. A few examples of non-real-time tasks are: batch processing jobs, e-mail, and background tasks such as event loggers. You may however argue that even these tasks, in the strict sense of the term, do have certain time bounds. For example, an e-mail is expected to reach its destination within a couple of hours of being sent. Similar is the case with a batch processing job such as pay-slip printing. What then really is the difference between a non-real-time task and a soft real-time task? For non-real-time tasks, the associated time bounds are typically of the order of a few minutes, hours or even days. In contrast, the time bounds associated with soft real-time tasks are at most of the order of a few seconds.

FAQs on Introduction to Real Time Systems - 3 - Embedded Systems (Web) - Computer Science Engineering (CSE)

1. What is a real-time system?
Ans. A real-time system is a type of computer system that is designed to respond to events or inputs within a specified time constraint. It is characterized by its ability to provide a predictable and timely response, ensuring that critical tasks are completed within their deadlines.
2. What are the main components of a real-time system?
Ans. The main components of a real-time system include the hardware, software, and the real-time operating system (RTOS). The hardware consists of the physical components such as processors, memory, input/output devices, and sensors. The software comprises the programs and algorithms that control the system's behavior. The RTOS manages the scheduling and execution of tasks to ensure timely responses.
3. What are hard real-time systems?
Ans. Hard real-time systems are those that have strict timing constraints and must meet their deadlines without any compromise. In these systems, missing a deadline can lead to catastrophic consequences, such as system failure or endangering human lives. Examples of hard real-time systems include aircraft control systems, medical devices, and nuclear power plant controls.
4. What are soft real-time systems?
Ans. Soft real-time systems are those that have timing constraints, but missing a deadline does not necessarily lead to catastrophic consequences. These systems can tolerate occasional missed deadlines, but they still aim to provide timely responses as much as possible. Examples of soft real-time systems include multimedia streaming, online gaming, and traffic signal control systems.
5. What are the challenges in designing real-time systems?
Ans. Designing real-time systems poses several challenges. Some of the key challenges include ensuring task scheduling and resource allocation to meet timing constraints, managing system overload and prioritizing tasks, minimizing response times, and handling system failures and recovery. Additionally, real-time systems often require specialized hardware and software design techniques to guarantee predictable and timely behavior.